Adapter trimming in 10x Data and CellRanger Make Counts Expected Cells
I am working with 10x scRNA-seq data and ran into a problem with Cell Ranger count after trimming poly(A) tails and low-quality reads. I trimmed only the R2 reads, using Cutadapt. At the Cell Ranger count step I received the error "FASTQ header mismatch detected at line 80 of input files." I am now unsure whether trimming poly(A) tails and low-quality reads is necessary for 10x data at all.
Interestingly, when I ran the raw data without any trimming, the web summary from Cell Ranger count showed a rounded barcode rank curve, which 10x interprets as a sign of a compromised sample.
10x describes this pattern as follows: "Round curve and lack of steep cliff may indicate low sample quality or loss of single-cell behavior. This can be due to a wetting failure, premature cell lysis, or low cell viability."
Has anyone else encountered a similar issue when running Cell Ranger count after trimming poly(A) tails and low-quality reads in 10x data? Is trimming necessary for 10x data, and if so, what could be causing the header mismatch error during the Cell Ranger count step?
Additionally, I would like to understand the --expected-cells parameter of Cell Ranger count. It appears that many people set it to 10,000. Could you provide more insight on how to choose it?
Any insights, suggestions, or experiences related to this issue would be greatly appreciated.
Thank you in advance for your help!
I have never tried removing adapters before alignment for single-cell RNA-seq. The fact that a vanilla Cell Ranger run already gives you an odd barcode rank plot points to some of the issues the documentation describes. Did you run FastQC on the raw data? If so, do you see abnormally high adapter content? I haven't tried this myself on single-cell RNA-seq data, but it could be a good place to start.
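On the header mismatch error itself: one possible cause (an assumption, not something confirmed in this thread) is that filtering only R2 removed some reads, so the R1 and R2 files no longer list the same reads in the same order. A minimal Python sketch to check for that is below. The function names are made up for illustration, and it assumes standard 4-line FASTQ records where the read ID is the first whitespace-delimited token of the header.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID of each 4-line FASTQ record in an open text handle."""
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # only the first line of each record is a header
        fields = line.split()
        rid = fields[0] if fields else ""
        if rid.startswith("@"):
            rid = rid[1:]
        # Older headers may end in /1 or /2; drop that so mates compare equal.
        if rid.endswith(("/1", "/2")):
            rid = rid[:-2]
        yield rid


def first_header_mismatch(r1_handle, r2_handle):
    """Return (record_number, r1_id, r2_id) for the first out-of-sync record.

    Returns None if both files contain the same IDs in the same order.
    A missing record on one side shows up as None for that ID.
    """
    pairs = zip_longest(read_ids(r1_handle), read_ids(r2_handle))
    for n, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:
            return n, id1, id2
    return None


if __name__ == "__main__":
    # Toy data: read "r2" was dropped from R2 only, as a filter might do.
    r1 = io.StringIO(
        "@r1 1:N:0\nACGT\n+\nIIII\n"
        "@r2 1:N:0\nACGT\n+\nIIII\n"
        "@r3 1:N:0\nACGT\n+\nIIII\n"
    )
    r2 = io.StringIO(
        "@r1 2:N:0\nTTTT\n+\nIIII\n"
        "@r3 2:N:0\nTTTT\n+\nIIII\n"
    )
    print(first_header_mismatch(r1, r2))
```

For real data you would pass handles from gzip.open(path, "rt") instead of the in-memory strings. If the files do turn out to be out of sync, running Cutadapt in paired-end mode with both R1 and R2 as input should keep them aligned, since a read pair that fails a filter is dropped from both output files.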
Regarding the --expected-cells argument, it depends on how many cells you isolated. There is advice out there saying that anything between 5,000 and 11,000 should be fine, as it is only a first rough cut-off for the empty-droplet detection algorithm. The cell-calling information on the 10x Genomics website discusses how older versions of Cell Ranger (2.0 and before) apparently under-estimated cell numbers, and the newer version (v7.0) lets the pipeline set this number automatically.
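To make the "rough first cut-off" idea concrete, here is a simplified Python sketch of the order-of-magnitude heuristic as I understand it from the 10x description of cell calling in older Cell Ranger versions: take the top N barcodes by UMI count (N being the expected cell number), find the 99th percentile m of their counts, and call every barcode with more than m/10 UMIs a cell. This is an illustration, not Cell Ranger's actual code. The function names and the nearest-rank percentile are my own assumptions, and current versions add an empty-droplet test on top of a first pass like this.

```python
def ordmag_cutoff(umi_counts, expected_cells):
    """Return the UMI threshold m / 10 for a rough cell call.

    m is the 99th percentile (nearest-rank method, an assumption here)
    of the UMI counts of the top `expected_cells` barcodes.
    """
    top = sorted(umi_counts, reverse=True)[:expected_cells]
    if not top:
        raise ValueError("need at least one barcode and expected_cells >= 1")
    ascending = top[::-1]
    n = len(ascending)
    rank = (99 * n + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    m = ascending[rank - 1]
    return m / 10


def count_called_cells(umi_counts, expected_cells):
    """Count barcodes whose UMI total exceeds the rough threshold."""
    cutoff = ordmag_cutoff(umi_counts, expected_cells)
    return sum(1 for c in umi_counts if c > cutoff)


if __name__ == "__main__":
    # Toy library: 100 real cells, 1,000 low-count and 10,000 near-empty barcodes.
    counts = [1000] * 100 + [50] * 1000 + [5] * 10000
    for expected in (100, 200, 5000):
        print(expected, count_called_cells(counts, expected))
```

In this toy example the number of called cells stays the same across quite different expected-cell values, which matches the advice that the exact number matters less than being in the right range when the library has a clear separation between cells and background. With a rounded barcode rank curve like the one in the question, that separation is weaker and the choice would matter more.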