Reads with highly repetitive sequences not mapped by STAR but mapped by Salmon #1283

red-plant · 2021-06-29T19:26:41Z

Dear Dr. Dobin,

I am working with RNA-seq samples from Arabidopsis sperm cells (SRR7945266 to SRR7945268). The same data was extremely slow to map with Salmon, but Dr. Patro kindly fixed it (COMBINE-lab/salmon#527), and it now maps at around 85%. Now I need to map to the genome. I tried tweaking the seeding parameters to no avail: over 85% of the reads are reported as "too short", so I'm assuming the aligner is having a similar problem.

Do you think this could be solved in STAR? I'm attaching a few thousand reads of the first library in case they're of any use.
sub1.fq.gz
sub2.fq.gz

Thanks
-José

red-plant · 2021-06-29T22:17:48Z

Hmm, it turns out that trimming with fastp -3 -4 -x solved the issue. Sorry again. I had never faced such a problem with STAR; I usually feed it untrimmed reads, which works slightly better.

alexdobin · 2021-06-30T17:32:16Z

Hi José,

If the fragment insert size is shorter than the paired-end read length, trimming is important: the mappable portions of the reads are small, so they cannot pass the filtering.

Cheers
Alex
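
The arithmetic behind this explanation can be sketched as follows. This is a rough illustration, not STAR's actual code: it assumes the filter in question is the default outFilterMatchNminOverLread / outFilterScoreMinOverLread threshold of 0.66 (a fraction of the read length, or of the sum of both mates' lengths for paired-end reads, per the STAR manual), that only the fragment-derived bases of each mate can align, and that the mates are 150 bases long. The mate length and the function names are hypothetical; the thread does not state the actual read length.

```python
# Default value from the STAR manual for outFilterMatchNminOverLread and
# outFilterScoreMinOverLread; assumed here to be the filter being discussed.
DEFAULT_MIN_OVER_LREAD = 0.66


def mappable_fraction(insert_size: int, mate_length: int) -> float:
    """Fraction of a read pair's bases that come from the fragment itself.

    Simplification: when the fragment is shorter than the mate, the rest of
    each mate is read-through (adapter or other non-genomic sequence) and is
    assumed not to align. Both mates behave the same way, so the fraction for
    the pair equals the fraction for one mate.
    """
    mappable_per_mate = min(insert_size, mate_length)
    return mappable_per_mate / mate_length


def passes_length_filter(insert_size: int, mate_length: int,
                         threshold: float = DEFAULT_MIN_OVER_LREAD) -> bool:
    """True if the alignable portion reaches the minimum fraction of read length."""
    return mappable_fraction(insert_size, mate_length) >= threshold


if __name__ == "__main__":
    mate_length = 150  # hypothetical; not given in the thread
    for insert_size in (40, 80, 120, 200):
        frac = mappable_fraction(insert_size, mate_length)
        verdict = "passes" if passes_length_filter(insert_size, mate_length) else "too short"
        print(f"insert {insert_size:>3} nt: mappable fraction {frac:.2f} -> {verdict}")
```

Under these assumptions, trimming helps because it shortens the read length used in the denominator, so the same fragment-derived bases make up a larger fraction of the trimmed read.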

red-plant closed this as completed Jun 29, 2021
