Reads with highly repetitive sequences not mapped by STAR but mapped by Salmon #1283

Closed
red-plant opened this issue Jun 29, 2021 · 2 comments

Comments

@red-plant

Dear Dr. Dobin,

I am working with RNA-seq samples from Arabidopsis sperm cells (SRR7945266 to SRR7945268). The same data was extremely slow to map with Salmon, but Dr. Patro kindly fixed that (COMBINE-lab/salmon#527), and it now maps around 85% of the reads. Now I need to map to the genome. I tried tweaking the seeding parameters to no avail: over 85% of the reads are reported as "too short", so I assume the aligner is running into a similar problem.

Do you think this could be solved in STAR? I'm attaching a few thousand reads of the first library in case they're of any use.
sub1.fq.gz
sub2.fq.gz

Thanks
-José

@red-plant
Author

Hmm, turns out trimming with fastp -3 -4 -x solved the issue. Sorry again. I had never faced such a problem with STAR, which I usually feed untrimmed reads because that works slightly better.

@alexdobin
Owner

Hi José,

If the fragment insert size is shorter than the paired-end read length, trimming is important: the mappable portion of each read is small, so the untrimmed reads cannot pass the filtering.

Cheers
Alex
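
The reasoning above can be made concrete with a little arithmetic. The sketch below is only an illustration, not STAR's actual code. It assumes that the filter in question is the mapped-length threshold `--outFilterMatchNminOverLread` (default 0.66, normalized to the sum of the mates' lengths for paired-end reads), that both mates have the same length, and that bases beyond the insert (adapter read-through) never align. The function names are hypothetical.

```python
def max_mappable_fraction(read_len: int, insert_size: int) -> float:
    """Upper bound on the fraction of a read pair that can align.

    Assumption: each mate can align at most min(read_len, insert_size)
    bases, because anything past the insert is adapter sequence.
    """
    mappable_per_mate = min(read_len, insert_size)
    # For paired-end reads the fraction is taken over both mates together.
    return (2 * mappable_per_mate) / (2 * read_len)


def passes_length_filter(read_len: int, insert_size: int,
                         min_over_lread: float = 0.66) -> bool:
    """True if an untrimmed pair could still reach the assumed threshold."""
    return max_mappable_fraction(read_len, insert_size) >= min_over_lread


if __name__ == "__main__":
    # Hypothetical read length and insert sizes, chosen only for illustration.
    for insert in (50, 80, 150):
        frac = max_mappable_fraction(100, insert)
        verdict = "passes" if passes_length_filter(100, insert) else "too short"
        print(f"insert={insert} fraction={frac:.2f} {verdict}")
```

Under these assumptions, an untrimmed 2x100 pair from a 50 bp insert can align at most half of its bases and is rejected as "too short". After trimming, the read length in the denominator shrinks to roughly the insert size, so the same fragment can pass.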
