-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Much slower than previous versions #527
Comments
FInally, it took like 25 hours, I am attaching the log. |
Thanks for the detailed report. Has anything else changed on the system since you upgraded? Does this same dataset finish much faster with 1.0.0? There have been no large-scale changes to alignment since 1.0.0 that should affect the runtime, so this is definitely completely unexpected behavior. We will also try to reproduce locally on this data. |
I will try to reproduce running on V1.0.0, will come back with reports. Should've done this first, apologies. |
Ok, salmon V1.0.0 finished in 5H 15 min, so about 5 times faster, the exact same library and parameters, and achieved almost the same mapping rate (85.1058% with V1.2.0 vs 84.6341% with V1.0.0) attaching log. I must add I did not trim this library for adapters nor quality, nor did anything to it. Just mapped as is. But fastQC showed excellent levels of quality even at the ends and no or minimal adapter content. |
Hi @red-plant, Thank you for providing the follow-up here. This behavior is very strange! However, I am not sure what is stranger; that 1.2.1 takes ~5x longer for you, or that 1.0.0 is taking > 5H in the first place! I can confirm that I am getting really strange (i.e. slow) behavior on this sample under both versions. It's unexpected because I've never seen a sample take this long before. We are looking into it, and will report back. However, since digging into the data to figure out precisely what is going on may take a little bit, I wanted to ask if you can try something. I think the issue may be resulting from some highly-expressed, "pathologically" repetitive sequence. Could you run salmon with the extra command line option Thanks! |
So, it took ~30 min (impressively fast) for V1.0.0 with |
It took ~1.5H for V1.2.1 with By the insert size determined by fastp I suspect there's quite a bit of dovetailing, I did had extremely slow performance with dovetailed libraries (for example SRR7945268, which is insert size 100, and its a PE 150 [not my data]) even allowing dovetails, to the point I ended up mapping them as single end and not using one of the pairs. Even then it took its time. Edit: allowing dovetails only increased the mapping rate by 0.0277%. As an additional note, not |
Hi @red-plant, So, I have some update from our end. @mohsenzakeri dug into the data a bit (specifically To test this hypothesis, we made a small change to the mapping algorithm to special case and ignore k-mers that are purely homopolymers. I'll note that in this data, this has no effect on the mapping rate. I get the following performance profile running the trimmed version of this data (having trimmed with
I was wondering if you might test this altered version out and see if it has a similarly beneficial effect for you as well. Probably, the time will be different, since the processors themselves are, and since I elided all non-essential flags here, but I would hope this version is faster than the current (even with the altered You can find a tarball with the pre-compiled binary here. It should work on any relatively recent linux system. |
Ran with only --validateMappings and -p 4 options. Wow, it took less than 15 minutes! Thanks a lot to you and your team for looking into this. |
Awesome! Thank you so much for the detailed report and for finding this data that exposed this strange (but interesting) performance case. We'll fold these improvements into the next release as well. |
Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
salmon
Describe the bug
is been running like an order of magnitude slower than when I last used it
A clear and concise description of what the bug is.
So I am aligning reads against Arabidopsis Thaliana, using Araport 11 annotation. I usually had 20 M reads aligned within an hour or 2. I am aligning a 46M reads library and it has been running for 20 hours using 4 threads of my humble i5-3210M and is barely on 38.5M, after 20 hours! Index was constructed with default kmer size and no decoys. I have had this problem with other libraries since upgraded from V1.0.0,
To Reproduce
I guess just try to align stuff against araport11, this particular problem comes with any fq.gz. It will take hours and hours and hours to align.
Specifically, please provide at least the following information:
1.2.1
downloaded executable
Araport 11, from A. thaliana
regular fastq.gz ( SRR7985407)
--validateMappings
-p 4
--seqBias
--gcBias
--posBias
Expected behavior
Much faster alignment, it is Salmon !!
Screenshots
this is the run info so far:
Version Info: This is the most recent version of salmon.
salmon (mapping-based) v1.2.1
[ program ] => salmon
[ command ] => quant
[ index ] => { /home/jaimealaniz/Documents/indexes/salmon/ara11/ }
[ libType ] => { A }
[ mates1 ] => { SRR7985407_1.fq.gz }
[ mates2 ] => { SRR7985407_2.fq.gz }
[ validateMappings ] => { }
[ threads ] => { 4 }
[ seqBias ] => { }
[ gcBias ] => { }
[ posBias ] => { }
[ output ] => { /home/jaimealaniz/Documents/salmon.embryo/SRR7985407/ }
Logs will be written to /home/jaimealaniz/Documents/salmon.embryo/SRR7985407/logs
[2020-05-29 20:14:24.283] [jointLog] [info] setting maxHashResizeThreads to 4
[2020-05-29 20:14:24.283] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
[2020-05-29 20:14:24.283] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2020-05-29 20:14:24.283] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consensusSlack to 0.35.
[2020-05-29 20:14:24.283] [jointLog] [info] parsing read library format
[2020-05-29 20:14:24.283] [jointLog] [info] There is 1 library.
[2020-05-29 20:14:24.341] [jointLog] [info] Loading pufferfish index
[2020-05-29 20:14:24.342] [jointLog] [info] Loading dense pufferfish index.
| Loading contig table | Time = 41.693 ms
size = 357712
| Loading contig offsets | Time = 657 us
| Loading reference lengths | Time = 202.87 us
| Loading mphf table | Time = 16.335 ms
size = 64786639
Number of ones: 357711
Number of ones per inventory item: 512
Inventory entries filled: 699
| Loading contig boundaries | Time = 154.69 ms
size = 64786639
| Loading sequence | Time = 6.3275 ms
size = 54055309
| Loading positions | Time = 68.544 ms
size = 74351806
| Loading reference sequence | Time = 7.0924 ms
| Loading reference accumulative lengths | Time = 325.83 us
[2020-05-29 20:14:24.638] [jointLog] [info] done
[2020-05-29 20:14:24.638] [jointLog] [info] Index contained 57,190 targets
[2020-05-29 20:14:25.202] [jointLog] [info] Number of decoys : 0
processed 38,500,000 fragmentsntLog] [info] Automatically detected most likely library type as IU
hits: 40,221,178, hits per frag: 1.04508
Desktop (please complete the following information):
Additional context
Previously used salmon on this laptop and it was much much faster, that'ts the main reason I use salmon.
The text was updated successfully, but these errors were encountered: