Large number of genomes #76
Comments
For the first, you can easily turn off the screen output by redirecting the stderr log: just append a stderr redirect such as 2> /dev/null (or 2> fastani.log if you want to keep the log) to the command.

For the second, memory usage is proportional to the total size of the references provided, so your run could be running out of memory. To resolve this, you can run the job in batches (of, say, 5000 reference genomes) using a bash script; a sketch is shown below. If you have a cluster, you could also parallelise these batches across multiple compute nodes. I don't think this is happening due to any UNIX limits.
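A minimal sketch of such a batching script, assuming a single query genome and the --refList flag used in this thread (file names and the batch size are illustrative, and flag names may differ between fastANI versions):

```bash
#!/usr/bin/env bash
set -euo pipefail

QUERY=query_genome.fna        # illustrative: the single query genome
REFLIST=all_references.txt    # illustrative: one reference genome path per line
BATCH_SIZE=5000

# Split the full reference list into chunks of BATCH_SIZE paths each
# (produces ref_batch_aa, ref_batch_ab, ...).
split -l "$BATCH_SIZE" "$REFLIST" ref_batch_

for batch in ref_batch_??; do
    # Redirect stderr to a per-batch log so the long list of reference
    # filenames is not printed to the screen.
    fastANI -q "$QUERY" --refList "$batch" -o "${batch}.out" 2> "${batch}.log"
done

# Merge the per-batch results into a single output table.
cat ref_batch_??.out > fastani_all.out
```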
Thanks for this. I did the batching myself, actually, and it works. The reason I did not think of this as a memory problem is that there was no "out of memory" message associated with the termination, which is usually what happens on the cluster.
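One way to check this after the fact, assuming the cluster scheduler is SLURM (the job ID below is a placeholder), is to query the accounting records for the job's peak memory and final state:

```bash
# Query SLURM accounting for the finished fastANI job.
# A MaxRSS approaching the requested memory, or a State of OUT_OF_MEMORY,
# suggests the job was killed for exceeding its allocation even if no
# explicit "out of memory" message appeared in the job's own log.
sacct -j 12345678 --format=JobID,JobName,MaxRSS,ReqMem,Elapsed,State,ExitCode
```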
I can understand keeping all the references in memory when using a list of queries against a list of references (although it limits scaling, it does enable your threading support; see #140), but why does fastANI appear to keep all the references in memory even when used with a single query? This feels like a bug, or at the very least undesirable behavior.
First, a suggestion: it would be very helpful to be able to turn off the screen output. We use fastANI with a single query genome against a long list (thousands) of reference genomes (the --refList option), and listing thousands of filenames each time is annoying and rather useless.
But the main problem lies in our observation that listing 30 000+ files and providing the list as input with --refList results in fastANI not producing any output! There is no error message; it starts as before, but it looks like it just gives up and finishes without producing output. By experimenting, I have found that 10 000 files works fine. I know several UNIX programs have a limit on how long a command line may be. Is this the reason? I run this on an HPC and allocate 99 GB for this job. It doesn't look to me like a memory problem...?