Option to save and reload index table #125

Open
sysimm opened this issue Dec 8, 2021 · 0 comments

sysimm commented Dec 8, 2021

Hello,
I was wondering whether it's possible to save the index table of k-mers generated from the input sequences to disk and reload it later, in order to speed up clustering. My idea is to do this for large datasets using `cd-hit-2d`: one input dataset would be provided by the user (so its index table would always be computed on the fly), and the other would come from a prepared selection of datasets. For the latter, I would like to precompute the index tables to speed up the overall comparison. I don't know how much of the total runtime is spent building the index tables, but I would expect it to be considerable for large datasets. Please correct me if I'm wrong.
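To make the idea concrete, here is a minimal Python sketch of the workflow, assuming a simple inverted index that maps each k-mer to the set of reference sequence IDs containing it, serialized with `pickle`. This is not cd-hit's internal word table or file format, and the function names (`build_kmer_index`, `save_index`, `load_index`, `candidate_hits`) are hypothetical. The reference index is built once and saved; each user query is then screened against the reloaded index.

```python
import pickle
from collections import defaultdict


def build_kmer_index(seqs, k):
    """Map every k-mer to the set of sequence IDs that contain it (hypothetical format)."""
    index = defaultdict(set)
    for sid, seq in seqs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(sid)
    return dict(index)


def save_index(index, k, path):
    """Write the precomputed index (and the k it was built with) to disk."""
    with open(path, "wb") as fh:
        pickle.dump({"k": k, "index": index}, fh)


def load_index(path):
    """Reload a saved index. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        data = pickle.load(fh)
    return data["k"], data["index"]


def candidate_hits(query, index, k, min_shared):
    """Count, per reference ID, how many of the query's k-mer positions hit it;
    keep only references reaching min_shared (a rough word-filter analogue)."""
    counts = defaultdict(int)
    for i in range(len(query) - k + 1):
        for sid in index.get(query[i:i + k], ()):
            counts[sid] += 1
    return {sid: c for sid, c in counts.items() if c >= min_shared}
```

The reference index is treated as read-only after loading, so one saved file could be reused across many query runs; only the user's dataset would still be processed on the fly.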
Please advise whether this is possible at all, or whether it could be done by modifying the code.
Thank you
