GTDB release 9 - RS220 - databases #3183
We're working on it! The only really tricky part is naming the sketches. @bluegenes or maybe @ccbaumler could probably point you to the right code more easily than I can; otherwise I'll track it down when I have time :)
There have been quite a few developments with sourmash's database scripts recently. @bluegenes and I will be meeting Wednesday to discuss where we currently stand with database creation and updating. I'll add this to my notes. We should be able to create the updated GTDB release rather quickly, all things considered. Check back in on this issue in a week if you don't hear anything.
Update! We have a bug that we are working through: #3191
Databases are available here: #3246
Hi! I used the rs220 release to profile a bunch of metagenomes. Across a set of samples, I found 8070 species overall, but only 6956 of them are in the lineage file (or in the entire GTDB bac120_taxonomy file, for that matter). The missing ones are not currently listed as representative species in GTDB 220 (e.g. I got a hit for this one). Could there be genomes in the database that were representatives in rs214 but aren't anymore?
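One way to list which profiled species are absent from the taxonomy is a set difference against the species parsed out of the taxonomy file. A minimal Python sketch, assuming the GTDB-style TSV layout of one `accession<TAB>lineage` line per genome, with the species rank prefixed by `s__`; the helper names and toy rows are hypothetical and not part of sourmash or GTDB:

```python
import csv
import io


def species_from_taxonomy(tsv_text: str) -> set[str]:
    """Collect species names from a GTDB-style taxonomy TSV.

    Assumes each line is '<accession><TAB><semicolon-separated lineage>',
    with the species rank prefixed by 's__'.
    """
    species = set()
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        for rank in row[1].split(";"):
            if rank.startswith("s__"):
                species.add(rank[len("s__"):])
    return species


def missing_species(found: set[str], taxonomy_species: set[str]) -> list[str]:
    """Return the profiled species that have no entry in the taxonomy, sorted."""
    return sorted(found - taxonomy_species)


# Toy input: hypothetical accessions, lineages abbreviated with '...'
taxonomy_tsv = (
    "RS_GCF_000000001.1\td__Bacteria;...;s__Escherichia coli\n"
    "GB_GCA_000000002.1\td__Bacteria;...;s__Bacillus subtilis\n"
)
found = {"Escherichia coli", "Bacillus subtilis", "Hypothetical species X"}
print(missing_species(found, species_from_taxonomy(taxonomy_tsv)))
```

Running the same comparison against the previous release's taxonomy would show whether the missing species were representatives there.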
Update: we also need to update the links in issue #3246 to the current database! Please hold!
TL;DR: there are two potential bugs with the prefix and suffix of the idents. Two things may be at play here:
Following your link, I navigated to the representative sequence for here
Searching the actual databases:
I hope this adds some clarity. Please share an example and I can dig further into your specific problem.
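Prefix and suffix mismatches usually come from comparing identifiers written in different conventions: GTDB taxonomy files prefix accessions with `RS_` or `GB_`, signature names may start with the bare accession (with or without a `.N` version suffix), and the same assembly can appear as `GCA_` (GenBank) or `GCF_` (RefSeq). A minimal Python sketch of a normalizer, assuming signature names begin with the accession followed by whitespace; `normalize_ident` and its flags are hypothetical and not part of sourmash:

```python
import re

# Database prefix used in GTDB taxonomy files: RS_ (RefSeq) or GB_ (GenBank)
_GTDB_PREFIX = re.compile(r"^(?:RS_|GB_)")
# NCBI assembly source prefix: GCA_ (GenBank) or GCF_ (RefSeq)
_SOURCE_PREFIX = re.compile(r"^GC[AF]_")


def normalize_ident(ident: str, keep_version: bool = False,
                    ignore_source: bool = False) -> str:
    """Reduce a genome identifier to a comparable accession string.

    Accepts e.g. 'RS_GCF_000000001.1' (taxonomy style) or
    'GCF_000000001.1 Some organism' (signature-name style).
    """
    acc = ident.split()[0]           # keep only the first whitespace-separated token
    acc = _GTDB_PREFIX.sub("", acc)  # drop the RS_/GB_ database prefix
    if not keep_version:
        acc = acc.split(".")[0]      # drop the .N version suffix
    if ignore_source:
        # treat GCA_ and GCF_ copies of the same assembly number as equal
        acc = _SOURCE_PREFIX.sub("GCx_", acc)
    return acc


print(normalize_ident("RS_GCF_000000001.1"))
print(normalize_ident("GCF_000000001.1 Some organism", keep_version=True))
```

Ignoring the version is a trade-off: it makes matches more forgiving, but it can also pair a genome with an older or newer assembly version.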
The lineage file and databases for GTDB should now be in sync with this commit (sourmash-bio/database-releases@47cc397). I updated the scripts used to check the database manifests for updates, and I used the same GenBank metadata to update the taxonomy file. I am running the scripts now and will then update the links to point to the new databases.
I have completed the update and (hopefully) addressed the bugs noticed above. The final draft of the 220 release from GTDB is located Running a quick check to compare numbers:
@ctb Could you please replace the
I will begin work on the protein databases now... sigh
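A quick check that a database and its lineage file are in sync can compare the set of accessions on each side. A minimal Python sketch, assuming a manifest CSV (possibly preceded by `#` comment lines such as a manifest-version header) with a `name` column whose first token is the accession, and a lineage CSV with an `ident` column; the helper names, column subsets, and toy rows are hypothetical:

```python
import csv


def _rows(csv_text: str):
    """Parse CSV text into dicts, skipping '#' comment lines before the header."""
    lines = [line for line in csv_text.splitlines() if not line.startswith("#")]
    return csv.DictReader(lines)


def manifest_idents(manifest_csv: str) -> set[str]:
    # The accession is assumed to be the first whitespace-separated token of 'name'
    return {row["name"].split()[0] for row in _rows(manifest_csv) if row["name"].strip()}


def lineage_idents(lineage_csv: str) -> set[str]:
    return {row["ident"].strip() for row in _rows(lineage_csv) if row["ident"].strip()}


def compare(manifest_csv: str, lineage_csv: str) -> dict:
    """Count accessions on each side and list the ones missing from the other."""
    db = manifest_idents(manifest_csv)
    tax = lineage_idents(lineage_csv)
    return {
        "database": len(db),
        "lineages": len(tax),
        "shared": len(db & tax),
        "database_only": sorted(db - tax),
        "lineages_only": sorted(tax - db),
    }


manifest_csv = (
    "# SOURMASH-MANIFEST-VERSION: 1.0\n"
    "internal_location,ksize,name\n"
    "sigs/a.sig.gz,31,GCF_000000001.1 Genus alpha\n"
    "sigs/b.sig.gz,31,GCA_000000002.1 Genus beta\n"
)
lineage_csv = (
    "ident,superkingdom,species\n"
    "GCF_000000001.1,d__Bacteria,s__Genus alpha\n"
    "GCA_000000003.1,d__Bacteria,s__Genus gamma\n"
)
print(compare(manifest_csv, lineage_csv))
```

If the two files write identifiers differently (prefixes, versions), they need to be normalized the same way before the sets are compared, or every genome will show up as missing.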
Hi, I was wondering whether the GTDB reps 220 had been sketched (or are about to be)? Otherwise, would it be possible to provide, or point to, the command you use to do it? I imagine it's fairly straightforward, but in case there are some special considerations for obtaining the same format as the ones you publish, I preferred to ask.
cheers