running a gubaphage query #40
Hi @rchikhi, happy to run some DNA queries first (we don't have protein running quite yet). However, this search requires
Could you try installing sourmash in an isolated environment, and then recalculating sigs, please?
installation:
activate conda environment:
signature generation:
Great! Oh, for some reason I didn't get an email notification for this reply. Thanks for the detailed instructions; here are the updated sigs: Gubaphage_genomes.dna.sig.zip
Hi @rchikhi! I ran the query at k=21. See https://github.com/bluegenes/2021-gubaphage-magsearch for code and a notebook where I did some light processing and filtering of the results. Click on the binder if you'd like to run the notebook interactively. In that notebook, I selected metagenome results that had at least 30% (output file: The processed results look like this:
...where What other information would be helpful, or what questions can I answer about this run? I can also search at k=31 and k=51 if you'd like, though I'm not sure what (if any) additional metagenomes would be recovered. Cheers, Tessa
Thanks much, Tessa! I'll have a look at the results and let you know if more info is needed on our side.
Hi Tessa, quick question: the query was a pangenome, i.e. a collection of all known genomes for that clade. I suspect there's also some redundancy, i.e. some genomes inside this pangenome are very similar. Would it be correct to say that a containment score of 0.5 means that, essentially, 50% of the data inside that query pangenome has a hit? (With the truth being somewhere between "50% of the entries inside that FASTA file have a full hit and 50% have no hit" and "100% of the entries match over half their length".)
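To make the two extremes in that question concrete, here is a minimal toy sketch. It is not sourmash code: it uses plain Python sets of integers as hypothetical stand-ins for k-mer hashes, and it assumes the multifasta was sketched as one pooled query signature, with containment taken as the fraction of distinct query hashes found in the metagenome. All names and numbers below are invented for illustration.

```python
def containment(query, target):
    """Fraction of distinct query hashes that are also present in target."""
    if not query:
        return 0.0
    return len(query & target) / len(query)


# Hypothetical k-mer hash sets for two unrelated genomes in the pangenome.
genome_a = set(range(0, 100))
genome_b = set(range(100, 200))
# An exact duplicate of genome A, standing in for redundancy in the multifasta.
genome_a_copy = set(genome_a)

# Pooling into one query takes the union, so duplicated hashes collapse.
pangenome = genome_a | genome_a_copy | genome_b

# Unrelated hashes that only occur in the metagenome.
background = set(range(1000, 1500))

# Extreme 1: the metagenome contains all of genome A and none of genome B.
metagenome_full_a = set(range(0, 100)) | background
# Extreme 2: the metagenome contains half of each genome.
metagenome_half_each = set(range(0, 50)) | set(range(100, 150)) | background

score_full_a = containment(pangenome, metagenome_full_a)
score_half_each = containment(pangenome, metagenome_half_each)
print(score_full_a, score_half_each)  # prints: 0.5 0.5
```

Under these assumptions both scenarios give the same score of 0.5, so the score alone cannot tell them apart. An exact duplicate genome also adds nothing to the pooled set, so in this toy model the score reflects the fraction of distinct k-mers with a hit rather than the fraction of FASTA entries.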
Hi Luiz, Titus,
I was wondering if you could please run a Wort query across all metagenomes for the Serratus people? I understood from your talk that the best way to ask for it is through GitHub issues, but let me know if you prefer another channel. The goal here is to search for gubaphages in metagenomes. I'm attaching both DNA and protein signatures (Gubaphage_genomes.sigs.zip), which I computed for a collection (multifasta) of gubaphages as follows:
thanks in advance!
Rayan