Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unitigs with no color #74

Closed
aryakaul opened this issue Oct 5, 2023 · 2 comments
Closed

Unitigs with no color #74

aryakaul opened this issue Oct 5, 2023 · 2 comments

Comments

@aryakaul
Copy link

aryakaul commented Oct 5, 2023

Hello,

Apologies if I'm misunderstanding something but I am attempting to extract the unitigs by color matrix from Bifrost's output. My current approach is adapted from #50.

zcat bifrost_out.gfa.gz | awk '{if ($1=="S") {print ">" $2 "\n" $3}}' > unitigs.fasta
Bifrost query -v -t 32 -e 1.0 -g bifrost_out.gfa.gz -C bifrost_out.color.bfg -q unitigs.fasta -o unitigs.colors

But when I query the associated unitigs.colors.tsv I occasionally get unitigs with no colors (0 0):

 ➜ cat unitigs.colors.tsv | cut -f2- | sed 1d | sort | uniq -c
    466 0       0
   1253 0       1
    675 1       0
   1394 1       1

From my understanding, every unitig should have at least one color associated with it, otherwise it should not appear in the GFA? Any help or advice would be appreciated!

Best,
Arya

@aryakaul aryakaul changed the title Unitigs with no color? Unitigs with no color Oct 5, 2023
@GuillaumeHolley
Copy link
Collaborator

GuillaumeHolley commented Oct 5, 2023

Hi @aryakaul,

By using -e 1.0 in your search, you only report a queried unitig as "present" in one color if all of its composing k-mers have that color. So what your query file shows is that you have 466 unitigs for which not all the k-mers are colored with color 1 or color 2. An example of why this happens is if a k-mer x from sample 1 overlaps (by k-1 base pairs) a k-mer y from sample 2. The resulting colored graph will contain a unitig with x and y for which color(x)=1 and color(y)=2. Hence, not all k-mers of that unitig would have color 1 or 2.

If you redo the same experiment by querying k-mers rather than unitigs as shown in #50, all your queries should show at least one color.

If this is of interest for your use case, the Bifrost version on the robin branch (which is the next Bifrost release) adds extra search options where in addition to reporting a queried sequence as "present" or "absent" using a fixed threshold -e, you can instead reports either the number of k-mers found or the ratio of found k-mer per color. However, you would have to redo a graph as this version breaks file format compatibility with current Bifrost release,

Let me know if this helps,
Guillaume

@aryakaul
Copy link
Author

aryakaul commented Oct 9, 2023

Thank you! This is very helpful! I've been working with small toy datasets up until now, and didn't run into this so thought I had done something wrong. Noted about the robin branch, I'll take a crack at building that.

Best,
Arya

@aryakaul aryakaul closed this as completed Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants