phase block number compared to Whatshap #17
Hi @MichelMoser, it's also important to compare total block length and the length distribution (e.g. N50). The number of blocks alone may be misleading, as there may be a lot of short blocks. What is your input for WhatsHap? If you see examples of long WH phased blocks that are fragmented in HapDup (Margin), I'll be happy to take a look. Mikhail
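For reference, a minimal sketch of how total block length and N50 could be computed side by side with the block count. It assumes the phase blocks are already available as `(start, end)` coordinate pairs; the function names `block_n50` and `summarize` are made up for this illustration and are not part of HapDup, Margin, or WhatsHap.

```python
def block_n50(lengths):
    """Return the N50 of a list of block lengths.

    N50 is the length L such that blocks of length >= L together
    cover at least half of the summed block length.
    """
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def summarize(blocks):
    """Summarize phase blocks given as (start, end) pairs (hypothetical input format)."""
    lengths = [end - start for start, end in blocks]
    return {
        "count": len(lengths),
        "total_bp": sum(lengths),
        "n50": block_n50(lengths),
    }


if __name__ == "__main__":
    # Toy data: one long block plus several short ones.
    toy_blocks = [(0, 1_000_000), (1_100_000, 1_105_000), (1_200_000, 1_203_000)]
    print(summarize(toy_blocks))
```

Running the same summary on both the Margin and the WH block lists would show whether the extra Margin blocks are mostly short ones.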
Hi @fenderglass, thank you for the reply. Since this is not a real software issue, I am happy to move it somewhere else if needed. Input for WhatsHap was the same alignments as for HapDup. SNPs were called using whatshap find_snv_candidates with default params. I will make a graphical overview of the fragment distribution over the chromosomes and use IGV to inspect some fragments which got merged in WH.
Sorry for the late response and thanks for the info! Do you have heterozygosity rate estimates for these genomes? Based on your table, it seems to be around 0.001 (I'm dividing [...]). A rate of 0.001 is very similar to the human genome, and for human datasets with read coverage ~30x and N50 ~30kb, we typically get phased blocks of about 1Mb, so similar to your Margin numbers. WhatsHap's numbers are kinda too good to be true if you have a similar ONT protocol. But if you have ultra-long reads, you can definitely achieve a phased block N50 of around 20Mb. So overall it is hard to tell exactly what is going on without some kind of ground truth or looking at the raw data.
Hello, thank you for the great tool!
I was just testing HapDup v0.7 on our fish genome.
Comparing the output with phasing done with WhatsHap (WH), I wondered why there is such a big difference in phased block size and block number between HapDup and the WH pipeline.
For the fish chromosomes, WH was generating 679 blocks using 2'689'114 phased SNPs.
Margin (HapDup pipeline) was generating 5352 blocks using 3'862'108 phased SNPs.
The main difference seems to be the prior read filtering and the use of MarginPhase for the phasing in HapDup, but does this explain such a big difference?
I was wondering if phase blocks of HapDup could be concatenated using WhatsHap SNP and block information to increase continuity?
I imagine it would be a straightforward approach: overlap SNP positions between Margin and WH together with their phase block ids, then lift over the phase ids from WH.
I will do some visual inspections and scripting to test whether there is overlap of called SNPs and agreement on block borders.
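A rough sketch of what I mean by the lift-over, just to make the idea concrete. It assumes the phased SNPs from both tools have already been parsed into dicts mapping `(chrom, pos)` to `(phase_set_id, genotype)`, with the genotype given as `"0|1"` or `"1|0"`. The function name `liftover_plan` and this input layout are my own placeholders, not anything provided by HapDup or WhatsHap. For every Margin block it picks the WH block sharing the most SNPs and checks whether the Margin haplotypes would need to be flipped to match.

```python
from collections import Counter, defaultdict


def liftover_plan(margin, whatshap):
    """Map each Margin phase block onto a WhatsHap phase block.

    margin, whatshap: dict (chrom, pos) -> (phase_set_id, gt),
    where gt is "0|1" or "1|0" (assumed input layout).
    Returns dict margin_ps -> (wh_ps, flip, agree, disagree).
    """
    # For every Margin block, count shared SNPs per WH block,
    # split by whether the two phased genotypes agree.
    votes = defaultdict(Counter)
    for site, (m_ps, m_gt) in margin.items():
        hit = whatshap.get(site)
        if hit is None:
            continue  # SNP not phased by WhatsHap
        w_ps, w_gt = hit
        votes[m_ps][(w_ps, m_gt == w_gt)] += 1

    plan = {}
    for m_ps, counter in votes.items():
        # Choose the WH block that shares the most SNPs with this Margin block.
        shared = Counter()
        for (w_ps, _same), n in counter.items():
            shared[w_ps] += n
        w_ps, _ = shared.most_common(1)[0]
        agree = counter[(w_ps, True)]
        disagree = counter[(w_ps, False)]
        # If most shared SNPs disagree, the Margin block is in the opposite
        # orientation and would have to be flipped before merging.
        plan[m_ps] = (w_ps, disagree > agree, agree, disagree)
    return plan


if __name__ == "__main__":
    margin = {
        ("chr1", 100): (100, "0|1"),
        ("chr1", 200): (100, "1|0"),
        ("chr1", 900): (900, "0|1"),
        ("chr1", 950): (900, "0|1"),
    }
    whatshap = {
        ("chr1", 100): (100, "0|1"),
        ("chr1", 200): (100, "1|0"),
        ("chr1", 900): (100, "1|0"),
        ("chr1", 950): (100, "1|0"),
    }
    for m_ps, decision in sorted(liftover_plan(margin, whatshap).items()):
        print(m_ps, decision)
```

Margin blocks with a mixed agree/disagree count would be the ones worth checking by eye, since they may point to a switch error in one of the two phasings rather than a clean merge.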
Cheers,
Michel