Question about no result from Margin #27

Open
yzhang-github-pub opened this issue Apr 5, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@yzhang-github-pub

Dear author,

Thanks for developing hapdup, which works well for me most of the time. Occasionally, though, Margin fails even when PEPPER produces the expected variant calls. Which parameter(s) can I tune? Please advise.

Here is an example Margin log from a run where 3 variant calls from PEPPER were expected but Margin didn't keep any. Is there a way to loosen the criteria?

Parsed 3 total VCF entries from /sample1/hapdup/pepper/PEPPER_VARIANT_FULL.vcf; kept 0 HETs, skipped 0 for region, 1 for not being PASS, 2 for being homozygous, 0 for being INDEL
No valid VCF entries found!

@mikolmogorov
Collaborator

Looks like PEPPER did not find any heterozygous variants in your assembly. Could it be homozygous? If you tell us more about the genome and your dataset and provide the PEPPER and Margin logs, I should be able to help more.

@yzhang-github-pub
Author

PEPPER log:

[04-04-2023 13:59:45] INFO: ONT VARIANT CALLING MODE SELECTED.
[04-04-2023 13:59:45] INFO: MODE: PEPPER SNP
[04-04-2023 13:59:45] INFO: THRESHOLDS ARE SET TO:
[04-04-2023 13:59:45] INFO: MIN MAPQ: 5
[04-04-2023 13:59:45] INFO: MIN SNP BASEQ: 1
[04-04-2023 13:59:45] INFO: MIN INDEL BASEQ: 1
[04-04-2023 13:59:45] INFO: MIN SNP FREQUENCY: 0.1
[04-04-2023 13:59:45] INFO: MIN INSERT FREQUENCY: 0.15
[04-04-2023 13:59:45] INFO: MIN DELETE FREQUENCY: 0.15
[04-04-2023 13:59:45] INFO: MIN COVERAGE THRESHOLD: 3
[04-04-2023 13:59:45] INFO: MIN CANDIDATE SUPPORT: 2
[04-04-2023 13:59:45] INFO: MIN SNP CANDIDATE FREQUENCY: 0.1
[04-04-2023 13:59:45] INFO: MIN INDEL CANDIDATE FREQUENCY: 0.1
[04-04-2023 13:59:45] INFO: SKIP INDEL CANDIDATES: False
[04-04-2023 13:59:45] INFO: MAX ALLOWED CANDIDATE IN ONE SITE: 4
[04-04-2023 13:59:45] INFO: MIN SNP PREDICTIVE VALUE: 0.1
[04-04-2023 13:59:45] INFO: MIN INSERT PREDICTIVE VALUE: 0.25
[04-04-2023 13:59:45] INFO: MIN DELETE PREDICTIVE VALUE: 0.25
[04-04-2023 13:59:45] INFO: SNP QV CUTOFF FOR RE-GENOTYPING: 15
[04-04-2023 13:59:45] INFO: INDEL QV CUTOFF FOR RE-GENOTYPING: 10
[04-04-2023 13:59:45] INFO: REPORT ALL SNPs ABOVE THRESHOLD: 0
[04-04-2023 13:59:45] INFO: REPORT ALL INDELs ABOVE THRESHOLD: 0
[04-04-2023 13:59:45] INFO: CALL VARIANT MODULE SELECTED
[04-04-2023 13:59:45] INFO: RUN-ID: 04042023_135945
[04-04-2023 13:59:45] INFO: IMAGE OUTPUT: /temp/sample_test1/hapdup/pepper/images_04042023_135945/
[04-04-2023 13:59:45] INFO: STEP 1/3 GENERATING IMAGES:
[04-04-2023 13:59:45] INFO: COMMON CONTIGS FOUND: ['26530']
[04-04-2023 13:59:45] INFO: TOTAL CONTIGS: 1 TOTAL INTERVALS: 1 TOTAL BASES: 11579
[04-04-2023 13:59:46] INFO: STARTING PROCESS: 0 FOR 1 INTERVALS
[04-04-2023 13:59:46] INFO: THREAD 0 FINISHED SUCCESSFULLY.
[04-04-2023 13:59:46] INFO: FINISHED IMAGE GENERATION
[04-04-2023 13:59:46] INFO: TOTAL ELAPSED TIME FOR GENERATING IMAGES: 0 Min 0 Sec
[04-04-2023 13:59:46] INFO: STEP 2/3 RUNNING INFERENCE
[04-04-2023 13:59:46] INFO: OUTPUT: /temp/sample_test1/hapdup/pepper/predictions_04042023_135945/
[04-04-2023 13:59:46] INFO: DISTRIBUTED CPU SETUP.
[04-04-2023 13:59:46] INFO: TOTAL CALLERS: 16
[04-04-2023 13:59:46] INFO: THREADS PER CALLER: 1
[04-04-2023 13:59:46] INFO: MODEL LOADING TO ONNX
[04-04-2023 13:59:46] INFO: SAVING MODEL TO ONNX
/usr/local/lib/python3.8/dist-packages/torch/onnx/symbolic_opset9.py:2095: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
warnings.warn("Exporting a model to ONNX with a batch_size other than 1, " +
[04-04-2023 13:59:47] INFO: SETTING THREADS TO: 1.
[04-04-2023 13:59:47] INFO: STARTING INFERENCE.
[04-04-2023 13:59:47] INFO: TOTAL SUMMARIES: 0.
[04-04-2023 13:59:47] INFO: THREAD 0 FINISHED SUCCESSFULLY.
[04-04-2023 13:59:47] INFO: FINISHED PREDICTION
[04-04-2023 13:59:47] INFO: ELAPSED TIME: 0 Min 0 Sec
[04-04-2023 13:59:47] INFO: PREDICTION FINISHED SUCCESSFULLY.
[04-04-2023 13:59:47] INFO: TOTAL ELAPSED TIME FOR INFERENCE: 0 Min 1 Sec
[04-04-2023 13:59:47] INFO: STEP 3/3 FINDING CANDIDATES
[04-04-2023 13:59:47] INFO: OUTPUT: /temp/sample_test1/hapdup/pepper/
[04-04-2023 13:59:47] INFO: STARTING CANDIDATE FINDING.
[04-04-2023 13:59:47] INFO: FINISHED PROCESSING, TOTAL CANDIDATES FOUND: 3
[04-04-2023 13:59:47] INFO: FINISHED PROCESSING, TOTAL VARIANTS IN PEPPER: 0
[04-04-2023 13:59:47] INFO: FINISHED PROCESSING, TOTAL VARIANTS SELECTED FOR RE-GENOTYPING: 3
[04-04-2023 13:59:47] INFO: TOTAL TIME SPENT ON CANDIDATE FINDING: 0 Min 0 Sec
[04-04-2023 13:59:47] INFO: TOTAL ELAPSED TIME FOR FINDING CANDIDATES: 0 Min 1 Sec

Margin log:

> Parsing model parameters from file: /opt/margin_params/phase/allParams.haplotag.ont-r94g507.hapDup.json

Parsed 3 total VCF entries from /temp/sample_test1/hapdup/pepper/PEPPER_VARIANT_FULL.vcf; kept 0 HETs, skipped 0 for region, 1 for not being PASS, 2 for being homozygous, 0 for being INDEL
No valid VCF entries found!
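
To see which records fall into which bucket of Margin's summary line, the tally can be approximated outside Margin. The sketch below is not Margin's code: the helper names `classify_record` and `tally`, the order of the checks, and the choice to count a missing genotype (`./.`) as homozygous are all assumptions made for illustration. It expects uncompressed, tab-separated VCF lines with a single sample column.

```python
from collections import Counter


def classify_record(line):
    """Assign one VCF data line to a bucket.

    Hypothetical helper: the check order (FILTER, then genotype, then
    variant type) is an assumption, not taken from Margin's source.
    """
    fields = line.rstrip("\n").split("\t")
    ref, alt, filt = fields[3], fields[4], fields[6]
    # Pair FORMAT keys with the sample values, e.g. {"GT": "0/1", "GQ": "19"}
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    alleles = sample.get("GT", "./.").replace("|", "/").split("/")

    if filt != "PASS":
        return "not_pass"
    if len(set(alleles)) == 1:
        # Covers 0/0 and 1/1; a missing genotype also lands here (assumption)
        return "homozygous"
    if len(ref) != 1 or any(len(a) != 1 for a in alt.split(",")):
        return "indel"
    return "het_snp"


def tally(lines):
    """Count records per bucket, skipping header and blank lines."""
    return Counter(
        classify_record(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    )
```

Calling `tally(open("PEPPER_VARIANT_FULL.vcf"))` on the file named in the log would give a per-bucket count to compare against Margin's summary line.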

@mikolmogorov
Collaborator

It seems that PEPPER found no heterozygous SNPs, so Margin failed. If your genome is haploid, Hapdup is not applicable, as it is designed to phase diploid contigs.

@yzhang-github-pub
Author

The sample is diploid. We detected SNPs in the same sample from Illumina data, and Clair3 called the expected variants from the same nanopore data. Is there a way for users to loosen the stringency in PEPPER/Margin?

@mikolmogorov
Collaborator

How many Illumina SNPs did you detect (with allele frequency above ~25%)? There may just be too few; 3 calls are not enough to phase. The expectation is a human-like heterozygosity rate (e.g. 0.1%), with variants distributed relatively uniformly.
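
As a rough sanity check on that expectation, the contig length reported in the PEPPER log (TOTAL BASES: 11579) can be combined with the example rate of 0.1%. The sketch below assumes heterozygous sites are spread uniformly along the contig, which is a simplification; the function name `expected_het_sites` is made up for this example.

```python
def expected_het_sites(contig_length, het_rate=0.001):
    """Expected number of heterozygous sites under a uniform rate (assumption)."""
    return contig_length * het_rate


contig_length = 11579  # TOTAL BASES from the PEPPER log
observed = 3           # variant calls expected by the reporter

expected = expected_het_sites(contig_length)
observed_rate = observed / contig_length

print(f"expected at 0.1%: {expected:.1f} sites")
print(f"observed: {observed} sites ({observed_rate:.4%} of bases)")
```

On a contig this short, even the human-like rate predicts only about a dozen heterozygous sites, and 3 calls correspond to roughly a quarter of that density.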

@yzhang-github-pub
Author

According to Clair3 on the nanopore input, the allele frequency is over 25%, as shown below:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
2 6760 . G T 19.84 PASS P GT:GQ:DP:AF 0/1:19:114:0.4298
2 7705 . G T 20.82 PASS P GT:GQ:DP:AF 0/1:20:110:0.4000
2 9604 . C G 15.62 PASS P GT:GQ:DP:AF 0/1:15:64:0.5312
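
The allele-frequency check on these records can also be done programmatically. This is a minimal sketch, assuming whitespace-separated fields as pasted above, a single sample column, and a single `AF` value per record (no multi-allelic sites); `het_calls_above` is a hypothetical helper name.

```python
def het_calls_above(lines, min_af=0.25):
    """Return (chrom, pos, af) for PASS heterozygous calls with AF >= min_af."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        # Pair FORMAT keys with the sample values, e.g. {"GT": "0/1", "AF": "0.4298"}
        sample = dict(zip(f[8].split(":"), f[9].split(":")))
        gt = sample["GT"].replace("|", "/").split("/")
        af = float(sample["AF"])  # assumes one AF value per record
        if f[6] == "PASS" and len(set(gt)) > 1 and af >= min_af:
            kept.append((f[0], int(f[1]), af))
    return kept


records = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE",
    "2 6760 . G T 19.84 PASS P GT:GQ:DP:AF 0/1:19:114:0.4298",
    "2 7705 . G T 20.82 PASS P GT:GQ:DP:AF 0/1:20:110:0.4000",
    "2 9604 . C G 15.62 PASS P GT:GQ:DP:AF 0/1:15:64:0.5312",
]
calls = het_calls_above(records)
```

All three records are PASS, heterozygous, and above the 25% threshold, so `calls` holds three entries.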

@mikolmogorov
Collaborator

Thanks for the info. I think 3 SNPs may just be too few for Margin; it was really designed for phasing long-ish genomic segments with a relatively uniform variant distribution.

@mikolmogorov mikolmogorov added the bug Something isn't working label Jul 24, 2023