TSPS-269 Speed up CountVariantsInChunksBeagle by using bedtools #1335
Can one of the admins verify this patch?
really curious as to why the 1000 sample task with the updates ran almost twice as fast as the 42 sample task with the updates
@@ -161,7 +161,7 @@ task CountVariantsInChunks {
 set -e -o pipefail

 echo $(gatk --java-options "-Xms~{command_mem}m -Xmx~{max_heap}m" CountVariants -V ~{vcf} | sed 's/Tool returned://') > var_in_original
-echo $(gatk --java-options "-Xms~{command_mem}m -Xmx~{max_heap}m" CountVariants -V ~{vcf} -L ~{panel_vcf} | sed 's/Tool returned://') > var_in_reference
+echo $(gatk --java-options "-Xms~{command_mem}m -Xmx~{max_heap}m" CountVariants -V ~{vcf} -L ~{panel_vcf} | sed 's/Tool returned://') > var_in_reference
should we also update this task to be more efficient too?
i don't think so, this is used in the minimac wdl and i don't really want to touch that one
👍
input {
  File ref_panel_interval_list

  Int disk_size_gb = ceil(2*size(ref_panel_interval_list, "GiB")) + 50 # not sure how big the disk size needs to be since we aren't downloading the entire VCF here
this comment is probably copy pasta'd?
oh also just double checking that workflows that used to fail the QC still continue to fail the QC check (if you use the simulated data as input it should fail the QC check)
Merged commit 95c3941 into TSPS-183_mma_beagle_imputation_hg38
* try creating bed files
* try again
* try again again
* a different thing
* use bedtools and bed ref panel files
* oops update the correct task
* fix
* use the right freaking file name
* remove comment
Description
Stats for this task before and after the change:
1000 sample input, 1000G ref panel:
This PR also includes an update to the Ref Panel generation wdl to include creating the ref panel bed files used in the new task.
Also tested that the QC check for overlapping sites between input and reference still works: https://app.terra.bio/#workspaces/morgan-fieldeng/Imputation_pipeline_testing/job_history/d116884a-d918-44d1-900e-b91691a2b3b1
old branch: 500 sample input, 1000G ref panel. shard 0 / ... / CountVariantsInChunksBeagle / shard 0
var_in_original: 7574
var_also_in_reference: 7076
this branch: 500 sample input, sim 10k ref panel. shard 0 / .../ CountVariantsInChunksBeagle / shard 0
var_in_original: 7574
var_also_in_reference: 426
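For readers unfamiliar with the two counts above, here is a minimal sketch of what they measure: the number of records in the input VCF, and the number of those records that fall inside an interval of the reference panel BED file. This is not the task's actual implementation. The PR does not show the bedtools command it uses, so the sketch uses plain awk as a stand-in, and the function names `count_vcf_records` and `count_vcf_records_in_bed` are hypothetical. It assumes an uncompressed, tab-delimited VCF and treats every record as a single-base site at POS.

```shell
# Hypothetical helper: count non-header records in an uncompressed VCF.
count_vcf_records() {
  awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: count VCF records whose POS lies inside any BED interval.
# BED is 0-based and half-open, VCF POS is 1-based, so a site overlaps an
# interval on the same chromosome when start < POS <= end.
# Assumption: each record is treated as a single-base site (indel spans ignored).
count_vcf_records_in_bed() {
  local vcf="$1" bed="$2"
  awk -F'\t' '
    # First file (the BED): remember every interval.
    FILENAME == ARGV[1] { n++; chrom[n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0; next }
    # Second file (the VCF): skip header lines.
    /^#/ { next }
    {
      pos = $2 + 0
      # Linear scan for clarity; a real tool would use a sorted sweep or an index.
      for (i = 1; i <= n; i++) {
        if ($1 == chrom[i] && pos > lo[i] && pos <= hi[i]) { hits++; break }
      }
    }
    END { print hits + 0 }
  ' "$bed" "$vcf"
}
```

Calling `count_vcf_records chunk.vcf` and `count_vcf_records_in_bed chunk.vcf ref_panel.bed` (both file names are placeholders) yields values analogous to `var_in_original` and `var_also_in_reference`. The linear scan is only for illustration and would be far too slow at reference-panel scale.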
Checklist
If you can answer "yes" to the following items, please add a checkmark next to the appropriate checklist item(s) and notify our WARP documentation team by tagging either @ekiernan or @kayleemathews in a comment on this PR.