Plot SV counts at the end of the ClusterBatch module #389

Closed
4 changes: 3 additions & 1 deletion inputs/templates/test/ClusterBatch/ClusterBatch.json.tmpl
@@ -29,5 +29,7 @@
 "ClusterBatch.wham_vcf_tar": {{ test_batch.std_wham_vcf_tar | tojson }},
 "ClusterBatch.manta_vcf_tar": {{ test_batch.std_manta_vcf_tar | tojson }},
 "ClusterBatch.melt_vcf_tar": {{ test_batch.std_melt_vcf_tar | tojson }},
-"ClusterBatch.ped_file": {{ test_batch.ped_file | tojson }}
+"ClusterBatch.ped_file": {{ test_batch.ped_file | tojson }},
+
+"ClusterBatch.outlier_cutoff_nIQR": "10000"
Collaborator

Suggested change
-"ClusterBatch.outlier_cutoff_nIQR": "10000"
+"ClusterBatch.outlier_cutoff_nIQR": "8"

10000 is used in FilterBatch to effectively disable sample filtering. Here we should use something statistically reasonable; otherwise the cutoffs at +/- 10000 IQR make the plots unreadable.

Collaborator

I've been using 6 for plotting
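
For reference, a worked sketch of why the cutoff value affects readability. It assumes the plots draw conventional Tukey-style fences around the quartiles of the per-sample SV counts; the exact rule used by PlotSVCountsPerSample is not shown in this diff, so treat the formula as an assumption.

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
c_{\text{low}} = Q_1 - n \cdot \mathrm{IQR}, \qquad
c_{\text{high}} = Q_3 + n \cdot \mathrm{IQR}
```

With a hypothetical IQR of 50 variants per sample, n = 10000 places the fences 500,000 counts beyond the quartiles, while n = 8 or n = 6 places them 400 or 300 counts away. That difference in scale is consistent with the readability concern raised above.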

}
19 changes: 19 additions & 0 deletions wdl/ClusterBatch.wdl
@@ -5,6 +5,7 @@ import "DepthClustering.wdl" as depth
import "ClusterBatchMetrics.wdl" as metrics
import "TasksClusterBatch.wdl" as tasks
import "Utils.wdl" as util
import "PlotSVCountsPerSample.wdl" as sv_counts

workflow ClusterBatch {
input {
@@ -65,6 +66,8 @@ workflow ClusterBatch {

Float? java_mem_fraction

Int outlier_cutoff_nIQR
Collaborator

I'd suggest making this optional. If it's provided, we run PlotSVCountsPerSample. If not, we skip the task (we don't need to run this for the single-sample pipeline, for example). We would need to add this input and the outputs to GATKSVPipelinePhase1 and GATKSVPipelineBatch.
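
A minimal sketch of the optional-input pattern suggested here, assuming WDL 1.0. `PlotCountsStub`, `OptionalPlotSketch`, and the `ubuntu:22.04` image are hypothetical stand-ins for illustration; the real PlotSVCountsPerSample sub-workflow and its full input list are not reproduced.

```wdl
version 1.0

# Hypothetical stand-in for PlotSVCountsPerSample (not the real sub-workflow).
task PlotCountsStub {
  input {
    String prefix
    Int N_IQR_cutoff
  }
  command <<<
    echo "~{prefix} cutoff=~{N_IQR_cutoff}" > ~{prefix}.counts.txt
  >>>
  output {
    File counts = "~{prefix}.counts.txt"
  }
  runtime {
    # Assumed image, chosen only so the sketch is self-contained.
    docker: "ubuntu:22.04"
  }
}

workflow OptionalPlotSketch {
  input {
    String batch
    # Optional: leave unset to skip plotting entirely.
    Int? outlier_cutoff_nIQR
  }

  # The call runs only when the cutoff was provided.
  if (defined(outlier_cutoff_nIQR)) {
    call PlotCountsStub {
      input:
        prefix = batch,
        N_IQR_cutoff = select_first([outlier_cutoff_nIQR])
    }
  }

  output {
    # Outputs of a call inside an if block are optional.
    File? counts = PlotCountsStub.counts
  }
}
```

Because the call sits inside an `if` block, its outputs become optional (`File?` here, and `Array[File]?` for outputs such as `sv_counts` and `sv_count_plots`), so any parent workflow that forwards them would need matching optional declarations.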


RuntimeAttr? runtime_attr_ids_from_vcf_list
RuntimeAttr? runtime_attr_create_ploidy
RuntimeAttr? runtime_attr_prepare_pesr_vcfs
@@ -82,6 +85,9 @@ workflow ClusterBatch {
RuntimeAttr? runtime_attr_gatk_to_svtk_vcf_depth
RuntimeAttr? runtime_override_concat_vcfs_depth
RuntimeAttr? runtime_attr_exclude_intervals_pesr
RuntimeAttr? runtime_attr_count_svs
RuntimeAttr? runtime_attr_plot_svcounts
RuntimeAttr? runtime_attr_cat_outliers_preview
}

call util.GetSampleIdsFromVcfTar {
@@ -282,9 +288,22 @@ workflow ClusterBatch {
}
}

call sv_counts.PlotSVCountsPerSample {
input:
prefix = batch,
vcfs = [select_first([ClusterPESR_manta.clustered_vcf]), select_first([ClusterPESR_wham.clustered_vcf]), select_first([ClusterPESR_melt.clustered_vcf]), select_first([ClusterPESR_scramble.clustered_vcf])],
VJalili marked this conversation as resolved.
N_IQR_cutoff = outlier_cutoff_nIQR,
sv_pipeline_docker = sv_pipeline_docker,
runtime_attr_count_svs = runtime_attr_count_svs,
runtime_attr_plot_svcounts = runtime_attr_plot_svcounts,
runtime_attr_cat_outliers_preview = runtime_attr_cat_outliers_preview
}

output {
File clustered_depth_vcf = ClusterDepth.clustered_vcf
File clustered_depth_vcf_index = ClusterDepth.clustered_vcf_index
Array[File] sv_counts = PlotSVCountsPerSample.sv_counts
Array[File] sv_count_plots = PlotSVCountsPerSample.sv_count_plots
File? clustered_manta_vcf = ClusterPESR_manta.clustered_vcf
File? clustered_manta_vcf_index = ClusterPESR_manta.clustered_vcf_index
File? clustered_wham_vcf = ClusterPESR_wham.clustered_vcf