Plot SV counts at the end of the ClusterBatch module #389

VJalili · 2022-08-18T16:44:07Z

The plots will be used to remove outliers before training the random forest model. This is the first step toward resolving issue #44.

wdl/ClusterBatch.wdl

Co-authored-by: Mark Walker <[email protected]>

mwalker174

I think we should try to get this in, since it is useful QC at this stage of the pipeline. Can you take it further and integrate it with GATKSVPipelinePhase1 and GATKSVPipelineBatch, including updating the JSON templates?

mwalker174 · 2022-10-05T17:03:27Z

inputs/templates/test/ClusterBatch/ClusterBatch.json.tmpl

-  "ClusterBatch.ped_file": {{ test_batch.ped_file | tojson }}
+  "ClusterBatch.ped_file": {{ test_batch.ped_file | tojson }},
+
+  "ClusterBatch.outlier_cutoff_nIQR": "10000"


Suggested change

-  "ClusterBatch.outlier_cutoff_nIQR": "10000"
+  "ClusterBatch.outlier_cutoff_nIQR": "8"

10000 is used in FilterBatch to effectively disable sample filtering. Here we should use a statistically reasonable value; otherwise the cutoffs at +/- 10000 IQR make the plots unreadable.

epiercehoffman · 2022-11-17

I've been using 6 for plotting.
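To make the readability concern concrete, here is a rough worked example. It assumes the cutoffs are conventional IQR fences around the per-sample SV count distribution; that is an assumption about how the cutoff is applied, not something stated in this thread. Here $Q_1$ and $Q_3$ are the quartiles of per-sample counts and $n$ is `outlier_cutoff_nIQR`:

```latex
\[
\mathrm{IQR} = Q_3 - Q_1, \qquad
\text{lower} = Q_1 - n \cdot \mathrm{IQR}, \qquad
\text{upper} = Q_3 + n \cdot \mathrm{IQR}
\]
% Made-up numbers for illustration only: Q_1 = 900, Q_3 = 1100, so IQR = 200.
\[
n = 8:\quad \text{upper} = 1100 + 8 \cdot 200 = 2700
\qquad\qquad
n = 10000:\quad \text{upper} = 1100 + 10000 \cdot 200 = 2{,}001{,}100
\]
```

With these illustrative numbers, a cutoff line drawn at about two million forces the axis to span three orders of magnitude more than the data, which would squash the actual counts near zero. Values such as 6 or 8 keep the cutoff lines close to the bulk of the distribution.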

mwalker174 · 2022-10-05T17:14:12Z

wdl/ClusterBatch.wdl

@@ -65,6 +66,8 @@ workflow ClusterBatch {

    Float? java_mem_fraction

+    Int outlier_cutoff_nIQR


I'd suggest making this optional. If it's provided, we run PlotSVCountsPerSample; if not, the task is skipped (we don't need to run this for the single-sample pipeline, for example). We would also need to add this input and the outputs to GATKSVPipelinePhase1 and GATKSVPipelineBatch.
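A minimal sketch of the optional-input pattern suggested above, assuming WDL 1.0. Only `outlier_cutoff_nIQR`, `ClusterBatch`, and the name `PlotSVCountsPerSample` come from this thread; the task body, the `n_iqr_cutoff` input, and the output names are hypothetical placeholders, not the actual implementation.

```wdl
version 1.0

# Sketch only: everything except outlier_cutoff_nIQR and the
# workflow/task names is a hypothetical placeholder.
workflow ClusterBatch {
  input {
    Int? outlier_cutoff_nIQR  # optional: omit from the inputs JSON to skip plotting
  }

  # The call runs only when a cutoff was supplied.
  if (defined(outlier_cutoff_nIQR)) {
    call PlotSVCountsPerSample {
      input:
        # select_first unwraps Int? to Int; safe here because of the defined() guard
        n_iqr_cutoff = select_first([outlier_cutoff_nIQR])
    }
  }

  output {
    # Outputs of a call inside an if-block are optional at the workflow level.
    File? sv_count_plot = PlotSVCountsPerSample.plot
  }
}

# Placeholder task standing in for the real plotting step.
task PlotSVCountsPerSample {
  input {
    Int n_iqr_cutoff
  }
  command <<<
    echo "would plot SV counts with cutoff ~{n_iqr_cutoff}" > sv_counts.txt
  >>>
  output {
    File plot = "sv_counts.txt"
  }
  runtime {
    docker: "ubuntu:22.04"
  }
}
```

With this shape, callers such as the single-sample pipeline can leave `ClusterBatch.outlier_cutoff_nIQR` out of their inputs JSON, and the plotting call and its output are skipped. Wrapping workflows would need to declare the same optional input and an optional output to pass them through.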

epiercehoffman · 2023-08-17T20:59:33Z

This was completed as part of #567, so I would recommend closing this PR.

VJalili requested a review from mwalker174 August 18, 2022 16:44

Plot SV counts at the end of ClusterBatch.

dafd2b2

VJalili force-pushed the remove-outliers branch from ff3ecf8 to dafd2b2 Compare August 18, 2022 16:55

mwalker174 reviewed Aug 19, 2022

wdl/ClusterBatch.wdl (outdated, resolved)

Update wdl/ClusterBatch.wdl

1947cfe

Co-authored-by: Mark Walker <[email protected]>

mwalker174 reviewed Oct 5, 2022

epiercehoffman mentioned this pull request Jul 12, 2023

Add PlotSVCountsPerSample subworkflow to the end of ClusterBatch and FilterBatchSites #567

Merged

VJalili closed this Aug 18, 2023

VJalili deleted the remove-outliers branch August 18, 2023 14:55

epiercehoffman commented Nov 17, 2022
