Plot SV counts at the end of the ClusterBatch module #389
Updated from ff3ecf8 to dafd2b2.
Co-authored-by: Mark Walker <[email protected]>
I think we should try to get this in, since it is useful QC at this stage of the pipeline. Can you take it further and integrate it with GATKSVPipelinePhase1 and GATKSVPipelineBatch, including updating the JSON templates?
- "ClusterBatch.ped_file": {{ test_batch.ped_file | tojson }}
+ "ClusterBatch.ped_file": {{ test_batch.ped_file | tojson }},
+ "ClusterBatch.outlier_cutoff_nIQR": "10000"
- "ClusterBatch.outlier_cutoff_nIQR": "10000"
+ "ClusterBatch.outlier_cutoff_nIQR": "8"
A value of 10000 is used in FilterBatch to effectively disable sample filtering. Here it should be something statistically reasonable; otherwise, cutoffs at ±10000 IQR make the plots unreadable.
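To illustrate why the value matters, here is a worked example assuming the cutoffs are Tukey-style fences placed $n$ interquartile ranges beyond the first and third quartiles of the per-sample SV counts (the plotting script may define them differently, for example around the median). The quartile values are hypothetical.

$$
\mathrm{IQR} = Q_3 - Q_1,\qquad
\text{lower} = Q_1 - n\,\mathrm{IQR},\qquad
\text{upper} = Q_3 + n\,\mathrm{IQR}
$$

With hypothetical $Q_1 = 4000$ and $Q_3 = 5000$ SVs per sample, so $\mathrm{IQR} = 1000$:

$$
\begin{aligned}
n = 8:&\quad [\,4000 - 8000,\; 5000 + 8000\,] = [\,-4000,\; 13000\,]\\
n = 10000:&\quad [\,4000 - 10^{7},\; 5000 + 10^{7}\,] = [\,-9996000,\; 10005000\,]
\end{aligned}
$$

Under this assumption, with $n = 10000$ the upper cutoff is about 2000 times the upper quartile, so a plot whose axis spans the cutoff lines squeezes the actual counts into a tiny region near zero.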
I've been using 6 for plotting.
@@ -65,6 +66,8 @@ workflow ClusterBatch {
    Float? java_mem_fraction
+   Int outlier_cutoff_nIQR
I'd suggest making this input optional. If it is provided, we run PlotSVCountsPerSample; if not, the task is skipped (for example, we don't need to run it for the single-sample pipeline). This input and the task's outputs would also need to be added to GATKSVPipelinePhase1 and GATKSVPipelineBatch.
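A minimal WDL 1.0 sketch of the optional-input pattern suggested above. Apart from `PlotSVCountsPerSample` and `outlier_cutoff_nIQR`, every name here (the workflow, `clustered_vcf`, `batch`, `N_IQR_cutoff`, `counts_file`) is hypothetical, and the stub task is a placeholder, not the actual `PlotSVCountsPerSample` task, whose inputs, outputs, command, and runtime differ.

```wdl
version 1.0

# Hypothetical sketch: only the optional-input / conditional-call wiring is the point.
workflow ClusterBatchPlotSketch {
  input {
    File clustered_vcf          # hypothetical placeholder for a ClusterBatch output VCF
    String batch                # hypothetical batch name used as an output prefix
    Int? outlier_cutoff_nIQR    # optional: plotting runs only when this is set
  }

  # Skip the plotting task entirely (e.g. in the single-sample pipeline)
  # when no cutoff is supplied.
  if (defined(outlier_cutoff_nIQR)) {
    call PlotSVCountsPerSample {
      input:
        vcf = clustered_vcf,
        prefix = batch,
        # Inside the conditional the input is still typed Int?, so unwrap it.
        N_IQR_cutoff = select_first([outlier_cutoff_nIQR])
    }
  }

  output {
    # Outputs of a call inside an `if` block become optional at workflow level.
    File? sv_counts_file = PlotSVCountsPerSample.counts_file
  }
}

# Placeholder stub standing in for the real task; it only writes a text file.
task PlotSVCountsPerSample {
  input {
    File vcf
    String prefix
    Int N_IQR_cutoff
  }
  command <<<
    echo "placeholder: would plot SV counts for ~{vcf} with cutoff ~{N_IQR_cutoff}" > ~{prefix}.sv_counts.txt
  >>>
  output {
    File counts_file = "~{prefix}.sv_counts.txt"
  }
  runtime {
    docker: "ubuntu:22.04"
  }
}
```

Because the call sits inside `if (defined(...))`, its outputs are optional (`File?`), so GATKSVPipelinePhase1 and GATKSVPipelineBatch would need to pass the input through as `Int?` and expose the outputs as optional; a pipeline that omits the input never runs the task.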
This was completed as part of #567, so I would recommend closing this PR.
The plots will be used to remove outlier samples before training the random forest model. This is the first step toward resolving issue #44.