Commit

update diagrams to display recommended invocation order.
VJalili committed Sep 18, 2024
1 parent b9dfb39 commit 490c739
Showing 4 changed files with 48 additions and 49 deletions.
15 changes: 8 additions & 7 deletions website/docs/modules/evidence_qc.md
@@ -19,8 +19,10 @@ for further guidance on creating batches.
We also recommend using sex assignments generated from the ploidy
estimates and incorporating them into the PED file, with sex = 0 for sex aneuploidies.

The upstream and downstream dependencies of the EvidenceQC workflow
are illustrated in the following diagram.
The following diagram illustrates the upstream and downstream workflows of the `EvidenceQC` workflow
in the recommended invocation order. You may refer to
[this diagram](https://github.com/broadinstitute/gatk-sv/blob/main/terra_pipeline_diagram.jpg)
for the overall recommended invocation order.

<br/>

@@ -34,16 +36,15 @@ stateDiagram
classDef outModules stroke-width:0px,fill:#caf0f8,color:#00509d
gse: GatherSampleEvidence
gbe: GatherBatchEvidence
eqc: EvidenceQC
t: TrainGCNV
batching: Batching, sample QC, and sex assignment
gse --> eqc
eqc --> t
eqc --> gbe
eqc --> batching
class eqc thisModule
class gse inModules
class t, gbe outModules
class batching outModules
```

<br/>
20 changes: 10 additions & 10 deletions website/docs/modules/gather_batch_evidence.md
@@ -8,6 +8,10 @@ slug: gbe
Runs CNV callers ([cn.MOPS](https://academic.oup.com/nar/article/40/9/e69/1136601), GATK gCNV)
and combines single-sample raw evidence into a batch.

The following diagram illustrates the downstream workflows of the `GatherBatchEvidence` workflow
in the recommended invocation order. You may refer to
[this diagram](https://github.com/broadinstitute/gatk-sv/blob/main/terra_pipeline_diagram.jpg)
for the overall recommended invocation order.

```mermaid
@@ -18,19 +22,15 @@ stateDiagram
classDef thisModule font-weight:bold,stroke-width:0px,fill:#ff9900,color:white
classDef outModules stroke-width:0px,fill:#caf0f8,color:#00509d
gse: GatherSampleEvidence
eqc: EvidenceQC
gcnv: TrainGCNV
gbe: GatherBatchEvidence
cbe: ClusterBatch
gse --> gbe
eqc --> gbe
gcnv --> gbe
gbe --> cbe
t: TrainGCNV
cb: ClusterBatch
t --> gbe
gbe --> cb
class gbe thisModule
class gse, eqc, gcnv inModules
class cbe outModules
class t inModules
class cb outModules
```

## Inputs
13 changes: 6 additions & 7 deletions website/docs/modules/gather_sample_evidence.md
@@ -9,8 +9,11 @@ Runs raw evidence collection on each sample with the following SV callers:
Manta, Wham, Scramble, and/or MELT. For guidance on pre-filtering prior to GatherSampleEvidence,
refer to the Sample Exclusion section.

The downstream dependencies of the GatherSampleEvidence workflow
are illustrated in the following diagram.
The following diagram illustrates the downstream workflows of the `GatherSampleEvidence` workflow
in the recommended invocation order. You may refer to
[this diagram](https://github.com/broadinstitute/gatk-sv/blob/main/terra_pipeline_diagram.jpg)
for the overall recommended invocation order.


```mermaid
@@ -23,14 +26,10 @@ stateDiagram
gse: GatherSampleEvidence
eqc: EvidenceQC
gcnv: TrainGCNV
gbe: GatherBatchEvidence
gse --> eqc
gse --> gcnv
gse --> gbe
class gse thisModule
class eqc, gcnv, gbe outModules
class eqc outModules
```


49 changes: 24 additions & 25 deletions website/docs/modules/train_gcnv.md
@@ -12,9 +12,27 @@ is a method for detecting rare germline copy number variants (CNVs)
from short-read sequencing read-depth information.
The [TrainGCNV](https://github.com/broadinstitute/gatk-sv/blob/main/wdl/TrainGCNV.wdl)
module trains a gCNV model for use in the [GatherBatchEvidence](./gbe) workflow.
The upstream and downstream dependencies of the TrainGCNV module are illustrated in the following diagram.
The upstream and downstream dependencies of the TrainGCNV module are illustrated in the following diagram.

<br/>

The samples used for training should be homogeneous and similar
to the samples on which the model will be applied in terms of sample type,
library preparation protocol, sequencer, sequencing center, and etc.


For small, relatively homogeneous cohorts, a single gCNV model is usually sufficient.
However, for larger cohorts, especially those with multiple data sources,
it is necessary to train a separate model for each batch or group of batches
with a similar dosage score (WGD).
The model can be trained on all or a subset of the samples to which it will be applied.
A subset of 100 randomly selected samples from the batch is a reasonable
input size for training the model; also, the `TrainGCNV` workflow can automatically select
a given number of random samples through the `n_samples_subsample` parameter.

The following diagram illustrates the upstream and downstream workflows of the `TrainGCNV` workflow
in the recommended invocation order. You may refer to
[this diagram](https://github.com/broadinstitute/gatk-sv/blob/main/terra_pipeline_diagram.jpg)
for the overall recommended invocation order.

```mermaid
@@ -25,37 +43,18 @@ stateDiagram
classDef thisModule font-weight:bold,stroke-width:0px,fill:#ff9900,color:white
classDef outModules stroke-width:0px,fill:#caf0f8,color:#00509d
gse: GatherSampleEvidence
eqc: EvidenceQC
batching: Batching, sample QC, and sex assignment
t: TrainGCNV
gse --> t
eqc --> t
gbe: GatherBatchEvidence
batching --> t
t --> gbe
class t thisModule
class gse, eqc inModules
class batching inModules
class gbe outModules
```

<br/>


The samples used for training should be homogeneous and similar
to the samples on which the model will be applied in terms of sample type,
library preparation protocol, sequencer, sequencing center, and etc.


For small, relatively homogeneous cohorts, a single gCNV model is usually sufficient.
However, for larger cohorts, especially those with multiple data sources,
it is necessary to train a separate model for each batch or group of batches
with a similar dosage score (WGD).
The model can be trained on all or a subset of the samples to which it will be applied.
A subset of 100 randomly selected samples from the batch is a reasonable
input size for training the model; also, the `TrainGCNV` workflow can automatically select
a given number of random samples through the `n_samples_subsample` parameter.


## Inputs

This section provides a brief description on the _required_ inputs of the TrainGCNV workflow.
