Add documentation for the de-novo pipeline #675
base: main
I have some initial suggestions here. I think the inputs need to be greatly simplified / cleaned up for use in Terra before we commit any documentation.
slug: denovo
---

The de-novo workflow operates on the annotated multi-sample VCF file created by
-The de-novo workflow operates on the annotated multi-sample VCF file created by
+The de novo SV workflow operates on the annotated multi-sample VCF file created by
### Inputs

- `vcf_file`: output of [AnnotateVcf](./av) called output_vcf.
Note thatAll families in the vcf file must be included in the pedigree file
-Note thatAll families in the vcf file must be included in the pedigree file
+Note that all families in the vcf file must be included in the pedigree file
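One way to check this requirement before launching the workflow is sketched below. It is only an illustration, assuming a tab-delimited VCF header and a whitespace-delimited PED file whose header row starts with `FamID`; the function names are hypothetical and are not part of the pipeline.

```python
def vcf_samples(vcf_lines):
    """Return the sample IDs listed on the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The nine fixed VCF columns come before the sample columns.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def ped_individuals(ped_lines):
    """Return the set of IndividualID values (second PED column)."""
    ids = set()
    for line in ped_lines:
        fields = line.split()
        # Skip blank lines and the header row (assumed to start with FamID).
        if len(fields) < 2 or fields[0] == "FamID":
            continue
        ids.add(fields[1])
    return ids


def samples_missing_from_ped(vcf_lines, ped_lines):
    """List VCF samples that have no row of their own in the PED file."""
    known = ped_individuals(ped_lines)
    return [s for s in vcf_samples(vcf_lines) if s not in known]
```

An empty list from `samples_missing_from_ped` means every sample in the VCF has a pedigree row.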
- `ped_input`: Must have a header as follows:

| FamID | IndividualID | FatherID | MotherID | Gender | Affected |
|-|-|-|-|-|-|
Better to just link to this page: https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format
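Whichever way the format ends up documented, the header row quoted in the diff could be validated with a few lines like the following. This is a minimal sketch that assumes the PED file is tab-delimited; `check_ped_header` is a hypothetical helper, not an existing script in the repository.

```python
# Column names taken from the header table in the diff under review.
EXPECTED_PED_HEADER = [
    "FamID", "IndividualID", "FatherID", "MotherID", "Gender", "Affected",
]


def check_ped_header(first_line):
    """Return True if the first line of a PED file matches the expected header."""
    fields = first_line.strip().split("\t")
    return fields == EXPECTED_PED_HEADER
```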
| FamID | IndividualID | FatherID | MotherID | Gender | Affected |
|-|-|-|-|-|-|

- `genomic_disorder_input`: a file in BED format that contains regions of genomic disorder;
-- `genomic_disorder_input`: a file in BED format that contains regions of genomic disorder;
+- `genomic_disorder_input`: a file in BED format that contains genomic disorder regions;
- `genomic_disorder_input`: a file in BED format that contains regions of genomic disorder;
variants that overlap these regions will not be removed from the input VCF file.

- `contigs`: Should be set to the following list.
-- `contigs`: Should be set to the following list.
+- `contigs`: List of reference contig names, e.g. for `hg38`:
[ "chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX" ]`
```

- `python_config`: a text file as the following.
-- `python_config`: a text file as the following.
+- `python_config`: a text file defining the following parameters:
gq_min: '0'
```

Note that you may increase the value of `cohort_AF` if the cohort is small.
How small?
a txt file with first column as batch and second column raw file generated from
module05-ClusterBatch for all callers except depth (clustered_manta_vcf, clustered_melt_vcf, clustered_wham_vcf).
-a txt file with first column as batch and second column raw file generated from
-module05-ClusterBatch for all callers except depth (clustered_manta_vcf, clustered_melt_vcf, clustered_wham_vcf).
+a txt file where the first column is the batch name and second column is the raw file generated from
+the ClusterBatch workflow for all callers except depth (clustered_manta_vcf, clustered_melt_vcf, clustered_wham_vcf).
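To make the two-column layout concrete, here is a rough sketch of how such a file might be read. It assumes tab-delimited rows and allows several rows per batch (one per caller); the function name and the example paths are hypothetical.

```python
from collections import defaultdict


def read_batch_table(lines):
    """Map each batch name (column 1) to the list of raw files (column 2)."""
    table = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        batch, path = line.split("\t")[:2]
        table[batch].append(path)
    return dict(table)


# Hypothetical example rows; real paths come from the ClusterBatch outputs.
example = [
    "batch1\tgs://bucket/batch1.manta.vcf.gz\n",
    "batch1\tgs://bucket/batch1.wham.vcf.gz\n",
    "batch2\tgs://bucket/batch2.manta.vcf.gz\n",
]
batches = read_batch_table(example)
```

Comparing `set(batches)` across the batch input files is one simple way to confirm that the batch names agree.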
- Must match batch names in batch_bincov, batch_raw_file, and batch_depth_raw_file
- These batches and the samples contained in them are relevant in regards to the bincov matrices and raw files

- `prefix`: choose any prefix which will become the prefix of output files
-- `prefix`: choose any prefix which will become the prefix of output files
+- `prefix`: a prefix for output filenames
@@ -0,0 +1,44 @@
---
This file is a lot of detail. IMO, it would be better to reference the scripts themselves and have those be sufficiently organized/commented that the methods are clear.
Thank you, @mwalker174, for the feedback! I agree that we need the setup of the inputs polished before we add docs; I will update docs after the inputs are updated.
This PR extends the docs in the following areas: