Sfitz add intervals #92
base: main
@@ -35,7 +35,7 @@ process {
     }
     withName: assess_coverage_mosdepth {
         cpus = 1
-        memory = 3.GB
+        memory = 8.GB
mosdepth uses more memory when targets are specified. Can the allocated resources be based on whether intervals_bed is defined?
It can be updated but it has to be done in methods; the resource handler has an optional input for customized allocations
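As a rough illustration of the idea discussed above, the choice could be sketched in plain Groovy as follows. The helper name `mosdepthMemoryGb` and the plain `Map` standing in for `params` are assumptions made for this sketch; it is not the resource handler's interface, and the 3 and 8 GB figures are simply the two values from the diff.

```groovy
// Hypothetical helper (not part of the pipeline): choose a memory request in GB
// for mosdepth depending on whether an intervals BED file was supplied.
// Assumption: 8 GB with intervals and 3 GB without, taken from the diff above.
int mosdepthMemoryGb(Map params) {
    // Groovy truth treats both null and '' as "no intervals provided"
    def intervals = params.getOrDefault('intervals_bed', null)
    return intervals ? 8 : 3
}

// Usage with plain maps standing in for the pipeline parameters
assert mosdepthMemoryGb([intervals_bed: '/path/to/targets.bed']) == 8
assert mosdepthMemoryGb([:]) == 3
assert mosdepthMemoryGb([intervals_bed: '']) == 3
```

In the pipeline itself the resulting value would still need to be passed through the resource handler's customized-allocation input mentioned in the reply.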
@@ -36,14 +36,22 @@ params {
     mosdepth_quantize_additional_options = '--mapq 20'

     // Picard CollectWgsMetrics options
-    cwm_coverage_cap = 1000
+    cwm_coverage_cap = 250
250 is the Picard default value
A few comments
@@ -93,6 +93,20 @@ process {
             }
         }
     }
+    withName: run_BedToIntervalList_picard {
-    withName: run_BedToIntervalList_picard {
+    withName: run_BedToIntervalList_Picard {
Picard should be capitalized.
@@ -90,6 +101,7 @@ log.info """\
         normal: ${params.samples_to_process.findAll{ it.sample_type == 'normal' }['bam']}
         normal read length: ${params.samples_to_process.findAll{ it.sample_type == 'normal' }['read_length']}
         reference: ${params.reference}
+        intervals_bed: ${params.getOrDefault("intervals_bed", null)}
suggestion (non-blocking): Consider a small description instead of null
-        intervals_bed: ${params.getOrDefault("intervals_bed", null)}
+        intervals_bed: ${params.getOrDefault("intervals_bed", "Not provided")}
target_bed_per_sample_ch = target_bed_ch
    .combine(samples_to_process_ch)
    .map { it[0] }
comment (non-blocking): This does work. You can also pass a value directly as an input; it should get reused for every emission of the other input channels, so you don't have to duplicate it per emission like this.
bed_to_interval_list_ch = target_bed_ch
    .map { ['target', it, params.reference_dict] }
    .mix(bait_bed_ch)
suggestion: This block of code is duplicated in this case and the else case; consider moving this part out of the condition and having the condition only handle mixing in the bait intervals if given
chm_bait_intervals_bed = '' // if not defined, intervals_bed will be used
chm_coverage_cap = 2000
chm_minimum_mapping_quality = 20
chm_minimum_base_quality = 20
chm_per_base_output = true
chm_additional_options = ''
comment: If these are new, they should get validated in schema as well
coverage_cap_arg = (params.chm_coverage_cap != null) ? "-COVERAGE_CAP ${params.chm_coverage_cap}" : ""
minimum_mapping_quality_arg = (params.chm_minimum_mapping_quality != null) ? "-MINIMUM_MAPPING_QUALITY ${params.chm_minimum_mapping_quality}" : ""
minimum_base_quality_arg = (params.chm_minimum_base_quality != null) ? "-MINIMUM_BASE_QUALITY ${params.chm_minimum_base_quality}" : ""
per_base_output_arg = (params.chm_per_base_output) ? "--PER_BASE_COVERAGE ${output_filename}_per-base-coverage" : ""
comment: Are the parameters being checked here required for the pipeline? If so, the checks shouldn't be necessary. If they are not required, then I suggest using getOrDefault instead, as directly accessing a parameter that is not defined will result in an error.
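A minimal sketch of the getOrDefault approach suggested here, written as plain Groovy. The helper name `buildChmArgs` is hypothetical, and a plain `Map` stands in for the Nextflow `params` object, whose behaviour for undefined keys may differ from a plain map; only the parameter names and option strings are taken from the diff.

```groovy
// Hypothetical helper (not from the PR): build optional CollectHsMetrics
// arguments without touching parameters that may be undefined.
String buildChmArgs(Map params) {
    // getOrDefault returns null when the key is absent instead of failing
    def coverageCap = params.getOrDefault('chm_coverage_cap', null)
    def minMapq = params.getOrDefault('chm_minimum_mapping_quality', null)

    def args = []
    if (coverageCap != null) { args << "-COVERAGE_CAP ${coverageCap}" }
    if (minMapq != null) { args << "-MINIMUM_MAPPING_QUALITY ${minMapq}" }
    // An empty list joins to an empty string, so no option is emitted
    return args.join(' ')
}

// Usage with plain maps standing in for the pipeline parameters
assert buildChmArgs([chm_coverage_cap: 2000]) == '-COVERAGE_CAP 2000'
assert buildChmArgs([:]) == ''
```

The explicit null check keeps a legitimate value of 0 from being dropped, which a plain truthiness test would do.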
Description
Add the option to run with an intervals bed file and add CollectHsMetrics.
- CollectHsMetrics only runs with intervals specified
- mosdepth coverage will be calculated by specified intervals instead of by window size
- Qualimap bamQC will use specified intervals
Also fixes a previous error where mosdepth output filenames used the Picard version number.
Testing Results
NFTest
/hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T004420Z.log
/hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T005126Z.log
/hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T005533Z.log
/hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T010550Z.log
Checklist
- I have read the code review guidelines and the code review best practice on GitHub check-list.
- I have reviewed the Nextflow pipeline standards.
- The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
- I have set up or verified the branch protection rule following the github standards before opening this pull request.
- I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
- I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
- I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
- I have tested the pipeline using NFTest, or I have justified why I did not need to run NFTest above.