Sfitz add intervals #92

sorelfitzgibbon · 2025-02-04T01:28:24Z

Description

Adds the option to run with an intervals BED file and adds CollectHsMetrics.
- CollectHsMetrics only runs with intervals specified
- mosdepth coverage will be calculated by specified intervals instead of by window size
- Qualimap bamQC will use specified intervals
Also fixes a previous error where mosdepth output filenames used the Picard version number.

Testing Results

NFTest

hg003-all-tools
- /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T004420Z.log
amini-targeted-with-bait
- /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T005126Z.log
a_mini-multiple-samples-all-tools
- /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T005533Z.log
amini-targeted-no-bait
- /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T010550Z.log

Checklist

- I have read the code review guidelines and the code review best practice on GitHub check-list.
- I have reviewed the Nextflow pipeline standards.
- The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
- I have set up or verified the branch protection rule following the github standards before opening this pull request.
- I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
- I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
- I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
- I have tested the pipeline using NFTest, or I have justified why I did not need to run NFTest above.

sorelfitzgibbon · 2025-02-04T01:31:33Z

config/F16.config

@@ -35,7 +35,7 @@ process {
    }
    withName: assess_coverage_mosdepth {
        cpus = 1
-        memory = 3.GB
+        memory = 8.GB


mosdepth uses more memory when targets are specified. Can the allocated resources be based on whether intervals_bed is defined?

It can be updated, but it has to be done in methods; the resource handler has an optional input for customized allocations.

sorelfitzgibbon · 2025-02-04T01:34:47Z

config/template.config

@@ -36,14 +36,22 @@ params {
    mosdepth_quantize_additional_options = '--mapq 20'

    // Picard CollectWgsMetrics options
-    cwm_coverage_cap = 1000
+    cwm_coverage_cap = 250


250 is the Picard default value

yashpatel6

A few comments

yashpatel6 · 2025-02-04T01:35:53Z

config/F16.config

@@ -93,6 +93,20 @@ process {
            }
        }
    }
+    withName: run_BedToIntervalList_picard {


Suggested change

-    withName: run_BedToIntervalList_picard {
+    withName: run_BedToIntervalList_Picard {

Picard should be capitalized

yashpatel6 · 2025-02-04T01:40:54Z

main.nf

@@ -90,6 +101,7 @@ log.info """\
        normal: ${params.samples_to_process.findAll{ it.sample_type == 'normal' }['bam']}
        normal read length: ${params.samples_to_process.findAll{ it.sample_type == 'normal' }['read_length']}
        reference: ${params.reference}
+        intervals_bed: ${params.getOrDefault("intervals_bed", null)}


suggestion (non-blocking): Consider a small description instead of null

Suggested change

-        intervals_bed: ${params.getOrDefault("intervals_bed", null)}
+        intervals_bed: ${params.getOrDefault("intervals_bed", "Not provided")}

yashpatel6 · 2025-02-04T01:44:15Z

main.nf

+    target_bed_per_sample_ch = target_bed_ch
+        .combine(samples_to_process_ch)
+        .map { it[0] }


comment (non-blocking): This does work. You can also pass a value directly as an input; it gets reused for every task, so you don't have to duplicate it once per emission of the other input channel like this.
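
A minimal sketch of that alternative, assuming Nextflow DSL2. The process `summarize_targets`, the sample IDs, and the file paths are hypothetical placeholders and not part of this PR. A plain value passed as a process input is treated as a value channel, so it is paired with every emission of the queue channel without a `combine` step:

```groovy
// Hypothetical process; stands in for any per-sample tool that needs the BED file.
process summarize_targets {
    input:
    tuple val(sample_id), path(bam)
    path(target_bed)

    output:
    stdout

    script:
    """
    echo "${sample_id}: ${bam} with ${target_bed}"
    """
}

workflow {
    // Queue channel: one emission per sample (placeholder paths).
    samples_to_process_ch = Channel.of(
        ['sample_A', file('/path/to/sample_A.bam')],
        ['sample_B', file('/path/to/sample_B.bam')]
    )

    // Single value (assumes params.intervals_bed is set): reused for every task.
    target_bed = file(params.intervals_bed)

    summarize_targets(samples_to_process_ch, target_bed)
}
```

If the BED file already sits in a queue channel such as `target_bed_ch`, calling `.first()` on it should give an equivalent value channel.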

yashpatel6 · 2025-02-04T01:47:39Z

main.nf

+        bed_to_interval_list_ch = target_bed_ch
+            .map { ['target', it, params.reference_dict] }
+            .mix(bait_bed_ch)


suggestion: This block of code is duplicated in this case and in the else case. Consider moving it out of the condition so that the condition only handles mixing in the bait intervals, if given.
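
One way the suggested restructuring might look, as a sketch only. The condition (`chm_bait_intervals_bed` being set) and the assumption that `bait_bed_ch` is already defined upstream are guesses, since the surrounding `if`/`else` is not visible in this excerpt:

```groovy
// Workflow fragment; target_bed_ch and bait_bed_ch are assumed to exist upstream.

// Built once, whether or not bait intervals are supplied.
bed_to_interval_list_ch = target_bed_ch
    .map { ['target', it, params.reference_dict] }

// Assumed condition: mix in the bait intervals only when they are given.
if (params.getOrDefault('chm_bait_intervals_bed', null)) {
    bed_to_interval_list_ch = bed_to_interval_list_ch.mix(bait_bed_ch)
}
```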

yashpatel6 · 2025-02-04T01:52:53Z

config/template.config

+    chm_bait_intervals_bed = '' // if not defined, intervals_bed will be used
+    chm_coverage_cap = 2000
+    chm_minimum_mapping_quality = 20
+    chm_minimum_base_quality = 20
+    chm_per_base_output = true
+    chm_additional_options = ''


comment: If these are new, they should be validated in the schema as well.

yashpatel6 · 2025-02-04T01:54:32Z

module/collectHsMetrics_picard.nf

+    coverage_cap_arg = (params.chm_coverage_cap != null) ? "-COVERAGE_CAP ${params.chm_coverage_cap}" : ""
+    minimum_mapping_quality_arg = (params.chm_minimum_mapping_quality != null) ? "-MINIMUM_MAPPING_QUALITY ${params.chm_minimum_mapping_quality}" : ""
+    minimum_base_quality_arg = (params.chm_minimum_base_quality != null) ? "-MINIMUM_BASE_QUALITY ${params.chm_minimum_base_quality}" : ""
+    per_base_output_arg = (params.chm_per_base_output) ? "--PER_BASE_COVERAGE ${output_filename}_per-base-coverage" : ""


comment: Are the parameters being checked here required for the pipeline? If so, the checks shouldn't be necessary. If they are not required, then I suggest using getOrDefault instead, as directly accessing a parameter that is not defined will result in an error.
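
A small sketch of the `getOrDefault` pattern, written as plain Groovy so it can be run outside Nextflow. The `params` map here is a stand-in for the pipeline's parameter map, and `output_filename` is a hypothetical value; `chm_minimum_base_quality` is deliberately left undefined to show the fallback:

```groovy
// Stand-in for the Nextflow params map; chm_minimum_base_quality is left undefined.
def params = [chm_coverage_cap: 2000, chm_per_base_output: true]
def output_filename = 'sample_A'  // hypothetical value

// Look each parameter up with an explicit default instead of accessing it directly.
def coverage_cap = params.getOrDefault('chm_coverage_cap', null)
def coverage_cap_arg = (coverage_cap != null) ? "-COVERAGE_CAP ${coverage_cap}" : ""

def min_bq = params.getOrDefault('chm_minimum_base_quality', null)
def minimum_base_quality_arg = (min_bq != null) ? "-MINIMUM_BASE_QUALITY ${min_bq}" : ""

def per_base_output_arg = params.getOrDefault('chm_per_base_output', false) ?
    "--PER_BASE_COVERAGE ${output_filename}_per-base-coverage" : ""

// A defined parameter yields its flag; an undefined one yields an empty string.
assert coverage_cap_arg == '-COVERAGE_CAP 2000'
assert minimum_base_quality_arg == ''
assert per_base_output_arg == '--PER_BASE_COVERAGE sample_A_per-base-coverage'
```

If the parameters are instead made required and validated in the schema, the null checks could be dropped and the values interpolated directly.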

sorelfitzgibbon added 10 commits January 29, 2025 15:12

update resource configs

34bc0a8

update schema and template.config

3d4c795

update resource configs

b6329d1

new modules

3f48a09

updated modules

744bb36

update template.config

e02b74f

update nftest

05761cd

nftest typo

5517c97

add comment

5b4022b

update changelog

1f43887

sorelfitzgibbon requested a review from a team as a code owner February 4, 2025 01:28

sorelfitzgibbon requested a review from yashpatel6 February 4, 2025 01:39
