Sfitz add intervals #92

Open
wants to merge 10 commits into
base: main
Conversation

sorelfitzgibbon
Collaborator

Description

  • Add the option to run with an intervals BED file, and add CollectHsMetrics.

    • CollectHsMetrics only runs with intervals specified
    • mosdepth coverage will be calculated by specified intervals instead of by window size
    • Qualimap bamQC will use specified intervals
  • Also fixes a previous error where mosdepth output filenames used the Picard version number.

Testing Results

NFTest
  • hg003-all-tools
    • /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T004420Z.log
  • amini-targeted-with-bait
    • /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T005126Z.log
  • a_mini-multiple-samples-all-tools
    • /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T005533Z.log
  • amini-targeted-no-bait
    • /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/1.0.0/sfitz-add-intervals/log-nftest-20250204T010550Z.log

Checklist

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline using NFTest, or I have justified why I did not need to run NFTest above.

sorelfitzgibbon requested a review from a team as a code owner on February 4, 2025, 01:28
@@ -35,7 +35,7 @@ process {
}
withName: assess_coverage_mosdepth {
cpus = 1
- memory = 3.GB
+ memory = 8.GB
Collaborator Author

mosdepth uses more memory when targets are specified. Can the allocated resources be based on whether intervals_bed is defined?

Contributor

It can be updated but it has to be done in methods; the resource handler has an optional input for customized allocations
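
A minimal sketch of the selection logic only, in plain Groovy. `select_mosdepth_memory_gb` is a hypothetical helper, not part of the resource handler, and how its result would be fed into the handler's customized-allocation input is not shown in this thread. The 3 and 8 GB figures are the old and new values from the diff above.

```groovy
// Hypothetical helper (not part of the pipeline or the resource handler):
// choose the mosdepth memory allocation in GB depending on whether an
// intervals BED file was supplied.
int select_mosdepth_memory_gb(Map params) {
    // getOrDefault avoids accessing an undefined key directly
    def intervals_bed = params.getOrDefault('intervals_bed', null)
    // Groovy truth: null and '' are both treated as "not provided"
    return intervals_bed ? 8 : 3
}

// Usage with stand-in parameter maps
println select_mosdepth_memory_gb([intervals_bed: '/path/to/targets.bed'])
println select_mosdepth_memory_gb([:])
```

Treating an empty string the same as an undefined parameter keeps the behaviour consistent with defaults such as `chm_bait_intervals_bed = ''`.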

@@ -36,14 +36,22 @@ params {
mosdepth_quantize_additional_options = '--mapq 20'

// Picard CollectWgsMetrics options
- cwm_coverage_cap = 1000
+ cwm_coverage_cap = 250
Collaborator Author

250 is the Picard default value

Contributor

yashpatel6 left a comment

A few comments

@@ -93,6 +93,20 @@ process {
}
}
}
withName: run_BedToIntervalList_picard {
Contributor

Suggested change
- withName: run_BedToIntervalList_picard {
+ withName: run_BedToIntervalList_Picard {

Picard should be capitalized

@@ -90,6 +101,7 @@ log.info """\
normal: ${params.samples_to_process.findAll{ it.sample_type == 'normal' }['bam']}
normal read length: ${params.samples_to_process.findAll{ it.sample_type == 'normal' }['read_length']}
reference: ${params.reference}
intervals_bed: ${params.getOrDefault("intervals_bed", null)}
Contributor

suggestion (non-blocking): Consider a small description instead of null

Suggested change
- intervals_bed: ${params.getOrDefault("intervals_bed", null)}
+ intervals_bed: ${params.getOrDefault("intervals_bed", "Not provided")}

Comment on lines +236 to +238
target_bed_per_sample_ch = target_bed_ch
.combine(samples_to_process_ch)
.map { it[0] }
Contributor

comment (non-blocking): This does work. You can also pass a value directly as a process input; Nextflow reuses it for every emission of the other input channels, so you don't have to duplicate it per sample like this.
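
To illustrate the point about value inputs, here is a toy Nextflow DSL2 script, not the pipeline's code. The process `show_pair`, the sample names, and `targets.bed` are all hypothetical. It assumes the BED path can be held in a value channel; if `target_bed_ch` in the pipeline is a queue channel, applying `.first()` to it should yield a value channel as well.

```groovy
// Toy example only: names and values below are illustrative assumptions.
nextflow.enable.dsl = 2

process show_pair {
    input:
    val sample
    val bed

    output:
    stdout

    script:
    """
    echo "${sample} ${bed}"
    """
}

workflow {
    // Queue channel: one emission per sample
    samples_ch = Channel.of('sampleA', 'sampleB', 'sampleC')

    // Value channel: can be read any number of times, so the process
    // runs once per sample without a combine/map step
    target_bed_ch = Channel.value('targets.bed')

    show_pair(samples_ch, target_bed_ch).view()
}
```

With a value channel the `.combine(samples_to_process_ch).map { it[0] }` step becomes unnecessary, because the process consumes one sample per task while rereading the same BED value.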

Comment on lines +273 to +275
bed_to_interval_list_ch = target_bed_ch
.map { ['target', it, params.reference_dict] }
.mix(bait_bed_ch)
Contributor

suggestion: This block is duplicated in both this branch and the else branch; consider moving it out of the conditional so that the conditional only handles mixing in the bait intervals when they are given.
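
One possible shape for that refactor, as a toy Nextflow DSL2 script. The parameter defaults, the `'bait'` tuple layout, and the use of `params.chm_bait_intervals_bed` as the condition are assumptions for illustration; the actual condition and the construction of `bait_bed_ch` in the PR are not shown in this thread.

```groovy
// Toy example only: defaults and the bait tuple layout are assumptions.
nextflow.enable.dsl = 2

params.reference_dict = 'reference.dict'
params.intervals_bed = 'targets.bed'
params.chm_bait_intervals_bed = ''

workflow {
    target_bed_ch = Channel.of(params.intervals_bed)

    // Shared part, built once outside the conditional
    bed_to_interval_list_ch = target_bed_ch
        .map { ['target', it, params.reference_dict] }

    // The conditional only mixes in the bait intervals when they are given
    if (params.chm_bait_intervals_bed) {
        bait_bed_ch = Channel.of(params.chm_bait_intervals_bed)
            .map { ['bait', it, params.reference_dict] }
        bed_to_interval_list_ch = bed_to_interval_list_ch.mix(bait_bed_ch)
    }

    bed_to_interval_list_ch.view()
}
```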

Comment on lines +46 to +51
chm_bait_intervals_bed = '' // if not defined, intervals_bed will be used
chm_coverage_cap = 2000
chm_minimum_mapping_quality = 20
chm_minimum_base_quality = 20
chm_per_base_output = true
chm_additional_options = ''
Contributor

comment: If these are new, they should be validated in the schema as well

Comment on lines +41 to +44
coverage_cap_arg = (params.chm_coverage_cap != null) ? "-COVERAGE_CAP ${params.chm_coverage_cap}" : ""
minimum_mapping_quality_arg = (params.chm_minimum_mapping_quality != null) ? "-MINIMUM_MAPPING_QUALITY ${params.chm_minimum_mapping_quality}" : ""
minimum_base_quality_arg = (params.chm_minimum_base_quality != null) ? "-MINIMUM_BASE_QUALITY ${params.chm_minimum_base_quality}" : ""
per_base_output_arg = (params.chm_per_base_output) ? "--PER_BASE_COVERAGE ${output_filename}_per-base-coverage" : ""
Contributor

comment: Are the parameters being checked here required by the pipeline? If so, the checks shouldn't be necessary. If they are not required, then I suggest using getOrDefault instead, since directly accessing a parameter that is not defined will result in an error.
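
A minimal sketch of the `getOrDefault` variant in plain Groovy, assuming the parameters are optional. The `params` map here is a stand-in for the pipeline's parameters, and `optional_arg` and the `output_filename` value are hypothetical. `chm_minimum_base_quality` is deliberately left out of the map to show the fallback.

```groovy
// Stand-in for the pipeline's params; chm_minimum_base_quality is
// intentionally undefined to exercise the default.
Map params = [
    chm_coverage_cap: 2000,
    chm_minimum_mapping_quality: 20,
    chm_per_base_output: true
]
String output_filename = 'sample'  // hypothetical value

// Hypothetical helper: emit "<flag> <value>" only when the parameter is set.
String optional_arg(Map p, String key, String flag) {
    def value = p.getOrDefault(key, null)
    return (value != null) ? "${flag} ${value}" : ''
}

def coverage_cap_arg = optional_arg(params, 'chm_coverage_cap', '-COVERAGE_CAP')
def minimum_mapping_quality_arg = optional_arg(params, 'chm_minimum_mapping_quality', '-MINIMUM_MAPPING_QUALITY')
def minimum_base_quality_arg = optional_arg(params, 'chm_minimum_base_quality', '-MINIMUM_BASE_QUALITY')
// A missing boolean falls back to false, so the flag is simply omitted
def per_base_output_arg = params.getOrDefault('chm_per_base_output', false) ?
    "--PER_BASE_COVERAGE ${output_filename}_per-base-coverage" : ''

println([coverage_cap_arg, minimum_mapping_quality_arg,
         minimum_base_quality_arg, per_base_output_arg].join(' '))
```

If the parameters turn out to be required and validated by the schema, the helper is unnecessary and the arguments can be interpolated directly.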
