Use GATK tool SVAnnotate for functional consequence annotation #342

Merged: 4 commits, May 4, 2022

Conversation

epiercehoffman
Collaborator

Updates

Replace svtk annotate with the GATK tool SVAnnotate for functional consequence annotation in AnnotateVcf.wdl.

  • Create standalone workflow AnnotateFunctionalConsequences.wdl that runs SVAnnotate with a protein-coding GTF and optionally a BED file of noncoding elements, with optional tool parameters exposed.
  • Call AnnotateFunctionalConsequences as a subworkflow in AnnotateVcf.wdl. Scattering was not implemented: SVAnnotate runs in ~25-30 minutes on an unfiltered VCF with ~2500 samples, so it should not be necessary for small or medium-sized cohorts. Very large cohorts may need to be run separately per contig.
  • Update single-sample WDL as needed
  • Update JSON & Terra templates as needed
  • Remove old annotation WDLs
  • Change custom canonical protein-coding GTF to MANE Select Plus Clinical GTF

Testing

  • Ran AnnotateVcf.wdl successfully with 1KG ref panel with cromshell
  • Validated all WDLs and JSONs with womtool (the number of test JSONs decreased to 27) and the Terra validation script

Collaborator

@mwalker174 left a comment

Great work! Always happy to net 800 fewer lines and simplify the WDLs while fixing bugs and improving performance! A couple of minor suggestions.

Just checking: does the GATK docker need to be updated in dockers.json?

-V ~{vcf} \
-O ~{outfile} \
--protein-coding-gtf ~{protein_coding_gtf} \
~{if defined(noncoding_bed) then "--non-coding-bed " + noncoding_bed else ""} \
Collaborator

Suggested change
- ~{if defined(noncoding_bed) then "--non-coding-bed " + noncoding_bed else ""} \
+ ~{"--non-coding-bed " + noncoding_bed} \

Same for the others. Unless there was a miniwdl issue with this?
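
For context, the shorter form relies on WDL placeholder semantics: inside ~{...}, concatenating a string with an undefined optional makes the whole placeholder evaluate to an empty string, so the explicit if defined(...) check is redundant. A minimal sketch, assuming WDL 1.0; the task name SVAnnotateSketch and its input layout are hypothetical and not taken from this PR:

```wdl
version 1.0

# Hypothetical task for illustration only; not the task added in this PR.
task SVAnnotateSketch {
  input {
    File vcf
    File protein_coding_gtf
    File? noncoding_bed      # optional: may be left undefined by the caller
    String outfile
    String gatk_docker
  }

  command <<<
    set -euo pipefail

    # If noncoding_bed is undefined, the last placeholder expands to an
    # empty string, so the flag is omitted from the command line entirely.
    gatk SVAnnotate \
      -V ~{vcf} \
      -O ~{outfile} \
      --protein-coding-gtf ~{protein_coding_gtf} \
      ~{"--non-coding-bed " + noncoding_bed}
  >>>

  output {
    File annotated_vcf = outfile
  }

  runtime {
    docker: gatk_docker
  }
}
```

Both spellings should produce the same command line when noncoding_bed is provided; they differ only in how much boilerplate is needed per optional argument.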

Collaborator Author

I don't think this was the issue with miniwdl


set -euo pipefail

gatk --java-options "-Xmx~{java_mem_mb}m" SVAnnotate \
Collaborator

What do you think about adding a String additional_args input that would let the user specify arbitrary arguments not built into the WDL (e.g. a requester-pays project ID)?
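
One way such a pass-through input might look, as a sketch under stated assumptions rather than the code merged in this PR: the task name, the java_mem_mb default, and the example project ID are hypothetical, and WDL 1.0 is assumed. An undefined optional String expands to an empty string inside ~{...}, so callers that do not set additional_args get the unchanged command.

```wdl
version 1.0

# Hypothetical task for illustration only; names and defaults are assumptions.
task SVAnnotateWithExtraArgs {
  input {
    File vcf
    File protein_coding_gtf
    String outfile
    # Free-form arguments appended verbatim to the SVAnnotate command,
    # e.g. "--gcs-project-for-requester-pays my-project" (placeholder project ID).
    String? additional_args
    Int java_mem_mb = 3000   # assumed default, not from the PR
    String gatk_docker
  }

  command <<<
    set -euo pipefail

    # ~{additional_args} expands to nothing when the input is left undefined.
    gatk --java-options "-Xmx~{java_mem_mb}m" SVAnnotate \
      -V ~{vcf} \
      -O ~{outfile} \
      --protein-coding-gtf ~{protein_coding_gtf} \
      ~{additional_args}
  >>>

  output {
    File annotated_vcf = outfile
  }

  runtime {
    docker: gatk_docker
  }
}
```

Because the string is inserted verbatim, the tool receives whatever the caller supplies; the WDL itself does not validate the extra arguments.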

@epiercehoffman
Collaborator Author

Just checking: does the GATK docker need to be updated in dockers.json?

No, the current gatk_docker (the nightly snapshot) has SVAnnotate and that's what I used for testing.

Updated how optional arguments are handled, added an optional additional_args parameter, validated with womtool, and tested successfully with default inputs (not tested with additional_args provided).

@epiercehoffman merged commit daf14a9 into master May 4, 2022
@epiercehoffman deleted the eph_integrate_svannotate branch May 4, 2022 19:27