Grouped MEI's with insertions in splitvariants.py #687

Merged 1 commit on Jul 2, 2024

@kirtanav98 kirtanav98 commented Jun 6, 2024

This addresses issue 649. svtk vcf2bed uses the ALT field to produce the svtype column in the output BED file, so that column contains BND alt alleles and values like INS:ME for MEIs. However, the current and previous SplitVariants tasks in GenotypeBatch matched exactly on the string "INS" when creating insertion-specific BED files, so MEIs were grouped with the BCAs instead. With this change, MEIs are grouped with the insertions, rather than with the BCAs, when the insertion-specific BED files are created. This makes it possible to evaluate the impact of the regrouping on genotyping. The change has been validated with womtool and cromshell using the 1kgp reference panel inputs. The results of the previous script and docker and the results of the updated script and docker can be found here
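
As a rough illustration of the matching problem described above, here is a minimal sketch. It is not the actual code from splitvariants.py: the function names `old_is_insertion` and `new_is_insertion` and the sample svtype values are hypothetical, and the prefix check is only one plausible way to treat MEI subtypes as insertions.

```python
def old_is_insertion(svtype: str) -> bool:
    """Exact match on "INS": MEI values such as INS:ME:ALU do not match."""
    return svtype == "INS"


def new_is_insertion(svtype: str) -> bool:
    """Assumed fix: accept plain INS and any INS:<subtype> value."""
    return svtype == "INS" or svtype.startswith("INS:")


def count_insertions(svtypes, is_insertion):
    """Count how many svtype values land in the insertion-specific group."""
    return sum(1 for svtype in svtypes if is_insertion(svtype))


# Hypothetical svtype column values, as they might appear in a BED file.
example_svtypes = ["INS", "INS:ME:ALU", "INS:ME:SVA", "INS:ME:LINE1", "INV", "CPX"]

old_count = count_insertions(example_svtypes, old_is_insertion)
new_count = count_insertions(example_svtypes, new_is_insertion)
print(f"insertions (exact match): {old_count}")
print(f"insertions (prefix match): {new_count}")
```

With the exact match, only the plain INS record is treated as an insertion and the three MEI records fall through to whatever group handles the remaining types. With the prefix match, all four are grouped together.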

@kirtanav98 kirtanav98 requested a review from epiercehoffman June 6, 2024 15:37
@kirtanav98 kirtanav98 self-assigned this Jun 6, 2024
@kirtanav98 kirtanav98 marked this pull request as draft June 6, 2024 15:48
@kirtanav98 kirtanav98 requested a review from mwalker174 June 11, 2024 18:30
Collaborator

Please revert these changes - the dockers will get built and updated automatically when this is merged!

Contributor Author

these were reverted

Collaborator

@mwalker174 mwalker174 left a comment

Thank you for catching this bug @kirtanav98. The genotyping from your tests looks good. Here is a summary of the fraction of variants in HWE (Hardy-Weinberg equilibrium), before -> after the changes, by ALT allele:

  • <INS>: 0.50 -> 0.50
  • <INS:ME:ALU>: 0.52 -> 0.71
  • <INS:ME:SVA>: 0.56 -> 0.76
  • <INS:ME:LINE1>: 0.59 -> 0.75
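
For readers unfamiliar with this kind of summary, the sketch below shows one way a "fraction of variants in HWE" could be computed from per-variant genotype counts. It is an assumption, not the method used for the numbers above: the chi-square goodness-of-fit test, the 3.841 cutoff (1 degree of freedom at the 0.05 level), and the function names `hwe_chi_square` and `fraction_in_hwe` are all illustrative choices.

```python
def hwe_chi_square(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Chi-square statistic comparing observed genotype counts to HWE expectations."""
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        return 0.0
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternate allele frequency
    observed = (n_ref_hom, n_het, n_alt_hom)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expected count (monomorphic sites).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def fraction_in_hwe(genotype_counts, critical_value: float = 3.841) -> float:
    """Fraction of variants whose statistic does not exceed the assumed cutoff."""
    if not genotype_counts:
        return 0.0
    passing = sum(
        1 for counts in genotype_counts if hwe_chi_square(*counts) <= critical_value
    )
    return passing / len(genotype_counts)


# Hypothetical genotype counts: (hom-ref, het, hom-alt) per variant.
example_counts = [(25, 50, 25), (50, 0, 50)]
example_fraction = fraction_in_hwe(example_counts)
print(example_fraction)
```

In this toy input the first variant matches the HWE expectation exactly, while the second has no heterozygotes at all and fails the test, so half of the variants count as being in HWE.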

Just one small request.

}
Collaborator

You should revert this trivial change as well. You can make sure it is fully reverted by running git checkout main inputs/values/dockers.json

Contributor Author

This should be fixed. Thank you!

@kirtanav98 kirtanav98 force-pushed the kv_splitvariants_ins branch from d009188 to 7a4d796 Compare July 2, 2024 11:55
@kirtanav98 kirtanav98 force-pushed the kv_splitvariants_ins branch from 7a4d796 to 6cd0a63 Compare July 2, 2024 16:32
@kirtanav98 kirtanav98 marked this pull request as ready for review July 2, 2024 17:02
@kirtanav98 kirtanav98 merged commit d953026 into main Jul 2, 2024
8 checks passed
@kirtanav98 kirtanav98 deleted the kv_splitvariants_ins branch July 2, 2024 17:02