Add subset ped step at start of CleanVcf #250

Merged: 1 commit, merged Dec 21, 2021


epiercehoffman (Collaborator) commented Oct 29, 2021

Updates

Adds a step at the start of CleanVcf that subsets the PED file to only the sample IDs present in the input VCF. This is a patch for a bug that can occur in CleanVcf1_20, under some circumstances, when the PED file contains extra samples (not in the VCF) whose sex is listed as a value other than 1 or 2. The subset step will become unnecessary, and should be removed, once the rewrite of CleanVcf, which uses the sample IDs in the VCF as its starting point, is merged.
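The PR text does not include the subset step itself, so the Python sketch below is only a rough illustration of the idea: read the sample IDs from the VCF's `#CHROM` header line, then keep only the PED rows whose individual ID (column 2 of the standard six-column PED layout, where column 5 is sex) is among them. The function names (`sample_ids_from_header`, `vcf_sample_ids`, `subset_ped`) are hypothetical and are not taken from CleanVcf or its WDL; passing `#`-prefixed PED lines through as headers is also an assumption.

```python
import gzip
from typing import Iterable, Iterator, List, Set


def sample_ids_from_header(vcf_lines: Iterable[str]) -> List[str]:
    """Return the sample IDs listed on the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed (CHROM ... FORMAT); sample IDs follow.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the records without seeing a #CHROM line
    raise ValueError("no #CHROM header line found")


def vcf_sample_ids(vcf_path: str) -> List[str]:
    """Read sample IDs from a plain or gzip/bgzip-compressed VCF file."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as fh:
        return sample_ids_from_header(fh)


def subset_ped(ped_lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield PED rows whose individual ID (column 2) is in `keep`.

    Blank lines are dropped. Lines starting with '#' are passed through
    unchanged (hypothetical handling of header lines).
    """
    for line in ped_lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            yield line
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[1] in keep:
            yield line


if __name__ == "__main__":
    # Toy inputs (illustrative only): S3 is an extra sex=0 sample not in the VCF.
    vcf_header = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    ]
    ped = [
        "FAM1\tS1\t0\t0\t1\t0\n",
        "FAM1\tS2\t0\t0\t2\t0\n",
        "FAM2\tS3\t0\t0\t0\t0\n",
    ]
    keep = set(sample_ids_from_header(vcf_header))
    print("".join(subset_ped(ped, keep)), end="")
```

On the toy inputs only the S1 and S2 rows survive; the sex=0 sample S3, which is absent from the VCF, is dropped. Filtering on the VCF's sample list, rather than on the sex column, mirrors the PR's approach of making the PED file match the VCF instead of special-casing unusual sex values.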

Testing

  • Validated all WDLs and JSONs with womtool
  • Successfully ran MakeCohortVcf.wdl on the test_large dataset with extra sex=0 samples in the PED file.
    • Caveat: based on the outputs of CleanVcf1_5, CleanVcf1_6 and, subsequently, CleanVcf1_20 were skipped, so this run did not exercise the step where the bug occurs. Please advise if further testing is recommended. It has already been confirmed that manually subsetting the PED file to remove the extra samples solved the problem for the user who encountered it.

mwalker174 (Collaborator) left a comment


I think this looks good. Let's let the user test the change after you cut another Terra release.
