Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update SV split-read strand validation and clustering #8378

Merged
merged 3 commits into from
Jun 26, 2023

Conversation

mwalker174
Copy link
Contributor

Adds some flexibility to the allowed split-read strand annotations on SV records:

  • Allow INS -+ strands
  • Allow INV null strands
  • When clustering, only require that strands match for INV/BND records

The changes affecting INS variants are commensurate with changing going into gatk-sv (see broadinstitute/gatk-sv#553).

Added tests for the new functionality and improved some unit test coverage for a few related cases.

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor possible fix and a few questions.

@@ -386,8 +386,8 @@ public static SVCallRecord create(final VariantContext variant, boolean keepVari
} else {
contigB = contigA;
// Force reset of END coordinate
if (type.equals(StructuralVariantType.INS)) {
positionB = positionA + 1;
if (type == GATKSVVCFConstants.StructuralVariantAnnotationType.INS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the .equals for string matching -- any reason you changed it?
Also why the END coordinate change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually causing a bug where .equals(StructuralVariantType.INS) is always false because the type is different. Somehow I missed this when switching to use GATKSVVCFConstants.StructuralVariantAnnotationType in an earlier PR. Using == is safer since it won't compile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to END is a correction. In the spec, POS=END for insertions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem a little confusing to have == here and .equals in other parts of the code like on lines 337 and 349 above. Is there a reason for that? Should we change it everywhere to be consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwhelan Agreed we should be consistent. I think == is a slightly better choice for the type checking so I've changed all the comparisons I could find to that.

@@ -27,8 +27,6 @@ public class SVCallRecordUtilsUnitTest {
private static final Map<String, Object> TEST_ATTRIBUTES = Collections.singletonMap("TEST_KEY", "TEST_VAL");
private static final Map<String, Object> TEST_ATTRIBUTES_CPX = Lists.newArrayList(
new AbstractMap.SimpleImmutableEntry<String, Object>("TEST_KEY", "TEST_VAL"),
new AbstractMap.SimpleImmutableEntry<String, Object>(GATKSVVCFConstants.END2_ATTRIBUTE, 2000),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the complex variant format changing too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this was just a redundant entry in that map. The helper function in SVUtils sets this already and I wanted to test cases where END2 wasn't present.

@@ -457,54 +474,99 @@ public Object[][] testCreateData() {
ALLELES_DEL, Lists.newArrayList(GENOTYPE_DEL_1, GENOTYPE_DEL_2), 1000, "+-",
GATKSVVCFConstants.StructuralVariantAnnotationType.DEL, Collections.singletonList(GATKSVVCFConstants.DEPTH_ALGORITHM),
null, null,
TEST_ATTRIBUTES),
TEST_ATTRIBUTES, -10.),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is QUAL = 100?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we actually have our variant qualities scaled 0-999 still. In the future we will probably scale them down (we have already done so with GQs). I'm actually going to up this to 90. to ensure there isn't a problem with that.

Recover test files
Copy link
Member

@cwhelan cwhelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine, just a couple of small comments.

@@ -212,10 +212,6 @@ private static Pair<Boolean, Boolean> inferStrands(final GATKSVVCFConstants.Stru
}
return Pair.of(Boolean.FALSE, Boolean.TRUE);
} else {
if (type.equals(GATKSVVCFConstants.StructuralVariantAnnotationType.INV)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The javadoc comment for the method seems to be out of date now, can you update it?

@@ -386,8 +386,8 @@ public static SVCallRecord create(final VariantContext variant, boolean keepVari
} else {
contigB = contigA;
// Force reset of END coordinate
if (type.equals(StructuralVariantType.INS)) {
positionB = positionA + 1;
if (type == GATKSVVCFConstants.StructuralVariantAnnotationType.INS) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem a little confusing to have == here and .equals in other parts of the code like on lines 337 and 349 above. Is there a reason for that? Should we change it everywhere to be consistent?

@mwalker174 mwalker174 merged commit aea2bb4 into master Jun 26, 2023
@mwalker174 mwalker174 deleted the mw_sv_ins_strands branch June 26, 2023 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants