Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug with MateDistantReadFilter not being configurable #7701

Merged
merged 4 commits into from
Mar 1, 2022

Conversation

jamesemery
Copy link
Collaborator

Fixes #7696

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👎

public MateDistantReadFilter() {
}

public MateDistantReadFilter( final int minMappingQualityScore ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a mapping quality score.

@lbergelson
Copy link
Member

In all seriousness though, fix the typo and merge :)

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jamesemery jamesemery merged commit 8c5e0e8 into master Mar 1, 2022
@jamesemery jamesemery deleted the je_fixMateDistantArgument branch March 1, 2022 15:36
@RWilton
Copy link

RWilton commented Apr 19, 2022

Sorry, but this bug still isn't fixed as of v4.2.6.1. Reproduce as follows:

  --read-filter MateDistantReadFilter
  --mate-too-distant-length 1500

Instead of a run-time exception (as in v4.2.5.0), HaplotypeCaller simply produces no variant calls at all.

Expected behavior would be to exclude paired-end mappings whose TLEN exceeds the parameterized value. Perhaps there is an implementation bug, unrelated to the original problem, that contains faulty logic for doing this.

Thanks...

@jamesemery
Copy link
Collaborator Author

Hi @RWilton i'm sorry to hear that. I suspect its unrelated given how simple this PR is but its quite possible that filter has been broken since evidently nobody was able to use and adjust it for a long time. This is the logic here that the filter uses:
return read.isPaired() && !read.isUnmapped() && !read.mateIsUnmapped() && (Math.abs(read.getStart() - read.getMateStart()) >= mateTooDistantLength || !read.getContig().equals(read.getMateContig()));

That logic is correct for what the filter is doing. It should be noted that the filter does the opposite of what you expect it to (since its intended for our SV pipeline) in that it filters out all reads that are NOT having distant mates. This means if you try to run HaplotypeCaller with this setting you will be throwing away every read pair EXCEPT the distant ones which results in mostly no reads. We should perhaps rename this filter to be a little less confusing.

@RWilton
Copy link

RWilton commented Apr 19, 2022

Well, that explains that, sort of.

The code snippet you're providing looks like it ought to do what you say it does (i.e., the mates have to be paired, not unmapped, mapped to the same contig, and have a difference in their start positions that is at least mateTooDistantLength).

But there are two problems with this:

  1. This filter's behavior is unexpected wrt HaplotypeCaller. It seems to me that an inclusive filter (i.e., process only paired-end mappings whose TLEN falls within a specified range) would be more usable. That would imply a filter implementation that accepts a pair of integers, but the expected behavior would be more obvious and in line with GATK's other range-limited parameterizations (e.g., MappingQualityReadFilter comes immediately to mind).

  2. I can't tell from where I sit, but the code snippet looks correct only if getStart() and getMateStart() return a zero-based start position of each mate relative to the start of the strand to which the mate is mapped. If the code is just computing the difference between POS for the mates, the computation is incorrect for forward + reverse-complement (Illumina-style) pairs. In addition, computing TLEN requires not only that you consider the orientation of the individual mate mappings, but also that you make an arbitrary decision about how to handle soft-clipped reads.

I hate to say this, but I think this parameter needs some attention. Its potential utility with HaplotypeCaller seems evident to me (i.e., it would be good to be able to exclude outliers with unreasonable TLENs) but its implementation and frugal documentation make it unusable in practice.

@jamesemery
Copy link
Collaborator Author

@RWilton I agree that this is a confusing filter and maybe should be renamed... As I have said this is a specific utility is for SVs where they might only want to look at long range read pairs not HaplotypeCaller. In that context I would say that the nuance about TLEN vs left aligned start position is a relatively small one. I agree with you that it would be useful to have some functionality for filtering out highly distant read pairs form our short variant calling as an option. I would have to think on whether its better to refactor this filter to be more general or to make a second filter that is the reverse of this one.

@RWilton can you make an issue ticket and describe what you would want out of such a filter for HaplotypeCaller? I would put some thought into the case where the TLEN field has not been populated since it often isn't (especially since the cases where TLEN might not get computed overlaps with discordant/distant mates which are part of the target for such a filter).

@RWilton
Copy link

RWilton commented Apr 21, 2022

Since the intent and implementation of this filter are not essential to my work -- I can simply filter pairs on TLEN prior to variant calling -- perhaps it's best to just let this one be.

Thanks for the discussion and for helping me think things through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in HaplotypeCaller: parameter mate-too-distant-length is readonly
3 participants