Permutect dataset engine outputs contig and read group indices, not names #8860

davidbenjamin · 2024-06-04T05:00:43Z

This matters for Infogain and other potential hybrid data because it lets Permutect normalize separately within each read group and otherwise keep track of different read groups downstream.

It also makes the data structures in Permutect much more convenient because numeric data is easier to write to a memory map.
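To illustrate the convenience argument, here is a minimal sketch (not code from this PR) of why indices are easier to store than names: each record becomes a fixed number of bytes, so record k sits at a predictable offset. The class and method names are hypothetical, and an in-memory `ByteBuffer` stands in for whatever memory-mapped storage Permutect actually uses.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and layout are assumptions, not Permutect's format.
class IndexEncodingSketch {
    // Assign each name its position in the list, as a header or sequence dictionary would.
    static Map<String, Integer> indexByName(final List<String> names) {
        final Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            result.put(names.get(i), i);
        }
        return result;
    }

    // Each record is a (contigIndex, readGroupIndex) pair stored as two ints,
    // so every record occupies exactly 2 * Integer.BYTES bytes.
    static ByteBuffer encode(final int[][] records) {
        final ByteBuffer buffer = ByteBuffer.allocate(records.length * 2 * Integer.BYTES);
        for (final int[] record : records) {
            buffer.putInt(record[0]).putInt(record[1]);
        }
        buffer.flip();
        return buffer;
    }
}
```

With variable-length names, the same layout would need either padding or a separate offset table; with indices, the names live once in the header and dictionary and can be looked up when needed.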

meganshand

One quick question about whether sequence_dictionary can be null, but otherwise this looks good 👍

meganshand · 2024-06-04T13:31:09Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/Mutect3DatasetEngine.java

@@ -106,7 +128,7 @@ public void addData(final ReferenceContext ref, final VariantContext vc, Optiona
                        final M2ArgumentCollection.Mutect3DatasetMode mutect3DatasetMode) {
        final String refBases = ReferenceBases.annotate(ref, vc);
        final String refAllele = vc.getReference().getBaseString();
-        final String contig = vc.getContig();
+        final int contigIndex = sequenceDictionary.getSequenceIndex(vc.getContig());


What happens if the sequence dictionary is null? Is that checked for somewhere? Or does GATK not allow input BAMs with no sequence dictionary?

davidbenjamin · 2024-06-04

The sequence dictionary ultimately comes from Mutect2::getBestAvailableSequenceDictionary(), so it's not going to be null even if the BAM doesn't have one. Would you like me to put in some Utils.nonNull checks somewhere? (for future-proofing and/or peace of mind)

meganshand · 2024-06-04

Nope, it sounds like getBestAvailableSequenceDictionary() is already handling that check, so that should be fine.
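For reference, the kind of fail-fast check floated above could look roughly like the sketch below. This is not code from the PR: `ContigIndexSketch` is a hypothetical stand-in for the dataset engine, a plain list of contig names stands in for the sequence dictionary, and `Objects.requireNonNull` stands in for GATK's `Utils.nonNull`.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for an engine that holds a sequence dictionary; not the PR's code.
class ContigIndexSketch {
    private final List<String> contigNames;

    ContigIndexSketch(final List<String> contigNames) {
        // Fail at construction time rather than at the first lookup.
        this.contigNames = Objects.requireNonNull(contigNames, "sequence dictionary must not be null");
    }

    // Returns the contig's position in the dictionary, or throws if the contig is unknown.
    int contigIndex(final String contig) {
        final int index = contigNames.indexOf(contig);
        if (index < 0) {
            throw new IllegalArgumentException("Contig not in sequence dictionary: " + contig);
        }
        return index;
    }
}
```

Checking once in the constructor keeps the per-variant path free of null handling, which matches the conclusion in this thread that the guarantee belongs upstream of the engine.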


davidbenjamin requested a review from meganshand June 4, 2024 05:00

davidbenjamin assigned meganshand Jun 4, 2024

meganshand approved these changes Jun 4, 2024


davidbenjamin added 2 commits June 4, 2024 09:50

passing around header and sequence dictionary

2d6225f

Permutect dataset engine outputs contig and read group indices, not names

0f3f51a

davidbenjamin force-pushed the db_permutect_tensor_read_groups branch from 416e0d0 to 0f3f51a Compare June 4, 2024 14:04

davidbenjamin merged commit 2a420e4 into master Jun 4, 2024
21 checks passed

davidbenjamin deleted the db_permutect_tensor_read_groups branch June 4, 2024 15:42
