Issue #7159 Create tool for producing genomic regions (as a BED file) #8942

Merged: 49 commits, Aug 29, 2024
4436f85
Initial commit and basic code to read gtf
sanashah007 Jul 10, 2024
f4a84be
add: code to write to bed & integration test
sanashah007 Jul 12, 2024
8139a96
fix: make getAllFeatures public and use the nesting of features to ge…
sanashah007 Jul 15, 2024
c35c829
add: filtering transcripts by basic tag
sanashah007 Jul 17, 2024
0366e35
add: sorts by contig and start (need to fix - sorting lexicographically)
sanashah007 Jul 18, 2024
d2323ff
fix: now sorts by contig then start & output is correct
sanashah007 Jul 22, 2024
9b6e27a
fix: make dictionary an arg
sanashah007 Jul 22, 2024
bd5a019
add: comments + simplified CompareGtfInfo
sanashah007 Jul 22, 2024
0ef5498
refactor: apply method
sanashah007 Jul 24, 2024
cdbe336
refactor: onTraversalSuccess and writeToBed
sanashah007 Jul 24, 2024
924216b
add: more tests
sanashah007 Jul 25, 2024
2d80fa0
fix: test files in correct dir pt1. (files are too large)
sanashah007 Jul 25, 2024
db3a24e
fix: test files in correct dir pt2.
sanashah007 Jul 25, 2024
7dc019d
add: compareFiles and ground truth bed files
sanashah007 Jul 30, 2024
4b35136
fix: runGtfToBed assert
sanashah007 Jul 30, 2024
799fee5
add: comments to GtfToBed
sanashah007 Jul 30, 2024
f8be0b2
fix: error handling for different versions of gtf and dictionary
sanashah007 Jul 31, 2024
3d2570a
fix: edited some bad conventions
sanashah007 Jul 31, 2024
1dec98b
fix: remove spaces from input file fullName
sanashah007 Aug 1, 2024
3d976f9
add: gtf file with MYT1L and MAPK1
sanashah007 Aug 1, 2024
22a99af
add: many transcripts unit test and refactoring
sanashah007 Aug 2, 2024
e1835c8
add: tiebreaker sorting by id
sanashah007 Aug 5, 2024
2aae3ae
add: make sort by basic optional
sanashah007 Aug 5, 2024
87e10bc
add: html doc comment
sanashah007 Aug 5, 2024
32bb1dd
fix: dictionary arg
sanashah007 Aug 5, 2024
fecbec2
fix: add "Gencode" to description
sanashah007 Aug 7, 2024
f0fc352
add: sample mouse gencode testing
sanashah007 Aug 7, 2024
161f040
fix: Remove arg shortnames
sanashah007 Aug 9, 2024
705cea9
fix: rename and move CompareGtfInfo
sanashah007 Aug 9, 2024
920ed22
fix: kebab-case args
sanashah007 Aug 9, 2024
33dd8ea
fix: update html doc
sanashah007 Aug 9, 2024
e7ea45f
fix: use IntegrationTestSpec.assertEqualTextFiles()
sanashah007 Aug 9, 2024
df07f5b
fix: remove unnecessary test of pik3ca
sanashah007 Aug 9, 2024
865ffd4
fix: remove set functions in GtfInfo
sanashah007 Aug 9, 2024
9080b18
fix: style of comparator
sanashah007 Aug 9, 2024
8a6dcf6
fix: style of comparator
sanashah007 Aug 9, 2024
829b8f3
fix: use Files.newOutputStream() to write and logger for errors
sanashah007 Aug 13, 2024
6c8f3dc
fix: use getBestAvailableSequenceDictionary()
sanashah007 Aug 13, 2024
c6dade5
fix: use dataProvider for integration tests
sanashah007 Aug 13, 2024
050dc49
fix: better encapsulation
sanashah007 Aug 13, 2024
ebaf095
fix: move mapk1.gtf to large dir
sanashah007 Aug 22, 2024
bc72121
fix: arg names
sanashah007 Aug 22, 2024
0ee3b76
fix: rename reference dict.
sanashah007 Aug 22, 2024
3eee529
fix: sequence-dictionary arg javadoc
sanashah007 Aug 22, 2024
7a7a7d0
add: javadoc to GtfInfo
sanashah007 Aug 22, 2024
4c573e0
add: dictionary exception and corresponding test
sanashah007 Aug 26, 2024
d30ed84
add: test with fasta file as reference arg
sanashah007 Aug 26, 2024
15e308c
add: javadoc for fasta file
sanashah007 Aug 27, 2024
167d5fd
fix: javadoc and onTraversalStart exception
sanashah007 Aug 29, 2024
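
Commits 0366e35, d2323ff, 9b6e27a and e1835c8 describe the sort order the tool ended up with: contig first, then start, with an id tiebreaker, after an earlier attempt sorted contigs lexicographically. The following is only a minimal sketch of that idea and is not taken from the PR's code. It assumes the contig order comes from the order of names in a sequence dictionary, represented here as a plain list of strings; `BedSortSketch`, `Region`, `contigOrder` and `sortRegions` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BedSortSketch {

    /** Hypothetical stand-in for one output row; not the PR's GtfInfo class. */
    record Region(String contig, int start, int end, String id) {}

    /** Maps each contig name to its position in the dictionary's contig list. */
    static Map<String, Integer> contigOrder(List<String> dictionaryContigs) {
        Map<String, Integer> order = new HashMap<>();
        for (int i = 0; i < dictionaryContigs.size(); i++) {
            order.put(dictionaryContigs.get(i), i);
        }
        return order;
    }

    /** Returns a copy sorted by dictionary contig order, then start, then id. */
    static List<Region> sortRegions(List<Region> regions, List<String> dictionaryContigs) {
        Map<String, Integer> order = contigOrder(dictionaryContigs);
        // Fail early if a region names a contig the dictionary does not contain.
        for (Region r : regions) {
            if (!order.containsKey(r.contig())) {
                throw new IllegalArgumentException("Contig not in dictionary: " + r.contig());
            }
        }
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator
                .comparingInt((Region r) -> order.get(r.contig()))
                .thenComparingInt(Region::start)
                .thenComparing(Region::id));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> dict = List.of("chr1", "chr2", "chr10");
        List<Region> regions = List.of(
                new Region("chr10", 500, 900, "geneC"),
                new Region("chr2", 300, 700, "geneB"),
                new Region("chr2", 300, 800, "geneA"),
                new Region("chr1", 100, 200, "geneD"));
        // Prints chr1, then chr2 (geneA before geneB), then chr10.
        for (Region r : sortRegions(regions, dict)) {
            System.out.println(r.contig() + "\t" + r.start() + "\t" + r.end() + "\t" + r.id());
        }
    }
}
```

A plain string comparison would place chr10 before chr2, which is the lexicographic problem noted in commit 0366e35; looking up each contig's index in the dictionary avoids it.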
fix: update html doc
sanashah007 committed Aug 9, 2024
commit 33dd8ea1a0b24ffc374ac8e4a0099b2fa8d4f9a2
@@ -39,35 +39,32 @@
  *
  * <pre>
  * java -jar GtfToBed.jar \
- * -G input.gtf \
- * -SD dictionary.dict \
- * -T False \
- * -B False \
- * -O output.bed \
+ * -gtf-path input.gtf \
+ * -gtf-sequence-dictionary dictionary.dict \
+ * -output output.bed \
  * </pre>
  *
  * <h4>(ii) Convert GTF to BED with transcript level data</h4>
  * <p>This mode extracts and converts transcript data from the input GTF file to BED format:</p>
  *
  * <pre>
  * java -jar GtfToBed.jar \
- * -G input.gtf \
- * -SD dictionary.dict \
- * -T True \
- * -B False \
- * -O output.bed \
+ * -gtf-path input.gtf \
+ * -gtf-sequence-dictionary dictionary.dict \
+ * -sort-by-transcript \
+ * -output output.bed \
  * </pre>
  *
  * <h4>(iii) Convert GTF to BED with transcript level data filtering for only those with the basic tag</h4>
  * * <p>This mode extracts and converts basic transcript data from the input GTF file to BED format:</p>
  * *
  * * <pre>
  * java -jar GtfToBed.jar \
- * -G input.gtf \
- * -SD dictionary.dict \
- * -T True \
- * -B True \
- * -O output.bed \
+ * -gtf-path input.gtf \
+ * -gtf-sequence-dictionary dictionary.dict \
+ * -sort-by-transcript \
+ * -sort-by-basic \
+ * -output output.bed \
  * * </pre>
  */

@@ -88,7 +85,7 @@ public class GtfToBed extends FeatureWalker<GencodeGtfFeature> {
  @Argument(fullName = INPUT_LONG_NAME, doc = "Path to Gencode GTF file")
  public GATKPath inputFile;

- @Argument(fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME, doc = "Output BED file")
+ @Argument(fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME , doc = "Output BED file")
  public GATKPath outputFile;

  @Argument(fullName = SORT_BY_TRANSCRIPT_LONG_NAME, doc = "Make each row of BED file sorted by transcript", optional = true)