UMI-tool 1.1.5 not working with --per-gene --per-contig --gene-transcript-map #646

Open
pclavell opened this issue May 31, 2024 · 6 comments

pclavell commented May 31, 2024

Hello,
I ran this command with UMI-tools 1.0.0 to deduplicate on UMI + gene (mapping to a pantranscriptome with several transcripts per gene), and it worked:
umi_tools group \
  --method adjacency \
  --edit-distance-threshold=$EDIT_DISTANCE \
  --per-contig \
  --per-gene \
  --gene-transcript-map gencodev44_transcript_map.tsv \
  -I $QUERY \
  --group-out "$NAME"_percontig.tsv \
  --log "$NAME"_percontig.log

With 1.0.0, the gene column of the --group-out file showed the gene ID; with 1.1.5 it only repeats the transcript ID.
EDIT: I've just installed version 1.0.0 and it works with exactly the same command and inputs, so something changed between 1.0.0 and 1.1.5.
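
One way to check an existing --group-out file for this symptom is to count the rows whose gene column simply repeats the contig column. The sketch below is only illustrative: it assumes the file is tab-separated with a header row containing columns named contig and gene (an assumption about the output layout; adjust the names if your file differs). The function name, the example rows, and the read IDs are made up for the example.

```python
import csv
import io


def count_gene_equals_contig(handle, gene_col="gene", contig_col="contig"):
    """Return (suspicious, total): rows whose gene value repeats the contig value."""
    # Assumption: tab-separated file with a header row naming both columns.
    reader = csv.DictReader(handle, delimiter="\t")
    suspicious = 0
    total = 0
    for row in reader:
        total += 1
        if row[gene_col] == row[contig_col]:
            suspicious += 1
    return suspicious, total


# Made-up rows: the second one shows the reported symptom
# (the transcript ID is repeated in the gene column).
example_group_out = io.StringIO(
    "read_id\tcontig\tgene\n"
    "read_a\tENST00000473358.1\tENSG00000243485.5\n"
    "read_b\tENST00000469289.1\tENST00000469289.1\n"
)
suspicious, total = count_gene_equals_contig(example_group_out)
print(f"{suspicious} of {total} rows repeat the contig in the gene column")
```

For a real file, pass an open file handle (for example `open(path, newline="")`) instead of the in-memory example.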

@IanSudbery
Member

Can you include a snippet of your gencodev44_transcript_map.tsv file?

@pclavell
Author

pclavell commented Jun 3, 2024

It is tab-separated (gene ID, then transcript ID):

ENSG00000290825.1 ENST00000456328.2
ENSG00000223972.6 ENST00000450305.2
ENSG00000227232.5 ENST00000488147.1
ENSG00000278267.1 ENST00000619216.1
ENSG00000243485.5 ENST00000473358.1
ENSG00000243485.5 ENST00000469289.1
ENSG00000284332.1 ENST00000607096.1
ENSG00000237613.2 ENST00000417324.1
ENSG00000237613.2 ENST00000461467.1
ENSG00000268020.3 ENST00000606857.1
ENSG00000290826.1 ENST00000642116.1
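
For reference, a two-column map like the snippet above can be read into a transcript-to-gene lookup as sketched below. This is a minimal sketch and not UMI-tools' own loader: the function name is hypothetical, and it assumes the gene ID is in the first column and the transcript ID in the second, as in the rows shown.

```python
import csv
import io


def load_transcript_to_gene(handle):
    """Build a transcript -> gene dict from a two-column, tab-separated map."""
    tx2gene = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        gene_id, transcript_id = row[0], row[1]
        tx2gene[transcript_id] = gene_id
    return tx2gene


# Three rows taken from the snippet above, with explicit tab separators.
example_map = io.StringIO(
    "ENSG00000243485.5\tENST00000473358.1\n"
    "ENSG00000243485.5\tENST00000469289.1\n"
    "ENSG00000237613.2\tENST00000417324.1\n"
)
tx2gene = load_transcript_to_gene(example_map)
print(tx2gene["ENST00000469289.1"])
```

Both transcripts of ENSG00000243485.5 resolve to the same gene, which is the grouping the --per-gene output is expected to reflect.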

@TomSmithCGAT
Member

Ah, I see what's happened here. #577 fixed an issue with group but didn't account for the --gene-transcript-map use case, where the implications of the fix weren't obvious. We also don't have tests covering that option, so it wasn't picked up! 🤦

I'll try and issue a patch today/tomorrow.

Note to self: Switch back to using the read tag for the gene ID when a tx2gene map is used, here: https://github.com/CGATOxford/UMI-tools/blame/9ce3a70a8b35ff9a066d73716680136be71cc70d/umi_tools/group.py#L289-L292. Also add a test to cover this!
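
The kind of check such a test could make is sketched below. It is not the project's test suite and does not call UMI-tools; the function name is hypothetical and the rows are made up. It only expresses the expectation from this thread: when a transcript-to-gene map is supplied, the gene reported for a read should be the gene that the map assigns to the read's contig (transcript).

```python
def find_gene_mismatches(rows, tx2gene):
    """Return (contig, reported_gene, expected_gene) for rows that disagree with the map."""
    mismatches = []
    for contig, reported_gene in rows:
        expected_gene = tx2gene.get(contig)
        # Contigs absent from the map are ignored in this sketch.
        if expected_gene is not None and reported_gene != expected_gene:
            mismatches.append((contig, reported_gene, expected_gene))
    return mismatches


tx2gene = {
    "ENST00000473358.1": "ENSG00000243485.5",
    "ENST00000469289.1": "ENSG00000243485.5",
}

# Made-up (contig, gene) pairs: the expected output, and the reported symptom.
expected_rows = [
    ("ENST00000473358.1", "ENSG00000243485.5"),
    ("ENST00000469289.1", "ENSG00000243485.5"),
]
symptom_rows = [
    ("ENST00000473358.1", "ENST00000473358.1"),
    ("ENST00000469289.1", "ENST00000469289.1"),
]

assert find_gene_mismatches(expected_rows, tx2gene) == []
print(len(find_gene_mismatches(symptom_rows, tx2gene)), "mismatching rows")
```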

TomSmithCGAT self-assigned this Jun 10, 2024

@TomSmithCGAT
Member

@pclavell - Could you please try installing the ts_debug_issue646 branch to check whether this resolves the issue? You can install it with, e.g., pip install https://github.com/CGATOxford/UMI-tools/archive/ts_debug_issue646.zip

@IanSudbery
Member

Any update on this?

@pclavell
Author

pclavell commented Aug 8, 2024

I'm sorry, I missed the last comment. I just ran it with version 1.0.0. This step is now buried in the middle of a Snakemake pipeline full of temporary intermediate files, and the inputs have been archived, so testing this would mean recovering and rerunning everything.
If you really need it tested, I could try in the coming weeks, but I am a little swamped at the moment.
Thank you
