We introduce Hera-T, a fast and accurate tool for estimating gene abundances in single-cell data generated by the 10X-Chromium protocol. By devising a new strategy for aligning reads to both transcriptome and genome references, Hera-T reduces both running time and memory consumption by 10- to 100-fold while giving results similar to CellRanger’s. Hera-T also handles some difficult splicing alignment scenarios that CellRanger fails to address and therefore obtains better accuracy than CellRanger. Excluding the reads in those scenarios, the Hera-T and CellRanger results have correlation scores > 0.99.
Hera-T is distributed under the BioTuring License. See the LICENSE file for details.
- mm10: https://www.dropbox.com/s/jtonx86nyzf6tq1/cr_mm10_210.zip?dl=1
- hg19: https://www.dropbox.com/s/ibkwo3uzqjri59m/cr_hg19_120.zip?dl=1
- grch38: https://www.dropbox.com/s/2tcpvkyj58s4vly/cr_grch38_120.zip?dl=1
sh ./build.sh
Usage: ./hera-T count [options] -x <idx_name> -1 <R1> -2 <R2>
Option:
-t : Number of threads
-o : Output directory name
-p : Output file prefix
-l : Library types
0: 10X-Chromium 3' (v2) protocol
1: 10X-Chromium 3' (v3) protocol
Example: ./hera-T count -t 32 -o ./result -x index/grch37 -l 0 -1 lane_0.read_1.fq lane_1.read_1.fq -2 lane_0.read_2.fq lane_1.read_2.fq
Download link: http://cf.10xgenomics.com/samples/cell-exp/3.0.0/neuron_1k_v2/neuron_1k_v2_fastqs.tar
~ » ls -lah cr_mm10_210/*
-rw-rw-r--@ 1 bioturing staff 2.5G Nov 14 2018 cr_mm10_210/cr_mm10_210.bwt
-rw-rw-r--@ 1 bioturing staff 176M Nov 14 2018 cr_mm10_210/cr_mm10_210.fasta
-rw-rw-r--@ 1 bioturing staff 1.8G Nov 14 2018 cr_mm10_210/cr_mm10_210.hash
-rw-rw-r--@ 1 bioturing staff 862M Nov 14 2018 cr_mm10_210/cr_mm10_210.info
-rw-rw-r--@ 1 bioturing staff 356B Nov 14 2018 cr_mm10_210/cr_mm10_210.log
~ » ./hera-T count -t 32 -o tmp -x cr_mm10_210/cr_mm10_210 \
-l 0 \
-1 neuron_1k_v2_fastqs/neuron_1k_v2_S1_L001_R1_001.fastq.gz \
neuron_1k_v2_fastqs/neuron_1k_v2_S1_L002_R1_001.fastq.gz \
-2 neuron_1k_v2_fastqs/neuron_1k_v2_S1_L001_R2_001.fastq.gz \
neuron_1k_v2_fastqs/neuron_1k_v2_S1_L002_R2_001.fastq.gz
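When a run has many lanes, typing every file after `-1` and `-2` by hand is error-prone. The helper below is a minimal sketch, not part of Hera-T: `build_hera_cmd` is a hypothetical name, and it assumes the FASTQ files follow the `*_R1_001.fastq.gz` / `*_R2_001.fastq.gz` naming seen in the example above, that paths contain no spaces, and that the library is 10X-Chromium 3' v2 (`-l 0`). It only prints the command so that it can be reviewed before running.

```shell
#!/bin/sh
# Print a hera-T count command covering every lane in a FASTQ directory.
# Hypothetical helper; uses only the options listed in the usage text above.
build_hera_cmd() {
    index=$1        # index prefix, e.g. cr_mm10_210/cr_mm10_210
    fastq_dir=$2    # directory containing the FASTQ files
    out_dir=$3      # value passed to -o
    threads=${4:-4} # value passed to -t (default 4 is an arbitrary choice)

    r1_list=""
    r2_list=""
    # The shell expands the glob in sorted order, so lanes stay aligned.
    for r1 in "$fastq_dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue
        # Derive the mate file name by swapping R1 for R2 in the suffix.
        r2=$(printf '%s' "$r1" | sed 's/_R1_001\.fastq\.gz$/_R2_001.fastq.gz/')
        if [ ! -e "$r2" ]; then
            echo "missing R2 file for $r1" >&2
            return 1
        fi
        r1_list="$r1_list $r1"
        r2_list="$r2_list $r2"
    done

    if [ -z "$r1_list" ]; then
        echo "no *_R1_001.fastq.gz files found in $fastq_dir" >&2
        return 1
    fi

    echo "./hera-T count -t $threads -o $out_dir -x $index -l 0 -1$r1_list -2$r2_list"
}
```

For example, `build_hera_cmd cr_mm10_210/cr_mm10_210 neuron_1k_v2_fastqs tmp 32` prints a command equivalent to the one shown above; change `-l 0` to `-l 1` in the helper for v3 libraries.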
Hera-T is developed and maintained at BioTuring Inc. by:
- Thang Tran [email protected]
- Thao Truong [email protected]
- Hy Vuong [email protected]
- Tan Phan [email protected]
- Son Pham [email protected]
Thang Tran, Thao Truong, Hy Vuong, Son Pham, “Hera-T: an efficient and accurate approach for quantifying gene abundances from 10X-Chromium data with high rates of non-exonic reads”, bioRxiv, 2019. doi: https://doi.org/10.1101/530501
The preferred way to report problems or ask questions about Hera-T is the issue tracker. Before posting an issue or question, please look through the FAQs and the existing issues (open and closed); your question may already have been answered.
If you are reporting a problem, please include the HeraT.log file and, if possible, some details about your dataset.
In case you prefer personal communication, please send an email to [email protected].
2018-12-24 (0.1.2) (deprecated):
* Init repo
2018-12-25 (0.1.3) (deprecated):
* Add library types selection
* Write program description to matrix.mtx file
2018-12-27 (0.1.4) (deprecated):
* Fix memory leak in version 0.1.3
2018-12-28 (0.2.0) (deprecated):
* Support Chromium 3' v3 library
2019-03-20 (0.2.1) (release candidate):
* Fix random crash (change from buggy semaphore to lock)
2019-03-25 (0.2.2) (release candidate):
* Fix issue of opening all files at once