Mycobacterium tuberculosis genomic analysis from Nanopore sequencing data
tbpore is a tool with two main goals.
The first is to process Nanopore Mycobacterium tuberculosis sequencing data to describe variants with respect to the canonical TB strain H37Rv and to predict antibiotic resistance (command tbpore process).
Variants are described by decontaminating reads, calling variants with bcftools, and filtering the calls.
Antibiotic resistance is predicted with mykrobe.
The second is to cluster TB samples based on their genotypes and a given distance threshold (command tbpore cluster).
TBpore is a slimmed-down version of the full pipeline used in our paper 👇
Hall, M. B. et al. Evaluation of Nanopore sequencing for Mycobacterium tuberculosis drug susceptibility testing and outbreak investigation: a genomic analysis. The Lancet Microbe 0, (2022) doi: 10.1016/S2666-5247(22)00301-9.
Prerequisite: conda (with the bioconda channel correctly set up)
$ conda install tbpore
The Python components of tbpore are available to install through PyPI.
pip install tbpore
However, you will need to install the following dependencies, which cannot be installed through PyPI.
- rasusa version 2.x
- psdm version 0.1.x
- samtools version 1.13
- bcftools version 1.13
- mykrobe version 0.12.x
- minimap2 version 2.22
- seqkit version 2.x
- nanoq version 0.9.x
We make no guarantees about the performance of tbpore with versions other than those specified above. In particular, the bcftools version is very important. The latest versions of the other dependencies can likely be used.
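Before running tbpore from a PyPI install, it can help to confirm that the external tools are on your PATH. The following is a minimal sketch, not part of tbpore: the helper name missing_tools and the variable TBPORE_DEPS are made up for illustration, and the check only tests for presence. It does not verify versions, so the bcftools version still needs to be checked by hand.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with tbpore): print each given
# executable that cannot be found on PATH. Versions are NOT checked.
missing_tools() {
    for candidate in "$@"; do
        if ! command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
        fi
    done
}

# The dependencies listed above that cannot be installed through PyPI.
TBPORE_DEPS="rasusa psdm samtools bcftools mykrobe minimap2 seqkit nanoq"

# Report every missing dependency on stderr.
# Word splitting of $TBPORE_DEPS is intentional here.
for dep in $(missing_tools $TBPORE_DEPS); do
    echo "missing dependency: $dep" >&2
done
```

If nothing is printed, all eight executables were found.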
Docker images are provided through biocontainers.
Prerequisite: singularity
$ URI="docker://quay.io/biocontainers/tbpore:<tag>"
$ singularity exec "$URI" tbpore --help
See the tbpore repository on quay.io (quay.io/biocontainers/tbpore) for valid values for <tag>.
Prerequisite: Docker
$ docker pull quay.io/biocontainers/tbpore:<tag>
$ docker run quay.io/biocontainers/tbpore:<tag> tbpore --help
See the tbpore repository on quay.io (quay.io/biocontainers/tbpore) for valid values for <tag>.
After installing TBpore, you will need to download the decontamination database index.
$ tbpore download
By default, this will download the index to ${HOME}/.tbpore/decontamination_db/remove_contam.map-ont.mmi, as this is the default location that tbpore process searches.
If you prefer to download the index to another location, this can be done with
$ tbpore download -o other/location/db.mmi
Keep in mind that if you specify a non-default location, you will need to pass it with the --db option when running tbpore process.
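The non-default workflow described above can be wrapped in two small shell functions. This is a sketch under assumptions: the function names fetch_db and process_with_db and the TBPORE variable are hypothetical and not part of tbpore. Only the download -o and process --db options come from the tbpore help text.

```shell
#!/bin/sh
# Assumption: TBPORE names the tbpore executable to call
# (defaults to whatever "tbpore" resolves to on PATH).
TBPORE="${TBPORE:-tbpore}"

# Download the index to a custom path unless a file is already there.
fetch_db() {
    db_path=$1
    if [ -f "$db_path" ]; then
        echo "index already present: $db_path" >&2
    else
        "$TBPORE" download -o "$db_path"
    fi
}

# Run tbpore process against the custom index.
# Any further arguments are passed through unchanged.
process_with_db() {
    db_path=$1
    shift
    "$TBPORE" process --db "$db_path" "$@"
}
```

For example, `fetch_db other/location/db.mmi` followed by `process_with_db other/location/db.mmi -o results reads.fq.gz` would download the index once and then point tbpore process at it. The paths results and reads.fq.gz are placeholders.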
Benchmarked on 151 TB ONT samples with 1 thread:
- Runtime: 2103 s avg, 4048 s max (s = seconds);
- RAM: 12.4 GB avg, 13.1 GB max (GB = gigabytes);
Clustering 151 TB ONT samples:
- Runtime: 286 s;
- RAM: <1 GB;
Usage: tbpore [OPTIONS] COMMAND [ARGS]...
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-v, --verbose Turns on debug-level logger. Option is mutually exclusive
with quiet.
-q, --quiet Turns off all logging except errors. Option is mutually
exclusive with verbose.
Commands:
cluster Cluster consensus sequences
download Download and validate the decontamination database
process Single-sample TB genomic analysis from Nanopore sequencing data
Usage: tbpore process [OPTIONS] [INPUTS]...
Single-sample TB genomic analysis from Nanopore sequencing data
INPUTS: Fastq file(s) and/or a directory containing fastq files. All files
will be joined into a single fastq file, so ensure they're all part of the
same sample/isolate.
Options:
-h, --help Show this message and exit.
-r, --recursive Recursively search INPUTS for fastq files
-S, --name TEXT Name of the sample. By default, will use the
first INPUT file with fastq extensions
removed
-A, --report_all_mykrobe_calls Report all mykrobe calls (turn on flag -A,
--report_all_calls when calling mykrobe)
--db PATH Path to the decontaminaton database
[default: /home/mihall/.tbpore/decontaminati
on_db/remove_contam.map-ont.mmi]
-m, --metadata PATH Path to the decontaminaton database metadata
file [default: /data/scratch/projects/punim
1703/tmp/outliers/tbpore/data/decontaminatio
n_db/remove_contam.tsv.gz]
-c, --coverage INTEGER Depth of coverage to subsample to. Use 0 to
disable
-o, --outdir DIRECTORY Directory to place output files [default:
.]
--tmp DIRECTORY Specify where to write all (tbpore)
temporary files. [default: <outdir>/.tbpore]
-t, --threads INTEGER Number of threads to use in multithreaded
tools [default: 1]
-d, --cleanup / -D, --no-cleanup
Remove all temporary files on *successful*
completion [default: no-cleanup]
--cache DIRECTORY Path to use for the cache [default:
/home/mihall/.cache]
Usage: tbpore cluster [OPTIONS] [INPUTS]...
Cluster consensus sequences
Preferably input consensus sequences previously generated with tbpore
process.
INPUTS: Two or more consensus fasta sequences. Use glob patterns to input
several easily (e.g. output/sample_*/*.consensus.fa).
Options:
-h, --help Show this message and exit.
-T, --threshold INTEGER Clustering threshold [default: 6]
-o, --outdir DIRECTORY Directory to place output files [default:
.]
--tmp DIRECTORY Specify where to write all (tbpore)
temporary files. [default: <outdir>/.tbpore]
-t, --threads INTEGER Number of threads to use in multithreaded
tools [default: 1]
-d, --cleanup / -D, --no-cleanup
Remove all temporary files on *successful*
completion [default: no-cleanup]
--cache DIRECTORY Path to use for the cache [default:
/Users/michaelhall/.cache]
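The two commands are typically chained: run tbpore process once per sample, then pass the resulting consensus sequences to tbpore cluster. The sketch below is one way to script that, with several assumptions. The helper names (sample_name, process_all, cluster_all) and the TBPORE variable are hypothetical, there is one fastq file per sample, and each consensus file is assumed to match *.consensus.fa inside its sample's output directory, as the example glob in the help text suggests.

```shell
#!/bin/sh
# Assumption: TBPORE names the tbpore executable to call.
TBPORE="${TBPORE:-tbpore}"

# Derive a sample name by dropping the directory and fastq extensions.
sample_name() {
    name=$(basename "$1")
    for ext in .gz .fastq .fq; do
        name=${name%"$ext"}
    done
    printf '%s\n' "$name"
}

# Run tbpore process once per fastq file, giving each sample its own
# output directory under the chosen root. Stops at the first failure.
process_all() {
    out_root=$1
    shift
    for fq in "$@"; do
        sample=$(sample_name "$fq")
        "$TBPORE" process -S "$sample" -o "$out_root/$sample" "$fq" || return 1
    done
}

# Cluster every consensus sequence found under the output root.
cluster_all() {
    out_root=$1
    threshold=${2:-6}   # 6 is the default shown in the help text
    "$TBPORE" cluster -T "$threshold" -o "$out_root/clusters" \
        "$out_root"/*/*.consensus.fa
}
```

A run might look like `process_all output reads/*.fq.gz && cluster_all output`, where output and reads/ are placeholder paths.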
Usage: tbpore download [OPTIONS]
Download and validate the decontamination database
Options:
-h, --help Show this message and exit.
-o, --output PATH Download database to a specified filepath [default: ${HOME}/
.tbpore/decontamination_db/remove_contam.map-ont.mmi]
-f, --force Force overwrite if the database already exists