YEAT, Your Everday Assembly Tool, is an update to asm_tools
. It uses a Snakemake workflow to preprocess paired-end, pacbio, and nanopore fastq reads and assemble them with various assembly algorithms.
git clone https://github.com/bioforensics/yeat.git
cd yeat
conda env create -f environment.yml
conda create -n yeat-velvet -c bioconda "perl-velvetoptimiser>=2.2" -y
conda activate yeat
pip install .
In order to run Bandage
in YEAT's workflow, users will need to manually download the pre-built binaries files here or build it from the source code here. Instructions for building Bandage from the source code can be found here.
Once the binary file has been obtained, users will need to update their environment's $PATH
to include the path to the directory containing the binary file.
Example for Ubuntu:
export PATH=~/projects/Bandage:$PATH
Example for MacOS:
export PATH=~/projects/Bandage/Bandage.app/Contents/MacOS:$PATH
In order to run the pre-built binary files successfully, ensure that the binary file is kept in the same directory with all of the other files that came with it.
To enable metaMDBG
into YEAT's workflow, users will need to create a conda environment, install, and build from the source. Once created, YEAT will activate the environment whenever metaMDBG
is called.
git clone https://github.com/GaetanBenoitDev/metaMDBG.git
cd metaMDBG
conda env create -f conda_env.yml
conda activate metaMDBG
conda env config vars set CPATH=${CONDA_PREFIX}/include:${CPATH}
conda deactivate
# Activate metaMDBG environment
conda activate metaMDBG
# Compile the software
mkdir build
cd build
cmake ..
make -j 3
After successful installation, an executable named metaMDBG will appear in ./build/bin.
ln -s ~/projects/metaMDBG/build/bin/metaMDBG /home/danejo/anaconda3/envs/metaMDBG/bin/metaMDBG
Before running the test suite, download the test data by calling the following make commands.
make hifidata
make nanodata
make metadata
To run YEAT with DRMAA, install DRMAA python bindings, set up environment variables, and append grid flags to the YEAT command. (Note: This has only been tested with SGE.)
# Install DRMAA python bindings
conda install drmaa
# Set environment variables
export DRMAA_LIBRARY_PATH=/usr/lib/libdrmaa.so.1.0
export SGE_ROOT=/path/to/qsub/bin
export SGE_CELL=default
To run YEAT with SLURM, set up environment variables and append grid flags to the YEAT command.
# Set environment variables
export NFSTMP=$HOME/gridtest
export LD_LIBRARY_PATH=/lib
grid configuration:
--grid [GRID] process input in batches using parallel processing on a grid. By default, if `--grid` is invoked with no following arguments, DRMAA
will be used to configure jobs on the grid. However, if the scheduler being used is SLURM, users must provide `slurm` as a following
argument to `--grid`
--grid-limit N limit on the number of concurrent jobs to submit to the grid scheduler; by default, N=1024
--grid-args A additional arguments passed to the scheduler to configure grid execution; " -V " is passed by default, or " -V -pe threads <T> "
("sbatch -c <T> " if using SLURM) if --threads is set; this can be used for example to configure grid queue or priority, e.g., " -q
largemem -p -1000 " ("sbatch -p largemem --priority -1000 "); note that when overriding the defaults, the user must explicitly add
the " -V " ("sbatch") and threads configuration if those are still desired
$ yeat --outdir {path} {config}
To run a simple paired-end assembly, use the just-yeat-it
command. Otherwise, use the normal yeat
command for more complicated assembly instructions.
$ just-yeat-it --outdir {path} {read1} {read2}
Auto-populate config file with a directory of paired-end reads.
$ yeat-auto short_reads --seq-path data/
Auto-populate config file with a list of paired-end read files.
$ yeat-auto short_reads --files short_reads_1.fastq.gz short_reads_2.fastq.gz
Readtype | Algorithms |
---|---|
paired | spades, megahit, unicycler, penguin, velvet |
single | spades, megahit, unicycler, penguin, velvet |
pacbio-raw | flye, canu, unicycler |
pacbio-corr | flye, canu, unicycler |
pacbio-hifi | flye, canu, hifiasm, hifiasm_meta, unicycler, metamdbg* |
nano-raw | flye, canu, unicycler |
nano-corr | flye, canu, unicycler |
nano-hq | flye, canu, unicycler |
* - Available on Linux OS only
To enable hybrid assembly, add the following to the configuration file.
- Add two read types to a sample:
- Paired-end reads
- Any type of long read
- Set the Unicycler assembly
"mode"
to"hybrid"
.
{
"samples": {
"Shigella_sonnei_53G": {
"paired": [
["yeat/tests/data/short_reads_1.fastq.gz", "yeat/tests/data/short_reads_2.fastq.gz"]
],
"pacbio-corr": [
"yeat/tests/data/long_reads_high_depth.fastq.gz"
],
"downsample": 0,
"genome_size": 0,
"coverage_depth": 150
}
},
"assemblies": {
"unicycler-hybrid": {
"algorithm": "unicycler",
"extra_args": "",
"samples": [
"Shigella_sonnei_53G"
],
"mode": "hybrid"
}
}
}
{
"samples": {
"sample1": {
"paired": [
[
"yeat/tests/data/short_reads_1.fastq.gz",
"yeat/tests/data/short_reads_2.fastq.gz"
]
],
"downsample": 0,
"genome_size": 0,
"coverage_depth": 150
},
"sample2": {
"paired": [
[
"yeat/tests/data/Animal_289_R1.fq.gz",
"yeat/tests/data/Animal_289_R2.fq.gz"
]
],
"downsample": 0,
"genome_size": 0,
"coverage_depth": 150
},
"sample3": {
"pacbio-hifi": [
"yeat/tests/data/ecoli.fastq.gz"
],
"downsample": 0,
"genome_size": 0,
"coverage_depth": 150
},
"sample4": {
"nano-hq": [
"yeat/tests/data/ecolk12mg1655_R10_3_guppy_345_HAC.fastq.gz"
],
"downsample": 0,
"genome_size": 0,
"coverage_depth": 150
}
},
"assemblies": {
"spades-default": {
"algorithm": "spades",
"extra_args": "",
"samples": [
"sample1",
"sample2"
],
"mode": "paired"
},
"hicanu": {
"algorithm": "canu",
"extra_args": "genomeSize=4.8m",
"samples": [
"sample3"
],
"mode": "pacbio"
},
"flye_ONT": {
"algorithm": "flye",
"extra_args": "",
"samples": [
"sample4"
],
"mode": "oxford"
}
}
}