Assembly Tools

Snakemake workflows used to assemble bacterial isolates.

The workflows were used to assemble five historical Bacillus anthracis isolates, soon to be published in Microbiology Resource Announcements.

The Bacillus anthracis assemblies have been deposited in DDBJ/ENA/GenBank under BioSample accession numbers SAMN12620928, SAMN12620929, SAMN12620930, SAMN12620931, and SAMN12620932. The raw Illumina paired-end sequencing reads have been deposited in the Sequence Read Archive under accession numbers SRR10019497, SRR10019498, SRR10019499, SRR10019500, and SRR10019501.

Installation

Read preprocessing workflow installation

Install Miniconda (a minimal Anaconda installer)

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Download asm_tools

git clone https://github.com/bioforensics/asm_tools

OR

Download a Release

Set up the Python environment and use conda to install the required packages (mash, fastp, etc.).

   cd asm_tools/preprocess
   conda env create -f preprocess_env.yml
   conda activate bmap_preprocess

(Optional) Download databases for "mash screen" to check for contaminants.
Mash Sketch databases for RefSeq release 88:

RefSeq88n.msh.gz: Genomes (k=21, s=1000), 1.2Gb uncompressed
RefSeq88p.msh.gz: Proteomes (k=9, s=1000), 1.1Gb uncompressed

Edit preprocess/config.yml to set the path to the Mash database:

mashdb: path/to/mashdb

Run the read preprocessing workflow

path/to/asm_tools/preprocess/bmap_preprocess -r1 test/seq/test_R1.fastq.gz -r2 test/seq/test_R2.fastq.gz -s sample_name
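When several samples need preprocessing, the command above can be generated once per read pair. The following is a minimal sketch, not part of asm_tools: the helper names `find_read_pairs` and `build_command` are hypothetical, and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming convention is an assumption taken from the test file names. Only the `-r1`, `-r2`, and `-s` options shown above are used.

```python
import shlex
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"  # assumed naming convention, based on test_R1.fastq.gz
R2_SUFFIX = "_R2.fastq.gz"


def find_read_pairs(read_dir):
    """Map sample name -> (R1 path, R2 path) for every complete pair in read_dir."""
    pairs = {}
    for r1 in sorted(Path(read_dir).glob("*" + R1_SUFFIX)):
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(sample + R2_SUFFIX)
        if r2.exists():  # skip samples whose mate file is missing
            pairs[sample] = (r1, r2)
    return pairs


def build_command(script, r1, r2, sample):
    """Build the argument list for one bmap_preprocess run."""
    return [str(script), "-r1", str(r1), "-r2", str(r2), "-s", sample]


# Rebuild the single-sample command shown above and print it as a shell line.
cmd = build_command(
    "path/to/asm_tools/preprocess/bmap_preprocess",
    "test/seq/test_R1.fastq.gz",
    "test/seq/test_R2.fastq.gz",
    "sample_name",
)
print(shlex.join(cmd))
```

To process a whole directory, loop over `find_read_pairs(read_dir).items()` and pass each argument list to `subprocess.run`.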

Singularity Container installation

singularity pull bmap_preprocess.sif library://dsommer/default/bmap/bmap_preprocess
singularity exec bmap_preprocess.sif -r1 test/seq/test_R1.fastq.gz -r2 test/seq/test_R2.fastq.gz -s test1

Preprocessing Paired-End Reads

Snakemake DAG

Workflow outline

The preprocessing.smk Snakemake workflow prepares Illumina reads to be assembled.

Run fastp to trim adapter sequences, low-quality bases, and very short reads. By default, bases below Q20 at the ends of reads are trimmed, and any reads shorter than 75 bases or containing Ns are removed.
Run "mash screen" against RefSeq to check for contaminants.
Estimate genome size by building a k-mer profile on the reads.
Randomly downsample reads to 150× coverage of the estimated genome size using the sample-reads program.
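The read filter and the downsampling arithmetic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code that fastp or sample-reads actually runs: the function names are hypothetical, the genome size is taken as an input rather than estimated from a k-mer profile, and keeping each pair independently with a fixed probability is only one possible way to subsample.

```python
import random

MIN_LENGTH = 75        # reads shorter than this are removed (default stated above)
TARGET_COVERAGE = 150  # downsampling target stated above


def keep_read(seq, min_length=MIN_LENGTH):
    """Return True if a trimmed read passes the length and N filters."""
    return len(seq) >= min_length and "N" not in seq.upper()


def sampling_fraction(total_bases, genome_size, target_coverage=TARGET_COVERAGE):
    """Fraction of read pairs to keep so that coverage is about target_coverage."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    if total_bases <= 0:
        return 0.0
    # Never sample more than everything: cap the fraction at 1.0.
    return min(1.0, target_coverage * genome_size / total_bases)


def downsample_pairs(pairs, fraction, seed=0):
    """Keep each read pair independently with probability `fraction` (assumed scheme)."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return [pair for pair in pairs if rng.random() < fraction]


# Hypothetical numbers: 1.5 Gbp of reads over a 5 Mbp genome is 300x coverage,
# so half of the read pairs are kept to reach 150x.
fraction = sampling_fraction(total_bases=1_500_000_000, genome_size=5_000_000)
print(fraction)
```

When the reads already give less than 150× coverage, the fraction is capped at 1.0 and every pair is kept.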
