NECAT is an error-correction and de novo assembly tool for long, noisy Nanopore reads.
If you are interested in calling Structural Variants from Nanopore reads, you are welcome to try our necatsv.
We have successfully tested NECAT on
- Ubuntu 16.04 (GCC 5.4.0, Perl v5.22.1)
- CentOS 7.3.1611 (GCC 4.8.5, Perl v5.26.2)
If you encounter problems running NECAT, such as
Syntax error at NECAT/Linux-amd64/bin/Plgd/Project.pm line 46, near "${cfg{"
please update Perl to a newer version (such as v5.26).
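To check ahead of time whether the installed Perl is recent enough, the following minimal sketch compares its version against v5.26. It assumes GNU `sort -V` is available; `version_ge` is a hypothetical helper name, not part of NECAT.

```bash
#!/usr/bin/env bash
# version_ge A B: succeed if version string A >= version string B.
# Hypothetical helper; relies on GNU sort's -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# $^V formatted with %vd prints the running perl's version, e.g. 5.26.2.
perl_version=$(perl -e 'printf "%vd", $^V' 2>/dev/null)

if [ -z "$perl_version" ]; then
  echo "perl was not found on PATH" >&2
elif version_ge "$perl_version" 5.26; then
  echo "perl $perl_version should be recent enough"
else
  echo "perl $perl_version is older than v5.26; consider upgrading" >&2
fi
```

Sorting the two version strings with `sort -V` and checking which comes first avoids comparing them as plain text, where "5.8" would wrongly sort after "5.26".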
There are two ways to install NECAT. The first is to download the pre-built release:
$ wget https://github.com/xiaochuanle/NECAT/releases/download/v0.0.1_update20200803/necat_20200803_Linux-amd64.tar.gz
$ tar xzvf necat_20200803_Linux-amd64.tar.gz
$ cd NECAT/Linux-amd64/bin
$ export PATH=$PATH:$(pwd)
The second is to build it from source:
$ git clone https://github.com/xiaochuanle/NECAT.git
$ cd NECAT/src/
$ make
$ cd ../Linux-amd64/bin
$ export PATH=$PATH:$(pwd)
After installation, all the executable files can be found in NECAT/Linux-amd64/bin. The command line
export PATH=$PATH:$(pwd)
above adds NECAT/Linux-amd64/bin to the system PATH.
Before running NECAT, please do not forget to add NECAT/Linux-amd64/bin to the system PATH.
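The `export` command only affects the current shell session. The sketch below adds the bin directory to PATH at most once and then checks that `necat.pl` resolves. It assumes NECAT was unpacked or cloned into `$HOME`; `NECAT_BIN` is a hypothetical variable you can override to point at your own install location.

```bash
#!/usr/bin/env bash
# Add NECAT's bin directory to PATH (only once) and confirm that necat.pl resolves.
# Assumption: NECAT lives under $HOME; set NECAT_BIN beforehand if it does not.
NECAT_BIN="${NECAT_BIN:-$HOME/NECAT/Linux-amd64/bin}"

case ":$PATH:" in
  *":$NECAT_BIN:"*) ;;                      # already on PATH, nothing to do
  *) export PATH="$PATH:$NECAT_BIN" ;;
esac

if command -v necat.pl >/dev/null 2>&1; then
  echo "necat.pl found at $(command -v necat.pl)"
else
  echo "necat.pl not found; check that NECAT_BIN ($NECAT_BIN) is correct" >&2
fi
```

To make the setting permanent, the same lines can be appended to a shell startup file such as `~/.bashrc`; the `case` guard keeps PATH from growing each time the file is sourced.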
Create a config file template using the following command:
$ necat.pl config ecoli_config.txt
The template looks like
PROJECT=
ONT_READ_LIST=
GENOME_SIZE=
THREADS=4
MIN_READ_LENGTH=3000
PREP_OUTPUT_COVERAGE=40
OVLP_FAST_OPTIONS=-n 500 -z 20 -b 2000 -e 0.5 -j 0 -u 1 -a 1000
OVLP_SENSITIVE_OPTIONS=-n 500 -z 10 -e 0.5 -j 0 -u 1 -a 1000
CNS_FAST_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
CNS_SENSITIVE_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
TRIM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 1 -a 400
ASM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 0 -a 400
NUM_ITER=2
CNS_OUTPUT_COVERAGE=30
CLEANUP=1
USE_GRID=false
GRID_NODE=0
GRID_OPTIONS=
SMALL_MEMORY=0
FSA_OL_FILTER_OPTIONS=
FSA_ASSEMBLE_OPTIONS=
FSA_CTG_BRIDGE_OPTIONS=
POLISH_CONTIGS=true
After filling in and adjusting the relevant fields, we have
PROJECT=ecoli
ONT_READ_LIST=read_list.txt
GENOME_SIZE=4600000
THREADS=20
MIN_READ_LENGTH=3000
......
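The fields can also be filled in non-interactively. The following is a minimal sketch using a hypothetical `set_key` helper (not part of NECAT) built on GNU `sed -i`; BSD/macOS `sed` needs `-i ''` instead. When `necat.pl` is not on PATH, it writes a small stand-in template containing only the keys edited here.

```bash
#!/usr/bin/env bash
# set_key FILE KEY VALUE: rewrite the line "KEY=..." in FILE as "KEY=VALUE".
# Hypothetical helper, not part of NECAT. Uses GNU sed -i.
# VALUE must not contain '|' or '&', which are special in the sed replacement below.
set_key() {
  sed -i "s|^$2=.*|$2=$3|" "$1"
}

config=ecoli_config.txt
if command -v necat.pl >/dev/null 2>&1; then
  necat.pl config "$config"
else
  # Stand-in holding only the keys edited below, so the helper can be tried without NECAT.
  printf '%s\n' 'PROJECT=' 'ONT_READ_LIST=' 'GENOME_SIZE=' 'THREADS=4' > "$config"
fi

set_key "$config" PROJECT ecoli
set_key "$config" ONT_READ_LIST read_list.txt
set_key "$config" GENOME_SIZE 4600000
set_key "$config" THREADS 20
```

Anchoring each pattern with `^KEY=` means only the line that starts with that exact key is changed, so the option lines such as `OVLP_FAST_OPTIONS` are left untouched.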
The file read_list.txt in the second line above contains the full paths of all read files, one per line. It looks like
$ cat read_list.txt
/share/home/chuanlex/xiaochuanle/data/testdata/tomato/20161027_Spenn_001_001_all.fastq
/share/home/chuanlex/xiaochuanle/data/testdata/tomato/20161101_Spenn_002_002_all.fastq
/share/home/chuanlex/xiaochuanle/data/testdata/tomato/20161103_Spenn_003_003_all.fastq
/share/home/chuanlex/xiaochuanle/data/testdata/tomato/20161108_Spenn_004_004_all.fastq
/share/home/chuanlex/xiaochuanle/data/testdata/tomato/20161108_Spenn_004_005_all.fastq
Please note that the files in read_list.txt need not be in the same format. Each file can independently be either FASTA or FASTQ, and can additionally be compressed with GNU Zip (gzip).
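A read list like the one above can be generated from a directory of read files. The sketch below uses a hypothetical `make_read_list` helper (not part of NECAT) and assumes read files are recognised by their extensions (`.fasta`, `.fa`, `.fastq`, `.fq`, optionally followed by `.gz`); adjust the patterns to match your own file naming.

```bash
#!/usr/bin/env bash
# make_read_list DIR OUT: write the absolute path of every read file under DIR to OUT,
# one path per line, sorted. Hypothetical helper, not part of NECAT.
# Assumption: read files are recognised by extension; adjust the patterns as needed.
make_read_list() {
  local dir
  dir=$(cd "$1" && pwd) || return 1   # read_list.txt needs full paths, so resolve DIR first
  find "$dir" -type f \( \
      -name '*.fasta' -o -name '*.fa' -o -name '*.fastq' -o -name '*.fq' \
      -o -name '*.fasta.gz' -o -name '*.fa.gz' -o -name '*.fastq.gz' -o -name '*.fq.gz' \
    \) | sort > "$2"
}

# Example (hypothetical directory):
#   make_read_list /path/to/reads read_list.txt
```

Resolving the directory with `cd ... && pwd` before calling `find` makes every printed path absolute, even when a relative directory is passed in.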
Correct the raw noisy reads using the following command:
$ necat.pl correct ecoli_config.txt
The pipeline corrects only the longest 40X (PREP_OUTPUT_COVERAGE) of the raw reads. The corrected reads are in the file ./ecoli/1-consensus/cns_iter${NUM_ITER}/cns.fasta.
The longest 30X (CNS_OUTPUT_COVERAGE) of the corrected reads are extracted for assembly and written to the file ./ecoli/1-consensus/cns_final.fasta.
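As a worked example, assuming that "NX" means N times GENOME_SIZE bases in total, the two coverage settings translate into base counts for the E. coli config as shown below. The `fasta_bases` helper is hypothetical (not part of NECAT) and only handles uncompressed FASTA.

```bash
#!/usr/bin/env bash
# Coverage arithmetic for the E. coli example, assuming "NX" means N * GENOME_SIZE bases.
GENOME_SIZE=4600000
PREP_OUTPUT_COVERAGE=40
CNS_OUTPUT_COVERAGE=30

echo "raw bases selected for correction:  $((GENOME_SIZE * PREP_OUTPUT_COVERAGE))"  # 184000000
echo "corrected bases kept for assembly: $((GENOME_SIZE * CNS_OUTPUT_COVERAGE))"   # 138000000

# fasta_bases FILE: total sequence length in an uncompressed FASTA file
# (hypothetical helper; counts every character on non-header lines).
fasta_bases() {
  awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Coverage actually reached by the corrected reads, rounded down:
cns=./ecoli/1-consensus/cns_final.fasta
if [ -f "$cns" ]; then
  echo "cns_final.fasta coverage: $(( $(fasta_bases "$cns") / GENOME_SIZE ))X"
fi
```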
After correcting the raw reads, assemble the contigs using the following command. If the correction step has not been run yet, the command runs it automatically first.
$ necat.pl assemble ecoli_config.txt
The assembled contigs are in the file ./ecoli/4-fsa/contigs.fasta.
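For a quick sanity check of the assembly output, a small summary of the contig file can help. The sketch below uses a hypothetical `contig_stats` helper (not part of NECAT) that reports the number of sequences, their total length and the longest one in an uncompressed FASTA file.

```bash
#!/usr/bin/env bash
# contig_stats FASTA: print the number of sequences, total length and longest length.
# Hypothetical helper for a quick sanity check; not part of NECAT.
contig_stats() {
  awk '
    /^>/ { n++; if (len > longest) longest = len; len = 0; next }
         { len += length($0); total += length($0) }
    END  { if (len > longest) longest = len
           printf "contigs=%d total=%d longest=%d\n", n, total, longest }' "$1"
}

# Usage:
#   contig_stats ./ecoli/4-fsa/contigs.fasta
```

Sequence lengths are accumulated across wrapped lines and the running length is compared against the maximum each time a new header starts, plus once more at the end for the last record.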
After assembling the contigs, run the bridging step using the following command. The command first checks for, and if necessary runs, the preceding steps.
$ necat.pl bridge ecoli_config.txt
The bridged contigs are in the file ./ecoli/6-bridge_contigs/bridged_contigs.fasta.
If POLISH_CONTIGS is set to true, the pipeline uses the corrected reads to polish the bridged contigs. The polished contigs are in the file ./ecoli/6-bridge_contigs/polished_contigs.fasta.
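Scripts that post-process NECAT output need to know which contig file is the final one. The sketch below reads a config and picks the polished or bridged contigs accordingly. `final_contigs` is a hypothetical helper (not part of NECAT), and it assumes the run directory is `./PROJECT`, as in the ecoli example.

```bash
#!/usr/bin/env bash
# final_contigs CONFIG: print the path of the final contig file for a NECAT config:
# polished contigs if POLISH_CONTIGS=true, otherwise bridged contigs.
# Hypothetical helper; assumes the run directory is ./PROJECT, as in the ecoli example.
final_contigs() {
  local project polish
  project=$(sed -n 's/^PROJECT=//p' "$1")
  polish=$(sed -n 's/^POLISH_CONTIGS=//p' "$1")
  if [ "$polish" = "true" ]; then
    echo "./$project/6-bridge_contigs/polished_contigs.fasta"
  else
    echo "./$project/6-bridge_contigs/bridged_contigs.fasta"
  fi
}

# Usage:
#   final_contigs ecoli_config.txt
```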
On PBS and SGE systems, NECAT can be run on multiple computation nodes. To do so, set the following in the config file (Step 1 of Quick Start):
USE_GRID=true
GRID_NODE=4
In the above example, 4 computation nodes will be used, and each node will run with THREADS CPU threads.
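Since each node runs THREADS threads, the total request is GRID_NODE × THREADS; with THREADS=20 from the example config and GRID_NODE=4 that is 80 CPU threads. The sketch below switches a config to grid mode and computes this total. `enable_grid` and `grid_threads` are hypothetical helpers (not part of NECAT), and `enable_grid` relies on GNU `sed -i`.

```bash
#!/usr/bin/env bash
# enable_grid CONFIG NODES: set USE_GRID=true and GRID_NODE=NODES in a NECAT config.
# grid_threads CONFIG: print GRID_NODE * THREADS, the total CPU threads requested.
# Hypothetical helpers, not part of NECAT; enable_grid uses GNU sed -i.
enable_grid() {
  sed -i -e 's/^USE_GRID=.*/USE_GRID=true/' -e "s/^GRID_NODE=.*/GRID_NODE=$2/" "$1"
}

grid_threads() {
  local nodes threads
  nodes=$(sed -n 's/^GRID_NODE=//p' "$1")
  threads=$(sed -n 's/^THREADS=//p' "$1")
  echo $(( nodes * threads ))
}

# Usage with the example config (THREADS=20):
#   enable_grid ecoli_config.txt 4
#   grid_threads ecoli_config.txt    # 4 nodes x 20 threads = 80
```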
Chen Y, Nie F, Xie S Q, et al. Efficient assembly of nanopore reads via highly accurate and intact error correction[J]. Nature Communications, 2021, 12(1): 1-10.
- Chuan-Le Xiao, [email protected]