How compressible are genetic sequences?


This repository provides a reproducible benchmark of how compressible different genetic sequences are under different data compressors.

Data compression tools


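| Data Compressor | Repository | Description |
|-----------------|------------|--------------|
| bsc-m03 v0.2.1  | code       | article      |
| bzip2 1.0.8     | code       | article      |
| DMCompress      | code       | article      |
| GeCo2           | code       | article      |
| GeCo3           | code       | article      |
| JARVIS2         | code       | article      |
| JARVIS3         | code       | under review |
| lzma 5.2.5      | code       | article      |
| MemRGC          | code       | article      |
| MFCompress      | code       | article      |
| NAF             | code       | article      |
| paq8l           | code       | article      |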

Reproducibility:

Change to the scripts directory, give the scripts execute permissions, and run the full pipeline:

cd scripts/
chmod +x *.sh
./Main.sh

Alternatively, run each step separately:

#
./InstallTools.sh      # install listed compressors, GTO, and AlcoR
./DownloadFASTA.sh     # download FASTA files
./GetCassava.sh        # gunzip cassava files
./GetAlcoRFASTA.sh     # simulate and store 2 synthetic FASTA sequences
./FASTA2seq.sh         # clean FASTA files and store raw sequence files
./DownloadDNAcorpus.sh # download raw sequences from a balanced sequence corpus
./GetDSinfo.sh         # map sequences to their IDs, sorted by size; view sequence info
#
./RunTestsExample.sh   # run bench
./ProcessBenchRes.sh   # sort results by BPS and time
./Plot.sh              # plot sorted results
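The benchmark results above are ranked by BPS. Assuming BPS stands for bits per symbol (8 × compressed bytes ÷ number of symbols in the raw sequence), the quantity can be sketched as follows. The function names `bps` and `measure_bps` are hypothetical and are not part of the repository's scripts; `measure_bps` uses bzip2 only as an illustration.

```shell
# bps: bits per symbol, given a compressed size in bytes and a symbol count.
bps() {
  awk -v c="$1" -v n="$2" 'BEGIN {
    if (n <= 0) exit 1              # avoid division by zero
    printf "%.4f\n", 8 * c / n
  }'
}

# measure_bps: hypothetical helper that compresses a raw sequence file
# with bzip2 and reports its BPS. Newlines are stripped before both
# counting and compressing, so only sequence symbols are measured.
measure_bps() {
  file="$1"
  symbols=$(tr -d '\n' < "$file" | wc -c | tr -d ' ')
  compressed=$(tr -d '\n' < "$file" | bzip2 -c | wc -c | tr -d ' ')
  bps "$compressed" "$symbols"
}

# A 1000-symbol sequence stored in 250 bytes costs 2 bits per symbol.
bps 250 1000   # prints 2.0000
```

For a four-letter DNA alphabet, 2 bits per symbol is what a fixed-length encoding achieves, so values below 2 indicate that the compressor found structure in the sequence.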

Use case: run the benchmark only for human chromosome Y (CY) and Escherichia coli

#
./InstallTools.sh                                   # install listed compressors, GTO, and AlcoR
./DownloadFASTA.sh -id NC_000024.1 -id NC_000913.3  # download CY and Escherichia coli FASTA files
./FASTA2seq.sh                                      # clean FASTA files and store raw sequence files
./GetDSinfo.sh                                      # map sequences to their IDs, sorted by size; view sequence info
#
./RunTestsExample.sh                                # run bench
./ProcessBenchRes.sh                                # sort results by BPS and time
./Plot.sh                                           # plot sorted results
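The "sort results by BPS and time" step can be illustrated with a small sketch. It assumes a hypothetical tab-separated results file with three columns (compressor name, BPS, seconds); the actual output format of ProcessBenchRes.sh may differ, and the names and numbers below are made up for illustration only.

```shell
# sort_results: order rows by BPS (column 2), breaking ties by time (column 3).
# LC_ALL=C keeps "." as the decimal point regardless of the user's locale.
sort_results() {
  LC_ALL=C sort -t "$(printf '\t')" -k2,2n -k3,3n "$1"
}

# Made-up example rows: name <TAB> BPS <TAB> seconds
results=$(mktemp)
printf 'toolA\t1.90\t12.5\n'  >  "$results"
printf 'toolB\t1.62\t340.0\n' >> "$results"
printf 'toolC\t1.90\t3.1\n'   >> "$results"

sort_results "$results"   # toolB first, then toolC, then toolA
rm -f "$results"
```

Sorting on BPS first ranks compressors by compression ratio; the secondary key only matters when two compressors reach the same BPS.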

See Features:

Each script lists its implemented features when run with the -h flag:

./Main.sh -h          
./CleanCandDfiles.sh -h
./DownloadDNAcorpus.sh -h
./DownloadFASTA.sh -h
./FASTA2seq.sh -h
./GetAlcorFASTA.sh -h
./GetCassava.sh -h
./GetDSinfo.sh -h
./InstallTools.sh -h
./Plot.sh -h
./ProcessBenchRes.sh -h
./Run.sh -h
./RunTestsExample.sh -h
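To review all help texts in one pass, a loop along the following lines may be convenient. This is a minimal sketch: `show_help_all` is a hypothetical helper, not a script shipped with the repository, and it assumes every executable `.sh` file in the directory accepts `-h`, as the list above indicates.

```shell
# show_help_all: print the -h output of every executable .sh file in a
# directory (defaults to the current directory).
show_help_all() {
  dir="${1:-.}"
  for script in "$dir"/*.sh; do
    [ -x "$script" ] || continue     # skip non-executable or unmatched globs
    printf '== %s ==\n' "$script"
    "$script" -h
  done
}

# Usage, from inside scripts/:
#   show_help_all .
```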