Challenge
The pipelines for different sequencing platforms use BLAST extensively to query sequences against a given database. In an earlier version of one pipeline, a step relied heavily on BLAST to remove primer and adaptor sequences from the reads, producing clean and manageable datasets. You are provided with FASTA and quality files from a dataset generated on the 454 sequencing platform. The task is to BLAST the dataset against the given primer and adaptor sequences and write the output in m8 format (BLAST's 12-column tabular format).
Inputs
- FASTA file
- Quality file
- Primer Sequence
- Adaptor Sequence
Outputs
- Total number of reads in the dataset
- Total number of reads longer than 100 bp
- Total number of reads with an average quality score greater than 20
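
The three counts above can be sketched in plain Python. This is a minimal illustration, not the code shipped in the container: the function names `parse_records` and `summarize` are hypothetical, and it assumes the quality file uses the usual 454 layout (FASTA-style headers followed by space-separated integer scores). Biopython's `SeqIO` parsers for the "fasta" and "qual" formats would be a reasonable alternative.

```python
def parse_records(lines):
    """Yield (read_id, body) pairs from FASTA-style text.

    Body lines are joined with single spaces so the same parser works for
    both sequence files and space-separated quality files.
    """
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, " ".join(chunks)
            header = line[1:].split()
            read_id = header[0] if header else ""
            chunks = []
        elif read_id is not None:
            chunks.append(line)
    if read_id is not None:
        yield read_id, " ".join(chunks)


def summarize(fasta_lines, qual_lines, min_length=100, min_quality=20.0):
    """Return the three counts listed under Outputs as a dictionary."""
    # Read length = number of bases once the joining spaces are removed.
    lengths = {rid: len(body.replace(" ", ""))
               for rid, body in parse_records(fasta_lines)}

    # Mean quality per read; reads with no scores are skipped.
    mean_quality = {}
    for rid, body in parse_records(qual_lines):
        scores = [int(s) for s in body.split()]
        if scores:
            mean_quality[rid] = sum(scores) / len(scores)

    return {
        "total_reads": len(lengths),
        "reads_longer_than_min": sum(1 for n in lengths.values() if n > min_length),
        "reads_above_min_quality": sum(1 for q in mean_quality.values() if q > min_quality),
    }
```

Both thresholds are strict ("greater than"), matching the wording of the outputs. To use the sketch on real files, pass open file handles, for example `summarize(open(fasta_path), open(qual_path))`, where both paths are placeholders.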
In addition, the program generates the following file:
- BLAST output file in m8 format
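
For readers unfamiliar with m8, each line is one hit with 12 tab-separated fields: query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, and bit score. The sketch below shows one way to read such a line; the `M8Hit` type and `parse_m8_line` function are illustrative names, not part of this project.

```python
from typing import NamedTuple


class M8Hit(NamedTuple):
    """One row of BLAST tabular (m8) output."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_opens: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float


def parse_m8_line(line: str) -> M8Hit:
    """Convert one tab-separated m8 line into a typed record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-separated fields, got {len(fields)}")
    return M8Hit(
        query_id=fields[0],
        subject_id=fields[1],
        percent_identity=float(fields[2]),
        alignment_length=int(fields[3]),
        mismatches=int(fields[4]),
        gap_opens=int(fields[5]),
        query_start=int(fields[6]),
        query_end=int(fields[7]),
        subject_start=int(fields[8]),
        subject_end=int(fields[9]),
        evalue=float(fields[10]),
        bit_score=float(fields[11]),
    )
```

With hits parsed this way, the query start and end coordinates indicate where a primer or adaptor matched within a read.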
Usage
Fetch container from Docker Hub and inspect contents:
docker run --rm -it yraghav97/blast-biopython-challenge:1.0
NOTE: If you have the BLAST command-line tools installed, you can instead follow the steps in the Dockerfile.
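
The Dockerfile's exact steps are not reproduced here. As a rough sketch of what a local run might look like, the snippet below builds a `blastn` command, assuming BLAST+ is installed (in BLAST+, `-outfmt 6` produces the tabular format that legacy `blastall -m 8` produced). The file names and the `build_blastn_command` helper are hypothetical, and `-task blastn-short` is an assumption chosen because primers and adaptors are short sequences.

```python
from typing import List


def build_blastn_command(query_fasta: str, subject_fasta: str, output_path: str) -> List[str]:
    """Build a blastn argument list that writes tabular (m8-style) output.

    Using -subject compares against a FASTA file directly, so no database
    needs to be built with makeblastdb first.
    """
    return [
        "blastn",
        "-task", "blastn-short",   # assumption: tuned for short queries/subjects
        "-query", query_fasta,     # reads from the dataset
        "-subject", subject_fasta, # primer and adaptor sequences
        "-outfmt", "6",            # tabular output, equivalent to legacy m8
        "-out", output_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_blastn_command("reads.fna", "primers_adaptors.fna", "blast_output.m8")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the input files exist.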
Contact Information
Yogindra Raghav
[email protected]