Challenge to efficiently BLAST files (FASTA file & quality scores) and report key metrics about FASTA back to user.

YogiOnBioinformatics/BLAST-Biopython-Challenge

BLAST/Biopython Challenge

NCBI Blast Biopython

Challenge

Pipelines for different sequencing platforms use BLAST extensively to query sequences against a given database. In an earlier version of one pipeline, a step relied heavily on BLAST to remove primer and adaptor sequences from the reads, producing clean and manageable datasets. You are provided with FASTA and quality files from a dataset generated on the 454 sequencing platform. You are required to BLAST the dataset against the given primer and adaptor sequences and generate output in m8 (tabular) format.

Inputs

  1. FASTA file
  2. Quality file
  3. Primer Sequence
  4. Adaptor Sequence
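Given these inputs, the BLAST step could be driven from Python roughly as follows. This is a minimal sketch, not the repository's actual script: it assumes the BLAST+ `blastn` program is installed, that the primer and adaptor sequences have been combined into one FASTA file, and that the file names (`reads.fasta`, `primer_adaptor.fasta`, `blast_output.m8`) are hypothetical placeholders. The helper `build_blastn_command` is also an illustrative name.

```python
import shlex


def build_blastn_command(reads_fasta, adaptors_fasta, out_path):
    """Return an argv list for a blastn run that writes tabular output.

    File names are supplied by the caller; nothing is executed here.
    """
    return [
        "blastn",
        "-query", reads_fasta,        # the 454 reads
        "-subject", adaptors_fasta,   # compare against a FASTA file directly, no database build
        "-outfmt", "6",               # tabular output, the BLAST+ counterpart of legacy "-m 8"
        "-out", out_path,
    ]


# Hypothetical file names, used only for illustration.
cmd = build_blastn_command("reads.fasta", "primer_adaptor.fasta", "blast_output.m8")
print(shlex.join(cmd))

# To actually run it (requires BLAST+ on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file paths contain spaces.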

Outputs

  1. Total number of reads in the dataset
  2. Total number of reads greater than 100 bp
  3. Total number of reads with average quality scores greater than 20
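The three metrics above can be computed in a single pass over each file. The sketch below uses only the standard library so it runs anywhere; Biopython's `SeqIO.parse` with the `"fasta"` and `"qual"` formats would be a natural alternative. It assumes the quality file uses the usual 454 `.qual` layout (a `>` header followed by space-separated integer scores), and the function names and demo records are made up for illustration.

```python
import io


def read_fasta_like(handle):
    """Yield (record_id, body_lines) from a FASTA or .qual style handle."""
    name, body = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, body
            parts = line[1:].split()
            name, body = (parts[0] if parts else ""), []
        else:
            body.append(line)
    if name is not None:
        yield name, body


def summarize(fasta_handle, qual_handle, min_length=100, min_avg_quality=20):
    """Count all reads, reads longer than min_length, and reads whose
    mean quality score is greater than min_avg_quality."""
    total = long_reads = 0
    for _, lines in read_fasta_like(fasta_handle):
        total += 1
        if len("".join(lines)) > min_length:
            long_reads += 1

    high_quality = 0
    for _, lines in read_fasta_like(qual_handle):
        scores = [int(s) for s in " ".join(lines).split()]
        if scores and sum(scores) / len(scores) > min_avg_quality:
            high_quality += 1

    return {
        "total_reads": total,
        "reads_over_100bp": long_reads,
        "reads_avg_quality_over_20": high_quality,
    }


# Tiny made-up dataset: one short low-quality read, one long high-quality read.
demo_fasta = ">r1 demo\nACGT\n>r2\n" + "A" * 150 + "\n"
demo_qual = ">r1 demo\n10 15 20 25\n>r2\n" + " ".join(["30"] * 150) + "\n"

result = summarize(io.StringIO(demo_fasta), io.StringIO(demo_qual))
print(result)
```

Both thresholds are strict ("greater than"), matching the wording of the outputs list, so a read of exactly 100 bp or with a mean quality of exactly 20 is not counted.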

In addition, the program generates the following file:

  1. Blast output file in m8 format.
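For readers unfamiliar with m8, each line of the file is one hit with twelve tab-separated columns. A small parser, sketched here with hypothetical field names and a made-up example line, shows how the file could be read back for downstream trimming:

```python
from collections import namedtuple

# Standard column order of BLAST tabular (m8) output; the field names are our own.
M8_FIELDS = [
    "query_id", "subject_id", "percent_identity", "alignment_length",
    "mismatches", "gap_opens", "query_start", "query_end",
    "subject_start", "subject_end", "evalue", "bit_score",
]
Hit = namedtuple("Hit", M8_FIELDS)


def parse_m8_line(line):
    """Convert one tab-separated m8 line into a typed Hit record."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(M8_FIELDS):
        raise ValueError(f"expected {len(M8_FIELDS)} columns, got {len(cols)}")
    return Hit(
        cols[0], cols[1], float(cols[2]),
        int(cols[3]), int(cols[4]), int(cols[5]),
        int(cols[6]), int(cols[7]), int(cols[8]), int(cols[9]),
        float(cols[10]), float(cols[11]),
    )


# Illustrative line only; not taken from the challenge dataset.
example = "read_1\tprimer\t100.00\t20\t0\t0\t1\t20\t1\t20\t2e-06\t40.1"
hit = parse_m8_line(example)
print(hit.query_id, hit.query_start, hit.query_end)
```

The `query_start` and `query_end` fields give the span of the read that matches the primer or adaptor, which is the region a cleaning step would remove.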

Usage

Fetch the container from Docker Hub and inspect its contents:
docker run --rm -it yraghav97/blast-biopython-challenge:1.0

NOTE: If you have the BLAST command-line tools installed, you can follow the steps in the Dockerfile instead.

Contact Information

Yogindra Raghav
[email protected]
