
HuffordLab/GenomeQC


How to cite GenomeQC

GenomeQC: A quality assessment tool for genome assemblies and gene structure annotations.
Nancy Manchanda, John L. Portwood II, Margaret R. Woodhouse, Arun S. Seetharam, Carolyn J. Lawrence-Dill, Carson M. Andorf, Matthew B. Hufford

https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-6568-2

GenomeQC: Genome Assembly and Annotation Metrics

GenomeQC generates descriptive summaries with intuitive graphics for genome assemblies and structural annotations. It also benchmarks user-supplied assemblies and annotations against publicly available reference genomes of the user's choice. It is optimized for small- and medium-sized genomes (<2.5 Gb) and provides pre-computed results for several maize genomes.

There is a Dockerfile available (with the associated scripts) to run the pipeline without installing any dependencies.

Installation

Bioinformatics software dependencies

The GenomeQC web application calls the following bioinformatics tools and database to perform its computations. These tools need to be installed and available on the path of the working directory.
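Before launching the application, it can help to confirm that every required command-line tool is actually reachable on the PATH. The sketch below uses Python's standard `shutil.which` for that check. The function name `missing_tools` and the example tool names are hypothetical placeholders, not part of GenomeQC; substitute the tools your GenomeQC release was tested with.

```python
import shutil


def missing_tools(tool_names):
    """Return the tools from tool_names that cannot be found on PATH."""
    # shutil.which returns None when no executable with that name is on PATH
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    # Hypothetical example list; replace with the tools GenomeQC actually calls
    required = ["python3", "Rscript"]
    absent = missing_tools(required)
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All required tools found on PATH.")
```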

At the time of release, this application was tested with:

GenomeQC components:

GenomeQC is a collection of R and Python scripts. The R scripts need to be placed in the R Shiny application directory.

The two main scripts necessary to run the application are ui.R and server.R.

ui.R: This script defines the layout of the user interface.

server.R: This script, found in the scripts folder of the GenomeQC GitHub repository, calls various R packages as well as Python and bash scripts to calculate the different metrics.
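A directory containing ui.R and server.R is a standard Shiny app, which R can start with `shiny::runApp`. The sketch below only builds the `Rscript` command for that call; it is an assumption about one way to launch the app, not GenomeQC's documented procedure. The app path, port 3838 and host are hypothetical choices, and the function name `shiny_run_command` is illustrative.

```python
import json


def shiny_run_command(app_dir, port=3838, host="0.0.0.0"):
    """Build an Rscript command that launches the Shiny app in app_dir.

    app_dir is assumed to contain ui.R and server.R. json.dumps produces a
    double-quoted string literal that R also accepts for ordinary paths.
    """
    expr = "shiny::runApp({}, port={:d}, host={})".format(
        json.dumps(app_dir), int(port), json.dumps(host)
    )
    return ["Rscript", "-e", expr]


if __name__ == "__main__":
    # Hypothetical path to the directory holding ui.R and server.R
    cmd = shiny_run_command("GenomeQC/scripts")
    print(" ".join(cmd))
    # To actually start the app (requires R and the shiny package), pass cmd
    # to subprocess.run(cmd, check=True).
```

Returning the command as a list rather than running it keeps the sketch easy to inspect before handing it to `subprocess.run`.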

Running GenomeQC requires a Linux server, R Shiny (version 1.5.9) and Python (version 3.6). It also requires the following packages:

R packages: tools, R.utils, shinyWidgets, seqinr, tidyverse, shinyBS, Biostrings, gridExtra, reshape, stringr, grid, cowplot

Python packages: sys, traceback, Bio.Blast.Applications, os, subprocess, iglob, Bio, Statistics, pandas, re, Numpy, plotly.offline, argparse, collections, plotly.graph_objs
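One way to check the Python side of these requirements is to ask the interpreter whether each module can be located, using the standard `importlib.util.find_spec`. In the sketch below, the listed entries are mapped to their import names as an assumption: iglob comes from the `glob` module, and Statistics and Numpy are imported as `statistics` and `numpy`. The helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names that cannot be located in this Python environment."""
    missing = []
    for name in module_names:
        try:
            # For dotted names, find_spec first imports the parent packages
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is not installed
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names assumed for the packages listed above
    required = [
        "sys", "traceback", "os", "subprocess", "glob", "statistics",
        "re", "argparse", "collections", "Bio", "Bio.Blast.Applications",
        "pandas", "numpy", "plotly.offline", "plotly.graph_objs",
    ]
    print("Missing:", missing_modules(required) or "none")
```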

Operating Instructions

Three modes are available:

Compare reference genomes:

This section outputs various pre-computed assembly and annotation metrics from a user-selected list of reference genomes.

Analyze your genome assembly:

This section lets users analyze their own genome assembly and benchmark the results against the pre-computed reference genomes.
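N50 and L50 are standard contiguity statistics for an assembly. The sketch below computes them using the usual definition; it is not taken from GenomeQC's scripts, whose exact metrics and edge-case handling may differ. The helper names `fasta_lengths` and `n50_l50` are hypothetical. The FASTA reader is deliberately minimal.

```python
def fasta_lengths(lines):
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finished the previous record
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    N50 is the length of the shortest contig in the smallest set of longest
    contigs that together cover at least half of the total assembly length;
    L50 is the number of contigs in that set.
    """
    lengths = sorted((int(x) for x in contig_lengths), reverse=True)
    if not lengths:
        raise ValueError("no contig lengths given")
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running total reaches the full length")


if __name__ == "__main__":
    # Small in-memory example; for a real assembly, pass an open FASTA file
    print(n50_l50([20, 100, 30, 80, 50]))
```

For the five example lengths (total 280), the two longest contigs reach 180, which is at least half of 280, so the example prints `(80, 2)`.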

Analyze your genome annotations:

This section lets users analyze their own genome annotations and benchmark the results against the pre-computed reference annotations.
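Annotation metrics start from simple tallies over the gene models. The sketch below assumes the annotations are in GFF3 format. It counts features by type (column 3) and counts exons per parent transcript via the `Parent` attribute. It illustrates the kind of summary involved and is not GenomeQC's implementation. The function names are hypothetical, and attribute values are not percent-decoded.

```python
from collections import Counter


def _feature_rows(lines):
    """Yield the nine tab-separated columns of each GFF3 feature line."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            yield fields


def feature_type_counts(lines):
    """Count features by their type (column 3), e.g. gene, mRNA, exon."""
    return Counter(fields[2] for fields in _feature_rows(lines))


def exons_per_transcript(lines):
    """Count exon features per Parent ID taken from the column 9 attributes."""
    per_parent = Counter()
    for fields in _feature_rows(lines):
        if fields[2] != "exon":
            continue
        # Attributes are tag=value pairs separated by ';'
        attrs = dict(kv.strip().split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                per_parent[parent] += 1
    return per_parent
```

Both functions consume their input once, so read the file into a list (for example with `readlines()`) if you want to call both on the same annotation.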

For more details, see the online version of the manual: GenomeQC_userguide.pdf

Licensing

GNU GPL V3.

Acknowledgements

Funding: This work was supported by the United States Department of Agriculture (USDA).

Please send questions to: [email protected]