Home
FERMI is used to identify mutations present at extremely low frequencies. FERMI contains a set of tools to analyze unique molecular identifier (UMI) tagged, amplicon-captured genomic DNA sequence data. Tools are included both for rapid identification of variants within amplicon sequencing data and for further analysis of patterns and trends within the identified variant pool.
FERMI will continue to be updated. To update if using the Manual Install Method, simply redo all the installation steps.
1. cd FERMI
2. git pull
This is usually enough to update FERMI, but occasionally a new dependency may be added through Anaconda. If you encounter errors, reinstall using the same method you used for the original installation.
Most of this information can be accessed by running:
./fermi -h
FERMI must be run with a few required input flags. Below is an example of the minimum required input.
fermi -i /inputDirectory -o /outputDirectory -b 'freebayes' -y '/referenceGenome.fa'
1. The input directory should contain unzipped paired end fastq files.
2. The output directory can be any directory that can be written to with your given permissions.
3. The -b flag specifies the command used to run the variant caller freebayes. If you don't have a different freebayes build you would like to use, 'freebayes' will run the copy installed automatically during the install process.
4. The -y flag specifies the location of the reference genome. No reference genome is included with this pipeline by default. All testing was done with hg19 downloaded from the UCSC Genome Browser, but other reference genomes should also work.
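For scripted runs, the minimum required input above can be assembled programmatically. The following is a minimal sketch, not part of FERMI: the helper name `build_fermi_command` and its parameters are hypothetical, and only the four flags shown in the example (-i, -o, -b, -y) are used.

```python
import shlex


def build_fermi_command(indir, outdir, reference, freebayes="freebayes", extra=()):
    """Assemble the minimal FERMI invocation as an argument list.

    Hypothetical helper for illustration; it only arranges the flags
    documented above and does not run anything.
    """
    cmd = ["fermi", "-i", indir, "-o", outdir, "-b", freebayes, "-y", reference]
    cmd.extend(extra)  # optional flags, e.g. ["-f", "5", "-d", "500"]
    return cmd


cmd = build_fermi_command("/inputDirectory", "/outputDirectory", "/referenceGenome.fa")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when directory names contain spaces.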
-h, --help show this help message and exit
--nfo NFO, -n NFO Info writeup about a particular run that will be output in the run directory.
--largefiles, -l Outputs all fastq files generated during analysis.
--avoidalign, -a Only runs through initial analysis of input fastq files, and does not align to reference or call variants.
--outdir OUTDIR, -o OUTDIR Specifies output directory where all analysis files will be written.
--indir INDIR, -i INDIR Specifies the input directory that contains the fastq files to be analyzed.
--single, -s Only process a single set of paired end reads.
--prevdict PREVDICT, -p PREVDICT Specify a previously output pickle file containing collapsed fastq data as an input instead of raw fastq files.
--umimismatch UMIMISMATCH, -u UMIMISMATCH Specify the number of mismatches allowed in a UMI pair to still consider as the same UMI.
--varthresh VARTHRESH, -v VARTHRESH Specify the percentage of reads that must contain a particular base for that base to be used in the final consensus read.
--readsupport READSUPPORT, -r READSUPPORT Specifies the number of reads that must have a given UMI sequence in order to be binned as a true capture event, and not be thrown out.
--clustersubmit, -c Submit run to cluster computing rather than running locally.
--filterao FILTERAO, -f FILTERAO Specifies the AO cutoff for reported variants, where -f 5 would eliminate all variants that are seen 5 times or fewer. Default == 5.
--dpfilter DPFILTER, -d DPFILTER Read depth elimination threshold. If specified as -d 500, only variants found at a locus read more than 500 times will be reported. Default == 500.
--freebayes FREEBAYES, -b FREEBAYES Location of freebayes in the format of /dir/freebayes
--errorrate, -e Overall PCR amplification + sequencing error rates will be estimated and returned.
--readLength READLENGTH, -q READLENGTH Manually set the read length. If this is not set, length will be automatically set as the number of bases found between the two UMI sequences.
--badBaseSubstitute, -x This flag will trigger replacement of bad bases with N instead of invalidating an entire capture.
--reference REFERENCE, -y REFERENCE Set the location of the human reference genome hg19.fa and supporting files.
--duplexcollapse, -w This will run duplex collapsing instead of the original collapsing that treats two complementary strands as different captures.
--minimaloutput, -z This will suppress the output of most files, and only include the final vcf files and some of the info files.
--getsamplesautomatically, -g This will try to automatically grab and sort all files in a specified input directory so they don't need to be manually specified. Samples should fit the pattern x1.fastq (r1) and x2.fastq (r2), where x can be any string.
--realvsmock, -j This flag will trigger elimination of potential errors found by duplex sequencing. If duplex collapsing is flagged without this flag, mock duplex sequencing will be performed in order to compare with the effects of eliminating potential errors.
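To illustrate how the -u (UMI mismatch) and -r (read support) options interact, here is a minimal sketch of UMI binning. It is not FERMI's actual implementation: the function names, the greedy grouping strategy, the treatment of each UMI as a single string, and the use of "at least `min_reads`" as the support rule are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def bin_by_umi(umis, max_mismatch=0, min_reads=1):
    """Group reads by UMI, tolerating up to max_mismatch differences.

    Assumed greedy strategy: the most abundant UMIs seed bins first, and
    each later UMI joins the first bin whose seed is within max_mismatch.
    Bins with fewer than min_reads reads are discarded.
    """
    bins = {}  # seed UMI -> total supporting reads
    for umi, n in Counter(umis).most_common():
        for seed in bins:
            if hamming(umi, seed) <= max_mismatch:
                bins[seed] += n
                break
        else:
            bins[umi] = n  # no close seed found, start a new bin
    return {seed: n for seed, n in bins.items() if n >= min_reads}


reads = ["AAAA"] * 5 + ["AAAT"] * 2 + ["CCCC"]
print(bin_by_umi(reads, max_mismatch=1, min_reads=3))
```

With one mismatch allowed, the two AAAT reads are merged into the AAAA bin, and the single CCCC read is dropped for lacking read support.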
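The -v (variant threshold) and -x (bad base substitute) options can be pictured with a small consensus-calling sketch. This is an illustration under stated assumptions rather than FERMI's code: the function name `consensus`, the "greater than or equal to" comparison, and the requirement that all reads in a bin have equal length are hypothetical choices.

```python
from collections import Counter


def consensus(reads, var_thresh, substitute_bad=False):
    """Collapse reads sharing a UMI into one consensus read.

    var_thresh is the percentage of reads that must agree on a base.
    If a position fails the threshold, the whole capture is invalidated
    (returns None) unless substitute_bad is set, in which case the
    position becomes 'N'.
    """
    if not reads:
        return None
    if any(len(r) != len(reads[0]) for r in reads):
        raise ValueError("all reads in a bin must have the same length")
    out = []
    for column in zip(*reads):  # iterate position by position
        base, n = Counter(column).most_common(1)[0]
        if 100.0 * n / len(reads) >= var_thresh:
            out.append(base)
        elif substitute_bad:
            out.append("N")
        else:
            return None
    return "".join(out)


family = ["ACGT", "ACGT", "ACGT", "ACTT"]
print(consensus(family, var_thresh=75))
print(consensus(family, var_thresh=80))
print(consensus(family, var_thresh=80, substitute_bad=True))
```

At the third position only 75% of reads agree, so raising the threshold to 80 either discards the capture or, with substitution enabled, masks that single base.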
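The -f and -d thresholds described above amount to a simple filter on the AO (alternate observation count) and DP (read depth) fields of each VCF record. The sketch below shows one way to express that rule. The function names are hypothetical, and keeping a multi-allelic record when any one alternate allele passes is an assumption, not documented FERMI behavior.

```python
def parse_info(info):
    """Parse a VCF INFO string such as 'AO=6;DP=501' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags carry no value
    return fields


def keep_record(info, filter_ao=5, dp_filter=500):
    """Apply the documented cutoffs: AO must exceed filter_ao and
    DP must exceed dp_filter (defaults 5 and 500, as in the help text)."""
    fields = parse_info(info)
    depth = int(fields["DP"])
    alt_counts = [int(x) for x in fields["AO"].split(",")]
    return depth > dp_filter and any(ao > filter_ao for ao in alt_counts)


print(keep_record("AO=6;DP=501"))
print(keep_record("AO=5;DP=1000"))
```

Both comparisons are strict, matching the wording that -f 5 removes variants seen 5 times or fewer and that -d 500 requires more than 500 reads at the locus.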
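The naming pattern expected by -g (x1.fastq for read 1 and x2.fastq for read 2) can be checked before a run with a short script. This is a hypothetical helper for validating a directory listing, not the logic FERMI itself uses to discover samples.

```python
import re


def pair_fastqs(filenames):
    """Pair files named <x>1.fastq and <x>2.fastq by their shared prefix x.

    Files that do not match the pattern, or that lack a mate, are ignored.
    Returns a sorted list of (read1, read2) tuples.
    """
    samples = {}
    for name in filenames:
        match = re.fullmatch(r"(.*)([12])\.fastq", name)
        if match:
            prefix, mate = match.groups()
            samples.setdefault(prefix, {})[mate] = name
    return sorted(
        (mates["1"], mates["2"])
        for mates in samples.values()
        if "1" in mates and "2" in mates
    )


listing = ["a1.fastq", "a2.fastq", "b1.fastq", "notes.txt"]
print(pair_fastqs(listing))
```

In practice the listing could come from `os.listdir` on the input directory; any sample missing from the result has an unpaired or misnamed file.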