
Releases: liggettla/FERMI

v1.10.0

24 Feb 20:22

A new algorithm automatically excludes any variants that fall outside the probed regions of the genome; a significant number of these excluded variants appear to have high allele frequencies.

A new algorithm improves collapsing by ignoring only the insufficient base rather than discarding the entire capture. This may increase resolution, but it still needs to be tested.
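
This is a minimal sketch of the idea, not the pipeline's own implementation: a capture is assumed to be a list of aligned reads sharing a UMI, and `var_thresh` stands in for the varThresh parameter; a position that fails the threshold is masked rather than the whole capture being dropped.

```python
# Illustrative per-base collapsing sketch (names are hypothetical, not FERMI's):
# a position that fails the agreement threshold is masked as 'N' instead of
# the entire capture being discarded.
def collapse_capture(reads, var_thresh=0.9):
    """Collapse a UMI family (list of equal-length reads) into one consensus."""
    consensus = []
    for bases in zip(*reads):                      # walk column by column
        counts = {b: bases.count(b) for b in set(bases)}
        best_base, best_count = max(counts.items(), key=lambda kv: kv[1])
        if best_count / len(bases) >= var_thresh:
            consensus.append(best_base)            # enough agreement at this base
        else:
            consensus.append('N')                  # mask only the insufficient base
    return ''.join(consensus)

print(collapse_capture(["ACGTA", "ACGTA", "ACCTA"]))  # -> 'ACNTA'
```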

Read lengths can now be adjusted to only look at a subset of each read rather than the entire sequence.

Now all flags are output into the parametersUsed file.

The human genome reference file is now passed in with the y flag rather than being hardcoded.

Variants that are not aligned within the probed regions of the genome are automatically eliminated from the final vcf file. It may be helpful to find out why reads are being aligned outside of the probed regions, as these may be good reads that could be used in analysis.
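
As a rough illustration of how such a filter can work, assuming the probed regions are described by a BED file (the function and file handling here are hypothetical, not FERMI's own code):

```python
# Keep only VCF records whose position falls inside a probed interval.
def load_probed_regions(bed_path):
    regions = {}
    with open(bed_path) as bed:
        for line in bed:
            chrom, start, end = line.split()[:3]
            regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions

def in_probed_region(chrom, pos, regions):
    # BED intervals are 0-based half-open; VCF POS is 1-based.
    return any(start < pos <= end for start, end in regions.get(chrom, []))

def filter_vcf(in_vcf, out_vcf, regions):
    with open(in_vcf) as vcf_in, open(out_vcf, 'w') as vcf_out:
        for line in vcf_in:
            if line.startswith('#'):
                vcf_out.write(line)                # keep header lines
                continue
            chrom, pos = line.split('\t')[:2]
            if in_probed_region(chrom, int(pos), regions):
                vcf_out.write(line)
```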

Installation is now simplified: an anaconda environment file is included, along with a bash script that automates much of the setup.

Wrote an algorithm in donorStrands.py (in the fastqHandling directory) that takes advantage of linked SNPs to calculate lower detection limits.

Added R-squared calculation to the vafRepeatability pipeline. It is now computed automatically and output with the associated plots.
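
For reference, the R-squared of a least-squares fit can be computed with numpy as below; this is just the standard calculation, not necessarily the exact form used in vafRepeatability.

```python
import numpy as np

def r_squared(vaf_run1, vaf_run2):
    """Coefficient of determination for a linear fit of run 2 on run 1."""
    x = np.asarray(vaf_run1, dtype=float)
    y = np.asarray(vaf_run2, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)          # least-squares line
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot
```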

Added a mutationsPerProbe script that analyzes how many mutations show up in each probed region, to look for particular hotspots. This will hopefully be run automatically in the future.

Wrote a new probe bias algorithm that should improve the accuracy of that calculation. It is included in mutationsPerProbe in the main directory and is also used to normalize the number of variants per probed region. Plots are now automatically written to a pdf file when the script is run.

For some reason, [-7:0] is needed in the collapsing code to grab the full 6 bp of the 3' UMI, and yet in ipython only [-6:0] is needed. I have no idea why this is the case and am clearly missing something. It needs to be understood, but for now the full UMI is grabbed, whereas previously one base was dropped.
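
The indices quoted above may be shorthand for the expressions actually used in the code, but standard Python slicing behaves as follows, which is a likely source of the off-by-one confusion (toy example only):

```python
read = "ACGTACGTACGT" + "AACCGG"   # pretend the last 6 bp are the 3' UMI
print(read[-6:])                   # 'AACCGG' -> the last six bases
print(read[-7:-1])                 # 'TAACCG' -> six bases, shifted left by one (drops the final base)
print(read[-6:0])                  # ''       -> empty: a stop of 0 never reaches the end of the string
```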

Duplex collapsing has been introduced, which allows for further collapsing by using information from each of the two complementary strands in a capture to reduce false positive signals.
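
A minimal sketch of the duplex idea, assuming each strand of a capture has already been collapsed to its own consensus and the bottom-strand consensus has been reverse-complemented into the top-strand orientation (function and variable names are illustrative, not FERMI's):

```python
def duplex_collapse(top_consensus, bottom_consensus):
    """Combine the two single-strand consensuses of one capture."""
    duplex = []
    for top, bottom in zip(top_consensus, bottom_consensus):
        if top == bottom and top != 'N':
            duplex.append(top)       # both strands agree on this base
        else:
            duplex.append('N')       # disagreement -> likely PCR or sequencing error
    return ''.join(duplex)

print(duplex_collapse("ACGTN", "ACCTA"))  # -> 'ACNTN'
```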

v1.9.0

07 Dec 00:18

Added a method for estimating the combined error introduced during PCR amplification and base calling, giving a global error rate.

Added a method for understanding base biases across the total probed region of the genome.

Added a script to show base bias in the theoretical probed regions.
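
A hedged sketch of how base bias over the probed regions might be tallied, assuming the probed sequences are available as a FASTA file (the file name and function are illustrative, not the script's own):

```python
from collections import Counter

def base_bias(fasta_path):
    """Return the fraction of A, C, G, and T across all probed sequence."""
    counts = Counter()
    with open(fasta_path) as fasta:
        for line in fasta:
            if not line.startswith('>'):           # skip FASTA headers
                counts.update(line.strip().upper())
    total = sum(counts[base] for base in 'ACGT')
    return {base: counts[base] / total for base in 'ACGT'}

print(base_bias('probed_regions.fa'))  # e.g. {'A': 0.27, 'C': 0.23, ...}
```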

Changed the length of the quality score in goodcollapsedictionary.py to match the length of the read. The mismatch may have been causing problems.

Changed the 5' UMI length to include intervening captured sequence. This has caused problems in the past; still troubleshooting.

Added a more robust coverage calculation: instead of taking a global average of what coverage would be if every read were included in the final set of filtered reads, the average coverage is now computed only from the reads that pass all filter parameters and are included in the final output fastq file.
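
In the simplest terms, and assuming coverage here means sequenced bases over probed bases (an assumption, not a statement of the pipeline's exact formula), the change amounts to restricting the numerator to passing reads:

```python
def mean_coverage(passing_reads, probed_bases):
    """Average depth over the probed bases, counting only reads that passed all filters."""
    sequenced_bases = sum(len(read) for read in passing_reads)
    return sequenced_bases / probed_bases

reads = ["ACGT" * 25] * 1000          # 1000 passing reads of 100 bp each
print(mean_coverage(reads, 5000))     # -> 20.0x average coverage
```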

VAF comparison plots can now include variants that are not found in both of the samples being analyzed.

VAF plots are now automatically displayed.

VAF comparison is now plotted both with a regression curve and with a straight line y = x.
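
An illustrative matplotlib sketch of that kind of plot (not the pipeline's own plotting code):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_vaf_comparison(vaf_a, vaf_b):
    x = np.asarray(vaf_a, dtype=float)
    y = np.asarray(vaf_b, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)          # fitted regression line
    grid = np.linspace(0, max(x.max(), y.max()), 100)

    plt.scatter(x, y, s=10)
    plt.plot(grid, slope * grid + intercept, label='regression')
    plt.plot(grid, grid, linestyle='--', label='y = x')
    plt.xlabel('VAF (run 1)')
    plt.ylabel('VAF (run 2)')
    plt.legend()
    plt.show()
```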

A new algorithm in multiSampleVAFRep.py derives an average AF from a number of samples, which can then be compared against a single individual.
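
A hedged sketch of the averaging step, assuming each sample is represented as a dict mapping a variant key to its AF (the data structures are illustrative, not those of multiSampleVAFRep.py):

```python
from collections import defaultdict

def average_af(samples):
    """samples: list of {variant_key: AF} dicts, one per sample."""
    totals, counts = defaultdict(float), defaultdict(int)
    for sample in samples:
        for variant, af in sample.items():
            totals[variant] += af
            counts[variant] += 1
    return {variant: totals[variant] / counts[variant] for variant in totals}

pooled = average_af([{'chr1:1000A>T': 0.01}, {'chr1:1000A>T': 0.03}])
print(pooled)  # roughly {'chr1:1000A>T': 0.02}
```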

AF plotting has been improved so that labeling cutoffs depend on the AFs of all samples, and labels no longer overlap or get cut off by the axis.

AF plotting now labels the actual variant change.

A new algorithm has been added for dealing with bases that do not meet the specified varThresh: it ignores just the base that fails the threshold while maintaining the rest of the adequate data in the capture. This substantially increases the number of putative variants output by the algorithm.

v1.8.1

19 Sep 17:08

Just some bugfixes; this is being used as a stable release before some potentially code-breaking modifications.

v1.8.0

20 Jul 17:03

The code is finally organized into a relatively intelligent file tree.

Added a flag to pass the freebayes location at runtime.

The info writeup created at runtime is now functional.

Added vafRepeatability.py to plot how repeatable the AFs of the same variants are across repeated sequencing of samples.

v1.7.0

19 Jun 23:15

dilutionTesting.py now smoothly handles output from the new fermi processing.

The program no longer prompts the user to confirm when they are done entering files. This makes entering files a little faster, but automatic import would still be nice at some point.

Extension of the 5' UMI to include intervening sequence has been causing problems and has been omitted. It will be included in the future.

Decomposition is now done in a way that outputs the corresponding info with a given read, allowing proper AO, AF, and DP calls for a variant.
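
For context, AO (alternate observation count) and DP (total depth) are standard freebayes VCF fields, and once each decomposed record carries its own values the allele frequency is simply their ratio; the snippet below is illustrative, not FERMI's code:

```python
def allele_frequency(ao, dp):
    """AF = alternate observations / total depth at the site."""
    return ao / dp if dp else 0.0

print(allele_frequency(ao=3, dp=6000))  # -> 0.0005
```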

Changed the way memory is allocated when running on the cluster. Memory is now set manually, and jobs must not exceed the memory constraints.

Created a new plotting method for base mutation type bias. For now, this is not automated.

v1.6.0

02 May 21:15

This release brings flags for running the program; no more user prompts for all input parameters!

Automated handling of files is now much better: file names are simply entered at runtime, and separate jobs are created to process each paired-end read.

VCF file output is now vastly improved. Variants are now decomposed, forcing correct locus calling and allowing robust variant matching between runs.

The code is also much cleaner: main.py has been considerably cleaned up, with code pushed out to separate files and methods where possible.

Cluster/local run switching is now automated and handled by a flag input.

v1.5.0

22 Apr 16:45

The pipeline now uses a lambda function rather than a nested dictionary data structure, and it runs much more quickly as a result.

The program now also allows for multiple file handling, meaning it does not need to be run separately for every file being analyzed. However, this does not seem to be working when submitted for cluster processing.

Of note, there may still be an issue with proper AF calling. When running test files I cannot get the error to repeat, perhaps meaning the discrepancy in AF calls observed before is something on the biology side of things.

1.4.0

02 Feb 20:23

The program is now entirely autonomous through variant calling. Analysis after the variant file has been produced has not yet been automated, but it will be in the future.

1.3.0

27 Jan 17:52

Lots of new analysis code has been added that helps parse through and make sense of the generated data. These analysis scripts have yet to be integrated into the main pipeline and must be run individually; the goal is to eventually integrate them into the main code so they run automatically. Ideally the results should then all be plotted in R automatically, but that is a goal for much later. At the moment some of the new scripts generate data and output it as tab-separated text files to be used manually in Excel, etc.

1.2.0

14 Dec 20:11

The code is now far more automated. Scripts for alignment, variant calling, and plotting are still not called autonomously, but the code is included and should be functional or close to it. LRS cluster support is also included now.