title | author | date | output |
---|---|---|---|
MIA | Flavio Lichtenstein | December 11, 2015 | html_document |
Mutual Information Analyzer, a software tool that clusters molecular sequences based on entropy and mutual information
Federal University of Sao Paulo (UNIFESP), DIS-Bioinformatics
Motivation: Here we propose a method to discriminate closely related species at the molecular level using entropy and mutual information. Sequences of orthologous genes in the same species might contain species-specific covariation patterns that can be identified through mutual information.
Summary: Mutual Information Analyzer (MIA) is a pipeline written in Python to calculate Vertical Entropy (VH), Vertical Mutual Information (VMI), and Horizontal Mutual Information (HMI). From the VH, VMI, and HMI distributions, the Jensen-Shannon Divergence (JSD) is calculated to estimate the distances between species' sequences. Each pair of mutual information distribution distances, with their respective standard errors, is calculated and stored in distance matrices. These distances between distributions can be presented as histograms or hierarchical cluster dendrograms.
See the install document.
We built executables for Windows and Linux with PyInstaller, but in the end they did not work (we could not determine why). We apologize.
Since the source code is open, you can run MIA from the command line, but first you must install some libraries and set the Python path.
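As a minimal sketch of the path setup (the directory below is hypothetical; point it at your own checkout), adding the source tree to `sys.path` inside Python is equivalent to exporting `PYTHONPATH` before running the scripts:

```python
import sys

# Hypothetical location of the MIA source checkout; adjust to your installation.
sys.path.append("/path/to/mia/src")
```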
Mutual Information Analyzer (MIA) is a pipeline written in Python with the following algorithms:
- A1) NCBI: gathers data from NCBI and stores it in GBK file format;
- A2) Gbk to Fasta: parses the GBK file and organizes the sequences into FASTA files per species;
- A3) Alignment: aligns all sequences and creates two FASTA files: "mincut", cutting out columns and sequences with large gaps, and "maxmer", keeping the maximum possible gaps;
- A4) Purging: replaces ambiguous nucleotides via the IUPAC nucleotide ambiguity table, and eliminates sequences with undesirable words, such as "synthetic", in their names;
- A5) Consensus: replaces gaps with the vertical consensus nucleotide of their column (see the first sketch after this list);
- A6) VMI: calculates and stores Vertical Entropy (VH) and Vertical Mutual Information (VMI) distributions, and plots the respective histograms and heat maps (see the second sketch after this list);
- A7) HMI: calculates and stores Horizontal Mutual Information (HMI) distributions, and plots the histograms;
- A8) JSD: calculates the Jensen-Shannon Divergence, stores the distances and their standard errors in distance matrix files, and plots the histograms;
- A9) HC: calculates the hierarchical clustering and presents it as a dendrogram (see the dendrogram sketch below);
- A10) Entropy: simulates Shannon Entropy.
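To make step A5 concrete, here is a minimal sketch of consensus gap-filling; the function name and the `-` gap character are our own illustration, not MIA's actual API:

```python
from collections import Counter

def consensus_fill(alignment):
    """Replace '-' gaps in each column with that column's most frequent nucleotide."""
    n_cols = len(alignment[0])
    consensus = []
    for j in range(n_cols):
        # Count only non-gap characters in column j.
        counts = Counter(seq[j] for seq in alignment if seq[j] != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return ["".join(consensus[j] if seq[j] == "-" else seq[j] for j in range(n_cols))
            for seq in alignment]

print(consensus_fill(["AC-T", "ACGT", "A-GT"]))  # -> ['ACGT', 'ACGT', 'ACGT']
```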
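Likewise for A6, a minimal sketch of Vertical Entropy per column and Vertical Mutual Information between two columns, using the standard identity MI(i, j) = H(i) + H(j) - H(i, j); all names here are illustrative:

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (bits) of a list of symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def column(alignment, j):
    """Vertical slice: the j-th character of every sequence."""
    return [seq[j] for seq in alignment]

def vertical_mi(alignment, i, j):
    """Vertical Mutual Information between columns i and j: MI = H(i) + H(j) - H(i, j)."""
    ci, cj = column(alignment, i), column(alignment, j)
    return entropy(ci) + entropy(cj) - entropy(list(zip(ci, cj)))

aln = ["ACGT", "ACGA", "ATGA"]
print([entropy(column(aln, j)) for j in range(4)])  # per-column VH
print(vertical_mi(aln, 1, 3))                       # VMI between columns 1 and 3
```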
HMI and VMI are calculated with and without bias corrections; therefore, the gain or loss of information for "mincut" versus "maxmer", with or without bias correction, can be compared. Distances between distributions are calculated via the square root of the JSD. Since Mutual Information and JSD are not linear functions of the data, their standard errors are calculated by empirical propagation.
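For the JSD itself, a minimal sketch of the square-root-of-JSD distance between two discrete distributions (bias correction and empirical error propagation are omitted):

```python
import math

def kld(p, q):
    """Kullback-Leibler divergence (bits) between discrete distributions p and q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: mean KLD of p and q against their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)

def jsd_distance(p, q):
    """The square root of the JSD is a metric and is used as the distance."""
    return math.sqrt(jsd(p, q))

print(jsd_distance([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))
```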
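And for the clustering step (A9), a sketch assuming SciPy and Matplotlib are available; the distance matrix below is hypothetical, standing in for a stored sqrt(JSD) matrix:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

# Hypothetical symmetric sqrt(JSD) distance matrix between four species.
labels = ["species_1", "species_2", "species_3", "species_4"]
dist = np.array([
    [0.00, 0.12, 0.45, 0.50],
    [0.12, 0.00, 0.42, 0.48],
    [0.45, 0.42, 0.00, 0.15],
    [0.50, 0.48, 0.15, 0.00],
])

# Condense the matrix and build an average-linkage hierarchical cluster.
tree = linkage(squareform(dist), method="average")
dendrogram(tree, labels=labels)
plt.show()
```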
#### Images:
- Vertical Shannon Entropy
- Vertical Mutual Information 2D Heatmap
- Vertical Mutual Information 3D Heatmap
- Horizontal Entropy
- JSD Histogram
- Hierarchical Cluster
#### Soon: a nonparametric classifier
We have also developed MI_Classifier, a nonparametric classifier. Given all sequences in a FASTA file, it calculates the MI spectra and JSD distances, and exports the informational dendrogram to FigTree. It is still in testing.