Massive Associative K-biclustering -- MAK

Installation

Requirements:

Java (tested with version 11.0.16)
R, compiled with --enable-R-shlib (tested with 4.2.1)
Four R packages including rJava which installs JRI, the Java-to-R interface

The installation has been tested in the bash shell on Linux and Mac OS.

Installation steps:

Use the MAK.jar file found in this repo

MAK_build/MAK.jar

or build it from scratch

cd MAK_build/
source antbuild_git.sh

Install R package dependencies with these R commands on the R command line:

> install.packages("fields")
> install.packages("amap")
> install.packages("irr")
> install.packages("rJava")

Once rJava is installed you can find the location of the JRI.jar file using this R command on the R command line:

> system.file("jri",package="rJava")

Add the JRI and MAK JAR paths to your system CLASSPATH variable, e.g in linux bash:

export CLASSPATH=$CLASSPATH:path/to/JRI/JRI.jar:path/to/MAK.jar

Check/set your R_HOME variable, e.g. in linux bash: To find you R home dir use this R command on the R command line:

> R.home()

And use this path to set the $R_HOME environment variable:

export R_HOME=path/to/R

Add JRI shared library files and R shared library files to the LD_LIBRARY PATH variable:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:path/to/JRI/:$R_HOME/lib

Running a single bicluster search using precomputed null distributions

Download the MAK yeast example tar ball: https://github.com/realmarcin/MAK_results/raw/master/results/yeast/input/example/MAK_example.tar.gz
Place this tar ball in the top level MAK repo directory and extract it:

tar zxf MAK_example.tar.gz

Run a single MAK trajectory (about 30 min. on a MacBook Pro 2.4Ghz Intel Core i9):

cd MAK_example
source run.sh

If you get a java library path error, you can specify this path in the java command:

java -Djava.library.path=path/to/JRI/ -Xmx2G DataMining.RunMiner -param MAK_parameters.txt -debug 0

You can perform searches for any starting point in this dataset by changing the value of the INIT_BLOCKS field in the MAK_parameters.txt parameter input file. The indices are 1-offset and based on the row and column labels in the yeast_cmonkey.txt input file. The format of the value is:

{a,b,c/x,y,z}

where a,b,c,x,y,z are integers in the range of the dimensions of the input matrix.

Running the MAK HPC bicluster discovery pipeline

Input data:

Simple TSV file(s) with row and column labels (with shared identifier axes if multiple data layers)

Edit fields in MAKflow input parameter file:

E.g:

Input TSV data path(s) (up to four)
Bicluster coherence criterion (column = MSEC_KENDALLC_FEM)
null size boundaries (y=(2,200), x=(2,100))
starting point size range (y=(10,100), x=(10,50))
HCL starting point metric and linkage (Pearson correlation and Complete)
move mode sequence (BS)
HPC parameters
number of CPUs (600)
maximum wall time (48 hours)

Run:

java DataMining.MAKflow ...

Syntax: -parameters = input parameter file -server = HPC cluster -account = HPC user account -qos = HPC queue

Output:

A TSV file of nonredundant, highest scoring biclusters in the input data
A set of directories with all intermediate data and results
Includes all executed commands and optional logging

Name		Name	Last commit message	Last commit date
Latest commit History 231 Commits
DataMining		DataMining
DataMining_R		DataMining_R
MAK_build		MAK_build
bioobj		bioobj
dialog		dialog
dtype		dtype
mathy		mathy
util		util
LICENSE.txt		LICENSE.txt
README.md		README.md
README.txt		README.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Massive Associative K-biclustering -- MAK

Installation

Requirements:

Installation steps:

Running a single bicluster search using precomputed null distributions

Running the MAK HPC bicluster discovery pipeline

About

Releases

Packages

Languages

License

realmarcin/MAK

Folders and files

Latest commit

History

Repository files navigation

Massive Associative K-biclustering -- MAK

Installation

Requirements:

Installation steps:

Running a single bicluster search using precomputed null distributions

Running the MAK HPC bicluster discovery pipeline

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages