If you use this software for research, please cite the following paper:
Madrigal P, Krajewski P (2015) Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform. BioData Mining 8:20. DOI: http://dx.doi.org/10.1186/s13040-015-0051-7
Requirements:
- R package fda
- R package ggplot2
- R package genomation
- R package RColorBrewer
- Bedtools
- UCSC bigWigSummary

Input parameters of KLTepigenome.r:
[1]: bigWig formatted file
[2]: File with regions in BED Format (columns 1,2,3,6 are required: chr, start, end, strand)
[3]: Length (base pairs) of genomic regions to analyze (integer)
[4]: Number of B-spline basis functions (integer)
[5]: Check integrity of the files (T/F). If True (T), requires a *.chrom.sizes file in the folder.
[6]: Remove ENCODE Blacklisted regions (T/F)
[7]: Prefix of output files
[8]: Number of functional principal components to compute
[9]: Number of functional principal components to plot
[10]: Number of bins to be used in the heatmap
$ Rscript KLTepigenome.r H3K4me3.bw regions.bed 5000 100 T T H3K4me3_mark 50 5 100
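Because parameter [2] requires columns 1, 2, 3 and 6 of the BED file, it can help to check the regions before starting a long run. The following is a minimal sketch, not part of KLTepigenome.r: the helper name `check_bed`, the toy regions, and the rule that the strand must be "+" or "-" are all assumptions made for illustration.

```r
# Hypothetical helper (not part of KLTepigenome.r): check that a BED table
# has the columns listed under input parameter [2] (chr, start, end, strand).
check_bed <- function(bed) {
  if (ncol(bed) < 6) {
    return("fewer than 6 columns")
  }
  start <- bed[[2]]
  end <- bed[[3]]
  if (!is.numeric(start) || !is.numeric(end)) {
    return("start/end are not numeric")
  }
  if (any(end <= start)) {
    return("end must be greater than start")
  }
  # Assumption: only '+' and '-' are accepted as strand values
  if (!all(bed[[6]] %in% c("+", "-"))) {
    return("strand must be '+' or '-'")
  }
  "ok"
}

# Toy regions standing in for the contents of regions.bed
bed <- data.frame(
  chr = c("chr1", "chr2"),
  start = c(1000, 2000),
  end = c(6000, 7000),
  name = c("r1", "r2"),
  score = c(0, 0),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

result <- check_bed(bed)
print(result)
```

For a real file, the toy data frame would be replaced by something like `read.table("regions.bed", sep = "\t", stringsAsFactors = FALSE)`.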
{prefix}_intersect.bed: Final set of genomic regions (if input parameters [5] and [6] are both F, this file is identical to the initial BED file)
{prefix}_heatmap.pdf: Heat map showing the genomic regions (read-enrichment profiles) considered in the analysis
{prefix}_varprop.txt: Proportions of variance explained by the functional principal components
{prefix}_scores.txt: Matrix of functional principal component scores for each genomic region
{prefix}_components.txt: Values of the functional principal components at each nucleotide position
{prefix}_components.pdf: Plot of functional principal components as indicated in input parameter [9]
{prefix}_mean_sd.png: Plot of the functional mean of the data, with an interval indicating the functional standard deviation
{prefix}_barplot.pdf: Barplot of proportion (%) of variance explained by the components computed (input parameter [8])
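To show how the outputs above relate to input parameters [3], [4] and [8], here is a rough sketch of a functional principal component analysis using the fda package. It is not the code of KLTepigenome.r: the profiles are simulated, the parameter values are scaled down, and the variable names are made up for this example.

```r
library(fda)

set.seed(1)
region_length <- 200  # toy stand-in for input parameter [3]
n_basis <- 10         # toy stand-in for input parameter [4]
n_components <- 3     # toy stand-in for input parameter [8]
n_regions <- 20       # number of simulated genomic regions

# Simulated enrichment profiles: one row per position, one column per region
positions <- seq_len(region_length)
peak <- dnorm(positions, mean = region_length / 2, sd = 20)
profiles <- sapply(seq_len(n_regions), function(i) {
  runif(1, 0.5, 2) * peak + rnorm(region_length, sd = 0.001)
})

# Represent each profile as a smooth function on a B-spline basis
basis <- create.bspline.basis(rangeval = c(1, region_length), nbasis = n_basis)
profiles_fd <- smooth.basis(argvals = positions, y = profiles, fdParobj = basis)$fd

# Functional principal component analysis
fpca <- pca.fd(profiles_fd, nharm = n_components)

var_prop <- fpca$varprop                         # compare with {prefix}_varprop.txt
scores <- fpca$scores                            # compare with {prefix}_scores.txt
components <- eval.fd(positions, fpca$harmonics) # compare with {prefix}_components.txt
```

In this sketch `scores` has one row per region and one column per component, and `components` has one row per position. The actual file layouts written by KLTepigenome.r may differ.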
[1...N]: List of N *_varprop.txt files with proportions of variance, obtained after running KLTepigenome.r
$ Rscript propVarPlot.r H3K4me3_mark_varprop.txt H3K27me3_mark_varprop.txt H2A.Z_mark_varprop.txt
propVarPlot.pdf: Scatterplot of the component number against the cumulative sum of variance explained (%)
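The quantity on the vertical axis of that plot can be sketched in a few lines of base R. The proportions below are made up for illustration and are not real results, and the 80% threshold is an arbitrary choice; propVarPlot.r itself may compute and draw the curve differently.

```r
# Made-up proportions of variance for five components (not real results)
var_prop <- c(0.50, 0.25, 0.10, 0.05, 0.02)

# Cumulative sum of variance explained, in percent
cumulative_pct <- cumsum(var_prop) * 100

curve <- data.frame(
  component = seq_along(var_prop),
  cumulative_pct = cumulative_pct
)
print(curve)

# Smallest number of components that explains at least 80% of the variance
n_needed <- which(cumulative_pct >= 80)[1]
print(n_needed)
```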
[1...N]: List of N prefixes of *_pc_scores.txt files with principal component scores, obtained after running KLTepigenome.r
$ Rscript KLTmaxCorrelation.r H3K4me1_mark H3K4me2_mark H3K4me3_mark
cor_Scores.csv: Matrix of the maximum pairwise Pearson correlation coefficients between functional principal component scores
cor_Scores_#Eigenfunctions.csv: Indices of the eigenfunctions at which the maximum correlation occurs
cor_Eigenfunctions.csv: Pearson correlation coefficients between the eigenfunctions listed in cor_Scores_#Eigenfunctions.csv. These values measure the co-localization of the eigenfunctions
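As an illustration of what an entry of cor_Scores.csv and the matching entry of cor_Scores_#Eigenfunctions.csv represent, the sketch below finds the pair of components with the highest Pearson correlation between two score matrices. The data are simulated, both matrices are assumed to describe the same regions in the same row order, and the signed maximum is used; whether KLTmaxCorrelation.r uses signed or absolute correlations is not stated here, so treat that choice as an assumption.

```r
set.seed(42)
n_regions <- 50

# Toy score matrices (regions x components) for two hypothetical data sets
scores_a <- matrix(rnorm(n_regions * 3), ncol = 3)
scores_b <- matrix(rnorm(n_regions * 3), ncol = 3)

# Make component 2 of data set B track component 3 of data set A
scores_b[, 2] <- scores_a[, 3] + rnorm(n_regions, sd = 0.1)

# 3 x 3 matrix: rows are components of A, columns are components of B
cors <- cor(scores_a, scores_b)

# Position of the largest (signed) correlation
best <- arrayInd(which.max(cors), dim(cors))
best_a <- best[1, 1]  # component of A
best_b <- best[1, 2]  # component of B
max_cor <- cors[best_a, best_b]

print(c(component_a = best_a, component_b = best_b))
print(max_cor)
```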