KLTepigenome

Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve Transform

Next-generation sequencing is helping the scientific community to better understand the molecular mechanisms that control transcriptional and epigenetic regulation. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is commonly applied to map histone modifications or transcription factor binding sites for a protein of interest. **KLTepigenome** is a set of R scripts for exploring patterns of epigenomic variability and covariability in next-generation sequencing data sets by means of a functional eigenvalue decomposition of genomic data. The script KLTepigenome.r must be run first on each bigWig file, before any of the other scripts are used.

If you use this software for research, please cite the following paper:

Madrigal P, Krajewski P (2015) Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform. BioData Mining 8:20. DOI: http://dx.doi.org/10.1186/s13040-015-0051-7

Dependencies

- R package fda
- R package ggplot2
- R package genomation
- R package RColorBrewer
- Bedtools
- UCSC bigWigSummary

KLTepigenome.r

Parameters required

[1]: bigWig Formatted File

[2]: File with regions in BED Format (columns 1,2,3,6 are required: chr, start, end, strand)

[3]: Length (base pairs) of genomic regions to analyze (integer)

[4]: Number of B-spline basis functions (integer)

[5]: Check integrity of the files (T/F). If True (T), requires a *.chrom.sizes file in the folder.

[6]: Remove ENCODE Blacklisted regions (T/F)

[7]: Prefix of output files

[8]: Number of functional principal components to compute

[9]: Number of functional principal components to plot

[10]: Number of bins to be used in the heatmap

Example
$ Rscript KLTepigenome.r H3K4me3.bw regions.bed 5000 100 T T H3K4me3_mark 50 5 100
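
Because KLTepigenome.r has to be run once per bigWig file, a small shell loop can save typing. The following is a minimal sketch, not part of the repository: the helper name `run_klt`, the `RUN` switch, and the `<name>_mark` prefix convention are assumptions made for illustration, and parameters [3] to [10] are simply copied from the example above.

```shell
# Hypothetical helper: build (and optionally run) the KLTepigenome.r command
# for one bigWig file. The output prefix is derived from the file name
# (H3K4me3.bw -> H3K4me3_mark); this naming scheme is an assumption.
run_klt() {
    bw="$1"
    regions="$2"
    prefix="$(basename "$bw" .bw)_mark"
    # Parameters [3]-[10] are taken from the example command above.
    set -- Rscript KLTepigenome.r "$bw" "$regions" 5000 100 T T "$prefix" 50 5 100
    if [ "${RUN:-0}" = "1" ]; then
        "$@"          # execute the command
    else
        echo "$*"     # dry run: only print the command
    fi
}

# Process every bigWig file in the current directory.
for f in *.bw; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    run_klt "$f" regions.bed
done
```

By default the loop only prints the commands; setting `RUN=1` in the environment executes them instead.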
Output files

{prefix}_intersect.bed: Final set of genomic regions (if input parameters [5,6] are FALSE, then this file is the same as the initial BED file)

{prefix}_heatmap.pdf: Heat map showing the genomic regions (read-enrichment profiles) considered in the analysis

{prefix}_varprop.txt: Proportions of variance explained by the functional principal components

{prefix}_scores.txt: Matrix of functional principal component scores for each genomic region

{prefix}_components.txt: Value of principal components at each nucleotide

{prefix}_components.pdf: Plot of functional principal components as indicated in input parameter [9]

{prefix}_mean_sd.png: Plot of the functional mean of the data, and the interval indicating the functional standard deviation.

{prefix}_barplot.pdf: Barplot of proportion (%) of variance explained by the components computed (input parameter [8])
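
After a run, it can be useful to confirm that every output file listed above was written. The sketch below assumes the files are in the current directory and uses the suffixes exactly as listed in this section; the function name `klt_missing_outputs` is hypothetical.

```shell
# Hypothetical helper: print the name of every expected output file that is
# missing for the given prefix. Prints nothing when all files are present.
klt_missing_outputs() {
    prefix="$1"
    for suffix in _intersect.bed _heatmap.pdf _varprop.txt _scores.txt \
                  _components.txt _components.pdf _mean_sd.png _barplot.pdf; do
        [ -e "${prefix}${suffix}" ] || echo "${prefix}${suffix}"
    done
}
```

For example, `klt_missing_outputs H3K4me3_mark` lists any of the eight files that are absent for the prefix used in the example above.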

propVarPlot.r

Parameters required

[1...N]: List of N *_varprop.txt files with proportions of variance, obtained after running KLTepigenome.r

Example
$ Rscript propVarPlot.r H3K4me3_mark_varprop.txt H3K27me3_mark_varprop.txt H2A.Z_mark_varprop.txt
Output files

propVarPlot.pdf: Scatter plot of the component number versus the cumulative sum of variance explained (%)

KLTmaxCorrelation.r

Parameters required

[1...N]: List of N prefixes of *_pc_scores.txt files with principal component scores, obtained after running KLTepigenome.r

Example
$ Rscript KLTmaxCorrelation.r H3K4me1_mark H3K4me2_mark H3K4me3_mark
Output files

cor_Scores.csv: matrix with maximum values of pairwise Pearson correlation coefficients between functional principal component scores

cor_Scores_#Eigenfunctions.csv: indices of the eigenfunctions at which the maximum correlation occurs

cor_Eigenfunctions.csv: Pearson correlation coefficients between the eigenfunctions in cor_Scores_#Eigenfunctions.csv. This value is used to measure the co-localization of the eigenfunctions
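
The two downstream scripts take different arguments: propVarPlot.r expects the `*_varprop.txt` file names, while KLTmaxCorrelation.r expects the bare prefixes. The following sketch prints both command lines for a set of prefixes so they stay consistent; the function name `klt_downstream` is hypothetical, and the commands are printed rather than executed.

```shell
# Hypothetical helper: print the propVarPlot.r and KLTmaxCorrelation.r
# commands for the prefixes given as arguments.
klt_downstream() {
    files=""
    for prefix in "$@"; do
        # Append "<prefix>_varprop.txt", separated by single spaces.
        files="${files}${files:+ }${prefix}_varprop.txt"
    done
    echo "Rscript propVarPlot.r $files"
    echo "Rscript KLTmaxCorrelation.r $*"
}
```

Calling `klt_downstream H3K4me1_mark H3K4me2_mark H3K4me3_mark` prints one command per script, which can be reviewed and then pasted into the shell once KLTepigenome.r has been run for each prefix.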
