Analysis code for the cryptic polyadenylation manuscript, currently on bioRxiv:
TDP-43 loss induces extensive cryptic polyadenylation in ALS/FTD
Sam Bryce-Smith, Anna-Leigh Brown, Puja R. Mehta, Francesca Mattedi, Alla Mikheenko, Simone Barattucci, Matteo Zanovello, Dario Dattilo, Matthew Yome, Sarah E. Hill, Yue A. Qi, Oscar G. Wilkins, Kai Sun, Eugeni Ryadnov, Yixuan Wan, NYGC ALS Consortium, Jose Norberto S. Vargas, Nicol Birsa, Towfique Raj, Jack Humphrey, Matthew Keuss, Michael Ward, Maria Secrier, Pietro Fratta
bioRxiv 2024.01.22.576625; doi: https://doi.org/10.1101/2024.01.22.576625
All code has been tested/run in Linux-based environments (local = Ubuntu 22.04.1 LTS via Windows Subsystem for Linux 2, remote = UCL Computer Science SGE cluster). We cannot guarantee correct function/installation in other environments.
For R-based analysis, RStudio is strongly recommended to make proper use of R projects and renv files.
The minimal prerequisites are:
- git
- conda/mamba (mamba recommended as much quicker!)
- R 4.3.2
- renv - run
install.packages("renv")
in the R console. We used renv version 1.0.3
- RStudio (you may be able to use another IDE compatible with R projects, but I have no experience of these)
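As a quick sanity check before cloning, the command-line prerequisites listed above can be probed from a shell. This is only a minimal sketch: the helper name `check_tool` is made up for illustration, and it does not check for renv or RStudio, which are not command-line tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): report whether a command is on PATH.
# Returns 0 if the command is found, 1 otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# git and R are both needed; either mamba or conda is enough.
check_tool git || true
check_tool R || true
check_tool mamba || check_tool conda || true

# Print the installed R version (if any) so it can be compared with 4.3.2.
if command -v R >/dev/null 2>&1; then
  R --version | head -n 1
fi
```

Any line starting with "missing:" points at a prerequisite to install before continuing.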
Once these are satisfied, clone and enter the repo locally using the following commands:
git clone https://github.com/frattalab/tdp43-apa.git
cd tdp43-apa
Assuming you have conda/mamba available on your system, you can install all the non-R based dependencies with the following command:
<conda/mamba> env create -f py_bioinfo_full.yaml
Once installation is complete, you can activate the environment with the following command:
conda activate pybioinfo
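The `<conda/mamba>` placeholder above means "whichever of the two you have installed". One possible way to make that choice automatically is sketched below; the function names `pick_conda_frontend` and `create_env` are illustrative and not part of the repo, and the sketch assumes it is run from the root of the cloned repository, where py_bioinfo_full.yaml lives.

```shell
#!/bin/sh
# Hypothetical helper: print "mamba" if it is on PATH, otherwise "conda".
# Returns 1 (and prints nothing) if neither is available.
pick_conda_frontend() {
  if command -v mamba >/dev/null 2>&1; then
    echo mamba
  elif command -v conda >/dev/null 2>&1; then
    echo conda
  else
    return 1
  fi
}

# Hypothetical helper: create the environment with whichever frontend was found.
create_env() {
  frontend=$(pick_conda_frontend) || {
    echo "Neither mamba nor conda found on PATH" >&2
    return 1
  }
  "$frontend" env create -f py_bioinfo_full.yaml
}

# Usage (run manually from the repo root):
#   create_env
```

Once `create_env` completes, the environment can be activated with `conda activate pybioinfo` as described above.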
Every individual subdirectory has its own RStudio project and renv files.
- Open the R project of interest (i.e. a subdirectory of this repo)
- See guide here. You can open the project via RStudio's 'Open Project' command or by opening the '.Rproj' file in your system's file browser.
- The correct version of renv should automatically be downloaded and installed upon opening the project (if not already installed). You can then run
renv::restore()
in the console to install the required packages for the given project.
- Subdirectories contain a mix of essential and WIP/experimental scripts. Check the README in each subdirectory for a description of the scripts required to reproduce analyses in the main manuscript. Assume any other analysis scripts not listed in the README are non-essential to reproduce analyses presented in the manuscript.
- The input dependencies of each script are not well documented (apologies). Generally, you should assume that you will need to run the preprocessing scripts first to generate minimal outputs for other steps, and that the misc subdirectory contains scripts that should be run last (to generate supplementary tables). The eventual plan is to produce a Snakemake pipeline to automate running the different steps (and document the required running order).
- Scripts have varying levels of generalisability. If you'd like to adapt some of the code to your inputs, please do get in touch and I'll do my best to give you some advice/help out.
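If you would rather not open each project in RStudio, the renv::restore() step described above can in principle be run from a shell. The sketch below is an untested convenience, not the authors' workflow. It assumes each project keeps its lockfile at the renv default location (renv.lock in the subdirectory) and that Rscript is on your PATH. The function names are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: list subdirectories of a repo root that contain renv.lock.
# Assumption: each project stores its lockfile as <subdir>/renv.lock (renv's default).
list_renv_projects() {
  root=${1:-.}
  for lock in "$root"/*/renv.lock; do
    [ -f "$lock" ] || continue   # skip the unexpanded glob when nothing matches
    basename "$(dirname "$lock")"
  done
}

# Hypothetical helper: restore one project's packages without interactive prompts.
# Runs in a subshell so the caller's working directory is left unchanged.
restore_project() {
  ( cd "$1" && Rscript -e 'renv::restore(prompt = FALSE)' )
}

# Usage (run manually from the repo root):
#   list_renv_projects .
#   restore_project preprocessing
```

Starting Rscript inside the project directory lets the project's own renv setup take effect, which is why `restore_project` changes directory first.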