This is cRegulon software: an optimization model an optimization model for modeling combinatorial regulation from single-cell multi-omics provides units underpinning cell type landscape.
- Homer in your environment PATH then install hg38 and mm10
- Python >=3.0 and packages below:
- numpy-groupies==0.9.15
- numpy==1.26.3
- scipy==1.12.0
- pandas==1.4.3
- ismember==1.0.2
- mpmath==1.2.1
- pybedtools==0.9.0
- scikit-learn==1.4.2
wget https://github.com/SUwonglab/cRegulon/archive/master.zip
unzip master.zip
cd cRegulon-master
wget -O cRegulonData.tar.gz https://figshare.com/ndownloader/files/47503895
tar -xzvf cRegulonData.tar.gz
We also provide an example for cRegulon analysis, which can be downloaded from the following command:
wget -O cRegulonExample.tar.gz https://figshare.com/ndownloader/files/49856427
tar -xzvf cRegulonExample.tar.gz
After downloading, there will be 5 folders that store raw data, intermediate results, and final results of example:
./example_data/
./CSI/
./PseudoBulk/
./Networks/
./Results/
The typic input file of scRNA-seq data is a gene by cell count matrix:
scRNA | RNACellID1 | RNACellID2 | RNACellID3 |
Gene1 | 5 | 0 | 3 |
Gene2 | 0 | 2 | 0 |
Gene3 | 1 | 0 | 0 |
scATAC | ATACellID1 | ATACellID2 | ATACellID3 | ATACellID4 |
Peak1 | 1 | 0 | 1 | 0 |
Peak2 | 0 | 1 | 0 | 1 |
Peak3 | 1 | 0 | 0 | 0 |
In practice, the file of single cell dataset can be the 10x folder format (example dataset: ./example_data/RA/) or a matrix txt format (example dataset: ./example_data/CL/). In this tutorial, we use RA dataset for demonstration.
The typic cell type meta file (./example_data/RA/RA_scRNA_Cluster.txt) of scRNA-seq data is as follows:
RNACellID1 | RAC1 |
RNACellID2 | RAC2 |
RNACellID3 | RAC3 |
ATACellID1 | RAC2 |
ATACellID2 | RAC3 |
ATACellID3 | RAC1 |
ATACellID4 | RAC2 |
python3 cRegulon.py {prep,grn,model} ...
prep: Preprocessing mode
grn: GRN mode
model: Model mode
We run the following script to create pseudo bulk RNA-seq and ATAC-seq data for each cell cluster:
#cRegulon.py prep [-h] [--name NAME] --rna RNA --rna_meta RNA_META --atac ATAC --atac_meta ATAC_META --species SPECIES (human or mouse)
python3 cRegulon.py prep --name RA --rna ./example_data/RA/scRNA/ --rna_meta ./example_data/RA/RA_scRNA_Cluster.txt --atac ./example_data/RA/scATAC/ --atac_meta ./example_data/RA/RA_scATAC_Cluster.txt -g mouse
This process will produce pseudo bulk files (*PS_RNA.txt, *PS_ATAC.txt, *CellType.txt) for each cell cluster in the PseudoBulk folder.
We run the following script to construct regulatory network for each cell cluster (current we support hg38 and mm10):
#python3 cRegulon.py grn [-h] --name NAME --celltype CELLTYPE --genome GENOME --cores CORES
for c in `cat ./PseudoBulk/RA_CellType.txt`
do
python3 cRegulon.py grn -n RA -ct ${c} -g mm10 -p 20
done
This process will produce GRN files (*network.txt, TFTG_regulationScore.txt, TFName.txt, TGName.txt) for each cell cluster in the Networks folder (The GRN construction is independent for each cell cluster, we can do it parallelly).
We run the following script of cRegulon model: If we already know or have some expection of the cRegulon number, we can provide this number to cRegulon. For example, we have 9 cRegulons for RA, then we run this script:
#python3 cRegulon.py model [-h] --name NAME --module_number MODULE_NUMBER
python3 cRegulon.py model -n RA -mn 9
If we don't know the cRegulon number, we can provide a range of numbers and cRegulon will use elbow rule to select an optimal number. For example, we guess there may be 4-20 cRegulons for RA, then we run this script:
#python3 cRegulon.py model [-h] --name NAME --module_max MODULE_MAX --module_min MODULE_MIN
python3 cRegulon.py model -n RA -mmin 4 -mmax 20
This will output a folder in "Results" with name you specify: "./RA/"
- TF combinatorial effects in each cRegulon: ./Results/RA/X.txt
- Association matrix between cell clusters and cRegulons: ./Results/RA/A.txt
- TF module of each cRegulon: ./Results/RA/*TFModule.txt
- Annotation of each cell cluster with cRegulons: ./Results/RA/Annotation/*subnetwork.txt
If you only have scRNA-seq data, We run the following script to annotate cells with our pre-computed cRegulons from atlas-level dataset:
#python3 cRegulon.py annot [-h] --name NAME --path_rna PATH_RNA --module_number MODULE_NUMBER
python3 cRegulon.py annot --name PBMC --path_rna ./example_data/PBMC/PBMC_scRNA.txt --module_number 12
If you use cRegulon software or cRegulon associated concepts, please cite:
Zhanying Feng, et al. Modeling combinatorial regulation from single-cell multi-omics provides units underpinning cell type landscape. 2024.