DeepGOMeta

This repository contains the scripts and data files used in the DeepGOMeta manuscript.

Dependencies

  • The code was developed and tested using Python 3.10.
  • Clone the repository: git clone https://github.com/bio-ontology-research-group/deepgometa.git
  • Create a virtual environment with Conda or the python3-venv module.
  • Install PyTorch: pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
  • Install DGL: pip install dgl==1.1.2+cu117 -f https://data.dgl.ai/wheels/cu117/repo.html
  • Install other requirements: pip install -r requirements.txt
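
A quick way to confirm the pinned versions installed correctly is to import them and check GPU visibility. This is a minimal sanity-check sketch; the expected version strings simply mirror the pip commands above:

    # Sanity-check the pinned dependencies and CUDA visibility.
    import torch
    import dgl

    print("torch:", torch.__version__)   # expect 2.0.1
    print("dgl:", dgl.__version__)       # expect 1.1.2+cu117
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))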

Running the DeepGOMeta model

Follow these instructions to obtain predictions for your proteins. You'll need around 30 GB of storage and a GPU with more than 16 GB of memory (or you can run on CPU).

  • Download the data.tar.gz archive
  • Extract it: tar xvzf data.tar.gz
  • Run the model: python predict.py -if data/example.fa
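
Before predicting on your own proteins, it can help to sanity-check the input FASTA. Here is a minimal sketch in plain Python; the strict 20-letter amino-acid alphabet is an assumption, so extend it if your sequences use codes such as U, X, or B:

    # Count sequences and flag unexpected residue characters in a FASTA file.
    AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

    def check_fasta(path):
        n_seqs, unexpected = 0, set()
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if line.startswith(">"):
                    n_seqs += 1
                elif line:
                    unexpected |= set(line.upper()) - AMINO_ACIDS
        return n_seqs, unexpected

    n, odd = check_fasta("data/example.fa")
    print(f"{n} sequences; unexpected characters: {odd or 'none'}")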

Docker container

We also provide a Docker container with all dependencies installed: docker pull coolmaksat/deepgometa
The repository is installed in the /workspace/deepgometa directory inside the container. To run the scripts you'll need to mount the data directory. Example:
docker run --gpus all -v $(pwd)/data:/workspace/deepgometa/data coolmaksat/deepgometa python predict.py -if data/example.fa

Nextflow

DeepGOMeta can be run as a Nextflow workflow using the Docker image for easier execution.

Requirements:

  • For amplicon data: an OTU table of relative abundances, where OTUs are classified using the RDP database
  • For WGS data: protein sequences in FASTA format

Steps:

  1. After cloning the repository, navigate to the Nextflow directory: cd Nextflow
  2. Update the runOptions paths in nextflow.config
  3. Navigate to the data directory (cd data_and_scripts) and download the genome annotations
  4. Run the workflow. Example: nextflow run DeepGOMeta.nf -profile docker/singularity --amplicon true --OTU_table otu_relative_abd.tsv --pkl_dir /PATH/TO/PKL/DIR/
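
This README doesn't fix the layout of the OTU table, so a quick structural check before launching the workflow can save a failed run. The sketch below uses pandas (not among the dependencies listed above) and assumes a tab-separated table with OTUs as rows and samples as columns; adjust it to your own layout:

    # Inspect a relative-abundance OTU table before running the workflow.
    import pandas as pd

    otu = pd.read_csv("otu_relative_abd.tsv", sep="\t", index_col=0)
    print(otu.shape, "(OTUs x samples, under the assumed layout)")

    # Relative abundances are expected to sum to ~1 (or ~100) per sample.
    print(otu.sum(axis=0).describe())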

Paired Datasets

  1. Data and metadata: download from SRA and MG-RAST using sample accessions
  2. Processing reads:
  3. Functional annotation:
    • OTU tables - generate a weighted functional profile for each OTU table using DeepGOMeta predictions (see the first sketch after this list)
    • Protein FASTA - run DeepGOMeta on Prodigal output from metagenome assemblies, and generate a binary functional profile for each dataset
  4. Clustering and Purity: use a metadata file and the functional profile to apply PCA and k-means clustering, calculate purity, and generate plots for the 16S and WGS datasets (see the second sketch after this list)
  5. Information Content Calculation: create a .txt file for each sample containing the 16S predicted functions and the WGS predicted functions on separate tab-separated lines (e.g. 16Ssample\tGO1\tGO2 on one line and WGSsample\tGO2\tGO3 on the next), compute the information content (IC) of each function, then run a t-test comparing the two sets (see the third sketch after this list)
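
For step 3 on OTU tables, the weighted functional profile combines each OTU's relative abundance with the GO terms DeepGOMeta predicts for it. The prediction file format isn't specified here, so this is a minimal sketch under stated assumptions: otu_terms is a hypothetical dict mapping each OTU to its set of predicted GO terms, and the OTU table has OTUs as rows and samples as columns:

    # Hedged sketch of a weighted functional profile (step 3, OTU tables).
    # otu_terms (hypothetical) maps OTU id -> set of predicted GO terms.
    from collections import defaultdict
    import pandas as pd

    def weighted_profile(otu_table: pd.DataFrame, otu_terms: dict) -> pd.DataFrame:
        """Sum each OTU's relative abundance into the GO terms predicted for it."""
        profile = defaultdict(lambda: defaultdict(float))
        for sample in otu_table.columns:
            for otu, abundance in otu_table[sample].items():
                for term in otu_terms.get(otu, ()):
                    profile[sample][term] += abundance
        # Rows = GO terms, columns = samples; terms absent from a sample become 0.
        return pd.DataFrame(profile).fillna(0.0)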
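
For step 4, here is a minimal sketch with scikit-learn, assuming the profile is oriented samples x GO terms and the metadata labels are positionally aligned with its rows; the two PCA components and the value of k are illustrative choices, not taken from the manuscript:

    # Hedged sketch of PCA -> k-means -> purity (step 4).
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def cluster_purity(profile: pd.DataFrame, labels: pd.Series, k: int) -> float:
        """Cluster PCA-reduced profiles and score purity against metadata labels."""
        X = PCA(n_components=2).fit_transform(profile.values)
        clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        hits = 0
        for c in range(k):
            counts = labels[clusters == c].value_counts()
            if len(counts):                # skip empty clusters
                hits += counts.iloc[0]     # majority label wins the cluster
        return hits / len(labels)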
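
For step 5, here is a minimal sketch that parses the two-line per-sample file described above and compares IC values with SciPy's independent-samples t-test; the ic mapping (GO term -> information content) is assumed to be precomputed, since this README doesn't specify how IC is derived:

    # Hedged sketch of the IC comparison (step 5).
    # ic (assumed precomputed) maps GO term -> information content.
    from scipy.stats import ttest_ind

    def ic_ttest(path: str, ic: dict):
        with open(path) as handle:
            lines = [l.rstrip("\n").split("\t") for l in handle if l.strip()]
        # Line 1: 16S sample id then its GO terms; line 2: WGS sample id then its terms.
        ic_16s = [ic[t] for t in lines[0][1:] if t in ic]
        ic_wgs = [ic[t] for t in lines[1][1:] if t in ic]
        return ttest_ind(ic_16s, ic_wgs)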
