DeepGOMeta

This repository contains the scripts and data files used in the DeepGOMeta manuscript.

Dependencies

  • The code was developed and tested using Python 3.10.
  • Clone the repository: git clone https://github.com/bio-ontology-research-group/deepgometa.git
  • Create a virtual environment with Conda or the python3-venv module.
  • Install PyTorch: pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
  • Install DGL: pip install dgl==1.1.2+cu117 -f https://data.dgl.ai/wheels/cu117/repo.html
  • Install other requirements: pip install -r requirements.txt
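
A quick way to confirm the pinned versions installed correctly is to import them and check GPU visibility. This is a minimal sanity-check sketch; the expected version strings simply mirror the pip commands above:

    # Sanity-check the pinned dependencies and CUDA visibility.
    import torch
    import dgl

    print("torch:", torch.__version__)   # expect 2.0.1
    print("dgl:", dgl.__version__)       # expect 1.1.2+cu117
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))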

Running the DeepGOMeta model

Follow these instructions to obtain predictions for your proteins. You'll need around 30 GB of storage and a GPU with more than 16 GB of memory (or you can run on CPU).

  • Download the data.tar.gz archive
  • Extract it: tar xvzf data.tar.gz
  • Run the model: python predict.py -if data/example.fa
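
Before predicting on your own proteins, it can help to sanity-check the input FASTA. Here is a minimal sketch in plain Python; the strict 20-letter amino-acid alphabet is an assumption, so extend it if your sequences use codes such as U, X, or B:

    # Count sequences and flag unexpected residue characters in a FASTA file.
    AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

    def check_fasta(path):
        n_seqs, unexpected = 0, set()
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if line.startswith(">"):
                    n_seqs += 1
                elif line:
                    unexpected |= set(line.upper()) - AMINO_ACIDS
        return n_seqs, unexpected

    n, odd = check_fasta("data/example.fa")
    print(f"{n} sequences; unexpected characters: {odd or 'none'}")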

Docker container

We also provide a Docker container with all dependencies installed: docker pull coolmaksat/deepgometa
The repository is installed in the /workspace/deepgometa directory inside the container. To run the scripts you'll need to mount the data directory. Example:
docker run --gpus all -v $(pwd)/data:/workspace/deepgometa/data coolmaksat/deepgometa python predict.py -if data/example.fa

Nextflow

DeepGOMeta can be run as a Nextflow workflow using the Docker image for easier execution.

Requirements:

  • For amplicon data: an OTU table of relative abundances, where OTUs are classified using the RDP database
  • For WGS data: protein sequences in FASTA format

Steps:

  1. After cloning the repository, navigate to the Nextflow directory: cd Nextflow
  2. Update the runOptions paths in nextflow.config
  3. Navigate to the data directory (cd data_and_scripts) and download the genome annotations
  4. Run the workflow. Example: nextflow run DeepGOMeta.nf -profile docker/singularity --amplicon true --OTU_table otu_relative_abd.tsv --pkl_dir /PATH/TO/PKL/DIR/
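
This README doesn't fix the layout of the OTU table, so a quick structural check before launching the workflow can save a failed run. The sketch below uses pandas (not among the dependencies listed above) and assumes a tab-separated table with OTUs as rows and samples as columns; adjust it to your own layout:

    # Inspect a relative-abundance OTU table before running the workflow.
    import pandas as pd

    otu = pd.read_csv("otu_relative_abd.tsv", sep="\t", index_col=0)
    print(otu.shape, "(OTUs x samples, under the assumed layout)")

    # Relative abundances are expected to sum to ~1 (or ~100) per sample.
    print(otu.sum(axis=0).describe())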

Paired Datasets

  1. Data and metadata: download from SRA and MG-RAST using sample accessions
  2. Processing reads:
  3. Functional annotation:
    • OTU tables - generate a weighted functional profile for each OTU table using DeepGOMeta predictions (see the first sketch after this list)
    • Protein FASTA - run DeepGOMeta on Prodigal output from metagenome assemblies, and generate a binary functional profile for each dataset
  4. Clustering and Purity: use a metadata file and the functional profile to apply PCA and k-means clustering, calculate purity, and generate plots for the 16S and WGS datasets (see the second sketch after this list)
  5. Information Content Calculation: create a .txt file for each sample containing the 16S predicted functions and the WGS predicted functions on separate tab-separated lines (e.g. 16Ssample\tGO1\tGO2 on one line and WGSsample\tGO2\tGO3 on the next), compute the information content (IC) of each function, then run a t-test comparing the two sets (see the third sketch after this list)
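
For step 3 on OTU tables, the weighted functional profile combines each OTU's relative abundance with the GO terms DeepGOMeta predicts for it. The prediction file format isn't specified here, so this is a minimal sketch under stated assumptions: otu_terms is a hypothetical dict mapping each OTU to its set of predicted GO terms, and the OTU table has OTUs as rows and samples as columns:

    # Hedged sketch of a weighted functional profile (step 3, OTU tables).
    # otu_terms (hypothetical) maps OTU id -> set of predicted GO terms.
    from collections import defaultdict
    import pandas as pd

    def weighted_profile(otu_table: pd.DataFrame, otu_terms: dict) -> pd.DataFrame:
        """Sum each OTU's relative abundance into the GO terms predicted for it."""
        profile = defaultdict(lambda: defaultdict(float))
        for sample in otu_table.columns:
            for otu, abundance in otu_table[sample].items():
                for term in otu_terms.get(otu, ()):
                    profile[sample][term] += abundance
        # Rows = GO terms, columns = samples; terms absent from a sample become 0.
        return pd.DataFrame(profile).fillna(0.0)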
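
For step 4, here is a minimal sketch with scikit-learn, assuming the profile is oriented samples x GO terms and the metadata labels are positionally aligned with its rows; the two PCA components and the value of k are illustrative choices, not taken from the manuscript:

    # Hedged sketch of PCA -> k-means -> purity (step 4).
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def cluster_purity(profile: pd.DataFrame, labels: pd.Series, k: int) -> float:
        """Cluster PCA-reduced profiles and score purity against metadata labels."""
        X = PCA(n_components=2).fit_transform(profile.values)
        clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        hits = 0
        for c in range(k):
            counts = labels[clusters == c].value_counts()
            if len(counts):                # skip empty clusters
                hits += counts.iloc[0]     # majority label wins the cluster
        return hits / len(labels)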
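
For step 5, here is a minimal sketch that parses the two-line per-sample file described above and compares IC values with SciPy's independent-samples t-test; the ic mapping (GO term -> information content) is assumed to be precomputed, since this README doesn't specify how IC is derived:

    # Hedged sketch of the IC comparison (step 5).
    # ic (assumed precomputed) maps GO term -> information content.
    from scipy.stats import ttest_ind

    def ic_ttest(path: str, ic: dict):
        with open(path) as handle:
            lines = [l.rstrip("\n").split("\t") for l in handle if l.strip()]
        # Line 1: 16S sample id then its GO terms; line 2: WGS sample id then its terms.
        ic_16s = [ic[t] for t in lines[0][1:] if t in ic]
        ic_wgs = [ic[t] for t in lines[1][1:] if t in ic]
        return ttest_ind(ic_16s, ic_wgs)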
