1 Setup process using a Conda environment
- Open a shell in the KEMET directory. Script execution should be enabled; if it is not, run:
chmod +x ./*.py
-
The first time only, run the
set_kemet_working-directory.py
script (if genome-scale model functionalities are wanted, add the -G
parameter).
This will populate the KEMET
folder with subfolders where inputs and outputs are stored.
- Place input files in the proper paths (IMPORTANT):
-
Copy the MAG/genome sequences to be analysed into the
KEMET/genomes/
folder, which is created by the setup process.
NOTE:
Only ".fa", ".fna" or ".fasta" sequence file extensions are supported.
FASTA headers must not be repeated within a single MAG/genome.
If necessary, rename MAGs/genomes and FASTA headers accordingly; for example, the following command replaces every header with a sequential number:
awk '/^>/{print ">"++i; next}{print}' < original.fasta > new.fasta
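Both requirements can be checked before a run with standard tools. The function below is a minimal POSIX shell sketch; its name, check_genome_fasta, is hypothetical and not part of KEMET, and it treats headers as repeated only when the whole header lines are identical (an assumption, since KEMET may compare a shorter identifier).

```sh
# Hypothetical helper, not part of KEMET: verify that one MAG/genome file
# has a supported extension and no repeated FASTA header lines.
check_genome_fasta() {
    local f="$1" dups
    case "$f" in
        *.fa|*.fna|*.fasta) ;;  # supported extensions
        *) echo "unsupported extension: $f" >&2; return 1 ;;
    esac
    [ -f "$f" ] || { echo "missing file: $f" >&2; return 1; }
    # Header lines that occur more than once (whole-line comparison).
    dups=$(grep '^>' "$f" | sort | uniq -d)
    if [ -n "$dups" ]; then
        printf 'repeated headers in %s:\n%s\n' "$f" "$dups" >&2
        return 1
    fi
    return 0
}

# Example usage over the whole input folder:
# for f in KEMET/genomes/*; do check_genome_fasta "$f"; done
```

If repeated headers are reported, the awk command above renumbers the headers of one file at a time.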
-
Copy the KEGG KO annotations (derived from different sources) into the
KEMET/KEGG_annotations/
folder, created by the setup process.
The script requires an indication of the program used to generate the input KEGG annotations. Supported formats (as of January 2023) are eggNOG, KofamKOALA (both web server and command line), KAAS and KAAS-like.
Do not change the annotation format from the original output (truncated example files can be found in the
KEMET/toy/
folder).
-
Check that the
KEMET/KEGG_MODULES/
folder is present, as in the GitHub repository. It is required by the script because it contains the KEGG Module structure files (REF: KEGG MODULE resource), from which missing KO orthologs are deduced. Other "custom" Modules can be added to that folder if they are properly formatted (see the wiki page on this topic).
-
(Optional) Pre-existing genome-scale models (GSMMs or GEMs) using the BiGG namespace (".xml" files) can be copied into the
KEMET/models/
folder, created by the setup process. These files can be used to expand existing GSMMs, which is one of the two GSMM options; the other is de novo GSMM creation with extra protein-coding genes discovered via HMMs. File extensions must not be modified.
The same applies to the rest of the file names, which should only be changed if the KEGG KO annotations and the input MAG/genome do not correspond: base names must match, e.g. the bin1.fasta/.fa/.fna MAG/genome should be paired with the KEGG annotations in bin1.emapper.annotations, and both with the bin1.xml genome-scale model file.
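Because files are matched by base name, a quick consistency check can catch mismatches before a run. This is a hedged POSIX shell sketch: check_pairing is a hypothetical name, the folder layout is taken from the steps above, and treating any file that starts with `<base>.` in KEGG_annotations/ as a match is an assumption that covers names such as bin1.emapper.annotations.

```sh
# Hypothetical helper, not part of KEMET: report every MAG/genome in
# <root>/genomes/ that has no KEGG annotation file sharing its base name.
check_pairing() {
    local root="${1:-KEMET}" status=0 f name base m found
    for f in "$root"/genomes/*.fa "$root"/genomes/*.fna "$root"/genomes/*.fasta; do
        [ -e "$f" ] || continue           # skip patterns that matched nothing
        name=$(basename "$f")
        base="${name%.*}"                 # bin1.fasta -> bin1
        found=1
        for m in "$root/KEGG_annotations/$base".*; do
            [ -e "$m" ] && { found=0; break; }
        done
        if [ "$found" -ne 0 ]; then
            echo "no KEGG annotation found for $base" >&2
            status=1
        fi
    done
    return "$status"
}

# Example usage: check_pairing KEMET
```

The same loop could be extended to look for `$root/models/$base.xml` when pre-existing models are used.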
-
(ONLY mandatory if HMM and GSMM steps are needed) Fill in the text file called "genomes.instruction", generated by the setup process.
Excluding the header, each line should contain the following tab-separated fields: MAG/Genome FASTA, Taxonomic indication, Metabolic model universe.
-
MAG/Genome FASTA is the file name of the MAG/genome of interest (e.g. bin1.fasta).
-
Taxonomic indication should be taken from the KEGG BRITE taxonomy, specifically from the C level, which most of the time coincides with NCBI phylum-level taxonomy (REF: BRITE Organism table) (e.g. Actinobacteria).
-
Metabolic model universe is one of grampos, gramneg, archaea or another custom universe (an optional field, needed for de novo GSMM reconstruction).
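For a few MAGs the lines can be typed by hand; for many, a small helper avoids accidentally typing spaces instead of tabs. A minimal sketch, assuming genomes.instruction already exists with the header written by the setup process (its exact location is not stated here, so the path in the usage line is an assumption); add_genome_line is a hypothetical name, and writing only two fields when no universe is given is also an assumption.

```sh
# Hypothetical helper, not part of KEMET: append one tab-separated line
# (MAG/Genome FASTA, taxonomic indication, optional model universe).
add_genome_line() {
    local instr="$1" fasta="$2" taxon="$3" universe="${4:-}"
    if [ -n "$universe" ]; then
        printf '%s\t%s\t%s\n' "$fasta" "$taxon" "$universe" >> "$instr"
    else
        # Assumption: the universe column is simply omitted when not needed.
        printf '%s\t%s\n' "$fasta" "$taxon" >> "$instr"
    fi
}

# Example usage (path is an assumption; use the file created by setup):
# add_genome_line KEMET/genomes.instruction bin1.fasta Actinobacteria grampos
```

Appending with `>>` keeps the existing header line intact.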
A handy script for this is included (add_taxonomy_from_gtdb-tk.py). It speeds up the process by converting the taxonomy obtained with the popular tool GTDB-Tk, which assigns the more complete and up-to-date Genome Taxonomy Database (GTDB) taxonomy to MAGs.
It requires the output of the GTDB-Tk gtdb_to_ncbi_majority_vote.py
script, which converts GTDB taxonomy to NCBI standards; these are then converted to the requested KEGG BRITE taxonomy.
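The core idea of the last conversion step can be illustrated on a single NCBI-style lineage string, assuming semicolon-separated ranks with d__/p__ prefixes as in GTDB-Tk output. The sketch below, with the hypothetical name phylum_from_lineage, only extracts the phylum (p__) rank; it is not the logic of add_taxonomy_from_gtdb-tk.py, and since the KEGG BRITE C level only mostly coincides with NCBI phylum, its result should be checked against the BRITE Organism table.

```sh
# Hypothetical helper, not part of KEMET: print the phylum rank (p__) of a
# semicolon-separated lineage such as "d__Bacteria;p__Actinobacteria;...".
phylum_from_lineage() {
    printf '%s\n' "$1" |
        tr ';' '\n' |                       # one rank per line
        sed -n 's/^[[:space:]]*p__//p' |    # keep the p__ rank, drop the prefix
        head -n 1
}

# Example usage (prints Actinobacteria):
# phylum_from_lineage 'd__Bacteria;p__Actinobacteria;c__Actinomycetia'
```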
-
(ONLY mandatory if HMM and GSMM steps are needed) Fill in the other instruction text files.
If HMM analyses are desired, they need either the module_file.instruction
or the ko_file.instruction
file, filled in according to the desired MODE OF USE (specified with the --hmm_mode MODE
parameter):
  - onebm: KOs from KEGG Modules missing 1 block (no need to fill in instruction files)
  - modules: KOs from a fixed list of KEGG Modules (one per line in the module_file.instruction file)
  - kos: KOs from a fixed list of orthologs (one per line in the ko_file.instruction file)
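In modules and kos modes the instruction file is a list of IDs, one per line. A minimal sketch with the hypothetical helper write_id_list, which validates IDs against the standard KEGG formats (M plus five digits for Modules, K plus five digits for orthologs) before appending them; the instruction-file paths are assumptions, and appending rather than overwriting is a deliberate choice that leaves any existing content untouched.

```sh
# Hypothetical helper, not part of KEMET: validate every ID against an
# extended regex, then append the IDs one per line to the given file.
write_id_list() {
    local out="$1" pattern="$2" id
    shift 2
    [ "$#" -gt 0 ] || return 0              # nothing to write
    for id in "$@"; do                      # validate everything first
        if ! printf '%s\n' "$id" | grep -Eq "$pattern"; then
            echo "malformed ID: $id" >&2
            return 1
        fi
    done
    printf '%s\n' "$@" >> "$out"            # one ID per line
}

# modules mode: KEGG Module IDs (paths are assumptions)
# write_id_list KEMET/module_file.instruction '^M[0-9]{5}$' M00001 M00002
# kos mode: KEGG ortholog IDs
# write_id_list KEMET/ko_file.instruction '^K[0-9]{5}$' K00844 K00845
```

Because validation runs before any write, a single malformed ID leaves the file unchanged.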
- Launch the
kemet.py
command line script with your arguments of choice! See the help page for details, or the main README page for basic usage.