Modify modules to work with models not using BiGG identifiers #36

Open
Tracked by #58
famosab opened this issue Oct 21, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@famosab
Collaborator

famosab commented Oct 21, 2022

So far, only models with BiGG identifiers can be used for growth simulation. Lifting this restriction is possible but needs a bit more thought. Users will have to know the structure of their identifiers (for example, the underlying database) and use those identifiers in their custom media definition. The easiest way to achieve this might be to abstract the functions that already exist and to provide a Jupyter notebook in the documentation that shows how to 1. define the medium and 2. use this medium to run a growth simulation. This might need some code changes, though.
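A minimal sketch of step 1 (defining a medium in the model's own namespace), not the existing growth module. The names `EXCHANGE_TEMPLATES` and `build_medium` are hypothetical, and the exchange-ID patterns are assumptions about how a given model spells its exchange reactions:

```python
# Hypothetical templates: how each namespace spells an extracellular
# exchange reaction. These patterns are assumptions and must be checked
# against the actual model.
EXCHANGE_TEMPLATES = {
    "BiGG": "EX_{compound}_e",        # e.g. EX_glc__D_e
    "ModelSEED": "EX_{compound}_e0",  # e.g. EX_cpd00027_e0
}


def build_medium(compounds, namespace, templates=EXCHANGE_TEMPLATES):
    """Map {compound ID: max uptake} to {exchange reaction ID: max uptake}."""
    if namespace not in templates:
        raise ValueError(f"No exchange template for namespace {namespace!r}")
    template = templates[namespace]
    return {
        template.format(compound=cid): float(flux)
        for cid, flux in compounds.items()
    }


# The user supplies compound IDs in the namespace of their own model.
medium = build_medium({"cpd00027": 10.0, "cpd00001": 1000.0}, "ModelSEED")
print(medium)
```

A dictionary of this shape (exchange reaction ID mapped to maximum uptake) is what COBRApy's `model.medium` accepts, so step 2 could be assigning it and calling `model.optimize()`.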

For version 2.1 ❓

@famosab famosab added the enhancement New feature or request label Oct 21, 2022
@famosab famosab added this to the Refactor as Python package milestone Oct 21, 2022
@famosab
Collaborator Author

famosab commented Oct 28, 2022

This also applies to the charges module; there it is mainly a documentation and variable-renaming issue. We need to make clear to users that certain functions only work if the model's identifiers follow a syntax that is consistent with other files (such as a dataframe that holds information on charges).
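To illustrate the consistency requirement, here is a small sketch that reports which identifiers are not covered by a charge table instead of silently skipping them. The function name `split_by_charge_coverage` and the table contents are illustrative assumptions, not the charges module's actual interface:

```python
def split_by_charge_coverage(metabolite_ids, charge_table):
    """Separate IDs found in the charge table from IDs that are missing."""
    known = {m: charge_table[m] for m in metabolite_ids if m in charge_table}
    unknown = [m for m in metabolite_ids if m not in charge_table]
    return known, unknown


# Illustrative table keyed by BiGG-style IDs (values are for demonstration).
charge_table = {"atp": -4, "h2o": 0}

# A model in another namespace will mostly end up in `unknown`.
known, unknown = split_by_charge_coverage(["atp", "cpd00001"], charge_table)
print(known, unknown)
```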

@famosab famosab changed the title from "Modify growth module to work with models not using BiGG identifiers" to "Modify modules to work with models not using BiGG identifiers" Oct 28, 2022
@famosab
Collaborator Author

famosab commented Nov 3, 2022

This also applies to the polish_carveme module. Mainly, the two functions add_bigg_metab and add_bigg_reac need to be modified: either they need to be disabled, or we need some kind of test for whether the ID of a metabolite is a valid BiGG ID.
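One possible shape for such a test, as a sketch only: strip the SBML prefix and the compartment suffix, then look the remainder up in a set of known BiGG universal metabolite IDs. The helper names are hypothetical, the suffix pattern is a heuristic, and `known_bigg_ids` is assumed to have been loaded from a BiGG metabolite table beforehand:

```python
import re

# Heuristic: a compartment suffix is "_" plus one lowercase letter and an
# optional lowercase letter or digit (e.g. "_c", "_e", "_c0").
_COMPARTMENT_SUFFIX = re.compile(r"_[a-z][a-z0-9]?$")


def strip_decorations(met_id):
    """Remove a leading 'M_' and a trailing compartment suffix, if present."""
    core = met_id[2:] if met_id.startswith("M_") else met_id
    return _COMPARTMENT_SUFFIX.sub("", core)


def is_bigg_metabolite(met_id, known_bigg_ids):
    """Return True if the undecorated ID is in the set of known BiGG IDs."""
    return strip_decorations(met_id) in known_bigg_ids


known_bigg_ids = {"glc__D", "atp"}  # tiny stand-in for the real BiGG table
print(is_bigg_metabolite("M_glc__D_e", known_bigg_ids))
print(is_bigg_metabolite("cpd00027_c0", known_bigg_ids))
```

A lookup against the real table is stricter than a pure pattern check, since many non-BiGG IDs would also pass a regular expression.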

@draeger
Member

draeger commented Nov 3, 2022

Are these functions going to run ModelPolisher?

@famosab
Collaborator Author

famosab commented Nov 3, 2022

> Are these functions going to run ModelPolisher?

@draeger I am unsure whether a user would want that. Since the main.py script is targeted at people who are less experienced with Python, we could think about a way of implementing a call to ModelPolisher. Within the Python module (to be used in users' own scripts), however, I do not think it would add anything.

GwennyGit added a commit that referenced this issue Nov 8, 2022
1. Changed parameters in `config.yaml`:
- Renamed parameter `polish_carveme` to `polish`
- Added parameter `BiGG_IDs`
2. Changed `main.py` according to 'new' parameters
GwennyGit added a commit that referenced this issue Nov 17, 2022
Additionally, sorted all dictionary entries alphabetically.
GwennyGit added a commit that referenced this issue Nov 17, 2022
Renamed add_bigg_metab and add_bigg_reac to add_metab and add_reac and generalised the code.
GwennyGit added a commit that referenced this issue Nov 23, 2022
Most of the code was adjusted to be more general. Additionally, the functionalities requested in issue #38 were added.
@GwennyGit
Collaborator

GwennyGit commented Jan 31, 2023

Currently, the function cv_ncbiprotein is hard-coded in two ways: it expects the identifiers produced by CarveMe, and its parameter protein_fasta expects the file format obtained from NCBI. In other words, the function requires that the CarveMe identifiers were derived from the header lines of an NCBI FASTA file, and protein_fasta must be in the NCBI FASTA format used for the protein FASTA of the coding sequences (CDS).
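For readers unfamiliar with that header layout, the sketch below parses one header line of an NCBI protein FASTA for coding sequences into its bracketed `key=value` fields and checks whether the protein ID looks like a RefSeq accession. This is not the code of cv_ncbiprotein; the function names are hypothetical, and the example header is assumed to be representative of the format:

```python
import re

# Bracketed "[key=value]" fields as found in NCBI CDS protein FASTA headers.
_FIELD = re.compile(r"\[(\w+)=([^\]]*)\]")
# RefSeq protein accessions start with one of these two-letter prefixes.
_REFSEQ_PROTEIN = re.compile(r"^(AP|NP|YP|XP|WP)_\d+(\.\d+)?$")


def parse_cds_header(header):
    """Return the bracketed fields plus the leading sequence ID as a dict."""
    fields = dict(_FIELD.findall(header))
    fields["seq_id"] = header.lstrip(">").split(maxsplit=1)[0]
    return fields


def is_refseq_protein(protein_id):
    """Heuristic check based on the RefSeq accession prefix."""
    return bool(_REFSEQ_PROTEIN.match(protein_id))


header = (
    ">lcl|NC_000913.3_prot_NP_414542.1_1 [gene=thrL] [locus_tag=b0001] "
    "[protein=thr operon leader peptide] [protein_id=NP_414542.1] "
    "[location=190..255] [gbkey=CDS]"
)
fields = parse_cds_header(header)
print(fields["locus_tag"], fields["protein_id"])
print(is_refseq_protein(fields["protein_id"]))
```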

Additionally, the function now adds the RefSeq identifiers if the model contains these instead of NCBI Protein identifiers.

Work is in progress to extend this function so that it adds KEGG identifiers where possible, as well as RefSeq identifiers from the RefSeq.gff file obtainable via the NCBI assembly page for the organism. If KEGG identifiers can be added, the plan is to additionally add UniProt identifiers via the KEGG API. Otherwise, the UniProt identifiers might be added via the UniProt API using the RefSeq identifiers. For progress updates, see #53.

@GwennyGit
Collaborator

> Are these functions going to run ModelPolisher?

This will be added in the refineGEMs pipeline in SPECIMEN. See issue draeger-lab/SPECIMEN#8

@cb-Hades
Collaborator

cb-Hades commented May 31, 2024

TODO: Keep a list of the functions that work with the namespace param, so that we always know which parts need extending when a new namespace should become available / curatable for the models.
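One way such a list could maintain itself is a small decorator-based registry, sketched below. Everything here is hypothetical (the registry, the decorator, and the two placeholder functions); it only shows the bookkeeping idea, not existing refineGEMs code:

```python
# Hypothetical registry: function name -> set of namespaces it supports.
NAMESPACE_REGISTRY = {}


def supports_namespaces(*namespaces):
    """Decorator that records which namespaces a function can handle."""
    def decorator(func):
        NAMESPACE_REGISTRY[func.__name__] = set(namespaces)
        return func
    return decorator


def functions_missing(namespace):
    """List registered functions that do not yet support `namespace`."""
    return sorted(
        name for name, supported in NAMESPACE_REGISTRY.items()
        if namespace not in supported
    )


# Placeholder functions standing in for real namespace-aware code.
@supports_namespaces("BiGG", "VMH")
def annotate_metabolites(model, namespace="BiGG"):
    return model


@supports_namespaces("BiGG")
def simulate_growth(model, namespace="BiGG"):
    return model


print(functions_missing("VMH"))
```

Calling `functions_missing` with a new namespace then gives the to-do list of functions that still need extending.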

@cb-Hades
Collaborator

Add VMH namespace (Tasklist):

  • entities.match_id_to_namespace : add a VMH option
  • write a function to get from VMH to BiGG (Pseudo) IDs
  • write a polish function for VMH models, like the one for CarveMe models, to make the above easier
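For the second task, a rough sketch of what a VMH-to-BiGG pseudo-ID conversion might look like. It assumes VMH-style IDs write the compartment in square brackets (`glc_D[e]`) and use a single underscore before a trailing stereo descriptor, where BiGG uses a double underscore (`glc__D_e`). The function name and both rules are assumptions; the result is a pseudo ID that would still need checking against BiGG:

```python
import re

# Assumed VMH compartment notation: a trailing "[x]" with one lowercase letter.
_VMH_COMPARTMENT = re.compile(r"\[([a-z])\]$")
# Assumed stereo rule: a single "_" before a final D, L, R or S.
_SINGLE_UNDERSCORE_STEREO = re.compile(r"(?<!_)_([DLRS])$")


def vmh_to_bigg_pseudo(vmh_id):
    """Convert a VMH-style metabolite ID into a BiGG-like pseudo ID."""
    match = _VMH_COMPARTMENT.search(vmh_id)
    core = vmh_id[:match.start()] if match else vmh_id
    core = _SINGLE_UNDERSCORE_STEREO.sub(r"__\1", core)
    # Re-attach the compartment in BiGG style ("_e") if one was present.
    return f"{core}_{match.group(1)}" if match else core


print(vmh_to_bigg_pseudo("glc_D[e]"))
print(vmh_to_bigg_pseudo("atp[c]"))
```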


4 participants