Update major raw data flat files to latest versions pulled from EcoCyc #1065
LGTM!
It's nice to see fewer places using the location tags and more places using f-strings.
"MUTHLS-CPLX_RXN" [{"molecule": "MUTHLS-CPLX", "coeff": 1.0, "type": "proteincomplex", "location": "c", "form": "mature"}, {"molecule": "EG11281-MONOMER", "coeff": -1.0, "type": "proteinmonomer", "location": "c", "form": "mature"}, {"molecule": "EG10625-MONOMER", "coeff": -2.0, "type": "proteinmonomer", "location": "c", "form": "mature"}, {"molecule": "EG10624-MONOMER", "coeff": -1.0, "type": "proteinmonomer", "location": "c", "form": "mature"}]
"N-ACETYLTRANSFER-CPLX_RXN" [{"molecule": "N-ACETYLTRANSFER-CPLX", "coeff": 1.0, "type": "proteincomplex", "location": "c", "form": "mature"}, {"molecule": "N-ACETYLTRANSFER-MONOMER", "coeff": -6.0, "type": "proteinmonomer", "location": "c", "form": "mature"}]
"NAD-SYNTH-CPLX_RXN" [{"molecule": "NAD-SYNTH-CPLX", "coeff": 1.0, "type": "proteincomplex", "location": "c", "form": "mature"}, {"molecule": "NAD-SYNTH-MONOMER", "coeff": -2.0, "type": "proteinmonomer", "location": "c", "form": "mature"}]
"NADH-DHI-CPLX_RXN" [{"molecule": "NADH-DHI-CPLX", "coeff": 1.0, "type": "proteincomplex", "location": "i", "form": "mature"}, {"molecule": "NUOA-MONOMER", "coeff": -1.0, "type": "proteinmonomer", "location": "i", "form": "mature"}, {"molecule": "NUOH-MONOMER", "coeff": -1.0, "type": "proteinmonomer", "location": "i", "form": "mature"}, {"molecule": "NUOJ-MONOMER", "coeff": -1.0, "type": "proteinmonomer", "location": "i", "form": "mature"}, {"molecule": "NUOK-MONOMER", "coeff": -1.0, "type": "proteinmonomer", "location": "i", "form": "mature"}, {"molecule": "NUOL-MONOMER", "coeff": -1.0, "type": "proteinmonomer", "location": "i", "form": "mature"}, {"molecule": "NUOM-MONOMER", "coeff": -1.0,
Do we want to move these additions to a separate file (something like complexation_reactions_added.tsv) so we don't have to add them each time we have another pull from EcoCyc? I think a couple of other files are like this as well.
Yeah, we should definitely do something like this if we think these manual additions will be permanent. Right now I wasn't sure whether these reactions were removed from EcoCyc's database completely or were just excluded for other reasons. I was planning on sending them a list to confirm this.
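For reference, the separate-file idea discussed above could look roughly like the sketch below. It is only an illustration, not the repository's actual loader: the `id` and `stoichiometry` column names, the helper functions, and the trimmed sample rows (only `molecule` and `coeff` are kept) are all assumptions, and the two files are simulated in memory with `io.StringIO`.

```python
import csv
import io
import json


def read_reactions(handle):
    """Read a tab-separated file object into {reaction id: stoichiometry list}.

    Assumes (hypothetically) a header row with "id" and "stoichiometry" columns,
    where the stoichiometry cell holds a JSON list.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["id"]: json.loads(row["stoichiometry"]) for row in reader}


def merge_reactions(pulled, added):
    """Return the reactions pulled from EcoCyc plus manual additions not already present."""
    merged = dict(pulled)
    for rxn_id, stoich in added.items():
        # If a reaction reappears upstream, keep the upstream version.
        merged.setdefault(rxn_id, stoich)
    return merged


# In-memory stand-in for the file regenerated on every pull from EcoCyc.
pulled_tsv = io.StringIO(
    '"id"\t"stoichiometry"\n'
    '"NAD-SYNTH-CPLX_RXN"\t[{"molecule": "NAD-SYNTH-CPLX", "coeff": 1.0}, '
    '{"molecule": "NAD-SYNTH-MONOMER", "coeff": -2.0}]\n'
)

# In-memory stand-in for a hand-maintained complexation_reactions_added.tsv.
added_tsv = io.StringIO(
    '"id"\t"stoichiometry"\n'
    '"N-ACETYLTRANSFER-CPLX_RXN"\t[{"molecule": "N-ACETYLTRANSFER-CPLX", "coeff": 1.0}, '
    '{"molecule": "N-ACETYLTRANSFER-MONOMER", "coeff": -6.0}]\n'
)

merged = merge_reactions(read_reactions(pulled_tsv), read_reactions(added_tsv))
for rxn_id in sorted(merged):
    print(rxn_id, len(merged[rxn_id]))
```

Letting the pulled file win on duplicate IDs is a design choice in this sketch; raising an error on duplicates instead would make stale manual additions easier to spot after each pull.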
This PR updates major raw data flat files to the latest versions of those files pulled from EcoCyc, and makes sweeping changes across the reconstruction process in an attempt to make future updates of these files easier. I apologize for the overblown size of the PR - many more changes were needed to make this possible than I had anticipated.
I've temporarily disabled some transcription factors that were leading to solver errors in the ParCa, and have not yet tested some of the existing analysis scripts. Some manual edits had to be made to the imported files to get the model to run normally, and I'll be working to eliminate as many of these manual changes as possible in future PRs.
In reconstruction/ecoli/flat/amino_acid_pathways.tsv, an enzyme involved in proline biosynthesis (PROLINEMULTI-CPLX[c]) was removed because the enzyme no longer exists in the latest files. @tahorst, let me know if this causes any issues for you.
Here are some remaining TODOs for this PR: