Add complexed enzymes as possible catalysts while building sim_data #1088

ggsun · 2021-06-12T00:51:57Z

This PR changes how sim_data.metabolism is built so that any protein complex containing an enzyme as a subunit always retains that enzyme's catalytic activity. This makes the manual changes previously made to the metabolic_reactions.tsv file unnecessary. I also pulled two hard-coded parameters out of the process/metabolism.py file into a flat file of their own.
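As a rough illustration of the idea (not the actual code in this PR), the expansion could be sketched as below. The function name `add_complexed_catalysts`, the two input dicts, and all molecule IDs are hypothetical stand-ins; the real logic lives in the code that builds sim_data.metabolism.

```python
from typing import Dict, List, Set


def add_complexed_catalysts(
        reaction_catalysts: Dict[str, List[str]],
        complex_subunits: Dict[str, List[str]],
        ) -> Dict[str, List[str]]:
    """Return a copy of reaction_catalysts in which every complex that
    contains a listed enzyme (directly or through nested complexes) is
    also listed as a catalyst. All names here are hypothetical."""
    # Invert complex -> subunits into subunit -> complexes containing it
    containing: Dict[str, Set[str]] = {}
    for cplx, subunits in complex_subunits.items():
        for subunit in subunits:
            containing.setdefault(subunit, set()).add(cplx)

    expanded = {}
    for rxn, enzymes in reaction_catalysts.items():
        found = list(enzymes)
        seen = set(enzymes)
        stack = list(enzymes)
        # Walk upward from each enzyme to every complex that contains it
        while stack:
            molecule = stack.pop()
            for cplx in sorted(containing.get(molecule, ())):
                if cplx not in seen:
                    seen.add(cplx)
                    found.append(cplx)
                    stack.append(cplx)
        expanded[rxn] = found
    return expanded


# Toy data with made-up IDs
reaction_catalysts = {'RXN-A': ['ENZ-1'], 'RXN-B': ['ENZ-2']}
complex_subunits = {'CPLX-1': ['ENZ-1', 'PROT-X'], 'CPLX-2': ['CPLX-1']}
expanded = add_complexed_catalysts(reaction_catalysts, complex_subunits)
```

In this toy case RXN-A picks up both CPLX-1 and the nested CPLX-2, while RXN-B is unchanged because ENZ-2 is not part of any complex.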

tahorst

Looks good! It's great automating out manual changes but this also makes it more important to have a way of easily looking at data produced from sim_data or create intermediate files from raw_data before sim_data as we've talked about before. In this case, seeing all enzymes for each reaction could be helpful for troubleshooting or analysis but we won't see that in the flat files.

Do you know how many reactions are affected by these changes? Is it just the ones you manually changed or a lot more?

Should we also do the same for equilibrium reactions in addition to complexation? Before #1065 we had to add PUTA-CPLXBND to some reactions catalyzed by PUTA-CPLX which would be included in equilibrium.

tahorst · 2021-06-12T15:38:43Z

reconstruction/ecoli/flat/metabolism_parameters.tsv

@@ -0,0 +1,3 @@
+name	value	units	_source	_comments
+"ppi_concentration"	5e-4	"units.mol/units.L"	"multiple sources"
+"pH"	7.2	""


Should we move other metabolism parameters from parameters.tsv here, or is it better to just stick these in parameters.tsv? Not sure if we want to trade a growing number of parameter files for more distinction between the parameters in them. We could also use comment rows to distinguish groups (metabolism, charging, ppGpp, etc.) within a single parameters.tsv file.

Ahh that's a good point. I like the idea of just differentiating between different parameter types within a single file. I can also think of having just two different parameter files - one for parameters that are experimentally measured, and the other for "modeling parameters" like the kinetic objective weight.

I can also think of having just two different parameter files - one for parameters that are experimentally measured, and the other for "modeling parameters" like the kinetic objective weight

That does seem like a good distinction to make
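For reference, the single-file option with comment rows could look something like the sketch below. It assumes that a row starting with `#` is a comment that names the group for the rows that follow; that convention, the function `read_grouped_parameters`, the `kinetic_objective_weight` identifier, and its value are all hypothetical. Only the ppi_concentration and pH rows come from the diff above.

```python
import csv

# Hypothetical layout of a single parameters.tsv with group comment rows
EXAMPLE_TSV = '\n'.join([
    '"name"\t"value"\t"units"',
    '# metabolism',
    '"ppi_concentration"\t5e-4\t"units.mol/units.L"',
    '"pH"\t7.2\t""',
    '# modeling parameters',
    '"kinetic_objective_weight"\t0.5\t""',  # placeholder value, not the model's
])


def read_grouped_parameters(text):
    """Map each comment-row group name to a {name: value} dict."""
    groups = {}
    current = 'ungrouped'  # used for rows before the first comment row
    header = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith('#'):
            # A comment row starts a new group
            current = line.lstrip('#').strip()
            continue
        row = next(csv.reader([line], delimiter='\t'))
        if header is None:
            header = row
            continue
        record = dict(zip(header, row))
        groups.setdefault(current, {})[record['name']] = float(record['value'])
    return groups


groups = read_grouped_parameters(EXAMPLE_TSV)
```

The same reader would work for the two-file split as well; the groups would then just be sections within the "measured" and "modeling" files.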

ggsun · 2021-06-14T21:56:18Z

Looks good! It's great automating out manual changes but this also makes it more important to have a way of easily looking at data produced from sim_data or create intermediate files from raw_data before sim_data as we've talked about before. In this case, seeing all enzymes for each reaction could be helpful for troubleshooting or analysis but we won't see that in the flat files.

I totally agree with this. It could also be a good undergrad/rotation starter project. I'll open an issue proposing it.

Do you know how many reactions are affected by these changes? Is it just the ones you manually changed or a lot more?

With this change, a total of 118 reactions get more enzymes associated with them than the original flat file suggests.

Should we also do the same for equilibrium reactions in addition to complexation? Before #1065 we had to add PUTA-CPLXBND to some reactions catalyzed by PUTA-CPLX which would be included in equilibrium.

I didn't expect to find any equilibrium proteins acting as enzymes so thanks for letting me know - do you know if this is a rare case or is more common (TFs also having enzymatic activity)? Would it be safe to assume metabolite-bound proteins have the same enzymatic activity?

tahorst · 2021-06-14T22:14:28Z

With this change, a total of 118 reactions get more enzymes associated with them than the original flat file suggests.

Wow quite a bit! I wonder if any of them would help with cases where we have enzymes go to 0 counts. I think I've checked most of those enzymes on EcoCyc and they weren't part of larger complexes but I'd imagine this change makes things more robust!

do you know if this is a rare case or is more common (TFs also having enzymatic activity)?

I would imagine fairly rare, but several more cases probably exist. I know AlaS is also supposed to regulate its own expression as well as act as a synthetase, but we don't model it as a TF and the Ala binding is also disabled to make handling it as a synthetase easier.

Would it be safe to assume metabolite-bound proteins have the same enzymatic activity?

That is a great question. I feel like binding would change the conformation enough to change kinetics of reactions if not completely remove the ability to catalyze a reaction but it's also probably dependent on each molecule. We can probably assume the same activity for now unless we find an example otherwise.
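A minimal sketch of that "same activity unless we find otherwise" assumption, with a per-molecule opt-out for any bound form later shown to be inactive. The function `add_bound_forms`, the `bound_forms` mapping, and the reaction ID are hypothetical; only the PUTA-CPLX / PUTA-CPLXBND pair comes from this thread.

```python
def add_bound_forms(reaction_catalysts, bound_forms, excluded=frozenset()):
    """List each catalyst's ligand-bound forms as catalysts too,
    skipping any bound form named in excluded."""
    expanded = {}
    for rxn, enzymes in reaction_catalysts.items():
        catalysts = list(enzymes)
        for enzyme in enzymes:
            for bound in bound_forms.get(enzyme, []):
                if bound not in excluded and bound not in catalysts:
                    catalysts.append(bound)
        expanded[rxn] = catalysts
    return expanded


# Hypothetical reaction ID; molecule IDs are the ones mentioned above
reaction_catalysts = {'RXN-HYPOTHETICAL': ['PUTA-CPLX']}
bound_forms = {'PUTA-CPLX': ['PUTA-CPLXBND']}

same_activity = add_bound_forms(reaction_catalysts, bound_forms)
no_bound_activity = add_bound_forms(
    reaction_catalysts, bound_forms, excluded={'PUTA-CPLXBND'})
```

Keeping the exclusions as explicit data would make any exception we find later a one-line change rather than a code change.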

1fish2 · 2021-06-16T01:00:13Z

... this also makes it more important to have a way of easily looking at data produced from sim_data or create intermediate files from raw_data before sim_data as we've talked about before. In this case, seeing all enzymes for each reaction could be helpful for troubleshooting or analysis but we won't see that in the flat files.

You can look through sim_data by running this in a Python Console in PyCharm:

import pickle

path = 'out/manual/kb/simData.cPickle'
with open(path, 'rb') as f:  # close the file once loading is done
    sim_data = pickle.load(f)

Then click the "Show Variables" eyeglasses icon if needed and expand the sim_data tree.

Do you know how many reactions are affected by these changes? Is it just the ones you manually changed or a lot more?

Is compareParca useful for this? We could add a way to filter the comparison.

tahorst · 2021-06-16T02:58:27Z

Then click the "Show Variables" eyeglasses icon if needed and expand the sim_data tree.

That looks super useful! Might be a little overwhelming but much easier than doing things from the command line. There still might be some cases where an exported table format would be preferred but this definitely can make exploring sim_data a lot easier!
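For the cases where an exported table would be easier, something like the sketch below could work. It assumes the reaction-to-enzymes mapping has already been pulled out of sim_data into a plain dict; where that mapping lives in sim_data is not shown here, and `write_reaction_enzyme_table` and the toy IDs are hypothetical.

```python
import csv
import io


def write_reaction_enzyme_table(reaction_catalysts, handle):
    """Write one tab-separated row per (reaction, enzyme) pair."""
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    writer.writerow(['reaction', 'enzyme'])
    for rxn in sorted(reaction_catalysts):
        for enzyme in reaction_catalysts[rxn]:
            writer.writerow([rxn, enzyme])


# Toy mapping with made-up IDs, standing in for data taken from sim_data
reaction_catalysts = {'RXN-B': ['ENZ-2'], 'RXN-A': ['ENZ-1', 'CPLX-1']}

# Written to an in-memory buffer here; an open file handle works the same way
buffer = io.StringIO()
write_reaction_enzyme_table(reaction_catalysts, buffer)
table_text = buffer.getvalue()
```

A long format with one pair per row keeps the table easy to filter or diff, at the cost of repeating the reaction ID.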

Is compareParca useful for this? We could add a way to filter the comparison.

Probably. I forget exactly how the enzymes get stored; I think it may be a dict mapping reactions to enzymes. I'd imagine it would pick up on all the differences, hopefully in an easy-to-read way.
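If the enzymes are stored as a dict of reactions to enzymes (an assumption, as noted above), counting the affected reactions by hand is a small diff. This is only a sketch of that comparison, not what compareParca does; `reactions_with_added_enzymes` and the toy data are hypothetical.

```python
def reactions_with_added_enzymes(before, after):
    """Return {reaction: sorted list of enzymes present only in after}."""
    gained = {}
    for rxn, enzymes in after.items():
        added = sorted(set(enzymes) - set(before.get(rxn, ())))
        if added:
            gained[rxn] = added
    return gained


# Toy before/after mappings with made-up IDs
before = {'RXN-A': ['ENZ-1'], 'RXN-B': ['ENZ-2']}
after = {'RXN-A': ['ENZ-1', 'CPLX-1'], 'RXN-B': ['ENZ-2']}

gained = reactions_with_added_enzymes(before, after)
n_affected = len(gained)
```

Running this on the mappings from two sim_data builds, one from before and one from after the change, would give both the count and the list of added enzymes per reaction.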

ggsun added 2 commits June 11, 2021 17:09

Add complexed enzymes as catalysts while building sim_data

be852d1

Add concentration parameters to new metabolism_parameters.tsv file

8c69b2f

tahorst approved these changes Jun 12, 2021

ggsun added 2 commits June 14, 2021 16:06

Fix mypy type error

3356eb1

Move metabolism parameters to existing parameters.tsv file

df45b8d

ggsun force-pushed the ecocyc-cleanup2 branch from 808b77e to df45b8d Compare June 14, 2021 23:06

ggsun merged commit 7a8eac7 into master Jun 15, 2021

ggsun deleted the ecocyc-cleanup2 branch June 15, 2021 00:49

ggsun mentioned this pull request Jun 15, 2021

Remove rows in all *_removed.tsv files when building raw data #1087

Merged

ggsun commented Jun 12, 2021

tahorst left a comment

tahorst Jun 12, 2021

ggsun Jun 14, 2021

tahorst Jun 14, 2021

ggsun commented Jun 14, 2021

tahorst commented Jun 14, 2021

1fish2 commented Jun 16, 2021

tahorst commented Jun 16, 2021
