mmpdb transform behaves unexpectedly #45

Open
mu-wang opened this issue Feb 5, 2022 · 5 comments

mu-wang commented Feb 5, 2022

The transform rules in mmpdblib appear to miss some obvious cases.

A test case with the following structures:

OC(c(cccc1)c1O)=O	mol1
CCCCCCCC(c(cc1)cc(C(O)=O)c1O)=O	mol2
CCCCCC(c(cc1)cc(C(O)=O)c1O)=O	mol3

with some properties:

ID	prop
mol1	0.0
mol2	1.0
mol3	1.5
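
For anyone reproducing this, the two inputs can be written out with a short script. This is a minimal sketch: the file names `test_struct.tsv` and `test_prop.tsv` are the ones used in the commands in this issue, while the helper names `render_inputs` and `write_inputs` are hypothetical, and the tab-separated layout (no header row in the structure file) is an assumption based on the listings above.

```python
from pathlib import Path

# Structures and property values copied from the listings in this issue.
STRUCTURES = [
    ("OC(c(cccc1)c1O)=O", "mol1"),
    ("CCCCCCCC(c(cc1)cc(C(O)=O)c1O)=O", "mol2"),
    ("CCCCCC(c(cc1)cc(C(O)=O)c1O)=O", "mol3"),
]
PROPERTIES = [("mol1", "0.0"), ("mol2", "1.0"), ("mol3", "1.5")]


def render_inputs():
    """Return the text of the structure file and the property file."""
    # Assumed layout: "SMILES<TAB>ID" per line, no header row.
    struct_text = "".join(f"{smiles}\t{name}\n" for smiles, name in STRUCTURES)
    # Property file: a header row "ID<TAB>prop", then one row per molecule.
    prop_text = "ID\tprop\n" + "".join(
        f"{name}\t{value}\n" for name, value in PROPERTIES
    )
    return struct_text, prop_text


def write_inputs(directory="."):
    """Write test_struct.tsv and test_prop.tsv into the given directory."""
    struct_text, prop_text = render_inputs()
    directory = Path(directory)
    struct_path = directory / "test_struct.tsv"
    prop_path = directory / "test_prop.tsv"
    struct_path.write_text(struct_text)
    prop_path.write_text(prop_text)
    return struct_path, prop_path
```

Calling `write_inputs()` in the working directory produces the two files that the fragment and loadprops steps read.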

I performed the fragmentation, indexing, and property loading as instructed.

python -m mmpdblib fragment test_struct.tsv --max-rotatable-bonds 20 --num-cuts 3 -o test.fragments
python -m mmpdblib index test.fragments -o test.mmpdb
python -m mmpdblib loadprops --properties test_prop.tsv test.mmpdb

The indexed pairs make sense.

However, when I run:

python -m mmpdblib transform --smiles 'OC(c(cccc1)c1O)=O' test.mmpdb --explain

I noticed that I cannot get mol2 or mol3, even though the rules mol1->mol2 and mol1->mol3 were created in the index step. Did I miss something here? Thank you for your help.

Here's the explanation output:

WARNING: APSW not installed. Falling back to Python's sqlite3 module.
Processing fragment Fragmentation(1, 'N', 7, '1', '*c1ccccc1O', '0', 3, '1', '*C(=O)O', 'O=CO')
  variable '*c1ccccc1O' not found as SMILES '[*:1]c1ccccc1O'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(1, 'N', 3, '1', '*C(=O)O', '0', 7, '1', '*c1ccccc1O', 'Oc1ccccc1')
  variable '*C(=O)O' not found as SMILES '[*:1]C(=O)O'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(2, 'N', 6, '11', '*c1ccccc1*', '01', 4, '12', '*C(=O)O.*O', None)
  variable '*c1ccccc1*' not found as SMILES '[*:1]c1ccccc1[*:2]'
  variable '*c1ccccc1*' not found as SMILES '[*:2]c1ccccc1[*:1]'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(1, 'N', 1, '1', '*O', '0', 9, '1', '*c1ccccc1C(=O)O', 'O=C(O)c1ccccc1')
  variable '*O' not found as SMILES '[*:1]O'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(1, 'N', 9, '1', '*c1ccccc1C(=O)O', '0', 1, '1', '*O', 'O')
  variable '*c1ccccc1C(=O)O' not found as SMILES '[*:1]c1ccccc1C(=O)O'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(2, 'N', 6, '11', '*c1ccccc1*', '01', 4, '12', '*O.*C(=O)O', None)
  variable '*c1ccccc1*' not found as SMILES '[*:1]c1ccccc1[*:2]'
  variable '*c1ccccc1*' not found as SMILES '[*:2]c1ccccc1[*:1]'
  No matching rule SMILES found. Skipping fragment.
== Product SMILES in database: 0 ==
ID	SMILES	prop_from_smiles	prop_to_smiles	prop_radius	prop_fingerprint	prop_rule_environment_id	prop_count	prop_avg	prop_std	prop_kurtosis	prop_skewness	prop_min	prop_q1	prop_median	prop_q3	prop_max	prop_paired_t	prop_p_value
@mu-wang mu-wang changed the title Trasnform for larger fragments mmpdb transform behaves unexpectedly Feb 5, 2022
Contributor

adalke commented Apr 12, 2022

I believe what's happening is that transform works on the variable part, and hydrogens are handled as a special case rather than as the variable *[H].

If so, I don't remember whether transformation from a hydrogen was deliberately left out of the "transform" operation or whether it was an oversight.

Contributor

adalke commented Apr 12, 2022

As Jérôme and Christian point out, hydrogen transformations were explicitly not included as there would be too many.

The transform command lets you pick a specific hydrogen to consider by writing it as an explicit [H] in the SMILES string.

However, that code path has not been used for years and it does not work in the main mmpdb release. (RDKit changed its wildcard representation from [*] to * about five years ago, and mmpdb used a hard-coded [*][H] to recognize the cut hydrogen SMILES fragment.)
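
The mismatch can be illustrated with a few lines of Python. This is only a sketch of the failure mode, not mmpdb's actual code: the function names and the normalization step are hypothetical, and only the two spellings `[*][H]` and `*[H]` come from the comment above.

```python
# Hypothetical illustration of the wildcard mismatch; mmpdb's real code differs.
LEGACY_HYDROGEN_FRAGMENT = "[*][H]"   # spelling the old hard-coded check expected
CURRENT_HYDROGEN_FRAGMENT = "*[H]"    # spelling after RDKit's wildcard change


def is_hydrogen_fragment_strict(fragment_smiles):
    """Exact comparison against the legacy spelling only."""
    return fragment_smiles == LEGACY_HYDROGEN_FRAGMENT


def is_hydrogen_fragment_tolerant(fragment_smiles):
    """Accept either wildcard spelling by normalizing '[*]' to '*' first."""
    normalized = fragment_smiles.replace("[*]", "*")
    return normalized == CURRENT_HYDROGEN_FRAGMENT
```

With the strict check, a cut hydrogen written as `*[H]` is never recognized, so the hydrogen code path is silently skipped; the tolerant variant accepts both spellings.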

The fixed code is in the v3 development version, available from https://github.com/adalke/mmpdb/tree/v3-dev .

Author

mu-wang commented Jun 7, 2022

Hi @adalke, thank you for your help. I will try the v3-dev version of mmpdb.

@djhuggins

Hello. I am restarting this thread as I have a follow-up question. I am able to specify a hydrogen to consider by denoting it with an explicit [H] in the SMILES string. I also note that one can specify multiple hydrogens this way and vary them all.

However, it appears that if you specify one or more hydrogens then only the specified hydrogen position(s) are modified, with the rest of the molecule remaining unchanged. Is there a way to vary the hydrogen(s) and the rest of the molecule? Or a flag to vary all hydrogens? I understand this may generate large numbers of compounds.

Thanks.

@KramerChristian
Contributor

Hi DJ,

Back when we first designed mmpdb, we found that the number of generated compounds can become extremely large if you allow both hydrogen and fragment exchanges. In fact, depending on your database, the number can already become very large if you only allow replacing all hydrogens. We therefore decided to allow either the exchange of explicit hydrogens or the exchange of fragments that include at least one heavy atom, but not both at once.
You can change this behaviour, but you'd have to hack the code a bit. Before doing that, I'd recommend testing whether the output is still manageable for what you want to do with it: just make all hydrogens explicit in the input molecule and see what happens.

Bests,
Christian
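
One way to try the suggestion above is to let RDKit add the hydrogens and write them out explicitly. This is a minimal sketch, assuming RDKit is installed (mmpdb depends on it). The helper name `explicit_hydrogen_smiles` is hypothetical, and whether `mmpdb transform` handles every explicit hydrogen in the resulting string, including those on heteroatoms, has not been checked here.

```python
from rdkit import Chem


def explicit_hydrogen_smiles(smiles):
    """Return a SMILES string in which every hydrogen is written as [H]."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # AddHs turns implicit hydrogens into real atoms in the molecular graph,
    # so MolToSmiles writes each of them out as an explicit [H].
    mol_with_hs = Chem.AddHs(mol)
    return Chem.MolToSmiles(mol_with_hs)
```

The returned string can then be passed to `--smiles` in the transform command from the original report to see how many products come back.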
