Failure with mmpdb fragment for some specific smiles #30

Open
chengthefang opened this issue Apr 4, 2021 · 7 comments

@chengthefang

chengthefang commented Apr 4, 2021

Hi all,

I am using mmpdb fragment to process a subset of the SureChEMBL database, and I found that mmpdb fragment fails on some specific SMILES. I wonder if we could add some error handling to deal with such unfavorable structures.

Here is an example input file, test.smi:

C[C@]12CCC3c4c5cc(O)cc4[C@@]4(CC[C@@]1(C4)C3CC5)[C@@H]2O SCHEMBL9251776
Oc1ccccc1 phenol
Oc1ccccc1O catechol
Oc1ccccc1N 2-aminophenol
Oc1ccccc1Cl 2-chlorophenol
Nc1ccccc1N o-phenylenediamine
Nc1cc(O)ccc1N amidol
Oc1cc(O)ccc1O hydroxyquinol
Nc1ccccc1 phenylamine
C1CCCC1N cyclopentanol

I ran "python mmpdb/mmpdb fragment test.smi -o test_data.fragments". It failed while parsing the first SMILES and did not skip that record and continue. The error is shown below:

Failure: file 'test.smi', line 1, record #1: first line starts 'C[C@]12CCC3c4c5cc(O)cc4[C@@]4(CC[C@@]1(C ...'
Traceback (most recent call last):
  File "mmpdb/mmpdb", line 11, in <module>
    commandline.main()
  File "/mmpdb/mmpdblib/commandline.py", line 1054, in main
    parsed_args.command(parsed_args.subparser, parsed_args)
  File "/mmpdb/mmpdblib/commandline.py", line 181, in fragment_command
    do_fragment.fragment_command(parser, args)
  File "/mmpdb/mmpdblib/do_fragment.py", line 581, in fragment_command
    writer.write_records(records)
  File "/mmpdb/mmpdblib/fragment_io.py", line 404, in write_records
    for rec in fragment_records:
  File "/mmpdb/mmpdblib/do_fragment.py", line 475, in make_fragment_records
    fragments = result.get()
  File "anaconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
ValueError: need more than 1 value to unpack

I would appreciate any suggestions or ideas.

Thanks,
Cheng
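
The traceback above shows an exception from a worker being re-raised by result.get() in the parent process, which stops the whole run. As a minimal sketch of the per-record error handling asked for here, the caller could catch that exception and move on to the next record. This is not mmpdb's actual code: fragment_one and fragment_all are hypothetical names, the failure is simulated, and a thread pool stands in for the process pool so the example is self-contained.

```python
from multiprocessing.pool import ThreadPool


def fragment_one(record):
    """Hypothetical stand-in for a per-record fragmentation worker."""
    smiles, record_id = record
    if "@" in smiles:
        # Simulate an unexpected failure inside the worker, similar in kind
        # to the unpacking error in the traceback (assumption, for the demo).
        first, second = [smiles]
    return (record_id, len(smiles))


def fragment_all(records, num_workers=2):
    """Return (results, skipped); a failing record is skipped, not fatal."""
    results, skipped = [], []
    with ThreadPool(num_workers) as pool:
        pending = [(rec, pool.apply_async(fragment_one, (rec,))) for rec in records]
        for rec, async_result in pending:
            try:
                # get() re-raises any exception raised in the worker.
                results.append(async_result.get())
            except Exception as err:
                # Record the id and the error, then continue with the rest.
                skipped.append((rec[1], repr(err)))
    return results, skipped


records = [
    ("C[C@]12CCC3c4c5cc(O)cc4[C@@]4(CC[C@@]1(C4)C3CC5)[C@@H]2O", "SCHEMBL9251776"),
    ("Oc1ccccc1", "phenol"),
    ("Oc1ccccc1O", "catechol"),
]
results, skipped = fragment_all(records)
```

With this pattern the first record ends up in skipped together with its error text, while the remaining records are still processed.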

@KramerChristian
Contributor

Hi Cheng,

Thanks for pointing out this issue.

mmpdb does have functionality to skip erroneous SMILES, but this one seems to be another problem - the SMILES is complicated, but chemically correct. The most likely explanation I have so far is that there is an issue with the ring perception for bonds in RDKit. I will do some further tests to make sure I am on the right track, and if I am right, file a bug report in RDKit to solve the issue.

Will keep you posted as this continues.

Bests,
Christian

@chengthefang
Author

Hi Christian,

Thank you so much for looking into this issue. I agree that it might have something to do with the complicated ring system.

Thanks,
Cheng

@PARODBE

PARODBE commented Nov 15, 2022

Hi Christian,

I can't convert my .smi file to fragments because of a UTF-8 problem, but I don't understand why, since I specify the encoding in the code:

[image: screenshot of the code]

And the error:

[image: screenshot of the error message]

Could you help me, please?

@KramerChristian
Contributor

Hi Pablo,

I no longer develop mmpdb myself. It is now in the hands of @adalke and Jerome Hert. Maybe they can comment?

Bests,
Christian

@adalke
Contributor

adalke commented Nov 18, 2022

For @chengthefang, I cannot reproduce the problem using mmpdb3, available from https://github.com/adalke/mmpdb . Perhaps some of the changes I made for version 3 resolve your issue?

For @PARODBE, your comment is not connected to this issue. Please open a new issue instead.

It doesn't appear that your problem is connected to mmpdb. It appears to be a general RDKit question. At the very least, you don't describe how "cdk2.fragdb" was generated, or which step produced that error message.

My guess is you're showing me how you exported the SDF to SMILES format, which you then converted to a "fragdb" using mmpdb v3.

Version 2 used a text format to store the fragmentations; version 3 switched to SQLite3. You cannot use text processing to read an SQLite3 file, as it's a binary format which includes non-UTF-8 byte sequences.
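
One way to tell the two formats apart before trying to read a file as text: every SQLite 3 database file starts with the 16-byte header string "SQLite format 3" followed by a NUL byte. The sketch below checks for that header in binary mode. The helper name is_sqlite3_file, the file names, and the demo table are assumptions made up for the example; the table is not mmpdb's schema.

```python
import os
import sqlite3
import tempfile

# First 16 bytes of every SQLite 3 database file.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite3_file(path):
    """Return True if the file starts with the SQLite 3 header string."""
    with open(path, "rb") as f:  # binary mode: no text decoding is attempted
        return f.read(16) == SQLITE_MAGIC


# Build a small throwaway database so the check can be demonstrated.
# The table below is made up for the demo; it is not mmpdb's schema.
tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "demo.fragdb")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, smiles TEXT)")
conn.execute("INSERT INTO demo (smiles) VALUES ('Oc1ccccc1')")
conn.commit()
conn.close()

# A plain-text SMILES file for comparison.
text_path = os.path.join(tmpdir, "demo.smi")
with open(text_path, "w", encoding="utf-8") as f:
    f.write("Oc1ccccc1 phenol\n")

db_is_sqlite = is_sqlite3_file(db_path)
smi_is_sqlite = is_sqlite3_file(text_path)
```

If the check returns True, the file should be opened with a database client rather than with open(..., encoding="utf-8").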

@PARODBE

PARODBE commented Nov 21, 2022

Thanks, @adalke! So, in what format are the saved SMILES stored?

@adalke
Contributor

adalke commented Nov 21, 2022

It's an SQLite3 file. This is the format specified by the SQLite embedded relational database, and accessible from Python via the sqlite3 module.

The specific schema is at https://github.com/adalke/mmpdb/blob/v3-dev/mmpdblib/fragment_schema.sql .
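
As a rough illustration of reading such a file through Python's sqlite3 module, the sketch below lists the tables in a database and the columns of one table. The helper names list_tables and table_columns are hypothetical, and the in-memory database with its demo table is a made-up stand-in, not the mmpdb schema linked above.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def table_columns(conn, table):
    """Return the column names of a table, using PRAGMA table_info."""
    # Quote the identifier; PRAGMA arguments cannot be bound as parameters.
    quoted = '"' + table.replace('"', '""') + '"'
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]


# In-memory stand-in; with a real file, use sqlite3.connect("cdk2.fragdb").
# The table here is made up for the demo and is not mmpdb's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, smiles TEXT)")
tables = list_tables(conn)
columns = table_columns(conn, "demo")
conn.close()
```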

Your question is not related to issue #30, so please do not continue asking questions in this thread. Also, I am not willing to provide additional support on how to use SQL or SQLite. There are many existing teaching resources for those topics.
