DrugBank 5.0: extract pubmed IDs for references #2
Looks like DrugBank 5.0 uses a different schema for references. From https://www.drugbank.ca/releases/5-0-7/downloads/all-full-database, I see the following XML:
<references>
<articles>
<article>
<pubmed-id>10505536</pubmed-id>
<citation>Turpie AG: Anticoagulants in acute coronary syndromes. Am J Cardiol. 1999 Sep 2;84(5A):2M-6M.</citation>
</article>
<article>
<pubmed-id>10912644</pubmed-id>
<citation>Warkentin TE: Venous thromboembolism in heparin-induced thrombocytopenia. Curr Opin Pulm Med. 2000 Jul;6(4):343-51.</citation>
</article>
<article>
<pubmed-id>11055889</pubmed-id>
<citation>Eriksson BI: New therapeutic options in deep vein thrombosis prophylaxis. Semin Hematol. 2000 Jul;37(3 Suppl 5):7-9.</citation>
</article>
<article>
<pubmed-id>11467439</pubmed-id>
<citation>Fabrizio MC: Use of ecarin clotting time (ECT) with lepirudin therapy in heparin-induced thrombocytopenia and cardiopulmonary bypass. J Extra Corpor Technol. 2001 May;33(2):117-25.</citation>
</article>
<article>
<pubmed-id>11807012</pubmed-id>
<citation>Szaba FM, Smiley ST: Roles for thrombin and fibrin(ogen) in cytokine/chemokine production and macrophage adhesion in vivo. Blood. 2002 Feb 1;99(3):1053-9.</citation>
</article>
<article>
<pubmed-id>11752352</pubmed-id>
<citation>Chen X, Ji ZL, Chen YZ: TTD: Therapeutic Target Database. Nucleic Acids Res. 2002 Jan 1;30(1):412-5.</citation>
</article>
</articles>
<textbooks/>
<links/>
</references>
So you have to modify the code to:
pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
row['pubmed_ids'] = '|'.join(x.text for x in pubmed_ids)
Let us know whether this works. Pull requests to upgrade this repo to DrugBank 5.0 would also be of interest.
Thanks for your quick response!
Your suggested XPath query seems to work; there are only 3 entries where *None* is returned, which might be a database issue. I have no further upgrades to the repo for DrugBank 5.0 compatibility, so please go ahead with this (minor) change.
In case it helps anyone else, the following changes (based on the suggestion above) fixed the issue for me:
pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
row['pubmed_ids'] = '|'.join([x.text for x in pubmed_ids if x.text is not None])
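For anyone who wants to try the fix in isolation, here is a minimal, self-contained sketch. The namespace string, the `<target>` wrapper element, and the empty `<pubmed-id/>` entry are assumptions made for illustration (the empty element stands in for the entries that returned None); check them against your own DrugBank download.

```python
import xml.etree.ElementTree as ET

# Assumed namespace in ElementTree's "{uri}" form; verify it against your file.
ns = "{http://www.drugbank.ca}"

# Hypothetical, trimmed-down fragment modelled on the XML quoted in this thread.
# The second article has an empty <pubmed-id/>, so its .text is None.
xml = """
<target xmlns="http://www.drugbank.ca">
  <references>
    <articles>
      <article>
        <pubmed-id>10505536</pubmed-id>
      </article>
      <article>
        <pubmed-id/>
      </article>
      <article>
        <pubmed-id>10912644</pubmed-id>
      </article>
    </articles>
    <textbooks/>
    <links/>
  </references>
</target>
"""

protein = ET.fromstring(xml)
row = {}

# "//" matches <pubmed-id> at any depth below <references>,
# which covers the new <articles>/<article> nesting.
pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))

# Skip elements without text; joining None values would raise a TypeError.
row["pubmed_ids"] = "|".join(x.text for x in pubmed_ids if x.text is not None)
print(row["pubmed_ids"])
```

Without the `is not None` filter, the join fails on the empty element, which is presumably what the handful of None entries would trigger.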
This doesn't seem to catch anything on the latest DrugBank 5 release.
Is there a bugfix for this?
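One possible cause, which nothing in this thread confirms, is that the `ns` string no longer matches the namespace declared in the newer file; in that case `findall` silently returns an empty list. A rough way to check is to read the namespace from the root element instead of hard-coding it. The helper `namespace_of` and the example namespace URIs below are hypothetical.

```python
import xml.etree.ElementTree as ET


def namespace_of(element):
    """Return the '{uri}' prefix of an element's tag, or '' if it has none."""
    tag = element.tag
    return tag[: tag.index("}") + 1] if tag.startswith("{") else ""


# Hypothetical stand-in for a DrugBank file; the namespace URI is made up.
xml = (
    '<drugbank xmlns="http://example.org/some-namespace">'
    "<drug><references><articles><article>"
    "<pubmed-id>11752352</pubmed-id>"
    "</article></articles></references></drug>"
    "</drugbank>"
)
root = ET.fromstring(xml)

# Namespace actually declared in the document.
ns = namespace_of(root)
query = ".//{ns}references//{ns}pubmed-id"

# Matches, because ns was read from the file itself.
hits = root.findall(query.format(ns=ns))

# A stale, hard-coded namespace matches nothing and raises no error.
stale = root.findall(query.format(ns="{http://example.org/old-namespace}"))

print(ns, [h.text for h in hits], stale)
```

If the namespace printed for your file differs from the one the script uses, updating `ns` is the first thing to try; if it is the same, the cause lies elsewhere.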