Failing to retrieve UniProt data #100

Closed
HobnobMancer opened this issue Nov 18, 2022 · 4 comments · Fixed by #103

@HobnobMancer
Owner

Describe the bug

When using cw_get_uniprot_data to retrieve data from UniProt, no data is retrieved or added to the local CAZyme database.

To Reproduce

  1. Build a local CAZyme database: cazy_webscraper <email> -o cazy.db
  2. cw_get_uniprot_data cazy.db --families 20 --pdb
Built output directory: .cazy_webscraper_2022-11-18_20-03-08/uniprot_data_retrieval
Using default CAZy class synonyms
Retrieving GenBank accessions for selected CAZy classes: 0it [00:00, ?it/s]
Applying CAZy family filter(s)
Retrieving GenBank accessions for selected CAZy families:   0%|                                                           | 0/1 [00:00<?, ?it/s]Retrieving CAZymes for CAZy family PL20
Retrieving GenBank accessions for selected CAZy families: 100%|███████████████████████████████████████████████████| 1/1 [00:00<00:00, 20.02it/s]
Applying no taxonomic filters
Retrieving UniProt data for 76
Batch retrieving UniProt IDs: 11it [00:00, 15.03it/s]                                                                                           
Batch retrieving protein data from UniProt: 0it [00:00, ?it/s]
Adding data to the local CAZyme database
Retrieving existing UniProt records from db: 0it [00:00, ?it/s]
Separating new and existing records: 0it [00:00, ?it/s]
Loading existing PDB db records: 0it [00:00, ?it/s]
Identifying new PDBs to add to db: 0it [00:00, ?it/s]
Loading existing Genbank_Pdbs db records: 0it [00:00, ?it/s]
Identifying new protein-PDB relationships to add to db: 0it [00:00, ?it/s]

No data is retrieved from UniProt.

Expected behavior

Retrieve data from UniProt and add to the local CAZyme database

@HobnobMancer HobnobMancer added the bug Something isn't working label Nov 18, 2022
@HobnobMancer HobnobMancer self-assigned this Nov 18, 2022
@mherold1

mherold1 commented Dec 6, 2022

Hi,
While testing the tool I ran into problems at this step. After some searching I found that the requests made via
get_uniprot_accessions() from https://github.com/HobnobMancer/saintBioutils/blob/master/saintBioutils/uniprot/__init__.py were failing.
Apparently the UniProt API has recently changed.
Maybe this is helpful for replacing the queries: https://github.com/multimeric/Unipressed
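
For context, the current UniProt REST service handles ID mapping as an asynchronous job: submit the accessions, poll the job, then read the results. The sketch below illustrates that flow using only the standard library. It is not the code used by saintBioutils or cazy_webscraper. The helper names, the choice of EMBL-GenBank-DDBJ_CDS as the source database, and the polling settings are all assumptions, and only the first page of results is read.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://rest.uniprot.org/idmapping"


def build_run_request(accessions, from_db="EMBL-GenBank-DDBJ_CDS", to_db="UniProtKB"):
    """Build the POST request that submits an ID mapping job (no network access)."""
    form = urllib.parse.urlencode(
        {"from": from_db, "to": to_db, "ids": ",".join(accessions)}
    ).encode("utf-8")
    return urllib.request.Request(f"{API_URL}/run", data=form, method="POST")


def fetch_json(url_or_request):
    """Open a URL or prepared request and decode the JSON body."""
    with urllib.request.urlopen(url_or_request) as response:
        return json.load(response)


def map_to_uniprot(accessions, poll_seconds=3, max_polls=20):
    """Hypothetical helper: return {source accession: UniProt accession}."""
    job_id = fetch_json(build_run_request(accessions))["jobId"]

    for _ in range(max_polls):
        # urllib follows the redirect to the results once the job has finished.
        payload = fetch_json(f"{API_URL}/status/{job_id}")
        status = payload.get("jobStatus")
        if status in ("NEW", "RUNNING"):
            time.sleep(poll_seconds)
            continue
        if "results" not in payload and status == "FINISHED":
            payload = fetch_json(f"{API_URL}/results/{job_id}")
        if "results" not in payload:
            raise RuntimeError(f"ID mapping job {job_id} ended with status {status!r}")

        mapping = {}
        for hit in payload["results"]:
            target = hit["to"]
            # "to" may be a full entry (dict) or a bare accession (str).
            mapping[hit["from"]] = (
                target["primaryAccession"] if isinstance(target, dict) else target
            )
        return mapping

    raise TimeoutError(f"ID mapping job {job_id} did not finish in time")
```

An empty dictionary returned from map_to_uniprot would correspond to the "0it" lines in the log above: accessions were submitted but nothing was mapped.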

@HobnobMancer
Owner Author

Hi,

Thanks for using cazy_webscraper!

I found the cause of the issue a couple of weeks back. The problem wasn't in saintBioutils; the minimum required version of bioservices needed to be raised. I forgot to document this here, so my bad!

The Fix
If you install the latest version of bioservices then cazy_webscraper will be able to communicate with the new UniProt API.

The required bioservices version will be updated shortly.
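
To see which bioservices version is actually installed in the environment, a minimal sketch like the following can help. The installed_version helper is a hypothetical name for illustration, and no minimum version is hard-coded because this thread does not state one.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("bioservices")
    if found is None:
        print("bioservices is not installed in this environment")
    else:
        # Compare this against the minimum listed in the cazy_webscraper requirements.
        print(f"bioservices {found} is installed")
```

If the reported version is older than the one listed in the cazy_webscraper requirements, upgrading with pip install --upgrade bioservices is the usual route.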

In the next couple of weeks, we will also be altering how cazy_webscraper links NCBI protein version accessions to their corresponding records in UniProt. A more robust method for identifying related records (i.e. linking an NCBI protein record to its corresponding UniProt record) is planned for 2.2.4.

@mherold1

mherold1 commented Dec 7, 2022

Thanks for the quick response.
Are you sure that the issue is related to the bioservices version? I had 1.10.4 (the latest?) installed.
I went through the script:
https://github.com/HobnobMancer/cazy_webscraper/blob/master/cazy_webscraper/expand/uniprot/get_uniprot_data.py
For me it fails at the EMBL-to-UniProt accession mapping step, which goes through saintBioutils. The returned uniprot_gkb_dict is empty at:
https://github.com/HobnobMancer/cazy_webscraper/blob/master/cazy_webscraper/expand/uniprot/get_uniprot_data.py#L180
When testing the other script (saintBioutils), it was failing at the request (or L94):
https://github.com/HobnobMancer/saintBioutils/blob/master/saintBioutils/uniprot/__init__.py#L98
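
An empty mapping like this is easy to miss because the later steps simply iterate over nothing. As an illustration only, a guard along these lines would make the failure visible straight after the mapping step. The require_mapping helper is hypothetical and not part of cazy_webscraper; the variable name follows the one quoted above.

```python
def require_mapping(uniprot_gkb_dict, n_queried):
    """Fail loudly if accessions were queried but none were mapped to UniProt."""
    if n_queried > 0 and not uniprot_gkb_dict:
        raise RuntimeError(
            f"Queried {n_queried} accessions but UniProt returned no mappings; "
            "check the UniProt API client before writing to the database."
        )
    return uniprot_gkb_dict
```

With the run from the original report, require_mapping({}, 76) would raise instead of proceeding to "Adding data to the local CAZyme database" with nothing to add.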

@HobnobMancer HobnobMancer linked a pull request Dec 8, 2022 that will close this issue
@HobnobMancer
Owner Author

This issue should now be fixed in v2.2.3 - PR #103
