PDBrenum

Renumber Protein Data Bank files according to their UniProt sequences.

Take a demonstration tour of the script by clicking the above badge to launch a Jupyter session in your browser.

If you only have a few structures/sequences to process, you can find a webserver to do the renumbering here.

Launch directly into a companion notebook demonstrating mapping chain identifiers to UniProt identifiers by clicking here.

Note: the 'pandas2' branch of this fork has been fully updated to be compatible with Pandas v2. Launching from that branch gives you a Jupyter session with Pandas version 2+, which is useful if you have code or scripts that work with that Pandas version.

Description:

Here we provide PDBrenum (a Python 3.6 application), which fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database at PDBe (https://www.ebi.ac.uk/pdbe/docs/sifts/). PDBrenum takes a list of PDB entries and provides renumbered files in the mmCIF format and the legacy PDB format, for both the asymmetric unit files and the biological assembly files provided by the PDB. PDBrenum was heavily tested on all PDB structure files in both formats and on all popular operating systems (Linux, Mac and Windows).
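
The renumbering idea can be illustrated with a toy example. This is only a sketch and not PDBrenum's implementation: the function name and the mapping dictionary are hypothetical, and in practice the residue correspondence comes from SIFTS. How PDBrenum treats residues with no UniProt counterpart is not covered here, so the sketch simply reports them as None.

```python
# Toy illustration only: PDBrenum derives the real mapping from SIFTS.
from typing import Dict, List, Optional, Tuple


def renumber(
    author_numbers: List[int], author_to_uniprot: Dict[int, int]
) -> List[Tuple[int, Optional[int]]]:
    """Pair each author residue number with its UniProt number (None if unmapped)."""
    return [(n, author_to_uniprot.get(n)) for n in author_numbers]


# Hypothetical mapping: author numbering starts at 1, UniProt numbering at 25.
mapping = {1: 25, 2: 26, 3: 27}
result = renumber([1, 2, 3, 4], mapping)
print(result)
```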

Setting up PDBrenum:

Prerequisite: Anaconda should be installed:
https://docs.anaconda.com/anaconda/install/

The following commands will set up a conda environment for running PDBrenum locally:
(base) $ git clone https://github.com/Faezov/PDBrenum.git
(base) $ cd PDBrenum
(base) $ conda create -n PDBrenum python=3.6 numpy=1.17 pandas=0.25.1 biopython=1.76 tqdm=4.36.1 ipython=7.8.0 requests=2.25.1 lxml=4.6.2 
(base) $ conda activate PDBrenum

Running PDBrenum:

Testing PDBrenum (please note that on Windows the command is python, NOT python3):
(PDBrenum) $ python3 PDBrenum.py -h

Users can provide PDBids directly as a list of arguments (-rfla --renumber_from_list_of_arguments):
(PDBrenum) $ python3 PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -mmCIF
(PDBrenum) $ python3 PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -PDB
(PDBrenum) $ python3 PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -mmCIF_assembly
(PDBrenum) $ python3 PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -PDB_assembly

or put the PDBids in a text file (comma-, space- or tab-delimited) (-rftf --renumber_from_text_file):
(PDBrenum) $ python3 PDBrenum.py -rftf input.txt -mmCIF
(PDBrenum) $ python3 PDBrenum.py -rftf input.txt -PDB
(PDBrenum) $ python3 PDBrenum.py -rftf input.txt -mmCIF_assembly
(PDBrenum) $ python3 PDBrenum.py -rftf input.txt -PDB_assembly
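
To check what a comma-, space- or tab-delimited ID file contains before running PDBrenum, a small helper like the one below may be useful. It is a sketch under the assumption that any mix of those delimiters (and newlines) separates IDs; the function name is hypothetical and this is not PDBrenum's own parser.

```python
import re
from typing import List


def read_pdb_ids(text: str) -> List[str]:
    """Split text on commas, spaces, tabs, or newlines and drop empty tokens."""
    return [token for token in re.split(r"[,\s]+", text) if token]


# Example content mixing all three delimiters.
ids = read_pdb_ids("1d5t, 1bxw\t2vl3 5e6h\n")
print(ids)
```

For a real file, pass its contents, for example read_pdb_ids(open("input.txt").read()).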

The user can renumber the entire PDB in a given format (mmCIF by default if no format is provided):
(PDBrenum) $ python3 PDBrenum.py -redb -mmCIF 
(PDBrenum) $ python3 PDBrenum.py -redb -PDB
(PDBrenum) $ python3 PDBrenum.py -redb -mmCIF_assembly
(PDBrenum) $ python3 PDBrenum.py -redb -PDB_assembly


Note that on Windows, conda sometimes installs the biopython module incorrectly, which causes a module error in Python.
To resolve this problem, simply run:
(PDBrenum) $ pip install biopython==1.76

PDBrenum uses multiprocessing (by default it will use all available CPUs),
but the user can limit the number of CPUs by providing a number to the -nproc flag:
"-nproc", "--set_number_of_processes"

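When driving PDBrenum from another Python script, the process limit can be passed along with the other flags. The sketch below only assembles an argument list and runs nothing; the helper name is hypothetical, and placing -nproc after the format flag is an assumption rather than a documented requirement.

```python
import sys
from typing import List


def build_command(pdb_ids: List[str], fmt_flag: str = "-mmCIF", nproc: int = 2) -> List[str]:
    """Assemble an argument list for PDBrenum.py; nothing is executed here."""
    return [sys.executable, "PDBrenum.py", "-rfla", *pdb_ids, fmt_flag, "-nproc", str(nproc)]


cmd = build_command(["1d5t", "1bxw"], nproc=4)
print(" ".join(cmd[1:]))
```

The list can then be handed to subprocess.run(cmd, check=True) from inside the PDBrenum directory.
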
Users can also change where input and output files go by using these self-explanatory flags (with absolute paths):
"-sipm", "--set_default_input_path_to_mmCIF"
"-sipma", "--set_default_input_path_to_mmCIF_assembly"
"-sipp", "--set_default_input_path_to_PDB"
"-sippa", "--set_default_input_path_to_PDB_assembly"
"-sips", "--set_default_input_path_to_SIFTS"
"-sopm", "--set_default_output_path_to_mmCIF"
"-sopma", "--set_default_output_path_to_mmCIF_assembly"
"-sopp", "--set_default_output_path_to_PDB"
"-soppa", "--set_default_output_path_to_PDB_assembly"

By default, files go here: 
default_input_path_to_mmCIF = current_directory + "/mmCIF"
default_input_path_to_mmCIF_assembly = current_directory + "/mmCIF_assembly"
default_input_path_to_PDB = current_directory + "/PDB"
default_input_path_to_PDB_assembly = current_directory + "/PDB_assembly"
default_input_path_to_SIFTS = current_directory + "/SIFTS"
default_output_path_to_mmCIF = current_directory + "/output_mmCIF"
default_output_path_to_mmCIF_assembly = current_directory + "/output_mmCIF_assembly"
default_output_path_to_PDB = current_directory + "/output_PDB"
default_output_path_to_PDB_assembly = current_directory + "/output_PDB_assembly"
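
The default layout above can be reproduced in a few lines, which may help when locating files after a run. This is a sketch: the function name and the dictionary keys are hypothetical, and only the directory names are taken from the defaults listed above.

```python
import os
from typing import Dict


def default_paths(current_directory: str) -> Dict[str, str]:
    """Return the default input and output directories relative to a base directory."""
    kinds = ["mmCIF", "mmCIF_assembly", "PDB", "PDB_assembly"]
    paths = {"input_" + kind: current_directory + "/" + kind for kind in kinds}
    paths["input_SIFTS"] = current_directory + "/SIFTS"
    paths.update({"output_" + kind: current_directory + "/output_" + kind for kind in kinds})
    return paths


for name, path in default_paths(os.getcwd()).items():
    print(name, path)
```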

Also, by default all files are gzipped. If you want to have them unzipped, please use:
"-offz" or "--set_to_off_mode_gzip"

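Gzipped files can also be read directly in Python without unzipping them first. The sketch below uses the standard gzip module on a stand-in file that it creates itself; the file name and contents are placeholders, since the actual output file names depend on PDBrenum.

```python
import gzip
import os
import tempfile


def read_gzipped_text(path: str) -> str:
    """Read a gzip-compressed text file and return its contents."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


# Self-contained demo with a stand-in file, not a real PDBrenum output.
demo_path = os.path.join(tempfile.mkdtemp(), "example.cif.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("data_EXAMPLE\n")

content = read_gzipped_text(demo_path)
print(content)
```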

Roland Dunbrack's Lab
Fox Chase Cancer Center
Philadelphia, PA
2020

Attributions

Users of PDBrenum should cite:

PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences. Faezov B, Dunbrack RL Jr. PLoS One. 2021 Jul 6;16(7):e0253411. doi: 10.1371/journal.pone.0253411. eCollection 2021. PMID: 34228733

Clarifying Software Attribution: I, Wayne, am not involved in the PDBrenum software at all. Those in the lab of Roland Dunbrack are the developers and source of PDBrenum. See their materials, such as the original repo and accompanying article. I simply set up this repository to make the software usable on the command line without installation headaches and in a full-featured, browser-based computational environment.

I, Wayne, borrowed the highlighted introductory text about notebooks at the top of the included notebook from Tim Sherratt's notebook (https://github.com/GLAM-Workbench/te-papa-api/blob/master/Exploring-the-Te-Papa-collection-API.ipynb).

Demonstration of PDBrenum

Click the launch badge below to launch a Jupyter notebook that steps through a demonstration of several of the above commands.

No installations are needed as everything is already installed. The computing is provided by MyBinder.org.

After running the demonstration, substitute in PDB ids of interest to you and run it again.
If you make anything useful, download it to your local computer, as these sessions are ephemeral.
