Skip to content

fomightez/PDBrenum

 
 

Repository files navigation

PDBrenum

Renumber Protein Data Bank files according to their UniProt sequences.

badge

Take a demonstration tour of using the script by clicking the above badge to launch a Jupyter session in your browswer.

For those who only have a few structures/sequences to process, you can find a webserver to do the renumbering here.

Launch directly into a companion notebook demonstrating mapping chain identifiers to UniProt identifiers by clicking here.

Note: the 'pandas2' branch of this fork has been fully updated to be compatible with Pandas v2. Also launches from there, will give you a Jupyter session with Pandas version 2+ if you have code or scripts that works with that Pandas version.


Description:

Here we provide PDBrenum (python=3.6 application) that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe (https://www.ebi.ac.uk/pdbe/docs/sifts/). PDBrenum can take a list of PDB entries and provide renumbered files in the mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDB. PDBrenum was heavily tested on all PDB structure files in both formats and on all popular operating systems (Linux, Mac and Windows).

Setting up PDBrenum:
Prerequisites anaconda should be installed: 
https://docs.anaconda.com/anaconda/install/

The following commands will set up a conda environment for running PDBrenum locally:
(base) $ git clone https://github.com/Faezov/PDBrenum.git
(base) $ cd PDBrenum
(base) $ conda create -n PDBrenum python=3.6 numpy=1.17 pandas=0.25.1 biopython=1.76 tqdm=4.36.1 ipython=7.8.0 requests=2.25.1 lxml=4.6.2 
(base) $ conda activate PDBrenum
Running PDBrenum:
Testing PDBrenum (please note that for Windows OS it's just python NOT python3):
(PDBrenum) $ python3 PDBrenum.py -h

Users can provide PDBids directly as a list of arguments (-rfla --renumber_from_list_of_arguments):
(PDBrenum) $ python3 PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -mmCIF
(PDBrenum) $ python3 PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -PDB
(PDBrenum) $ python3 PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -mmCIF_assembly
(PDBrenum) $ python3 PDBrenum.py -rfla 1d5t 1bxw 2vl3 5e6h -PDB_assembly

or put PDBids in text file (comma, space or tab delimited) (-rftf --renumber_from_text_file):
(PDBrenum) $ python3 PDBrenum.py -rftf input.txt -mmCIF
(PDBrenum) $ python3 PDBrenum.py -rftf input.txt -PDB
(PDBrenum) $ python3 PDBrenum.py -rftf input.txt -mmCIF_assembly
(PDBrenum) $ python3 PDBrenum.py -rftf input.txt -PDB_assembly

The user can renumber the entire PDB in a given format (by default in mmCIF if no format was provided):
(PDBrenum) $ python3 PDBrenum.py -redb -mmCIF 
(PDBrenum) $ python3 PDBrenum.py -redb -PDB
(PDBrenum) $ python3 PDBrenum.py -redb -mmCIF_assembly
(PDBrenum) $ python3 PDBrenum.py -redb -PDB_assembly


Note that sometimes on Windows biopython module might be installed incorrectly by conda and it will cause module error in python. 
To resolve this problem simply run: 
(PDBrenum) $ pip install biopython==1.76

PDBrenum uses multiprocessing (by default it will use all available CPUs) 
but the usercan set a limit to the numer of CPUs by providing number to -nproc flag:
"-nproc", "--set_number_of_processes"

Users can also change where input output files will go by using these self-explanatory flags (with absolute paths):
"-sipm", "--set_default_input_path_to_mmCIF"
"-sipma", "--set_default_input_path_to_mmCIF_assembly"
"-sipp", "--set_default_input_path_to_PDB"
"-sippa", "--set_default_input_path_to_PDB_assembly"
"-sips", "--set_default_input_path_to_SIFTS"
"-sopm", "--set_default_output_path_to_mmCIF"
"-sopma", "--set_default_output_path_to_mmCIF_assembly"
"-sopp", "--set_default_output_path_to_PDB"
"-soppa", "--set_default_output_path_to_PDB_assembly"

By default, files go here: 
default_input_path_to_mmCIF = current_directory + "/mmCIF"
default_input_path_to_mmCIF_assembly = current_directory + "/mmCIF_assembly"
default_input_path_to_PDB = current_directory + "/PDB"
default_input_path_to_PDB_assembly = current_directory + "/PDB_assembly"
default_input_path_to_SIFTS= current_directory + "/SIFTS"
default_output_path_to_mmCIF = current_directory + "/output_mmCIF"
default_output_path_to_mmCIF_assembly = current_directory + "/output_mmCIF_assembly"
default_output_path_to_PDB = current_directory + "/output_PDB"
default_output_path_to_PDB_assembly = current_directory + "/output_PDB_assembly"

Also, by default all files gzipped if you want to have them unzipped please use: 
"-offz" or "--set_to_off_mode_gzip"


Roland Dunbrack's Lab
Fox Chase Cancer Center
Philadelphia, PA
2020

Attributions

Users of PDBrenum should cite:

Clarifying Software Attribution: I, Wayne, am not involved in the PDBrenum software at all. Those in the lab of Roland Dunbrack are the developers and source of PDBrenum. See their materials, such as the original repo and accompany article. I simply set up this repository to make the software useable on the command line without installation headaches and in a full-featured, browser-based computational environment.

I, Wayne, borrrowed the highligthed introductory text about notebooks at the top of the included notebook from Tim Sherratt's notebook here](https://github.com/GLAM-Workbench/te-papa-api/blob/master/Exploring-the-Te-Papa-collection-API.ipynb).


Demonstration of PDBrenum

Click the launch badge below to launch a Jupyter notebook that steps through a demonstration of several of above commands.

No installations are needed as everything is already installed. The computing is privided by MyBinder.org.

After demonstrating it, substitute in PDB ids of interest to you and then run.
If you make anything useful download it to your local computer, as these sessions are ephemeral.

badge

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 95.9%
  • Python 4.1%