Since the early 1970s, 3-dimensional (3D) macromolecular structures have been stored at the Protein Data Bank (PDB). From then until 30 June, 1999, the PDB was located at Brookhaven National Laboratory, but on the first of July, 1999, the PDB moved to the Research Collaboratory for Structural Bioinformatics (RCSB) at Rutgers University, SDSC (San Diego Supercomputer Center), and NIST (National Institute of Standards and Technology).
Coordinates and other information about solved macromolecular structures (provided they have been deposited, of course) can be downloaded from the PDB, and for each entry there are also many useful links to other databases and services. The PDB can be accessed at three different sites (plus a number of mirror sites outside the USA):
As the figure below shows, the growth of the number of deposited structures was slow until the mid-1990s, but appears to be roughly parabolic at present:
PDB DEPOSITION AND GROWTH STATISTICS FOR 2001
In 2001, 3,298 structures were deposited to the PDB, and were processed by
teams at RCSB-Rutgers (75%), Osaka University (10%), and the European
Bioinformatics Institute (14%).
Of the structures deposited, 72% were deposited with a release status
of "hold until publication"; 16% were released as soon as annotation
of the entry was complete; and 12% were held until a particular date.
82% of these entries were determined by X-ray crystallographic
methods; 15% were determined by NMR methods.
The growth of the PDB can be seen in the increase in the number of
residues added into the archive each year. 1,680,053 residues were
released into the archive in 2001, as compared to 1,314,912 in 2000 and
1,068,340 in 1999.
There is an archive of PDB Newsletters (dating all the way back to September, 1974 ...). Using these, you can get an idea of the growth and transformation that the PDB has undergone over the years.
Rate of PDB Holdings Growth Predicted in 1978?
In 1978, Richard E. Dickerson examined the number of available crystal structures. Based upon that number, he arrived at an equation to describe the exponential growth in solved crystal structures. In a letter describing a book he was working on with Irving Geis, Dickerson noted (and illustrated with a hand-drawn graph) that the number of new structures appeared to be following the exponential law n = exp(0.19 y), where n is the number of new structures per year and y is the year number since 1960. This equation predicted that at the end of 2001, there would be 13,941 crystal structure entries available in the PDB (14,000 crystal structures are currently available). Using this equation, there should be 24,667 crystal structures in 2004.
(Taken from the PDB News page, February 2002.)
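The arithmetic behind Dickerson's law can be checked with a short Python sketch. It assumes that the predicted totals are obtained by summing n = exp(0.19 y) over every year from 1960 up to and including the target year; the text quoted above does not say how the totals were derived, so this is only one plausible reading. The function names are made up for this illustration.

```python
import math

RATE = 0.19        # exponent from Dickerson's law, n = exp(0.19 y)
BASE_YEAR = 1960   # y is counted from this year


def new_structures(year):
    """New structures predicted for a single year."""
    return math.exp(RATE * (year - BASE_YEAR))


def cumulative_structures(year):
    """Predicted total, ASSUMING it is the sum of the yearly values
    from BASE_YEAR up to and including `year`."""
    return sum(new_structures(y) for y in range(BASE_YEAR, year + 1))


for year in (2001, 2004):
    print(year, round(cumulative_structures(year)))
```

Under this assumption the totals come out at roughly 13,960 for 2001 and 24,690 for 2004, within a fraction of a percent of the figures quoted above. The small difference suggests that the original numbers were computed in a slightly different way.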
Up-to-date information about the number of structures in the PDB can be found here. On that page you can also find information about the fraction of deposited protein structures that has a new fold (i.e., not similar to the fold of any previously deposited protein).
Go to one of the PDB web sites that are listed above. Every entry in the PDB is identified by a unique 4-character string (a "PDB-id") that starts with a digit from 1 to 9, followed by 3 characters from the set [A-Z,0-9].
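The PDB-id rule described above can be written as a regular expression. The Python sketch below is only an illustration; the function name is made up, and accepting lower-case input (such as "1cbs") is an assumption based on the way codes are typed in this practical.

```python
import re

# One digit from 1 to 9, followed by exactly three characters from A-Z or 0-9.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Z0-9]{3}")


def is_valid_pdb_id(candidate):
    """Return True if `candidate` looks like a PDB-id.

    Assumption: the check ignores case and surrounding whitespace.
    It only tests the format, not whether the entry actually exists.
    """
    return PDB_ID_PATTERN.fullmatch(candidate.strip().upper()) is not None


for code in ("1cbs", "1SBC", "0abc", "1cb"):
    print(code, is_valid_pdb_id(code))
```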
Let us first explore what kind of information is available for
individual structures. In the field "Search the Archive", you can
enter an existing PDB code. Enter "1cbs" here (without the quotes),
check the box "query by PDB id only", and hit the "Find a structure"
button.
On the right, you see a panel with general information about
the structure, who solved it, where it was published etc. There
is a link to the MEDLINE literature database at NCBI which will list
the paper that describes the structure plus any papers that
were cited by the structure depositors. Find the abstract of
the paper that describes this particular structure.
Now let's have a look at an actual PDB file. Click on "Download/Display File" in the menu on the left. From the first table, select the HTML format listing of the complete entry with coordinates in PDB format. As you can see, the PDB format is not unlike the SWISS-PROT format: every line begins with a 6-character keyword (HEADER, COMPND, etc.) followed by information, formatted according to specific rules (if you click on an underlined keyword, you will get a new window with a description of it). The lines that begin with the keyword "ATOM " describe one atom each. Each line contains information such as atom name, residue type and number, and the X, Y and Z coordinates of the atom.
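Because the PDB format uses fixed columns, an "ATOM" line can be taken apart by slicing the string at known positions. The Python sketch below shows the idea. The column positions follow the published PDB format description and are not given in this practical, and the sample line is a made-up example (it is not copied from the 1cbs entry). The names `Atom` and `parse_atom_line` are hypothetical.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    residue_name: str
    chain_id: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_atom_line(line):
    """Parse one fixed-column ATOM record.

    Column ranges (1-based) assumed from the PDB format description:
    7-11 serial, 13-16 atom name, 18-20 residue type, 22 chain,
    23-26 residue number, 31-38 X, 39-46 Y, 47-54 Z.
    """
    if not line.startswith("ATOM  "):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        residue_name=line[17:20].strip(),
        chain_id=line[21],
        residue_number=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


# Made-up example line, used only to illustrate the layout.
SAMPLE_LINE = "ATOM      1  N   PRO A   1      16.979  13.301  44.555  1.00 30.05"

atom = parse_atom_line(SAMPLE_LINE)
print(atom.name, atom.residue_name, atom.residue_number, atom.x, atom.y, atom.z)
```

Slicing by column, rather than splitting on whitespace, matters because neighbouring fields in a PDB file may run together without a separating space.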
Have a look at the "Sequence Details" from the menu on the left.
If you click on "Structural Neighbors", you will get a list of sites that contain information concerning other proteins (not necessarily related in sequence or function!) that have a similar fold. We will encounter some of these later on, but feel free to explore some of them now.
If you click on "Other Sources", you will be presented with a list of links to other WWW sites that carry information concerning this particular protein structure. We will encounter some of these later on as well, but again: feel free to check out some of these sites.
Now let's do some simple searching of the PDB. Click on "SearchLite" at the bottom of the menu on the left. Read through the table with example queries.
There is more than one easy way to find the correct answer to Q. 6 and 7. One is to choose "Select all structures" in the pull-down menu and click "Go". Then, also in the pull-down menu, select "Create a tabular report" and click "Go" again. In the next form, choose an option to view the list of PDB files with information on authors and journal.
For a more advanced search, use the "SearchFields" search option. Scroll to the bottom of this page; there you can select which criteria you want to use in your search. Select all items and hit the "New Form" button.
You can also search the database with sequences. Locate the
FASTA search option on the search form. Check the "Use PDB ID"
box and enter "1cbs" in the box. Set the E-value cut-off to
0.001 and hit the "Search" button.
If it does not work: find the 1cbs entry, click on "Sequence Details" and download the sequence in FASTA format. Paste the sequence into the box in the SearchFields form and set the E-value cut-off to 0.001. Hit the "Search" button.
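If you want to inspect a downloaded FASTA file before pasting it into the search form, a few lines of Python are enough. The sketch below assumes the usual FASTA layout (a header line starting with ">", followed by one or more sequence lines). The function name and the sample record are made up for illustration; the sample is not the 1cbs sequence.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical record, used only to show the file layout.
SAMPLE_FASTA = """>demo_chain hypothetical example
MKTAYIAKQR
QISFVKSHFS
"""

for name, sequence in parse_fasta(SAMPLE_FASTA).items():
    print(name, len(sequence), sequence)
```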
Now let's find the longest single-chain macromolecule for which a structure determined by X-ray diffraction is available. Use your search form to find all entries that contain a chain of at least 2,000 residues. When you get a list of entries, select "Select all structures" from the pop-up menu at the top of the page, and hit the "Go" button. Then select "Create a tabular report" and hit "Go" again. From the following table, create a custom report in HTML format with only the chain length and chain identifier checked.
But the reason we came here was to find coordinates for subtilisin.
Go back to the SearchLite form and search for "subtilisin".
You should find a fairly large number of hits! You can get
more information by clicking on the "{EXPLORE}" link that
is shown for every structure. Do this now for the structure
with PDB code 1SBC.
Select the "Download/Display File" option of 1SBC and "Download the Structure File", with no compression and in PDB format. This will save the complete PDB entry on your local disk for later use. Make sure to save the structure in a directory that belongs to you!
Latest update at 20 January, 2004.