Since the early 1970s, 3-dimensional (3D) macromolecular structures have been stored at the PDB. From then until 30 June 1999, the PDB was located at Brookhaven National Laboratory. On 1 July 1999, it moved to the Research Collaboratory for Structural Bioinformatics (RCSB), which comprises Rutgers University, SDSC (San Diego Supercomputer Center), and NIST (National Institute of Standards and Technology).

Coordinates and other information about solved macromolecular structures (provided they have been deposited, of course) can be downloaded from the PDB, and for each entry there are also many useful links to other databases and services. The PDB can be accessed at three different sites (plus a number of mirror sites outside the USA):

As the figure below shows, the growth of the number of deposited structures was slow until the mid-1990s, but appears to be roughly parabolic at present:


      PDB DEPOSITION AND GROWTH STATISTICS FOR 2001

In 2001, 3,298 structures were deposited to the PDB, and were processed by 
teams at RCSB-Rutgers (75%), Osaka University (10%), and the European 
Bioinformatics Institute (14%).

Of the structures deposited, 72% were deposited with a release status
of "hold until publication"; 16% were released as soon as annotation
of the entry was complete; and 12% were held until a particular date.
82% of these entries were determined by X-ray crystallographic
methods; 15% were determined by NMR methods.

The growth of the PDB can be seen in the increase in the number of
residues added into the archive each year.  1,680,053 residues were
released into the archive in 2001, as compared to 1,314,912 in 2000 and
1,068,340 in 1999.

There is an archive of PDB Newsletters (dating all the way back to September 1974). Using these, you can get an idea of the growth and transformation that the PDB has undergone over the years.

Rate of PDB Holdings Growth Predicted in 1978?

In 1978, Richard E. Dickerson examined the number of available crystal structures. Based upon that number, he arrived at an equation describing the exponential growth in the number of solved crystal structures.

In a letter describing a book he was working on with Irving Geis, Dickerson noted (and illustrated with a hand-drawn graph) that the number of new structures appeared to be following the exponential law n = exp(0.19 y), where n is the number of new structures per year and y is the year number since 1960. This equation predicted that at the end of 2001, there would be 13,941 crystal structure entries available in the PDB (14,000 crystal structures are currently available). Using this equation, there should be 24,667 crystal structures in 2004.

(Taken from the PDB News page, February 2002.)
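
The arithmetic behind such predictions can be sketched in a few lines of Python. This is only an illustration: it assumes that the total number of entries is the sum of n = exp(0.19 y) over the years from 1961 (y = 1) up to the year in question, a convention the text above does not spell out, and the function names are made up for this example. Under that assumption the totals land close to the quoted figures rather than matching them exactly.

```python
import math

RATE = 0.19        # exponent in n = exp(0.19 y), as quoted above
BASE_YEAR = 1960   # y is counted in years since 1960


def new_structures(year):
    """New structures predicted for one calendar year: n = exp(0.19 y)."""
    return math.exp(RATE * (year - BASE_YEAR))


def cumulative_structures(year):
    """Predicted total number of structures at the end of `year`.

    Assumption: the total is the sum of the yearly values from 1961 (y = 1)
    through `year`; the source text does not state which convention was used.
    """
    return sum(new_structures(y) for y in range(BASE_YEAR + 1, year + 1))


for year in (2001, 2004):
    print(year, round(cumulative_structures(year)))
```

Changing RATE or the starting year shows how sensitive an exponential extrapolation is to its parameters.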

Up-to-date information about the number of structures in the PDB can be found here. On that page you can also find information about the fraction of deposited protein structures that have a new fold (i.e., a fold that is not similar to that of any previously deposited protein).

Q. 1. Can you explain the large discrepancy between the number of deposited "chains" (protein structures) and the number of new folds ?

Go to one of the PDB web-sites that are listed above. Every entry in the PDB is characterised by a unique 4-character string (a "PDB-id") that starts with a digit between 1 and 9, followed by 3 characters from the set [A-Z,0-9].
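
If you ever need to check PDB-ids in a script, the rule above translates directly into a regular expression. The following is a minimal sketch; the function name is made up, and accepting lower-case input is an assumption based on the fact that this practical uses both "1cbs" and "1SBC".

```python
import re

# One digit 1-9, followed by three characters that are letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Z0-9]{3}")


def normalise_pdb_id(text):
    """Return the upper-case PDB-id, or None if `text` is not well-formed.

    Assumption: lower-case input such as "1cbs" is acceptable and is simply
    converted to upper case before it is checked.
    """
    candidate = text.strip().upper()
    if PDB_ID_PATTERN.fullmatch(candidate):
        return candidate
    return None


for example in ("1cbs", "1SBC", "0abc", "1cb", "1cbs2"):
    print(repr(example), "->", normalise_pdb_id(example))
```

Note that this only tests whether a string is well-formed; it says nothing about whether an entry with that id actually exists.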

Q. 2. How many structures can be archived until this numbering scheme is exhausted ?

Let us first explore what kind of information is available for individual structures. In the field "Search the Archive", you can enter an existing PDB code. Enter "1cbs" here (without the quotes), check the box "query by PDB id only", and hit the "Find a structure" button.

On the right, you see a panel with general information about the structure: who solved it, where it was published, etc. There is a link to the MEDLINE literature database at NCBI, which will list the paper that describes the structure plus any papers that were cited by the structure depositors. Find the abstract of the paper that describes this particular structure.

Q. 3. Which residues in this protein interact with the carboxylate group of the ligand (all-trans-retinoic acid) ?

Now let's have a look at an actual PDB file. Click on "Download/Display File" in the menu on the left. From the first table, select the HTML format listing of the complete entry with coordinates in PDB format. As you can see, the PDB format is not unlike the SWISS-PROT format: every line begins with a 6-character keyword (HEADER, COMPND, etc.) followed by information, formatted according to specific rules (if you click on an underlined keyword, you will get a new window with a description of it). The lines that begin with the keyword "ATOM  " describe one atom each. Each line contains information such as atom name, residue type and number, and the X, Y and Z coordinates of the atom.
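
Because the PDB format is column-based, an "ATOM  " line can be read by slicing fixed character positions. The sketch below assumes the standard column layout of the PDB format (serial number in columns 7-11, atom name in 13-16, residue name in 18-20, chain identifier in 22, residue number in 23-26, and X, Y, Z in 31-38, 39-46 and 47-54). The class and function names are made up, and the sample line is invented for illustration; it is not copied from 1CBS.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line):
    """Parse one fixed-column ATOM record.

    Columns are numbered from 1 in the format description, so columns 31-38
    correspond to the Python slice [30:38].
    """
    if not line.startswith("ATOM  "):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


# Invented sample record, assembled field by field to make the columns visible.
SAMPLE = (
    "ATOM  "      # columns  1-6   record keyword
    "    1"       # columns  7-11  atom serial number
    " "           # column  12     blank
    " N  "        # columns 13-16  atom name
    " "           # column  17     alternate location indicator
    "PRO"         # columns 18-20  residue type
    " "           # column  21     blank
    "A"           # column  22     chain identifier
    "   1"        # columns 23-26  residue number
    "    "        # columns 27-30  insertion code and blanks
    "  16.979"    # columns 31-38  X
    "  13.301"    # columns 39-46  Y
    "  44.555"    # columns 47-54  Z
    "  1.00"      # columns 55-60  occupancy
    " 30.05"      # columns 61-66  temperature factor
)

atom = parse_atom_line(SAMPLE)
print(atom.name, atom.res_name, atom.res_seq, (atom.x, atom.y, atom.z))
```

Slicing by column is more reliable here than splitting on whitespace, because neighbouring fields in real entries are sometimes not separated by a blank.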

Q. 4. What do records that begin with the keyword "HET   " describe ?

Have a look at the "Sequence Details" from the menu on the left.

Q. 5. How many residues does the second helix of this structure contain ?

If you click on "Structural Neighbors", you will get a list of sites that contain information concerning other proteins (not necessarily related in sequence or function !) that have a similar fold. We will encounter some of these later on, but feel free to explore some of them now.

If you click on "Other Sources", you will be presented with a list of links to other WWW sites that carry information concerning this particular protein structure. We will encounter some of these later on as well, but again: feel free to check out some of these sites.

Now let's do some simple searching of the PDB. Click on "SearchLite" at the bottom of the menu on the left. Read through the table with example queries.

Q. 6. How many structures are in the PDB for which Paul Sigler is one of the authors ?

Q. 7. How many of these structures were published in the journal Nature ?

There is more than one easy way to find the answers to Q. 6 and 7. One is to select all structures (in the pull-down menu) and click "Go". Then, in the same pull-down menu, select "Create A Tabular Report" and click "Go" again. In the next form, choose an option that lists the PDB files together with information on their authors and journal.

For a more advanced search, use the "SearchFields" search option. Scroll to the bottom of this page; there you can select which criteria you want to use in your search. Select all items and hit the "New Form" button.

Q. 8. How many X-ray crystal structures were released during the 1970s, 1980s, and 1990s ?

You can also search the database with sequences. Locate the FASTA search option on the search form. Check the "Use PDB ID" box and enter "1cbs" in the box. Set the E-value cut-off to 0.001 and hit the "Search" button.
If that does not work: find the 1cbs entry, click on "Sequence Details" and download the sequence in FASTA format. Paste the sequence into the box in the SearchFields form and set the E-value cut-off to 0.001. Hit the "Search" button.

Q. 9. How many hits do you get ? How many of these are identical in sequence to 1CBS ?

Now let's find the longest single-chain macromolecule for which a structure determined by X-ray diffraction is available. Use your search form to find all entries that contain a chain of at least 2,000 residues. When you get a list of entries, select "Select all structures" from the pop-up menu at the top of the page, and hit the "Go" button. Then select "Create a tabular report" and hit "Go" again. From the following table, create a custom report in HTML format with only the chain length and chain identifier checked.

Q. 10. What is the longest single-chain macromolecule determined by X-ray diffraction ? Is it a protein ?

But the reason we came here was to find coordinates for subtilisin. Go back to the SearchLite form and search for "subtilisin". You should find a fairly large number of hits ! You can get more information by clicking on the "{EXPLORE}" link that is shown for every structure. Do this now for the structure with PDB code 1SBC.

Q. 11. How many hits did you get ? Who are the authors of the structure with ID 1SBC ? And what is the resolution of the structure ?

For 1SBC, select the "Download/Display File" option and then "Download the Structure File", with no compression and in PDB format. This will save the complete PDB entry on your local disk for later use. Make sure to save the structure in a directory that belongs to you !
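
Once the file is on disk, a quick sanity check is to count the record keywords it contains and to read the chain lengths from the SEQRES records. The sketch below assumes the standard SEQRES layout (chain identifier in column 12, number of residues in columns 14-17); the function names and the file name "1sbc.pdb" are hypothetical, and the small fragment used for the demonstration is invented.

```python
from collections import Counter


def summarise_pdb_lines(lines):
    """Tally record keywords and collect chain lengths from SEQRES records."""
    keyword_counts = Counter()
    chain_lengths = {}
    for line in lines:
        keyword = line[:6].strip()
        if not keyword:
            continue
        keyword_counts[keyword] += 1
        if keyword == "SEQRES":
            # Column 12 holds the chain identifier, columns 14-17 the number
            # of residues in that chain (repeated on every SEQRES line).
            chain_lengths[line[11]] = int(line[13:17])
    return keyword_counts, chain_lengths


def summarise_pdb_file(path):
    """Same summary, read from a PDB-format file on disk."""
    with open(path) as handle:
        return summarise_pdb_lines(handle)


# Tiny invented fragment, for illustration only.
FRAGMENT = [
    "HEADER    EXAMPLE ENTRY",
    "SEQRES   1 A   15  ALA GLY SER THR VAL LEU ILE PRO PHE TRP MET CYS ASN",
    "SEQRES   2 A   15  GLN TYR",
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00",
    "END",
]

counts, lengths = summarise_pdb_lines(FRAGMENT)
print(dict(counts), lengths)
```

For the file you just saved, you would call something like summarise_pdb_file("1sbc.pdb"), with the name adjusted to wherever you stored it. In older entries the chain identifier may be a blank, in which case the key in the resulting dictionary is a single space.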




Practical "Structural Databases" - EMBO Bioinformatics Course - Uppsala 1999/2001 - Gerard Kleywegt

Latest update at 20 January, 2004.