Homo Crystallographicus - Quo Vadis ?
Gerard J Kleywegt & T Alwyn Jones
Supplementary material to: G J Kleywegt & T A Jones (2002).
Homo Crystallographicus - Quo Vadis ? Structure 10 (4), 465-472.
Raw data, statistics, and plots. The raw data was gathered and
analysed using "jiffy" shell scripts and programs written by GJK.
The plot files were generated with ODBMAN and converted into
PostScript with O2D.
New !!!
Try Harry Plotter ... a Java-based
interactive plotting program which provides direct links between data
points in a scatter plot and the corresponding page at the RCSB-PDB !
- master.list - ASCII text file
containing the raw results of our analysis (~780 kB). The file
contains one line per PDB entry with (tab-delimited): PDB identifier,
resolution (Å), year of deposition, R-value, free R-value,
number of amino-acid residues, number of Ramachandran-plot
outliers, percentage of Ramachandran-plot outliers, method
of test-set selection, number of test-set reflections, percentage
of test-set reflections, (unused counter 1), (unused counter 2),
flag to indicate presence or absence of electron-density map
in EDS.
The list is sorted by resolution first, and by percentage of
Ramachandran-plot outliers second.
The first few lines of this file look as follows:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
1ejg 0.54 2000 0.090 0.094 37 0 0.000 1_REFLECTION_OUT_OF_20 11220 5.000 10336 9119 HAS_AN_OMAP_IN_EDS
1d8g 0.74 1999 0.105 0.131 0 0 0 _RANDOM_ NULL NULL 10335 9118 NULL
1dj4 0.75 1999 0.135 NULL 0 0 0 NULL NULL NULL 10333 9116 HAS_AN_OMAP_IN_EDS
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
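The field layout described above can be read with a few lines of Python. The following is only a minimal sketch: the names `Entry`, `parse_master_line` and all field names are illustrative assumptions (they do not occur in the file), and the code splits on any whitespace so that it accepts both the tab-delimited file and the example lines as printed here. It assumes that missing values are always written as the literal `NULL`, as in the example.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One line of master.list; field names are illustrative assumptions."""
    pdb_id: str
    resolution: Optional[float]          # in Angstrom
    year: Optional[int]                  # year of deposition
    r_value: Optional[float]
    r_free: Optional[float]
    n_residues: Optional[int]
    n_rama_outliers: Optional[int]
    pct_rama_outliers: Optional[float]
    test_set_method: Optional[str]
    n_test_reflections: Optional[int]
    pct_test_reflections: Optional[float]
    has_eds_map: bool


def _opt(text, convert):
    """Convert one field, mapping the literal NULL to None."""
    return None if text == "NULL" else convert(text)


def parse_master_line(line: str) -> Entry:
    """Parse one line of master.list into an Entry."""
    fields = line.split()  # accepts tabs as well as runs of spaces
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, got {len(fields)}")
    return Entry(
        pdb_id=fields[0],
        resolution=_opt(fields[1], float),
        year=_opt(fields[2], int),
        r_value=_opt(fields[3], float),
        r_free=_opt(fields[4], float),
        n_residues=_opt(fields[5], int),
        n_rama_outliers=_opt(fields[6], int),
        pct_rama_outliers=_opt(fields[7], float),
        test_set_method=_opt(fields[8], str),
        n_test_reflections=_opt(fields[9], int),
        pct_test_reflections=_opt(fields[10], float),
        # fields[11] and fields[12] are the two unused counters; skipped here
        has_eds_map=(fields[13] == "HAS_AN_OMAP_IN_EDS"),
    )
```

With such a parser, a subset like "entries with resolution, R-value and free R-value" could be selected by keeping only the entries for which those three fields are not `None`.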
- rval.out - output from the initial
analysis program (~60 kB). This includes overall statistics,
lists of PDB entries with the highest and lowest values for the
various statistics, etc.
- Analysis I - 10,674 entries with resolution and R-value.
- rv.odb - O datablock file with raw
data (~240 kB)
- odbman_1.out - output file from
analysing this data with ODBMAN (~12 kB), containing
histograms, correlation coefficients, and non-parametric
rank correlations
- Plot resolution_vs_year:
PostScript | GIF
- Plot rvalue_vs_year:
PostScript | GIF
- Plot rvalue_vs_resolution:
PostScript | GIF
- Plot hasmap_vs_year:
PostScript | GIF
- Analysis II - 6,560 entries with resolution, R-value and free
R-value.
- Analysis III - 10,215 entries with resolution, R-value and
Ramachandran plot.
- rvra.odb - O datablock file with raw
data (~475 kB)
- odbman_3.out - output file from
analysing this data with ODBMAN (~21 kB), containing
histograms, correlation coefficients, and non-parametric
rank correlations
- Plot percramaout_vs_year:
PostScript | GIF
- Plot percramaout_vs_resolution:
PostScript | GIF
- Plot percramaout_vs_rvalue:
PostScript | GIF
- Plot hasmap_vs_percramaout:
PostScript | GIF
- Plot numres_vs_year:
PostScript | GIF
- Analysis IV - 6,316 entries with resolution, R-value, free
R-value, and Ramachandran plot.
- rvrfra.odb - O datablock file with raw
data (~390 kB)
- odbman_4.out - output file from
analysing this data with ODBMAN (~27 kB), containing
histograms, correlation coefficients, and non-parametric
rank correlations
- Plot percramaout_vs_rfree:
PostScript | GIF
- Plot percramaout_vs_rfminusr:
PostScript | GIF
- Analysis V - 5,421 entries with more than one protein or peptide
chain (at least 10 residues).
- anaram_max.log - list of entries with
more than one protein or peptide chain, sorted by the percentage
of Ramachandran outliers of the "worst" chain (~545 kB), or just the
Top 100.
Entries look as shown below and list: index (ignore), PDB identifier,
"best" chain with name, percentage outliers and number of residues,
"worst" chain with name, percentage outliers and number of residues,
the difference between the two percentages, the resolution (Å),
and the year of deposition:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
# 1594 1f83 MIN 7.60 A 397 res - MAX 76.20 B 21 res - DIFF 68.60 - RESO 2.00 - YEAR 2000
# 5389 1tmf MIN 27.00 2 244 res - MAX 56.00 4 25 res - DIFF 29.00 - RESO 3.50 - YEAR 1992
# 5221 1nro MIN 27.60 H 225 res - MAX 50.00 L 22 res - DIFF 22.40 - RESO 3.10 - YEAR 1994
# 5373 1fll MIN 18.70 A 187 res - MAX 50.00 X 18 res - DIFF 31.30 - RESO 3.50 - YEAR 2000
# 4977 1awh MIN 7.90 B 229 res - MAX 44.40 A 27 res - DIFF 36.50 - RESO 3.00 - YEAR 1997
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
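Lines in this layout can be picked apart with a regular expression. This is a hedged sketch rather than the authors' own tooling: the pattern is inferred from the five example lines only, the name `parse_anaram_line` and the dictionary keys are made up for illustration, and the chain name is assumed to be a single non-blank token (entries with a blank chain name, if any exist in the full file, would not match).

```python
import re

# Pattern inferred from the example lines above; an assumption, not a spec.
ANARAM_LINE = re.compile(
    r"#\s*(?P<index>\d+)\s+(?P<pdb_id>\S+)\s+"
    r"MIN\s+(?P<min_pct>[\d.]+)\s+(?P<min_chain>\S+)\s+(?P<min_res>\d+)\s+res\s+-\s+"
    r"MAX\s+(?P<max_pct>[\d.]+)\s+(?P<max_chain>\S+)\s+(?P<max_res>\d+)\s+res\s+-\s+"
    r"DIFF\s+(?P<diff>[\d.]+)\s+-\s+"
    r"RESO\s+(?P<resolution>[\d.]+)\s+-\s+"
    r"YEAR\s+(?P<year>\d+)"
)


def parse_anaram_line(line):
    """Return a dict with typed values, or None if the line does not match."""
    match = ANARAM_LINE.search(line)
    if match is None:
        return None
    g = match.groupdict()
    return {
        "pdb_id": g["pdb_id"],
        "best_chain": g["min_chain"],            # chain with fewest outliers
        "best_pct": float(g["min_pct"]),
        "best_residues": int(g["min_res"]),
        "worst_chain": g["max_chain"],           # chain with most outliers
        "worst_pct": float(g["max_pct"]),
        "worst_residues": int(g["max_res"]),
        "diff": float(g["diff"]),
        "resolution": float(g["resolution"]),    # in Angstrom
        "year": int(g["year"]),
    }
```

A simple consistency check on each parsed line is that `diff` equals `worst_pct - best_pct` to within rounding, which holds for all five example lines.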
- anaram_max.log, list of entries with
more than one protein or peptide chain, sorted by the difference
between the percentages of Ramachandran outliers of the "worst" and
the "best" chains (~545 kB), or just the
Top 100.
A histogram of the differences looks as follows:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Nr >= 0.0000 and < 1.0000 : 2152 ( 39.70 %; Cum 39.70 %)
Nr >= 1.0000 and < 2.0000 : 1240 ( 22.87 %; Cum 62.57 %)
Nr >= 2.0000 and < 3.0000 : 697 ( 12.86 %; Cum 75.43 %)
Nr >= 3.0000 and < 4.0000 : 406 ( 7.49 %; Cum 82.92 %)
Nr >= 4.0000 and < 5.0000 : 243 ( 4.48 %; Cum 87.40 %)
Nr >= 5.0000 and < 10.0000 : 454 ( 8.37 %; Cum 95.78 %)
Nr >= 10.0000 and < 20.0000 : 172 ( 3.17 %; Cum 98.95 %)
Nr >= 20.0000 and < 30.0000 : 46 ( 0.85 %; Cum 99.80 %)
Nr >= 30.0000 and < 40.0000 : 10 ( 0.18 %; Cum 99.98 %)
Nr >= 60.0000 and < 70.0000 : 1 ( 0.02 %; Cum 100.00 %)
Nr >= 70.0000 : 0 ( 0.00 %; Cum 100.00 %)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
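A histogram of this kind, with unevenly spaced bins plus per-bin and cumulative percentages, can be reproduced along the following lines. This is a sketch under assumptions: the bin edges are read off the listing above (the 40-50 and 50-60 bins are not printed there and are assumed to exist but be empty), and the function names are hypothetical.

```python
from bisect import bisect_right

# Lower bin edges, assumed from the listing; the last bin is open-ended (>= 70).
EDGES = [0, 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70]


def histogram(values, edges=EDGES):
    """Count values per bin; bin i holds edges[i] <= v < edges[i + 1]."""
    counts = [0] * len(edges)
    for v in values:
        i = bisect_right(edges, v) - 1
        if i < 0:
            raise ValueError(f"value {v} lies below the first bin edge")
        counts[i] += 1
    return counts


def with_percentages(counts):
    """Return (count, percentage, cumulative percentage) for each bin."""
    total = sum(counts)
    if total == 0:
        return [(0, 0.0, 0.0) for _ in counts]
    rows = []
    running = 0
    for c in counts:
        running += c
        rows.append((c, 100.0 * c / total, 100.0 * running / total))
    return rows
```

Feeding the published counts into `with_percentages` recovers the percentages printed above; the counts sum to 5,421, the number of entries in Analysis V.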
- Plot percentage outliers "worst" chain versus "best" chain:
PostScript | GIF
- Plot difference percentage outliers "worst" chain and "best" chain
versus resolution:
PostScript | GIF
Last updated on 11 April 2006.