Published as: Kleywegt, G.J. and Jones, T.A. (1995). Braille for pugilists. In "Making the Most of Your Model", edited by W.N. Hunter, J.M. Thornton and S. Bailey. SERC Daresbury Laboratory, Warrington, pp. 11-24.

© CCLRC - Council for the Central Laboratory of the Research Councils, 1995

Braille for Pugilists

Gerard J. Kleywegt & T. Alwyn Jones,
Department of Molecular Biology,

Biomedical Centre, Uppsala University,

Box 590, S-751 24 Uppsala,

SWEDEN.

Introduction.

Before attempting to make the most out of a model (e.g., looking for structural explanations for biochemical data, designing relevant mutants, modelling as yet unobserved complexes, or constructing potential inhibitors), one has to assess how good the model really is. A thorough, critical analysis of a model requires atomic coordinates and structure factor data, but even from coordinates alone, or even a paper, one can already obtain a fairly good idea of (a) whether or not the model may contain gross errors, and (b) which parts or aspects of the model are credible, and which are to be taken with a grain of salt. At anything worse than atomic resolution, modelling details of protein structures often amounts to an attempt to read Braille while wearing boxing gloves, which means that errors are easily introduced. In this contribution we describe briefly how to minimise errors and artefactual features, and how to assess model quality using a much wider variety of "quality indicators" than is typically done.

What is a good model ?

Quite simply, a good model is one that makes sense in all respects, i.e.:

Accuracy and precision.

All other things being equal, the best model in a crystallographic sense is the one which has the highest accuracy, and with a precision that matches the information contents of the data. In protein crystallography, "accuracy" is related to <|Δφ|>, the average absolute magnitude of the phase errors, and "precision" is related to the level of detail of the model.

The most accurate model, given a particular set of data, is the one that has the lowest value of <|Δφ|>, and there are strong indications that the value of <|Δφ|> is highly correlated (correlation coefficient close to +1) with that of the free R-factor, Rfree [1, 2]. Therefore, the most accurate model is the one with the lowest value of Rfree.

The level of detail at which one can describe a model is dictated by the information contents (determined by quantity and quality) of the data. For this reason, few people would be tempted to refine anisotropic temperature factors at 2.5Å. On the other hand, many people do refine individual isotropic temperature factors at 3Å or worse resolution, which in most cases amounts to over-fitting of the data. Here, Rfree can be used to decide if the increased level of detail (e.g., when going from one temperature factor per residue to individual isotropic Bs) is warranted by the data: if refinement of individual Bs leads to a reasonable drop in Rfree, the new model apparently gives a better description of the data; if it doesn't, refining individual Bs over-fits the data.
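As an illustration of the bookkeeping involved, the following Python sketch computes the conventional R-factor and Rfree from lists of observed and calculated structure-factor amplitudes, and then applies the decision rule described above. The function names, the single linear scale factor and the `min_drop` threshold are assumptions made for this example; the text does not quantify what a "reasonable drop" in Rfree is, and this is not meant to reproduce how any refinement program implements the calculation.

```python
def _r_value(pairs, scale):
    """R = sum(| |Fobs| - k |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(f_obs - scale * f_calc) for f_obs, f_calc in pairs)
    denominator = sum(f_obs for f_obs, _ in pairs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, is_free):
    """Return (R, Rfree) for amplitudes split by a test-set flag.

    Assumption: one linear scale factor k, derived from the working set only,
    so that the test reflections play no role in fitting anything.
    """
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    scale = sum(o for o, _ in work) / sum(c for _, c in work)
    return _r_value(work, scale), _r_value(test, scale)


def detail_is_warranted(r_free_simple, r_free_detailed, min_drop=0.005):
    """True if the more detailed model lowers Rfree by at least min_drop.

    min_drop is a hypothetical threshold chosen for illustration only.
    """
    return (r_free_simple - r_free_detailed) >= min_drop
```

A drop in the conventional R-factor alone is never passed to `detail_is_warranted`: only the two Rfree values enter the decision, which is the point of the argument above.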

Degrees of "wrongness".

Many things can go wrong during building, rebuilding and refinement of a structure, even in the case of molecular-replacement exercises [3]. A few years ago, Brändén and Jones [4] outlined possible degrees of incorrectness of crystallographic models: Of course there are more possible errors:
Probably the most common error, however, is:
Most of these errors can be detected, remedied and (even better) prevented if one uses common sense as well as state-of-the-art methodology (Rfree, high-temperature Simulated Annealing, databases) and software (i.e., O [5] and X-PLOR [6]).

How NOT to judge a model.

Many journals (Nature, Science, PNAS, to name but a few) are happy with (or insist on) a table with the minimum set of "conventional quality indicators" to convince the readers of the quality of a model: the resolution, the conventional R-factor, the average temperature factor, and the root-mean-square deviation (RMSD) from "ideal values" (often undefined in the paper) of bond lengths and bond angles. These "quality indicators", however, are absolutely unable to discriminate between good and bad models (basically, because they can quite easily be "fudged"). For example, consider the models described in Table I, and try to assess the correctness of both before reading on:


Table I. List of "conventional quality indicators" of two protein models.

Molecule                     "X"      "Y"
Resolution (Å)               3.0      2.9
R-factor                     0.214    0.251
RMSD bond lengths (Å)        0.009    0.009
RMSD bond angles (deg)       2.1      1.6
Average temp. factor (Å²)    13.4     49.2

Judging from Table I, and using the conventional ideas as to what constitutes a "well-refined model", model "X" looks quite good, whereas model "Y" has a high R-factor and average temperature factor, indicating that there might be something wrong with it.

In fact, model "Y" is the structure of cellular retinoic-acid-binding protein (CRABP) type I [7]. The R-factor may seem high, but the structure was refined by minimising Rfree (to get the most accurate model), and by minimising the difference between R and Rfree (to minimise over-fitting). The temperature factors are high because the quality of the data was less than fantastic (the effective resolution is ~3.2Å), and partly because the structure was refined with strictly constrained two-fold NCS (see the discussion in [7]).

Model "X" is a related protein, CRABP type II [7], which was originally solved at 1.8Å. However, the correct structure was then intentionally traced backwards, and the resulting model was refined using data out to 3Å, to yield model "X" ... [8] Note that this means that the Brändén & Jones 25% R-factor threshold has been broken (this held that a refinement that stalled at an R-factor >0.25 should make "alarm bells ring").

In other words, the "conventional quality indicators" listed in Table I are not even capable of discriminating between a correct and a backward-traced protein structure ! In the following we shall encounter a number of quality indicators that do a much better job, in particular when they are used in combination (since a good model makes sense in all respects).

Making better models.

The ultimate quality of the structure is determined by the quality of the model building and rebuilding as well as by the refinement protocol. The refinement should always start from a model which has as few assumptions and degrees of freedom as possible, in order to speed up convergence and to limit erroneous adjustments to the model. Initially, this "null-hypothesis" implies:
The model can then gradually be improved and extended in cycles of rebuilding and refinement. Prior to rebuilding, the model should be checked ("quality control") on all criteria which are also used to judge the final model: Ramachandran plot, temperature factors, peptide orientations, side-chain conformations, real-space R-factors, geometry, differences between NCS-related molecules, differences with the previous model, etc. While rebuilding, the use of databases (for peptide orientations and side-chain conformations, [5, 9]) is essential. At high resolution, it turns out, only ~1-2% of the residues have an unusual peptide orientation, and only ~5-10% of the residues have a non-rotamer side-chain conformation. This means that in the large majority of cases, unusual peptides in early or low-resolution models can be assumed to be wrong (unless the density is extremely well-defined), and that most non-rotamers can safely be replaced by rotamers.

Every refinement cycle, except perhaps the few last ones, should involve high-temperature (4000 K) simulated annealing (SA) [10]. This removes (most of the) model bias and ensures that a large part of conformational space is sampled. It also indicates how "robust" the model is: well-defined parts of the structure do not suffer from high-temperature SA, except for the odd side chain.

As the model becomes better and more complete, it can be made more precise (i.e., detailed). For example, the ligand or co-factor can be included in the model, water molecules can be added, the NCS constraints can be replaced by restraints, and temperature factors can be refined for groups of atoms (e.g., two Bs per residue) or perhaps even individual atoms. However, one should realise that each of these steps increases the number of degrees of freedom, and thereby the potential for the refinement program to adjust these parameters in order to model noise and to mask errors in the structure. Undoubtedly atoms have individual (anisotropic) temperature factors, and NCS-related molecules display small differences, but the question one should ask is if the data is of sufficient quality and quantity to actually model these phenomena. At anything worse than atomic resolution, Rfree appears to be the only statistic that can actually tell if an increase in the precision of the model is warranted by the information contents of the data, i.e. if it constitutes an improved model for your particular dataset, or if the refinement program has merely used the additional freedom to reduce the conventional R-factor by making the model worse.

What if you don't ?

The refinement and rebuilding protocol outlined above differs rather drastically from the traditional modus operandi in the protein crystallographic community. The latter tends to entail unrestrained NCS and individual isotropic temperature factors, irrespective of the resolution of the data. An analysis of ~300 low-resolution structures (worse than 2.2Å) reveals that roughly one third has been refined with a data-to-parameter ratio less than one, and an additional one third with a ratio between one and 1.5 [11]. This means that most of these structures suffer from over-fitting, i.e. they have been modelled with a level of precision which is not warranted by the data. In the "best" cases, this will have introduced non-existing water molecules, fantasy temperature factors, unrealistic differences between NCS-related molecules and an overall coordinate error of up to 2Å. In the worst cases, the overdose of degrees of freedom will have been used to mask even more serious errors.

A case in point is the structure of chloromuconate cycloisomerase [12] (PDB code 1CHR). This structure was solved in spacegroup I4 with two-fold NCS. The model was refined against 3Å data without NCS-constraints, with individual temperature factors, with alternative conformations for some residues, and without Rfree. "Significant differences [...] at the active site" were found between the two NCS-related molecules, and their RMSD was 0.86Å on Cα atoms, and 1.5Å on all atoms. Closer inspection of the model and the data, however, revealed that the actual spacegroup is I422, without NCS [3]. In addition, it was found that a stretch of ~25 residues was out-of-register in the original model [3]. Both errors were masked by the refinement program: since there were ~1.5 times as many degrees of freedom as there were reflections, the wrong model in the wrong spacegroup still had a conventional R-factor of 0.195. Again this shows that the conventional R-factor is rather meaningless at worse than atomic resolution.
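The data-to-parameter ratio discussed above can be estimated with a few lines of arithmetic. The Python sketch below is only a rough count under stated assumptions: three positional parameters per atom, one B per atom for individual isotropic temperature factors, two Bs per residue for the grouped model, and strict NCS constraints reducing the independent atoms to those of a single copy. It ignores the geometric restraints, which add effective observations, and all numbers in the usage example are hypothetical.

```python
def count_parameters(n_atoms, n_residues, b_model):
    """Rough number of refined parameters for one independent molecule."""
    if b_model == "individual":      # x, y, z and one isotropic B per atom
        return 4 * n_atoms
    if b_model == "grouped":         # x, y, z per atom, two Bs per residue
        return 3 * n_atoms + 2 * n_residues
    if b_model == "overall":         # x, y, z per atom, one overall B
        return 3 * n_atoms + 1
    raise ValueError(f"unknown temperature-factor model: {b_model!r}")


def data_to_parameter_ratio(n_reflections, n_atoms, n_residues,
                            b_model="individual", strict_ncs_copies=1):
    """Reflections per parameter.

    strict_ncs_copies is the number of NCS-related copies constrained to be
    identical; only one of them contributes independent parameters.
    """
    if strict_ncs_copies < 1:
        raise ValueError("strict_ncs_copies must be at least 1")
    n_params = count_parameters(n_atoms // strict_ncs_copies,
                                n_residues // strict_ncs_copies,
                                b_model)
    return n_reflections / n_params


# Hypothetical dimer: 2 x 1000 atoms, 2 x 130 residues, 6000 unique reflections.
liberal = data_to_parameter_ratio(6000, 2000, 260, "individual")
conservative = data_to_parameter_ratio(6000, 2000, 260, "grouped",
                                       strict_ncs_copies=2)
print(f"{liberal:.2f} {conservative:.2f}")   # prints "0.75 1.84"
```

With the same data, the conservative protocol more than doubles the number of observations available per parameter in this made-up case.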

The major lessons to be learned from this are:

Judging a paper.

Having to judge a structure merely by reading the paper in which it is described is a situation encountered frequently by readers, editors, referees, co-authors and sometimes even supervisors. Some of the things to check include:
The person who solves the structure has to be absolutely merciless in judging his own model; the supervisor must be supercritical; even the co-authors should be more critical than the worst nit-picking referee will ever be; the referees should demand to be convinced that the structure is correct; and the editors should start listening to their referees.


Table II.
Statistics and quality criteria for a number of models with different degrees of incorrectness. See the text for details.

Model                            Note  BACK     ASGL     1CHR     1PMK     2GDA     1GDC     1CBR     NORM        LOWR
% Incorrect                            100      ~80      ~7+50    ?        ?        ?        ?        0           0
Resolution (Å)                         3.0      2.9      3.0      2.25     3-3.5 ?  2.5-3 ?  2.9      1.5-2       >2
Number of residues                     137      331      2*370    2*78     72       72       2*136    >50         >50
R                                      0.214    -        0.195    0.164    -        -        0.251    0.1-0.2     0.2-0.3
Rfree                                  0.617    -        -        -        -        -        0.320    <R+Rmerge?  .
RMSD bond lengths (Å)                  0.009    -        0.029    0.015    -        -        0.009    -           <0.015
RMSD bond angles (deg)                 2.1      -        5.1      -        -        -        1.6      -           <2
Temp.-factor model                     Biso     none     Biso     Biso     -        -        grouped  Biso        grouped/none
Average temp. factor (Å²)              13.4     (10)     25.9     22.7     -        -        49.2     5-20?       10-50?
RMS delta-B bonded atoms (Å²)          4.1      -        2.2      2.1      -        -        -        <3?         no Biso
RMSD all NCS atoms (Å)           a     -        -        1.51     1.17     -        -        0        <0.5        0
RMS delta-B all NCS atoms (Å²)   a     -        -        5.7      3.7      -        -        0        <5          0
RMSD core Cα atoms (Å)           a     -        -        0.73     0.71     -        -        0        <0.3        0
RMS delta-B core Cα atoms (Å²)   a     -        -        4.3      2.3      -        -        0        <3          0
<|delta-Phi|> (deg)              a     -        -        23.7     20.9     -        -        0        <5          0
<|delta-Psi|> (deg)              a     -        -        23.4     19.3     -        -        0        <5          0
% Residues |delta-Phi| > 10 deg  a     -        -        60.3     64.1     -        -        0        <5          0
% Residues |delta-Psi| > 10 deg  a     -        -        60.3     60.3     -        -        0        <5          0
% Core Ramachandran plot areas   b     42.7     37.0     75.7     64.1     61.9     71.4     81.6     >90         >80
% Additional allowed areas       b     36.3     31.7     19.4     34.4     30.2     25.4     16.0     5-10        10-20
% Generously allowed areas       b     12.1     21.0     3.4      0.8      3.2      1.6      1.6      0-3         0-5
% Disallowed areas               b     8.9      10.3     1.5      0.8      4.8      1.6      0.8      <1          <1
% Secondary structure            c     48.9     24.2     62.6     9.6 (h)  50.0     45.8     67.6     50-70       50-70
Omega angle st. dev. (deg)       b     1.6      23.1     7.6      3.2      4.1      4.6      1.5      6?          <2
Zeta angle st. dev. (deg)        b     1.7      5.4      1.0      4.3      2.6      2.6      1.3      4?          <2
Bad contacts per 100 residues    b,e   13.1     46.2     1.5      37.2     1.4      1.4      1.5      0           <2
H-bond energy st. dev.           b     0.8      1.6      0.8      1.4      0.7      0.6      0.8      0.5?        <1
% Non-rotamers                   c,f   29.2     29.0     22.4     21.8     13.9     11.1     7.4      5-10        5-10
% Unusual peptide orientations   c,g   24.1     21.8     4.5      3.8      4.2      1.4      2.2      1-2         1-2
Overall ProCheck G-factor        b     -0.4     -3.3     -1.3     -1.2     -0.5     -0.5     +0.1     >0          >-0.5
Overall DACA score               d     -2.6     -2.4     -1.2     -2.0     -2.1     -2.1     -0.4     >-0.5       >-1
 a - calculated with LSQMAN (GJK & TAJ, unpublished program)
 b - calculated with ProCheck [16]
 c - calculated with O [5]
 d - calculated with What If [17]; this measures how (un)usual the
     ensemble of neighbouring protein atoms is for every group of
     atoms in the protein; this may discriminate against DNA-binding
     proteins (2GDA and 1GDC)
 e - many hydrogen bonds are flagged as bad contacts
 f - defined as residues having an RSC-fit value > 1.5 Å
 g - defined as residues having a pep-flip value > 2.5 Å
 h - this is a kringle domain (i.e., no α-helices or β-strands)

Judging coordinates.

When atomic coordinates of the complete model are available, a whole battery of tests can be executed, including those that were not mentioned in the original paper. Table II lists a number of simple checks that can be made with a set of coordinates in hand. The checks have been carried out on a number of models with different degrees of "wrongness": Finally, two columns have been included which contain our estimates of normal or expected values for the various criteria at high (NORM) and low resolution (LOWR). The following aspects of the models have been assessed:
It is clear that none of the traditional quality indicators correlates with the degree of incorrectness of the models. The only exception is the Ramachandran plot, but often this is not included or mentioned in papers in the more prestigious journals. The criteria that correlate best with incorrectness are the percentage of side chains in non-rotamer conformations, the percentage of residues with unusual peptide orientations, and the directional-atomic contact analysis score (DACA). Basically, all three are database methods that provide different ways of probing to what extent a model looks like a real protein. Note that the G-factor calculated by ProCheck can be fudged as well: using the Engh & Huber [18] force field in X-PLOR with not too high a weight for the crystallographic pseudo-energy term virtually guarantees that a structure scores "better than average" in ProCheck (with the exception, perhaps, of the Ramachandran score). If Rfree had been used in all studies, we are convinced that this statistic would have shown the best correlation with model error, since it is very hard to fudge. On the other hand, some of the structures shown here probably wouldn't have ended up in the literature in the form they did, if Rfree had been used.

An important conclusion is that an essentially correct model scores well on basically all tests (i.e., makes sense in all respects), a partly incorrect model scores poorly on a few tests, and a grossly wrong model scores poorly on almost all tests (other than the conventional R-factor and RMSD values). The same is true, by the way, at the residue level: problematic regions tend to score poorly on a number of different criteria (temperature factors, Ramachandran plot, peptide orientation, side-chain conformation, real-space R-factor, etc.).
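The idea of judging a model on many criteria at once can be sketched in a few lines of Python. The indicator values below are copied from the BACK and 1CBR columns of Table II, and the cut-offs are taken from the LOWR column; treating the end of each quoted range as a hard pass/fail threshold is an assumption made for this example, as are the dictionary keys and the function name.

```python
# Expected values for low-resolution models (LOWR column of Table II),
# turned into pass/fail tests. The hard cut-offs are an assumption.
LOWR_CHECKS = {
    "rama_core_pct":        lambda v: v > 80.0,
    "rama_disallowed_pct":  lambda v: v < 1.0,
    "non_rotamers_pct":     lambda v: v <= 10.0,
    "unusual_peptides_pct": lambda v: v <= 2.0,
    "procheck_g_factor":    lambda v: v > -0.5,
    "daca_score":           lambda v: v > -1.0,
}


def failed_checks(indicators):
    """Return the names of all criteria on which a model scores poorly."""
    return [name for name, passes in LOWR_CHECKS.items()
            if not passes(indicators[name])]


# Values from Table II: the backward-traced model and the CRABP I model.
MODELS = {
    "BACK": {"rama_core_pct": 42.7, "rama_disallowed_pct": 8.9,
             "non_rotamers_pct": 29.2, "unusual_peptides_pct": 24.1,
             "procheck_g_factor": -0.4, "daca_score": -2.6},
    "1CBR": {"rama_core_pct": 81.6, "rama_disallowed_pct": 0.8,
             "non_rotamers_pct": 7.4, "unusual_peptides_pct": 2.2,
             "procheck_g_factor": 0.1, "daca_score": -0.4},
}

for name, indicators in MODELS.items():
    failures = failed_checks(indicators)
    print(f"{name}: {len(failures)} of {len(LOWR_CHECKS)} checks failed {failures}")
```

With these inputs the backward-traced model fails five of the six checks (only the ProCheck G-factor passes), whereas the correct model narrowly misses one, which matches the pattern described above.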

Judging made easy.

If structure factors are available, judging a model becomes a lot easier. First and foremost, it becomes possible to calculate maps which show how good the density really is. Second, using these maps, real-space R-factors [5, 9] can be calculated for each residue. Third, Simulated Annealing omit maps can be used to check poorly defined (or refined) regions. Finally, with the data in hand it is possible to re-do the refinement and to track errors [3], even many years after a structure was first published. Therefore, we strongly encourage the entire protein crystallography community to deposit not only a complete set of atomic coordinates of every solved structure, but also the structure factors with the PDB.


Table III.
Statistics and quality criteria for two structures from Uppsala that have been solved both at low and high resolution. See the text for details.

Model                                    Note  1GUH         ALEX         5RUB         9RUB
Resolution (Å)                                 2.6          2.0          2.6          1.7
R/Rfree                                        0.229/-      0.196/0.245  0.199/-      0.180/-
Number of residues                             4*221        2*221        2*460        2*436
Temp.-factor model                             grouped      Biso         Biso         Biso
Average temp. factor (Å²)                      35.1         25.5         19.3         29.3
RMS delta-B bonded atoms (Å²)                  -            2.7          1.4          1.0
RMSD all NCS atoms (Å)                   a     0            0.57         2.31         1.25
RMS delta-B all NCS atoms (Å²)           a     0            4.2          7.8          5.3
RMSD core Cα atoms (Å)                   a     0            0.09         0.95         0.89
RMS delta-B core Cα atoms (Å²)           a     0            2.1          7.6          5.1
RMS delta-Phi (deg)                      a     0            3.0          45.9         18.3
% Residues |delta-Phi| > 10 deg          a     0            2.3          65.3         18.3
RMS delta-Psi (deg)                      a     0            3.0          45.8         20.0
% Residues |delta-Psi| > 10 deg          a     0            0.9          67.2         19.2
% Residues |delta Cα-Cα-Cα| > 5 deg      a     0            1.4          51.0         12.5
% Residues |delta Cα-Cα-Cα-Cα| > 10 deg  a     0            0.9          39.6         12.5
% Core Ramachandran plot areas           b     91.9         90.9         74.1         91.0
% Additional allowed areas               b     8.1          8.4          19.5         8.2
% Generously allowed areas               b     0            0.8          4.3          0.6
% Disallowed areas                       b     0            0            2.2          0.3
% Secondary structure                    c     69.2         69.0         58.2         63.2
Bad contacts per 100 residues            b,e   0            0.2          6.3          17.5
% Non-rotamers                           c,f   11.8         10.0         20.0         11.4
% Unusual peptide orientations           c,g   1.8          2.0          6.8          2.6
Overall ProCheck G-factor                b     0.0          +0.4         -1.3         -0.4
Overall DACA score                       d     -0.7         -0.6         -1.5         -0.7
 a - calculated with LSQMAN (GJK & TAJ, unpublished program)
 b - calculated with ProCheck [16]
 c - calculated with O [5]
 d - calculated with What If [17]
 e - many hydrogen bonds are flagged as bad contacts
 f - defined as residues having an RSC-fit value > 1.5 Å
 g - defined as residues having a pep-flip value > 2.5 Å

The ultimate test.

A good model can withstand the ultimate test: refinement against high-resolution data. Table III shows two examples of structures from Uppsala which have been solved both at low and at high resolution:
Contrary to what one might infer from the table, 1GUH was solved before ALEX, and 9RUB was solved after 5RUB. 1GUH was refined conservatively with strict NCS and grouped temperature factors; 9RUB was refined liberally with no NCS constraints or restraints and individual temperature factors. Note that the GSTs have very similar values for the majority of the statistics. The GST structures have an RMSD on Cα atoms of 0.47Å, whereas the rubisco models have RMSDs between 0.74 and 1.2Å ! The average |delta-Phi| and |delta-Psi| are ~9 deg for the GSTs (rubisco: 17-20 deg), and for ~23% of the residues these differences exceed 10 deg (rubisco: 53-58%). The average |delta Cα-Cα-Cα-Cα dihedral| is ~3.7 deg for the GSTs (rubisco: 8.0-9.5 deg), and for 6.9% of the residues this value exceeds 10 deg (rubisco: 20-30%). Clearly, the 2.6Å GST model can stand refinement against higher resolution data, whereas the 2.6Å rubisco model has undergone rather large changes which must be due to over-fitting the low-resolution data. Again, the lesson is that conservative refinement minimises the chance of introducing artefacts and errors due to over-fitting, whereas liberal refinement is virtually guaranteed to yield artefacts and errors.
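Statistics such as the average |delta-Phi| and the percentage of residues whose torsion angles differ by more than 10 deg are simple to compute once the angles of two models (or two NCS-related molecules) have been paired up residue by residue. The Python sketch below shows one way to do this, taking care of the 360 deg periodicity; the function names and the example angles are made up for illustration, and this is not a description of how LSQMAN performs the calculation.

```python
def torsion_difference(a_deg, b_deg):
    """Smallest absolute difference between two torsion angles (0..180 deg)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def compare_torsions(angles_a, angles_b, cutoff_deg=10.0):
    """Return (mean |delta|, % of residues with |delta| > cutoff_deg).

    angles_a and angles_b hold the same torsion (e.g. Phi) for equivalent
    residues in the two models, in degrees.
    """
    if len(angles_a) != len(angles_b) or not angles_a:
        raise ValueError("need two non-empty angle lists of equal length")
    diffs = [torsion_difference(a, b) for a, b in zip(angles_a, angles_b)]
    mean_abs = sum(diffs) / len(diffs)
    pct_over = 100.0 * sum(d > cutoff_deg for d in diffs) / len(diffs)
    return mean_abs, pct_over


# Hypothetical Phi angles for four equivalent residues in two models.
phi_low_res = [-60.0, 179.0, -120.0, 60.0]
phi_high_res = [-65.0, -179.0, -100.0, 60.0]
mean_abs, pct_over = compare_torsions(phi_low_res, phi_high_res)
print(mean_abs, pct_over)   # prints "6.75 25.0"
```

Note that 179 deg and -179 deg differ by 2 deg, not 358 deg; forgetting the wrap-around would inflate both statistics.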

In practice, one often encounters situations in which a structure is first solved at high resolution. This structure is then used to solve the structures of mutants or complexes for which only low-resolution datasets are available. A dangerous mistake is to use the same refinement protocol that was used for the high-resolution refinement for the low-resolution structures (e.g., no NCS restraints, individual temperature factors). The only way in which such structures can be refined properly is by (a) using a conservative refinement strategy, (b) using weak harmonic restraints to keep the atoms near their high-resolution positions unless there is a strong driving force in the data to change them, and (c) monitoring Rfree from the very start of the refinement. This approach has successfully been applied in the refinement of a complex of Candida antarctica lipase B at 2.5Å resolution, starting from a 1.5Å model [23, 24].
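The effect of point (b) can be pictured as an extra penalty term that grows with the squared displacement of every atom from its high-resolution position. The Python sketch below evaluates such a term; the function name, the single force constant for all atoms and its default value are placeholders chosen for this example, and the actual restraint terms and weights used by a refinement program are not reproduced here.

```python
def harmonic_restraint_energy(coords, reference, force_constant=1.0):
    """Sum of k * |r_i - r_i(reference)|^2 over all atoms.

    coords and reference are equally long sequences of (x, y, z) tuples in Å.
    force_constant (k) is a hypothetical weight; a small value corresponds to
    the "weak" restraints mentioned in the text.
    """
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must contain the same atoms")
    total = 0.0
    for (x, y, z), (x0, y0, z0) in zip(coords, reference):
        total += (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
    return force_constant * total


# Two atoms; the second has moved 2 Å away from its reference position.
energy = harmonic_restraint_energy([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                                   [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
                                   force_constant=0.5)
print(energy)   # prints "2.0"
```

Because the penalty is quadratic, small shifts cost almost nothing, while large shifts are only accepted if the diffraction data provide a correspondingly strong driving force.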

Acknowledgments.

This work was supported by the Swedish Natural Science Research Council and Uppsala University. We thank Dr. Alex Wlodawer for providing us with the coordinates of the incorrect model of asparaginase/glutaminase, and Dr. Alex Cameron for the coordinates of the 2Å GST model. We also acknowledge the many fruitful discussions regarding Rfree, quality control and validation, both with Dr. Axel Brünger (Yale), and with the participants in the European Union Protein Structure Validation Initiative.

References.

  1. A.T. Brünger, Nature 355, 472 (1992).
  2. A.T. Brünger, Acta Cryst. D49, 24 (1993).
  3. G.J. Kleywegt & T.A. Jones, "A more correct crystal structure of chloromuconate cycloisomerase", to be published.
  4. C.I. Brändén & T.A. Jones, Nature 343, 687 (1990).
  5. T.A. Jones, J.Y. Zou, S.W. Cowan, & M. Kjeldgaard, Acta Cryst. A47, 110 (1991).
  6. A.T. Brünger, "X-PLOR: a system for crystallography and NMR", Yale University, New Haven, CT, 1990.
  7. G.J. Kleywegt, T. Bergfors, H. Senn, P. Le Motte, B. Gsell, K. Shudo, & T.A. Jones, Structure 2, 1241 (1994).
  8. G.J. Kleywegt & T.A. Jones, "Refinement of low-resolution structures", to be published.
  9. J.Y. Zou & S.L. Mowbray, Acta Cryst. D50, 237 (1994).
  10. A.T. Brünger & A. Krukowski, Acta Cryst. A46, 585 (1990).
  11. G.J. Kleywegt & T.A. Jones, "Maltreatment of non-crystallographic symmetry", to be published.
  12. H. Hoier, M. Schlömann, A. Hammer, J.P. Glusker, H.L. Carrell, A. Goldman, J.J. Stezowski, & U. Heinemann, Acta Cryst. D50, 75 (1994).
  13. H.L. Ammon, I.T. Weber, A. Wlodawer, R.W. Harrison, G.L. Gilliland, K.C. Murphy, L. Sjölin, & J. Roberts, J. Biol. Chem. 263, 150 (1988);
    J. Lubkowski, A. Wlodawer, D. Housset, I.T. Weber, H.L. Ammon, K.C. Murphy, & A.L. Swain, Acta Cryst. D50, 826 (1994).
  14. K. Padmanabhan, T.P. Wu, K.G. Ravichandran, & A. Tulinsky, Prot. Sci. 3, 898 (1994).
  15. H. Baumann, K. Paulsen, H. Kovacs, H. Berglund, A.P.H. Wright, J.A. Gustafsson, & T. Härd, Biochemistry 32, 13463 (1993).
  16. R.A. Laskowski, M.W. MacArthur, D.S. Moss, & J.M. Thornton, J. Appl. Cryst. 26, 283 (1993).
  17. G. Vriend & C. Sander, J. Appl. Cryst. 26, 47 (1993).
  18. R.A. Engh & R. Huber, Acta Cryst. A47, 392 (1991).
  19. I. Sinning, G.J. Kleywegt, S.W. Cowan, P. Reinemer, H.W. Dirr, R. Huber, G.L. Gilliland, R.N. Armstrong, X. Ji, P.G. Board, B. Olin, B. Mannervik, & T.A. Jones, J. Mol. Biol. 232, 192 (1993).
  20. A.D. Cameron, et al., & T.A. Jones, "Structure refinement and analysis of human alpha class glutathione S-transferase A1-1, in the apo form and in complexes with ethacrynic acid and its glutathione conjugate", to be published.
  21. T. Lundqvist & G. Schneider, J. Biol. Chem. 266, 12604 (1991).
  22. G. Schneider, Y. Lindqvist, & T. Lundqvist, J. Mol. Biol. 211, 989 (1990).
  23. G.J. Kleywegt & T.A. Jones, "Good model-building and refinement practice", to be published.
  24. J. Uppenberg, et al., & T.A. Jones, "Crystallographic and molecular dynamics studies of lipase B from Candida antarctica reveal a stereo-specificity pocket for secondary alcohols", to be published.


Latest update at 8 October, 1998.