USF

O for the Structurally Challenged

A Good Model-building and Refinement Practice Tutorial Using O and X-PLOR

Gerard J. Kleywegt - Uppsala University

(c) 1996


1 INTRODUCTION


1.1 Version

- version 0.1 @ 951113 - GJ Kleywegt
- version 0.2 @ 951115 - GJ Kleywegt
- version 0.3 @ 951116 - GJ Kleywegt
- version 0.4 @ 951117 - GJ Kleywegt
- version 0.5 @ 951118 - GJ Kleywegt
- version 0.6 @ 951123 - GJ Kleywegt
- version 0.7 @ 951203 - GJ Kleywegt
- version 0.8 @ 960118 - GJ Kleywegt


1.2 Copyright

This tutorial and the associated files are (c) G.J. Kleywegt (Uppsala), 1996. Permission is granted to reproduce these materials in reasonable quantities for personal or educational use.

If you use the tutorial and like it, send a postcard from your hometown with a nice stamp to the author at: Department of Molecular Biology, Biomedical Centre, University of Uppsala, Box 590, S-751 24 Uppsala, SWEDEN.


1.3 Vad är detta ? (What is this ?)

This is a tutorial to introduce protein-model rebuilding and quality control using O, and refinement using X-PLOR. It requires the following:
- basic familiarity with O (e.g., through the "O for Morons" tutorial)
- O version 5.10.x and an O manual
- a few files provided with this tutorial
- access to the O public domain directory (generically called OMAC; check the correct name on your local system; if not present, get it from the O ftp server, file "pub/gerard/extras/omac/omac.tar.gz")
- the list of Frequently-Asked Questions (file "OMAC/software.faq"; also available through the WorldWide Web)
- some Uppsala utility programs (also available from the O ftp server; you will also need the manuals for these programs)
- X-PLOR (version 3.1 or 4) and the X-PLOR manual
- CCP4 programs (not necessary; they are used for map calculations, but these can also be carried out with X-PLOR)

The tutorial can be used in different ways:
- as a quick introduction to quality control and rebuilding: chapters 2, 3, 4 and 5 (optionally, 8 and 9);
- as a quick introduction to refinement (only for absolute X-PLOR beginners): chapters 6 and 7;
- as a complete course in rebuilding and refinement from first to final model: all chapters (including several iterations of chapter 9).
Chapters 5, 7 and 9 will generally be the most time-consuming (depending on the level of experience the student has with O and rebuilding and refinement in general). Chapters 2 to 8 contain a few questions in sections called "The Swedish Inquisition". Generally, these will aid or extend the understanding of the subject matter covered in that chapter.

In addition, ask your system manager to install the "run", "ono" and "oplot" scripts (from the server, directory pub/gerard/extras/scripts); these will make your O-life a tad simpler.

The tutorial does not include ab initio model-building in an MIR map. For this purpose, there are some macros on the O ftp server (directory pub/p2_course).

Also, the structure used as an example has no NCS, so averaging is not covered either. There is, however, a separate RAVE tutorial available from the O ftp server (file "pub/gerard/rave/exam.tar.gz").

The structure you will be working with is that of cellular retinoic-acid binding protein type II (CRABP II) in complex with all-trans-retinoic acid. This structure was solved at 1.8 Å, but initially we will only use data to 2.8 Å to show the effects that limited resolution may have. Our starting model is derived from a related protein, CRABP I, which was solved at 2.9 Å resolution. This structure was changed as follows:
- the N- and C-terminal residues were removed
- the region near the insertion in CRABP II was removed (114-117)
- a loop with poor density and high temperature factors was removed (100-106)
- residues which differ between CRABP I and II were cut back to alanines (unless they are glycines in CRABP I)

The temperature factors of this model were reset, and it was subjected to mild Simulated Annealing refinement etc. I then "rebuilt" it to introduce some deliberate errors and did some more energy minimisation and individual temperature-factor refinement. After this, a 2Fo-Fc and an Fo-Fc map were calculated.

This structure has the advantage that it is fairly small and yet has most of the common errors and problems (main chain, side chain, poor loop, insertion site, low resolution) associated with protein model refinement and rebuilding.

Before you start, copy your local version of the gmrp directory tree to your own area. This contains all the files you need. You will start working in the O directory, gmrp/o.

For teaching exercises: to answer the questions, it may be useful to have a copy of the O manual and tutorial at hand. The answers to most questions can be found in the literature references, program manuals, or by inspection (of the model, the density or some file).

It may be handy to use a web-browser to consult the O manual and FAQ (http://imsb.au.dk/~mok/o/). Manuals for the utility programs are available in HTML format as well (in Uppsala, via our homepage; elsewhere from the O ftp server, file "pub/gerard/extras/html_manuals/html_manuals.dirtar.gz"). Documentation for X-PLOR and CCP4 programs is also available on the WorldWide Web.

While you work through the tutorial, you may also want to use the Good Model-building and Refinement forms for keeping notes (available as file "OMAC/gsp_forms.ps").

Rebuilding is a lot more fun if you play loud music on your personal stereo !

Comments and suggestions about this tutorial can be E-mailed to gerard@xray.bmc.uu.se.


1.4 Essential reading

The following references are essential for working through the rebuilding parts of this tutorial.

(1) T.A. Jones & M. Kjeldgaard, "O - The Manual", version 5.10.3, Uppsala, 1995.

(2) T.A. Jones, J.Y. Zou, S.W. Cowan & M. Kjeldgaard, "Improved methods for building protein models in electron density maps and the location of errors in these models", Acta Cryst. A47, 110-119 (1991).

(3) G.J. Kleywegt & T.A. Jones, "Good model-building and refinement practice", to be published (Methods in Enzymology, 1996).

(4) G.J. Kleywegt, "O for Morons", Uppsala (1994).

(5) G.J. Kleywegt, T. Bergfors, H. Senn, P. Le Motte, B. Gsell, K. Shudo & T.A. Jones, "Crystal structures of cellular retinoic acid binding proteins I and II in complex with all-trans-retinoic acid and a synthetic retinoid", Structure 2, 1241-1258 (1994).


1.5 Additional and background reading

(1) G.J. Kleywegt & T.A. Jones, "OOPS-a-daisy", ESF/CCP4 Newsletter 30, June 1994, pp. 20-24.

(2) G.J. Kleywegt, "Dictionaries for Heteros", ESF/CCP4 Newsletter 31 (32?), June 1995, pp. 45-50.

(3) G.J. Kleywegt & T.A. Jones, "Efficient rebuilding of protein structures", Acta Cryst. D, to be published (1996).

(4) G.J. Kleywegt & T.A. Jones, "xdlMAPMAN and xdlDATAMAN - programs for reformatting, analysis and manipulation of biomacromolecular electron-density maps and reflection datasets", Acta Cryst. D, accepted for publication (1996).

(5) T.A. Jones & M. Kjeldgaard, "???", to be published (Methods in Enzymology, 1996).

(6) G.J. Kleywegt, "Use of non-crystallographic symmetry in protein structure refinement", Acta Cryst. D, accepted for publication (1996).

(7) A.T. Brünger, "X-PLOR - A System for Crystallography and NMR", Yale University, New Haven (1992). [And references therein.]

(8) T.A. Jones & S. Thirup, "Using known substructures in protein model building and crystallography", EMBO J. 5, 819-822 (1986).

(9) C.I. Brändén & T.A. Jones, "Between objectivity and subjectivity", Nature 343, 687-689 (1990).

(10) T.A. Jones & M. Kjeldgaard, "Making the first trace with O", in "From first map to final model" (S. Bailey, R. Hubbard & D. Waller, Eds.), SERC Daresbury Laboratory, pp. 1-13 (1994).

(11) G.J. Kleywegt & T.A. Jones, "Where freedom is given, liberties are taken", Structure 3, 535-540 (1995).

(12) G.J. Kleywegt & T.A. Jones, "Braille for pugilists", in "Making the most of your model" (W.N. Hunter, J.M. Thornton & S. Bailey, Eds.), SERC Daresbury Laboratory, pp. 11-24 (1995).

(12) E.J. Dodson, G.J. Kleywegt & K.S. Wilson, "Report of a workshop on the use of statistical validators in protein X-ray crystallography", Acta Cryst. D52, 228-234 (1996).

(13) A.T. Brünger, "Free R value: a novel statistical quantity for assessing the accuracy of crystal structures", Nature 355, 472-475 (1992).

(14) A.T. Brünger & L.M. Rice, "Crystallographic refinement by simulated annealing: methods and applications", to be published (Methods in Enzymology, 1996).

(15) A.T. Brünger, "The free R value: a more objective statistic for crystallography", to be published (Methods in Enzymology, 1996).

(16) Collaborative Computational Project, Number 4, "The CCP4 suite: programs for protein crystallography", Acta Cryst. D50, 760-763 (1994).

(17) A. Hodel, S.H. Kim & A.T. Brünger, "Model bias in macromolecular crystal structures", Acta Cryst. A48, 851-858 (1992).

(18) M.W. MacArthur, R.A. Laskowski & J.M. Thornton, "Knowledge-based validation of protein structures derived by X-ray crystallography and NMR spectroscopy", Curr. Opin. Struct. Biol. 4, 731-737 (1994).

(19) R.J. Read, "Model bias and phase combination", in "From first map to final model" (S. Bailey, R. Hubbard & D. Waller, Eds.), SERC Daresbury Laboratory, pp. 31-40 (1994).

(20) J.Y. Zou & S.L. Mowbray, "An evaluation of the use of databases in protein structure refinement", Acta Cryst. D50, 237-249 (1994).


2 PREPARATIONS


2.1 Files

In the gmrp/o directory you will find five files:
- "m1.pdb" = your starting model
- "m1_2fofc.map" = the 2Fo-Fc map in O format
- "m1_fofc.map" = the Fo-Fc map in O format
- "maps" = an O macro to draw these maps around the current screen centre
- "symmy" = an O macro that creates symmetry objects around the current screen centre

There is also a directory called gmrp/o/gerard, but we will ignore this for the time being.


2.2 O database

Let's start by creating a new O database:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 % 143 gerard rigel 20:40:18 gmrp/o > ono

 ... Run 4d_ono
 ... Linked /home/gerard/bindkey.macro to this directory
 ... Executed bindkey.macro for you

 ... Link to odat directory not found
 ... Making a soft link to the odat directory for you

 ... Link to omac directory not found
 ... Making a soft link to the omac directory for you

 ... Executing /nfs/taj/alwyn/o/bin/4d_ono
 ... For gerard on rigel at Mon Nov 13 20:49:41 MET 1995

  O > Use of this program implies acceptance of conditions
  O > described in Appendix 1 of the O manual
  O > O version 5.10.3, Apr 1995
  O > Define an O file (terminate with blank):
  O > Menu names are not defined.
  O > Enter file name [/nfs/taj/alwyn/o/data/menu.o]:
  O > menu.o file for O version 5.10
  O > Last modified 20-Jul-94
  O > Startup file was never loaded
  O > Enter file name [/nfs/taj/alwyn/o/data/startup.o]:
  O > startup.o file for O version 5.10
  O > Last modified 28-Sep-94
  O > Startup file was never loaded
  O > Enter file name [/nfs/taj/alwyn/o/data/access.o]:
 Chasis id= 1762011761
  O > File_display_connectivity is not defined.
  O > Enter file name [/nfs/taj/alwyn/o/data/all.dat]:
  O > Maximum inter-residue link distance = 2.00
  O > There were 23 residues.
  O > 175 atoms.
  O > Do you want to use the display? [Yes]:
  O > Graphics board GL4DXG-5.2
  O > Making visibility data structures.
  O > Making visibility data structures.
  O >
  O > Trackball on (F7KEY)
 ioctl: Invalid argument
 setraw: Invalid argument
 save
 As1> File_O_save is not defined.
 As1> Enter file name [binary.o]: gmrp.o
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


2.3 Setting up O for rebuilding

Execute the macro "OMAC/newo.omac". This will set up a number of things (including the O menu) which are handy for rebuilding:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > @omac/newo.omac
  O > Macro in computer file-system.
 Heap>  @all_on_off, @all_on, @all_off
 ...
  O >  As4> Save your O database now
  O >  As4> ...
  O >   O >   O >   O > save
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


2.4 Starting model

Read in the starting model and draw it. Also set up the symmetry.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > s_a_i m1.pdb m1
 Sam> File type is PDB
 Sam>  Database compressed.
 Sam> Space for    142650 atoms
 Sam> Space for     10000 residues
 Sam> Molecule M1 contained 123 residues and 907 atoms
 Sam> Centre of gravity updated for     1  123
  O > mol m1 zo ; end
  O >  Current molecule  has not been loaded.
  O > ce_zo m1 a10 c130
 As4> M1    A10   C130  M1
 As4> Centering on zone from A10 to C130
  O > symm_set
 Sym> Molecule name? [M1]:
 Sym> Define cell constants [ 45.65 47.56 77.61 90.00 90.00 90.00]:
 Sym> Name of spacegroup? [P212121]:
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


2.5 The Swedish Inquisition

(1) Change the "maps" macro such that it contours the maps within 15 Å from the screen centre.
(2) Change the "symmy" macro such that it uses the same radius as that used in the "maps" macro.
(3) What does the odat directory contain ? Where are the symmetry operator files for the most common spacegroups stored for O ?
(4) What does the script "OMAC/ofaq" do ? Use it (or your web-browser) to find out what you can about the Pep_flip command.


3 QUALITY CHECKS

Before we start rebuilding, let's see what the good, bad and ugly bits of the starting model are. To this end, use the Pep_flip, RSC_fit and RS_fit commands in O.


3.1 Pep_flip

The Pep_flip command calculates how (un)usual the orientation of the peptide oxygen atoms is compared to the database. Values greater than 2-2.5 Å need to be checked critically.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > pep_flip m1 a2 c135
 Util> M1    A2    C135  M1
 Util> Calculating zone A2     to C135   in molecule M1    , object M1
 Util>  The DB is now being loaded.
 Util>  Loading data for protein:HCAC
 ...
 Util>  Loading data for protein:TLN_3
 Util>   15 fragments used for residue A4     pep_flip value=    2.47
 Util>   20 fragments used for residue A5     pep_flip value=    3.28
 Util>   20 fragments used for residue A6     pep_flip value=    0.46
 ...
 Util>   20 fragments used for residue C133   pep_flip value=    0.63
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
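
If you have captured the Pep_flip output in a text file, a few lines of Python can pick out the residues that need a critical look. This is only a sketch: it assumes the log lines look exactly like the ones in the example above, and the names flag_pep_flips and example_log are made up for this illustration (they are not part of O or OOPS).

```python
import re

# Assumed line layout, copied from the example log above:
#  Util>   15 fragments used for residue A4     pep_flip value=    2.47
PEP_FLIP_LINE = re.compile(r"residue\s+(\S+)\s+pep_flip value=\s*([0-9.]+)")


def flag_pep_flips(log_text, cutoff=2.5):
    """Return (residue, value) pairs whose pep_flip value exceeds the cutoff."""
    flagged = []
    for line in log_text.splitlines():
        match = PEP_FLIP_LINE.search(line)
        if match and float(match.group(2)) > cutoff:
            flagged.append((match.group(1), float(match.group(2))))
    return flagged


# The four value lines shown in the example above.
example_log = """\
 Util>   15 fragments used for residue A4     pep_flip value=    2.47
 Util>   20 fragments used for residue A5     pep_flip value=    3.28
 Util>   20 fragments used for residue A6     pep_flip value=    0.46
 Util>   20 fragments used for residue C133   pep_flip value=    0.63
"""

# Use the stricter end of the 2-2.5 Å range for a first rebuilding round.
print(flag_pep_flips(example_log, cutoff=2.0))
```

With the 2.0 Å cut-off both A4 and A5 are reported; with the default of 2.5 Å only A5 remains.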
   


3.2 RSC_fit

The RSC_fit command calculates, for each residue, the RMSD between your sidechain conformation and the rotamer that is most similar to it. Values greater than 1.5 Å (or even 1.0 Å for leucines and isoleucines) need to be checked.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > rsc_fit m1 a2 c135
 Util> M1    A2    C135  M1
 Util>  The Rotamer_DB is now being loaded.
 Util> Calculating zone A2     to C135   in molecule M1    , object M1
 Util>  Best rotamer for A2     is No. 2 with rms   0.594
 Util>  Best rotamer for A3     is No. 1 with rms   0.702
 Util>   All atoms in this residue are fixed
 Util>  SCGLY    is missing.
 Util>   All atoms in this residue are fixed
 Util>  Best rotamer for A7     is No. 1 with rms   3.023
 Util>  Best rotamer for A8     is No. 2 with rms   2.282
 Util>   All atoms in this residue are fixed
 ...
 Util>  Best rotamer for C134   is No. 2 with rms   0.115
 Util>  Best rotamer for C135   is No. 1 with rms   0.294
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
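
The residue-type dependent cut-off can be written down in a couple of lines. The sketch below only restates the rule of thumb given above (1.5 Å in general, 1.0 Å for leucine and isoleucine); the function names are hypothetical, and the leucine entry in the example data is invented to show the stricter cut-off in action.

```python
# Cut-offs quoted in the text above: 1.5 Å in general, 1.0 Å for Leu and Ile.
DEFAULT_CUTOFF = 1.5
STRICT_CUTOFF = 1.0
STRICT_TYPES = {"LEU", "ILE"}


def rsc_cutoff(residue_type):
    """Return the RSC_fit cut-off (in Å) to apply to this residue type."""
    if residue_type.upper() in STRICT_TYPES:
        return STRICT_CUTOFF
    return DEFAULT_CUTOFF


def needs_check(residue_type, rms):
    """True if the RMSD to the best rotamer exceeds the cut-off for this type."""
    return rms > rsc_cutoff(residue_type)


# (name, type, rms): the first two entries use values from the example logs
# in this tutorial; the leucine entry is made up for illustration.
residues = [
    ("A2", "ASN", 0.594),
    ("A3", "PHE", 0.702),
    ("X1", "LEU", 1.200),  # hypothetical residue
]
for name, restype, rms in residues:
    verdict = "check" if needs_check(restype, rms) else "okay"
    print(f"{name} {restype} rms={rms:.3f} -> {verdict}")
```

Note that an RMSD of 1.2 Å would pass for most residue types, but not for a leucine.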
   


3.3 RS_fit

The RS_fit command (in this case the real-space R-factor) calculates how well your model fits the (2Fo-Fc) map.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > map_file m1_2fofc.map
  O > rsr_map
 RSR> File_rsr_map is not defined.
 RSR> Enter file name [rsr.map]: m1_2fofc.map
 RSR> Name of map file? [m1_2fofc.map]:
  O > read odat/rsfit_all.o
  O > rsr_setup
 RSR> Automatic scaling? [Yes]:
 RSR> autoscale option on
 RSR> Contouring of refinement box? [No]:
 RSR> Which metod, CONV or DIFF? [CONV]:
 RSR> Maximize the convolution product
 RSR> Real space R factor(RFAC) or Correlation coefficient(RSCC)? [RFAC]:
 RSR> Attempt to subtract out neighbour atom density? [Yes]:
 RSR> Densities will be subtracted
 RSR> Define number of scans [5]:
 RSR> Define shifts [ 0.30 0.20 0.10 0.10 0.05]:
 RSR> Define overall B [ 20.00]:
 RSR> Define wall [ 3.50]:
 RSR> Define C and Ao [ 1.04 0.90]: 0.95 0.85
 RSR> Define integration radius [ 3]:
 RSR> Define scale to be applied to calculated density [ 40.76]:
  O > rs_fit m1 a2 c135
 Util> M1    A2    C135  M1
 Util> Calculating zone A2     to C135   in molecule M1   , object M1
 Util>   33 atoms in zone
 Util> Plus value for this map is:  114
 Util>    8 atoms for residue A2     R factor= 0.350
 Util>   11 atoms for residue A3     R factor= 0.287
 Util>    5 atoms for residue A4     R factor= 0.274
 Util>    4 atoms for residue A5     R factor= 0.246
 ...
 Util>    7 atoms for residue C134   R factor= 0.286
 Util>   11 atoms for residue C135   R factor= 0.317
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
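
For orientation, the real-space R-factor is usually quoted (see reference (2) in section 1.4) as the sum of |rho_obs - rho_calc| divided by the sum of |rho_obs + rho_calc|, taken over the grid points around the atoms of a residue. The sketch below only illustrates that ratio on plain lists of density values; it is not O's implementation, which also handles the scaling, the choice of grid points and the atomic density model set up with RSR_setup. The function name real_space_r and the numbers are made up.

```python
def real_space_r(rho_obs, rho_calc):
    """Real-space R-factor for two equally long lists of density values."""
    if len(rho_obs) != len(rho_calc):
        raise ValueError("density lists must have the same length")
    numerator = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    denominator = sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))
    if denominator == 0:
        raise ValueError("sum of |rho_obs + rho_calc| is zero")
    return numerator / denominator


# Hypothetical density values at three grid points around a residue.
observed = [1.0, 2.0, 3.0]
calculated = [1.0, 1.0, 4.0]
print(f"R = {real_space_r(observed, calculated):.3f}")
```

A perfect fit gives 0; the poorer the agreement between observed and calculated density, the higher the value.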
   


3.4 Yasspa

Also run Yasspa to figure out which residues are in helices, strands or other regions.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > yasspa m1 alpha 0.5
 Util> Template size :    5 residues.
 Util>  There were      19
  O > yasspa m1 beta 0.8
 Util> Template size :    5 residues.
 Util>  There were      63
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


3.5 Info

Click on an atom on the screen, for instance on the CZ of the arginine residue in the centre. Check that the information line at the top of the screen now shows:
- molecule name, residue name and type, atom name;
- Cartesian coordinates and temperature factor of the atom;
- the RSC_fit value of the residue;
- the Pep_flip value of the residue;
- the real-space R-factor of the residue;
- the secondary structure type (ALPHA, BETA or "nothing").

Now write out some datablocks for use with OOPS, and stop the program:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > dir m1_resi*
 Heap>  M1_RESIDUE_NAME           C W       123
 Heap>  M1_RESIDUE_TYPE           C W       123
 Heap>  M1_RESIDUE_POINTERS       I W       246
 Heap>  M1_RESIDUE_CG             R W       492
 Heap>  M1_RESIDUE_PEPFLIP        R W       123
 Heap>  M1_RESIDUE_RSC            R W       123
 Heap>  M1_RESIDUE_RSFIT          R W       123
 Heap>  M1_RESIDUE_2RY_STRUC      C W       123
  O > wr M1_RESIDUE_NAME resnam.o ;
  O > wr M1_RESIDUE_TYPE restyp.o ;
  O > wr M1_RESIDUE_PEPFLIP pepflip.o ;
  O > wr M1_RESIDUE_RSC rsc.o ;
  O > wr M1_RESIDUE_RSFIT rsrfac_all.o ;
  O > stop
 As1>  Saved
 As1> Graphics released.
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


3.6 The Swedish Inquisition

(1) How are Pep_flip values calculated ? Why do the two N and C terminal residues of a continuous stretch of residues have Pep_flip values of zero ? Is this statistic related to the Ramachandran plot, and if so: how ?
(2) How are RSC_fit values calculated ? What happens for glycine and alanine residues ?
(3) What is the difference between the real-space R-factor and the real-space correlation coefficient ? Why did we not use the default values for A0 and C in RSR_setup ?
(4) Explain how Yasspa decides if a residue is in an alpha helix.
(5) If you want to create a file which contains one line per residue, with the residue name, type, RSC_fit and Pep_flip value, how would you go about it ?
(6) Which O datablock determines what information is shown at the top of the screen when you click on an atom ?
(7) Name two ways to add a command to the O menu.
(8) How can you calculate RS_fit values for a subset of the atoms in each residue, for example only for the main chain atoms ?


4 OOPS


4.1 Running OOPS

Now we will run OOPS. Before you start the program, create a subdirectory called oops (and read the OOPS manual if you're not familiar with the program):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 % 144 gerard rigel 20:40:18 gmrp/o > mkdir oops
 % 145 gerard rigel 20:40:18 gmrp/o > run oops
 ...
 Print statistics and histograms ? (Y)
 Auto-generate (some) O2D plot files ? (Y)

 Molecule name in O ? (M1)

 O data block with residue names ? (resnam.o)
 ...
 O data block with residue types ? (restyp.o)
 ...
 Nr of WATERs : ( 0)

 Analyse pep-flip values ? (Y) y
 O data block with pep-flip values ? (pepflip.o)
 ...
 Number of values .................... 115
 Average value ....................... 0.913
 Standard deviation .................. 0.670
 Minimum value observed .............. 0.117
 Maximum value observed .............. 4.340
 ...
 Nr >= 2.0000 and < 2.2500 : 2 ( 1.74 %; Cum 93.91 %)
 Nr >= 2.2500 and < 2.5000 : 3 ( 2.61 %; Cum 96.52 %)
 Nr >= 2.7500 and < 3.0000 : 1 ( 0.87 %; Cum 97.39 %)
 Nr >= 3.0000 and < 3.2500 : 1 ( 0.87 %; Cum 98.26 %)
 Nr >= 3.2500 and < 3.5000 : 1 ( 0.87 %; Cum 99.13 %)
 Nr >= 4.2500 and < 4.5000 : 1 ( 0.87 %; Cum 100.00 %)
 ...
 O2D plot file ? (m1_pepflip.plt)
 Plot file written

 Pep-flip cut-off ? ( 2.500) 2.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


4.2 Example output

Since this is our first rebuilding round (which means we have to visit all residues anyway), we will use rather strict criteria.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Analyse RS-fit values (all atoms) ? (Y) n

 Analyse RS-fit values (main chain) ? (N) n

 Analyse RS-fit values (side chain) ? (N) n

 Analyse RS R-factor (all atoms) ? (N) y
 O data block with RS R-factors ? (rsrfac_all.o)
 ...
 Number of values .................... 123
 Average value ....................... 0.287
 Standard deviation .................. 0.040
 Minimum value observed .............. 0.200
 Maximum value observed .............. 0.429
 ...
 O2D plot file ? (m1_rsrfac_all.plt)
 Plot file written

 RS R-factor cut-off ? ( 0.329) 0.3
 RS R-factor cut-off WATERs ? ( 0.329)

 Analyse RSC values ? (Y)
 ...
 Number of values .................... 85
 Average value ....................... 0.980
 Standard deviation .................. 0.758
 Minimum value observed .............. 0.093
 Maximum value observed .............. 3.023
 ...
 O2D plot file ? (m1_rsc.plt)
 Plot file written

 RSC cut-off ? ( 1.500)

 Analyse if mask is too tight ? (N) n

 Analyse low temperature factors ? (N)

 Analyse high temperature factors ? (Y)

 PDB file ? (m1.pdb)
 CRYST1 ( 45.650 47.560 77.610 90.00 90.00 90.00 P 21 21 21 4)
 ...
 Max CA-CA distance for neighbours ? ( 4.500)
 ...
 Threshold for high Bs ? ( 21.934) 20
 Threshold for high Bs WATERs ? ( 27.620)
 Checking high Bs ...

 Analyse RMS delta-B bonded atoms ? (N)

 Analyse low occupancies ? (N)

 Analyse high occupancies ? (N)

 Analyse phi-psi values ? (Y)
 Checking allowed PHI-PSI areas
 Nr of residues with defined PHI : ( 121)
 Nr of residues with defined PSI : ( 121)

 Analyse peptide planarity ? (Y)
 ...
 O2D plot file ? (m1_pep_plan.plt)
 Plot file written

 Maximum absolute deviation ? ( 5.800) 3

 Analyse C-alpha chirality ? (Y)
 ...
 O2D plot file ? (m1_ca_chir.plt)
 Plot file written

 Maximum absolute deviation ? ( 3.500)

 Compare with previous model ? (N)

 Analyse QualWat values ? (N)

 Analyse nr of bad contacts ? (N)

 User-definable criteria
 Max number of them : ( 10)

 Enter file with user datablock (<CR> to stop): ( )

 Nr of user criteria : ( 0)

 You may opt to get the details listed on the screen.
 Do you want to see the details ? (Y) n
 ...
 Do you want to have a list file ? (Y)
 Name of the list file ? (m1_rebuild.notes)
 ...
 Create pseudo-PDB file ? (N)
 ...
 O command(s) to execute in every macro ? (bell print DONE) @fast_dials
 ...
 Do you want macros for ALL residues ? (N) y
 ...
 Do you want CHAINED macros ? (Y)
 ...
 Do you want macros named as RESIDUES ? (Y)

 OOPS - (ASN A2)
 OKAY - (PHE A3)
 ...
 OKAY - (VAL C134)
 OOPS - (ARG C135)

 Nr of macros : ( 123)
 Nr of baddies : ( 61)

 Start by typing @oops.omac in O !!!

 Bad pep-flip : ( 9)
 ...
 Bad RS R-factor (all atoms) : ( 37)
 Bad RSC : ( 23)
 ...
 O2D plot file ? (m1_badcounts.plt)
 Plot file written
 Writing oops_remarks.pdb ...
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


4.3 Files

OOPS has created:
- macros (in sub-directory gmrp/o/oops) to take you from one residue to the next, and to tell you what is suspicious about them;
- plot files ("m1_*.plt"); these can be converted to PostScript with the program O2D or the script "OMAC/o2dps";
- "m1_rebuild.notes", a very useful file which you can use as your electronic notebook file;
- "oops_remarks.pdb", containing some statistics as PDB REMARK cards;
- "oops_badcounts.o", an O datablock containing the number of violated criteria for each residue (can be read into O and used to colour your molecule, for instance);
- "oops.omac", the first OOPS macro you have to execute inside O; this is the only time you need to type an O command to execute an OOPS macro; once that is done, subsequent commands will be added to (and later removed from) the O menu automagically.
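
To get a feeling for what "number of violated criteria" means, here is a small sketch that counts, for one residue, how many values exceed the cut-offs entered in the OOPS run above. It is an assumption-laden illustration rather than OOPS's actual logic: the dictionary keys and the function count_violations are invented, and whether OOPS compares with ">" or ">=" is not something this tutorial states.

```python
# Cut-offs as entered in the example OOPS run in this chapter.
CUTOFFS = {
    "pep_flip": 2.0,       # Pep-flip cut-off
    "rs_rfactor": 0.3,     # RS R-factor cut-off (all atoms)
    "rsc": 1.5,            # RSC cut-off
    "b_factor": 20.0,      # threshold for high Bs
    "pep_planarity": 3.0,  # maximum absolute deviation
}


def count_violations(values, cutoffs=CUTOFFS):
    """Count the criteria for which |value| exceeds the cut-off.

    Criteria that are missing or None (e.g. no Pep_flip value for a
    terminal residue) are skipped.
    """
    return sum(
        1
        for name, limit in cutoffs.items()
        if values.get(name) is not None and abs(values[name]) > limit
    )


# Values reported for ASN A2 elsewhere in this tutorial.
a2 = {
    "pep_flip": None,
    "rs_rfactor": 0.350,
    "rsc": 0.594,
    "b_factor": 29.11,
    "pep_planarity": 3.86,
}
print("A2 violations:", count_violations(a2))
```

A residue with a count of zero would be listed as OKAY; anything higher earns it an OOPS.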


4.4 The Swedish Inquisition

(1) Make a CA object in O which is coloured according to the number of violations found by OOPS.
(2) For a well-refined protein model, we expect ~1-2% of the residues to have unusual Pep_flip values (2.5 Å cutoff), and ~5-10% to have non-rotamer sidechain conformations (1.5 Å cutoff). What are these numbers for your current model ? What does this tell you about the quality of your starting model?
(3) Run ProCheck on the starting model. What conclusion(s) can you draw from this ? Contrast this with what you know about the quality of the model. Explain why the following phrase in a paper about a low-resolution structure is meaningless: "According to ProCheck, the final model has a better quality than other structures solved at similar resolution."


5 REBUILDING

In this section, we will take you to a few spots in the current model which need attention. One example of each type of problem will be dealt with here; it is up to you to apply this to the entire model. For instance, we shall only discuss one example of a residue with a completely wrong sidechain conformation, but there are many more such residues, which you have to detect and rebuild yourself.

While you rebuild, edit the file "m1_rebuild.notes" to keep track of the changes you make to the model, observations regarding as-yet unmodeled entities etc.


5.1 Sequence

As you already know, the current model is fairly incomplete. Below is the correct sequence for human CRABPII:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 SEQRES   1    137  PRO ASN PHE SER GLY ASN TRP LYS ILE ILE ARG SER GLU  1CBS 202
 SEQRES   2    137  ASN PHE GLU GLU LEU LEU LYS VAL LEU GLY VAL ASN VAL  1CBS 203
 SEQRES   3    137  MET LEU ARG LYS ILE ALA VAL ALA ALA ALA SER LYS PRO  1CBS 204
 SEQRES   4    137  ALA VAL GLU ILE LYS GLN GLU GLY ASP THR PHE TYR ILE  1CBS 205
 SEQRES   5    137  LYS THR SER THR THR VAL ARG THR THR GLU ILE ASN PHE  1CBS 206
 SEQRES   6    137  LYS VAL GLY GLU GLU PHE GLU GLU GLN THR VAL ASP GLY  1CBS 207
 SEQRES   7    137  ARG PRO CYS LYS SER LEU VAL LYS TRP GLU SER GLU ASN  1CBS 208
 SEQRES   8    137  LYS MET VAL CYS GLU GLN LYS LEU LEU LYS GLY GLU GLY  1CBS 209
 SEQRES   9    137  PRO LYS THR SER TRP THR ARG GLU LEU THR ASN ASP GLY  1CBS 210
 SEQRES  10    137  GLU LEU ILE LEU THR MET THR ALA ASP ASP VAL VAL CYS  1CBS 211
 SEQRES  11    137  THR ARG VAL TYR VAL ARG GLU                          1CBS 212
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


5.2 Extending the N-terminus

Execute the OOPS macro for the first residue, draw the maps and generate symmetry objects.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > @oops/a2
  O > Macro in computer file-system.
 As4> No object defined.
 As4> M1    A2    A2    M1
 As4> Centering on zone from A2 to A2
  O >  As4> Residue ASN A2
  O >  As4> Bad RS R-factor (all atoms) = 0.350
  O >  As4> Too high temperature factor = 29.11
  O >  As4> Non-planar peptide; improper = 3.86
  O >   O > Macro in database.
  O >   O >  As4> Hit or type "@oops/a3" for next residue
  O >   O >   O >   O > @maps
  O > Macro in computer file-system.
 As2>  Symbol inserted.
  O >   O >   O >   O >   O >   O >   O >   O >   O >   O >
  O >   O >   O >   O >   O >   O >   O >   O >   O >   O >
  O >   O >   O >   O > @symmy
  O > Macro in computer file-system.
 Sym> Molecule c.g. =      17.98     21.20     27.16
 Sym> Radius =      26.22
 Sym> Symmop  3, Shift  0 0 0
 Sym> Centre of gravity updated for     1  123
  O > Macro in database.
  O >   O >   O >   O >
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Note that there is no clear density for the missing N-terminal proline residue, but just to demonstrate how to insert a residue at the N-terminus we will go ahead anyway.

First save your current model (and all associated data) in an O-format file (containing all the datablocks of the model). If we screw up, all we have to do is to Read_form this file again.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > write m1_* m1_save.odb ;
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Now centre on the CA atom of residue A2.
Since the Mutate_insert command can only insert residues after existing ones, we need to take a slight detour. We shall insert a new residue after residue A2, copy the coordinates of A2 to this new residue, and then use Mutate_replace to change A2 into a proline and renumber the residues.

(1) Insert a residue after the current N-terminus of the same type as the current N-terminal residue:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > mut_ins
 Mut>  Mutate a molecule by inserting residues.
 Mut> Molecule ([M1    ]) :
 Mut> After which residue: a2
 Mut> New residue name and type (<cr> to end) : a2a asn
 Mut> New residue name and type (<cr> to end) :
 Mut>  There are     1 mutations
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(2) Copy the coordinates of A2 to the new residue A2A:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > merge_atoms
 Sam> Merge from molecule name, and zone: m1 a2 a2
 Sam> Merge to molecule name and start residue: m1 a2a
 Sam> Datablock containing transformation [<cr> identity]:
 Sam>      8 atoms
 Sam>      8 updated.
 Sam> Centre of gravity updated for     2    2
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(3) Replace the new N-terminus by the correct residue type:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > mut_repl
 Mut>  Mutate a molecule by replacing one residue type
 Mut>  by another.
 Mut> Molecule ([M1    ]) :
 Mut> Residue name and new type (<cr> to end) : a2 pro
 Mut> Residue name and new type (<cr> to end) :
 Mut>  There are     1 mutations
 Mut>  The Rotamer_DB is now being loaded.
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(4) Redraw the object. The N-terminal proline now has its CA atom in the same position as the asparagine.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > mol m1 zo ; end
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(5) There are several ways to get the proline in the correct position. In this case, it's probably easiest to use Move_zone on the proline, and then use Lego_side_ch (if necessary), Tor_residue and Refi_zone (and perhaps RSR_rigid) to apply the finishing touch. Unfortunately, the Lego_auto_mc command does not work for terminal residues, although you could get around that by defining an extra "dummy" residue. In that case you would have two more alternatives:
Alternative 1: use the Baton command to place the CA of the proline and of the dummy CA; then use Lego_au_mc and Lego_au_sc, Move_zone, etc.
Alternative 2: use Move_atom to place the CA of the proline in the density and to place the dummy CA; then use the same commands as in alternative 1 to touch things up.
Hint: when you use Move_zone and double-click on an atom (e.g., the CA atom), it will become the pivot for rotations of the residue.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > move_zone m1 a2 a2
 Mnp> M1    A2    A2    M1
 Mnp> Fragment pivot point:   22.088  15.961  43.470
  O >   O > Macro in database.
  O >   O >   O >   O > Macro in database.
  O >   O >   O >   O >   O >   O >   O > Macro in database.
  O >   O >   O >   O >  Mnp> Coordinates updated
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

You will need four of the menu commands during the move: Dial_next, Dial_prev, @fast_dials and @slow_dials. When you're happy, click or type Yes.

Use RSR_rigid to improve the fit to the density:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > rsr_rig m1 a2 a2
 RSR> Refining zone A2 to A2 in molecule M1 , object M1
 RSR>    7 atoms in zone
 RSR>   36 atoms in refinement box
 RSR> Old scale: 47.6289 ; new scale: 311.5877
 RSR> Shifts for this group:
 RSR>  #      x       y       z           rotx    roty    rotz     megavalue
 RSR>     1   0.000   0.000   0.900     -8.000  17.000  -8.000     5.68652
 RSR>     2   0.000   0.000   0.900     -8.000  11.000  -8.000     5.68830
 RSR>     3  -0.100  -0.100   0.700    -14.000  11.000  -8.000     5.70376
 RSR>     4  -0.200  -0.100   0.700    -14.000  11.000  -8.000     5.70476
 RSR>     5  -0.100  -0.150   0.700    -14.000  11.000  -8.000     5.70840
  O > yes
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Redraw the object (zo ; end). Perhaps the proline and asparagine are too far apart for the peptide bond to be drawn. Use Refi_zone to regularise the N-terminus of the model:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > re_zo m1 a2 a5 m1 yes
 Refi > M1    A2    A5    M1
 Refi >  Refining zone A2     to A5     in molecule M1    , object M1
 Refi >   563 lines read from dictionary
 Refi > Number of cycles is    10
      ++++++++++
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.07      3.89      7.63
 Refi > Centre of gravity updated for     1    5
 Refi > Accept new coordinates? Hit *Yes/*No
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Note that the geometry is very poor in this region. Repeat the Refi_zone command a number of times until you get reasonable stereochemistry.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 ...
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.02      1.86      4.39
 Refi > Centre of gravity updated for     1    5
 Refi > Accept new coordinates? Hit *Yes/*No
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Note that the Refi_zone command keeps the first and last residues anchored. In this case, that's a trifle unfortunate, since the density for the proline is very poor and we are therefore not at all sure of its exact location and orientation. But this will have to do for now. Redraw the object and check if the peptide bond is drawn. If not, regularise some more (or Move_zone the proline into a better position/orientation and do the regularisation again).
Hint: use the Dist_define and Trig_refresh commands to monitor the CA-CA distance of the first two residues (it should be 3.7-3.9 Å).
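
The same CA-CA check can also be run outside O on a saved PDB file. The following is only a sketch in Python, not part of O or of the tutorial files: the function names `ca_atoms` and `ca_ca_report` are made up for illustration, and the code assumes standard fixed-column PDB ATOM records with the residues listed in sequence order.

```python
import math

def ca_atoms(pdb_lines):
    """Return (label, (x, y, z)) for every CA atom, in file order."""
    cas = []
    for line in pdb_lines:
        # Assumption: standard PDB columns (atom name 13-16, residue 18-27, xyz 31-54)
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            label = f"{line[17:20].strip()} {line[21]}{line[22:27].strip()}"
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            cas.append((label, xyz))
    return cas

def ca_ca_report(cas, low=3.7, high=3.9):
    """Return (label1, label2, distance, within_range) for neighbouring CA atoms."""
    report = []
    for (name1, p1), (name2, p2) in zip(cas, cas[1:]):
        d = math.dist(p1, p2)
        report.append((name1, name2, d, low <= d <= high))
    return report
```

For example, `ca_ca_report(ca_atoms(open("m2.pdb")))` would list every neighbouring pair in the saved model. Pairs that straddle a chain break will be flagged as well, since the sketch simply compares atoms in file order.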

(6) Rename the residues at the N-terminus. You may also want to reset the colours of the mutated residues. Save your model and database (note that we are saving our rebuilt model; so we will call the file "m2.pdb"). You may also want to update the quality indicator values for the rebuilt N-terminus (Pep_flip, RSC_fit and RS_fit).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > zo ; end
  O > sam_rename
 Sam> What molecule [M1    ]:
 Sam> Residue range [all molecule]: a2 a3
 Sam> NEW name of FIRST residue [a2    ]: a1
  O > @omac/cnos_colours.omac
  O > Macro in computer file-system.
  O >  Which molecule ? m1
  O >   O >   O > zo ; end
  O > s_a_out m2.pdb m1 ;;;;;
 Sam> Coordinate file type assumed from file name is PDB
 Sam>        914 atoms written out.
  O > save
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


5.3 Wrong peptide

One of the worst pep-flip values occurs for residue A5 (a glycine). Execute the OOPS macro for this residue, draw the maps and symmetry objects.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > @oops/a5
  O > Macro in computer file-system.
 As4> No object defined.
 As4> M1    A5    A5    M1
 As4> Centering on zone from A5 to A5
  O >  As4> Residue GLY A5
  O >  As4> Bad pep-flip = 3.28
  O >   O > Macro in database.
  O >   O >  As4> Hit or type "@oops/a6" for next baddy
  O > @maps
 ...
  O > @symmy
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Is there something wrong with the peptide orientation ? Note that the carbonyl oxygen has good 2Fo-Fc density, but that there is positive difference density to support a flipped peptide. Also note that the overall fit to the density for residues A4 to A6 is poor. This may well be caused by strain introduced by an incorrect orientation of the peptide plane of the glycine. These two observations are sufficient reason for us to see if a flipped peptide might improve the model.

Use the Flip_pep command to flip the peptide plane of glycine A5.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > flip_pep a5
 Mnp> M1    A5    CA    M1
 Mnp> Flipping peptide of residue A5
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Use the Move_zone command to move the glycine better into the density (and do the same for the neighbouring residues A4 and A6). Then use the Refi_zone command to regularise this part of the model.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > mo_zo a5
 Mnp> No object defined.
 Mnp> M1    A5    A5    M1
 Mnp> Fragment pivot point:   26.454  22.102  39.548
 ...
   O >   O >   O >  Mnp> Coordinates updated
  O >   O > re_zo m1 a1 a10 m1 yes
 Refi > M1    A1    A10   M1
 Refi >  Refining zone A1     to A10    in molecule M1    , object M1
 Refi > Number of cycles is    10
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.02      1.68      2.93
 Refi > Centre of gravity updated for     1   10
 Refi > Accept new coordinates? Hit *Yes/*No
 ...
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.01      1.29      2.51
 Refi > Centre of gravity updated for     1   10
 Refi > Accept new coordinates? Hit *Yes/*No
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Note that the RMSD for fixed dihedrals is still fairly high. If this value remains high (> 2 degrees), even after repeated regularisation, this often indicates strain due to an as-yet unflipped peptide.

By the way, did you notice anything funny with respect to the peptide of residue A3 ? Calculate pep-flip values for residues A1 to A10.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > pep_fl m1 a1 a10
 Util> M1    A1    A10   M1
 Util> Calculating zone A1     to A10    in molecule M1    , object M1
 ...
 Util>   20 fragments used for residue A3     pep_flip value=    1.02
 Util>   16 fragments used for residue A4     pep_flip value=    2.77
 Util>   20 fragments used for residue A5     pep_flip value=    1.38
 Util>   20 fragments used for residue A6     pep_flip value=    0.67
 Util>   20 fragments used for residue A7     pep_flip value=    0.48
 Util>   20 fragments used for residue A8     pep_flip value=    0.88
 Util>   20 fragments used for residue A9     pep_flip value=    0.98
 Util>   20 fragments used for residue A10    pep_flip value=    0.94
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Note that residue A3 has a normal pep-flip value, but that the density seems to tell a different story. We will come back to this residue in the next section.

Residue A4 has a large pep-flip value. However, the 2Fo-Fc density is good, and there is no difference density which would indicate that the peptide might have to be flipped. Most importantly, however, the carbonyl oxygen hydrogen bonds to the sidechain of arginine C135 (their density features are connected). Sometimes, high pep-flip values are observed for residues which have an unusual orientation for a very good reason (in this case, in order to form a hydrogen bond). Such cases are NOT errors in the model; they are unusual (and sometimes crystallographically or biologically interesting) features of a model !


5.4 Wrong sidechain

Execute the OOPS macro for tryptophan residue A7, etc.:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > @oops/a7
  O > Macro in computer file-system.
 As4> No object defined.
 As4> M1    A7    A7    M1
 As4> Centering on zone from A7 to A7
  O >  As4> Residue TRP A7
  O >  As4> Bad RSC = 3.02
  O >   O > Macro in database.
  O >   O >  As4> Hit or type "@oops/a8" for next baddy
  O > @maps
 ...
  O > @symmy
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

This residue has a very high RSC value. For tryptophan and methionine residues in particular, this doesn't always indicate a problem, since there were very few observations of such residue types in the original rotamer study of Ponder and Richards from which the O rotamers are derived. However, in this case there are more suspicious features:
- there are two big blobs of positive difference density near the sidechain;
- the carbonyl oxygen of residue A3 looks as if it ought to be rotated by ~120 degrees so as to fit the density better. However, with the present orientation of the tryptophan sidechain, this would lead to a bad contact between the carbonyl oxygen and a carbon atom in the ring.

Use the Lego_side_ch command to see if any of the tryptophan rotamers fits the density better than the present non-rotamer.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > lego_si_ch a7
 Lego> M1    A7    CA    M1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

It is clear that rotamer number 1 (i.e., the most-frequently observed sidechain conformation for tryptophan residues) can easily be made to remedy both problems observed above: it explains the two blobs of difference density, and it would enable a hydrogen bond between its ring nitrogen atom and a better oriented carbonyl oxygen of A3 !

Accept the sidechain conformation by typing Yes (or clicking Yes on the menu).

Use Move_zone or RSR_rigid to move the tryptophan better into the density (in this particular case, the rotamer fits so well that we don't even have to adjust the sidechain torsions; however, often some fine-tuning of the chi torsion angles may be necessary - use the Tor_residue command to do that).
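
If you want to see which chi values a sidechain actually has (for instance before and after Tor_residue), a torsion angle can be computed from four atom positions. A minimal Python sketch, assuming the usual definitions chi1 = N-CA-CB-CG and chi2 = CA-CB-CG-CD1 for tryptophan; the function name `dihedral` is hypothetical and this is not a description of how O computes torsions internally:

```python
import math

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) defined by four points (x, y, z)."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)          # from the second atom back to the first
    b1 = sub(p2, p1)          # central bond
    b2 = sub(p3, p2)          # from the third atom to the fourth
    length = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)

    # Project the outer bonds onto the plane perpendicular to the central bond
    k0, k2 = dot(b0, b1), dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])

    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))
```

Called with the coordinates of N, CA, CB and CG of a residue (in that order), this would give its chi1 angle.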

This is a very good example of the type of error that occurs very often at low resolution when databases are not used during rebuilding. Also note that an error in one place (the tryptophan) may introduce other errors (the peptide plane of A3). And the accumulation of many errors, each in itself small and local, makes the difference between a good model and a poor one.

Rebuild the peptide of residue A3 as shown earlier, Refi_zone the first ten residues, recalculate Pep_flip, RSC_fit and RS_fit for these residues, and save the improved model. Hint: in the case of A3, flipping alone is not good enough. In this case you may want to use Move_fragm to correctly orient the peptide plane (click on the carbonyl carbon or oxygen to identify the fragment).

After the rebuild, you may find that both residues A3 and A4 now have high pep-flip values; however, in both cases there is a good reason for their being unusual.

Note: residues with polar or charged end-groups also often have non-rotamer conformations. If they are at the surface they are often disordered and have poor or no sidechain density (in such cases, put in a rotamer). If they have good density, they will usually be involved in saltlinks or hydrogen bonds, and the energy gain from that will outweigh the loss due to less favourable chi-torsion angle combinations.


5.5 Mutating a residue

Residue A9 is one which is currently alanine, but ought to be something else in the correct sequence of human CRABPII (in this case, isoleucine).

Execute the relevant OOPS macro etc. Mutating a residue consists of two steps:
- use Mutate_replace to assign the correct residue type. O will put it in as the most common sidechain rotamer;
- use the normal rebuilding tools to fit the density (Lego_side_ch to get the correct rotamer, Tor_residue to adjust sidechain torsions, sometimes Move_zone or RSR_rigid, and finish with Refi_zone to regularise).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > mut_repl
 Mut>  Mutate a molecule by replacing one residue type
 Mut>  by another.
 Mut> Molecule ([M1    ]) :
 Mut> Residue name and new type (<cr> to end) : a9 ile
 Mut> Residue name and new type (<cr> to end) :
 Mut>  There are     1 mutations
  O > zo ; end
  O > le_si_ch a9
 Lego> M1    A9    CA    M1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

In this case, rotamer number 4 is the most suitable one. Select it and make it fit the density better. In this case, RSR_rigid does a good job for the sidechain, but it distorts the mainchain. However, just a few cycles of Refi_zone will remedy that.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O >  RSR> Refining zone A9 to A9 in molecule M1 , object M1
 RSR>    8 atoms in zone
 RSR>   46 atoms in refinement box
 RSR> Old scale: 334.1331 ; new scale: 341.3141
 RSR> Shifts for this group:
 RSR>  #      x       y       z           rotx    roty    rotz     megavalue
 RSR>     1  -0.300   0.600  -0.300     12.000 -17.000   6.000     6.56684
 RSR>     2  -0.300   0.400  -0.100     20.000 -21.000  10.000     6.71406
 RSR>     3  -0.400   0.400   0.000     26.000 -23.000  14.000     6.75660
 RSR>     4  -0.400   0.400   0.000     30.000 -23.000  14.000     6.76478
 RSR>     5  -0.400   0.450   0.000     30.000 -23.000  14.000     6.76499
  O > re_zo m1 a5 a15 m1 yes
 Refi > M1    A5    A15   M1
 Refi >  Refining zone A5     to A15    in molecule M1    , object M1
 Refi >  Unable to anchor atom CB     in residue A5
 Refi > Number of cycles is    10
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.02      1.55      2.17
 Refi > Centre of gravity updated for     5   15
 Refi > Accept new coordinates? Hit *Yes/*No
 ...
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.01      1.00      1.41
 Refi > Centre of gravity updated for     5   15
 Refi > Accept new coordinates? Hit *Yes/*No
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


5.6 Inserting residues

Our starting model contains two breaks. Residues 100-106 are missing since they fitted the density poorly in the CRABPI structure (high temperature factors, poor density), and residues 114-117 are near the only insertion site in the sequence of CRABPII compared to that of CRABPI. We shall build the latter here; you may build the former yourself.

Centre on the relevant place and draw the maps and symmetry objects (to prevent accidental use of density belonging to a symmetry-related molecule):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > ce_zo b113 c118
 As4> No object defined.
 As4> M1    B113  C118  M1
 As4> Centering on zone from B113 to C118
  O > @maps
 ...
  O > @symmy
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Note that there is reasonable density for the missing mainchain and a number of sidechains. The missing residues are: Thr, Asn, Asp, Gly and Glu. The sidechain density for the Asn, Asp and Glu is quite reasonable.

(1) Use Mutate_insert to insert the missing residues into the sequence:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > mut_ins
 Mut>  Mutate a molecule by inserting residues.
 Mut> Molecule ([M1    ]) :
 Mut> After which residue: b113
 Mut> New residue name and type (<cr> to end) : b113a thr
 Mut> New residue name and type (<cr> to end) : b113b asn
 Mut> New residue name and type (<cr> to end) : b113c asp
 Mut> New residue name and type (<cr> to end) : b113d gly
 Mut> New residue name and type (<cr> to end) : b113e glu
 Mut> New residue name and type (<cr> to end) :
 Mut>  There are     5 mutations
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(2) Now we must get coordinates for the CA atoms. You can use Baton to do this or Lego_loop. In this case, we shall use Lego_loop (don't forget to save all the datablocks of the current model before doing this !). The O manual and tutorial explain the use of this command in more detail.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > wr m1_* m1_save.odb ;
  O > select_on m1 ;
  O > sel_off m1 b113a b113e
  O > lego_loop m1 b110 c121
 Lego> M1    B110  C121  M1
 ...
 Lego>  Number of selected atoms in zone is     8
 Lego>  DGNL> Top matches
 Lego>  Protein   Start Res.    Score  Sequence
 Lego>    SGA_2        112        0.492 ATVNYGSSGIVYG
 Lego>    SGA_2         93        0.514 VQRSGSTTGLRSG
 Lego>    PTN_2        179        0.909 GPVVCSGKLQGIV
 Lego>    APP_2         71        0.913 WSISYGDGSSASG
 Lego>    OVO_1        133        0.925 RPVCGSDNKTYSN
 ...
 Lego>    PA            23        1.324 VFRKAADDTWEPF
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Type or hit On_off and click off the DB_CA object. Then use the dials to check each of the hits in turn. Make sure not to select a loop which puts mainchain atoms inside sidechain density (or the other way around).
In this case, loop number 5 fits fairly well, so select this one.

The fit can be improved somewhat. To this end, make a CA object and use Move_atom to move the CA atoms of the new residues better into place.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > obj ca ca b110 c121 end
  O > ce_at b113
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(3) Use Lego_auto_mc to generate the mainchain, and Lego_auto_sc to generate the sidechains. Regularise the region.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > le_au_mc m1 b110 c121
 Lego> M1    B110  C121  CA
 Lego> Centre of gravity updated for   104  106
 Lego> Centre of gravity updated for   107  109
 Lego> Centre of gravity updated for   110  112
 Lego> Centre of gravity updated for   113  114
  O > le_au_sc m1 b113 c118
 Lego> M1    B113  C118  CA
 Lego>  SCGLY    is missing.
 Lego>  Unable to draw the rotamers.
  O > mol m1 zo ; end
  O > re_zo m1 b108 c123 m1 yes
 Refi > M1    B108  C123  M1
 Refi >  Refining zone B108   to C123   in molecule M1    , object M1
 Refi > Number of cycles is    10
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.02      2.07      3.38
 Refi > Centre of gravity updated for   101  117
 Refi > Accept new coordinates? Hit *Yes/*No
 ...
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.01      1.32      2.22
 Refi > Centre of gravity updated for   101  117
 Refi > Accept new coordinates? Hit *Yes/*No
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(4) Calculate pep-flip values and check the mainchain if necessary.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > pep_flip m1 b108 c123
 Util> M1    B108  C123  M1
 Util> Calculating zone B108   to C123   in molecule M1    , object M1
 Util>   20 fragments used for residue B109   pep_flip value=    0.69
 Util>   20 fragments used for residue B110   pep_flip value=    0.56
 ...
 Util>   20 fragments used for residue C122   pep_flip value=    0.39
 Util>   20 fragments used for residue C123   pep_flip value=    0.86
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(5) For each residue, rebuild it so it fits the density. Remember that the Lego_auto_sc command uses the most-frequent rotamer for each residue; this is not always the correct one ! For instance, leucine 113 has the conformation of the second rotamer.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > ce_at b108
  O > @maps
 ...
  O >   O >  Mnp> Fragment pivot point:   21.791  12.546  30.689
  O >  Mnp> Coordinates updated
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(6) Regularise the model (and re-select the entire molecule).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > re_zo m1 b108 c123 m1 yes
 Refi > M1    B108  C123  M1
 Refi >  Refining zone B108   to C123   in molecule M1    , object M1
 Refi > Number of cycles is    10
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.04      3.14      3.12
 Refi > Centre of gravity updated for   101  117
 Refi > Accept new coordinates? Hit *Yes/*No
 ...
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.01      1.19      1.46
 Refi > Centre of gravity updated for   101  117
 Refi > Accept new coordinates? Hit *Yes/*No
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(7) Renumber the sequence. The initial model consisted of three disconnected pieces which were therefore called chains A, B and C. Since we have bridged the gap between B and C we can call all residues in these regions "B". Since there is an insertion in the sequence of CRABPII (and the model still had the old CRABPI residue numbers), all residues from 113 to the C-terminus need to be renumbered. Don't forget to save your model and O database. You may also want to calculate pep-flip values etc. for the rebuilt region.

Note: normally, one would first rebuild using the OOPS macros and then insert missing bits. In this case we don't do that, so it's not a good idea to rename the residues now (since then the OOPS macros will fail to centre on the correct residues). So, defer the renaming for the moment. If you were to do it, it would go like this:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > sam_rename
 Sam> What molecule [M1    ]:
 Sam> Residue range [all molecule]: b113 c135
 Sam> NEW name of FIRST residue [b113  ]: b113
  O > sam_lis
 Sam> Molecule name [M1    ]:
 Sam>  Name   Type     From    To        Centre          Radius
 Sam>  A1     PRO         1     7   17.00   13.93   43.82    2.49
 Sam>  A2     ASN         8    15   20.77   16.93   43.72    3.07
 ...
 Sam>  A99    LEU       715   722    4.50   26.58   19.70    3.41
 Sam>  B107   THR       723   729    9.13   20.11   21.33    2.45
 Sam>  B108   ALA       730   734   10.88   17.69   24.18    1.97
 Sam>  B109   TRP       735   748   13.15   19.25   29.37    4.03
 Sam>  B110   THR       749   755   14.40   14.20   29.06    2.82
 Sam>  B111   ARG       756   766   17.23   17.61   32.30    4.51
 Sam>  B112   GLU       767   775   18.62   10.88   32.67    3.50
 Sam>  B113   LEU       776   783   20.66   11.26   38.55    3.36
 Sam>  B114   THR       784   790   22.72    9.01   36.35    2.52
 ...
 Sam>  B117   GLY       807   810   26.88   10.71   38.14    1.87
 Sam>  B118   GLU       811   819   26.91   10.31   34.59    3.87
 Sam>  B119   LEU       820   827   22.59   15.18   33.22    3.21
 ...
 Sam>  B135   VAL       936   942   30.31   15.06   33.97    2.88
 Sam>  B136   ARG       943   953   29.25   16.91   38.58    4.66
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
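
The effect of the renaming can be sketched in a few lines of Python. This fragment only mimics what the transcript above appears to show (residues renamed consecutively, starting from the new name of the first residue); the function `sequential_names` is hypothetical and is not O's implementation.

```python
import re

def sequential_names(old_names, first_new_name):
    """Map old residue names (in sequence order) to consecutively numbered new ones."""
    match = re.fullmatch(r"([A-Za-z]*)(\d+)", first_new_name)
    if match is None:
        raise ValueError("expected a name like 'B113', got %r" % first_new_name)
    prefix, start = match.group(1), int(match.group(2))
    return {old: f"{prefix}{start + i}" for i, old in enumerate(old_names)}

# The rebuilt loop: B113, the five inserted residues, and the old C118
old = ["B113", "B113A", "B113B", "B113C", "B113D", "B113E", "C118"]
new = sequential_names(old, "B113")
```

Here the inserted glutamate B113E ends up as B118 and the old C118 as B119, in line with the Sam_list output shown above.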
   

Note: in this case we built a fairly short stretch of residues with several long sidechains which could be fitted unambiguously. This is not always the case. Sometimes long residues have poor or no sidechain density, and for long insertions you cannot even always be sure if your model's sequence is in register with the density. Since register errors are often difficult to track down, and since having mainchain in sidechain density and vice versa may be hard to correct by the refinement program, you are advised to always build the mainchain first in such cases. In other words, assign all newly built residues to be alanines; then let the refinement program find the correct fit of the mainchain to the density and only then build in the correct sidechains (provided there is reasonably convincing density for them).
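
Within O you would make this change with Mutate_replace. Purely as an illustration of what "assigning residues to be alanines" means at the coordinate level, here is a hedged Python sketch that truncates selected residues in a PDB file to alanine. The function name `truncate_to_alanine` is invented for this example, standard fixed-column ATOM records are assumed, and glycines are left alone because they have no CB atom.

```python
ALA_ATOMS = {"N", "CA", "C", "O", "CB"}

def truncate_to_alanine(pdb_lines, residues):
    """Keep only alanine atoms for the given (chain, residue number) pairs.

    residues: a set such as {("B", "114"), ("B", "115")}; numbers are strings
    so that insertion codes (e.g. "113A") could be given as well.
    """
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            key = (line[21], line[22:27].strip())
            if key in residues and line[17:20] != "GLY":
                if line[12:16].strip() not in ALA_ATOMS:
                    continue  # drop sidechain atoms beyond CB
                line = line[:17] + "ALA" + line[20:]
        out.append(line)
    return out
```

Atom serial numbers are not renumbered here, so the output would still need a tidy-up before being used elsewhere.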


5.7 Extending the C-terminus

One residue is missing from the C-terminus of our model, namely Glu 137. Adding it is similar to inserting a residue. Since the previous model's C-terminus was Arg 135 (actually B136), it used to have X-PLOR OT1 and OT2 oxygens. In this case, these have been altered (OT1 renamed to O and OT2 removed) already, but in general you will have to do this yourself (also at chain breaks), since O doesn't like these atoms at all.
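
If you have to make this change yourself, it is easily done with an editor, or with a few lines of script. A minimal Python sketch, assuming standard fixed-column PDB records; the function name `strip_xplor_terminal_oxygens` is hypothetical and nothing like it is supplied with O or X-PLOR:

```python
def strip_xplor_terminal_oxygens(pdb_lines):
    """Rename OT1 to O and drop OT2 in ATOM/HETATM records."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()
            if name == "OT2":
                continue                      # second terminal oxygen: remove
            if name == "OT1":
                line = line[:12] + " O  " + line[16:]   # plain carbonyl oxygen
        out.append(line)
    return out
```

The sketch treats every OT1/OT2 pair in the file the same way, which also covers the chain breaks mentioned above. It does not renumber the atom serial numbers.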

Go to residue 135, and draw the maps and symmetry objects. Then go through the insertion and rebuilding motions:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > mu_ins
 Mut>  Mutate a molecule by inserting residues.
 Mut> Molecule ([M1    ]) :
 Mut> After which residue: c135
 Mut> New residue name and type (<cr> to end) : c136 arg
 Mut> New residue name and type (<cr> to end) :
 Mut>  There are     1 mutations
  O > mer_at
 Sam> Merge from molecule name, and zone: m1 c135 c135
 Sam> Merge to molecule name and start residue: m1 c136
 Sam> Datablock containing transformation [<cr> identity]:
 Sam>     11 atoms
 Sam>     11 updated.
 Sam> Centre of gravity updated for   130  130
  O > mut_repl
 Mut>  Mutate a molecule by replacing one residue type
 Mut>  by another.
 Mut> Molecule ([M1    ]) :
 Mut> Residue name and new type (<cr> to end) : c136 glu
 Mut> Residue name and new type (<cr> to end) :
 Mut>  There are     1 mutations
  O > zo ; end
  O > mo_zo c136
 Mnp> No object defined.
 Mnp> M1    C136  C136  M1
 Mnp> Fragment pivot point:   30.326  17.328  38.215
 Mnp>  Database compressed.
 Mnp> Compression caused by.save_col
  O >   O >   O >   O >   O >   O >   O >   O > Macro in database.
  O >   O >  Mnp> Coordinates updated
  O >   O > rsr_rigid m1 c136 c136 m1 yes
 ...
  O > re_zo m1 c130 c136 m1 yes
 ...
 Refi >   R.m.s.d. in bond lengths, angles, fixed diherals
 Refi >                     0.02      1.24      1.55
 Refi > Centre of gravity updated for   124  130
 Refi > Accept new coordinates? Hit *Yes/*No
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


5.8 Your turn

Start your own rebuilding session by executing the macro "oops.omac" and apply what you have learned above. Remember to save (and back up) your O file regularly. Before making big changes, it is a good idea to write out your molecule to a temporary file ("write m1_* m1_save.odb ;") just in case you (or O) screw up. Also try to build (at least a poly-Ala model for) the other missing loop (100-106; you may need to contour the maps at a lower level than usual). When you're done rebuilding also put in the correct sidechains for residues which are different in CRABPI and II (if there is sidechain density, of course).


5.9 How did you do ?

If you have had/learned enough, you can finish by comparing your rebuilt model with the actual 1.8 Å crystal structure (PDB code 1CBS). Look at some of the places with large differences in mainchain conformation. Also check out some of the sidechains which you modeled differently; what does the 2.8 Å density look like ? Can you find clear examples of model bias (i.e., deceptively convincing density for a completely wrong sidechain conformation) ? What does this teach you about low-resolution models ? Discuss the oft-read phrase "the coordinate error is 0.2 Å (Luzzati, 1952)" when used to describe the "accuracy" of low-resolution models.

If you want to continue with the tutorial, don't forget to renumber your residues (from A1 to A137).

You may also simply skip the refinement part of the tutorial and go to the "New model" section of the Post-refinement chapter.


5.10 The Swedish Inquisition

(1) Find out what the effect was of the pep-flip of residue A5 on the position of A6 in the Ramachandran plot. Explain your observations. Knowing this, if you didn't change the peptide of A47, check it again.
(2) Why don't we add any water molecules to the model at this stage ?
(3) What could you do to remedy residues with a poor peptide improper ? And those with a poor CA-chirality virtual torsion ? Why are these torsions called "improper" or "virtual" ?
(4) How did you build Arg A11 ?
(5) Did you change Asp A13 ? The density fit is poor and it has a non-rotamer sidechain conformation. There is difference density nearby which fits one of the rotamers and which would enable a saltlink to Arg A11 NE. Asp A17 has a similar problem, but (model bias ?) the density for the current sidechain is deceptively good.
(6) Did you change the sidechain of Leu A18 ? At low resolution, very often "awkward" sidechain conformations of leucines can be replaced by a rotamer which fits the density equally well or better. At high resolution, non-rotamer leucines are rare ! Also check any other leucines which are not in a rotamer conformation.
(7) Why do we have symmetry objects on whenever we (re)build a residue ?
(8) How and why did you rebuild Met A27 ?
(9) Find out the chemical formula for the ligand, all-trans-retinoic acid. Have you seen any traces of density for this ligand in the maps ?
(10) Did you rebuild Ile A52 ? Check that, although this residue has a reasonable RSC value, rotamer 1 fits the density just as well and, in addition, explains a peak in the difference density.
(11) If you did Refi_zone on a zone which contained a cysteine (e.g., A81), you probably got an error message. This is because the CYS entry in the O Refi dictionary is for a disulfide. You can edit the dictionary file, remove the O datablock (which one ?) from your database, and use the Refi_setup command to point to the new file. Something similar happens for proline residues. What would happen if you used Refi_zone on a cis-proline ? How could you remedy this ?
(12) Use the Lsq commands to find the RMSD on CA atoms between your rebuilt model and the starting model.
(13) In directory gmrp/o/gerard, you'll find the result of my own quick and dirty rebuilding ("m2_gerard.pdb"). Calculate the RMSD between your model and this one. Are there any major differences between the two models ? Check the density in these places. Note that "m2_gerard.pdb" is still incomplete and still contains several errors (both in the mainchain and in the sidechains).
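
For questions (12) and (13), the quantity being asked for can be written down compactly. The Python sketch below computes an RMSD over the CA atoms that two models have in common, WITHOUT any superposition; the Lsq commands in O fit the models onto each other first, so their number can only be equal or lower. Since both models sit in the same unit cell, the direct value is still a reasonable first look. The function name `ca_rmsd` and the dictionary layout are assumptions made for this example.

```python
import math

def ca_rmsd(cas_a, cas_b):
    """RMSD over CA atoms present in both models (no superposition).

    cas_a, cas_b: dicts mapping a residue name (e.g. "A5") to (x, y, z).
    Returns (rmsd, number of atoms compared).
    """
    common = sorted(set(cas_a) & set(cas_b))
    if not common:
        raise ValueError("the two models have no residue names in common")
    total = sum(math.dist(cas_a[name], cas_b[name]) ** 2 for name in common)
    return math.sqrt(total / len(common)), len(common)
```

Residues that occur in only one of the models (the rebuilt loops, for instance) are simply skipped, so remember to use the same residue names in both models before comparing.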


6 PREPARING FOR REFINEMENT


6.1 Model

We shall prepare the model (from the file "m2.pdb") for refinement with X-PLOR. We can do most of the work with MOLEMAN:
- read the PDB file (command: READ);
- generate chain and X-PLOR segment names (commands: AUTO or ASK_);
- correct sidechain atom naming if necessary (commands: CHECk or CORRect);
- since individual temperature factors are usually not a good idea at 2.8 Å, average the Bs so we get two Bs per residue, one for mainchain and one for sidechain atoms (command: AVER);
- reset very low or high Bs and set all occupancies to one (command: LIMIt);
- write the PDB file; often, you will have more than one segment, so the SPLIt command is handy to use (this will write each segment to a separate PDB file and will not write PDB records that X-PLOR doesn't like);
- we need coordinates for the carboxy-terminal oxygen OT2; these can be calculated with the SUGGest command, but you MUST edit the file to add this atom (and to rename the carbonyl oxygen of the C-terminal residue to OT1) !!

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Input PDB file ? (in.pdb) m2.pdb
 Number of lines read : (       1061)
 Number of atoms now  : (       1051)
 CPU total/user/sys :       2.7       2.7       0.0
 Option ? (READ_pdb_file) auto
 Generating chain and segids ...
 New chain  A, segid AAAA @ residue    1
 Nr of segments found : (          1)
 Option ? (AUTO) corr
 Nr of atoms : (       1051)
 Nr of residues : (        137)

 Error in TYR 134 ... Swapped CD1/2 and CE1/2
 # of PHE checked : 5 # errors : 0
 # of TYR checked : 2 # errors : 1
 # of ASP checked : 5 # errors : 0
 # of GLU checked : 13 # errors : 0
 # of ARG checked : 7 # errors : 0
 WARNING - any attached hydrogens NOT renamed
 Option ? (CORR) aver
 Valid options are:
 1. Average over all atoms (i.e., compute Boverall)
 2. Average per residue over all atoms
 3. Average per residue, separately for main and side-chain
 4. Average corresponding atoms in different chains
 Option ? ( 1) 3
 Res 1 Nr_atoms 7 Bave-MC 19.49 ( 4) Bave-SC 23.19 ( 3)
 Res 2 Nr_atoms 8 Bave-MC 20.00 ( 4) Bave-SC 20.00 ( 4)
 ...
 Res 136 Nr_atoms 11 Bave-MC 9.61 ( 4) Bave-SC 17.89 ( 7)
 Res 137 Nr_atoms 9 Bave-MC 20.00 ( 4) Bave-SC 20.00 ( 5)
 Nr of temperature factors updated : (       1051)
 Option ? (AVER) limit
 Enter MIN and MAX temperature factor : ( 2.000 99.900) 5 50
 Enter MIN and MAX occupancy : ( 0.000 1.000) 1 1
 Residue range to apply (0 0 = all molecule) ? ( 0 0)
 Nr of atoms updated : (       1051)
 Option ? (LIMIt) split
 Basename of PDB files ? (out) ../xplor/m2
 New chain id : ( A)
 New pdb file : (../xplor/m2a.pdb)
 Nr of atoms written to it : (       1051)
 Nr of atoms written in core : (       1051)
 CPU total/user/sys :       3.0       2.9       0.1
 Option ? (SPLIt) sugg
 Which residue number ? ( 1) 137
 ... found N ...
 Dihedral CA-OT1-C-OT2 = ( 180.000)
 ==> OT1 NOW : 35.010 19.289 39.046 SUGGESTED : 35.063 19.267 39.037
 ==> OT2 NOW :  0.000  0.000  0.000 SUGGESTED : 36.362 19.062 37.328
 Check geometry of carboxylate group :
 Dist C-OT1 = ( 1.230)
 ...
 Dih CA-OT1-C-OT2 = ( 180.000)
 ==> YOU MUST ADD/EDIT OT1/OT2 YOURSELF !!!
 Option ? (SUGG) quit
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
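
To make the effect of the AVER (option 3) and LIMIt steps concrete, here is a minimal Python sketch of the same idea: average the Bs per residue, separately for main-chain and sidechain atoms, and then clamp the averages to a chosen range. It is not MOLEMAN's implementation. The function name "grouped_b_factors" is hypothetical, and the definition of main-chain atoms (N, CA, C, O) is an assumption that merely agrees with the atom counts printed in the example above.

```python
from collections import defaultdict

# Assumption: main-chain atoms are N, CA, C and O; everything else counts
# as sidechain. MOLEMAN's own definition may differ in detail.
MAIN_CHAIN = {"N", "CA", "C", "O"}


def grouped_b_factors(pdb_lines, b_min=5.0, b_max=50.0):
    """Average Bs per residue (main chain and sidechain separately),
    then clamp each average to the range [b_min, b_max]."""
    collected = defaultdict(lambda: {"mc": [], "sc": []})
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()             # PDB columns 13-16
        residue = (line[21], int(line[22:26]))      # chain id, residue number
        b_factor = float(line[60:66])               # PDB columns 61-66
        group = "mc" if atom_name in MAIN_CHAIN else "sc"
        collected[residue][group].append(b_factor)

    result = {}
    for residue, groups in collected.items():
        averaged = {}
        for group, values in groups.items():
            if values:                              # glycine has no sidechain
                mean = sum(values) / len(values)
                averaged[group] = min(max(mean, b_min), b_max)
        result[residue] = averaged
    return result
```

With the default limits this mirrors the "5 50" reply given to LIMIt above; a glycine yields only a main-chain entry, which is relevant when you count B-parameters later on.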


6.2 Reflections

Go to the directory gmrp/hkl. You will find two reflection files, "crabp2_1.8a.hkl" and "crabp2_2.8a.hkl". We will ignore the former for the time being.

Use DATAMAN to convert these reflections into an X-PLOR reflection file and to generate Rfree flags for a subset of the reflections. First read the hkl file:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 DATAMAN > re m1 crabp2_2.8a.hkl
 File   : (crabp2_2.8a.hkl)
 Type   : (HKLFS)
 Format : (*)
 Nr of reflections read : (       3816)
 Nr of WORK reflections : (       3816)
 Nr of TEST reflections : (          0)
 Percentage TEST data   : (   0.000)
 This is NOT an Rfree dataset
 WARNING - less than 500 TEST reflections !
 DATAMAN > stats m1
 Stats : (M1)

 Item       Minimum    Maximum    Average        Sdv        Var
 ====       =======    =======    =======        ===        ===
 H                0         16      6.816      3.793     14.384
 K                0         16      6.376      4.314     18.609
 L                0         27      9.457      6.528     42.619
 Fobs     1.107E+02  2.608E+04  4.723E+03  3.090E+03  9.551E+06
 SigFo    2.280E+01  2.106E+03  1.058E+02  7.459E+01  5.563E+03
 Fo/Sig   1.381E+00  1.412E+02  5.110E+01  2.765E+01  7.644E+02

 Correlation Fobs-SigFo   : (   0.302)
 Correlation Fobs-Fo/Sig  : (   0.626)
 Correlation SigFo-Fo/Sig : (  -0.330)

 Nr of reflections      : (       3816)
 Nr of WORK reflections : (       3816)
 Nr of TEST reflections : (          0)
 Percentage TEST data   : (   0.000)
 This is NOT an Rfree dataset
 WARNING - less than 500 TEST reflections !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Now we can generate Rfree flags. Use the following rules-of-thumb:
- use 5-10% of the reflections, but not fewer than ~500 test reflections, and not more than ~2000;
- if there is NCS, generate test reflections in thin shells (RFree SHell command); otherwise use small spheres in reciprocal space (RFree SPheres command).
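
The first rule-of-thumb can be written down as a small calculation. The following Python sketch is only an illustration of that rule, not something DATAMAN does for you; the function name "choose_test_set_size" and its default arguments are hypothetical.

```python
def choose_test_set_size(n_reflections, fraction=0.10, minimum=500, maximum=2000):
    """Return (number of test reflections, percentage of all reflections).

    Start from a fraction of the data (5-10% is the usual range), then
    enforce the ~500 lower bound and the ~2000 upper bound.
    """
    n_test = round(n_reflections * fraction)
    n_test = max(minimum, min(maximum, n_test))
    # A test set can never be larger than the dataset itself.
    n_test = min(n_test, n_reflections)
    return n_test, 100.0 * n_test / n_reflections


n_test, percentage = choose_test_set_size(3816)
print(n_test, round(percentage, 3))
```

For a small dataset the lower bound wins, so the percentage ends up above 10%; for a large dataset the upper bound wins and the percentage drops below 10%.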

In this case (~3800 reflections), we will generate a test set of 500:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 DATAMAN > cell m1 45.65 47.56 77.61 90 90 90
 Cell : (  45.650   47.560   77.610   90.000   90.000   90.000)
 Volume (A3) : (  1.685E+05)
 DATAMAN > cal m1 res
 Calc : (M1)
 Cell volume : (  1.685E+05)
 Lowest  resolution : (  14.932)
 Highest resolution : (   2.800)
 DATAMAN > rf sph
 Which set ? (M1)
 Percentage TEST data ? (10) 500
 Converted to percentage : (  13.103)
 Reciprocal sphere radius ? (1)
 Encoding reflections ...
 Nr of TEST spheres : (         82)
 Nr of WORK reflections : (       3315)
 Nr of TEST reflections : (        501)
 Percentage TEST data   : (  13.129)
 This is an Rfree dataset
 WARNING - more than 13% TEST reflections !
 DATAMAN > st m1
 Stats : (M1)

 Item       Minimum    Maximum    Average        Sdv        Var
 ====       =======    =======    =======        ===        ===
 H                0         16      6.816      3.793     14.384
 K                0         16      6.376      4.314     18.609
 L                0         27      9.457      6.528     42.619
 Fobs     1.107E+02  2.608E+04  4.723E+03  3.090E+03  9.551E+06
 SigFo    2.280E+01  2.106E+03  1.058E+02  7.459E+01  5.563E+03
 Reso         2.800     14.932      4.072      1.615      2.607
 Fo/Sig   1.381E+00  1.412E+02  5.110E+01  2.765E+01  7.644E+02

 Correlation Fobs-SigFo   : (   0.302)
 Correlation Fobs-Fo/Sig  : (   0.626)
 Correlation SigFo-Fo/Sig : (  -0.330)

 Nr of reflections      : (       3816)
 Nr of WORK reflections : (       3315)
 Nr of TEST reflections : (        501)
 Percentage TEST data   : (  13.129)
 This is an Rfree dataset
 WARNING - more than 13% TEST reflections !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
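
The cell volume and the resolution of individual reflections follow directly from the cell constants. As a minimal sketch (valid only for cells with all angles equal to 90 degrees, as here; the function names are hypothetical and this is not how DATAMAN is implemented):

```python
import math


def orthorhombic_volume(a, b, c):
    """Cell volume in cubic Angstrom; valid only if all angles are 90 degrees."""
    return a * b * c


def orthorhombic_d_spacing(h, k, l, a, b, c):
    """Resolution (d-spacing in Angstrom) of reflection (h, k, l)."""
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    if inv_d_squared == 0.0:
        raise ValueError("d-spacing is undefined for reflection (0, 0, 0)")
    return 1.0 / math.sqrt(inv_d_squared)


CELL = (45.65, 47.56, 77.61)          # a, b, c from the CELL command above
volume = orthorhombic_volume(*CELL)
# The largest H index in the statistics is 16; its axial reflection
# still lies inside the 2.8 A resolution limit.
d_16_0_0 = orthorhombic_d_spacing(16, 0, 0, *CELL)
print(f"{volume:.4g} {d_16_0_0:.3f}")
```

The same function shows why the largest indices in the statistics are 16, 16 and 27: one index higher along any axis would fall beyond 2.8 Å.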

Finally, write the reflections to a file in X-PLOR format with the Rfree flags included:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 DATAMAN > wr m1 crabp2_2.8a_rfree.xplor rxplor
 Nr of WORK reflections : (       3315)
 Nr of TEST reflections : (        501)
 Percentage TEST data   : (  13.129)
 This is an Rfree dataset
 WARNING - more than 13% TEST reflections !
 File   : (crabp2_2.8a_rfree.xplor)
 Type   : (RXPLOR)
 Format : ((' INDEX=',3i6,' FOBS=',f10.3,' SIGMA=',f10.3,' TEST=',i3))
 Write WORK and TEST set
 Nr of reflections stored  : (       3816)
 Nr of reflections written : (       3816)
 CPU total/user/sys :       1.3       1.0       0.3
 DATAMAN > quit
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


6.3 X-PLOR files

Go to the gmrp/xplor directory. You will find that a number of files are already there for you. A few of the problem-specific files (may) need to be edited:

(1) "crystal.xplor"
Enter the unit cell constants and spacegroup symmetry operators. Note that anything in { curly brackets } is treated as a comment by X-PLOR. In all example files, lines which may need to be edited by you are indicated by: { *** EDIT ME *** }. It should look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 { crystal.xplor }

 { unit cell for holo-CRABP II crystal }
 a=45.65 b=47.56 c=77.61            { *** EDIT ME *** }
 alpha=90. beta=90.0 gamma=90.      { *** EDIT ME *** }

 { symmetry operators for spacegroup P212121 } { *** EDIT ME *** }
 symmetry=(x,y,z)
 symmetry=(-x+1/2,-y,z+1/2)
 symmetry=(-x,y+1/2,-z+1/2)
 symmetry=(x+1/2,-y+1/2,-z)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

(2) "charges.xplor"
Does not need to be edited normally.

(3) "reflxns.xplor"
Edit the name of your reflection file. Note that REMARK lines will be echoed to your output PDB files. It should look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 { reflxns.xplor }

 { read reflections }
 nreflections=100000
 reflection @../hkl/crabp2_2.8a_rfree.xplor end    { *** EDIT ME *** }

 REMARK Uses 13% Rfree reflection file in P212121 holo-CRABPII

 { resolution range }
 resolution $lo_res $hi_res

 { two-sigma and F-magnitude cutoff }
 reduce

 { do amplitude ( fobs = fobs * heavy(fobs - 2.0*sigma)) }
 { fwindow 0.001 1000000 }

 REMARK Uses *NO* sigma or amplitude cut-off
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

(4) "parameters.xplor"
Normally doesn't need to be edited, unless you start introducing non-protein entities (waters, ligand, carbohydrates, metal ions, etc.). It may look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 { parameters.xplor }

 parameter
   @parhcsdx.pro
   { @param19.sol }
   nbonds
     atom cdie shift eps=8.0 e14fac=0.4
     cutnb=7.5 ctonnb=6.0 ctofnb=6.5
     nbxmod=5 vswitch wmin=0.5
   end
   remark dielectric constant set to 8.0 (EPS)
   remark using UPDATED Engh & Huber parameters parhcsdx.pro
   remark close contacts printed only if dist < 0.5 A (WMIN)
 end
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Other files include:
- topology files ("top*");
- parameter files ("par*");
- X-PLOR input files ("*.inp");
- "printr.xmac", an X-PLOR "macro" which updates Fcalc and calculates and prints the R-factors;
- "rfree.csh", a command script that generates plots of R and Rfree versus progress of refinement (requires the utility programs ODBM and O2D).

The topology specifies for each residue which atoms it contains, their charges, bonds etc. The standard file for proteins is "tophcsdx.pro".

The parameters specify target values for bond lengths etc. and energy penalties associated with deviations from the ideal values. The standard file for proteins is "parhcsdx.pro".

If you have other entities, you may need to create new topology and parameter files. XPLO2D can do a major part of this job automatically when you feed it a PDB file of a small molecule (option AUTODICT). This will be used in the chapter "Another cycle".


6.4 The Swedish Inquisition

(1) Why is it not necessarily a good idea to use individual temperature factors with 2.8 Å data ? How could you test if they are appropriate (i.e., better than grouped Bs) ? How many B-parameters are refined for your current model if you refine individual Bs ? And how many if you refine two Bs per residue (note glycines) ?
(2) What has O done to the temperature factors of mutated and newly inserted residues ?
(3) How many reflections will be used for the actual refinement ? How many parameters (coordinates and Bs) are there in your present model ? What does this mean ?
(4) Why do we need a minimum number of test reflections (i.e., ~500) ? And why is there a recommended maximum ?
(5) Why is random selection of test reflections the worst possible choice ?
(6) Why didn't we write out the reflections from DATAMAN using the XPLOR format specifier (we used RXPLOR instead) ?
(7) How does X-PLOR handle cis- and trans-prolines ?


7 REFINEMENT

In this chapter, we shall work through one round of X-PLOR refinement.


7.1 Generate

The first step in a refinement is to generate a so-called PSF file, as well as a PDB file which contains all atoms (including polar hydrogens). To do this, you need to edit the file "generate.inp". When you're done, submit the job, e.g.:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 unix> /public/bin/xplor_16000 < generate.inp |& tee generate.out
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


7.2 Check

The next step is to determine the relative weight of the X-ray pseudo-energy term compared to the combined geometric and other energetic terms.

Edit the file "check.inp" to do this and submit the job. At the end of the output will be something like "Ideal WA=0.123456E+06". This means that the best weight is 123456; however, in practice it turns out that this weight is often too high; use 1/2 or 1/3 of this value in subsequent refinement jobs.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 X-PLOR> xrefine
 XREFINE>   resolution $lo_res $hi_res gradient
 XREFIN: selected reflections will be sorted by index.
 XRTEST: number of selected reflections    3678
 XRFILL: #scatt.= 1052 #anomalous=  0 #special pos.=  0 occupancies=1
 XFFT: using grid [ 48, 50, 90] and sublattice [ 48( 49), 50( 51), 90]
 TRRESI: ->[TEST SET (TEST=1)] Fobs/Fcalc scale=  17.222 R=       0.340
 TRRESI: ->[WORKING SET (TEST=0)] Fobs/Fcalc scale=  16.979 R=       0.389
 XRGRAD: r.m.s. gradients: empirical energy function=  76.649
                           "amplitude" target= 0.55947E-03
                           "phase" target= 0.00000E+00
 XRGRAD: ideal WA= 0.13700E+06
 XREFINE> end
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Note that this job also prints the initial R-factors and the completeness of the test and work reflections in the selected resolution range.
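
If you run "check.inp" often, picking the weight out of the log can be scripted. The Python sketch below is an illustration only: the function names are hypothetical, and the regular expression assumes that the log contains a line of the form shown in the example above ("XRGRAD: ideal WA= ..."). In that example the printed value equals the ratio of the two r.m.s. gradients (76.649 / 0.55947E-03 is approximately 137000); treat this as an observation about this particular log, not as a statement about X-PLOR internals.

```python
import re


def ideal_wa_from_log(log_text):
    """Extract the 'ideal WA' value from the text of a check.inp log file."""
    match = re.search(r"ideal WA=\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)",
                      log_text, re.IGNORECASE)
    if match is None:
        raise ValueError("no 'ideal WA=' line found in the log")
    return float(match.group(1))


def suggested_weights(ideal_wa):
    """Return the two down-weighted values suggested in the text (1/2 and 1/3)."""
    return ideal_wa / 2.0, ideal_wa / 3.0


LOG_LINE = " XRGRAD: ideal WA= 0.13700E+06"
wa = ideal_wa_from_log(LOG_LINE)
half, third = suggested_weights(wa)
print(wa, half, round(third))
```

Which of the two reduced weights works best is something to judge from the behaviour of Rfree and the geometry in the subsequent jobs.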


7.3 Rigid-body refinement

In the first refinement cycle, you may want to optimise the overall orientation of your molecule(s) with rigid-body refinement.

Edit and submit the "rigid.inp" job to do this (in this case it's not necessary).


7.4 Energy minimisation

Edit and submit the "powell.inp" job. At the start:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 TRRESI: ->[TEST SET (TEST=1)] Fobs/Fcalc scale=  17.515 R=       0.323
 TRRESI: ->[WORKING SET (TEST=0)] Fobs/Fcalc scale=  17.323 R=       0.354
 --------------- cycle=     1 ------ stepsize=    0.0000 -----------------------
 | Etotal =33953.486  grad(E)=5481.256   E(BOND)=113.216    E(ANGL)=608.934    |
 | E(DIHE)=644.982    E(IMPR)=78.656     E(VDW )=1490.546   E(ELEC)=-363.305   |
 | E(XREF)=6632.591   E(PVDW)=24753.167  E(PELE)=-5.300                        |
 -------------------------------------------------------------------------------
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Note that Rfree is lower than R. This is because the starting model was refined against the same data, but using a different partitioning of test and work reflections.

At the end:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 TRRESI: ->[TEST SET (TEST=1)] Fobs/Fcalc scale=  17.604 R=       0.315
 TRRESI: ->[WORKING SET (TEST=0)] Fobs/Fcalc scale=  17.883 R=       0.286
 --------------- cycle=   100 ------ stepsize=    0.0001 -----------------------
 | Etotal =4465.241   grad(E)=20.967     E(BOND)=92.023     E(ANGL)=354.931    |
 | E(DIHE)=480.623    E(IMPR)=97.398     E(VDW )=-449.613   E(ELEC)=-373.626   |
 | E(XREF)=4317.535   E(PVDW)=-47.850    E(PELE)=-6.179                        |
 -------------------------------------------------------------------------------
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


7.5 Simulated annealing

Edit and submit the "anneal.inp" job. While this job runs, you can edit the "rfree.csh" file and execute it. This will produce a PostScript plot of the behaviour of R and Rfree as a function of the progress of refinement. You can view the plot with a program like GhostScript or GhostView, or print it on a PostScript printer.

The final R-factors etc. are:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 TRRESI: ->[TEST SET (TEST=1)] Fobs/Fcalc scale=  17.672 R=       0.318
 TRRESI: ->[WORKING SET (TEST=0)] Fobs/Fcalc scale=  18.301 R=       0.245
 --------------- cycle=    50 ------ stepsize=    0.0000 -----------------------
 | Etotal =3185.799   grad(E)=5.270      E(BOND)=63.926     E(ANGL)=326.876    |
 | E(DIHE)=488.317    E(IMPR)=70.329     E(VDW )=-510.423   E(ELEC)=-376.621   |
 | E(XREF)=3188.080   E(PVDW)=-59.017    E(PELE)=-5.670                        |
 -------------------------------------------------------------------------------
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Note that Rfree has increased slightly. In this particular case it need not worry us, since this is our first refinement in which we had to uncouple R and Rfree.


7.6 Temperature-factor refinement

For this you can use any of three input files: "bindiv.inp" to refine restrained individual isotropic temperature factors, "bgroup2.inp" to refine two Bs per residue, or "bgroup1.inp" to refine only one B per residue. Select the most appropriate B-factor model, edit the corresponding file and submit the job. You could try all three to find out which method yields the most appropriate B-factor model, but in this case we will only use "bgroup2.inp". Note that R and Rfree both drop by ~1% (from 0.245 to 0.235 and from 0.318 to 0.309, respectively), which is a good sign.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 TRRESI: ->[TEST SET (TEST=1)] Fobs/Fcalc scale=  17.422 R=       0.309
 TRRESI: ->[WORKING SET (TEST=0)] Fobs/Fcalc scale=  17.986 R=       0.235
 --------------- cycle=    25 --------------------------------------------------
 | E(XREF)= 0.293E+04  grad(E)= 0.509E-02                                      |
 -------------------------------------------------------------------------------
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
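
The R-factors quoted in sections 7.4 to 7.6 can be collected in a small table to follow the gap between R and Rfree. The Python sketch below does only that bookkeeping; the stage labels and the function name "summarise_r_factors" are hypothetical, and the numbers are copied from the example outputs in this chapter.

```python
def summarise_r_factors(stages):
    """For each (label, R, Rfree) stage return
    (label, R, Rfree, Rfree - R, change in Rfree since the previous stage)."""
    rows = []
    previous_rfree = None
    for label, r_work, r_free in stages:
        gap = round(r_free - r_work, 3)
        if previous_rfree is None:
            change = None
        else:
            change = round(r_free - previous_rfree, 3)
        rows.append((label, r_work, r_free, gap, change))
        previous_rfree = r_free
    return rows


STAGES = [
    ("powell, start", 0.354, 0.323),
    ("powell, end", 0.286, 0.315),
    ("anneal, end", 0.245, 0.318),
    ("bgroup2, end", 0.235, 0.309),
]

for label, r_work, r_free, gap, change in summarise_r_factors(STAGES):
    change_text = "" if change is None else f"  dRfree={change:+.3f}"
    print(f"{label:<14s} R={r_work:.3f}  Rfree={r_free:.3f}  gap={gap:+.3f}{change_text}")
```

A gap that keeps growing while Rfree stalls or rises is the pattern to watch for in later refinement cycles.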
   


7.7 The Swedish Inquisition

(1) Explain why Rfree can be lower than R at the start of a refinement, as well as after previous refinement using a different partitioning of work and test reflections.
(2) If you run rigid-body refinement, what resolution limits would you use (and why) ? If they are different from those used in the other refinement jobs, would you need to run "check.inp" again to find the most appropriate value of WA for that resolution range ? Why (not) ?
(3) How could you decide what the best B-factor model is for your model and dataset ?


8 POST-REFINEMENT


8.1 Geometry

To analyse the geometry of your model and to find any bad (symmetry) contacts, edit and submit the job "geom.inp".

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 X-PLOR> print threshold=0.05 bonds
 ...
 Number of violations greater    0.050:     1
 RMS deviation=   0.008
 X-PLOR>
 X-PLOR> print threshold=10.0 angles
 ...
 Number of violations greater   10.000:     2
 RMS deviation=   1.460
 X-PLOR>
 X-PLOR> print threshold=60.0 dihedrals
 ...
 Number of violations greater   60.000:     0
 RMS deviation=  27.774
 X-PLOR>
 X-PLOR> print threshold=5.0 impropers
 ...
 Number of violations greater    5.000:     2
 RMS deviation=   1.267
 ...
 X-PLOR> distance from=( not hydrogen ) to=( not hydrogen ) cutoff=2.5 end
 SELRPN:   1052 atoms have been selected out of   1290
 SELRPN:   1052 atoms have been selected out of   1290
 DISTAN: nonbonded distances printed
 atoms "AAAA-75  -THR -OG1 " and "AAAA-77  -ASP -OD1 "            2.4944 A apart
 atoms "AAAA-89  -SER -N   " and "AAAA-89  -SER -O   "            2.4665 A apart
 X-PLOR>
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

So, our model is tightly restrained, as it should be at low resolution.


8.2 New model

With MOLEMAN we can prepare a suitable PDB file for O and CCP4 from the final X-PLOR model:
- read the file and strip hydrogens (command: NO_H)
- list some statistics (commands: STAT and B_Q_)
- assign chain names (commands: AUTO or ASK_)
- add cell and spacegroup information (command: CRYS)
- get correct sidechain atom names (commands: CHECk and CORRect)
- optionally, you can produce all sorts of plots (commands: PLOT, RAMA, CA_Rama, RADIal, CA_D, BALA)
- write the new PDB file (command: WRITe)
Copy the new model to your O directory.

If you have skipped the refinement part of the tutorial, copy the file "m3_gerard.pdb" from the directory gmrp/o/gerard and continue from here as if nothing had happened.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Option ? (READ_pdb_file) no_h

 Input PDB file ? (in.pdb) m1_final.pdb
 Number of lines read : (       1294)
 Hydrogens skipped    : (        238)
 Number of atoms now  : (       1052)
 Option ? (NO_H) stat
 Nr of atom numbers in memory : (       1052)

 Item        Average     St.Dev        Min        Max
 ----        -------     ------        ---        ---
 X-coord      18.017      8.058      0.641     35.795
 Y-coord      20.647      6.665      3.481     34.742
 Z-coord      27.566      9.843      2.677     46.340
 B-factor     11.675      9.850      5.000     61.360
 Occpncy       1.000      0.000      1.000      1.000

 Radius of gyration (A) : 14.36
 Sum of masses          : 13930.688
 Centre-of-mass         : 18.00 20.65 27.54
 Option ? (STAT) b_q_
 Amino acid residue names ? (ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL CPR ASX GLX UNK CYH CSS PCA)
 Names of ligands/substrates ? (???)
 Which chain (** = all) ? (**)
 Include HYDROGEN atoms (Y/N) ? (N)

 B & Q statistics for chain : (**)

 Atom type              Number  Average B  Maximum B  Average Q
 Protein main chain        549     10.385     61.360      1.000
 Protein side chain        503     13.083     57.730      1.000
 Protein all atoms        1052     11.675     61.360      1.000
 Ligand/substrate            0      0.000      0.000      0.000
 Water molecules             0      0.000      0.000      0.000
 Other entities              0      0.000      0.000      0.000
 All atoms                1052     11.675     61.360      1.000
 Generating REMARK records ...

 Option ? (B_Q_) auto
 Generating chain and segids ...
 New chain  A, segid AAAA @ residue    1
 Nr of segments found : (          1)
 Option ? (AUTO) crys
 Unit-cell constants ? ( 1.000 1.000 1.000 90.000 90.000 90.000) 45.65 47.56 77.61 90 90 90
 Unit-cell volume (A3) : (  1.685E+05)
 Spacegroup ? (P 1) P 21 21 21
 Value of Z ? ( 1) 4
 Option ? (CRYS) corr
 Nr of atoms : (       1052)
 Nr of residues : (        137)

 Error in GLU 16 ... Swapped OE1/2
 ...
 # of PHE checked : 5 # errors : 1
 # of TYR checked : 2 # errors : 1
 # of ASP checked : 5 # errors : 0
 # of GLU checked : 13 # errors : 5
 # of ARG checked : 7 # errors : 0
 WARNING - any attached hydrogens NOT renamed
 Option ? (CORR) rama

 In the following, hit RETURN if you do NOT want
 to produce the file the programs asks for

 Text file with PHI-PSIs ? ( )
 O2D Ramachandran plot file ? ( )
 O PHI-PSI datablock file ? ( )
 HPGL Ramachandran plot file ? ( )
 PostScript Ramachandran plot file ? ( ) m3_rama.ps
 => XPS_GRAF - GJK (2.2 @ 950530)
 Opened PostScript file : (m3_rama.ps)
 Date : (Thu Nov 16 19:38:46 1995)
 User : (gerard)
 Program : (MOLEMAN)
 PostScript POLAR Ramachandran plot file ? ( )
 Option ? (RAMA) plot

 Make plot file for Bs or Qs ? (B)
 Filename for per_atom plot ? (atom_b.plt) q
 Filename for per_residue plot ? (resi_b.plt) m3_aveb.plt

 You may plot the following for each residue:
 R = RMS B/Q over all atoms / average over molecule
 A = average B/Q for all atoms
 M = average B/Q for main-chain atoms
 S = average B/Q for side-chain atoms
 Option (R/A/M/S) ? (A)
 Write atom/residue labels to file (Y/N) ? (N)
 WARNING - if there are hydrogen atoms they will be included !
 Option ? (PLOT) radi
 Plot file ? (b_radial.plt) m3_radb.plt
 Which chain (2 characters !) ? ( A)
 Nr of atoms selected (no Hs) : (       1052)
 Shell  2.0 -  4.0 A -   6 atoms; <B> =  6.16 A**2
 Shell  4.0 -  6.0 A -  22 atoms; <B> =  8.12 A**2
 Shell  6.0 -  8.0 A -  45 atoms; <B> =  6.51 A**2
 Shell  8.0 - 10.0 A - 103 atoms; <B> =  7.07 A**2
 Shell 10.0 - 12.0 A - 164 atoms; <B> =  7.87 A**2
 Shell 12.0 - 14.0 A - 191 atoms; <B> =  8.15 A**2
 Shell 14.0 - 16.0 A - 209 atoms; <B> = 10.90 A**2
 Shell 16.0 - 18.0 A - 171 atoms; <B> = 15.97 A**2
 Shell 18.0 - 20.0 A -  85 atoms; <B> = 18.29 A**2
 Shell 20.0 - 22.0 A -  40 atoms; <B> = 27.43 A**2
 Shell 22.0 - 24.0 A -  11 atoms; <B> = 35.55 A**2
 Shell 24.0 - 26.0 A -   3 atoms; <B> = 27.91 A**2
 Shell 26.0 - 28.0 A -   2 atoms; <B> = 31.29 A**2
 Plot file written
 Option ? (RADI) write

 Output PDB file ? (out.pdb) m3.pdb
 REMARK at start of file ? (MoleMan PDB file) M3 X-PLOR R 0.235 Rfree 0.309 951116
 Copy all REMARK, HEADER etc. cards from input ? (Y)
 Which chain to write (** = any and all) ? (**)
 Residue range to write (0 0 = all molecule) ? ( 0 0)
 You may output All atoms, only Main-chain atoms,
 a Poly-alanine (Gly intact), a poly-Serine,
 (Gly and Ala intact) or a poly-Glycine
 Which option do you want (All/M/P/S/G) ? (A)
 Write HYDROGEN atoms (Y/N) ? (N)
 Force consecutive atom numbering (Y/N) ? (Y)
 X-PLOR needs OT1 and OT2, but O hates them
 If your file contains OT1/2 you may either keep
 them, or replace them by O/OXT
 Write X-PLOR OT1/2 ? (Y/N) ? (N) y
 Cell : (  45.650   47.560   77.610   90.000   90.000   90.000)
 CCP4 requires CRYST, SCALE and ORIGX cards
 X-PLOR does not like them at all
 Therefore: reply Y for CCP4 and N for X-PLOR :
 Write CRYST, SCALE, ORIGX cards (Y/N) ? (Y)
 Nr of atoms written : (       1052)
 Option ? (WRITe) quit
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The plot files can be converted into PostScript format with the program O2D or the script "OMAC/o2dps".


8.3 ProCheck

If you want, you can also run ProCheck again at this stage. It will probably tell you once again that this is a fantastic model. However, it also said this about the initial model, which you know was rather poor. The most useful output from ProCheck is the Ramachandran plot and the distribution of chi1/chi2 angles (similar information as with the O RSC_fit command).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
        R A M A C H A N D R A N   P L O T   S T A T I S T I C S

 Residues in most favoured regions      [A,B,L]            105     84.0%
 Residues in additional allowed regions [a,b,l,p]           19     15.2%
 Residues in generously allowed regions [~a,~b,~l,~p]        1      0.8%
 Residues in disallowed regions         [XX]                 0      0.0%
                                                          ----    ------
 Number of non-glycine and non-proline residues            125    100.0%

 Number of end-residues (excl. Gly and Pro)                  1

 Number of glycine residues                                  7
 Number of proline residues                                  4
                                                          ----
 Total number of residues                                  137
 ...
        S T E R E O C H E M I S T R Y   O F   M A I N - C H A I N

                                              Comparison values    No. of
                                 No. of  Parameter  Typical  Band   band widths
 Stereochemical parameter       data pts   value     value   width  from mean
 ------------------------       --------   -----     -----   -----  ---------
 a. %-tage residues in A, B, L     125      84.0      70.9    10.0     1.3  BETTER
 b. Omega angle st dev             136       1.3       6.0     3.0    -1.6  BETTER
 c. Bad contacts / 100 residues      1       0.7      15.8    10.0    -1.5  BETTER
 d. Zeta angle st dev              130       1.3       3.1     1.6    -1.1  BETTER
 e. H-bond energy st dev            90       0.8       1.0     0.2    -1.1  BETTER
 f. Overall G-factor               137       0.2      -0.7     0.3     2.9  BETTER
 ...
        S T E R E O C H E M I S T R Y   O F   S I D E - C H A I N

                                              Comparison values    No. of
                                 No. of  Parameter  Typical  Band   band widths
 Stereochemical parameter       data pts   value     value   width  from mean
 ------------------------       --------   -----     -----   -----  ---------
 a. Chi-1 gauche minus st dev       28      11.5      25.4     6.5    -2.1  BETTER
 b. Chi-1 trans st dev              37      12.6      24.9     5.3    -2.3  BETTER
 c. Chi-1 gauche plus st dev        43      14.3      23.5     4.9    -1.9  BETTER
 d. Chi-1 pooled st dev            108      13.1      24.3     4.8    -2.3  BETTER
 e. Chi-2 trans st dev              37      12.9      24.7     5.0    -2.4  BETTER
 ...
        G - F A C T O R S

                                              Average
 Parameter                         Score       Score
 ---------                         -----       -----
 Dihedral angles:-
      Phi-psi distribution         -0.43
      Chi1-chi2 distribution       -0.18
      Chi1 only                    -0.30
      Chi3 & chi4                  -0.35
      Omega                         0.57
                                              ------
                                               -0.05
                                               =====
 Main-chain covalent forces:-
      Main-chain bond lengths       0.64
      Main-chain bond angles        0.37
                                              ------
                                                0.49
                                               =====

 OVERALL AVERAGE                                0.17
                                               =====
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


8.4 What happened ?

To find out how much and where X-PLOR refinement has changed the model, you can use some of the tools in LSQMAN. For example, the RMSD on CA atoms:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m2 m2a.pdb
 Old chain |A| becomes chain A
 Nr of lines read from file : (       1053)
 Nr of atoms in molecule    : (       1052)
 Nr of chains or models     : (          1)
 Stripped hydrogen atoms    : (          0)
 LSQMAN > re m3 m3.pdb
 Cell : (  45.650   47.560   77.610   90.000   90.000   90.000)
 Old chain |A| becomes chain A
 Nr of lines read from file : (       1080)
 Nr of atoms in molecule    : (       1052)
 Nr of chains or models     : (          1)
 Stripped hydrogen atoms    : (          0)
 LSQMAN > ex m2 a1-199 m3 a1
 Explicit fit of M2 A1-199
 And             M3 A1
 Atom types     | CA |
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (        137)
 Nr skipped (B limits) : (          0)

 The 137 atoms have an RMS distance of 0.413 A
 RMS delta B  = 5.909 A2
 Corr. coeff. = 0.6545
 Rotation     :  0.999993 -0.003368 -0.001592
                 0.003365  0.999993 -0.001814
                 0.001598  0.001808  0.999997
 Translation  :    -0.117     0.098     0.001
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The RMSD on all atoms:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > at all
 Nr of atom types : (       1)
 Type : (ALL)
 LSQMAN > ex m2 a1-199 m3 a1
 Explicit fit of M2 A1-199
 And             M3 A1
 Atom types     |ALL |
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (        851)
 Nr skipped (B limits) : (          0)

 The 851 atoms have an RMS distance of 0.693 A
 RMS delta B  = 6.952 A2
 Corr. coeff. = 0.6932
 Rotation     :  0.999998 -0.002084  0.000292
                 0.002084  0.999997 -0.000809
                -0.000290  0.000809  1.000000
 Translation  :    -0.047     0.077    -0.020
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
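
For reference, the "RMS distance" in these listings is the root-mean-square of the distances between matched atoms. A minimal Python sketch of that quantity is given below. It assumes that the two coordinate sets have already been superimposed and that atoms are matched by position in the lists; the least-squares fit itself, which LSQMAN carries out first, is not shown, and the function name is hypothetical.

```python
import math


def rms_distance(coords_a, coords_b):
    """Root-mean-square distance between two equally long lists of
    (x, y, z) tuples; atom i in coords_a is matched to atom i in coords_b."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("need at least one pair of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two atoms: one shifted by 1 A, one unchanged.
example = rms_distance([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                       [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)])
print(round(example, 4))
```

Because the squares of the distances are averaged, a few large shifts dominate the value, which is why the all-atom RMSD is larger than the CA-only RMSD here.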

A plot of changes in the phi and psi dihedral angles:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > phipsi m2 a1-199 m3 a1 m2_m3_phipsi.plt
 Delta-Phi/Delta-Psi plot
 Plot of M2 A1-199
 And     M3 A1
 Nr of residues matched : (        137)
 RMS delta PHI       : (  24.307)
 Average |delta PHI| : (  16.926)
 Nr |delta PHI| > 10 : (      78)
 Percentage          : (  56.934)
 RMS delta PSI       : (  24.066)
 Average |delta PSI| : (  15.955)
 Nr |delta PSI| > 10 : (      78)
 Percentage          : (  56.934)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

A plot of the distances between equivalent CA atoms before and after refinement:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > at ca
 Nr of atom types : (       1)
 LSQMAN > di m2 a1-199 m3 a1 m2_m3_ca_dist.plt
 Central-atom distance plot
 Central atom type : ( CA)
 Plot of M2 A1-199
 And     M3 A1
 Nr of residues matched : (        137)
 Average distance : (   0.346)
 Minimum distance : (   0.015)
 Maximum distance : (   1.542)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

If you want to see a list of shifts for each CA atom, use the IMprove command:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > impr m2 * m3 *
 Improve fit of  M2 *
 And             M3 *
 Atom type      | CA |
 Nr of atoms in mol1 : (        137)
 Nr of atoms in mol2 : (        137)
 ...
 Fragment PRO-A   1 <===> PRO-A   1 @     0.35 A *
          ASN-A   2 <===> ASN-A   2 @     0.39 A *
 ...
          LEU-A  99 <===> LEU-A  99 @     0.29 A *
          LEU-A 100 <===> LEU-A 100 @     0.11 A *
          ALA-A 101 <===> ALA-A 101 @     0.35 A *
          ALA-A 102 <===> ALA-A 102 @     0.56 A *
          ALA-A 103 <===> ALA-A 103 @     0.80 A *
          ALA-A 104 <===> ALA-A 104 @     0.25 A *
          PRO-A 105 <===> PRO-A 105 @     0.40 A *
          ALA-A 106 <===> ALA-A 106 @     0.04 A *
          THR-A 107 <===> THR-A 107 @     0.11 A *
 ...
          LEU-A 113 <===> LEU-A 113 @     0.71 A *
          THR-A 114 <===> THR-A 114 @     0.61 A *
          ASN-A 115 <===> ASN-A 115 @     1.52 A *
          ASP-A 116 <===> ASP-A 116 @     0.86 A *
          GLY-A 117 <===> GLY-A 117 @     0.36 A *
          GLU-A 118 <===> GLU-A 118 @     0.36 A *
          LEU-A 119 <===> LEU-A 119 @     0.21 A *
 ...
          ARG-A 136 <===> ARG-A 136 @     0.62 A *
          GLU-A 137 <===> GLU-A 137 @     1.38 A *

 Nr of residues in mol1 : (        137)
 Nr of residues in mol2 : (        137)
 Nr of matched residues : (        137)
 Nr of identical residues : (        137)
 % identical of matched : ( 100.000)
 % matched of mol1 : ( 100.000)
 % identical of mol1 : ( 100.000)
 % matched of mol2 : ( 100.000)
 % identical of mol2 : ( 100.000)

 LSQMAN > sh m2 m3
 Operator bringing : (M3)
 on top of : (M2)
 Last command was : (IMPR M2 * M3 *)
 The 137 atoms have an RMS distance of 0.413 A
 SI = RMS * Nmin / Nmatch = 0.41300
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} = 0.70772
 MC = Maiorov-Crippen RHO (0-2) = 0.02895
 RMS delta B for matched atoms = 5.909 A2
 Corr. coefficient matched atom Bs = 0.654
 Rotation    :  0.99999309 -0.00336774 -0.00159196
                0.00336485  0.99999267 -0.00181351
                0.00159806  0.00180814  0.99999708
 Translation :    -0.1170     0.0981     0.0010

 Nr of NCS operators : 1

 NCSOP 1 =  0.9999931  0.0033648  0.0015981    -0.117
           -0.0033677  0.9999927  0.0018081     0.098
           -0.0015920 -0.0018135  0.9999971     0.001
 Determinant of rotation matrix 1.000000
 Column-vector products (12,13,23) 0.000000 0.000000 0.000000
 Crowther Alpha Beta Gamma 0.00000 0.00000 -0.19296
 Spherical polars Omega Phi Chi 0.06853 -999.90002 -0.19296
 Direction cosines of rotation axis 1.00000 1.00000 1.00000
 Dave Smith -0.10360 90.09122 -0.19279
 Rotation angle 0.237388
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
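
The rotation angle reported at the end of this output can be checked by hand from the rotation matrix, since for any rotation matrix the trace equals 1 + 2 cos(angle). The short Python sketch below does this for the matrix printed by LSQMAN; the function name is made up, and the result can only agree with the program's value to within the rounding of the printed matrix elements.

```python
import math

# Rotation matrix as printed by LSQMAN in the example above.
rotation = [
    [0.99999309, -0.00336774, -0.00159196],
    [0.00336485,  0.99999267, -0.00181351],
    [0.00159806,  0.00180814,  0.99999708],
]

def rotation_angle_deg(r):
    """Rotation angle in degrees from trace(R) = 1 + 2*cos(angle)."""
    trace = r[0][0] + r[1][1] + r[2][2]
    # Clamp to [-1, 1] to guard against rounding in the input.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

angle = rotation_angle_deg(rotation)
print(f"rotation angle = {angle:.3f} degrees")
```

The value comes out close to the 0.237 degrees listed under "Rotation angle", i.e. the two models are in essentially the same orientation, as expected before and after refinement.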


8.5 Maps

Maps can be calculated with many programs (including X-PLOR). We will use the CCP4 package here, but you can use any program you like.

Go to the gmrp/ccp4 directory. There is a command file for calculating 2Fo-Fc, Fo-Fc and 3Fo-2Fc maps. Before you can calculate the maps, you have to generate a file which contains the reflections in CCP4 format (MTZ file). Edit and execute the command file "mkmtz.com" to do just that. Subsequently, edit the command file "makemap.com" and execute it to calculate the maps.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
<19 capo.bmc.uu.se gmrp/ccp4> makemap.com
SFALL  - calculate structure factors
Overall Reliability index is 0.2648
RSTATS - scale Fobs and Fcalc
Overall Totals: 3816 0.269 0.306 331.818 322.953 121.424 0.027 121.424 0.833
FFT 1  - calculate 2Fo-Fc map
Rms deviation from mean density ................. 18.60459
FFT 2  - calculate Fo-Fc map
Rms deviation from mean density ................. 15.99642
FFT 3  - calculate 3Fo-2Fc map
Rms deviation from mean density ................. 19.70206
EXTEND - cut out 2Fo-Fc map around molecule
EXTEND - cut out Fo-Fc map around molecule
EXTEND - cut out 3Fo-2Fc map around molecule
MAPMAN - mappage 2Fo-Fc, Fo-Fc and 3Fo-2Fc maps around A molecule
... Toodle pip ...

 real 6.2 user 2.0 sys 0.7
  120 -rw-r--r-- 1 gerard  108528 Nov 16 20:27 /nfs/scr_uu1/gerard/scratch/m3.R
 1912 -rw-r--r-- 1 gerard 1946144 Nov 16 20:27 /nfs/scr_uu1/gerard/scratch/m3_2fofc.E
 1912 -rw-r--r-- 1 gerard 1946144 Nov 16 20:27 /nfs/scr_uu1/gerard/scratch/m3_fofc.E
 1912 -rw-r--r-- 1 gerard 1946144 Nov 16 20:28 /nfs/scr_uu1/gerard/scratch/m3_3fo2fc.E
 2088 -rw-r--r-- 1 gerard 2129288 Nov 16 20:28 /nfs/scr_uu1/gerard/scratch/m3_2fofc.xE
 2088 -rw-r--r-- 1 gerard 2129288 Nov 16 20:28 /nfs/scr_uu1/gerard/scratch/m3_fofc.xE
 2088 -rw-r--r-- 1 gerard 2129288 Nov 16 20:28 /nfs/scr_uu1/gerard/scratch/m3_3fo2fc.xE
  616 -rw-r--r-- 1 gerard  614912 Nov 16 20:28 /nfs/scr_uu1/gerard/scratch/m3_2fofc.map
  616 -rw-r--r-- 1 gerard  614912 Nov 16 20:28 /nfs/scr_uu1/gerard/scratch/m3_fofc.map
  616 -rw-r--r-- 1 gerard  614912 Nov 16 20:28 /nfs/scr_uu1/gerard/scratch/m3_3fo2fc.map
 18.579u 3.475s 0:38.57 57.1% 1+16k 283+2066io 5379pf+0w
<20 capo.bmc.uu.se gmrp/ccp4> cp /nfs/scr_uu1/gerard/scratch/m3_2fofc.map ../o
<20 capo.bmc.uu.se gmrp/ccp4> cp /nfs/scr_uu1/gerard/scratch/m3_fofc.map ../o
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Create a new subdirectory gmrp/o/m1m2, move or copy all maps and models etc. of the first two models to this directory and compress the files (include the "maps" and "symmy" macros). Update the "maps" and "symmy" macros in your O directory. If you didn't calculate the maps, move or copy them from gmrp/o/gerard ("m3*.map").


8.6 The Swedish Inquisition

(1) Why are the Ramachandran and chi1/chi2 plots in ProCheck useful, but not so much the pure geometrical information (bond lengths etc.) ?
(2) Compare the Ramachandran plot of the initial model with that of the rebuilt model and that of the refined model. Discuss the differences. Relate your observations to the delta-phi, delta-psi plot of models M2 and M3.
(3) At what level are you going to contour the Fo-Fc map ? Why is the "sigma-level" of this map meaningless, unless the map is calculated on an absolute scale ? What is the unit of electron density on an absolute scale ?
(4) Why have we still not picked any peaks that might belong to solvent molecules ?
(5) What is a radial B-factor plot ? What should it look like ?
(6) Why is it a good idea to restrain geometry tightly at worse than atomic resolution ?


9 ANOTHER CYCLE


9.1 Quality control and rebuilding

Read your new model M3 into O and apply the usual quality checks to it. Then run OOPS again. Since the model is still crude and incomplete, tell OOPS to generate macros for all residues again. Compare the quality of model M3 to that of the starting model, M1. Set up the symmetry for model M3 inside O.

Check out the density in some of the places where you built new residues or did a substantial rebuild in the first cycle. Note how much the map has improved. Also, there is fairly good density for the ligand now (with one small break). Of course, this dataset is not a "typical" 2.8 Å dataset, since it was derived from a 1.8 Å (synchrotron) dataset by applying a resolution cut-off.

Now rebuild your model using the OOPS macros. Try to put in as many correct sidechains as you can (if they have reasonable density). If you didn't build the entire missing loop, try to do it in this round. Don't forget to save your model regularly (as "m4.pdb"), and to make a backup prior to any major (re)building.

Now it's also time to start paying attention to more detailed issues. For instance, the sidechain of Asn A2 forms a hydrogen bond to the OG of Ser A4. This makes it most likely that the involved atom in the Asn sidechain is OD1. Also check Gln A45 with an eye to hydrogen bonding potential.

Did you notice a negative difference density peak for part of the sidechain of Lys A38 ? If so, rebuild the sidechain of this residue (or cut it back to an alanine).

What do you make of the peptide of Gly A47 ?

How did you (re)build Lys A101 ?

Did you change Thr A110 in any way ? Why ? And Leu A113 ?

When you have finished your rebuild, compare your model M4 to the one in file "gmrp/o/gerard/m4_gerard.pdb". Check the differences using the maps. Save your model and your O database.


9.2 Adding a hetero-entity

Normally, I would wait for the next cycle before putting in the ligand, but since this is a tutorial we shall put it in now. Locate the density (two big blobs) for the retinoic acid and jot down the approximate coordinates of the centre-of-gravity of the ligand (a rough estimate suffices; this merely saves you a lot of Move_zone-ing later on). Hint: to find coordinates, use Move_atom to place an atom in the spot of interest, then click on the atom to get the coordinates displayed, and hit No to cancel the move. Now exit from O (or use a separate Unix window).

It is most convenient if you can get a set of starting coordinates for a hetero-entity from elsewhere. If you have access to the Cambridge database of small-molecule structures, that is the first place to look. If not, check out the collection of hetero-entities (extracted from PDB entries) in file "OMAC/hetero.pdb".

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 % 127 gerard rigel 23:01:57 gerard/omac > grep -i retinoic omac/hetero.pdb
COMPND RETINOIC ACID
COMPND RETINOIC ACID (ALL-TRANS)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Extract the relevant portion of the file, e.g. using an editor. Store it in a file called "ret.pdb" (use the first occurrence, i.e. NOT the all-trans entry since the latter was taken from the CRABPII structure). The file may look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
COMPND RETINOIC ACID
REMARK Extracted from PDB file 1tyr.pdb
REMARK Formula C20 H28 O2
REMARK Nr of non-hydrogen atoms 22
REMARK Residue type REA
REMARK Residue name 898
REMARK   2 RESOLUTION. 1.8  ANGSTROMS.                                  1TYR
REMARK Compound also present in : 1FEM 1EPB 1CBR
HETATM    1  C1  REA   898      -1.756   2.109  -3.389  1.00 20.00      1TYR
HETATM    2  C2  REA   898      -1.817   2.094  -4.924  1.00 20.00      1TYR
HETATM    3  C3  REA   898      -1.052   1.062  -5.564  1.00 20.00      1TYR
HETATM    4  C4  REA   898      -1.516  -0.335  -5.191  1.00 20.00      1TYR
HETATM    5  C5  REA   898      -1.616  -0.496  -3.690  1.00 20.00      1TYR
HETATM    6  C6  REA   898      -1.733   0.597  -2.837  1.00 20.00      1TYR
HETATM    7  C7  REA   898      -1.998   0.479  -1.370  1.00 20.00      1TYR
HETATM    8  C8  REA   898      -1.062   0.135  -0.267  1.00 20.00      1TYR
HETATM    9  C9  REA   898      -1.268  -0.041   1.143  1.00 20.00      1TYR
HETATM   10  C10 REA   898      -0.170  -0.394   1.867  1.00 20.00      1TYR
HETATM   11  C11 REA   898      -0.103  -0.669   3.288  1.00 20.00      1TYR
HETATM   12  C12 REA   898       1.022  -1.145   3.887  1.00 20.00      1TYR
HETATM   13  C13 REA   898       2.245  -1.499   3.180  1.00 20.00      1TYR
HETATM   14  C14 REA   898       3.305  -1.965   3.889  1.00 20.00      1TYR
HETATM   15  C15 REA   898       4.061  -1.105   4.787  1.00 20.00      1TYR
HETATM   16  C16 REA   898      -2.991   2.890  -2.877  1.00 20.00      1TYR
HETATM   17  C17 REA   898      -0.489   2.849  -2.945  1.00 20.00      1TYR
HETATM   18  C18 REA   898      -1.448  -1.963  -3.286  1.00 20.00      1TYR
HETATM   19  C19 REA   898      -2.704   0.107   1.687  1.00 20.00      1TYR
HETATM   20  C20 REA   898       2.202  -1.542   1.606  1.00 20.00      1TYR
HETATM   21  O1  REA   898       3.735  -0.926   6.159  1.00 20.00      1TYR
HETATM   22  O2  REA   898       5.145  -0.238   4.839  1.00 20.00      1TYR
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Now we shall use MOLEMAN to do a few things. First we will translate the molecule such that its centre-of-gravity is approximately in the right position:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Option ? (READ_pdb_file)
 Input PDB file ? (in.pdb) ret.pdb
 Number of lines read : (         29)
 Number of atoms now : (         22)
 Option ? (READ_pdb_file) trans
 1 = Cartesian, 2 = Fractional.
 Option ? (       1)
 Translation vector ? (   0.000    0.000    0.000) 22 25 21
 Nr of atoms translated : (         22)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
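
What the TRANs option does to each atom record is a plain vector addition on the three coordinate fields. The Python sketch below illustrates this on the first atom of "ret.pdb", using the translation vector (22, 25, 21) from the session above. It is only an illustration, not MOLEMAN's implementation; the function name is made up, and it assumes the standard PDB column layout (x, y, z in columns 31-38, 39-46 and 47-54).

```python
def translate_record(line, shift):
    """Return an ATOM/HETATM record with (dx, dy, dz) added to x, y, z."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave REMARK, COMPND etc. untouched
    xyz = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
    new = "".join(f"{c + d:8.3f}" for c, d in zip(xyz, shift))
    return line[:30] + new + line[54:]

# First atom of ret.pdb as listed earlier in this section.
c1 = "HETATM    1  C1  REA   898      -1.756   2.109  -3.389  1.00 20.00      1TYR"
moved = translate_record(c1, (22.0, 25.0, 21.0))
print(moved)
```

The translated C1 coordinates (20.244, 27.109, 17.611) are the ones that show up in the TORSi listing later in this section.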

Now renumber the residue (to 200), give it chain label B and write the new PDB file:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Option ? (TRANs) resi
 First NEW residue number ? (       1) 200
 Last residue number : (        200)
 Option ? (RESI) chain
 Chain label (2 characters) ? ( )  B
 Residue range to apply (0 0 = all molecule) ? (       0        0)
 Nr of chain labels updated : (         22)
 Option ? (CHAIn) writ

 Output PDB file ? (out.pdb) ret.pdb
 ERROR --- XOPXNA - error # 126 while opening NEW file : ret.pdb
 OPEN : (UNIT= 12 STATUS=NEW CAR_CONTROL=LIST FORM=FORMATTED ACCESS=SEQUENTIAL)
 Error : (Connection timed out)
 Open file as OLD (Y/N) ? (N) y
 REMARK at start of file ? (MoleMan PDB file)
 Copy all REMARK, HEADER etc. cards from input ? (Y)
 Which chain to write (** = any and all) ? (**)
 Residue range to write (0 0 = all molecule) ? (       0        0)
 You may output All atoms, only Main-chain atoms,
 a Poly-alanine (Gly intact), a poly-Serine,
 (Gly and Ala intact) or a poly-Glycine
 Which option do you want (All/M/P/S/G) ? (A)
 Write HYDROGEN atoms (Y/N) ? (N)
 Force consecutive atom numbering (Y/N) ? (Y)
 X-PLOR needs OT1 and OT2, but O hates them
 If your file contains OT1/2 you may either keep
 them, or replace them by O/OXT
 Write X-PLOR OT1/2 ? (Y/N) ? (N)
 Cell : (   1.000    1.000    1.000   90.000   90.000   90.000)
 CCP4 requires CRYST, SCALE and ORIGX cards
 X-PLOR does not like them at all
 Therefore: reply Y for CCP4 and N for X-PLOR :
 Write CRYST, SCALE, ORIGX cards (Y/N) ? (Y) n
 Nr of atoms written : (         22)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Next, generate some datablocks for use with O:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Option ? (?) torsi
 Which residue number ? (       1) 200
 Cut-off distance for bonded atoms ? (   2.000) 1.8

     1  C1  REA   200      20.244  27.109  17.611  1.00 20.00      1TYR
     2  C2  REA B 200      20.183  27.094  16.076  1.00 20.00      1TYR
 ...
    22  O2  REA B 200      27.145  24.762  25.839  1.00 20.00      1TYR

 Nr of atoms found : (         22)
 Residue type : (REA)
 Atom types : ( C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20 O1 O2)
 Datablock file ? (torsion_rea.dat)
 Nr of bonds : (         22)

 DIHEDRAL C6 200 C1 200 C2 200 C3 200 -37.55
 Skip -> ring torsion
 ...
 DIHEDRAL O2 200 C15 200 C14 200 C13 200 -89.55
 Affected atoms : ( C13 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C16 C17 C18 C19 C20)
 Skip -> too many affected atoms

 Nr of unique rotatable torsions : (          9)
 Nr of lines written : (         14)
 Torsion file written (append to torsion.o)
 Option ? (TORSi) rsfit
 Which residue number ? (     200)

     1  C1  REA   200      20.244  27.109  17.611  1.00 20.00      1TYR
 ...
    22  O2  REA B 200      27.145  24.762  25.839  1.00 20.00      1TYR

 Nr of atoms found : (         22)
 Residue type : (REA)
 Atom types : ( C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20 O1 O2)
 Datablock file ? (rsfit_rea.odb)
 Datablock name : (rsfit_REA)
 Datablock written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
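
The DIHEDRAL values listed by the TORSi option are ordinary torsion angles computed from four atomic positions. As a cross-check, the Python sketch below computes the C6-C1-C2-C3 torsion from the coordinates in "ret.pdb" (a rigid translation does not change torsion angles, so the untranslated coordinates can be used). The helper names are made up, and the atan2 formulation with the usual IUPAC sign convention is an assumption about what MOLEMAN does internally.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees (range -180..180)."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Coordinates copied from the ret.pdb listing earlier in this section.
c6 = (-1.733, 0.597, -2.837)
c1 = (-1.756, 2.109, -3.389)
c2 = (-1.817, 2.094, -4.924)
c3 = (-1.052, 1.062, -5.564)

torsion = dihedral_deg(c6, c1, c2, c3)
print(f"C6-C1-C2-C3 = {torsion:.2f} degrees")
```

The result agrees with the -37.55 degrees that MOLEMAN prints for this (ring) torsion.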

The torsion file must be appended to your normal torsion file, and the new file must then be read into O (don't forget to update the datablock header !). The RS-fit datablock can simply be read into O directly.

Re-start O and read the two files. Also read the retinoic acid and display it.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > read ./torsion.o
  O > read rsfit_rea.odb
 Heap>  Created by MOLEMAN V. 951110/7.1.1 at Thu Nov 16 23:10:43 1995 for user g
 Heap>
 Heap>  RS-fit datablock for REA
 Heap>
  O > s_a_i ret.pdb rea mol rea zo ; end ce_zo rea b200
 Sam> File type is PDB
 Sam>  Database compressed.
 Sam> Space for    131465 atoms
 Sam> Space for     10000 residues
 Sam> Molecule REA contained 1 residues and 22 atoms
 Sam> Centre of gravity updated for     1    1
 As4> No object defined.
 As4> REA   B200  B200  REA
 As4> Centering on zone from B200 to B200
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Draw the maps and use the Move_zone command to put the ligand in the correct position and orientation (roughly).

Since the molecule we use is NOT all-trans-retinoic acid, we need to operate in several passes:
- use Move_zone to orient the ring correctly inside the density;
- use Tor_residue (TOR1) to orient the first part of the tail approximately correctly;
- use Move_zone again to improve the fit of the ring plus the first part of the tail;
- use TOR3 to try and orient the next part of the tail;
- you will notice that this fails: you need an extra torsion angle to rotate around the C11-C12 bond; add this to your torsion file (alternatively, you can use the Tor_general command);
- use TOR7 and TOR9 to move the carboxylate tail in approximately the correct position.

Now, use Tor_general to make the tail completely flat (except for the carboxylate relative to the tail). Then use TOR1 again to fit the tail. Use RSR_rigid to improve the overall fit and use TOR9 to adjust the carboxylate. Save the new coordinates. What is the real-space R-factor for your ligand ?

If you find this too difficult, use the all-trans-retinoic acid from the "OMAC/hetero.pdb" file, or use "gmrp/o/gerard/ret_gerard.pdb" (which had an RSR-value of 0.306).


9.3 Topology and parameter files

Before we can use the ligand in X-PLOR refinement, we need to generate topology and parameter files. Use XPLO2D (option AUTO) to do this (you can also do it by hand, of course). If you use XPLO2D, you can help the program (and yourself) by editing your ligand's PDB file and:
- putting the number of hydrogen atoms attached to each atom in the occupancy column;
- putting an integer number in the B-factor column which is identical for equivalent atom types (e.g., the 5 methyl carbons), but different for different atom types.

You need to edit the topology file to correct the masses (accounting for implicit hydrogen atoms), assign charges, etc. There may also be errors in the sense that non-equivalent atoms have been assigned to equivalent atom types. Check this file carefully !
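
Correcting the masses for implicit hydrogens amounts to adding the mass of the attached hydrogens to that of the heavy atom. The Python sketch below shows the arithmetic; the function name is made up, the atomic masses are standard values, and the overall check uses the formula C20 H28 O2 given in the header of "ret.pdb". How the hydrogens are distributed over the individual atoms is something you have to work out from the chemistry of your own ligand.

```python
# Standard atomic masses (in atomic mass units).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def united_atom_mass(element, n_hydrogens):
    """Mass of a heavy atom plus its implicit (non-explicit) hydrogens."""
    return ATOMIC_MASS[element] + n_hydrogens * ATOMIC_MASS["H"]

methyl = united_atom_mass("C", 3)      # e.g. one of the methyl carbons
methylene = united_atom_mass("C", 2)   # e.g. a ring CH2 carbon

# Sanity check: the masses in the topology file should add up to the
# molecular mass implied by the formula C20 H28 O2.
total = 20 * ATOMIC_MASS["C"] + 28 * ATOMIC_MASS["H"] + 2 * ATOMIC_MASS["O"]

print(f"CH3 {methyl:.3f}  CH2 {methylene:.3f}  C20H28O2 {total:.3f}")
```

Summing the masses in your edited topology file and comparing with the total is a cheap way to catch a forgotten or double-counted hydrogen.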

The parameter files may also need to be edited (to reflect changed atom types, for instance).

If you get stuck, copy the files "gmrp/o/gerard/rea.par", "gmrp/o/gerard/rea.top" and "gmrp/o/gerard/rea_min.inp" (these are the ones used in the original work; they were generated manually).

Run the minimisation job (produced by XPLO2D) to check if nothing strange happens. Whatever your ligand looks like after this minimisation, is what X-PLOR will want to make it look like once you include it (and the X-ray data) in your refinement. It is rather silly to waste 12 hours of CPU time on a slowcool, only to find that there was an error in the dictionary files !


9.4 Preparations for refinement

Use MOLEMAN to make the PDB files for your next model, to limit temperature factors etc.

Now, let's pretend that we have just collected a 1.8 Å dataset. Convert the file "crabp2_1.8a.hkl" in the reflection directory to X-PLOR format and generate a suitable test set of reflections for Rfree calculations. Also change the name of the reflection file in "gmrp/xplor/reflxns.xplor". Note: since you will do more simulated annealing, you don't have to worry about selecting the same test reflections in the low resolution range.
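
The test set is nothing more than a random subset of reflections that carries a flag (TEST= 1 in X-PLOR reflection files) and is kept out of the refinement target. The tutorial does this with the Uppsala utility programs; the Python sketch below only illustrates the idea. The function name, the 10 % fraction, the seed and the reflection count are all assumptions made for the example.

```python
import random

def assign_test_flags(n_reflections, fraction=0.10, seed=951116):
    """Return a list of 0/1 flags with round(n * fraction) ones."""
    rng = random.Random(seed)          # fixed seed: reproducible selection
    n_test = round(n_reflections * fraction)
    test_idx = set(rng.sample(range(n_reflections), n_test))
    return [1 if i in test_idx else 0 for i in range(n_reflections)]

# Hypothetical dataset size, not the real number of 1.8 A reflections.
flags = assign_test_flags(10000)
print(f"{sum(flags)} TEST and {len(flags) - sum(flags)} WORK reflections")
```

Whatever tool you use, generate the flags once and keep using the same file, so that the test reflections never leak into the working set.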

Change the "gmrp/xplor/parameters.xplor" file so it includes the parameter file for your ligand.

Change the "generate.inp" file in the X-PLOR directory as follows:
- include the topology file for your ligand;
- read the ligand (instead of only the protein);
- read the latest model's PDB files.

Parts of this file could now look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 ...
 topology
   @tophcsdx.pro
   @rea.top
 end
 ...
 { the ligand }
 segment  name="BBBB" 				{ *** EDIT ME *** }
   chain
     coordinates @m4b.pdb 			{ *** EDIT ME *** }
   end
 end
 coordinates @m4b.pdb 				{ *** EDIT ME *** }
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


9.5 High-resolution refinement

This refinement cycle we shall use the 1.8 Å dataset. Don't forget to change the resolution limits in the various input files.

Go through the same refinement steps as before, but refine individual temperature factors at the end (you could have smoothed the temperature factors first in MOLEMAN). The steps are:
- generate;
- check;
- optional: rigid-body refinement (e.g., of the ligand);
- powell minimisation;
- simulated annealing;
- temperature-factor refinement.

For comparison with your own results, I found:
- WA = 150,000;
- initial R 0.372, Rfree 0.354;
- after Powell, R 0.311, Rfree 0.331;
- after simulated annealing, R 0.296, Rfree 0.325;
- after temperature-factor refinement, R 0.262, Rfree 0.298.

You could try another slowcool after this cycle, but in this case the model hardly improves (R 0.257 and Rfree 0.294 after a second slowcool and B-factor refinement).


9.6 Post-refinement

Go through the usual motions of generating a CCP4-friendly PDB file, calculating new maps (you probably want to use a finer grid with your 1.8 Å data), comparing your model before and after refinement and running the usual quality checks. Discuss your findings.

If you skipped the refinement part, get the new model from file "gmrp/o/gerard/m5_gerard.pdb".

What is the RMSD of the 22 ligand atoms before and after refinement ? Which CA atoms have moved the most during the refinement ? Which region in the protein has the highest temperature factors ? What does ProCheck think of your 1.8 Å model (compare with and contrast to the previous 2.8 Å model) ?

Before you calculate maps, don't forget to make an MTZ file for the 1.8 Å dataset. Also, when you calculate a new grid, make sure that the grid sizes do not have prime factors exceeding 19 (for the FFT; on SGIs, use the factor command to help you).
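
If you do not have a factor command at hand, the prime-factor condition is easy to check yourself. The Python sketch below finds the largest prime factor of a proposed grid size and steps upwards until the size is acceptable. The function names are made up; the limit of 19 is the one quoted above, and any additional requirements of your FFT program (e.g. space-group-dependent multiples) are not checked here.

```python
def largest_prime_factor(n):
    """Largest prime factor of n (returns 1 for n = 1)."""
    largest, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    return max(largest, n) if n > 1 else largest

def next_fft_friendly(n, max_prime=19):
    """Smallest grid size >= n with no prime factor above max_prime."""
    while largest_prime_factor(n) > max_prime:
        n += 1
    return n

for wanted in (46, 100, 118):
    print(wanted, "->", next_fft_friendly(wanted))
```

For example, 46 = 2 x 23 is not allowed, and neither is the prime 47, so the sketch moves on to 48.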

If you don't want to calculate the maps, use the ones in directory gmrp/o/gerard instead ("m5*.map"). The sigma levels in the unit cell are 9.9 for the 2Fo-Fc map and 12.8 for the Fo-Fc map.

When you have calculated the maps, edit and execute the "gmrp/ccp4/pickwater.com" command file to generate potential water molecules. If you didn't calculate the maps, use the file "gmrp/o/gerard/m5_waters.pdb" instead.

Before you calculate RS-fit values, you have to change the constants A0 and C (with the RSR_setup command). Find suitable values yourself.

Check the density for the ligand. Note that X-PLOR has flipped the ring of the ligand into a clearly wrong orientation (if you used the provided files). This can be due to an error in the topology file or may just be an artefact of the high-temperature simulated annealing refinement. Check if there is an error in the dictionaries.

Check the density for some of the previous "trouble spots" and compare with the previous 2.8 Å map. Discuss the differences between the two maps.


9.7 Rebuilding

Run OOPS and rebuild your model as usual. In OOPS, you could now also use the option to compare the current to the previous model (i.e., the one after the previous rebuild, M4). This will give you lots of information concerning the type of changes that have occurred during refinement and where they have occurred.

Try to make the protein model as complete as possible (missing sidechains and, perhaps, bits of mainchain).

Can you find a better fit for the sidechain of Arg A11 (you may want to use Move_fragm in this case) ?

Compare the density for some of the residues that did not have rotamer conformations (but should have had them) in the original model, e.g. the tryptophan and residues A18, A22, A84. What is your conclusion ?

How did you orient the sidechain N/O of Asn A25 ?

What can you say about the sidechain of Val A26 ? And for Ser A37 ? And for Thr A122 ?

Did you rebuild Arg A29 ? Why (not) and how ?

Can you build Glu A70 and Lys A82 correctly now ? Check in the previous map why this was very hard to see earlier on.

Did you rebuild the peptide of Asp A126 ? If not, check both maps in this place and re-consider.

Compare your model with the one in file "gmrp/o/gerard/m6_gerard.pdb".


9.8 Adding waters

You can use the O commands Water_init, Water_add and Water_pekpik to add waters, but I prefer to do it "by hand". Read in the waters picked by pickwater.com, and colour them lime-green, for instance. Then rename them and centre on the O1 atom of the first water. Write a little O macro which will contour the maps and perhaps draw a sphere object for the protein. If you don't want to keep the water, add its residue name to a list in your electronic notebook, and at the end use Mutate_delete to remove all unwanted waters. Use "Centre_next ;; O1" to go from one water to the next (this command is on your menu). If you decide to keep a water, improve its fit with the density using the RSR_rigid command. If RSR_rigid applies a very large shift, this means that the water was not at a peak in the 2Fo-Fc map; in this case one would usually reject the water.

When assessing how credible (or fully occupied or well-ordered) a potential solvent site is, use the following criteria:
- is the site a peak in both the Fo-Fc and the 2Fo-Fc density ?
- are the Fo-Fc and 2Fo-Fc densities similar in shape (spherical) ?
- does the water have reasonable hydrogen-bonding potential (use the Hbonds_all or Neighbour_at command) ?
- could the density actually belong to a sidechain (either an unmodeled one or one which has a wrong conformation at present) ?
- could it be something other than water ? (Take neighbouring atoms into account, as well as what you know about the crystallisation soup, and the peak volume relative to other, "normal" waters.)
- does it stay in approximately the same position after RSR_rigid ?
- is the peak close to atoms of a symmetry-related molecule ? (If so, reject it.)

If your Fo-Fc map is not on an absolute scale, its sigma level is meaningless. The following is a handy trick to find an appropriate contour and picking level. Leave out one or two well-defined oxygens (e.g. from well-resolved carbonyl groups) in the map calculations. Contour the difference map at a level which gives similar Fo-Fc as 2Fo-Fc "blobs" for the missing atom(s). Use a peak-picking level at which the difference density for the missing atom(s) is only just visible.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  O > s_a_i waters.pdb wat
 Sam> File type is PDB
 Sam>  Database compressed.
 Sam> Space for    128864 atoms
 Sam> Space for     10000 residues
 Sam> Molecule WAT contained 58 residues and 58 atoms
 Sam> Centre of gravity updated for     1   58
  O > mol wat pai_zone
 Paint> What molecule [WAT   ]:
 Paint> Residue range [all molecule]:
 Paint> Colour? [blue]: lime_green
  O > zo ; end
  O > sam_rena
 Sam> What molecule [WAT   ]:
 Sam> Residue range [all molecule]:
 Sam> NEW name of FIRST residue [      ]: C300
  O > ce_at wat c300 o1
  O > @watstuff
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


9.9 Et cetera

Now it's up to you to finish the refinement if you like. Don't forget the following when you include the waters:
- reset occupancies to 1.0 and Bs to 20.0 (these columns contain sigma and peak height after "pickwater.com") with MOLEMAN;
- include the solvent topology and parameter files in your X-PLOR jobs (files "gmrp/xplor/toph19.sol" and "gmrp/xplor/param19.sol", respectively);
- remove any waters which give rise to (symmetry) clashes;
- if you continue to use simulated annealing, use harmonic restraints for the water oxygen atoms, e.g.:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 { set up harmonic restraints on waters }
 vector do (harmonic= 0.0) ( all )
 vector do (harmonic=20.0) ( segid=WATR and not hydrogen ) 	{ *** EDIT ME *** }

 { store reference coordinates }
 vector do (refx=x) (all)
 vector do (refy=y) (all)
 vector do (refz=z) (all)

 { include the harmonic restraint energy term }
 flags include harm ? end
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
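
The first item in the list above (resetting occupancies and Bs) is done with MOLEMAN in this tutorial. For readers who want to see what that amounts to, the Python sketch below overwrites the occupancy and B-factor fields of a PDB record. The function name and the example water record are made up, and the standard PDB column layout (occupancy in columns 55-60, B in columns 61-66) is assumed.

```python
def reset_occ_b(line, occ=1.0, b=20.0):
    """Overwrite the occupancy and B-factor fields of an ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    return line[:54] + f"{occ:6.2f}{b:6.2f}" + line[66:]

# Made-up water record; after peak picking the last two fields would hold
# sigma and peak height instead of occupancy and B.
water = ("HETATM    1  O1  HOH C 300    "   # columns 1-30
         "  12.345  23.456  34.567"         # x, y, z
         "  5.20"                           # "occupancy" field
         " 48.10")                          # "B-factor" field

fixed = reset_occ_b(water)
print(fixed)
```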

When you're all done, compare your final model with PDB entry 1CBS (also available as file "gmrp/o/gerard/1cbs.pdb"). Note that 1CBS still contains a couple of sidechains which could have been modeled better !!!


10 MISCELLANEOUS REFINEMENT ISSUES


10.1 Data

Some well-intended advice with respect to the choice of data to use in refinement:
- when you process your data, decide on a realistic resolution cut-off and make sure that your sigmas are realistic estimates of the true standard deviations; you then don't need to use a sigma-cut-off during refinement
- the suggested low-resolution cut-off without bulk-solvent modelling is ~8 Å (not 6 or 5 Å !); if you include a bulk-solvent model, you can include all low-resolution data
- remove any reflections which have been flagged as unobserved (behind the beamstop, overflowed etc.) but kept by the processing program (e.g., with zero amplitude or negative sigma) before you start refinement; in that way you don't have to worry about amplitude cut-offs
- at the end of the refinement, do some Powell minimisation and temperature-factor refinement using all data (i.e., no Rfree)


10.2 Difference refinement

If you want to try out difference refinement with X-PLOR (see: T.C. Terwilliger & J. Berendzen, Acta Cryst. D51, 609-618 (1995)), you can use DATAMAN to produce a set of modified Fobs using the following recipe:

(1) run the following job in X-PLOR:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 remarks Generate Fobs(nati) and Fcalc(nati) for use in difference refinement
 remarks T.C. Terwilliger & J. Berendzen, Acta Cryst. D51, 609-618 (1995)
 @parameters.xplor
 structure   @m1_gen.psf end
 coordinates @m1_mb_mbx.pdb
 xrefine
  @crystal.xplor
  @scatter.xplor
  nreflections=100000
  reflection @../hkl/cbh2.xplor end
  resolution 8.0 1.8
  method=FFT
  fft memory=1000000 end
  tolerance=0.0 lookup=false
  mbins 20
  update-fcalc
  print r-factor
  do scale (fcalc=fobs)
  write reflections fobs sigma
        output=../../umb/hkl/nati_fobs.xplor end
  write reflections fcalc sigma
        output=../../umb/hkl/nati_fcalc.xplor end
 end
 stop
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(2) then change FCALC to FOBS in the output FCALC reflection file:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 unix> sed -e 's/FCALC/FOBS/' nati_fcalc.xplor > q ; mv q nati_fcalc.xplor
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(3) create the Fobs-Fcalc file in DATAMAN (note: you can *NOT* do this in X-PLOR with "do amplitude (fobs=fobs-fcalc)", since this will give you the absolute value of the difference; here you want to keep the *sign* of the difference as well !):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 DATAMAN > re fo nati_fobs.xplor xplor
 DATAMAN > re fc nati_fcalc.xplor xplor
 DATAMAN > co fo fc
 Correlation coeff Fobs : (   0.954)
 Rmerge = SUM |F1-S*F2| / SUM |F1+S*F2|
 Value of Rmerge : (   0.087)
 [NOTE: actual R-factor is ~2 times 0.087 = 17.4 %]
 DATAMAN > df delta fo fc
 DATAMAN > wr delta nati_fo_fc.xplor rxplor
 DATAMAN > $ head -3 nati_fo_fc.xplor
INDEX= 6 0 0 FOBS= 5.200 SIGMA= 3.253 TEST= 0
INDEX= 7 0 0 FOBS= -14.239 SIGMA= 3.677 TEST= 0
INDEX= 8 0 0 FOBS= -5.998 SIGMA= 4.667 TEST= 0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
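
The note in the output above (the actual R-factor is about twice the Rmerge value) follows directly from the definition that DATAMAN prints: its denominator sums |F1+S*F2|, which is roughly twice the sum of |F1| when the two sets are similar. The Python sketch below makes this concrete with a few made-up amplitudes; the function names are invented for the example.

```python
def r_merge(f1, f2, scale=1.0):
    """Rmerge = SUM |F1 - S*F2| / SUM |F1 + S*F2| (as printed by DATAMAN)."""
    num = sum(abs(a - scale * b) for a, b in zip(f1, f2))
    den = sum(abs(a + scale * b) for a, b in zip(f1, f2))
    return num / den

def r_conventional(fobs, fcalc, scale=1.0):
    """Conventional R = SUM |Fobs - S*Fcalc| / SUM |Fobs|."""
    num = sum(abs(a - scale * b) for a, b in zip(fobs, fcalc))
    return num / sum(abs(a) for a in fobs)

# Made-up amplitudes, purely for illustration.
fobs = [100.0, 200.0, 300.0]
fcalc = [110.0, 180.0, 310.0]

rm = r_merge(fobs, fcalc)
rc = r_conventional(fobs, fcalc)
print(f"Rmerge {rm:.4f}  R {rc:.4f}  ratio {rc / rm:.2f}")
```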
   

(4) Wilson-scale the complex Fobs to the high-res dataset Fobs with DATAMAN (note: you *must* do this, unless both datasets are already on the same -e.g., absolute- scale; otherwise the subtraction will produce rubbish):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 DATAMAN > read nati ../../nati/hkl/cbh2.xplor xplor
 Nr of reflections read : (      55094)
 DATAMAN > read mug mug_merge.xplor xplor
 Nr of reflections read : (      23496)
 DATAMAN > compare mug nati
 ...
 HKLs in set 1 : (      23496)
 HKLs in set 2 : (      55094)
 HKLs in both  : (      10413)
 Correlation coeff Fobs : (   0.928)
 ...
 Rmerge = SUM |F1-S*F2| / SUM |F1+S*F2|
 Value of Rmerge : (   0.090)
 DATAMAN > cell nati 49.1 75.8 92.9 90.0 103.2 90.0
 DATAMAN > symm nati p21.sym
 DATAMAN > cell mug 48.76 75.1 91.7 90.0 103.0 90.0
 DATAMAN > symm mug p21.sym
 DATAMAN > calc * resol
 Highest resolution : (   1.743)
 Highest resolution : (   2.400)
 DATAMAN > calc * centr
 DATAMAN > calc * orbit
 DATAMAN > kill nati resol < 2.4
 DATAMAN > kill mug resol > 8.0
 DATAMAN > wilson nati mug
 Name of first plot file ? (wilson_nati_mug_1.plt)
 Name of second plot file ? (wilson_nati_mug_2.plt)
 Step size ? (2.4999999E-03)
 ...
           W SCALE  =  0.20725E-01
           W BTEMP  =     -4.852
 ...
 Applying scale to set 2
 ...
 Comparison of <I1> and <I2> :
 Correlation coefficient : (   0.997)
 Scaled R w.r.t. <I1>    : (  6.731E-02)
 Scaled R w.r.t. <I2>    : (  6.731E-02)
 RMS difference          : (  2.897E+07)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

(5) create the difference Fobs file with DATAMAN (note that this may give a few reflections with Fobs < 0. I tend to ignore these [using "fwindow 0.001 1000000" in X-PLOR], but you could also reset them to zero):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 DATAMAN > df diff mug delta
 Delta-F Set 1 = (MUG)
     and Set 2 = (DELTA)
 Encoding reflections of set 1 ...
 Checking reflections of set 2 ...
 HKLs in native     set 1: (      23064)
 HKLs in derivative set 2: (      51722)
 HKLs in new nat-der set : (      22304)
 Nr of WORK reflections : (      20446)
 Nr of TEST reflections : (       1858)
 Percentage TEST data   : (   8.330)
 This is an Rfree dataset
 DATAMAN > stats diff
 Stats : (DIFF)

 Item      Minimum     Maximum     Average         Sdv         Var
 ====      =======     =======     =======         ===         ===
 H             -20          19      -1.205       8.986      80.754
 K               0          29      11.362       7.322      53.616
 L               0          38      14.850       9.247      85.508
 Fobs   -1.828E+01   5.202E+02   8.910E+01   5.651E+01   3.194E+03
 SigFo   1.273E+00   2.019E+03   2.726E+02   2.170E+02   4.709E+04
 Fo/Sig -4.208E+00   1.248E+02   5.261E+00   1.236E+01   1.527E+02

 Correlation Fobs-SigFo   : (  -0.212)
 Correlation Fobs-Fo/Sig  : (   0.344)
 Correlation SigFo-Fo/Sig : (  -0.502)

 Nr of reflections : (      22304)
 Nr of WORK reflections : (      20446)
 Nr of TEST reflections : (       1858)
 Percentage TEST data   : (   8.330)
 This is an Rfree dataset
 DATAMAN > wr diff diff_refinement_fobs.xplor rxplor
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

(6) refine against the new DIFF dataset (but calculate maps and [free] R-factors using the normal Fobs after every refinement cycle)
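
The arithmetic behind steps (3) and (5) can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not of what DATAMAN does internally: the function names, the dictionary layout and the toy amplitudes are all made up, and the two datasets are assumed to be on a common scale already (step 4).

```python
# Sketch only: the bookkeeping behind steps (3) and (5).
# Reflection sets are plain dicts keyed by (h, k, l); every name and
# number below is hypothetical, not a DATAMAN data structure.

def signed_delta(fobs, fcalc):
    """Return Fobs - Fcalc for the common reflections, sign included."""
    return {hkl: fobs[hkl] - fcalc[hkl] for hkl in fobs if hkl in fcalc}


def difference_target(complex_fobs, delta, fmin=0.001):
    """Return complex Fobs minus delta for the common reflections.

    Results below fmin are dropped, which has the same effect as
    ignoring them with "fwindow 0.001 1000000" in X-PLOR.
    Both inputs are assumed to be on the same scale already.
    """
    target = {}
    for hkl, f_complex in complex_fobs.items():
        if hkl in delta:
            value = f_complex - delta[hkl]
            if value >= fmin:
                target[hkl] = value
    return target


# Toy native (high-resolution) data and model amplitudes.
native_fobs = {(6, 0, 0): 105.25, (7, 0, 0): 85.75,
               (8, 0, 0): 4.0, (9, 0, 0): 30.0}
native_fcalc = {(6, 0, 0): 100.0, (7, 0, 0): 100.0,
                (8, 0, 0): 10.0, (9, 0, 0): 10.0}
# Toy complex data; (10, 0, 0) has no native counterpart.
complex_fobs = {(6, 0, 0): 110.0, (7, 0, 0): 90.0, (8, 0, 0): 3.0,
                (9, 0, 0): 12.0, (10, 0, 0): 50.0}

delta = signed_delta(native_fobs, native_fcalc)
diff = difference_target(complex_fobs, delta)

for hkl in sorted(diff):
    print(hkl, diff[hkl])
```

In this toy set, reflection (9, 0, 0) ends up negative and is dropped, and (10, 0, 0) is dropped because it has no native counterpart. Taking the absolute value of delta instead would turn the -14.25 of (7, 0, 0) into +14.25 and corrupt the target, which is why the sign must be kept.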


10.3 Harmonic restraints

If you start refining a low-resolution complex, starting from a high-resolution native model, and you can't use difference refinement (e.g., with different crystal forms), you can still use harmonic restraints during your simulated annealing runs to prevent large artefactual changes to the model. Harmonic restraints can be used to force atoms to stay roughly where they were at the beginning of the refinement (unless there is a major energetic driving force to move them, e.g. when the fit of the X-ray data is improved a lot by moving atoms about).

An example of how to set this up in X-PLOR:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 { set up harmonic restraints (low-resolution complex !) }

 { default force constant = 5 }
 vector do (harmonic=5.0) ( all )

 { for main chain atoms + O + CB, use 10 }
 vector do (harmonic=10.0)
   ( segid AAAA and ( name cb or name ca or name n or name c or
                      name o or name ot1 or name ot2 ))

 { very low values for all residues near the ligand !!!!! }
 vector ident ( store1 )
   ( byresidue ( (resi 901 or resi 902 or resi 903) around 10.0 ) )
 vector do (harmonic=1.0) ( store1 )

 { for all waters, use 20 }
 vector do (harmonic=20.0) ( segid CCCC )

 { no restraints for the (poor) ligand model }
 vector do (harmonic=0.0) ( segid DDDD )

 { no restraints for the (poor) carbohydrates }
 vector do (harmonic=0.0) ( segid BBBB )

 { no restraints for hydrogen atoms of course !!! }
 vector do (harmonic=0.0) ( hydrogen )

 vector do (refx=x) (all)
 vector do (refy=y) (all)
 vector do (refz=z) (all)

 { include the harmonic restraint energy term }
 flags include harm ? end
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
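
To get a feeling for what the per-atom force constants do, here is a minimal Python sketch of a harmonic restraint term. It assumes a simple quadratic penalty, E = SUM k(i) * |r(i) - rref(i)|**2, with k(i) the value stored in the "harmonic" array and rref(i) the reference coordinates; check the X-PLOR manual for the exact functional form and units. The function name and the toy coordinates are hypothetical.

```python
# Sketch only: a quadratic positional restraint, assuming
# E = sum_i k_i * |r_i - r_ref_i|^2 (see the X-PLOR manual for the
# exact form used by the program). All names here are hypothetical.

def harmonic_energy(coords, ref_coords, force_constants):
    """Sum the per-atom penalties for moving away from the reference."""
    total = 0.0
    for (x, y, z), (rx, ry, rz), k in zip(coords, ref_coords,
                                          force_constants):
        total += k * ((x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2)
    return total


# Two toy atoms: a CA (k = 10) that moved 0.5 A along x, and a ligand
# atom (k = 0) that moved 3 A along z.
reference = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
current = [(0.5, 0.0, 0.0), (5.0, 5.0, 8.0)]
constants = [10.0, 0.0]

print(harmonic_energy(current, reference, constants))
```

The ligand atom contributes nothing however far it moves, whereas the main-chain atom is penalised for even a small shift. This is the intended behaviour of the scheme above: well-determined parts stay put, poorly modelled parts are free to move.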


10.4 NCS constraints

In this case, you refine only one copy of your molecule, the "reference".
Get your NCS operators FROM your reference molecule to all NCS-mates, e.g. in O. Use XPAND to generate an X-PLOR include file which specifies the XNCS and NCS relations. If X-PLOR complains that there are too many, remove the NCSRels (NEVER the XNCS operators !!!) that have the largest distances. You may also leave out all NCSRels if you want, but then X-PLOR will not be able to evaluate all symmetry-induced interactions, which may lead to bad contacts.
An example of parts of such a file:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 REMARK Created by XPAND V. 950220/0.6 at Fri Feb 24 22:32:08 1995 for user gerard

 { Invoke strict non-crystallographic symmetry }

 ncs strict

 { ==> Assuming identity skew matrix }

   skew
     matrix      = (  1.000000  0.000000  0.000000 )
                   (  0.000000  1.000000  0.000000 )
                   (  0.000000  0.000000  1.000000 )
     translation = (  0.0000  0.0000  0.0000 )
   end

   xncsrel { # 1 }
     matrix      = (  1.000000  0.000000  0.000000 )
                 = (  0.000000  1.000000  0.000000 )
                 = (  0.000000  0.000000  1.000000 )
     translation = (  0.0000  0.0000  0.0000 )
   end { xncsrel # 1 }

   xncsrel { # 2 }
     matrix      = (  0.994635 -0.096273 -0.037846 )
                 = (  0.097060  0.995086  0.019549 )
                 = (  0.035779 -0.023118  0.999092 )
     translation = (  20.0125  45.1470  45.3466 )
   end { xncsrel # 2 }

   { xncsrel end }

   ncsrel { # 1 }
     { from NCS # 1 -> NCS # 1 SGS # 1 T=( -1, 0, 0) DIST = 49.1 A }
     matrix      = (  1.000000  0.000001 -0.000001 )
                 = ( -0.000001  1.000000  0.000001 )
                 = (  0.000001 -0.000001  1.000000 )
     translation = ( -49.1000  0.0000  0.0000 )
   end { ncsrel # 1 }

   ncsrel { # 2 }
     { from NCS # 1 -> NCS # 1 SGS # 2 T=( 1, -1, 1) DIST = 42.4 A }
     matrix      = ( -1.000000  0.000000  0.000000 )
                 = (  0.000000  1.000000  0.000001 )
                 = (  0.000000  0.000001 -1.000000 )
     translation = (  27.8862 -37.9001  90.4454 )
   end { ncsrel # 2 }

   ...

   ncsrel { # 23 }
     { from NCS # 2 -> NCS # 2 SGS # 2 T=( 3, -1, 2) DIST = 48.4 A }
     matrix      = ( -1.000000  0.000000  0.000001 )
                 = (  0.000000  1.000000  0.000000 )
                 = ( -0.000001  0.000000 -1.000000 )
     translation = ( 104.8722 -37.9000 180.8909 )
   end { ncsrel # 23 }

   { ncsrel end }

   ?

 end {ncs strict}
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
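
Each operator in such a file is a rotation matrix plus a translation vector that maps the reference molecule onto one of its mates: x(mate) = R * x(reference) + t. The Python sketch below applies xncsrel # 2 from the example to a coordinate and checks that the matrix is a proper rotation to within rounding. It is a quick sanity check you could run on your own operators, not a part of XPAND or X-PLOR; the function names are hypothetical.

```python
# Sketch only: apply an NCS operator (rotation + translation) taken
# from the example file and check that the matrix is orthonormal.
# Function names are hypothetical; only the numbers come from the file.

def apply_operator(matrix, translation, xyz):
    """Return R * xyz + t for a 3x3 matrix given as three rows."""
    return tuple(sum(m * c for m, c in zip(row, xyz)) + t
                 for row, t in zip(matrix, translation))


def max_orthonormality_error(matrix):
    """Largest deviation of R * R-transpose from the identity matrix."""
    worst = 0.0
    for i in range(3):
        for j in range(3):
            dot = sum(matrix[i][k] * matrix[j][k] for k in range(3))
            expected = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot - expected))
    return worst


# xncsrel # 2 from the example above.
rotation = ((0.994635, -0.096273, -0.037846),
            (0.097060, 0.995086, 0.019549),
            (0.035779, -0.023118, 0.999092))
shift = (20.0125, 45.1470, 45.3466)

print(apply_operator(rotation, shift, (10.0, 20.0, 30.0)))
print(max_orthonormality_error(rotation))
```

An operator whose orthonormality error is much larger than the precision of the printed numbers (six decimals here) has probably been mistyped or transposed somewhere along the way.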

The X-PLOR input files need to be changed only slightly:
- generate.inp -> set it up so it only generates PDB/PSF files for one molecule (your "NCS asymmetric unit")
- check.inp and all refinement input files -> insert "@xplor_ncs.include" after reading the coordinates.

When you use NCS constraints, you often get the lowest Rfree values (even at 2-2.2 A resolution) if you refine grouped temperature factors.

NOTE: the O macro "OMAC/ncs_symm_sphere.omac" demonstrates how to set up the symmetry commands in O with strict NCS. The file "OMAC/ncs_waters.csh" shows how to pick potential water peaks that obey the NCS.


10.5 NCS restraints

With NCS restraints you refine all the NCS-related molecules, but you add restraints to keep them similar. Use a sensible restraint scheme: regions that you know obey the NCS well should be tightly restrained, whereas regions that don't (or that you're not sure about) can be restrained with lower force constants (or even left unrestrained).
It is simplest to put the NCS restraints in a separate file which you can "include" in your other input files. An example of such a file:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 REMARK NCS-restraints for CBH II native at 1.8 A

{ use some identity arrays for the selection }

 { STORE1 = all non-hydrogen protein atoms }
 vector ident (store1)
   ((segid = "AAAA" or segid = "BBBB") and (not hydrogen) and
    (not resid 85:86))

 { STORE2 = protein backbone }
 vector ident (store2)
   (store1 and ( name ca or name n or name c or name o or
                 name ot1 or name ot2 ))

 { STORE3 = side chains }
 vector ident (store3) (store1 and (not store2))

 { STORE4 = main chain with possible NCS-breakdown or disorder }
 vector ident (store4)
   (store2 and ( resi 86:88 or resi 96:98 or resi 105:111 or
                 resi 114:123 or resi 155:162 or resi 288:290 or
                 resid 309:311 or resi 405:412 or resi 445:447 ))

 { STORE5 = side chains with possible NCS-breakdown or disorder }
 vector ident (store5)
   (store3 and ( resi 86 or resi 89 or resi 116:121 or resi 129 or
                 resi 140:141 or resi 144 or resi 147 or
                 resi 156:161 or resi 182 or resi 186 or resi 189 or
                 resi 194 or resi 204 or resi 208 or resi 237 or
                 resi 240 or resi 244 or resi 281 or resi 294 or
                 resi 313 or resi 319 or resi 344 or resi 356 or
                 resi 362:363 or resi 382 or resi 406:411 or
                 resi 426 or resi 446:447 or resn="SER" ) )

 ncs restraints

   { normal NCS protein backbone -> tight }
   group
     equi (segid "AAAA" and store2 and (not store4))
     equi (segid "BBBB" and store2 and (not store4))
     weight-ncs=100.0
     sigb=3.0
   end

   { non-NCS protein backbone -> lax }
   group
     equi (segid "AAAA" and store2 and store4)
     equi (segid "BBBB" and store2 and store4)
     weight-ncs=5.0
     sigb=5.0
   end

   { normal NCS protein sidechains -> fairly tight }
   group
     equi (segid "AAAA" and store3 and (not store5))
     equi (segid "BBBB" and store3 and (not store5))
     weight-ncs=50.0
     sigb=3.0
   end

   { non-NCS side chains -> unrestrained }

   ?

 end

 flags include ncs ? end
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
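
The selections in this example amount to a small decision table: every non-hydrogen protein atom is either backbone or side chain, and either flagged as a possible NCS-breakdown region or not. The Python sketch below merely restates that table with the weights from the example, so you can see at a glance which atoms end up in which group. The function name is hypothetical and the sketch has no connection to X-PLOR itself.

```python
# Sketch only: the decision table implied by the three restraint groups
# in the example. "flagged" means the atom is in STORE4 (backbone) or
# STORE5 (side chain). The function name is hypothetical.

def ncs_weight(is_backbone, flagged):
    """Return the weight-ncs value for an atom, or None if unrestrained."""
    if is_backbone:
        # Backbone is always restrained: tightly when it obeys the NCS,
        # loosely when NCS breakdown or disorder is suspected.
        return 5.0 if flagged else 100.0
    # Side chains: fairly tight when well behaved, otherwise set free.
    return None if flagged else 50.0


for is_backbone in (True, False):
    for flagged in (False, True):
        print(is_backbone, flagged, ncs_weight(is_backbone, flagged))
```

Note that in the example all serine side chains are flagged (resn="SER" in STORE5), so they fall into the unrestrained class regardless of their residue number.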

The X-PLOR input files need to be changed only slightly:
- generate.inp -> set it up so it generates PDB/PSF files containing all your NCS-related molecules
- check.inp and all refinement input files -> insert "@ncs_restraints.include" after reading the coordinates.

The file "OMAC/ncs_maps.omac" demonstrates how you can contour maps at equivalent positions in NCS-related molecules and get them displayed on top of each other (so you can do "visual averaging" as well as assess which parts of the structure display NCS-breakdown).


10.6 Mixed NCS constraints and restraints

You can use arbitrary mixtures of NCS constraints, restraints and unrestrained parts of a model. For example, if you have two dimers in the asymmetric unit, you can CONstrain the two dimers to be identical, while restraining the two monomers to be similar. In that case, the dimer is the "NCS asymmetric unit" (for generating a constraints include file), whereas the restraints between the monomers must be put in a restraints file.

Note that NCS restraints (as opposed to constraints) need not apply to the entire molecule. For instance, if you refine a two-domain structure with two-fold NCS, but the relative orientation of the two domains is different in the two NCS-related molecules, you can still restrain each of the two domains to be similar to its counterpart in the other molecule.

There are many ways in which NCS restraints can be used. For instance, you could create groups corresponding to secondary structure elements ("a helix is a helix is a helix ..."), or even subsequent pentapeptides etc.
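
Typing dozens of such groups by hand is tedious and error-prone, so you may want to generate them. The Python sketch below writes one group per consecutive pentapeptide, copying the layout of the group statements in the restraints example of section 10.5. The residue range, segids, weight and sigb values are placeholders, the function name is hypothetical, and the output should be checked against the X-PLOR manual before use.

```python
# Sketch only: generate one NCS-restraint group per pentapeptide,
# mirroring the "group equi ... equi ... weight-ncs=... sigb=... end"
# layout of the section 10.5 example. All defaults are placeholders.

def pentapeptide_groups(first, last, segids=("AAAA", "BBBB"),
                        weight=50.0, sigb=3.0, size=5):
    """Return the lines of the group statements as a list of strings."""
    lines = []
    for start in range(first, last + 1, size):
        stop = min(start + size - 1, last)  # last group may be shorter
        lines.append("group")
        for segid in segids:
            lines.append(f'  equi (segid "{segid}" and resid {start}:{stop})')
        lines.append(f"  weight-ncs={weight} sigb={sigb}")
        lines.append("end")
    return lines


# Hypothetical 12-residue stretch: gives groups 1:5, 6:10 and 11:12.
print("\n".join(pentapeptide_groups(1, 12)))
```

The generated lines would go between "ncs restraints" and its closing "end" in your restraints include file.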


10.7 Rigid-body and group-B refinement

Whenever you start refinement using a known structure, but in a different cell or even spacegroup, you naturally start with some rigid-body refinement, even for nearly-isomorphous complexes etc.

If, in addition, the starting structure derives from room temperature data, whereas the new dataset was collected using cryo-techniques, it is a good idea to do some grouped temperature-factor refinement after the rigid-body refinement. Note that group-B refinement applies identical B-factor *shifts* to all selected atoms; it does *not* make all Bs equal. Example (to be included after the rigid-body refinement):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 xrefin
   resolution 8.0 2.0
   optimize group
     b=(not hydrogen)
     nstep=10
     drop=40.0
     bfmin=2.0
     ?
   end
 end
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
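
The difference between "identical shifts" and "identical Bs" is easy to see in a small Python sketch. It assumes that one common shift is added to every atom in a group and that the result is not allowed to drop below bfmin; the clamping behaviour is an assumption about what bfmin does, so check the X-PLOR manual. The function name and the toy B-factors are hypothetical.

```python
# Sketch only: a group-B update adds ONE shift to every atom in the
# group, so the spread of B-factors within the group is preserved.
# Clamping at bfmin is an assumption; the function name is hypothetical.

def apply_group_b_shift(b_factors, shift, bfmin=2.0):
    """Add the same shift to all Bs, never going below bfmin."""
    return [max(b + shift, bfmin) for b in b_factors]


# Room-temperature Bs of three atoms, shifted down for a cryo dataset.
room_temperature_bs = [10.0, 25.0, 40.0]
cryo_bs = apply_group_b_shift(room_temperature_bs, -9.0)

print(cryo_bs)
```

The atoms keep their relative ordering and (apart from the one that hits the lower limit) their differences, so information about which parts of the model are mobile carries over from the room-temperature structure.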
   

USF Latest update at 12 February, 1998.