Macromolecular crystallography is a pipeline, where each step may be rate limiting. Map interpretation is arguably the most exciting step in the process, and the moment that one sees the model of a new structure for the first time is often a memorable event in a crystallographer's career. All steps in the pipeline have seen major improvements, except perhaps the last one where we have to write up our results. These advances have allowed us to take on projects that are technically more challenging, and to improve the overall quality of ‘normal’ structures. New detectors and improved X-ray sources have allowed us to collect better quality diffraction data. Still, not all crystals diffract to high resolution, and a sizeable number of experiments (often of large macromolecular complexes) produce electron density maps that are difficult to interpret. This is where the crystallographer can make serious mistakes, but fortunately the most serious errors are those that are least frequently made. Most of the less serious errors are removed during refinement, but spectacular errors are still made and often appear in the most prestigious journals. Everyone makes mistakes, and these can be reduced by education and by improving the tools that we use. Automatic tools will produce the same results whether the user is an expert or a beginner, but this is certainly not the case for interactive map interpretation. This is especially true at low resolution and with poor phasing, where it becomes harder to build a model without severe errors, and many scientifically important structures fall into this category. Such efforts are time consuming and require many hours spent staring at maps on computer graphics screens. Fortunately, hardware is no longer a problem and the well-known mantra 'smaller, faster, cheaper' means that a 3D-capable workstation on every crystallographer's desk has become a reality.
Such systems can be equipped with a 'high performance' graphics card, although even entry level machines are now sufficient for many model and map manipulation needs. The widespread use of a common programming interface, OpenGL, has made it straightforward to support the commonly available operating systems.
Here I will discuss how one can go about using the tools that I have developed for map interpretation. The images were cut and pasted from O running on a Mac laptop, and the experimental maps are available as indicated in the text that follows.
3.1 Getting started
Key steps in the interpretation of an electron density map in O (Jones et al., 1991) involve manipulating an intermediate skeleton representation of the density and building a sequence-less model (poly-alanine for proteins) of the structure on the basis of how the skeleton folds in 3D. Greer (1974, 1975) introduced a simplified representation of electron density that followed the main features in a map. This so-called skeleton was used initially in one of the first attempts to automate map interpretation but was later widely used to assist in interactive map interpretation. A skeleton can be defined as an easily edited representation of the density and can be used to provide an overview of the map, which complements the detailed view provided by the classical contoured representation (Jones & Thirup, 1986). The implementation of interactive model building with skeletons was made possible by developments in computer graphics, in particular the availability of high performance colour vector-drawing systems and 32-bit mini-computer systems. Colour was used to classify individual skeleton 'atoms' in a natural way to provide the basis for a tracing hypothesis. The skeletonization algorithm in O, for example, assigns a code to each skeleton atom based on how far it is from a free end. This code is then mapped to a colour (under the user's control) to indicate initial assignments for main-chain and side-chain ‘bones’ (cyan and red, by default). Tools are then available to redefine connected groups of atoms to take on new codes and related colours. At its simplest, the user may change atoms to the yellow class, for example, to indicate a likely main-chain hypothesis, or redefine cyan atoms to be red side-chain atoms. More complicated schemes can be used where one colour is used to indicate portions of the skeleton that have been traced with confidence, for example, and another colour to indicate where one is uncertain.
I have also used colour codes to provide directionality information, indicating N- and C-terminal portions of a separated segment of chain.
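The idea of classifying bones by how far they lie from a free end can be sketched in a few lines of Python. This is only an illustration of the principle, not the algorithm implemented in O: the dictionary-of-sets graph layout, the function name `classify_bones` and the `max_side_len` cut-off are assumptions made for the example. The status codes and default colours follow the conventions described in this chapter.

```python
# Status codes and default colours as described in the text:
# 1 = main-chain (yellow), 2 = side-chain (red), 3 = 'maybe' main-chain (cyan).
DEFAULT_COLOURS = {1: "yellow", 2: "red", 3: "cyan"}


def classify_bones(bonds, max_side_len=2):
    """Return {atom: status code} for an undirected skeleton graph.

    `bonds` maps each atom to the set of atoms bonded to it (hypothetical
    layout). A short run of atoms leading from a free end to a branch point
    is labelled side-chain (2); everything else stays 'maybe' main-chain (3).
    """
    status = {atom: 3 for atom in bonds}
    for start, nbrs in bonds.items():
        if len(nbrs) != 1:          # only start walking from free ends
            continue
        path, prev, cur = [start], None, start
        reached_branch = False
        while True:
            onward = [n for n in bonds[cur] if n != prev]
            if not onward:
                break
            prev, cur = cur, onward[0]
            if len(bonds[cur]) >= 3:    # arrived at a branch point
                reached_branch = True
                break
            path.append(cur)
            if len(bonds[cur]) == 1:    # arrived at another free end
                break
        if reached_branch and len(path) <= max_side_len:
            for atom in path:
                status[atom] = 2
    return status


# Toy skeleton: an 8-atom chain m0..m7 with a 2-atom branch (s0, s1) at m4.
example = {
    "m0": {"m1"}, "m1": {"m0", "m2"}, "m2": {"m1", "m3"}, "m3": {"m2", "m4"},
    "m4": {"m3", "m5", "s0"}, "m5": {"m4", "m6"}, "m6": {"m5", "m7"},
    "m7": {"m6"}, "s0": {"m4", "s1"}, "s1": {"s0"},
}
example_status = classify_bones(example)
example_colours = {atom: DEFAULT_COLOURS[code] for atom, code in example_status.items()}
```

In the toy skeleton only the short branch is coloured red. A short stub at a chain terminus would be misclassified in the same way, which is one reason why interactive editing of the status codes is needed.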
In the O implementation (Jones et al., 1991), the initial skeleton is generated with an algorithm that requires the definition of a base level. The default is 1.5 times the root-mean-square (RMS) of the map, but this may need to be adjusted. If the selected value is too low, there may be more connections than is optimal, and if it is too high, there may be too few. An experimental map also has errors in it and, whatever the skeletonization algorithm, there will be errors in the skeleton that has been derived from the map. Therefore, tools are needed to allow the user to delete some connections (hydrogen bonds between strands, for example, or interacting side-chains where the density may fuse), and make new ones (to join points separated by weaker density, for example). Other useful tools add a side-chain branch point (it can be used to 'grow' a series of skeleton atoms in a poorly defined loop, for example) and 'grab' a skeleton atom to move it where the builder wants it (in the middle of the poorly defined loop, for example). Skeletons can also be used to make masks for non-crystallographic symmetry (NCS) density averaging within O, and to generate the operators between NCS units. Weak signals for selected structural features can also be enhanced to aid interpretation (Kleywegt & Jones, 1997a; Cowtan, 199x), in particular to indicate helical or strand regions. A new tool to indicate the secondary structure framework is available, Auto_2ry, which uses skeletons to speed up a new real-space based algorithm.
The Skeleton Master-Menu described in Chapter 2 provides the essential tools for generating and editing skeletons. Since the skeleton is an O molecule with just one extra atom property to define the status code, any number of objects can, in principle, be generated. The master-menu generates two skeleton objects for the builder; one to show all atoms in (usually) a smaller volume, and another object showing the current hypothesis for how the main-chain folds in 3D space. Crystal symmetry can be applied to the main-chain object to indicate crystal packing and to ensure that one does not move into a symmetry copy of the molecule.
The following example outlines the separate steps involved in map interpretation and makes use of the experimental data used to solve the structure of P2 myelin protein (Jones et al., 1988). The new builder will benefit from downloading the necessary files from my public ftp account (directory p2_mir.zip). The expanded file will consist of a directory called p2_mir which will contain 3 files:
ano1.map is the experimental map in my bricked format where each density point has been scaled to fit into one byte
p2_1letter.txt the protein sequence of P2 myelin protein in one letter code, 131 amino-acids
on_startup the O macro file activated when O is started in this directory. It reads in the experimental map and opens a few windows.
Open a Terminal window, set the directory to p2_mir and we can start O, pressing the <return> key at each prompt.
O > This version of O is free for anyone to use.
O > Contact email@example.com if you have a problem.
O > O version 14.0.0, Build 120914
O > O is 3D graphics enabled
O > No dials
O > Mono enabled only
O > Gamepad disabled
O > ODAT environment variable :/Users/alwyn/o/data/
O > Undo files saved in /Users/alwyn/o/temp/
O > O for Mac & Intel processors.
O > Run line:
O > Define an O file (terminate with blank):
O > Menu names are not defined.
O > Enter file name [/Users/alwyn/o/data/menu.odb]:
O > menu.odb file for O version 13.1.0, 120828
O > Startup file was never loaded
O > Enter file name [/Users/alwyn/o/data/startup.odb]:
O > startup.odb file for O v12 080508
O > Guess Matrix with PRO
O > terminal arcade codes loaded
O > 3 button mouse, double and single pixel line widths
O > Lsq definition defaults are taken.
O > Making default Bones colours
O > Default fog values have been defined
O > These are in ODB .ogl_light entries 14-16
O > An ODB is missing, it is updated with a default.
O > Number of mouse buttons 3
O > Stereochemistry file was never loaded
O > Enter file name [/Users/alwyn/o/data/stereo_chem.odb]:
O > stereochemistry dictionary, PDB remediated names, IUPAC phosphates
O > Recent changes:
O > 100401 PRO Phi -63 -> -60
O > 100226 more PRO flattening
O > 100118 OXT residue
O > 081009 ALA rsr definitions
O > 080912 single letter codes for protein and nucleic acid
O > 0808020 ATP,NDP
O > 080807 flat PRO & PO4
O > read ok
O > Connectivity used is : all
O > Maximum linkage distance = 2.00
O > There were 53 residues.
O > 642 atoms.
O > The usual on_startup macro will be activated
O > ****************************************************
O > Warning, program version does not match the database
O > Database is: O version 13.1.0
O > Read in new startup.odb & menu.odb files!
O > ****************************************************
O > ODAT in use : /Users/alwyn/o/data/
O > There are 0 molecular objects
O > Making visibility data structures.
O > Making visibility data structures.
window count 70
As4> ......No message from the O administrator........
As4> File not found in path: on_startup
As4> Indirect file does not exist.
As4> Number of mouse buttons 3
We have now loaded the necessary ODB entries into the user database and you can inspect the contents by issuing the Directory command:
O > dir *
Heap> .O-VERSION T W 21
Heap> .MENU_MAJOR_NAME C W 88
Heap> .MENU_MINOR_NAME C W 878
Heap> .MENU_COLOUR I W 44
Heap> .MENU_DISPLAYED I W 44
Heap> .MENU_VISIBLE I W 44
Heap> .MENU_INTEGER I W 3
Heap> .MENU_REAL R W 2
Heap> .PARENT_MENU T W 1071
Heap> .CONTROLS T W 867
Heap> .DISPLAY T W 1071
Heap> .SKETCH_DISPLAY T W 1071
Heap> .PAINT_DISPLAY T W 1581
Heap> .OBJECT_DISPLAY T W 408
Heap> .MOLECULE_DISPLAY T W 357
Heap> .SEQUENCE_DISPLAY T W 408
Heap> .BATON_WINDOW T W 408
Heap> .REBUILD T W 1071
Heap> .REBUILD_GRAB T W 663
Heap> .REBUILD_TOR T W 561
Heap> .REBUILD_REFI T W 612
Heap> .STEREO R W 2
Heap> .OGL_OPTIONS I W 14
Heap> .FM_REAL R W 2
Heap> .REFI_BAD R W 3
Heap> .HKL_INTEGER I W 3
Heap> .SPROUT_REAL R W 2
Heap> .RENDER I W 1
Heap> .OX_REAL R W 1333
Heap> .GRAB_INTEGER I W 1
Heap> .LOOP_C6 C W 5
Heap> .ROTAMER I W 1
Heap> .QDSI I W 1
Heap> .QDSR R W 4
Heap> .QDSC C W 3
Heap> .BONDS_ANGLES T W 317988
Heap> .OBJECT_OBJ_DISP_BONDS T W 720
Heap> .OBJECT_VIS_DISP_BONDS I W 20
Heap> .OBJECT_OBJ_DISP_ATOMS T W 720
Heap> .OBJECT_VIS_DISP_ATOMS I W 20
Heap> 700 data blocks used, space for 10000
Heap> 6786 integer/real units used, space for 100000000
Heap> 1776 character units used, space for 10000000
Heap> 393243 text units used, space for 1000000
O has already created 700 named entries in the database!
Figure 3.1 Opening the master-menu for working with skeletons
The 3D window will show a window that contains the master-menu system, but if it is not open, use the pull-down menu as shown in Figure 3.1 and then load the Skeleton Master-Menu. We are now ready to interpret the map, which had been loaded by the on_startup macro. The text written to the Terminal provides information about the map (such as grid data, envelope size, RMS values etc):
Fm> Map type is DSN6
Fm> Parameters as read from the map file:
Fm> Grid ................. 76 82 48
Fm> Origin ............... 0 0 0
Fm> Extent ............... 77 83 49
Fm> Fast, medium, slow.... X Y Z
Fm> Cell axes ............ 91.80 99.50 56.50
Fm> Cell angles .......... 90.00 90.00 90.00
Fm> No reslicing of map necessary
Fm> Prod ................. 20.00
Fm> Plus ................. 0
Fm> Min, max, RMS ...... 0.00000 8.35000 0.95545
Fm> Scale ................ 1.000
Fm> Symmetry information may be used
Fm> O-style symm-ops for spacegroup P212121
The map has the name MIR and there will be an item in the Objects window called MIR_1 containing 3D vectors of contouring information, whose values were also defined in the on_startup macro.
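The Prod and Plus values in the listing describe how each density value is packed into one byte. The sketch below assumes the usual DSN6 convention, byte = density × Prod + Plus, and uses the cell, grid, Prod and Plus values reported above. Whether O applies exactly this rounding and clamping is an assumption, and the function names are hypothetical.

```python
CELL = (91.80, 99.50, 56.50)   # cell axes in Angstrom, from the Fm> listing
GRID = (76, 82, 48)            # grid divisions along X, Y, Z
PROD, PLUS = 20.0, 0           # scaling constants from the Fm> listing


def to_byte(rho):
    """Pack a density value into 0..255 (assumed DSN6-style convention)."""
    value = int(round(rho * PROD + PLUS))
    return max(0, min(255, value))


def from_byte(stored):
    """Recover the approximate density from a stored byte."""
    return (stored - PLUS) / PROD


# Grid spacing along each axis in Angstrom (cell length / grid divisions).
SPACING = tuple(axis / n for axis, n in zip(CELL, GRID))

# Largest error introduced by rounding to one byte, in density units.
MAX_ROUNDING_ERROR = 0.5 / PROD
```

With these numbers the map maximum of 8.35 fits comfortably within one byte, the rounding error is at most 0.025 density units, and the grid spacing is about 1.2 Å along each axis.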
3.2 Localising the molecule
Our first job is to decide where our molecule is with respect to the origin of the unit cell and the asymmetric unit of density. Our molecule is not normally all contained within the asymmetric unit of density for the particular cell and crystal symmetry. O, however, is able to generate suitable density to cover any volume of space provided we have an asymmetric unit. In this particular case, the crystal symmetry is P212121 and the map corresponds to a complete unit cell, so it makes sense to skeletonise the complete cell and then localise a single molecule. To do this, we need to set the radii of the skeleton objects (Setup Skeleton Objects pop-up), and then calculate the skeleton (Skeletonize all map pop-up), Figure 3.2.
Figure 3.2 Pop-ups to set object radii and to skeletonise the whole map
Notice that I have set the main-chain skeleton object to a large radius to ensure the whole molecule is drawn. In the skeletonize pop-up, I have taken the default of 1.5xRMS just to see what it gives; the result is shown in Figure 3.3. The red box is the unit cell created with the Symm_cell command (see ‘A-Z of O’).
Figure 3.3 Calculated main-chain skeleton of whole P2 unit cell at 1.5xRMS.
Once the map has been skeletonized, 2 new objects will also have been created, called s1_mc and s1_msc; in the Objects window there will now be 3 entries (MIR_1, S1_MC, S1_MSC), one of which is coloured in green text (S1_MC) to indicate that the object is visible. When we start from scratch, the screen centre is set to (0,0,0) and the skeleton object will be off-set to one side. A new screen centre can be defined by activating Centre ID and then identifying a skeleton atom. The builder can interact with the 3D object using the mouse, virtual dials or other input devices (see Chapter 1 for more details).
To make an object showing the unit cell as a red box, we need to indicate which molecule to use and then issue the relevant O command:
O > mol s1 symm_cell
The Symm_cell command creates a new object called CELL, which is a non-pickable object and contains simple drawing instructions to show the unit cell of the crystal as shown in Figure 3.3 (see ‘A-Z of O’). Now we can try to evaluate how many molecules we have in the cell and roughly where they are. In space group P212121 we have 4 asymmetric units in the complete cell and clearly if there is only one molecule in the asymmetric unit, the cell would have a very high solvent content. By changing the view, you should be able to make an estimate of how many molecules there are in the cell, and if we divide by 4, that will give us the number of molecules in the asymmetric unit. P2 myelin is small, just 131 amino-acids, and since the structure was unknown at the time, one should assume it is a ball-shaped molecule of radius ~20 Å and now try to count balls. Remember that molecules need to undergo unit cell translation when they come to the edge of a cell.
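The statement about solvent content can be checked with a quick Matthews-style estimate. The sketch below uses the cell constants and residue count given in this chapter; the average residue mass of 110 Da and the constant 1.23 in the solvent-fraction formula are the usual rule-of-thumb values and are assumptions here, as is the function name.

```python
CELL = (91.80, 99.50, 56.50)   # orthorhombic cell axes in Angstrom
N_SYMOPS = 4                   # asymmetric units per cell in P212121
N_RESIDUES = 131               # P2 myelin chain length


def matthews(n_mol_asu, cell=CELL, n_symops=N_SYMOPS,
             n_residues=N_RESIDUES, da_per_residue=110.0):
    """Return (Vm in A^3/Da, estimated solvent fraction).

    Assumes an orthorhombic cell (volume = a*b*c), an average residue mass
    of 110 Da and the common approximation solvent = 1 - 1.23/Vm.
    """
    a, b, c = cell
    volume = a * b * c
    mass_in_cell = n_symops * n_mol_asu * n_residues * da_per_residue
    vm = volume / mass_in_cell
    return vm, 1.0 - 1.23 / vm


# Tabulate the estimate for 1 to 4 molecules in the asymmetric unit.
estimates = {n: matthews(n) for n in range(1, 5)}
```

For a single molecule in the asymmetric unit this gives a Matthews coefficient of about 9 Å^3/Da, corresponding to roughly 86% solvent, which is far higher than is typical for protein crystals. Comparing the values for 2, 3 and 4 molecules with your own count of balls is a useful sanity check.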
Let us finish the setup stage before continuing, since at some stage we need to define the molecule that we hope to build and we need to load in the data required to work with secondary structure templates (SSTs). The latter is accomplished by clicking on SST setup, protein in Figure 3.2, which loads in the SST fragments for short helices and strands, as well as a poly-alanine molecule of 700 residues called TRACE. ODB entries for the molecule that we hope to build are generated from the sequence that we have to provide. Setup sequence in Figure 3.2 requires that you define the file containing the sequence in one-letter code format. The P2 sequence is defined in the file p2_1letter.txt provided. Clicking on Setup sequence will generate a pop-up:
Figure 3.4 Pop-up to read in the sequence and generate ODB entries for the molecule to be built.
There is no default name for the molecule nor file name for the sequence in this pop-up. The molecule you define will be initialized if it already exists in the O database, otherwise the Build_init command will create all the entries that are needed for the specified sequence. The residue and atom information needed to go from a sequence to ODB entries is described in the stereo_chem.odb library file. Since some residues that have different 3-letter sequence codes have the same 1-letter code, O will prompt you as required. This is what happens for the P2 myelin case:
New> Macro will be activated.
Build_init M1 p2_1letter.txt
New> Number of residues 131
New> There were 19 different residues
New> There were 4 problems in conversion
New> Specify 3 letter code for G :gly
New> Specify 3 letter code for T :thr
New> Specify 3 letter code for A :ala
New> Specify 3 letter code for C :cyh
Sam> Nothing marked for deletion, so no compression.
Sam> Making residue names.
Sam> There are 131 residues, 1038 atoms.
Sam> This atom has the potential for an incorrect Z: CA …
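The prompts for G, T, A and C arise because a one-letter code can map to more than one entry in the dictionary. The logic can be sketched as below. This is a hypothetical illustration and not the Build_init implementation: the table is a small fragment, the second name given for each ambiguous letter is a placeholder rather than a name taken from stereo_chem.odb, and the function names are invented for the example.

```python
# Hypothetical fragment of a one-letter lookup table. The protein names are
# the usual three-letter codes (CYH is the cysteine name used in the session
# above); the second entries for A, C, G and T are placeholders only.
ONE_LETTER = {
    "A": ["ALA", "ADE"], "C": ["CYH", "CYT"],
    "G": ["GLY", "GUA"], "T": ["THR", "THY"],
    "K": ["LYS"], "W": ["TRP"], "V": ["VAL"], "F": ["PHE"],
}


def ambiguous_codes(sequence):
    """List the one-letter codes in `sequence` that need a user decision."""
    seen = []
    for letter in sequence:
        if len(ONE_LETTER[letter]) > 1 and letter not in seen:
            seen.append(letter)
    return seen


def to_three_letter(sequence, choices):
    """Convert a one-letter sequence, resolving ambiguities via `choices`."""
    residues = []
    for letter in sequence:
        options = ONE_LETTER[letter]
        if len(options) == 1:
            residues.append(options[0])
            continue
        name = choices[letter].upper()
        if name not in options:
            raise ValueError("%s is not a valid choice for %s" % (name, letter))
        residues.append(name)
    return residues
```

Each ambiguous letter is asked about once, in order of first appearance, and the answer is then applied to every occurrence in the sequence.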
We also need to set unit cell and symmetry information at the terminal:
O > sym_set
Sym> Molecule name? [M1]:
Sym> Define cell constants [ 0.00 0.00 0.00 0.00 0.00 0.00]:91.80 99.50 56.50 90 90 90
Sym> Name of spacegroup? [P1]:p212121
Sym> O-style symm-ops for spacegroup P212121
O > Symm_cell
The second command draws the unit cell shown in Figure 3.3. It is also time to save the O database to the file system:
O > save
As1> File_O_save is not defined.
As1> Enter file name [binary.o]:p2.o
Now we should try to find a molecule in the skeletonized map. The bones look like cooked spaghetti the first time you experience them, but for now we are just looking for discrete blobs localized in space to a sphere of ~20 Å radius. You should be able to recognise a few, and also see the symmetry related mates (always 4 of them for this space group, but remember that you might have to imagine a unit cell translation to see a full blob). After a while, you will see pairs of blobs that interact; it is a crystal structure, remember, and crystals form because the molecules interact with symmetry related copies. You should also recognise the solvent region and, since this map has not been averaged or flattened, you may see a few short connections there too. If you look at the skeleton, you should see more than 4 balls of connections and so we have non-crystallographic symmetry. If you count the number of balls, it has to be a multiple of 4, but beware of translational symmetry; a particular ball might be close to the edge of the cell and therefore appear cut, with the rest appearing elsewhere by applying the relevant symmetry operation. Non-crystallographic symmetry is great to have; if your crystal has it, you should use it. In this case we will not, because the map becomes too easy to interpret. However, there is another chapter on using NCS in this guide.
Figure 3.5 Localization of the A molecule. An orange circle indicates the skeletonized density that we will interpret for a single P2 myelin chain. It interacts with the B molecule, which is also indicated, and you will find continuous skeletal connections between them. Note also the clear solvent channels around A.
Since I want us to work on the same molecule, set the screen centre to this position in space:
O > cen_xyz 50 62 30
This is at the centre of the molecule we will build. Notice that there is a relatively clear boundary with the solvent but also interactions with neighbouring molecules in the crystal. During the editing stage, it is important not to ‘move’ into the wrong molecule, which is easy to do if you have a tightly interacting dimer, for example.
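To see where the crystallographic copies of this centre lie, one can apply the four P212121 operators and translate the results back into the unit cell. The sketch below assumes the standard setting of the space group and relies on the cell being orthorhombic, so that fractional coordinates are simply Cartesian coordinates divided by the cell lengths. The function name is hypothetical and this is not how O stores its symmetry operators.

```python
CELL = (91.80, 99.50, 56.50)   # orthorhombic cell axes in Angstrom

# General positions of P212121 (standard setting), in fractional coordinates.
P212121 = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (0.5 - x, -y, 0.5 + z),
    lambda x, y, z: (-x, 0.5 + y, 0.5 - z),
    lambda x, y, z: (0.5 + x, 0.5 - y, -z),
]


def symmetry_mates(xyz, cell=CELL):
    """Return the 4 symmetry copies of a Cartesian point, wrapped into the cell."""
    frac = [coord / axis for coord, axis in zip(xyz, cell)]
    mates = []
    for op in P212121:
        # The modulo applies the unit cell translation that brings the
        # copy back inside the cell (0 <= fractional coordinate < 1).
        wrapped = [value % 1.0 for value in op(*frac)]
        mates.append(tuple(value * axis for value, axis in zip(wrapped, cell)))
    return mates


centre_mates = symmetry_mates((50.0, 62.0, 30.0))
```

The first entry is the original centre; the other three are the positions around which symmetry-related blobs of skeleton should appear, possibly cut by a cell edge.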
Figure 3.6 Here the map is being skeletonized at 1.75xRMS, within a radius of 30 Å from the current screen centre, and the skeleton molecule will be called s3. If s3 already exists, it will first be deleted and then regenerated.
For the next step, we will create skeletons at different base levels. If we skeletonized the map at too low a level, there would be lots of connected noise, but if we choose too high a level we would miss out on connected regions. I do this by just looking at a few made at different settings (one could write a program to see how the number of branch points changes as a function of base level and then decide…). An important principle of working with computer graphics is to keep the images as simple as possible, and so I rarely work with large contoured density volumes, nor with large skeleton objects (except when I want an overview), so for this step we will work with skeletons that are within a sphere centred on our screen centre (currently 50 62 30). We will use the Skeleton in sphere option a number of times with different radii. The pop-up needs a molecule name as well as a radius, and I would just make molecules s2, s3, s4 for base levels of 1.5, 1.75 and 2.0 RMS units, for example.
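The program alluded to above, counting branch points as a function of base level, might look something like the following. It is a crude sketch under strong assumptions: it does not skeletonize a map, but takes a ready-made graph of candidate bones atoms, each carrying a density value, and simply discards atoms below the base level before counting atoms with three or more surviving bonds. The data layout, the function names and the toy example are all hypothetical; only the RMS value is taken from the map listing.

```python
MAP_RMS = 0.95545   # RMS of the map, from the Fm> listing


def count_branch_points(density, bonds, base_level):
    """Count atoms with >= 3 bonds once atoms below `base_level` are removed.

    `density` maps atom -> density value; `bonds` maps atom -> set of bonded
    atoms (hypothetical layout, not O's skeleton data structure).
    """
    kept = {atom for atom, rho in density.items() if rho >= base_level}
    return sum(1 for atom in kept if len(bonds[atom] & kept) >= 3)


def scan_base_levels(density, bonds, rms=MAP_RMS, multipliers=(1.5, 1.75, 2.0)):
    """Return {multiplier: number of branch points} for each base level."""
    return {m: count_branch_points(density, bonds, m * rms) for m in multipliers}


# Toy example: a strong centre atom with two strong and two weak neighbours.
toy_density = {"c": 3.0, "n1": 3.0, "n2": 3.0, "weak": 1.5, "weaker": 1.8}
toy_bonds = {
    "c": {"n1", "n2", "weak", "weaker"},
    "n1": {"c"}, "n2": {"c"}, "weak": {"c"}, "weaker": {"c"},
}
toy_scan = scan_base_levels(toy_density, toy_bonds)
```

In the toy example the centre atom stops being a branch point once the base level rises above the density of its weaker neighbours. On a real map one would look for the level at which the count stops falling steeply, and then still inspect the skeleton by eye.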
Figure 3.7 The effect of different base levels when skeletonizing the map. The map has been skeletonized at 1.5, 1.75 and 2 RMS to generate molecules that I have called s2, s3, s4
The skeletons that result from these 3 runs are shown in Figure 3.7. At the lowest base level, some bones atoms are classified as main-chain where they are clearly side-chain atoms, and more wrong connections result. The highest base level has created a skeleton where we have begun to lose main-chain bones; there is a helix on the lower right that has disappeared, for example. So far, we have not looked at the MSC-style object that shows both main- and side-chain atoms, and if we did, we would probably see that some parts of the helix are now classified as side-chain and somewhere there will be a break. The middle level feels about right; well connected, but with some errors in connectivity, and some assignment errors. I will not delete the other skeletons yet, but I will now work with the skeleton that I have called s3.
We would have done things a bit differently if the input map had been of an asymmetric unit. In such a case, the whole map would almost certainly not contain a complete molecule within its boundaries. We would start by skeletonizing the whole map as previously, centre on what looks like the centre-of-mass of the skeleton, and redo around the new centre but within a large radius (100 Å say). This skeleton would then be used to localize the molecule, just as we did for P2 myelin.
3.3 Editing the skeleton
This step is needed for two reasons. First, we are working with an experimental electron density map, and this will contain errors that arise from errors in the experimental phases. Second, even in studies where the phasing errors may be small (where we have extensive NCS, for example), we will still need to edit the skeleton because features in the density may merge due to limited resolution (e.g. interacting side chains) or there may be breaks because of disorder. In very well phased maps, the initial skeleton is likely to show the complete main-chain trace with just a few problem areas that need user intervention. The experimental P2 map contains resolution-related artefacts as well as 3-4 places where phasing errors result in breaks in the density. The skeleton atoms (bones) will have been classified as likely main-chain (class 3, default colour cyan), or side-chain atoms (class 2, default colour red). The class of a bones atom is also indicated in the default ID message template as the Z-value. The colour associated with each bones class is defined in the ODB entry .bones_colour and can, of course, be changed by the user. Here are the default colour descriptors:
O > wr .BONES_COLOUR ;;
.BONES_COLOUR T 10 20
The easiest way to change them is to write out the ODB to the file system, edit it, and then read it in again. As well as errors in the skeleton connectivity, there will also be errors in bones classification that will need to be edited. A pair of interacting side-chains, for example, may produce fused density because of the limited resolution and form a locally connected skeleton that is incorrectly assigned as main-chain. This could require a reassignment, and the introduction of a break in the connectivity. Such actions (and others) are made with commands from the ** Edit Skeleton ** blind, Figure 3.8.
Figure 3.8 Skeleton modifying commands in the ** Auto build ** and ** Edit skeleton ** panels. In the left panel, the Make 2ry framework action has been activated to generate the pop-up.
Two commands modify the connectivity; Make a bond and Add a bond require the user to identify two atoms (in the same object, so ensure only one of the skeleton objects is visible!) to add a new bond or to delete all bonds between them. When you delete a series of bonds between the identified atoms, O deletes the shortest connected list; if this is not what you wanted, just use the Undo Edit operation. Three commands allow you to modify bones status codes but only a limited set of codes are used. These correspond to the defaults produced by the skeletonization process that I call ‘maybe’ (the cyan coloured atoms) and ‘side-chain’ (red atoms), as well as a new class called ‘main-chain’ (yellow atoms of class 1). If Set to side-chain is used to identify a pair of atoms in the <mol>_MC object, all connected atoms will have their bones status redefined to 2, the object will be redrawn, and the connected atoms will no longer be present in the object. They have not been deleted; they are in the <mol>_MSC object. Similarly, side-chain atoms that are in the connection list of a Set to main-chain operation will change colour in the <mol>_MSC object, but now also appear in the <mol>_MC object.
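Both kinds of edit operate on the list of atoms that connects the two identified atoms. A minimal sketch of that idea is given below, using a breadth-first search to find the shortest connected list. It is an illustration only: the graph layout and function names are assumptions, and O's own implementation is not described here.

```python
from collections import deque


def shortest_path(bonds, start, end):
    """Return the shortest list of bonded atoms from `start` to `end`, or None.

    `bonds` maps each atom to the set of atoms bonded to it (hypothetical layout).
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == end:
            path = []
            while atom is not None:     # walk back to the start
                path.append(atom)
                atom = previous[atom]
            return path[::-1]
        for nbr in bonds[atom]:
            if nbr not in previous:
                previous[nbr] = atom
                queue.append(nbr)
    return None


def delete_bonds_between(bonds, a, b):
    """Remove every bond along the shortest path between two atoms."""
    path = shortest_path(bonds, a, b) or []
    for u, v in zip(path, path[1:]):
        bonds[u].discard(v)
        bonds[v].discard(u)
    return path


def set_status_between(bonds, status, a, b, code):
    """Give every atom on the shortest path the new status code (1, 2 or 3)."""
    path = shortest_path(bonds, a, b) or []
    for atom in path:
        status[atom] = code
    return path
```

If two atoms are joined by both a short and a long route, only the short one is affected, which mirrors the behaviour described above and explains why an unexpected result is best handled with Undo Edit followed by a different choice of atoms.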
Sometimes the user may want to move a bones atom to reposition it in the density, and so the blind includes a Grab an atom option. This activates the Grab_atom command, which can only be terminated with the Clear flags command. When the molecule is actually built with the Decor-commands, branches in the skeleton can be used to assist in the placement of CA atoms (as well as carbonyl oxygen atoms) and these can be introduced by the Add a branch command. The new atoms are classified as side-chain atoms. The command is also useful in regions of low or poor density to construct a path where we have too few, probably unconnected atoms. In such a case, I keep the Grab_atom command active, placing a bones atom where I want it, then adding a branch to it, positioning this new atom, then adding another branch etc.
Four commands are associated with the making or displaying of the skeleton objects and symmetry mates. The ‘details’ object contains all bones atoms that are within the specified radius of the molecular centre, and is called <mol>_MSC, while the current main-chain hypothesis (i.e. those atoms that are not defined as side-chain) is called <mol>_MC. The ‘symmetry’ object is called sym_<mol> and is an instance of the main-chain hypothesis object within the radius defined during the setup stage. An instance contains just graphical primitives and lacks underlying molecular data structures; atoms do not exist in this object and so they cannot be identified (see ‘A-Z of O’, Symm_instanc). The molecular centre can be reset to the current screen centre with the Set to molecular centre command, and the skeleton objects are then also re-calculated.
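The relationship between the two skeleton objects can be summarised in a short sketch: both are selections from the same set of bones atoms, filtered by distance from the molecular centre and, for the main-chain object, by status code. The function and parameter names below are hypothetical, and the use of separate radii for the two objects follows the setup described earlier in this chapter.

```python
import math

SIDE_CHAIN = 2   # bones status code for side-chain atoms


def skeleton_objects(coords, status, centre, msc_radius, mc_radius):
    """Return (mc_atoms, msc_atoms) as sorted lists of atom names.

    `coords` maps atom -> (x, y, z) and `status` maps atom -> status code.
    The MSC object holds every atom within `msc_radius` of `centre`; the MC
    object holds the atoms within `mc_radius` that are not side-chain.
    """
    msc, mc = [], []
    for atom, xyz in coords.items():
        distance = math.dist(xyz, centre)
        if distance <= msc_radius:
            msc.append(atom)
        if distance <= mc_radius and status[atom] != SIDE_CHAIN:
            mc.append(atom)
    return sorted(mc), sorted(msc)
```

Because the objects are just selections, changing a status code from 3 to 2 removes the atom from the main-chain object on the next redraw without deleting it from the skeleton molecule.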
The Undo Edit operation has five levels of undo available, and also updates the objects. I recommend that snapshots of the skeleton are also saved to the file system so that if you change your mind after lots of edits, you will be able to return to an earlier hypothesis. This is done as follows (for the S3 skeleton above):
O > wr s3* s3.odb ;
But use a sensible file name! I always urge users to take good notes, even including screen-grabs to remind oneself of the decision making process. I rarely do anything useful without OmniOutliner open beside my O window; even Word is better than nothing.
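The difference between the five-level undo and a snapshot saved to the file system can be illustrated with a bounded history. This is a generic sketch of a fixed-depth undo stack, not a description of how O implements Undo Edit; the class and method names are hypothetical.

```python
import copy
from collections import deque


class UndoStack:
    """Keep only the most recent `levels` states; older ones are forgotten."""

    def __init__(self, levels=5):
        self._snapshots = deque(maxlen=levels)

    def record(self, state):
        """Store a copy of the state as it was before an edit."""
        self._snapshots.append(copy.deepcopy(state))

    def undo(self, current):
        """Return the most recently recorded state, or `current` if none is left."""
        if self._snapshots:
            return self._snapshots.pop()
        return current
```

After more than five edits the earliest states can no longer be recovered by undoing, which is why writing the skeleton to a file at each significant decision point is worthwhile.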
Let’s get started on the S3 skeleton. The 2 skeleton objects provide an overview of our folding hypothesis (S3_MC), and a detailed view (S3_MSC), which is used for making most of the changes. Turn on just the overview and try to identify some secondary structure. Do not get confused by the NCS and crystallographic symmetry related molecules, of which there are bits of skeleton for at least 4. Try to recognise helices (there are 2) and sheets (again 2). One sheet should be relatively easy to locate. Now click on Make 2ry framework in the ** Auto build ** blind, and activate the pop-up shown in Figure 3.8a. The algorithm is described in some detail in ‘A-Z of O’ and will not be repeated here. The whole process is fast, with the run-time depending on the number of skeleton atoms in the object and their degree of connectivity, but of order 50 seconds on my Macbook Air. This action will create a new skeleton called 2ry, which is much easier to assimilate, as well as a map that is also called 2ry. The molecular boundaries are now easily recognised, as is one of the β-sheets, which has 6 strands in it. One α-helix should be visible, and a second one is close by.
Figure 3.9 Two views of the 2ry skeleton generated from the unedited S3 skeleton. The labels indicate features that are discussed in the text; 1-11 indicate strands, RT indicates reverse turns, and 2 helices are indicated.
The simplifications that are introduced in the 2ry skeleton also help one to interpret the starting skeleton. For example, the sheet made up of strands 1-6 is now easily recognisable and this allows us to make useful edits to S3, both in connectivity and in the status codes. Note that if we have a strand, the side chains will alternate up-and-down as we go along it, and main-chain interactions between strands will sometimes result in skeleton connections. Both observations are useful to keep in mind when inspecting and editing the skeleton. Figure 3.9 also indicates a number of reverse turns, some in the well defined sheet (RT1-2), and others in the ‘front’. RT1 connects strands 1 & 2, while RT2 connects 3 & 4. Strands 2 & 3 are adjacent; could they be connected by a reverse turn? This is a hypothesis, and to decide if it is reasonable, we would look at the density. Similarly, we can ask if strands 4 & 5 are connected or not by a reverse turn. If we convince ourselves that these reverse turns exist, we would have a β-sheet with simple up-and-down topology. What about the reverse turn RT4, which seems to connect strands 1 and 11? Does the density support the hypothesis? RT3 is harder to interpret; could it be linked with the strands labelled 9 & 10? Remember the side-chain zig-zag when you look at the density! Connections between strands 8 & 9, as well as 9 & 10, are now looking like the next candidates for reverse turns. This would then give a front β-sheet of 4 strands, 7-8-9-10.
Now let's look at the helical segments; we have two. The one that I have labelled Helix1 is connected to strand 6, which is in turn connected to strand 7. Helix2 is connected to strand 8, which has no other connections so far. Therefore, we have 4 free ends to look at and after inspecting the electron density, we would hopefully have some ideas on new connections to make.
Already at this stage we would want to check electron density for sequence-related insights. The end of Helix1 (the end that is not directly connected to strand 6) has a distinctive change in direction relative to the helical axis (the lower left in Figure 3.9a). One should immediately look for a glycine residue at this position since this is a common structural motif with a strong glycine preference because the local main-chain takes on an αL conformation (Schellman, C., 1980, In Jaenicke, R. (ed.), Protein Folding. Elsevier, Amsterdam, pp. 53-62). When I first saw the skeletonised density (on December 13th, 1987, 30 years ago this week, as I write this text), I immediately checked the mid-point of strand 7 in the above because I had speculated that the N-terminal sequence of P2 myelin showed some sequence similarity to serum retinol binding protein (Bergfors, T., et al., 1987, J. Mol. Biol. 198, 357-358) and since this region has a distinctive structure in RBP (Newcomer, M.E., et al., 1984, EMBO J. 3, 1451-1454), I was on the lookout for it. The density clearly confirmed the tryptophan residue that I had expected, and this gave me an immediate finger hold on placing the sequence on the density. This is, admittedly, rather specialised knowledge but time spent reading is time well spent.
After checking loose ends, and trying to visualize the zig-zag nature of the strands, you might be able to decide that P2 myelin is built up of 2 orthogonal β-sheets where the strands are indeed connected by simple up-and-down topology. The helix-turn-helix motif closes off one entrance to the β-barrel (to the left in Figure 3.9a) and the strong, connected skeleton at the barrel centre is a fatty acid ligand that interacts with arginine side chains that in turn result in even longer connections. The most difficult region to interpret is the front sheet in Figure 3.9, and it requires that one concentrates on the overview; the thought process goes something like this: ‘it is a sheet, made of strands, there are breaks, there is the zig-zag of a strand so this density is a side chain while this is a main chain’ and so on (Figure 3.10).
Figure 3.10 In the top panel, the view has been chosen to highlight the ‘front’ β-sheet where we have a break between RT3 and segment 9. This becomes even clearer in the lower panel once we see the up-down zig-zag of the β-strand, which allows us to decide what is side-chain and what is main-chain density. The strand density is ‘necking’ and produces a break in the calculated skeleton shown in Figure 3.11, as well as in the 2ry skeleton shown here.
It wasn’t necessary to look at the 2ry electron density, but sometimes it is useful to check it too. Now which skeleton do I edit, the one called 2ry or S3? I would edit S3 and periodically generate a new 2ry skeleton. If I thought the 2ry was complete enough, I would in fact re-skeletonise the 2ry map and create yet another skeleton molecule (S5, say) and edit that. I do not want to work on a skeleton that I might inadvertently remake, in this case with the Make 2ry framework option! Figure 3.11 shows the S3 skeleton MC object with the same strand/reverse-turn text that was added to Figure 3.9.
Figure 3.11 The S3 skeleton in two views to show the two β-sheets that were apparent in the 2ry skeleton of Figure 3.9.
Clearly the same features are present but they are harder to see because of the extra connections. In the top panel of Figure 3.11, we see that each strand has extra linkages because of strand-strand hydrogen bonding. Now we have to make the changes, so use the 2ry object to decide where to go, the S3_MSC object to make the connectivity changes and S3_MC to get an overview of the S3 skeleton. You will have to decide on some new connections too, so work by adding a new branch, grab it and put it where you want it, repeat the process until you are happy and set the appropriate new main-chain status codes. My edited skeleton is shown in Figure 3.12.
Figure 3.12 My edited S3 skeleton, which now has a single connected path between the 2 identified atoms.
The yellow areas of the skeleton are regions where I have used the Branch/Grab/Redefine-status technique to model my changes. As you can see, many side-chain branches are still coloured cyan, i.e. remain classified as main-chain. This looks a bit untidy but doesn’t matter; the fact that we have branches will become important later when they are used to guide Cα placement in loops connecting secondary structure. This skeleton is now ready to convert into a TRACE molecule if I know the directionality. It is also a good time to back up the molecule to the file system as described earlier.
In Figure 3.12 I have traced a single path through the electron density but in which direction does the protein chain go? To put it another way, is bones atom 356 at the N- or C-terminal end of the sequence? When I make this decision, I will have defined the directionality of my trace. Structures have been published where every secondary structure element has been built with the wrong directionality, so this problem is of more than just theoretical interest. We determine directionality by:
• The placement of the sequence in the electron density. In the case of P2 myelin protein, as I described above, I recognised a structural feature that suggested where to place the N-terminal residues, and then identified a G-X-W motif at this position. This hypothesis defined directionality and a sequence placement.
• Peptide bump identification.
Figure 3.13 The effect of resolution on the identification of branching from the main chain. In this example, model phases have been used to calculate the maps.
As one moves along the main chain, two types of branches will be observed if we have sufficiently high resolution diffraction data. One will correspond to the amino-acid side chains, and the other will be due to the carbonyl oxygen, the peptide bump. The side-chain branch will be (on average) longer than the peptide bump and so it should be obvious which is which. The distances between adjacent bumps (indicated by the yellow and magenta arrows in Figure 3.13) are different. If we move towards the C-terminus, starting at a side-chain bump, we will see short, long, short distances, but if we move towards the N-terminus it will be long, short, long etc. This can be done by inspection, or with tools that test the fitting of short fragments of structure in both directions. The Auto_2ry command evaluates both directions as it builds up segments of the TRACE molecule, while SST_build does a complete rotational search while fitting the centre of a secondary structure template (SST) at a skeleton atom. The Decor-commands, on the other hand, use the directionality defined by the user.
• Helix Christmas Tree
Recognising a helical region in a portion of skeleton does not define directionality, just handedness. However, in the helix we have a stronger signal than the peptide bumps alone, which is a consequence of the direction of the CA-CB vector relative to the helix axis. This vector points towards the N-terminus of the helix, with the side chains hanging like the decorations on a Christmas tree. Local averaging (Jones, T.A., 2004, Acta Cryst. D60, 2115-2125) works with the SST-building system to enhance this effect. The user needs to build a 7 or 9-residue helix centred on a skeleton atom (see the following section), which can then be used to produce operators to average the electron density for each residue in the helix. Since only the main-chain atoms will be related by these operators, the density is smeared out for all side-chain atoms after the CB atom. Figure 3.14 shows what happens for Helix1 when we fit helices with alternate directionality.
Figure 3.14 In (a), left, a nine-residue helix has been fitted to the experimental P2 myelin density with the SST system and then local averaging has been used to evaluate the directionality. In (b), right, the helix has been flipped and RS fitted and the density locally averaged. The fitting of the CA of the central residue indicates that the directionality is correct in (a) and wrong in (b). Note that the local averaging does not enhance the peptide bump in this example.
Local averaging does not work for β-strands because changing the direction does not have such a big effect on the position of the CB atom at the central residue in the SST. It can be used to enhance the peptide bump signal, however.
As one works on correcting the skeleton, SST fragments can be introduced as required to indicate directionality. Such information is extremely useful in strengthening or weakening a folding hypothesis. Care must be taken when the builder gets contrary indications for the directionality; the secret is knowing what to ignore but a folding hypothesis that contains many contradictory indicators should be treated with caution. At low resolution, there is unlikely to be a strong peptide-bump signal and eventually helical segments may appear with more or less random directionality. Phasing errors will also degrade these signals.
The local averaging tool is not yet in the Master-Menu system, but can be activated from the Density/Helix tree pull-down.
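The short/long alternation of peptide-bump spacings described earlier in this section can be sketched in a few lines of Python. This is only an illustration of the rule, not how O's tools evaluate directionality: the function name, the input (a list of distances between successive branch points, measured while walking along the main chain and starting at a side-chain branch) and the simple voting scheme are all assumptions.

```python
def guess_direction(spacings):
    """Guess chain direction from distances between successive branch points.

    `spacings` is a hypothetical list of distances (in Angstrom) measured
    while walking along the main chain, starting at a side-chain branch.
    Walking towards the C-terminus should give short, long, short, ...;
    walking towards the N-terminus should give long, short, long, ...
    """
    votes = 0
    # Compare each even-indexed spacing with the spacing that follows it.
    for i in range(0, len(spacings) - 1, 2):
        if spacings[i] < spacings[i + 1]:
            votes += 1   # short then long: consistent with walking N -> C
        elif spacings[i] > spacings[i + 1]:
            votes -= 1   # long then short: consistent with walking C -> N
    if votes > 0:
        return "towards C-terminus"
    if votes < 0:
        return "towards N-terminus"
    return "undetermined"
```

With made-up spacings, `guess_direction([1.5, 2.4, 1.6, 2.3])` votes for a walk towards the C-terminus. At low resolution the votes will tend to cancel, which is the ‘undetermined’ case where other indicators are needed.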
3.5 Secondary Structure Templates
Secondary structure templates (SSTs) are useful building blocks for creating atomic models (Jones, 2004). O works with a special molecule called TRACE that can contain many small SST fragments and/or larger frameworks generated within the template building system. Each fragment within this molecule will have its own directionality but contains no sequence information. For proteins, all residues are of type alanine, while for nucleic acids they are cytosines (I will not discuss specific nucleic acid model building tools in this HowTo). Interactive tools are available to work with different length SSTs, to place them, and manipulate them. SSTs can be combined to build longer segments, or can assist in the creation of more accurate models. Skeletons get 'stretched' at low resolution, so a more accurate main chain is produced if we place an SST on a skeleton rather than selecting CA guide-points from the skeleton and then generating the main chain with database auto-building tools (Jones & Thirup, 1986, EMBO J. 5, 819-822; Jones et al., 1991, Acta Cryst. A47, 110-119). A secondary structure framework generated by the user can act as seeds for the fully automatic model building tool, or for the interactive tools recommended for low resolution work.
SSTs have local directionality, since they are small peptides or nucleic acid fragments. They are generated and stored, however, in the TRACE molecule without any concern for the relationship of one fragment to another. Indeed they are stashed away in this molecule in any free space that becomes available. The ** SST’s ** blind is used to build and manipulate the TRACE molecule with the interactive tools, while the ** Decorate ** panel contains tools to build a protein trace that follows a skeleton. SSTs that have been built by the user can be used as seeds to build a more complete trace following a skeleton.
In the ** SST’s ** blind, the initial fit of individual SSTs to the electron density is a result of a complete 3D rotational search of the template centred on a skeleton atom, and can be further improved with a tool that optimises the fit by making rigid body local shifts to this group of atoms. All SST-related tools are available via the ** SST’s ** blind shown in Figure 3.15.
Figure 3.15 SST commands available in the ** SST’s ** blind. These commands affect the TRACE molecule, and the molecular object that gets generated from it.
Most of the commands in this panel map to a single SST-command, each of which is described in more detail in A-Z of O. However, each SST operation can be undone (to the usual 5 levels of depth). Grab an SST activates the Grab_group command to allow the builder to position a segment, while Optimize SST fit will carry out rigid-body real-space refinement of a segment to an electron density. The resulting TRACE object can be drawn as a poly-alanine (Draw TRACE) or as a Cα object (Draw CA TRACE). Note the difference between flipping an SST (derive a least-squares transformation that matches the first CA in the segment to the last, the second to the last but one etc. until the last one is matched to the first; the transformation is then applied to the whole segment) and reversing it (placing the last CA of a segment at the first in the segment, last but one at the second etc., then main-chain auto-building the CA segment).
Figure 3.16 Combining multiple SSTs into a single longer fragment. In (a) 2 overlapping α7 templates have been fitted to a skeleton. In (b) one of them has been trimmed back to remove the overlap, and in c/ they have been combined into a single fragment of 9 residues. Although one could have built an initial α9, the helix in c/ is slightly distorted but is a good fit to the density.
If the user wishes to build a long α-helix, or if the helix contains a bend or a portion of 3/10 helix, for example, one can build a number of overlapping SSTs, trim them as needed, and then combine them to make a longer fragment (Figure 3.16).
An example of how this blind can be used to assist in producing a better poly-alanine TRACE model will be given in the next section.
3.6 Decorating the TRACE
As the TRACE molecule gets built, whether interactively or automatically (see below), insertion and/or deletion errors (indels) will inevitably occur, most likely in loops and non-secondary structures, and the number of errors is likely to increase at lower resolution. The builder should be aware that even helical structures can be distorted to produce an ‘indel’ if one has generated a perfect α-helix instead of something that should in reality be a little more ‘stretched’. Usually, however, skeletons look stretched because of the limited resolution and it is precisely a perfect α-helix that should get built.
The aim of map interpretation is to build an atomic model of the molecule of interest such that the sequence is correctly positioned in the electron density map. O has a number of tools that can achieve this, but the Decor-commands are recommended as the method of choice. In this process, a main-chain is built first, and this is then used to decide where to place the sequence. The ** Decorate ** blind provides the tools for quickly building a TRACE molecule from a connected skeleton, as well as the tools to evaluate how to thread the sequence onto the TRACE, Figure 3.17.
Figure 3.17 Decorate commands available in the ** Decorate ** panel. These commands can be used to build the TRACE molecule, and to decide how to thread the sequence onto it.
The tools can be run step-by-step or all in one. The following description covers the individual steps that need to be made. Trace the skeleton activates the Decor_step_1 command, which just builds the TRACE along a path through a skeleton. The command requires that the user identifies connected start and end points in a skeleton, and will then generate a poly-alanine TRACE molecule that follows the path along the skeleton. After making the second identification, the user gets to see what the track through the skeleton looks like and has a chance to cancel the command (use Clear_flag). A new skeleton called KAT is actually made that contains the connected atoms between the pair of ID’ed atoms, which are colour coded to indicate helices, strands and loops. If the user has already started to build SSTs in the TRACE, he/she is prompted to see if they should be used or not (activate the YES or NO commands, as is appropriate). The N-terminus of the TRACE will be close to the first identified skeleton atom, and the C-terminus close to the second. In Figure 3.18a, I have started to trace the skeleton by identifying atom 356 and then 835 in the edited S3 molecule. These atoms have only one path that links them, Figure 3.18b, and this generates a TRACE containing 125 residues, Figure 3.18c. Both helices in this TRACE are somewhat distorted because of the limited resolution of the study and we would benefit from placing short helical SSTs where appropriate.
Figure 3.18 Trace the skeleton of edited S3 skeleton. In a/ atoms 356 to 835 in the skeleton are connected by just one path, shown in b/ to produce the TRACE molecule of 125 residues in c/. Note the distorted helices on the left of c/.
Figure 3.19 Trace the skeleton of edited S3 skeleton but with an existing TRACE. In a/ two α7 SSTs have been fitted to the skeleton. The same 2 points in S3 were identified and are connected by the same path as in 3.18b, to produce the TRACE molecule of 128 residues shown in b/.
Any pre-existing residues in the TRACE can be used to aid the building process. The user is able to use the full power of the SST panel to build suitable length fragments, to optimise their fit to the density, to flip them, or to merge them into a larger framework etc. They should be built with the directionality implied when identifying the skeleton atoms used for tracing, however. Each fragment will be used to seed the process of generating a single chain that follows the path of the skeleton. In Figure 3.19a, two α7 SSTs have been built onto the S3 skeleton prior to tracing the skeleton and the result is a TRACE of 128 residues that has a pair of regular helices, Figure 3.19b.
Figure 3.20 Building the secondary structure framework from the skeleton path
The algorithm for generating the TRACE first evaluates how well α5 and β5 SSTs fit at each skeleton atom on the path, Figure 3.20a, and uses the best fitting segments as outlined in Figure 3.20b to create a secondary structure framework. If SSTs already exist in the TRACE, they can be used as seeds for the segment extension process. If there are no segments defined initially, the best fitting 5-mer starts the process. If the second best fitting segment does not extend the first one, it becomes a potential second seed. Eventually a 5-mer may occur that can be combined with an existing seed so that the seed is extended.
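The seed-and-extend idea can be sketched as a greedy loop over the 5-mers, taken from best to worst fit. This is an illustration under stated assumptions, not O's implementation: the data layout (tuples of score, start position on the path, and template type), the score cutoff `min_score`, and the rule that a 5-mer extends a seed when it is of the same type and overlaps or touches it are all hypothetical.

```python
def build_framework(fits, seeds=(), min_score=0.5):
    """Greedy sketch of growing a secondary structure framework.

    fits  : hypothetical list of (score, start, kind) tuples, one per
            5-residue template tried along the skeleton path; `start` is
            the index of its first residue and `kind` is "alpha" or "beta"
    seeds : optional pre-built segments as (start, end, kind), end inclusive
    Returns the framework as a sorted list of (start, end, kind) segments.
    """
    segments = list(seeds)
    # Visit the 5-mers from best to worst fit.
    for score, start, kind in sorted(fits, reverse=True):
        if score < min_score:      # assumed cutoff for a usable fit
            break
        end = start + 4
        # Skip a 5-mer that overlaps an existing segment of the other type.
        if any(k != kind and start <= e and end >= s for s, e, k in segments):
            continue
        # Absorb every same-type segment that the 5-mer overlaps or touches.
        touching = [seg for seg in segments
                    if seg[2] == kind and start <= seg[1] + 1 and end >= seg[0] - 1]
        for seg in touching:
            segments.remove(seg)
            start, end = min(start, seg[0]), max(end, seg[1])
        # With nothing to absorb, the 5-mer simply becomes a new seed.
        segments.append((start, end, kind))
    return sorted(segments)
```

For example, with invented scores, the fits `(0.9, 10, "alpha")` and `(0.7, 12, "alpha")` merge into one helical segment covering positions 10 to 16, while a weaker helical 5-mer that overlaps an already accepted strand is ignored.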
Figure 3.21 Building the connections between segments in the secondary structure framework
The loops between secondary structure segments and at the N- and C-termini are built by following the skeleton path. The degree of branching along the skeleton is calculated to decide if there is a signal associated with carbonyl oxygens or not. In the example shown in Figure 3.21, there are 1.6 peptides on average between each branch and so there is no carbonyl oxygen branching signal. At higher resolution, there would be branch points for (at least some) carbonyl oxygens as well as for side-chains and they can be assigned by evaluating their separations. Guide points for CA atoms can, therefore, be determined and the loops can then be built from the main-chain database. The secondary structure elements are defined by Yasspa for helices and strands and only these segments will be used in subsequent decorating steps. This is to minimise the potential of indel-errors when we have to decide on the final threading of the actual sequence onto the poly-alanine TRACE.
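The branching statistic is simple arithmetic and can be sketched as follows. The function names and the `cutoff` value are assumptions made for illustration; the text does not state how O estimates the number of peptides or which threshold it applies. Here the number of peptides is estimated from the path length and the usual Cα-Cα separation of about 3.8 Å, which will overestimate the count where the skeleton is stretched.

```python
def peptides_per_branch(path_length, n_branches, ca_ca=3.8):
    """Average number of peptides between branch points on a skeleton path.

    path_length : length of the skeleton path in Angstrom
    n_branches  : number of branch points counted along the path
    ca_ca       : typical CA-CA separation used to estimate peptide count
    """
    n_peptides = path_length / ca_ca
    return n_peptides / n_branches


def has_carbonyl_signal(path_length, n_branches, cutoff=0.75):
    """True if branching is dense enough to suggest carbonyl-oxygen bumps.

    `cutoff` is an arbitrary illustrative value, not one taken from O.
    Side chains alone give roughly one branch per peptide; visible
    carbonyl oxygens would push the ratio well below one.
    """
    return peptides_per_branch(path_length, n_branches) < cutoff
```

A ratio of 1.6, as in the example in the text, falls well above any such cutoff, so only side-chain branching would be assumed.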
The TRACE is a molecule and can be manipulated with most of the model-building tools available in O. One could improve the fit to the density by making a rebuilding pass through the molecule (Grab_build), for example. One might decide that an extra residue is needed in a particular loop, in which case one could go back to the SST panel to split the segment at the desired place, then combine the pieces again with the extra insert. Alternatively, one might want to delete a residue (split the segment, trim an end, and combine). A TRACE segment can be regularized, but it cannot be mutated. No real-space optimization to the electron density is made in the Trace the skeleton tool. O’s real-space refinement tools work on rigid bodies, rotamers or a zone of residues. In the latter case, Fm_rsr_zone, each residue is split into a series of groups in the stereo_chemistry.odb dictionary. These groups undergo local rotational and translational shifts to optimize their fit to the density but are prevented from moving too far from other groups containing the same atoms. The algorithm works well at medium to high resolution but will distort the model at lower resolution (or with poor quality maps). The user can, of course, decide to optimize the fit, but this has to be done by explicitly activating the Fm_rsr_zone function. It is not essential that you optimize the fit of the TRACE to the density, however, since what concerns us the most is the secondary structure framework of the TRACE. This framework should already fit the density and within each short segment of secondary structure, there should be no residue insertion or deletion error. We will use these parts of the TRACE to decide on how to thread the sequence and assume that any register issues between the sequence and the model arise due to errors in the loops. Provided the TRACE contains a single segment, it is now possible to work on the threading step using the Decorate the TRACE tool.
If you have multiple segments in your TRACE, you can activate the separate Decor_guess, Decor_yasspa and Decor_slider commands (see A-Z of O) but I strongly recommend working with a single segment in your TRACE.
Decorate the TRACE activates the Decor_step_2 command, which allows us to arrive at the optimal way of placing the actual sequence onto the structure of the poly-alanine model. As a first step, we evaluate how well the twenty different amino-acids can be fitted to the density at each position in the poly-alanine chain (Zou & Jones, 1996). Operationally, we only test a subset of the possible twenty (ALA, SER, VAL, LEU, ASP, GLU, ILE, MET, PHE, TRP, HIS, LYS, ARG, PRO) since some amino-acids are very similar in shape and size. All rotamers are generated for each of these amino-acids in turn, and every rotamer is real-space fitted to the electron density, while allowing a local pivot around the Cα atom. To localise the density associated with the side chain, a mask is generated based on the shape and height of the density, within which the real space fit (a correlation coefficient) is calculated. This prevents the erroneous placement of long side-chains in density meant for small ones. The score is saved in the user DB as entry TRACE_RESIDUE_GOF.
The threading of the actual sequence onto the TRACE is made with a dynamic programming algorithm (DPA) that finds the optimal path of the actual sequence through the goodness-of-fit (GOF) data (Jones, 2004). The TRACE is split into the consecutive segments that correspond to the secondary structure elements where it is assumed that no indels are allowed. In the connecting loops, however, register shifts of up to +/- 2 residues are accepted. The full set of tools is more general than those that are accessible from the menu system in Figure 3.17. They allow the user to specify any set of TRACE segments, as well as amino-acid preferences at particular points in the TRACE, for example. Decorate the TRACE combines the general set of tools (Decor_guess, Decor_yasspa and Decor_slider commands) to work on a single TRACE segment, threading at only secondary structure segments. Decorate the skeleton combines even more options into a single command, Decor_easy. As a first step, it builds the TRACE, optimises the fit to the density, evaluates the goodness-of-fit of each side-chain rotamer at each residue in the TRACE (so-called decorating the TRACE), and then threads the sequence onto the TRACE. It, therefore, combines the Trace the skeleton option with the Decorate the TRACE but also includes a real-space fit of the TRACE to the electron density.
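To make the threading step concrete, here is a minimal dynamic programming sketch in Python. It is not the algorithm used by the Decor commands, only a toy with the same constraints: no indels inside a secondary structure segment, and a loop whose length in the sequence may differ from its length in the TRACE by at most `max_shift` residues. The function name, the data layout and the use of a plain sum of per-residue GOF values as the score are assumptions.

```python
def thread_sequence(sequence, segments, gof, max_shift=2):
    """Thread `sequence` onto the secondary structure segments of a trace.

    segments : list of (trace_start, length) in chain order (hypothetical)
    gof      : gof[trace_index][residue_letter] -> goodness of fit;
               missing residue types score 0.0
    Returns (best_score, [sequence index of the first residue per segment]).
    """
    n = len(sequence)

    def seg_score(start, length, offset):
        # Sum the fit of the sequence residues placed on this segment.
        return sum(gof[start + j].get(sequence[offset + j], 0.0)
                   for j in range(length))

    # best[k][offset] = (score so far, offset chosen for segment k - 1)
    best = []
    for k, (start, length) in enumerate(segments):
        layer = {}
        for offset in range(n - length + 1):
            here = seg_score(start, length, offset)
            if k == 0:
                layer[offset] = (here, None)
                continue
            p_start, p_len = segments[k - 1]
            trace_loop = start - (p_start + p_len)   # loop length in the trace
            candidates = []
            for shift in range(-max_shift, max_shift + 1):
                loop = trace_loop + shift            # loop length in the sequence
                prev = offset - loop - p_len
                if loop >= 0 and prev in best[k - 1]:
                    candidates.append((best[k - 1][prev][0] + here, prev))
            if candidates:
                layer[offset] = max(candidates)
        best.append(layer)

    # Trace back from the best placement of the last segment.
    offset = max(best[-1], key=lambda o: best[-1][o][0])
    score = best[-1][offset][0]
    offsets = []
    for k in range(len(segments) - 1, -1, -1):
        offsets.append(offset)
        offset = best[k][offset][1]
    return score, offsets[::-1]
```

In a toy case with the sequence `GAWSGKLVG`, two 2-residue segments separated by a 2-residue loop in the trace, and density that favours A, W at the first segment and L, V at the second, the best thread places the segments at sequence positions 1 and 6. The loop is then 3 residues long in the sequence against 2 in the trace, the kind of registry difference that the decoration flags.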
Figure 3.22 Decorate the skeleton on the averaged P2 myelin electron density. The skeleton in a/ has been slightly edited to have only one path from bones atom 40 to 157. b/ shows the poly-alanine TRACE that has been generated, and displayed to show the secondary structure elements used in the sequence decoration. The best fitting thread of the sequence is restricted to just these secondary structure elements, in a green 3D text object. Registry differences in this thread, relative to the TRACE structure, are indicated by red ‘*‘s. Possible TRP residues are indicated in cyan.
As a result of the threading, the builder sees the decoration of the TRACE with the optimal placement of the sequence along the secondary structure elements. If the threading result between adjacent secondary structure elements differs in length from the TRACE model, the discrepancy is indicated by red stars, Figure 3.22. Potential TRP residues are also indicated to assist in identifying lighthouse markers in the structure. The threading will produce errors, however, because of the quality of the experimental map, but even in good maps, the threading can be in error. For example, if two interacting residues have continuous density, the decoration GOF indicators for both residues might indicate reasonable scores for a longer residue at both Cα positions. Tools, therefore, are needed to allow the builder to modify how the sequence is threaded onto the TRACE.
Figure 3.23 Modifying a decoration (triangles indicate items mentioned in the text). a/ In the best thread, the first segment of secondary structure is assigned to residues 7-9 in the sequence and strong density is indicated for Trp8. The second segment is assigned to residues 11-13 and gives rise to a registry problem between the TRACE molecule and the sequence thread, as indicated. A second registry problem occurs between the second and third segments. b/ I have decided that the problem is due to an error in the decoration of the second segment. I decide that residue 11 in the TRACE is not residue 11 in the sequence; it should be residue 12 in the sequence. This is accomplished by forcing this assignment by selecting residue 12 in the sequence slider window, activating the Fix a residue in sequence operation and identifying residue 11 in the TRACE. This causes the DPA to be re-run with this constraint and a new thread is generated through the complete TRACE. The fact that the segment has been fixed in the sequence is shown by drawing the decoration of the second segment with white text.
In Figure 3.23a, the first and third segments are correctly aligned with the sequence, but the decoration of the second segment is out of registry with both. It is up to the builder to make these decisions and to introduce constraints on how the sequence is threaded onto the TRACE. Fixed sequence-segments are indicated by colouring the sequence decoration white instead of green. The builder can, of course, revert to freeing a segment.
At this stage only the suggested threading and associated decoration of the poly-alanine with the suggested sequence alignment is indicated to the builder. In particular, no model is generated until the builder has decided on a particular thread. The builder can pause, save a thread to the O database (ODB) and return to the job later. Once an acceptable thread has been generated, an all atom model will be constructed for the portion of interest. Within the decorated segments, the TRACE backbone is directly copied over into the new model, and side-chain rotamers are fitted to the density. If connecting loops between adjacent segments are in size registry with the TRACE, the same process is used. When these loops differ in length, the main chain is built to follow the skeleton, generating guide Cα atoms as appropriate, with an inter Cα separation that is modified to reflect the number of residues to be added. If the registry difference is too large, the main-chain will be very approximate. Side-chains will be added by fitting rotamers to the density. If the builder decides that a particular connecting loop has a very large registry error with respect to the TRACE, it may be more appropriate to build the structure in two separate passes.
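The idea of adjusting the inter-Cα separation when a loop differs in length from the TRACE can be illustrated by spacing guide points evenly along the skeleton path. This is a sketch under assumptions, not O's procedure: the function name and the choice of exactly even spacing are hypothetical, and the path is taken to run between two Cα atoms that are already built.

```python
import math


def guide_points(path, n_residues):
    """Place `n_residues` guide CA positions evenly along a polyline.

    path : list of (x, y, z) skeleton points running from the last CA of
           one segment to the first CA of the next; both end points are
           assumed to be built already, so they are not returned
    Returns (spacing, points), where spacing is the adjusted CA-CA distance.
    """
    lengths = [math.dist(a, b) for a, b in zip(path, path[1:])]
    spacing = sum(lengths) / (n_residues + 1)
    points = []
    target = spacing     # distance along the path of the next guide point
    walked = 0.0         # path length covered by the edges already visited
    for (a, b), seg in zip(zip(path, path[1:]), lengths):
        # Drop as many guide points on this edge as fall within it.
        while (len(points) < n_residues and seg > 0
               and target <= walked + seg + 1e-9):
            t = (target - walked) / seg
            points.append(tuple(p + t * (q - p) for p, q in zip(a, b)))
            target += spacing
        walked += seg
    return spacing, points
```

A returned spacing far from the usual value of about 3.8 Å is a warning that the registry difference is large and that the resulting main chain will be very approximate.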