NEWS FROM THE UPPSALA SOFTWARE FACTORY - 8
Les amis d'O
Gerard J. Kleywegt
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden
In this article we shall take a closer look at three of the lesser-known
utility programs from Uppsala that work in conjunction with O.
SOD stands for "Sequences to O
Datablocks", i.e. it is a program that converts sequences
into information that can be used in or by O. The
program can read individual and aligned sequences in a number of formats.
It can be used to do the following:
- to generate an O
datablock of the sequence of your protein (task INIT). This is a quick way to get
this information into O
when you are about to assign the sequence to your Cα trace. The datablock produced by SOD
can be used with the sam_init_db
command in O
to initialise the data structures for a new protein molecule.
- to generate O
macros with which one can quickly build a homology model or, more interestingly,
a molecular replacement search model (task HOMO). In order to do this, SOD requires
two aligned sequences, one of a protein whose structure is known (and available -
which is not always the same, unfortunately), and one of a related protein for which you
want to generate a model. SOD will use the aligned sequences to generate an O
macro which consists mostly of Mutate commands (such as mutate_replace). In the case
of homology modelling, all residues that differ in the two sequences
will be replaced by the residue type of the protein for which one wants to build
a model, and deletions and insertions are included (although they will have to be
modelled by the user). If one generates a molecular replacement search probe, on the other
hand, residues that differ between the two sequences will be replaced by alanines,
deletions are carried out, but insertions will not be made. (Other tools for generating
molecular replacement search models were discussed in a previous episode in this series.)
- to do a pairwise comparison of one sequence with one or more others (task PAIR).
For each comparison an O
datablock will be generated which contains an integer code for every residue: 0 =
conserved residue, 1 = mutation, 2 = insertion in other sequence, 3 = deletion in
other sequence, 4 = outside other sequence. This datablock can be used to colour
the molecule, e.g. using the paint_case
command; a Cα trace will then reveal where mutations, insertions and deletions occur in the 3D
structure.
- to analyse multiple aligned sequences (task MULT). This option is useful for presenting
information regarding sequence conservation in a large family of related proteins
(provided the structure of one of them is available). The following O
datablocks will be produced (assuming that the molecule is called "M1" in O):
  - M1_RESIDUE_POSSIBLE - listing all residue types encountered for every residue;
  - M1_RESIDUE_CONSERVED - degree of conservation (%) of each residue type in the sequence;
  - M1_RESIDUE_VARIATION - a count of the number of different residue types observed
    at each position;
  - .ID_SOD - a temporary .id_template showing all of the above properties when you
    click on an atom;
  - @M1_SOD - a macro to produce three objects from your molecule: CONS (Cα trace
    colour-ramped by M1_RESIDUE_CONSERVED), VARI (Cα trace colour-ramped by
    M1_RESIDUE_VARIATION), and GRAD (Cα trace coloured in steps according to
    M1_RESIDUE_CONSERVED).
  Simply reading the datablock file into O and executing the macro will produce the
  three graphics objects.
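As a rough illustration of what task INIT produces, the sketch below turns a one-letter sequence into text in the general layout of an O datablock file, as we understand it: a header line with the datablock name, a type letter, the number of entries and a Fortran format, followed by the values. The datablock name M1_RESIDUE_TYPE, the three-letter codes and the (15(1x,a3)) format are assumptions for illustration only; SOD's actual output may differ, so check the SOD manual before relying on it.

```python
# Hypothetical sketch: write a one-letter sequence as an O-style character
# datablock. Name, codes and Fortran format are illustrative assumptions.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def sequence_to_datablock(name: str, sequence: str, per_line: int = 15) -> str:
    """Return text for a character datablock with one entry per residue.

    Non-letters (spaces, digits) are skipped; unknown one-letter codes
    such as X raise KeyError.
    """
    residues = [THREE_LETTER[aa] for aa in sequence.upper() if aa.isalpha()]
    # Header: datablock name, type letter (C = character), entry count, format.
    lines = [f"{name} C {len(residues)} ({per_line}(1x,a3))"]
    for start in range(0, len(residues), per_line):
        chunk = residues[start:start + per_line]
        lines.append("".join(" " + r for r in chunk))  # 1x,a3 per entry
    return "\n".join(lines) + "\n"


print(sequence_to_datablock("M1_RESIDUE_TYPE", "MKTAYIAK"), end="")
```

The printed text starts with the header `M1_RESIDUE_TYPE C 8 (15(1x,a3))`, followed by one line holding the eight residue names.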
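The decision rules described for task HOMO can be sketched as a walk along two aligned sequences: in homology mode differing residues take the target type and insertions are kept, while in molecular replacement mode differing residues become alanine and insertions are skipped. This minimal sketch only lists abstract edit actions and does not write real O commands. The action names (replace, delete, insert), the '-' gap character and the residue numbering are assumptions made for illustration.

```python
from typing import List, Tuple

# (action, residue number in the known structure, one-letter residue type)
Action = Tuple[str, int, str]


def plan_model(known: str, target: str, mr_probe: bool = False) -> List[Action]:
    """List the edits that turn the known structure into a model of `target`.

    known  -- aligned sequence of the protein whose structure is known
    target -- aligned sequence of the protein to be modelled ('-' = gap)
    """
    if len(known) != len(target):
        raise ValueError("aligned sequences must have equal length")
    actions: List[Action] = []
    resnum = 0  # last residue number seen in the known structure
    for k, t in zip(known, target):
        if k != "-":
            resnum += 1
        if k == "-" and t == "-":
            continue
        if k == "-":
            # Insertion in the target after residue `resnum` (0 = N-terminus):
            # kept for homology models, left out of MR search probes.
            if not mr_probe:
                actions.append(("insert", resnum, t))
        elif t == "-":
            # Residue absent from the target: deleted in both modes.
            actions.append(("delete", resnum, k))
        elif k != t:
            # Mutation: target residue type, or alanine for an MR probe.
            actions.append(("replace", resnum, "A" if mr_probe else t))
    return actions


print(plan_model("ACDEFG-H", "ACNE-GKH"))
print(plan_model("ACDEFG-H", "ACNE-GKH", mr_probe=True))
```

A tool like SOD would then translate such actions into macro lines; that translation step is deliberately left out here.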
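The integer codes used by task PAIR can be derived by comparing two aligned sequences column by column, producing one code per residue of the first sequence. In this minimal sketch, assuming '-' as the gap character, code 2 is assigned to the residue of the first sequence that immediately precedes an insertion in the other sequence (overwriting its earlier code). That convention is our assumption, not necessarily SOD's.

```python
from typing import List

CONSERVED, MUTATION, INSERTION, DELETION, OUTSIDE = 0, 1, 2, 3, 4


def pair_codes(ours: str, other: str) -> List[int]:
    """Return one integer code per residue of `ours` (gap columns excluded)."""
    if len(ours) != len(other):
        raise ValueError("aligned sequences must have equal length")
    # First and last alignment columns covered by the other sequence.
    filled = [i for i, c in enumerate(other) if c != "-"]
    first, last = (filled[0], filled[-1]) if filled else (len(other), -1)
    codes: List[int] = []
    for i, (a, b) in enumerate(zip(ours, other)):
        if a == "-":
            # Extra residue in the other sequence: flag our preceding
            # residue (assumed convention), if there is one.
            if b != "-" and codes:
                codes[-1] = INSERTION
            continue
        if b == "-":
            codes.append(OUTSIDE if i < first or i > last else DELETION)
        else:
            codes.append(CONSERVED if a == b else MUTATION)
    return codes


print(pair_codes("MKT-AYLG--", "--TQSYLGKK"))
```

Written out as an integer residue datablock, a list like this is the kind of input the paint_case colouring described above would use.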
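The three per-residue properties listed for task MULT can be sketched for the residues of the reference sequence (the one with a known structure). We assume here that "conservation" means the percentage of all aligned sequences carrying the same residue type as the reference at that position (a gap counts as a mismatch), and that gaps are ignored when listing and counting residue types. SOD's exact definitions may differ.

```python
from collections import Counter
from typing import Dict, List


def mult_stats(alignment: List[str], ref: int = 0) -> List[Dict[str, object]]:
    """Per-residue statistics for sequence `ref` of a multiple alignment ('-' = gap)."""
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    stats = []
    for col in range(width):
        ref_type = alignment[ref][col]
        if ref_type == "-":
            continue  # no residue of the reference structure in this column
        column = [s[col] for s in alignment]
        types = Counter(c for c in column if c != "-")
        stats.append({
            # Residue types seen at this position (cf. M1_RESIDUE_POSSIBLE).
            "possible": "".join(sorted(types)),
            # % of sequences matching the reference (cf. M1_RESIDUE_CONSERVED).
            "conserved": 100.0 * column.count(ref_type) / len(column),
            # Number of distinct residue types (cf. M1_RESIDUE_VARIATION).
            "variation": len(types),
        })
    return stats


for row in mult_stats(["ACD", "ACE", "GCD", "AC-"]):
    print(row)
```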
One of the useful features in O is its use of datablocks (of type real, integer,
character or text) to represent information pertaining to a molecule as a whole,
to each of its residues, or to each of its atoms. Although O contains a number of
commands to manipulate datablocks, a separate utility program (ODBMAN, for
"O DataBlock MANipulation") is also available. Its options (besides trivial
I/O-related ones) fall into the following categories:
- extracting information from other sources (EXtract commands). With these commands,
information can be extracted from formatted files and stored as real or integer datablocks.
The input can be either formatted or field-oriented (i.e., containing fields separated
by tabs or spaces). A separate option is available to extract information from a
PROCHECK output file (residue type and name, secondary structure assignment according
to the DSSP algorithm, the area of the Ramachandran plot in which each residue lies,
the number of bad contacts for each residue, and its H-bond energy).
- manipulating individual entries of a datablock (SEt commands). This includes options
to set all entries of a datablock to a particular value, to set a consecutive stretch
of entries to a particular value, to set a consecutive stretch of entries individually, and
to "translate" information from other datablocks (e.g., to generate an integer
datablock representing secondary structure from a character
datablock). Using these options, it is not too difficult to colour a protein structure
in the colours of the Dutch flag, for instance.
- manipulating entire datablocks. This includes options to do simple arithmetic on
integer and real datablocks, to smooth datablocks, and to modify character datablocks.
- analysis of datablocks. Options are available to list some statistics and to produce
histograms of individual datablocks, and to produce line plots and scatter plots
of individual datablocks or of one datablock versus another.
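Field-oriented extraction, as described for the EXtract commands, amounts to taking one whitespace-separated field from each line and storing the values as a real datablock. In this sketch the 0-based column index, the skipping of blank lines, the datablock name and the (5f12.3) output format are illustrative assumptions, as is the header layout (name, type letter, entry count, Fortran format); none of it reproduces ODBMAN's actual options.

```python
from typing import Iterable, List


def extract_column(lines: Iterable[str], column: int) -> List[float]:
    """Take field `column` (0-based) from each non-blank line."""
    values = []
    for line in lines:
        fields = line.split()  # splits on any run of spaces and/or tabs
        if fields:
            values.append(float(fields[column]))
    return values


def real_datablock(name: str, values: List[float], per_line: int = 5) -> str:
    """Format values as an O-style real datablock (header layout assumed)."""
    out = [f"{name} R {len(values)} ({per_line}f12.3)"]
    for start in range(0, len(values), per_line):
        # Each value is written right-aligned in 12 columns with 3 decimals (f12.3).
        out.append("".join(f"{v:12.3f}" for v in values[start:start + per_line]))
    return "\n".join(out) + "\n"


text = "1 ALA 12.5\n2 GLY\t20.0\n\n3 SER 31.25\n"
print(real_datablock("M1_RESIDUE_BFAC", extract_column(text.splitlines(), 2)), end="")
```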
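The "translate" option of the SEt commands can be pictured as a lookup table applied entry by entry, for example turning a character datablock of secondary structure codes into an integer datablock that can drive colouring. In this minimal sketch the DSSP-style letters, the integer values (1 = helix, 2 = strand, 0 = anything else) and the helper names are hypothetical; a second helper sets a consecutive stretch of entries to one value.

```python
from typing import Dict, List

# Hypothetical mapping from DSSP-style secondary-structure codes to integers.
SS_TO_INT: Dict[str, int] = {"H": 1, "G": 1, "I": 1, "E": 2, "B": 2}


def translate(chars: List[str], table: Dict[str, int], default: int = 0) -> List[int]:
    """Map each character entry to an integer; unknown entries get `default`."""
    return [table.get(c.strip().upper(), default) for c in chars]


def set_stretch(values: List[int], first: int, last: int, value: int) -> List[int]:
    """Return a copy with entries first..last (1-based, inclusive) set to `value`."""
    if not 1 <= first <= last <= len(values):
        raise ValueError("stretch outside datablock")
    result = list(values)
    result[first - 1:last] = [value] * (last - first + 1)
    return result


print(translate(["H", "H", "-", "E", "E", "T"], SS_TO_INT))
print(set_stretch([0] * 6, 2, 4, 9))
```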
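Smoothing a per-residue datablock is commonly done with a sliding-window average. The sketch below uses a centred running mean whose window is truncated at the chain ends; the window shape and the end treatment are our choices, and ODBMAN's own smoothing method may differ.

```python
from typing import List


def smooth(values: List[float], half_width: int = 2) -> List[float]:
    """Centred running mean over 2*half_width+1 entries, truncated at the ends."""
    n = len(values)
    result = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = values[lo:hi]
        result.append(sum(window) / len(window))
    return result


print(smooth([0, 0, 3, 0, 0], half_width=1))
```

A wider window gives a smoother profile, at the cost of blurring features that are only a residue or two long.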
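The statistics and histogram options can be sketched in a few lines. Here the statistics are assumed to be the count, minimum, maximum, mean and population standard deviation, and the histogram uses equal-width bins spanning the range of the data; both are assumptions about what such a listing would contain, not a description of ODBMAN's output.

```python
import math
from typing import Dict, List


def stats(values: List[float]) -> Dict[str, float]:
    """Count, range, mean and population standard deviation of a datablock."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    return {"n": n, "min": min(values), "max": max(values),
            "mean": mean, "sd": math.sqrt(var)}


def histogram(values: List[float], nbins: int) -> List[int]:
    """Counts in `nbins` equal-width bins over [min, max]; the maximum goes in the last bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins or 1.0  # avoid dividing by zero if all values are equal
    counts = [0] * nbins
    for v in values:
        counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(stats(data))
print(histogram(data, 7))
```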
Many of the Uppsala programs (e.g., ODBMAN, MOLEMAN2, LSQMAN, DATAMAN, MAPMAN)
produce (ASCII) plot files in a meta-format. O2D is a simple program to convert
such plot files into other formats. Usually, this
program will be used to convert plot files into PostScript files, but the program
can also produce tab-delimited ASCII files, which can be read by most popular spreadsheet
and graphing programs on the market, and hence used to produce more professional-looking
graphs. The SGI version of this program also allows the user to plot data
interactively in graphics windows. O2D can produce line and scatter plots, histograms and
simple pie charts of 1D data, and contour plots of 2D data. In interactive mode,
there are also simple facilities for integrating curves or contour plots and for
manipulating the display. The (ASCII) meta-format for both 1D and 2D plots is simple,
using six-character keywords (see the manual for details). There is also a C-shell script
available which will do a batch conversion of many plot files to PostScript.
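Two of the operations mentioned for O2D are easy to picture in code: writing (x, y) data as tab-delimited ASCII for a spreadsheet, and integrating a curve. This sketch does not parse the O2D plot meta-format (its six-character keywords are documented in the O2D manual); it starts from plain lists of numbers. The trapezoidal rule used here is our choice and not necessarily O2D's integration method.

```python
from typing import List, Sequence


def to_tab_delimited(x: List[float], y: List[float],
                     header: Sequence[str] = ("x", "y")) -> str:
    """One 'x<TAB>y' row per point, preceded by a header row."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rows = ["\t".join(header)] + [f"{a}\t{b}" for a, b in zip(x, y)]
    return "\n".join(rows) + "\n"


def trapezoid(x: List[float], y: List[float]) -> float:
    """Area under the piecewise-linear curve through the points (trapezoidal rule)."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


print(to_tab_delimited([0, 1], [0.5, 2]), end="")
print(trapezoid([0, 1, 2], [0, 2, 2]))
```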
SOD, ODBMAN, and O2D are part of the X-UTIL package, which is available free of charge
to academic users from
ftp://xray.bmc.uu.se/pub/gerard/xutil/. Commercial users may
contact GJK for more information (firstname.lastname@example.org). For more information
about O, contact Alwyn Jones (email@example.com). The O WWW site is at
http://imsb.au.dk/~mok/o/, and the Uppsala Software Factory can be found at
http://xray.bmc.uu.se/usf/.
Jones, T.A., Zou, J.Y., Cowan, S.W. and Kjeldgaard, M. (1991). Improved methods
for building protein models in electron density maps and the location of errors in
these models. Acta Crystallogr.
The SOD manual is available at URL: http://xray.bmc.uu.se/usf/sod_man.html
Jones, T.A. and Kjeldgaard, M. (1997). Electron density map interpretation. Meth. Enzymol., in press.
The ODBMAN manual is available at URL: http://xray.bmc.uu.se/usf/odbman_man.html
Laskowski, R.A., MacArthur, M.W., Moss, D.S. and Thornton, J.M. (1993). PROCHECK:
a program to check the stereochemical quality of protein structures. J. Appl. Cryst.
The O2D manual is available at URL: http://xray.bmc.uu.se/usf/o2d_man.html
Last updated on 12 February 1998.