NEWS FROM THE UPPSALA SOFTWARE FACTORY - 9
Déjà-vu all over again
Gerard J. Kleywegt
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden
While building a protein model into electron density, one often comes across features
of the model that make one wonder: "(where) have I seen this before?". At the level
of the overall fold, there is plenty of software available nowadays that can help
answer this question (DALI, DEJAVU, TOP, etc.). But when it comes to recognising
smaller "motifs" (e.g., a set of residues involved in binding a ligand or metal ion,
or with seemingly "unusual" side chain-side chain interactions), answering the
question "has this been observed in any other protein structure?" is not as simple.
At the 1995 CCP4 meeting, Peter Artymiuk described a program called ASSAM
that could recognise spatial arrangements of side chains by comparing them to a database
of protein structures. This provided the inspiration for the SPASM package
that contains programs for the recognition of arbitrary patterns or motifs in protein
structures, interfaced with O and other programs.
SPASM is a program that can be used to recognise user-defined motifs in a database
of protein structures (derived from the PDB). The user merely has to carve out those
residues that (s)he is interested in (e.g., catalytic residues, a strange loop, ligand-binding residues, a weird Met-Trp interaction, a helix-turn-helix motif, etc. etc.;
whatever is selected will be referred to as a "motif" from now on) and put them
into a small PDB file. The program will read this file as well as its database,
will prompt for values for a few parameters (the default values will do in most cases), and
will subsequently find all instances of the motif in the proteins that are in the
database. (The nitty-gritty and some of the bells and whistles are discussed in
the SPASM paper and manuals listed at the end of this article.)
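The "carving out" step described above can be pictured with a short Python sketch. It is not part of the SPASM package; the function name `extract_motif` and the way residues are selected (by chain identifier and residue number) are assumptions made for illustration. The only thing taken as given is the fixed-column layout of ATOM and HETATM records in PDB files.

```python
def extract_motif(pdb_lines, residues):
    """Return the ATOM/HETATM records that belong to the selected residues.

    pdb_lines: iterable of lines from a PDB file.
    residues:  set of (chain_id, residue_number) pairs, e.g. {("A", 57)}.
    This helper is hypothetical and only illustrates the idea.
    """
    selected = []
    for line in pdb_lines:
        # Only coordinate records are of interest; skip everything else.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Skip records too short to hold a chain and residue number.
        if len(line) < 26:
            continue
        chain_id = line[21]          # column 22: chain identifier
        res_seq = int(line[22:26])   # columns 23-26: residue sequence number
        if (chain_id, res_seq) in residues:
            selected.append(line)
    return selected
```

Writing the returned lines to a file gives a small PDB file of the kind the text describes. Insertion codes and alternate conformations are ignored in this sketch.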
Besides simply listing the "hits", SPASM can also generate a macro file for use with
O which, when executed, will automatically read the hits, apply the rotation-translation
operator that superimposes the hits with the user's motif, and draw the hits. Thus, within five to ten minutes one obtains a visual answer to the original question:
"(where) has this motif been observed previously ?
If you find hits that display similarity to your own protein that extend beyond the
matched motif (e.g., similar fold or domain), global superpositioning of the hits
and your own model can be carried out by LSQMAN. An input file for LSQMAN that does
this can be generated by SPASM as well, making this a very rapid process. Finally, an
interface exists to the SBIN package of programs, which can be used to analyse
superimposed structures to find similarities in their sequences. These, in turn,
can be used to attempt "database mining" in sequence databases such as SWISS-PROT,
in the hope of identifying other proteins that might have the same fold, or share
a common domain.
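The rotation-translation operator that brings a hit onto the user's motif or model, mentioned above, can be illustrated with a minimal Python sketch. The convention used here (x' = R x + t, with R given as three rows) and the function names `apply_operator` and `rmsd` are assumptions for illustration; they are not taken from SPASM, LSQMAN or O, whose own file formats and conventions may differ.

```python
import math


def apply_operator(coords, rotation, translation):
    """Apply x' = R x + t to every (x, y, z) coordinate.

    rotation:    3x3 matrix as a sequence of three rows (assumed convention).
    translation: sequence of three numbers.
    """
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return moved


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))
```

After applying the operator to a hit, the RMSD between the moved atoms and the matching atoms of the motif gives a simple measure of how well the two superimpose.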
RIGOR is another program in the SPASM package that does in essence the opposite of
SPASM. Where SPASM compares a user-defined motif to a database of protein structures,
RIGOR looks for instances of a large number of predefined motifs in the user's model.
Of course, the utility of this approach depends critically on the quality of the
motif database. At present, it contains a few hand-crafted motifs, but the overwhelming
majority has been generated automatically. These automatically generated motifs
were extracted from proteins in the SPASM database, and consist mostly of sets of residues whose
side chains cluster in space, or are all in close proximity to a hetero-entity.
Just like SPASM, RIGOR is interfaced to O allowing for rapid visualisation of the
results. Users are welcome to submit additional motifs for inclusion in future releases
of the RIGOR database. Eventually, I hope to develop software that takes a more
intelligent approach to detecting motifs that recur in several or many structures.
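One of the automatic criteria mentioned above, residues that all lie close to a hetero-entity, can be sketched in a few lines of Python. This is not the procedure actually used to build the RIGOR database, which is not described here; the function name `residues_near_hetero`, the input layout and the 4.0 Å cutoff are all assumptions chosen for illustration.

```python
import math


def residues_near_hetero(atoms, hetero_atoms, cutoff=4.0):
    """Return the sorted identifiers of residues close to a hetero-entity.

    atoms:        list of (residue_id, (x, y, z)) pairs for protein atoms.
    hetero_atoms: list of (x, y, z) coordinates of the hetero-entity.
    cutoff:       distance threshold in Angstrom (assumed value).
    A residue qualifies if any of its atoms lies within the cutoff of any
    hetero atom.
    """
    near = set()
    for residue_id, xyz in atoms:
        if any(math.dist(xyz, het) <= cutoff for het in hetero_atoms):
            near.add(residue_id)
    return sorted(near)
```

The residues returned for a given ligand or metal ion could then be written out as a candidate motif. A real implementation would also have to decide how many residues make a useful motif and how to avoid near-duplicates.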
Obviously, the SPASM package can be tremendously useful in the analysis of newly determined
protein structures. The programs help crystallographers to make the most of their
models, prior to publication and deposition. After all, nobody likes to see papers
in which professional database scrutinisers (for want of a better word) announce
that they have found an unexpected similarity between one's own protein (the structure
determination of which may have taken years) and some other protein that had
been in the database for years.
In addition, SPASM can be used in comparative structural analysis, where one will
typically be interested in finding all proteins that contain a certain arrangement
of helices, strands, turns, and loops, or all proteins that contain a certain
constellation of residues or side chains. Other potential applications lie in the areas of
protein design and engineering, and prediction of structure and function.
The SPASM package contains the programs SPASM and RIGOR, as well as two programs to
generate private databases for use with these programs (e.g., with in-house structures
that have not yet been released by the PDB). SPASM and friends (including databases
and manuals) are available free of charge to academic users from
ftp://xray.bmc.uu.se/pub/gerard/spasm/. Commercial users may contact GJK for more
information (mailto:email@example.com). For more information about O, contact
Alwyn Jones (mailto:firstname.lastname@example.org). The O WWW site is at
http://imsb.au.dk/~mok/o/, and the Uppsala Software Factory can be found at
http://xray.bmc.uu.se/usf/
Artymiuk, P.J., Poirrette, A.R., Grindley, H.M., Rice, D.W. and Willett, P. (1994).
A graph-theoretic approach to the identification of three-dimensional patterns of
amino acid side-chains in protein structures. J. Mol. Biol.
Artymiuk, P.J., Poirrette, A.R., Rice, D.W. and Willett, P. (1995). Comparison of
protein folds and sidechain clusters using algorithms from graph theory. In
"From First Map to Final Model" (Bailey, S., Hubbard, R. and Waller, D.A., Eds.),
pp. 71-81, SERC Daresbury Laboratory, Daresbury, U.K.
Kleywegt, G.J. and Jones, T.A. (1998). Databases in protein crystallography.
Acta Cryst., in press. (A preprint of this paper is available at URL:
Kleywegt, G.J. (1998). Recognition of spatial motifs in protein structures. Submitted.
The manuals for the SPASM programs are available at URL:
Jones, T.A., Zou, J.Y., Cowan, S.W. and Kjeldgaard, M. (1991). Improved methods
for building protein models in electron density maps and the location of errors in
these models. Acta Crystallogr.
The manuals for the SBIN programs are available at URL:
Bairoch, A. and Apweiler, R. (1997). The SWISS-PROT protein sequence data bank and
its supplement TrEMBL. Nucl. Acids Res.
Latest update at 5 June, 1998.