USF

NEWS FROM THE UPPSALA SOFTWARE FACTORY - 9

Déjà-vu all over again

Gerard J. Kleywegt
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden


While building a protein model into electron density, one often comes across features of the model that make one wonder: "(where) have I seen this before?". At the level of the overall fold, there is plenty of software available nowadays that can help answer this question (DALI, DEJAVU, TOP, etc.). But when it comes to recognising smaller "motifs" (e.g., a set of residues involved in binding a ligand or metal ion, or with seemingly "unusual" side chain-side chain interactions), answering the question "has this been observed in any other protein structure?" is not as simple.

At the 1995 CCP4 meeting, Peter Artymiuk described a program called ASSAM [1] [2] that could recognise spatial arrangements of side chains by comparing them to a database of protein structures. This provided the inspiration for the SPASM package [3] [4] [5], which contains programs for the recognition of arbitrary patterns or motifs in protein structures and is interfaced with O [6] and other programs.

SPASM

SPASM is a program that can be used to recognise user-defined motifs in a database of protein structures (derived from the PDB). The user merely has to carve out those residues that (s)he is interested in (e.g., catalytic residues, a strange loop, ligand-binding residues, a weird Met-Trp interaction, a helix-turn-helix motif, etc.; whatever is selected will be referred to as a "motif" from now on) and put them into a small PDB file. The program will read this file as well as its database, prompt for values for a few parameters (the default values will do in most cases), and subsequently find all instances of the motif in the proteins that are in the database. (The nitty-gritty and some of the bells and whistles are discussed in [4] and [5].)
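The carving step described above can be sketched in a few lines of Python. This is only an illustration and not part of the SPASM package: the function name carve_motif, the helper atom_line, and the example residues are hypothetical. The only thing assumed about the input is the standard fixed-column PDB layout (chain identifier in column 22, residue number in columns 23-26).

```python
def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Format one ATOM record in the fixed-column PDB layout (demo helper)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_seq, x, y, z)


def carve_motif(pdb_lines, selection):
    """Keep the ATOM/HETATM records of the selected (chain, residue number) pairs."""
    wanted = set(selection)
    motif = []
    for line in pdb_lines:
        # Skip anything that is not a (sufficiently long) coordinate record.
        if line[:6] not in ("ATOM  ", "HETATM") or len(line) < 26:
            continue
        chain_id = line[21]          # column 22
        res_seq = int(line[22:26])   # columns 23-26
        if (chain_id, res_seq) in wanted:
            motif.append(line.rstrip("\n"))
    return motif


# Hypothetical model: four C-alpha atoms, three of which make up the motif.
model = [
    atom_line(1, " CA ", "HIS", "A", 57, 1.0, 2.0, 3.0),
    atom_line(2, " CA ", "GLY", "A", 58, 4.0, 2.0, 3.0),
    atom_line(3, " CA ", "ASP", "A", 102, 5.0, 6.0, 3.0),
    atom_line(4, " CA ", "SER", "A", 195, 2.0, 7.0, 4.0),
]
motif = carve_motif(model, [("A", 57), ("A", 102), ("A", 195)])
print("\n".join(motif))
```

Writing the returned lines to a file would give a small PDB file of the kind described above. In practice one would keep all atoms of each selected residue, which the function does automatically when given a complete model.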

Besides simply listing the "hits", SPASM can also generate a macro file for use with O which, when executed, will automatically read the hits, apply the rotation-translation operator that superimposes each hit on the user's motif, and draw the hits. Thus, within five to ten minutes one obtains a visual answer to the original question: "(where) has this motif been observed previously?".
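Applying a rotation-translation operator amounts to computing x' = Rx + t for every atom of a hit. The sketch below shows this in Python. The function name apply_operator, the row-by-row matrix layout, and the example numbers are assumptions made for the illustration; they say nothing about how O or SPASM store their operators.

```python
def apply_operator(coords, rotation, translation):
    """Return the coordinates after x' = R x + t (rotation given row by row)."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return moved


# Hypothetical operator: rotate 90 degrees about the z axis, then shift along x.
rotation = [
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
translation = (1.0, 0.0, 0.0)

hit_atoms = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
moved = apply_operator(hit_atoms, rotation, translation)
print(moved)
```

Once every hit has been transformed in this way, all hits share the coordinate frame of the user's motif and can be displayed together.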

If you find hits whose similarity to your own protein extends beyond the matched motif (e.g., a similar fold or domain), global superpositioning of the hits and your own model can be carried out with LSQMAN. An input file for LSQMAN that does this can be generated by SPASM as well, making this a very rapid process. Finally, an interface exists to the SBIN package of programs [4] [7], which can be used to analyse superimposed structures to find similarities in their sequences. These, in turn, can be used to attempt "database mining" in sequence databases such as SWISS-PROT [8], in the hope of identifying other proteins that might have the same fold, or share a common domain.

RIGOR

RIGOR is another program in the SPASM package; in essence, it does the opposite of SPASM. Where SPASM compares a user-defined motif to a database of protein structures, RIGOR looks for instances of a large number of predefined motifs in the user's model. Of course, the utility of this approach depends critically on the quality of the motif database. At present, it contains a few hand-crafted motifs, but the overwhelming majority have been generated automatically. These automatically generated motifs were extracted from proteins in the SPASM database, and consist mostly of sets of residues whose side chains cluster in space, or are all in close proximity to a hetero-entity. Just like SPASM, RIGOR is interfaced with O, allowing for rapid visualisation of the results. Users are welcome to submit additional motifs for inclusion in future releases of the RIGOR database. Eventually, I hope to develop software that takes a more intelligent approach to detecting motifs that recur in several or many structures.
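To give a feel for what "residues whose side chains cluster in space" could mean in practice, here is a minimal Python sketch. It is not the procedure used to build the RIGOR database: it simply groups residues whose side-chain centroids are linked by distances below a cutoff (single-linkage grouping). The function name cluster_residues, the 6.0 Angstrom cutoff, and the example coordinates are all assumptions chosen for illustration.

```python
import math


def cluster_residues(centroids, cutoff=6.0):
    """Group residues whose side-chain centroids are chained by distances <= cutoff.

    centroids maps a residue label to an (x, y, z) tuple. Groups with fewer
    than two residues are dropped. Each group is returned as a sorted list.
    """
    labels = sorted(centroids)
    unvisited = set(labels)
    clusters = []
    for start in labels:
        if start not in unvisited:
            continue
        unvisited.discard(start)
        members = [start]
        stack = [start]
        while stack:
            current = stack.pop()
            near = [other for other in unvisited
                    if math.dist(centroids[current], centroids[other]) <= cutoff]
            for other in near:
                unvisited.discard(other)
                members.append(other)
                stack.append(other)
        if len(members) >= 2:
            clusters.append(sorted(members))
    return clusters


# Hypothetical side-chain centroids (in Angstrom).
centroids = {
    "HIS57": (0.0, 0.0, 0.0),
    "ASP102": (4.0, 0.0, 0.0),
    "SER195": (0.0, 5.0, 0.0),
    "LYS300": (30.0, 30.0, 30.0),
}
clusters = cluster_residues(centroids)
print(clusters)
```

In this toy example the first three residues end up in one group, while the distant lysine is left out. A stricter criterion (e.g., requiring every pair in a group to lie within the cutoff) would give smaller, tighter candidate motifs.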

APPLICATIONS

Obviously, the SPASM package can be tremendously useful in the analysis of newly determined protein structures. The programs help crystallographers to make the most of their models, prior to publication and deposition. After all, nobody likes to see papers in which professional database scrutinisers (for want of a better word) announce that they have found an unexpected similarity between one's own protein (the structure determination of which may have taken years) and some other protein that had been in the database for years.

In addition, SPASM can be used in comparative structural analysis, where one will typically be interested in finding all proteins that contain a certain arrangement of helices, strands, turns, and loops, or all proteins that contain a certain constellation of residues or side chains. Other potential applications lie in the areas of protein design and engineering, and prediction of structure and function.

AVAILABILITY

The SPASM package contains the programs SPASM and RIGOR, as well as two programs to generate private databases for use with these programs (e.g., with in-house structures that have not yet been released by the PDB). SPASM and friends (including databases and manuals) are available free of charge to academic users from ftp://xray.bmc.uu.se/pub/gerard/spasm/. Commercial users may contact GJK for more information (mailto:gerard@xray.bmc.uu.se). For more information about O, contact Alwyn Jones (mailto:alwyn@xray.bmc.uu.se). The O WWW site is at http://imsb.au.dk/~mok/o/, and the Uppsala Software Factory can be found at http://xray.bmc.uu.se/usf/.


REFERENCES

[1] Artymiuk, P.J., Poirrette, A.R., Grindley, H.M., Rice, D.W. and Willett, P. (1994). A graph-theoretic approach to the identification of three-dimensional patterns of amino acid side-chains in protein structures. J. Mol. Biol. 243, 327-344.
[2] Artymiuk, P.J., Poirrette, A.R., Rice, D.W. and Willett, P. (1995). Comparison of protein folds and sidechain clusters using algorithms from graph theory. In "From First Map to Final Model" (Bailey, S., Hubbard, R. and Waller, D.A., Eds.), pp. 71-81, SERC Daresbury Laboratory, Daresbury, U.K.
[3] Kleywegt, G.J. and Jones, T.A. (1998). Databases in protein crystallography. Acta Crystallogr. D54, in press. (A preprint of this paper is available at URL: http://xray.bmc.uu.se/gerard/papers/databases.html.)
[4] Kleywegt, G.J. (1998). Recognition of spatial motifs in protein structures. Submitted.
[5] The manuals for the SPASM programs are available at URL: http://xray.bmc.uu.se/usf/spasm.html.
[6] Jones, T.A., Zou, J.Y., Cowan, S.W. and Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A47, 110-119.
[7] The manuals for the SBIN programs are available at URL: http://xray.bmc.uu.se/usf/sbin.html.
[8] Bairoch, A. and Apweiler, R. (1997). The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucl. Acids Res. 25, 31-36.

Latest update: 5 June 1998.