As the structural database continues to expand, new methods are required to analyse and compare protein structures. Whereas the recognition, comparison, and classification of folds is now more or less a solved problem, tools for the study of constellations of small numbers of residues are few and far between. In this paper, two programs are described for the analysis of spatial motifs in protein structures. The first, SPASM, can be used to find the occurrence of a motif consisting of arbitrary main-chain and/or side-chains in a database of protein structures. The program also has a unique capability to carry out "fuzzy pattern matching" with relaxed requirements on the types of some or all of the matching residues. The second program, RIGOR, scans a single protein structure for the occurrence of any of a set of pre-defined motifs from a database. In one application, spatial motif recognition combined with profile analysis enabled the assignment of the structural and functional class of an uncharacterised hypothetical protein in the sequence database. In another application, the occurrence of short left-handed helical segments in protein structures was investigated, and such segments were found to be fairly common. Potential applications of the techniques presented here lie in the analysis of (newly determined) structures, in comparative structural analysis, in the design and engineering of novel functional sites, and in the prediction of structure and function of uncharacterised proteins.
What kinds of spatial arrangements ("3D motifs") could you possibly find "interesting"? Well, for instance:
Besides this form of motif recognition, one can also envisage working the other way around. In other words, one could construct a large database of motifs that have some well-defined functional and/or structural relevance, and then scan each newly determined protein structure against that database. The results may well provide clues as to the function of the new protein. The SPASM suite includes a program for such "inverse" motif recognition, RIGOR (which can be considered a 3D equivalent of ProSite motif matching!). However, its motif database was generated by a simple program rather than by a human structural biology expert, and it is not of sufficient quality for RIGOR to be very useful at this stage.
In the original JMB paper, the following applications were foreseen:
- analysis of newly determined protein structures. Often in a newly solved structure, one observes a local main-chain conformation, or an arrangement of side chains that may seem odd or unusual. Spatial motif recognition can be used to rapidly answer questions such as "is this a unique loop conformation?", "in what other structures does a similar constellation of residues occur?", etc.
- comparative structural analysis. Scientists interested in aspects of protein structure between the levels of domain folds and individual residues will be interested in identifying all proteins that contain a certain arrangement of helices, strands, turns, and loops, or in all proteins that contain a certain constellation of residues or side chains (e.g., metal-binding sites, anion-binding sites, left-handed helices). Spatial motif recognition methods can be employed to quickly enumerate all such occurrences.
- design and engineering. A search of the database with a motif consisting of certain metal-binding residues, with relaxed criteria on the exact nature of one or more of the residues, may identify other proteins in which one or a few mutations might suffice to introduce a novel binding site. Similar searches could be carried out with residues that bind anions, co-factors, ligands, or drugs. Some dedicated programs for this task exist, e.g. for the case of tetrahedral metal-binding sites, but the approach presented here is completely general, and therefore has wider applicability, although it may be less sensitive.
- prediction of structure and function. As the example [i.e., described in the JMB paper!] involving CRABP2 demonstrated, there is scope for the use of small structural fragments in the identification of proteins of unknown structure and function. In this case, quite unexpectedly, a hypothetical protein of unknown structure and function was shown to belong to the fatty-acid-binding protein family, based on structural and profile analysis of a stretch of only 13 residues.
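The idea of searching with "relaxed criteria on the exact nature of one or more of the residues" can be illustrated with a small sketch. This is not SPASM's actual algorithm or file format. It assumes that each residue is reduced to a one-letter type plus a single 3D point, that a motif residue carries a set of allowed types, and that a hit is any set of residues whose pairwise distances agree with the motif's within a tolerance. The function `find_motif`, the parameter `tol`, and the toy coordinates are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def find_motif(motif, structure, tol=0.5):
    """Return tuples of indices into `structure` that match `motif`.

    motif:     list of (allowed_types, (x, y, z)); allowed_types is a set
    structure: list of (residue_type, (x, y, z))
    A candidate matches when its residue types are allowed and every
    pairwise distance differs from the motif's by at most `tol`.
    Distances alone cannot tell a motif from its mirror image, so a real
    program would follow up with a superposition step.
    """
    hits = []

    def extend(partial):
        k = len(partial)
        if k == len(motif):
            hits.append(tuple(partial))
            return
        allowed, m_xyz = motif[k]
        for i, (rtype, xyz) in enumerate(structure):
            if i in partial or rtype not in allowed:
                continue
            # Check distances to all residues already assigned.
            if all(
                abs(dist(xyz, structure[j][1]) - dist(m_xyz, motif[p][1])) <= tol
                for p, j in enumerate(partial)
            ):
                extend(partial + [i])

    extend([])
    return hits


# Toy "structure": residue type plus one representative point (made-up data).
structure = [
    ("A", (10.0, 10.0, 10.0)),
    ("H", (1.0, 1.0, 1.0)),
    ("H", (4.0, 1.0, 1.0)),
    ("D", (1.0, 5.0, 1.0)),
    ("G", (20.0, 0.0, 0.0)),
]

# Strict motif: His, His, Cys in a 3-4-5 triangle.
strict = [({"H"}, (0.0, 0.0, 0.0)), ({"H"}, (3.0, 0.0, 0.0)), ({"C"}, (0.0, 4.0, 0.0))]
# Relaxed motif: the third position may be any of several residue types.
relaxed = strict[:2] + [({"C", "D", "E", "H"}, (0.0, 4.0, 0.0))]

print(find_motif(strict, structure))   # [] (no Cys in the toy structure)
print(find_motif(relaxed, structure))  # [(1, 2, 3)]
```

In this toy case the strict search finds nothing, while the relaxed search reports a site where a single Asp-to-Cys substitution would reproduce the motif, which is the kind of lead the design-and-engineering scenario above is after.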
Latest update at 6 February, 2006.