Department of Molecular Biology

Biomedical Centre, Uppsala University

Uppsala - Sweden


We have written a program called

The simplest task, superimposing molecules given two sets of atoms which should be matched, is easily accomplished:

`
LSQMAN > ex m1 a1-999 m1 b1
WARNING - mol1 == mol2 !
Explicit fit of M1 A1-999
And M1 B1
Atom types |NONH|
Nr of atoms to match : ( 3499)
The 3499 atoms have an RMS distance of 2.311 A
RMS delta B = 7.802 A2
Corr. coeff. = 0.9031
Rotation : 0.382393 -0.058393 0.922153
-0.033219 -0.998225 -0.049435
0.923402 -0.011729 -0.383654
Translation : 5.715 16.617 -8.061
`

Note that, apart from the RMS distance of the atoms after
superposition, the RMS ΔB and the linear correlation
coefficient of the temperature factors of the matched atoms
are calculated as well. In the case of NCS-related molecules,
and that of very similar molecules, one would expect the RMS
ΔB to be of the order of 3-5 Å², and the correlation
coefficient to be greater than ~0.95.
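The two temperature-factor statistics can be illustrated with a minimal Python sketch. It assumes that RMS ΔB is the root-mean-square of the pairwise B-factor differences and that the correlation coefficient is the ordinary Pearson coefficient; the function name `b_factor_stats` and the B-factor values are hypothetical and serve only as an illustration, not as a description of how **LSQMAN** is implemented.

```python
import math

def b_factor_stats(b1, b2):
    """Return (RMS delta-B, Pearson correlation) for paired B-factors.

    Assumption: b1[i] and b2[i] belong to the same matched atom pair.
    """
    if len(b1) != len(b2) or len(b1) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(b1)
    # Root-mean-square of the pairwise differences.
    rms_delta_b = math.sqrt(sum((x - y) ** 2 for x, y in zip(b1, b2)) / n)
    # Linear (Pearson) correlation coefficient.
    mean1 = sum(b1) / n
    mean2 = sum(b2) / n
    cov = sum((x - mean1) * (y - mean2) for x, y in zip(b1, b2))
    var1 = sum((x - mean1) ** 2 for x in b1)
    var2 = sum((y - mean2) ** 2 for y in b2)
    if var1 == 0.0 or var2 == 0.0:
        raise ValueError("correlation is undefined for constant B-factors")
    return rms_delta_b, cov / math.sqrt(var1 * var2)

# Made-up B-factors (in A^2) for five matched atoms, for illustration only.
mol1_b = [12.0, 15.5, 20.1, 33.0, 27.4]
mol2_b = [14.0, 13.5, 22.1, 31.0, 29.4]
rms_db, cc = b_factor_stats(mol1_b, mol2_b)
print(f"RMS delta B = {rms_db:.3f} A2, corr. coeff. = {cc:.4f}")
```

With real data, the two lists would hold the B-factors of the matched atoms in the same order as the atom pairs used for the fit.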

Note that **LSQMAN** cannot automatically detect the optimal
alignment of two molecules as some other programs do
**[6]**. Usually, sets of matching atoms are either trivial
to define (*e.g.*, NCS-related molecules, mutants, complexes),
or non-trivial. In the latter case, we use **DEJAVU
[1]** first to carry out a rough alignment of the secondary-structure
elements of the protein of interest and all other proteins
in the PDB that appear to show structural similarities.
The rough alignments are then improved with **LSQMAN**.

**IMPROVING OPERATORS**

Optimal alignment of structures with low sequence homology
is somewhat arbitrary, since "optimal" involves
both the number of structurally equivalent residues
and their RMS distance after alignment. **LSQMAN** uses
an operator-improvement algorithm similar to that employed
by **O [2, 3]**, *i.e.*: starting from an initial operator,
fragments of consecutive residues (using their Cα atoms, for
example) are located whose length exceeds a certain minimum
number of residues, and whose distance to the corresponding
atoms is less than a certain cut-off. These fragments
are used to calculate a new, explicit operator, and
the process is iterated until it converges. Note that
this algorithm is insensitive to sequence gaps, so
it can be used both to find the best-conserved fragments
in similar molecules, and to find the common core of
two completely different molecules. The implementation
in **LSQMAN** contains some extra "embellishments":

* a sequentiality constraint (optional). If two proteins
have a common motif with the same topology, this is
a useful constraint; on the other hand, if two structures
contain similar arrangements of helices and strands,
but in a different order in their sequences, this constraint
should be switched off.

* the two cut-offs (minimum number of consecutive residues
in matched fragments, and maximum distance between
equivalenced atoms) can either be kept fixed, or allowed
to "decay". For example, one could start
with a distance cut-off of 4 Å to get the overall
operator relating the two molecules, and then multiply
this cut-off by a factor of 0.95 in every iteration
to "zoom in" on the structurally most similar
core fragments of the two molecules.

* the optimisation criterion can be selected by the
user. At present, **LSQMAN** can optimise: (1) the number
of matched residues (maximise); (2) the RMS distance
of the matched residues (minimise); (3) the Similarity
Index (SI; minimise); or (4) the Match Index (MI; maximise).
The Similarity Index is defined as:

$$\mathrm{SI} = \frac{\mathrm{RMSD} \times \min(N_1, N_2)}{N_m}$$

where N1 and N2 are the numbers of residues in molecules 1 and 2, Nm is the number of matched residues, and RMSD is their RMS distance. SI assumes values >= 0.0 Å; the lower the value of SI, the better the fit and the more similar the two molecules are. The Match Index is defined as:

$$\mathrm{MI} = \frac{1 + N_m}{(1 + W \times \mathrm{RMSD}) \times (1 + \min(N_1, N_2))}$$

where W is a positive weight (the higher the weight, the bigger the influence of the RMSD on the value of MI; suggested values for W are between 0.1 and 1). MI assumes values between 0 and 1, where "0" indicates a "perfect mis-match" and "1" a perfect match.
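As a quick check of the two definitions, the following Python sketch evaluates SI and MI for the numbers that appear in the example output further down (an RMS distance of 0.946 Å for 428 matched residues in molecules of 459 and 458 residues). The function names are hypothetical, and the weight W = 1.0 is an assumption; it happens to reproduce the printed MI to within rounding of the RMS distance.

```python
def similarity_index(rmsd, n_matched, n1, n2):
    """SI = RMSD * min(N1, N2) / Nm; lower values mean a better fit."""
    if n_matched <= 0:
        raise ValueError("SI is undefined without matched residues")
    return rmsd * min(n1, n2) / n_matched

def match_index(rmsd, n_matched, n1, n2, weight=1.0):
    """MI = (1 + Nm) / ((1 + W * RMSD) * (1 + min(N1, N2))); 1 is a perfect match."""
    if weight <= 0.0:
        raise ValueError("the weight W must be positive")
    return (1 + n_matched) / ((1.0 + weight * rmsd) * (1 + min(n1, n2)))

# Values taken from the example output; W = 1.0 is an assumption.
si = similarity_index(0.946, 428, 459, 458)
mi = match_index(0.946, 428, 459, 458, weight=1.0)
print(f"SI = {si:.5f}")
print(f"MI = {mi:.5f}")
```

Because the RMS distance is entered here with only three decimals, the last digits differ slightly from those printed by the program.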

After the operator improvement has converged (or a maximum number of cycles has been carried out), the structure-based sequence alignment is printed. The matched residues are shown, along with the distance between the atoms that were used (usually, Cα atoms). If two residues are of the same type, an asterisk is printed as well. Also, some statistics pertaining to the number and percentage of matched and conserved residues are printed. An example:

`
Found fragment of length : ( 53)
Found fragment of length : ( 260)
Found fragment of length : ( 57)
Found fragment of length : ( 59) `

`
Cycle : ( 10)
Distance cut-off (A) : ( 3.800)
Min fragment length (res) : ( 5)
The 428 atoms have an RMS distance of 0.946 A
SI = RMS * Nmin / Nmatch = 1.01260
MI = (1+Nmatch)/(1+W*RMS)*(1+Nmin) = 0.48022
RMS delta B for matched atoms = 7.610 A2
Corr. coefficient matched atom Bs = 0.908
Rotation : 0.38169697 -0.06605943 0.92192382
-0.04122496 -0.99766684 -0.05441866
0.92336768 -0.01723484 -0.38352972
Translation : 5.7764 17.2442 -8.0352`

`
Fragment SER-A 4 <===> SER-B 4 @ 2.43 A *
SER-A 5 <===> SER-B 5 @ 1.11 A *
ARG-A 6 <===> ARG-B 6 @ 1.19 A *
TYR-A 7 <===> TYR-B 7 @ 0.40 A *
VAL-A 8 <===> VAL-B 8 @ 0.49 A *
ASN-A 9 <===> ASN-B 9 @ 0.21 A *
LEU-A 10 <===> LEU-B 10 @ 0.90 A *
[...]
GLY-A 456 <===> GLY-B 456 @ 3.40 A *
VAL-A 457 <===> VAL-B 457 @ 3.68 A *`

`
Nr of residues in mol1 : ( 459)
Nr of residues in mol2 : ( 458)
Nr of matched residues : ( 428)
Nr of identical residues : ( 428)
% identical of matched : ( 100.000)
% matched of mol1 : ( 93.246)
% identical of mol1 : ( 93.246)
% matched of mol2 : ( 93.450)
% identical of mol2 : ( 93.450)
`
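The fragment search behind the "Found fragment of length" lines can be sketched in a few lines of Python. This is a simplified illustration, not the program's actual code: it assumes that the residues of the two molecules are already paired by index (as in the NCS example above, where A4 is matched to B4, and so on) and that the distance of each pair under the current operator is known. The names `find_fragments` and `decayed_cutoff` are hypothetical; the first seven distances are taken from the listing above, the remainder are made up.

```python
def find_fragments(distances, cutoff, min_length):
    """Return (start_index, length) for each run of consecutive residue
    pairs whose distance is <= cutoff and whose length is >= min_length."""
    fragments = []
    run_start = None
    for i, dist in enumerate(distances):
        if dist <= cutoff:
            if run_start is None:
                run_start = i          # a new run begins here
        else:
            if run_start is not None and i - run_start >= min_length:
                fragments.append((run_start, i - run_start))
            run_start = None           # the run (if any) is broken
    # Close a run that extends to the last residue.
    if run_start is not None and len(distances) - run_start >= min_length:
        fragments.append((run_start, len(distances) - run_start))
    return fragments

def decayed_cutoff(initial, factor, cycle):
    """Cut-off after `cycle` iterations when multiplied by `factor` each time."""
    return initial * factor ** cycle

# Distances in A; only the first seven come from the example listing.
distances = [2.43, 1.11, 1.19, 0.40, 0.49, 0.21, 0.90,
             5.2, 6.1, 0.8, 0.7, 4.5, 1.0, 1.2, 0.9, 1.1, 0.6]
fragments = find_fragments(distances, cutoff=3.8, min_length=5)
for start, length in fragments:
    print(f"Found fragment of length {length} starting at index {start}")
print(f"4 A cut-off after 10 cycles at 0.95: {decayed_cutoff(4.0, 0.95, 10):.3f} A")
```

In a full operator-improvement cycle, the atoms in the fragments found this way would be used for a new least-squares fit, the distances would be recomputed, and the search would be repeated until the set of fragments no longer changes.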

Statistics can be obtained with the SHow_operator command:

`
The 428 atoms have an RMS distance of 0.946 A
SI = RMS * Nmin / Nmatch = 1.01260
MI = (1+Nmatch)/(1+W*RMS)*(1+Nmin) = 0.48022
RMS delta B for matched atoms = 7.610 A2
Corr. coefficient matched atom Bs = 0.908
[...]
NCSOP 1 = 0.3816970 -0.0412250 0.9233677
5.776
-0.0660594 -0.9976668 -0.0172348
17.244
0.9219238 -0.0544187 -0.3835297
-8.035
Determinant of rotation matrix = 1.000000`

`
Crowther Alpha Beta Gamma 178.93069 -112.55250
3.37809
Spherical polars Omega Phi Chi 123.71790 177.77631
178.71825
Direction cosines of rotation axis -0.83114
0.03227 -0.55510
Dave Smith -2.57299 -22.79103
-173.83571
Rotation angle = 178.718246`

`
*POOR* - NCS not restrained
*POOR* - NCS Bs not restrained`
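Two of the printed quantities, the determinant and the rotation angle, follow directly from the 3x3 matrix and can be verified with a short Python sketch. The function names are hypothetical. The nine numbers are copied from the NCSOP listing above; they appear to be the transpose of the matrix printed under "Rotation" in the earlier listing, which affects neither the trace nor the determinant.

```python
import math

def determinant3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def rotation_angle_deg(m):
    """Rotation angle in degrees, from trace(R) = 1 + 2 * cos(angle)."""
    trace = m[0][0] + m[1][1] + m[2][2]
    # Clamp to [-1, 1] to guard against rounding in the printed digits.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

# Rotation part of the NCSOP listing (translation components omitted).
ncs_rotation = [
    [0.3816970, -0.0412250, 0.9233677],
    [-0.0660594, -0.9976668, -0.0172348],
    [0.9219238, -0.0544187, -0.3835297],
]
det = determinant3(ncs_rotation)
angle = rotation_angle_deg(ncs_rotation)
print(f"Determinant of rotation matrix = {det:.6f}")
print(f"Rotation angle = {angle:.3f}")
```

A determinant close to +1 confirms a proper rotation, and an angle close to 180 degrees indicates that the two molecules are related by an almost perfect two-fold axis.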

**OTHER FEATURES**

After operator improvement, for example using Cα atoms,
the RMS distance of any set of atoms can be calculated
with the RMsd_calc command. Operators can be stored
as, or read from, **O** datablock files; they can be edited,
and they can be applied to a molecule, for example
for display purposes. In addition, an **O** macro can
be generated automatically which will read the appropriate
PDB files, apply the current operator(s), and display
the Cα traces of the superimposed molecules.

**LSQMAN** was originally written as a fast filter between

Finally, an interesting way of analysing differences between similar molecules is provided by the option to produce a DPHI/DPSI plot (essentially, a "difference Balasubramanian plot"), as suggested by Korn and Rose
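The data behind such a plot are per-residue differences in the backbone torsion angles, which have to be wrapped so that, for instance, 170° and -170° differ by 20° rather than 340°. The Python sketch below illustrates this under stated assumptions: the residues are already paired, the difference is taken as molecule 1 minus molecule 2, and the function names and torsion values are hypothetical.

```python
def angle_difference(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def dphi_dpsi(torsions1, torsions2):
    """Per-residue (delta-phi, delta-psi) for two lists of (phi, psi) pairs."""
    if len(torsions1) != len(torsions2):
        raise ValueError("both molecules need the same number of residues")
    return [
        (angle_difference(phi1, phi2), angle_difference(psi1, psi2))
        for (phi1, psi1), (phi2, psi2) in zip(torsions1, torsions2)
    ]

# Made-up (phi, psi) values in degrees for three matched residues.
mol1 = [(-60.0, -45.0), (-120.0, 130.0), (170.0, -175.0)]
mol2 = [(-65.0, -40.0), (-118.0, 135.0), (-170.0, 178.0)]
for residue, (dphi, dpsi) in enumerate(dphi_dpsi(mol1, mol2), start=1):
    print(f"residue {residue}: DPHI = {dphi:7.1f}  DPSI = {dpsi:7.1f}")
```

Residues whose conformation is conserved cluster around the origin of such a plot, whereas outliers point to local differences between the two molecules.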

**AVAILABILITY**

**LSQMAN** is one in a series of "

**REFERENCES**

**[1]** G.J. Kleywegt & T.A. Jones, in "From First Map to Final Model" (S. Bailey, R. Hubbard & D. Waller, Eds.), SERC Daresbury Laboratory (1994), pp. 59-66.

[2]

[3]

[4]

[5]

[6]

[7]

[8]
