Dictionaries for Heteros

Gerard J. Kleywegt
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden

The most frequently asked question on the X-PLOR and, to a lesser extent, the O-info mailing lists is probably: "Could somebody send me topology/parameter/dictionary files for compound X?", where compound X is some hetero entity. In general, generating such dictionary files is a cumbersome, time-consuming and (as we have found out more than once) error-prone undertaking. In order to simplify the process, we have collected a large set of hetero-entities from the January 1995 release of the Protein Data Bank (PDB), and implemented some tools which automate much of the dictionary-generation process (and, in our own experience, drastically reduce the number of errors in the resulting dictionaries).

In our opinion, the best place to start looking for coordinates for a hetero-compound is the CSD small-molecule crystallographic database. If such a search yields no useful clues, the next-best thing is to check if anyone else has previously used the compound in a macromolecular refinement. In that case, the PDB is the most suitable place to look. In order to make the search as simple as possible, we have written a script which automatically scans the PDB and finds all unique hetero-compound names. This list is subsequently edited to remove obvious duplicates, and the edited list is fed into a program. This program scans all PDB entries again, in order of decreasing resolution. As soon as one of the hetero-compounds from the list is encountered, the coordinates are stored. Once the scan is complete, every hetero-compound is translated to put its centre-of-gravity at the origin, all occupancies are set to 1.0 and all temperature factors to 20.0 Å². The coordinates and some more information (PDB entry from which it was taken, resolution, a list of other PDB files which contain the same compound, etc.) are then written to one large file. Our present collection (generated using the January 1995 release of the PDB) contains more than 700 (mostly unique) hetero-compounds. As an example, the entry for all-trans-retinoic acid (which we shall use here throughout) looks as follows:

REMARK Extracted from PDB file 1fem.pdb
REMARK Formula C20 H28 O2
REMARK Nr of non-hydrogen atoms 22
REMARK Residue type REA
REMARK Residue name 621
REMARK   2 RESOLUTION. 1.9  ANGSTROMS.                                  1FEM  36
REMARK Compound also present in : 1EPB 1CBR
HETATM    1  C1  REA   621      -3.034   1.835  -2.850  1.00 20.00      1FEM
HETATM    2  C2  REA   621      -3.924   1.728  -4.090  1.00 20.00      1FEM
HETATM   20  C20 REA   621       2.020  -1.190   4.576  1.00 20.00      1FEM
HETATM   21  O1  REA   621       6.229  -1.851   4.290  1.00 20.00      1FEM
HETATM   22  O2  REA   621       4.389  -0.721   5.672  1.00 20.00      1FEM
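
The normalisation applied to each compound (centre of gravity at the origin, occupancies of 1.0, temperature factors of 20.0 Å²) can be sketched in a few lines of Python. This is only an illustration and not the program used to build the collection: the function name normalise_hetero is hypothetical, fixed-column PDB records are assumed, and the centre of gravity is taken to be the unweighted mean of the atomic positions.

```python
def normalise_hetero(lines, occupancy=1.0, bfactor=20.0):
    """Centre HETATM records on the origin and reset occupancy and B-factor.

    Assumptions: fixed-column PDB records (x, y, z in columns 31-54,
    occupancy and B-factor in columns 55-66) and an unweighted centroid.
    """
    atoms = [line for line in lines if line.startswith("HETATM")]
    if not atoms:
        return []
    xyz = [(float(a[30:38]), float(a[38:46]), float(a[46:54])) for a in atoms]
    centre = [sum(p[i] for p in xyz) / len(xyz) for i in range(3)]
    out = []
    for line, (x, y, z) in zip(atoms, xyz):
        line = line.rstrip("\n").ljust(80)
        out.append(
            line[:30]
            + "%8.3f%8.3f%8.3f" % (x - centre[0], y - centre[1], z - centre[2])
            + "%6.2f%6.2f" % (occupancy, bfactor)
            + line[66:].rstrip()
        )
    return out
```

Everything before column 31 and after column 66 is copied unchanged, so atom names, residue type and the PDB identifier survive the operation.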

Using an editor, or a Unix tool as simple as grep, one can quickly find out if the compound one is looking for occurs in the file. If the compound is new to crystallography, one may have to resort to other methods to come up with a set of coordinates (e.g., using quantum-chemical or molecular mechanics calculations, or "mutating" a similar compound).
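
For those who prefer a script to grep, the lookup can be sketched as follows. The sketch assumes that every entry in the collection file starts with a "REMARK Extracted from PDB file" line and carries a "REMARK Residue type" line, as in the retinoic acid example; the function name find_entries is hypothetical.

```python
def find_entries(lines, residue_type):
    """Return all entries (as lists of lines) with the given residue type.

    Assumption: each entry begins with 'REMARK Extracted from PDB file'.
    """
    entries, current = [], None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("REMARK Extracted from PDB file"):
            current = [line]            # start of a new entry
            entries.append(current)
        elif current is not None:
            current.append(line)        # line belongs to the current entry
    wanted = ["REMARK", "Residue", "type", residue_type.upper()]
    return [e for e in entries if any(l.split()[:4] == wanted for l in e)]
```

Any iterable of lines will do, so an open file object for the collection file can be passed directly.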

One of our utility programs, XPLO2D, contains an option which can be used to generate appropriate dictionaries for X-PLOR. Given a PDB file containing the coordinates of a hetero-compound, it generates four new files:

* a topology file (defining atom types, masses, etc., bonds, impropers [chiral carbons, flat groups and bonds], possible dihedrals, hydrogen-bond acceptors and possible donors). This file usually needs to be edited, for instance to add charges and the masses of implicit hydrogen atoms. For all-trans-retinoic acid this file looks as follows:

Remarks Created by XPLO2D V. 950419/1.2.3 at Fri May 5 22:24:52 1995 for user
Remarks Auto-generated by XPLO2D from file ./rea.pdb
Remarks You *MUST* check/edit MASSes and CHARges !!!
Remarks Check DONOrs and ACCEptors
Remarks Verify IMPRopers yourself
Remarks UNcomment any DIHEdrals you want to use

set echo=false end

{ edit masses yourself !!! }
MASS CX1 12.01100 ! ADD 1.008 for each H
MASS OX10 15.99940 ! ADD 1.008 for each H

autogenerate angles=true end




{ edit these DIHEdrals if necessary }
! DIHEdral C17 C1 C2 C3 ! fixed dihedral ??? 94.80
! DIHEdral C12 C13 C14 C15 ! fixed dihedral ??? -179.98

{ edit these IMPRopers if necessary }
IMPRoper C1 C2 C6 C16 ! chirality or flatness improper 49.73
IMPRoper C15 C14 O1 O2 ! chirality or flatness improper 5.01

{ edit these DONOrs and ACCEptors if necessary }
! DONOr H? O1 ! only true if -OHx (x>0)
ACCEptor O1 C15


* a parameter file (defining target values and force constants for bonds, etc.). The target values are simply the averages of the observed values. The force constants are set to the same value for all bonds, angles and impropers (the defaults being in the same ball-park as those of the Engh & Huber force field). For example:

Remarks rea.par
Remarks Created by XPLO2D V. 950419/1.2.3 at Fri May 5 22:24:52 1995 for user
Remarks Auto-generated by XPLO2D from file ./rea.pdb
Remarks Parameters for residue type REA

set echo=false end

{ edit if necessary }
BOND CX1 CX2 1000.0 1.530 ! Nobs = 1
BOND CX7 OX10 1000.0 1.390 ! Nobs = 1

{ edit if necessary }
ANGLe CX2 CX1 CX4 500.0 112.79 ! Nobs = 1
ANGLe CX1 CX3 CX2 500.0 54.52 ! Nobs = 1

{ insert DIHEdrals yourself if necessary }
{ suggested weight = 300.0 for Engh & Huber compatibility }

{ edit if necessary }
IMPRoper CX7 CX3 OX9 OX10 750.0 0 5.007 ! Nobs = 1

{ edit if necessary }
NONBonded CX1 0.1200 3.7418 0.1000 3.3854
NONBonded OX10 0.1591 2.8509 0.1591 2.8509

set echo=true end

* an X-PLOR input file which, when executed, will energy-minimise the structure of the compound, and print a list of violations afterwards. This should always be done prior to inclusion of the compound into the crystallographic refinement process, since the resulting structure reveals what X-PLOR will try to make the compound look like once it is included. If, for instance, a dihedral angle was given a target value of 180° whereas it should have been 0°, this will show up immediately after the energy minimisation. Hence, this is a quick and easy way to prevent the frustration of finding that X-PLOR has "ruined" your compound after a 4,000 K slow-cool which took two weeks to run ... The input file may look as follows:

Remarks rea_min.inp
Remarks Created by XPLO2D V. 950419/1.2.3 at Fri May 5 22:24:53 1995 for user
Remarks Auto-generated by XPLO2D from file ./rea.pdb
Remarks Energy-minimisation input file for residue type REA


atom cdie shift eps=8.0 e14fac=0.4
cutnb=7.5 ctonnb=6.0 ctofnb=6.5
nbxmod=5 vswitch

segment name=1FEM
coordinates @./rea.pdb_clean
coordinates @./rea.pdb_clean

minimise powell
nstep=250 drop=40.0

write coordinates output=rea_min.pdb end

vector ident (store9) (not hydrogen)
constraints interaction (store9) (store9) end

print threshold=0.02 bonds
print threshold=3.0 angles
print threshold=10.0 dihedrals
print threshold=3.0 impropers


* a "clean" PDB file suitable for use by X-PLOR in the energy-minimisation procedure.
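
The way the parameter file obtains its bond targets (the average of the observed values, with one common force constant) can be illustrated with a short Python sketch. It is not the XPLO2D code: the function name bond_parameters and its input layout (dictionaries of coordinates and atom types plus a list of bonded atom pairs) are assumptions made for the example, and only the BOND statements are covered.

```python
import math
from collections import defaultdict

def bond_parameters(coords, types, bonds, force_constant=1000.0):
    """Build BOND statements whose targets are averages of observed lengths.

    coords: atom name -> (x, y, z); types: atom name -> atom type;
    bonds: list of (atom, atom) pairs. All three are hypothetical inputs.
    """
    observed = defaultdict(list)
    for a, b in bonds:
        key = tuple(sorted((types[a], types[b])))   # type pair, order-independent
        observed[key].append(math.dist(coords[a], coords[b]))
    lines = []
    for (t1, t2), values in sorted(observed.items()):
        mean = sum(values) / len(values)
        lines.append("BOND %-4s %-4s %7.1f %6.3f ! Nobs = %d"
                     % (t1, t2, force_constant, mean, len(values)))
    return lines
```

Angles and impropers can be treated in the same manner, with their own common force constants.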

Once a set of coordinates has been obtained for a hetero-compound, it can be read into O and moved into place with the Move_zone command. Subsequently, the Tor_general command can often be used to adjust some of the free torsion angles. Finally, the RSR_rigid command can be invoked to optimise the fit of the compound to the density with real-space rigid-body refinement.
Another one of our utility programs (MOLEMAN) can be used to generate four of the five types of dictionary file that may be needed for the display and manipulation of a hetero-compound inside O. The only dictionary that cannot be generated in this fashion is that required for regularisation. On the other hand, regularisation can be done rapidly in X-PLOR, and if one uses sensible manipulation commands in O (e.g., Move_zone, Tor_residue, RSR_rigid, but not Move_atom) it will rarely be necessary to regularise a hetero-compound. The four types of dictionary that can be generated automatically involve:

* Connectivity. In order for O to draw the correct bonds (e.g., no bonds between hydrogen atoms), a connectivity entry is sometimes needed (although in most cases the defaults in O will do the job). For retinoic acid, such an (automatically generated) entry would look as follows:

ATOM C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
ATOM C11 C12 C13 C14 C15 C16 C17 C18 C19 C20
CONNECT - C1 C2 C3 C4 C5 C6 C1 C16 C2 +
CONNECT C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 O1

* Real-space fit. To include a compound in real-space fit calculations, a list of all its atoms can be provided as an O datablock. For instance:

rsfit_REA T 2 70
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14
C15 C16 C17 C18 C19 C20 O1 O2

* Real-space refinement. A similar datablock is needed to include the compound in some real-space refinement calculations (RSR_zone):

RSR_dict_REA T 2 70
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14
C15 C16 C17 C18 C19 C20 O1 O2

* Torsions. Some torsion-angle manipulations in O require that the angles and the affected atoms are defined. For instance:

TORSION TOR1 -108. C1 C6 C7 C8 C8 C9 C10 C11 C12 C13 C14 C15 C19 C20 \
O1 O2
TORSION TOR2 -108. C8 C7 C6 C1 C1 C2 C3 C4 C5 C16 C17 C18
TORSION TOR3 6. C19 C9 C10 C11 C11 C12 C13 C14 C15 C20 O1 O2
TORSION TOR4 6. C11 C10 C9 C19 C1 C2 C3 C4 C5 C6 C7 C8 C16 C17 C18 \
C19
TORSION TOR5 141. C13 C14 C15 O1 O1 O2

This appears to be a trivial exercise, but it is not. Consider all-trans-retinoic acid with 22 atoms and 22 bonds. This yields more than 40 dihedral angles, but only 5 torsion entries (two of which are wrong and only appear because the torsion angle of 6° falls outside the tolerance for fixed dihedrals; it probably should have been restrained more tightly in the refinement). The trick is to throw away every dihedral which appears to be strongly restrained, dihedrals inside rings, etc. MOLEMAN uses the following criteria to reduce the number of torsions:
- any dihedral which is equal to -180°, 0° or +180° (with a tolerance of 5°) is rejected as (probably) being strongly restrained to its current value (e.g., in conjugated systems);
- any dihedral K-I-J-L whose rotation affects atom "K" or "I" is rejected as (probably) being part of a ring system;
- if the number of atoms affected by the torsion is greater than or equal to the total number of atoms minus 4, the torsion is rejected. In this case (for example, a torsion involving a carboxylate) it makes more sense to use the torsion defined the other way around (i.e., use torsion L-J-I-K which affects only 1 or 2 atoms, instead of K-I-J-L);
- if a torsion is around the same bond as a previous torsion, and it affects the same atoms as that previous torsion, it is rejected as being a simple permutation (this happens, for instance, for carboxylates and for aliphatic tails sprouting from a ring, in which case there are two equivalent ways to define the torsion of the tail relative to the ring).
Note that the torsions found for retinoic acid (except the two unwanted ones mentioned earlier), are indeed the only intuitively reasonable ones: the tail relative to the ring, the ring relative to the tail, and the carboxylate relative to the tail.
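
The four rejection criteria listed above can be sketched in Python as follows. This is an illustration of the logic as described here, not the MOLEMAN source: the function names, the input layout (a dictionary of coordinates and a list of bonded atom pairs) and the sign convention of the dihedral are assumptions.

```python
import math
from collections import defaultdict

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    sub = lambda a, b: tuple(x - y for x, y in zip(a, b))
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    cross = lambda a, b: (a[1] * b[2] - a[2] * b[1],
                          a[2] * b[0] - a[0] * b[2],
                          a[0] * b[1] - a[1] * b[0])
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    y = dot(cross(n1, n2), b1) / math.sqrt(dot(b1, b1))
    return math.degrees(math.atan2(y, dot(n1, n2)))

def is_fixed(angle, tol=5.0):
    """Criterion 1: within tol degrees of -180, 0 or +180."""
    return min(abs(angle), 180.0 - abs(angle)) <= tol

def moved_atoms(adj, i, j):
    """Atoms on the J side of bond I-J (J excluded); contains I for ring bonds."""
    seen, stack = {j}, [j]
    while stack:
        a = stack.pop()
        for b in adj[a]:
            if (a, b) == (j, i) or b in seen:
                continue                      # never step back across I-J itself
            seen.add(b)
            stack.append(b)
    return seen - {j}

def free_torsions(coords, bonds, tol=5.0):
    """Return (K, I, J, L, angle, moved atoms) for torsions passing all criteria."""
    adj = defaultdict(list)
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    kept, used = [], set()
    for a, b in bonds:
        for i, j in ((a, b), (b, a)):         # try both directions of each bond
            moved = moved_atoms(adj, i, j)
            for k in (x for x in adj[i] if x != j):
                for l in (x for x in adj[j] if x != i):
                    angle = dihedral(coords[k], coords[i], coords[j], coords[l])
                    key = (frozenset((i, j)), frozenset(moved))
                    if (is_fixed(angle, tol)                    # 1: restrained
                            or k in moved or i in moved         # 2: ring
                            or len(moved) >= len(coords) - 4    # 3: moves too much
                            or key in used):                    # 4: permutation
                        continue
                    used.add(key)
                    kept.append((k, i, j, l, round(angle, 1), sorted(moved)))
    return kept
```

With the default tolerance, is_fixed(6.0) is false, which is exactly why the two unwanted torsions of 6° survive for retinoic acid.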

We have also written a simple script which takes the residue name of any compound occurring in our hetero-compound collection file, and automatically generates the complete set of X-PLOR and O dictionaries as well as the "clean" PDB file which can be imported directly into both of these programs.
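
As an indication of what such a script has to emit for O, the atom-list datablocks shown earlier (rsfit_REA and RSR_dict_REA) could be written by a routine like the one below. This is a sketch under assumptions, not our actual script: the header is read as "name, type T for text, number of lines, line width", and the layout of 14 atom names per line in 5-character fields is inferred from the width of 70 in the example.

```python
def o_text_datablock(name, atom_names, per_line=14, field=5):
    """Format a list of atom names as an O text datablock (assumed layout)."""
    width = per_line * field
    rows = [atom_names[i:i + per_line]
            for i in range(0, len(atom_names), per_line)]
    body = ["".join(n.ljust(field) for n in row).rstrip() for row in rows]
    header = "%s T %d %d" % (name, len(rows), width)
    return [header] + body
```

For the 22 atoms of retinoic acid, o_text_datablock("rsfit_REA", atoms) yields the header "rsfit_REA T 2 70" followed by two lines of atom names.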

The hetero-compound collection is freely available to anyone interested, as is the script that generates a set of O and X-PLOR dictionaries for any compound in this collection. The programs XPLO2D and MOLEMAN are available to academic users free of charge. For more information, contact GJK (E-mail: "").

USF Latest update at 12 February, 1998.