USF

Uppsala Software Factory Tutorial - Dimensions and volumes

This page describes how to use USF programs to "guesstimate" the dimensions and volume of your protein. It uses the following programs:

  1. MOLEMAN2 to prepare the PDB file and guesstimate the dimensions
  2. VOIDOO to guesstimate the volume
  3. MAMA as an alternative way to guesstimate the volume


As an example we will use the structure of P2 myelin protein. This can be found in PDB entry 1PMP. However, this entry contains three molecules of P2 myelin, and in addition they contain a ligand and some water molecules. So, let's extract the molecule we are really interested in first (with MOLEMAN2), and save it in a file by itself:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 MOLEMAN2 > read 1pmp.pdb
 Reading from file : (1pmp.pdb) 
 in normal PDB format
 ignoring hydrogen atoms
 HEADER :     CELLULAR LIPOPHILIC TRANSPORT PROTEIN   10-FEB-93   1PMP      1PMP   2
[...]
 Nr of atoms now : (       3192) 
 Nr of residues  : (        411) 
 Select ALL atoms
 Selection history : (ALL |) 
 Nr of selected atoms : (       3192) 
 MOLEMAN2 > select and type protein
[...]
 Nr of selected atoms : (       3117) 
 MOLEMAN2 > select and chain a
[...]
 Selection history : (ALL | AND TYpe = PROT | AND CHain = A |) 
 Nr of selected atoms : (       1039) 
 MOLEMAN2 > wr p2_myelin.pdb pdb selected
[...]
 Nr of atoms written : (       1039) 
 Nr of lines written : (       1288) 
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
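
For readers who want to cross-check the selection outside MOLEMAN2, the following is a minimal Python sketch of the same idea. It assumes a standard fixed-column PDB file, treats "protein" as "ATOM records" (MOLEMAN2's own definition of the protein type may differ), and does not filter hydrogens. The function names and the `demo_record` helper are hypothetical and only serve this illustration.

```python
def select_protein_chain(pdb_lines, chain_id="A"):
    """Keep ATOM records of one chain (drops HETATM ligands/waters and other chains)."""
    return [ln for ln in pdb_lines
            if ln.startswith("ATOM  ") and len(ln) > 21 and ln[21] == chain_id]


def read_coordinates(pdb_lines):
    """Extract (x, y, z) from PDB columns 31-38, 39-46 and 47-54."""
    return [(float(ln[30:38]), float(ln[38:46]), float(ln[46:54]))
            for ln in pdb_lines]


def demo_record(record, serial, chain, x, y, z):
    # Hypothetical helper: builds a minimal fixed-column line for the demo only.
    return f"{record:<6}{serial:>5}  CA  GLY {chain}{serial:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"


# Made-up demo records, not taken from 1PMP.
demo = [
    demo_record("ATOM", 1, "A", 1.0, 2.0, 3.0),
    demo_record("ATOM", 2, "B", 4.0, 5.0, 6.0),
    demo_record("HETATM", 3, "A", 7.0, 8.0, 9.0),
]
kept = select_protein_chain(demo, "A")
print(len(kept), read_coordinates(kept))
```

With a real file one would pass the lines of 1pmp.pdb instead of the demo records and write the kept lines to a new file.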
   

We could now issue the STatistics command in MOLEMAN2 to get two different estimates of the size of the molecule:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 MOLEMAN2 > stat
[...]
 Nr of selected atoms : (       1039) 
      Ditto, hydrogen : (          0) 
      Ditto, ANISOU   : (          0) 

       Item    Average     St.Dev        Min        Max        RMS  Harm.ave.
       ----    -------     ------        ---        ---        ---  ---------
    X-coord     49.647      7.784     33.148     67.053
    Y-coord     64.626      7.474     43.564     83.395
    Z-coord     32.915      9.192     10.907     54.736
   B-factor     26.268      2.661     20.260     34.730     26.402     26.002
    Occpncy      1.000      0.000      1.000      1.000      1.000      1.000

 The radius of gyration is   14.2 A

 Range of X, Y, and Z coordinates:   33.9 A *   39.8 A *   43.8 A
 If you have used XYz ALign_inertia_axes, these numbers
 give you an indication of the dimensions of the selected
 molecule (or set of atoms).
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
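
The numbers in this output can be reproduced by hand. The sketch below assumes that every atom has unit weight, in which case the squared radius of gyration is the sum of the three coordinate variances; the printed standard deviations are consistent with that assumption, but this is not a statement about how MOLEMAN2 computes the value internally.

```python
import math

# Per-axis standard deviations (in Å) copied from the STat output above.
std_x, std_y, std_z = 7.784, 7.474, 9.192

# Assumption: unit weight per atom, so Rgyr^2 = var(x) + var(y) + var(z).
r_gyr = math.sqrt(std_x**2 + std_y**2 + std_z**2)
print(f"Radius of gyration ~ {r_gyr:.1f} A")

# The coordinate ranges are simply max - min along each axis.
ranges = [67.053 - 33.148, 83.395 - 43.564, 54.736 - 10.907]
print("Ranges: " + " * ".join(f"{r:.1f} A" for r in ranges))
```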
   

If your molecule is roughly spherical, the radius of gyration may be a useful number to quote. Note that the molecule is in a "random" orientation, so the ranges of the X, Y, and Z coordinates are not particularly meaningful. They become more interesting if we re-orient the molecule so that its three axes of inertia are aligned with the X, Y and Z axes (we will also save the re-oriented molecule, overwriting the p2_myelin.pdb file):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 MOLEMAN2 > xyz align
 Moving CofG of selected atoms to (0,0,0)
 Nr of selected atoms : (       1039) 
 Centre-of-Gravity : (  49.647   64.626   32.915) 
 CofG now at (0,0,0)
 Eigen value 1 =      91422.9 Vector :   0.092372  0.330401  0.939309
 Eigen value 2 =      70459.8 Vector :   0.815531  0.516136 -0.261750
 Eigen value 3 =      46898.3 Vector :  -0.571294  0.790214 -0.221776
 Determinant : (   1.000) 
[...]
 MOLEMAN2 > stat
[...]
 The radius of gyration is   14.2 A

 Range of X, Y, and Z coordinates:   42.9 A *   38.1 A *   34.2 A
 If you have used XYz ALign_inertia_axes, these numbers
 give you an indication of the dimensions of the selected
 molecule (or set of atoms).
 MOLEMAN2 > wr p2_myelin.pdb pdb selected
 Output PDB file : (p2_myelin.pdb) 
 Format : (Pdb) 
 Atoms  : (SELECTED) 
 ERROR --- XOPXNA - error # 126 while opening NEW file : p2_myelin.pdb
 OPEN : (UNIT= 10 STATUS=NEW CAR_CONTROL= FORM=FORMATTED ACCESS=SEQUENTIAL)
   
 Error : ('new' file exists) 
 Open file as OLD (Y/N) ? (N) y
 Number of atoms to write : (       1039) 
 Nr of atoms written : (       1039) 
 Nr of lines written : (       1288) 
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

First of all, note that the radius of gyration has not changed: it is independent of the orientation of the molecule. However, the ranges of the coordinates have changed. Due to the alignment of the inertia axes, the X-axis shows the widest spread in the coordinates (~43 Å) and the Z-axis the smallest (~34 Å). In essence, the Z=0 plane has become the "least-squares plane" of the entire molecule. If you view the saved molecule in a graphics program along the Z-axis, this should give you a kind of "least-cluttered view" of the structure of the molecule. So the ranges of the coordinates now have some meaning, and you could quote them as the dimensions of the molecule (if you want to include the van der Waals radius of the atoms, add 3 or 4 Å to each of the three dimensions). Also note that for P2 myelin the three dimensions do not differ all that much, i.e. the molecule is roughly shaped like an elongated sphere.

To avoid problems with floppy or ill-determined surface sidechains, you could also do the calculations just on the CA atoms:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 MOLEMAN2 > select and atom " CA "
[...]
 Nr of selected atoms : (        131) 
 MOLEMAN2 > xyz align
 Moving CofG of selected atoms to (0,0,0)
 Nr of selected atoms : (        131) 
 Centre-of-Gravity : (  -0.066    0.057   -0.014) 
 CofG now at (0,0,0)
 Eigen value 1 =      11574.6 Vector :   0.999327  0.004686  0.036371
 Eigen value 2 =       8461.0 Vector :  -0.004321  0.999939 -0.010121
 Eigen value 3 =       5442.4 Vector :  -0.036416  0.009957  0.999287
 Determinant : (   1.000) 
[...]
 MOLEMAN2 > stat 
[...]
 The radius of gyration is   13.9 A

 Range of X, Y, and Z coordinates:   39.0 A *   31.6 A *   26.3 A
 If you have used XYz ALign_inertia_axes, these numbers
 give you an indication of the dimensions of the selected
 molecule (or set of atoms).
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

As you can see, the radius of gyration does not change much, but the "dimensions" do. Obviously, when quoting "the" dimensions of your protein, it is crucial to explain how you measured them!

 Method                                           Dimensions (Å)    Radius of gyration (Å)
 ------                                           --------------    ----------------------
 MOLEMAN2 STat, all atoms, random orientation     34 * 40 * 44      14
 MOLEMAN2 STat, all atoms, XYz ALign              43 * 38 * 34      14
 MOLEMAN2 STat, CA atoms, XYz ALign               39 * 32 * 26      14
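
As a consistency check, the eigenvalues printed by XYz ALign can be related to the radius of gyration. The sketch below assumes that they are the eigenvalues of the summed (un-normalised) coordinate scatter matrix about the centre of gravity; the numbers in the two runs above agree with that reading, but it is an inference from the output, not documented behaviour. The function names are hypothetical.

```python
import math


def rgyr_from_eigenvalues(eigenvalues, n_atoms):
    """Radius of gyration, assuming eigenvalues of the summed scatter matrix."""
    return math.sqrt(sum(eigenvalues) / n_atoms)


def rms_spread(eigenvalues, n_atoms):
    """RMS spread (Å) of the atoms along each inertia axis, same assumption."""
    return [math.sqrt(ev / n_atoms) for ev in eigenvalues]


# Eigenvalues and atom counts copied from the two XYz ALign runs above.
all_atoms = ([91422.9, 70459.8, 46898.3], 1039)
ca_atoms = ([11574.6, 8461.0, 5442.4], 131)

for label, (eigs, n) in (("all atoms", all_atoms), ("CA atoms", ca_atoms)):
    spread = ", ".join(f"{s:.1f}" for s in rms_spread(eigs, n))
    print(f"{label}: Rgyr ~ {rgyr_from_eigenvalues(eigs, n):.1f} A, RMS spread {spread} A")
```

The per-axis RMS spreads are much less sensitive to a few protruding sidechains than the max-minus-min ranges, which may be worth keeping in mind when choosing which number to quote.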


Now, how do we estimate the volume of your protein? To a first approximation, the simple formula Volume = 140 * Nres (where Nres is the number of residues, and 140 is an average volume per residue in Å³) gives reasonable results. In this case: Volume = 140 * 131 ~ 18,300 Å³.

Remembering that the radius of gyration of the molecule was ~14 Å (and assuming the molecule is a perfect sphere with that radius), we can of course also estimate the volume as (4/3)*PI*Rgyr^3. In this case, this gives an estimate of roughly 11,500 Å³.
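
Both back-of-the-envelope estimates are easy to reproduce. The following sketch simply evaluates the two formulas quoted above; the function names are made up for this illustration.

```python
import math


def volume_from_residues(n_res, per_residue=140.0):
    """Simple estimate: ~140 Å^3 per residue (the factor used in the text)."""
    return per_residue * n_res


def sphere_volume(radius):
    """Volume of a sphere with the given radius (Å) in Å^3."""
    return 4.0 / 3.0 * math.pi * radius**3


v_simple = volume_from_residues(131)   # 131 residues in one P2 myelin molecule
v_sphere = sphere_volume(14.0)         # radius of gyration rounded to 14 Å
print(f"140 * Nres       : {v_simple:8.0f} A^3")
print(f"(4/3)*PI*Rgyr^3  : {v_sphere:8.0f} A^3")
```

Note how sensitive the spherical estimate is to the radius: because the radius is cubed, a small change in the radius gives a roughly three times larger relative change in the volume.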

A slightly better (we hope) estimate can be obtained with VOIDOO:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Select one of the following types of calculation:
 C = cavity calculations
 V = volume calculations
 R = rotate a molecule
 Q = Quit program

 Type of calculation (C/V/R/Q)         ? (C) v
 Do you want extensive output          ? (N) 
  
 (1) Vanderwaals radii and residue types
  
 Library file ? (/home/gerard/lib/cavity.lib) 
 Reading your library file ...
[...]
 (2) PDB file
  
 PDB file name ? (in.pdb) p2_myelin.pdb
 Reading your PDB file ...
 REMARK CREATED BY MOLEMAN2 V. 020628/3.0 AT MON JUL 15 18:32:50 2002 FOR GERARD
[...]
 Number of atoms read       : (       1039) 
 Number of atoms kept       : (       1039) 
 Number of atoms rejected   : (          0) 
 Max Vanderwaals radius (A) : (   2.000) 
 Sum of atomic volumes (A3) : (  2.627E+04) 
 No residue types rejected
  
 (3) Primary grid
  
 Min, max, cog for X :    -19.609    23.317     0.000
 Min, max, cog for Y :    -18.104    19.951     0.000
 Min, max, cog for Z :    -19.281    14.887     0.000
 Primary grid spacing (A) ? (   1.000) 0.5
 Probe radius (1.4 A for water) ? (   0.000)    
 Min, max, cog for X :    -22.000    25.500
 Min, max, cog for Y :    -20.500    22.000
 Min, max, cog for Z :    -21.500    17.000
 Number of grid points : (         96          86          78) 
 Volume per voxel (A3) : (  1.250E-01) 
  
 (4) Various parameters
  
 Nr of volume-refinement cycles        ? (         10) 
 Grid-shrink factor                    ? (   0.900) 
 Convergence criterion (A3)            ? (   0.100) 
 Convergence criterion (%)             ? (   0.100) 
 Create protein-surface plot file      ? (N) 
  2 CPU total/user/sys :       0.2       0.2       0.0

 CYCLE : (          1) 
 Grid spacing : (   0.500) 
 Setting up grid ...
 Nr of points in grid : (     643968) 
 Not the protein      : (     532669) 
 The protein itself   : (     111299) 
 23 CPU total/user/sys :       0.1       0.1       0.0
 Nr of voxels in protein : (     111299) 
 Volume per voxel (A3)   : (  1.250E-01) 
 Protein volume (A3)     : (  1.391E+04) 
 Volume corresponds to a sphere of radius (A) : (  1.492E+01) 
 Nr of new grid points : (        107          96          87) 

 CYCLE : (          2) 
 Grid spacing : (   0.450) 
 Setting up grid ...
 Nr of points in grid : (     893664) 
 Not the protein      : (     741159) 
 The protein itself   : (     152505) 
 23 CPU total/user/sys :       0.2       0.2       0.0
 Nr of voxels in protein : (     152505) 
 Volume per voxel (A3)   : (  9.112E-02) 
 Protein volume (A3)     : (  1.390E+04) 
 Volume corresponds to a sphere of radius (A) : (  1.491E+01) 
 Nr of new grid points : (        119         107          97) 

 CYCLE : (          3) 
 Grid spacing : (   0.405) 
 Setting up grid ...
 Nr of points in grid : (    1235101) 
 Not the protein      : (    1025857) 
 The protein itself   : (     209244) 
 23 CPU total/user/sys :       0.3       0.3       0.0
 Nr of voxels in protein : (     209244) 
 Volume per voxel (A3)   : (  6.643E-02) 
 Protein volume (A3)     : (  1.390E+04) 
 Volume corresponds to a sphere of radius (A) : (  1.492E+01) 

 >>> CONVERGENCE <<<

 Last change (A3/%) : (  3.085E+00   2.220E-02) 
 Nr of volume calculations : (          3) 
 Average volume       (A3) : (  1.390E+04) 
 Volume corresponds to a sphere of radius (A) : (  1.492E+01) 
 Standard deviation   (A3) : (  6.676E+00) 
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

So, VOIDOO claims that the volume is 13,900 Å³, which is about a quarter less than the value calculated from the simplest formula (18,300 Å³). VOIDOO also says that this volume is the same as that of a sphere with a radius of 14.9 Å. Remember that the radius of gyration that we calculated was 14.2 Å (if we included all atoms, which we did in the VOIDOO calculation as well), so it looks as if P2 myelin isn't too "unspherical".
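
The arithmetic behind the VOIDOO output is straightforward: the protein volume in each cycle is the number of protein voxels times the volume per voxel, and the equivalent radius follows from inverting the sphere-volume formula. A small sketch using the numbers from the log above (the function names are hypothetical):

```python
import math


def voxel_volume(n_voxels, spacing):
    """Volume (Å^3) of n_voxels cubic voxels with the given edge length (Å)."""
    return n_voxels * spacing**3


def equivalent_sphere_radius(volume):
    """Radius (Å) of the sphere that has the given volume (Å^3)."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


# (number of protein voxels, grid spacing) for the three cycles in the log.
cycles = [(111299, 0.500), (152505, 0.450), (209244, 0.405)]
volumes = [voxel_volume(n, s) for n, s in cycles]
for (n, s), v in zip(cycles, volumes):
    print(f"spacing {s:.3f} A: {v:9.1f} A^3, "
          f"equivalent radius {equivalent_sphere_radius(v):.2f} A")
```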

However, VOIDOO calculates its volumes on discrete grids. This means that the results will depend on (a) the grid spacing, and (b) the orientation of the molecule. If we do the same calculation as above, but start with a grid with 1.0 Å spacing (instead of 0.5 Å), we find a volume of "1.392E+04", i.e. essentially identical to the result above. VOIDOO can also apply random rotations to a molecule in a PDB file. If we generate three such randomly oriented copies of P2 myelin and calculate their volumes (starting with a grid of 1.0 Å), we find: 1.391E+04, 1.389E+04, and 1.387E+04. In other words, we can reasonably quote the average (VOIDOO) volume as ~13,900 Å³.
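
To get a feeling for how a grid-based volume depends on the grid spacing, here is a deliberately naive sketch: it counts the grid points that lie within a given radius of any atom centre and multiplies by the voxel volume. This is a toy illustration under simple assumptions (one radius for all atoms, no probe, no refinement cycles), not VOIDOO's actual algorithm, and `grid_volume` is a hypothetical name.

```python
import math
from itertools import product


def grid_volume(atoms, radius, spacing):
    """Count grid points within `radius` of any atom; multiply by voxel volume."""
    # Integer grid bounds that enclose all atoms plus their radius.
    lo = [math.floor((min(a[k] for a in atoms) - radius) / spacing) for k in range(3)]
    hi = [math.ceil((max(a[k] for a in atoms) + radius) / spacing) for k in range(3)]
    r2 = radius * radius
    count = 0
    for i, j, k in product(*(range(l, h + 1) for l, h in zip(lo, hi))):
        p = (i * spacing, j * spacing, k * spacing)
        if any(sum((p[d] - a[d]) ** 2 for d in range(3)) <= r2 for a in atoms):
            count += 1
    return count * spacing**3


# A single 2.0 Å sphere, whose exact volume is known, on three different grids.
exact = 4.0 / 3.0 * math.pi * 2.0**3
for spacing in (1.0, 0.5, 0.25):
    v = grid_volume([(0.0, 0.0, 0.0)], 2.0, spacing)
    print(f"spacing {spacing:4.2f} A: {v:6.2f} A^3 (exact {exact:.2f} A^3)")
```

Shifting or rotating the atoms relative to the grid changes which points fall inside, which is why averaging over several orientations is a sensible precaution.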

An alternative to using VOIDOO is to simply make a mask, e.g. with MAMA. Again we have to decide on a grid spacing; let's use the default of 1.0 Å. For the average van der Waals radius we shall use 1.5 Å:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 MAMA > new ?
 Current defaults for the next NEW mask:
 Grid    =        100       100       100
 Origin  =          0         0         0
 Extent  =        100       100       100
 Padding =         10        10        10
 Cell    =    100.000   100.000   100.000    90.000    90.000    90.000
 Radius  =      2.000
 RT-oper =     1.000000    0.000000    0.000000
               0.000000    1.000000    0.000000
               0.000000    0.000000    1.000000
               0.000000    0.000000    0.000000
 Nr of points =    1000000 Max =    3000000
 MAMA > new rad 1.5
 NEW radius : (   1.500) 
 MAMA > new pdb m1 p2_myelin.pdb
 Number of atoms : (       1039) 
 Lower bounds (coordinates) : ( -19.609  -18.104  -19.281) 
 Upper bounds (coordinates) : (  23.317   19.951   14.887) 
 Lower bounds (grid points) : ( -19.609  -18.104  -19.281) 
 Upper bounds (grid points) : (  23.317   19.951   14.887) 
 Smallest radius : (   1.500) 
 Largest  radius : (   1.500) 
 Mask origin : (        -32         -31         -32) 
 Mask extent : (         69          64          60) 
 Grid points : (     264960) 
 Mask grid   : (        100         100         100) 
 Mask cell   : ( 100.000  100.000  100.000   90.000   90.000   90.000) 
 RT operator : (   1.000    0.000    0.000) 
 RT operator : (   0.000    1.000    0.000) 
 RT operator : (   0.000    0.000    1.000) 
 RT operator : (   0.000    0.000    0.000) 
 Nr of points set : (      11838) 
 MAMA > li m1
 Nr of masks in memory : (       1) 

 Mask  1 = M1
 File    = not_defined
 Grid    =        100       100       100
 Origin  =        -32       -31       -32
 Extent  =         69        64        60
 Cell    =    100.000   100.000   100.000    90.000    90.000    90.000
 Nr of points =     264960      Set   =      11838 (  4.47 %)
 Cell volume  =  1.000E+06      Voxel =  1.000E+00
 Grid volume  =  2.650E+05      Mask  =  1.184E+04
 Spacing =      1.000     1.000     1.000
 Top     =         36        32        27
 Changes = T
 Label   = No comment 
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   
Hence, according to MAMA, the volume of the mask (and, hence, the molecule) is 11,800 Å³. Note that this differs quite a bit from the values calculated with the simple formula and with VOIDOO. If we use a radius of 1.8 Å, the volume is 14,700 Å³. If we use a radius of 1.8 Å and a grid spacing of 0.5 Å, we find a volume of 13,000 Å³. To get a value similar to the one from VOIDOO, we need to use a radius of 1.85 Å and a spacing of 0.5 Å; then the volume is found to be 14,000 Å³. Of course, one could wonder if the volume of the ligand-binding cavity should be included or not. Or if the mask should be smoothed with EXpand and COntract operations? Again, the options are limitless, and it is therefore of the utmost importance to explain how you derived the volume that you quote in your paper or report!
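
The figures in the MAMA listing are related in a simple way, as the following sketch shows. It only re-does the arithmetic on numbers copied from the output above (mask extent, grid spacing, number of points set); the variable names are chosen for this illustration.

```python
# Values copied from the "li m1" listing above.
extent = (69, 64, 60)          # mask extent in grid points
spacing = (1.0, 1.0, 1.0)      # grid spacing in Å
points_set = 11838             # mask points that are "on"

grid_points = extent[0] * extent[1] * extent[2]
voxel = spacing[0] * spacing[1] * spacing[2]       # Å^3 per grid point
mask_volume = points_set * voxel                   # Å^3
fraction_set = 100.0 * points_set / grid_points    # per cent of the mask box

print(f"Grid points : {grid_points}")
print(f"Mask volume : {mask_volume:.0f} A^3")
print(f"Points set  : {fraction_set:.2f} %")
```

With a 0.5 Å grid the voxel volume drops to 0.125 Å³, so eight times as many points are needed to represent the same volume.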

 Method                                            Volume (Å³)
 ------                                            -----------
 Volume = 140 * Nres                               18,300
 (4/3)*PI*Rgyr^3                                   11,500
 VOIDOO, 0.5 Å grid                                13,900
 VOIDOO, 1.0 Å grid                                13,900
 VOIDOO, 0.5 Å grid, average of 4 orientations     13,900
 MAMA, 1.0 Å grid, 1.5 Å radius                    11,800
 MAMA, 1.0 Å grid, 1.8 Å radius                    14,700
 MAMA, 0.5 Å grid, 1.8 Å radius                    13,000
 MAMA, 0.5 Å grid, 1.85 Å radius                   14,000
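
To put the spread between the methods into numbers, the sketch below expresses each estimate as a percentage difference from the VOIDOO value. Choosing VOIDOO as the reference is an arbitrary decision made for this illustration; it does not imply that the VOIDOO value is the "true" volume.

```python
reference = 13_900  # VOIDOO value (Å^3) from the table above

estimates = {
    "Volume = 140 * Nres": 18_300,
    "(4/3)*PI*Rgyr^3": 11_500,
    "MAMA, 1.0 A grid, 1.5 A radius": 11_800,
    "MAMA, 1.0 A grid, 1.8 A radius": 14_700,
    "MAMA, 0.5 A grid, 1.8 A radius": 13_000,
    "MAMA, 0.5 A grid, 1.85 A radius": 14_000,
}


def percent_difference(value, ref):
    """Signed difference of value relative to ref, in per cent."""
    return 100.0 * (value - ref) / ref


for method, volume in estimates.items():
    print(f"{method:<32} {volume:>7,d} {percent_difference(volume, reference):+6.1f} %")

# VOIDOO relative to the simple per-residue formula.
print(f"VOIDOO vs 140 * Nres: {percent_difference(reference, 18_300):+.1f} %")
```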


USF Latest update at 15 July, 2002.