Functional Mode Analysis
Functional mode analysis (FMA) is a technique to identify collective atomic motions related to a specific protein function. Given a large set of structures of one protein, for example from a molecular dynamics trajectory, the method detects a collective motion (or collective mode) that is maximally correlated to an arbitrary quantity of interest. In other words, the method aims to explain alterations in a quantity in terms of internal collective motions of the protein.
What kind of questions can be addressed with FMA?
Let us assume you are interested in some 'functional' quantity that is important for the function of your protein, such as
- the volume of a binding site,
- the radius of gyration of the protein,
- the distance between two important functional residues,
- the number of H-bonds between two groups,
- the electrostatic potential at some site in the protein,
- etc.
FMA then detects the collective motion that is maximally correlated to this quantity.
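As an illustration of what such a quantity looks like in practice, the following Python sketch computes two of the listed examples for a single frame: the mass-weighted radius of gyration and the distance between two atoms. The function names and the toy coordinates are hypothetical and not part of the FMA code; in a real workflow you would evaluate the quantity for every frame of the trajectory to obtain f(t).

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame.

    coords: list of (x, y, z) tuples, masses: list of floats of equal length.
    """
    total_mass = sum(masses)
    # Center of mass of the frame
    com = [sum(m * r[k] for m, r in zip(masses, coords)) / total_mass
           for k in range(3)]
    # Mass-weighted mean squared distance from the center of mass
    msd = sum(m * sum((r[k] - com[k]) ** 2 for k in range(3))
              for m, r in zip(masses, coords)) / total_mass
    return math.sqrt(msd)


def distance(a, b):
    """Euclidean distance between two positions (x, y, z)."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(3)))


# Toy frame (hypothetical): two atoms of equal mass, 2 length units apart
frame = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg = radius_of_gyration(frame, [1.0, 1.0])
d = distance(frame[0], frame[1])
print(rg, d)
```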
How is FMA related to normal mode analysis or principal component analysis?
Normal mode analysis (NMA) and principal component analysis (PCA) are popular techniques to identify the large-scale collective motions of proteins. NMA computes low-frequency modes, and PCA yields the modes that give the largest contributions to the atomic RMSD in a given protein ensemble.
However, a functionally relevant mode is in most cases not identical to a specific normal or PCA mode, but may be distributed over a number of normal or PCA modes. For example, the electrostatic potential at a ligand binding site may be tuned by a combination of PCA vectors no. 1, 3, and 15. In such cases, FMA yields a single collective mode which tunes the functional quantity. In addition, FMA quantifies the contributions of the individual PCA vectors (or normal modes) to the fluctuations of the functional quantity.
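The idea of a single mode spread over several PCA vectors can be sketched in a few lines of Python. The sketch below is not the fma implementation. It assumes a linear model and uses the fact that, for a linear combination of PC projections, the Pearson correlation with f is maximized by an ordinary least-squares fit. All names (max_correlated_mode, proj, f) and the synthetic data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def max_correlated_mode(proj, f):
    """Coefficients alpha_i (normalized) of the maximally correlated mode.

    proj[i][t]: projection of frame t on PCA vector i, f[t]: observable.
    """
    d, n = len(proj), len(f)
    mean_f = sum(f) / n
    means = [sum(p) / n for p in proj]
    # Covariance matrix of the projections and their covariance with f
    C = [[sum((proj[i][t] - means[i]) * (proj[j][t] - means[j])
              for t in range(n)) / n for j in range(d)] for i in range(d)]
    c_f = [sum((proj[i][t] - means[i]) * (f[t] - mean_f)
               for t in range(n)) / n for i in range(d)]
    beta = solve(C, c_f)          # least-squares coefficients
    norm = math.sqrt(sum(b * b for b in beta))
    return [b / norm for b in beta]


# Synthetic example: f depends on the first and third "PC" only
n = 200
proj = [
    [math.sin(0.1 * t) for t in range(n)],
    [math.cos(0.37 * t) for t in range(n)],
    [math.sin(0.9 * t + 1.0) for t in range(n)],
]
f = [2.0 * proj[0][t] - 1.0 * proj[2][t] + 5.0 for t in range(n)]
alpha = max_correlated_mode(proj, f)
p_a = [sum(alpha[i] * proj[i][t] for i in range(3)) for t in range(n)]
r = pearson(p_a, f)
print(alpha, r)
```

In this toy case the recovered coefficients point along the first and third vectors only, and the projection p_a on the mode correlates almost perfectly with f. With real data the correlation will be lower and should be checked on frames not used for the fit.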
For more details on FMA, please check the original publication:
J.S. Hub and B.L. de Groot, Detection of functional modes in protein dynamics, PLoS Comp Biol 5(8), e1000480 (2009)
In case you use results from FMA for a publication, we kindly ask you to cite the reference. Thank you!
Source code
Latest sources: fma.v0901.tar.gz
Please note that the software is distributed with NO WARRANTY OF ANY KIND. The author is not responsible for any losses or damages suffered directly or indirectly from the use of the software. Use it at your own risk.
Installation
The source tarball comes with a README file, which (hopefully) explains the compilation requirements and procedure. I have tested the installation under Mac OS X (10.5) and Suse Linux (Version 10); for other environments, the cmake input file might have to be adapted beyond what the README explains. If you run into trouble, please do not hesitate to contact us (jochen at xray.bmc.uu.se).
Please note: At present, the FMA code can only be linked to GROMACS 4.0.x; it has not yet been adapted to the recent GROMACS 4.5 code.
Problems, suggestions, bug reports
Please do not hesitate to send an email (jochen at xray.bmc.uu.se) in case you run into trouble, have any suggestions, or if the documentation is unclear. Bug reports are also highly welcome.
How-to-start tutorial
Please also check the fma help (including command line options) via fma -h.
The following instructions assume you use Gromacs. Let us assume you have the following input files:
- A protein structure file prot.pdb
- A simulation trajectory in gromacs xtc format, traj.xtc
- A data file observable.xvg with the functional variable f(t). The file must have 2 columns, time and f(time).
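To check that a data file has the expected two-column layout before passing it to fma, a small reader like the following Python sketch may help. It is an illustration only. Skipping lines that start with '#' or '@' is an assumption borrowed from the xvg convention, and the function name is hypothetical.

```python
def read_two_columns(lines):
    """Parse 'time f(time)' lines into two lists of floats.

    Blank lines and lines starting with '#' or '@' are skipped
    (assumed xvg-style comment/header lines).
    """
    times, values = [], []
    for line in lines:
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError("expected 2 columns, got: %r" % line)
        times.append(float(fields[0]))
        values.append(float(fields[1]))
    return times, values


# Hypothetical file content; with a real file use: open("observable.xvg")
example = ["# time  f(t)", "0.0 1.25", "10.0 1.31", "", "20.0 1.28"]
times, values = read_two_columns(example)
print(times, values)
```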
With these files, proceed as follows:
- Typically, you start with a PCA on the simulation:
g_covar -s prot.pdb -f traj.xtc -v eigenvec.trr
Select the group for RMSD fit (typically backbone or C-alpha), and the group you want to use to explain the functional quantity (typically C-alpha, backbone, or heavy atoms).
- Create the projections on the PCA vectors using g_anaeig:
g_anaeig -s prot.pdb -f traj.xtc -v eigenvec.trr -proj proj.xvg -first 1 -last 50
With option -last, you choose the number of principal components (PCs) to write into proj.xvg. Please note that the fma tool lets you choose afterwards (option -nvec) how many of the PCs in proj.xvg you want to use in FMA.
Unfortunately, g_anaeig writes the projections in the awkward grace format, with different PCs printed below each other and separated by '&' characters. This format is horrible to parse, and the gromacs-internal xvg read routine actually fails when parsing a really large xvg file. Therefore, please change the format using the bash script xvg2col.bash, which comes with the source code:
xvg2col.bash proj.xvg
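For readers who want to see what this reformatting amounts to, here is a minimal Python sketch of the same kind of conversion. It is not the xvg2col.bash script itself. It assumes that every data set holds "time value" lines, that data sets are separated by lines starting with '&', and that lines starting with '#' or '@' are headers. The function name is hypothetical.

```python
def xvg_to_columns(lines):
    """Turn stacked '&'-separated data sets into rows [time, pc1, pc2, ...]."""
    datasets, current = [], []
    for line in lines:
        line = line.strip()
        if not line or line[0] in "#@":
            continue                      # skip blank, comment, header lines
        if line.startswith("&"):
            if current:                   # '&' closes the current data set
                datasets.append(current)
            current = []
            continue
        t, v = line.split()[:2]
        current.append((float(t), float(v)))
    if current:                           # last set without trailing '&'
        datasets.append(current)
    if not datasets:
        return []
    times = [t for t, _ in datasets[0]]
    for ds in datasets:
        if [t for t, _ in ds] != times:
            raise ValueError("data sets do not share the same time points")
    return [[t] + [ds[k][1] for ds in datasets] for k, t in enumerate(times)]


# Tiny hypothetical input with two PCs and two frames
example = ["# comment", "@ title", "0 1.0", "1 1.5", "&",
           "0 -0.2", "1 -0.4", "&"]
rows = xvg_to_columns(example)
for row in rows:
    print(" ".join(str(x) for x in row))
```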
The xvg2col.bash script creates a file proj.col.xvg with one time column, followed by all the PC columns.
- Now, you can run the fma tool. To begin with, I suggest sticking to the Pearson coefficient as correlation measure:
fma -ip proj.col.xvg -i observable.xvg -iv eigenvec.trr -tmb ?? -ox -rc -rm -nev -contr -dvp ...
The most important input options are:
-tmb (final time for model building): Time frames before -tmb will be used for model building, time frames after -tmb for cross validation.
-read-tmin, -read-tmax, -read-dt: define the first and last time to read, and the time step to read.
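The role of -tmb can be illustrated with a short Python sketch: the frames are split at a given time, and the agreement between data and model is then measured separately on both parts. This is a conceptual illustration, not the fma code. Whether a frame exactly at -tmb belongs to the model building set is an assumption here, and the function names and numbers are made up.

```python
import math


def split_by_time(times, values, tmb):
    """Split values into model building (t <= tmb) and cross validation (t > tmb).

    Treating t == tmb as part of the model building set is an assumption.
    """
    build = [v for t, v in zip(times, values) if t <= tmb]
    cross = [v for t, v in zip(times, values) if t > tmb]
    return build, cross


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data f(t) and model prediction for six frames
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model = [1.1, 1.9, 3.2, 4.5, 4.4, 6.0]

data_build, data_cross = split_by_time(times, data, tmb=2.0)
model_build, model_cross = split_by_time(times, model, tmb=2.0)

rm = pearson(data_build, model_build)   # correlation in the model building set
rc = pearson(data_cross, model_cross)   # correlation in the cross validation set
print(rm, rc)
```

A correlation in the cross validation set that is much lower than in the model building set is a typical sign of overfitting, for instance when too many PCA vectors are used as basis set.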
The most important output files are:
- r-crossv.xvg (option -rc): scatter plot (data vs. model) of the cross validation set, also providing Rc, i.e. the correlation between data and model in the cross validation set
- r-modelbuild.xvg (option -rm): scatter plot (data vs. model) of the model building set, also providing Rm, i.e. the correlation between data and model in the model building set
- pear_nev.xvg (option -nev): Rc and Rm as a function of the number of PCA vectors d used as basis set. Likewise, sig_nev.xvg (option -nevs) provides the standard deviation between data and model as a function of d. This output is important for finding a reasonable number of PCA vectors to use as basis set for constructing the collective mode.
- validate.xvg (option -od) plots the functional quantity, the model in the model building set, and the model in the validation set as a function of time.
- linmodel.xvg (option -lm): parameters β_i of the linear model (compare publication). The subtitle of the plot provides the equation for computing the functional quantity f from the projection p_a on the functional mode.
- collvec.xvg (option -ox): parameters α_i of the functional mode a (the maximally correlated motion) with respect to the PCA vectors.
- contr.xvg (option -contr): contributions of the different principal components (PCs) to the variance of the model. In addition, the plot shows the variance of the PCs, i.e. the PCA eigenvalues if the same frames were used for PCA and FMA. (The values (β_i σ_i)^2 are mainly for testing purposes.)
- dataVsPa.xvg (option -dvp): data f versus the projection p_a on the functional mode.
- MCM.trr (option -va): Cartesian atomic coordinates of the maximally correlated motion (the vector a in the publication). Can be further used with g_anaeig2 (see below).
- ewMCM.trr (option -vew): Cartesian atomic coordinates of the ensemble-weighted maximally correlated motion (ewMCM). Can be further used with g_anaeig2 (see below).
You can use the g_anaeig2 -extr option to create movies of the maximally correlated motion (MCM) or the ensemble-weighted MCM (ewMCM). Example videos of such motions are available as supporting material of the FMA publication. To visualize the extreme extensions along the MCM in your trajectory, use something like
g_anaeig -s prot.pdb -f traj.xtc -v MCM.trr -extr mcm-extr.pdb -nframes 30
Note that since the MCM written in MCM.trr is normalized, the spatial displacement visible in mcm-extr.pdb may appear smaller than in the simulation (and depends on the number of basis vectors used in FMA, option -nvec). Therefore, you may want to use the -max option of g_anaeig to visualize the MCM.
To visualize the ewMCM, you can use the g_anaeig command lines that fma prints to the console (check fma output). The motion that makes the functional quantity f fluctuate within n⋅σ_f (where σ_f is the standard deviation of f) is generated by
g_anaeig -s prot.pdb -v ewMCM.trr -max n -extr ewMCM.pdb -nframes 30
To get a good impression of the motion, you may want to choose n=3. Alternatively, if you want to visualize the motion that generates the extremes of f, you can use the g_anaeig2 tool that comes with the FMA implementation. The only difference between g_anaeig2 and the normal g_anaeig is that it provides a -min option in addition to -max. (Hopefully the official g_anaeig will soon provide the -min option.) The arguments to use for -min and -max are given in the fma output. The command line may be similar to:
g_anaeig2 -s prot.pdb -v ewMCM.trr -min -8.25553 -max 7.68362 -extr ewMCM.pdb -nframes 30
Finally, the motions in mcm-extr.pdb or ewMCM.pdb may be visualized with common visualization software (such as PyMol).
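Conceptually, the frames written with -extr can be thought of as structures displaced from the average structure along the mode, with the displacement running from the minimum to the maximum projection. The Python sketch below illustrates this under simplifying assumptions: plain linear interpolation, no fitting, and no mass weighting. It is not taken from the g_anaeig or g_anaeig2 sources, and the function name and toy coordinates are hypothetical.

```python
def interpolate_along_mode(average, mode, pmin, pmax, nframes):
    """Frames average + p * mode, with p running linearly from pmin to pmax.

    average: flat list of 3N coordinates of the average structure,
    mode:    flat list of 3N components of the collective mode.
    """
    if nframes < 2:
        raise ValueError("need at least two frames")
    frames = []
    for k in range(nframes):
        p = pmin + (pmax - pmin) * k / (nframes - 1)
        frames.append([x + p * v for x, v in zip(average, mode)])
    return frames


# Toy system with two atoms; only the x coordinate of atom 1 moves
average = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
mode = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
frames = interpolate_along_mode(average, mode, -8.25553, 7.68362, 30)
print(len(frames), frames[0][0], frames[-1][0])
```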
Frequently Asked Questions (FAQ)
- While compiling, I am getting an error message such as
'/usr/bin/ld: cannot find -lf2c',
what's wrong?
Answer: Either you don't have the f2c libraries installed (try locate libf2c, if you have locate on your system), or you have not adapted the line
set (LIBDIR_F2C /sw/lib)
in the CMakeLists.txt file.
- While compiling, I get link errors such as
undefined reference to `sgesv_'.
Answer: The compiler does not find the LAPACK/BLAS libraries. Make sure that
- the path to the LAPACK/BLAS libs is correctly given in set(LAPACKDIR /home/me/path_to_lapack) in CMakeLists.txt,
- the names of the LAPACK/BLAS libs are given by set (LIBLAPACK lapack blas). Depending on the file names, that line may also read set (LIBLAPACK lapack_LINUX blas_LINUX) or set (LIBLAPACK lapack_APPLE blas_APPLE) or similar. Important: The names must be given without the "lib" at the beginning of the file name.
- the LAPACK/BLAS library file names on your hard disk start with 'lib'. If the LAPACK/BLAS libs are called (e.g.) lapack.a and blas.a, please use the mv command to rename them to liblapack.a and libblas.a.
- While compiling, I get link errors such as
undefined reference to `for_len_trim'
undefined reference to `for_write_seq_fmt'
undefined reference to `for_write_seq_fmt_xmit'
undefined reference to `for_stop_core'
What's wrong?
Answer: Please try to link against the gfortran libraries; that may help. Please change the line
set (LIBS m md_mpi gmx_mpi f2c ${LIBLAPACK})
to
set (LIBS m md_mpi gmx_mpi gfortran f2c ${LIBLAPACK})
in CMakeLists.txt.
- Installing f2c on a Fedora Linux. If you are having trouble installing f2c on a Fedora system, this website may help you.
Please note: We are not responsible for contents of external links.