CNS/XPLOR Topology and Parameter file usage:

Attaching Heme to protein as a diaxial prosthetic group

PLUS: Iron-Sulfur cluster attachment

by Bryan Lepore

Having trouble attaching prosthetic groups (not ligands!) - in particular Heme - to your protein? Well, I wrote this to help myself, and hopefully anybody else. I tried to make it general in some parts and specific in others, but always brief. I'm specifically talking about attaching Heme to a Cysteine and Histidine - but see the end for the iron-sulfur cluster attachment stuff. Also, this is what worked for me, so there are probably other ways to skin this cat. Here are some (meta) links out there that are also helpful:

  • This section in the CNS (version 1.1) tutorial
  • Another section of the CNS tutorial
  • The X-PLOR FAQ list
  • Ghemical, an open-source, GNU-GPL'ed molecular/quantum mechanics program from Tommi Hassinen and others
  • Dan Peisach's short but sweet ditty on a single ligand to heme
  • The XPLOR manual, Section 4, on Topology, Parameters, and Molecular Structure.
  • Gerard Kleywegt's HIC-Up database search
  • Daan van Aalten's WAY-COOL PRODRG server at U. Dundee
  • Gerard's HIC-Up server to run HETZE, XPLO2D, and MOLEMAN2
  • the CNS Bulletin Board discussion group at yahoo.com
  • I am not the only person to actually make a webpage about this stuff.


  • Bootstrapping

    You should already be up on your ligand's bond lengths and angles, atom masses, charges, and protonation states. If you trust everyone else out there, check the HIC-Up database or the PDB to borrow a .par, .top and .pdb of your favorite ligand. Start from a high resolution pdb by using the HIC-Up server and check everything it dumps out. If your institution is cool, you have access to the CSD.

    Things to know

  • You will need to be able to recognize your atom NAMES (pdb style) along with your atom TYPES (assigned by XPLO2D). Places to look for the various atom names and the corresponding atom types are the .mtf, $CNS_TOPPAR:protein.param, $CNS_TOPPAR:ion.param, etc.
  • The .par should usually contain atom TYPES, while the .top usually contains the atom NAMES.
  • The .top will contain the correlation of atom NAMES and TYPES given within a GROUp statement.
  • You need a newline after the final statements in your .par, .top, and .pdb.
  • Bond restraints are in kcal mol-1 Å-2, angle restraints are in kcal mol-1 rad-2. Set these to zero if you want to restrict these parameters.
  • Nonbonded parameters are given as follows: the Lennard-Jones well depth (eps) in kcal mol-1 and sigma in Å.
  • Absence of errors in the generate stage is NOT a sign that further refinement WILL NOT fail - be vigilant. Hint: grep for % in the outfiles, looking for nasty things like %PATCH-ERR or %READ-WRN.
  • Make sure you don't have conflicting atom types when you are including multiple prostheses or ligands. CNS barfs if you do.
  • If you want to add more prosthetic groups, modify generate.inp so you end up with...
    {* prosthetic group coordinate file *}
    {===>} prost_coordinate_infile_1="nadh.pdb";
    {===>} prost_coordinate_infile_2="heme.pdb";
    (------------BREAK IN FILE------------------------)
    {* prosthetic group topology file *}
    {===>} prost_topology_infile_1="nadh.top";
    {===>} prost_topology_infile_2="heme.top";
    (------------BREAK IN FILE------------------------)
    {* prosthetic group parameter file *}
    {===>} prost_parameter_infile_1="nadh.par";
    {===>} prost_parameter_infile_2="heme.par";
    (------------BREAK IN FILE------------------------)
    
       if ( &BLANK%prost_topology_infile_1 = false ) then
         @@&prost_topology_infile_1
       end if
    ! add this to have a second prosthetic
       if ( &BLANK%prost_topology_infile_2 = false ) then
         @@&prost_topology_infile_2
       end if
    (------------BREAK IN FILE------------------------)
    ! modified from "...infile =" to "...infile_1 ="
       if ( &BLANK%prost_parameter_infile_1 = false ) then
         @@&prost_parameter_infile_1
       end if
    ! need the following for both prostheses 
       if ( &BLANK%prost_parameter_infile_2 = false ) then
         @@&prost_parameter_infile_2
       end if
    
    
    Yeah, you will have to go down below the part that says "You don't normally need to go here" or whatever. I rarely use that html gui, so you'll have to figure out how this relates on your own if that's all you know. UPDATE: "that html gui" has a part where you can enter the patch, proving that manual editing was all I knew.
  • Dust off those old molecular models - they can be handy.
  • Additions to the .par that are missing the corresponding topology in .top might not make CNS barf, but it does not mean that CNS picked them up - vide infra
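
The "grep for %" and trailing-newline hints above can be automated. Here's a minimal Python sketch, not anything that ships with CNS: the function names are made up for illustration, and the tag pattern is only a guess generalized from the two examples (%PATCH-ERR, %READ-WRN), so loosen it if your outfiles use other forms.

```python
import re

# Assumed tag shape: '%', an upper-case name, '-', an upper-case suffix
# (e.g. %PATCH-ERR, %READ-WRN). This is a guess, not a CNS specification.
TAG = re.compile(r"%[A-Z][A-Z0-9_]*-[A-Z]+")


def find_messages(log_text):
    """Return (line_number, line) pairs for lines that carry a %TAG."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if TAG.search(line)
    ]


def missing_final_newline(file_text):
    """True if the text of a .par/.top/.pdb does not end with a newline."""
    return not file_text.endswith("\n")
```

Feed `find_messages` the full text of each outfile and eyeball whatever comes back; run `missing_final_newline` on each .par, .top and .pdb before the generate stage.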

    The Big Example

    This explanation will likely be malapropos, becoming clearer as you go. It is also biased towards my files. I fed a heme from the pdb into XPLO2D (at home). It thought FE was Fluorine for some reason, so I fixed the .top as follows:


    ! MASS FZ1    18.99800 ! assuming F -> 18.99800 + 1.008 * 0 (Hs)
    MASS FE     55.84700 ! iron; XPLO2D had assumed F -> 18.99800
    (..... skipping through the .top file.....) 

    ! ATOM FE TYPE FZ1 CHARge 0.0 END ! Nr of Hs = 0
    ATOM FE TYPE FE CHARge 0.0 END ! Nr of Hs = 0

    {... yeah, FZ reminds me of the Duke of Prunes also...}

    Then, after the "END { RESIdue HEM }" in the .top, put in the following declarations using the chemical atomic names (i.e. as found in the .mtf, NOT .pdb atomic names). Never mind understanding the leading numbers until later in this guide; if you can't wait for that, they refer to the 'reference' statements in the generate.inp patch section. IMPORTANT POINT: each GROUP statement must conclude with an END statement. Watch out for the easy mistake of seeing an END {PATCH} statement and thinking it covers all the GROUPs!

     PREsidue HEML
    ! 1=cys (RESID 452), 2=his (RESID 465), 3=heme (RESID 610)
    ! either modify 1SG or delete the H on the SH of CYS452
    !GROUP
    !    modify atom 1SG  type=S   charge=0.0  END
    GROUP
         DELETE ATOM 1H END
         DELETE ATOM 2HE2 END       
         ADD ANGL 1SG 3FE 2NE2
         ADD ANGL 1SG 3FE 3NA
         ADD BOND 1SG   3FE
         ADD ANGL 1CB 1SG 3FE
         ADD BOND 2NE2 3FE
         ADD ANGL 2CD2 2NE2 3FE
      END {HEML}
    

    ...then comment out the FE NONBonded line in the .par by putting an exclamation point in front of it, as follows...
    ! NONBonded FE .0200 2.610       .0200   2.610  ! assuming Iron
    

    You don't need the ion FE in the .par b/c CNS gets it for you automagically. Including it will actually cause problems, because CNS will then call it twice and fail at the n=generate+1 stage. I can't see any reason why making the iron type and name both FE would be a problem. N.B.: This would be F for fluorine if you didn't fix it.
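
Since a type that is defined twice only blows up at a later stage, it can be worth looking for duplicates before running anything. A rough Python sketch of such a check (an assumed pre-flight step, not a CNS feature): it looks only at active NONBonded lines, treats `!` as the comment character, and ignores `{ }` comments and every other statement type. The function names are hypothetical.

```python
from collections import defaultdict


def nonbonded_types(par_text):
    """Yield atom types from active (uncommented) NONBonded statements."""
    for raw in par_text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop everything after '!'
        fields = line.split()
        if len(fields) >= 2 and fields[0].upper().startswith("NONB"):
            yield fields[1].upper()


def duplicated_types(par_files):
    """Map each atom type defined more than once to the files defining it.

    par_files is a dict of {file name: file contents}.
    """
    seen = defaultdict(list)
    for name, text in par_files.items():
        for atom_type in nonbonded_types(text):
            seen[atom_type].append(name)
    return {t: names for t, names in seen.items() if len(names) > 1}
```

Pass it the contents of your own .par files together with whichever $CNS_TOPPAR parameter files your run reads; an empty result means no NONBonded type showed up twice.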

    Now here are the other modifications to the .par:

    right before the "set echo=true end" line of the .par, put in...
    BOND NH1  FE           2500.0  2.000  
    BOND SH1E FE           2500.0  2.300  
    ANGLE NH1 FE  SH1E     2500.0 180.0   
    ANGLE SH1E FE  NX6     2500.0 90.0    
    ANGLE CR1H NH1 FE      2500.0 120.0   
    ANGLE CH2E SH1E FE     2500.0 120.0   
    

    Note that these are the chemical atom TYPES, NOT the .pdb file atom NAMES, in this declaration.
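
To get a feel for what a force constant of 2500.0 does, here is a small Python sketch that evaluates the penalty for a distorted bond and angle. It assumes the plain harmonic form k*(x - x0)**2 with no factor of one half (check the X-PLOR manual if the exact form matters to you) and the units listed under "Things to know"; the function names are made up.

```python
import math


def bond_energy(k, r, r0):
    """Harmonic bond term k*(r - r0)**2; k in kcal mol-1 A-2, r and r0 in A."""
    return k * (r - r0) ** 2


def angle_energy(k, theta_deg, theta0_deg):
    """Harmonic angle term; k in kcal mol-1 rad-2, angles given in degrees."""
    delta = math.radians(theta_deg - theta0_deg)  # the constant is per rad**2
    return k * delta ** 2


# SH1E-FE bond stretched 0.1 A past its 2.300 A target
print(bond_energy(2500.0, 2.4, 2.3))
# NH1-FE-SH1E angle bent 5 degrees away from its 180 degree target
print(angle_energy(2500.0, 175.0, 180.0))
```

Under those assumptions the two distortions cost roughly 25 and 19 kcal mol-1, which shows how stiff these restraints are compared with typical protein terms.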


    Finally, throw this in the generate.inp after the "any special prosthetic group patches" line (yeah, down BELOW, in the "things that don't normally need to be changed" part)...

     {* any special prosthetic group patches can be applied here   *}
     {* note that residue numbers are renumbered so that it is more }
     {  clear in the parameter and topology files                   }
     { Heme      = resid 610,620,630 etc...                         }
     { Cysteine  = resid 452,552,652,etc...                         }
     { Histidine = resid 465,565,665,etc...                         }
    { if you do NOT have unique numbers for each ligand in each               }
    { chain, CNS will complain that it was "NOT FOUND IN MOLECULAR STRUCTURE" }
    {===>}	
    	patch HEML
    	reference=1=(resid 452)
    	reference=2=(resid 465)
    	reference=3=(resid 610)
    	end
    	patch HEML
    	reference=1=(resid 552)
    	reference=2=(resid 565)
    	reference=3=(resid 620)
    	end
    	patch HEML
    	reference=1=(resid 652)
    	reference=2=(resid 665)
    	reference=3=(resid 630)
    	end
    	patch HEML
    	reference=1=(resid 752)
    	reference=2=(resid 765)
    	reference=3=(resid 640)
    	end
    	patch HEML
    	reference=1=(resid 852)
    	reference=2=(resid 865)
    	reference=3=(resid 650)
    	end
    	patch HEML
    	reference=1=(resid 952)
    	reference=2=(resid 965)
    	reference=3=(resid 660)
    	end
    {<===}
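
Typing six nearly identical patch blocks by hand invites typos, so they can also be generated. A minimal Python sketch, assuming the numbering used in the block above (Cys and His step by 100 per chain, heme by 10); `heml_patches` is a hypothetical helper, and its output is meant to be pasted into generate.inp by hand.

```python
def heml_patches(n_chains, cys=452, his=465, heme=610):
    """Build one 'patch HEML' block per chain using the numbering above."""
    blocks = []
    for i in range(n_chains):
        blocks.append(
            "patch HEML\n"
            f"reference=1=(resid {cys + 100 * i})\n"
            f"reference=2=(resid {his + 100 * i})\n"
            f"reference=3=(resid {heme + 10 * i})\n"
            "end"
        )
    return "\n".join(blocks)


print(heml_patches(6))
```

The defaults are just the first-chain numbers from this example; change them (and the step sizes in the loop) to match however you renumbered your own chains.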
    

    You've got some explaining to do

    So you've seen what I did. Those resids in the generate.inp patch refer to the numbers I gave to Cysteine 52, Histidine 65, and Heme 610-660. I have six subunits to put heme into, so they increment logically; it's just a handy thing to do. The cysteine and histidine MUST be renumbered, because CNS is going by resid, although I figure you could write a patch to recognize the segids too. CNS uses the reference numbers from the .top patch above in order to do things to the atoms, e.g. the "DELETE ATOM 1H END" or the "ADD ANGL 1CB 1SG 3FE" definition. This numbering convention also works with multiple models (alternate conformations, hi-resolution, etc.), although for more than nine chains it's better to use four-digit numbers, e.g. 1110 for prosthetic number one, 1120 for number two, etc. Basically, you have to number each prosthesis or ligand (yes, I tried it) uniquely for each chain. I have used this guide myself on 2 distinct occasions to edit and link a four-iron/four-sulfur (Fe4S4) cluster to a protein as well, which prompted some clarifications to this guide, so it seems to make sense to at least me.
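
The renumbering itself can be scripted. A hedged Python sketch, assuming standard fixed-column PDB records (chain ID in column 22, residue number in columns 23-26) and a per-chain offset that you pick; `renumber_line` and the offsets are hypothetical, and the heme records would still need their own numbers (610, 620, ...) assigned separately.

```python
def renumber_line(line, offsets):
    """Add a per-chain offset to the residue number of one PDB record.

    offsets maps chain ID -> integer offset. Lines that are not ATOM/HETATM
    records, or whose chain is not in offsets, are returned unchanged.
    """
    if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
        return line
    chain = line[21]                      # column 22 (1-based)
    if chain not in offsets:
        return line
    new_number = int(line[22:26]) + offsets[chain]   # columns 23-26
    if not 0 <= new_number <= 9999:
        raise ValueError(f"residue number {new_number} does not fit in 4 columns")
    return line[:22] + f"{new_number:4d}" + line[26:]


def renumber_pdb(lines, offsets):
    """Apply renumber_line to every line of a PDB file."""
    return [renumber_line(line, offsets) for line in lines]
```

For instance, `offsets = {"A": 400, "B": 500}` would turn Cys 52 into 452 in chain A and 552 in chain B, which is the pattern used in the patches above.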

    I make no claim to the accuracy of the parameters given here - they are just numbers to help explain the prosthetic group attachment logic, and some have not been verified experimentally. Also, additional parameters can be defined that I left out - therein lies the style of refinement. There may be more elegant ways - please share them!

    I think those are the most important things; I hope some of it was self-explanatory. I wrote this to fill in some of the big holes left by the links I provided at the top of this document. If you thought this was obscene, useful, or otherwise, please mail me your thoughts. And thanks to those of you who sent feedback.

    BONUS: Iron-Sulfur cluster attachment:

    I had a heart-attack when I had to attach an iron-sulfur cluster to a protein, so I decided to put this in here, especially since this webpage makes it sound like I know everything about prosthetic groups with CNS. The background: a four-iron/four-sulfur (Fe4S4) cluster, attached by THREE (not four) irons to THREE (not four) cysteines in one chain. There are two chains with one cluster each. I have learned that CNS always has new error messages to entertain you, and this is no exception: 'referenced sets must be disjoint' is one, though I may have changed it a bit. I'll just put the salient parts of the top/par and generate.inp below and leave the rest "as an exercise for the student" (who is most likely me).

    The key is this, but nota bene: the letters do not logically map to the numbers, based on the "if it ain't broke don't fix it" principle, e.g. B does not necessarily map to 2.

    ! 1=SG (RESID 123), 2=FE3 (RESID 510)
    ! 3=SG (RESID 127), 4=FE4 (RESID 510)
    ! 5=SG (RESID 130), 6=FE2 (RESID 510)
    ! FS4A=FE3(reference 2):cys 123(reference 1)
    ! FS4B=FE4(reference 4):cys 127(reference 3)
    ! FS4C=FE2(reference 6):cys 130(reference 5)
    

    .top file:

    !!-- cysteine 1
    PREsidue FS4A
    GROUP
            DELETE ATOM 1H END
            ADD BOND 1SG 2FE3
            ADD ANGL 1CB 1SG 2FE3
            ADD ANGL 1SG 2FE3 2S2
            ADD ANGLE 1SG 2FE3 2S3
            ADD ANGLE 1SG 2FE3 2S4
    END {FS4A}
    !!-- cysteine 2
    PREsidue FS4B
    GROUP
            DELETE ATOM 3H END
            ADD BOND 3SG 4FE4
            ADD ANGL 3CB 3SG 4FE4
            ADD ANGL 3SG 4FE4 4S4
            ADD ANGLE 3SG 4FE4 4S3
            ADD ANGLE 3SG 4FE4 4S1
    END {FS4B}
    !!-- cysteine 3
    PREsidue FS4C
    GROUP
            DELETE ATOM 5H END
            ADD BOND 5SG 6FE2
            ADD ANGL 5CB 5SG 6FE2
            ADD ANGL 5SG 6FE2 6S1
            ADD ANGLE 5SG 6FE2 6S4
            ADD ANGLE 5SG 6FE2 6S2
    END {FS4C}
    

    .par file:

    !-- first connection
    BOND SH1E FE_3          2500.0 2.330 ! FE3
    ANGLE CH2E SH1E FE_3    2500.0 108.8
    ANGLE SH1E FE_3 S_6     2500.0 101.0 ! S2
    ANGLE SH1E FE_3 S_7     2500.0 126.9 ! S3
    ANGLE SH1E FE_3 S_8     2500.0 126.0 ! S4

    !-- second connection

    BOND SH1E FE_4          2500.0 2.330 ! FE4
    ANGLE CH2E SH1E FE_4    2500.0 108.8
    ANGLE SH1E FE_4 S_5     2500.0 109.1 ! S1
    ANGLE SH1E FE_4 S_7     2500.0 117.1 ! S3
    ANGLE SH1E FE_4 S_8     2500.0 103.6 ! S4

    !-- third and last connection

    BOND SH1E FE_2          2500.0 2.330 ! FE2
    ANGLE CH2E SH1E FE_2    2500.0 108.8
    ANGLE SH1E FE_2 S_5     2500.0 129.7 ! S1
    ANGLE SH1E FE_2 S_8     2500.0 106.2 ! S4
    ANGLE SH1E FE_2 S_6     2500.0 120.9 ! S2
    

    generate.inp prosthetic group patch section:

           patch FS4A
           reference=1=(resid 123 and segid A )
           reference=2=(resid 510 and segid A )
           end
           patch FS4B
           reference=3=(resid 127 and segid A )
           reference=4=(resid 510 and segid A )
           end
           patch FS4C
           reference=5=(resid 130 and segid A )
           reference=6=(resid 510 and segid A )
           end
           patch FS4A
           reference=1=(resid 123 and segid B )
           reference=2=(resid 520 and segid B )
           end
           patch FS4B
           reference=3=(resid 127 and segid B )
           reference=4=(resid 520 and segid B )
           end
           patch FS4C
           reference=5=(resid 130 and segid B )
           reference=6=(resid 520 and segid B )
           end
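
A patch name that is used in generate.inp but never defined in the .top is an easy mistake with this many near-identical names. Here is a small Python sketch of a cross-check, under the assumption that definitions start a line with PREsidue (or any abbreviation beginning PRES) and that uses start a line with patch; `undefined_patches` is a hypothetical helper, and `!`-commented lines are skipped only because they no longer start with the keyword.

```python
import re

# A definition line in the .top, e.g. "PREsidue FS4A"
PRES = re.compile(r"^\s*PRES[A-Z]*\s+(\w+)", re.IGNORECASE | re.MULTILINE)
# A use in generate.inp, e.g. "patch FS4A"
PATCH = re.compile(r"^\s*patch\s+(\w+)", re.IGNORECASE | re.MULTILINE)


def undefined_patches(top_text, generate_text):
    """Return sorted patch names used in generate.inp but not defined in the .top."""
    defined = {name.upper() for name in PRES.findall(top_text)}
    used = {name.upper() for name in PATCH.findall(generate_text)}
    return sorted(used - defined)
```

An empty list means every patch named in generate.inp has a matching PREsidue; anything else is worth fixing before the generate stage.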
    



    Page last updated: Thu Jul 8 20:33:26 EDT 2004

    Please complain to bryanlepore at gmail dot com