DNA-binding motifs

Practical no. 2 within the course "Macromolecular structure and function" (Molecular Biotechnology X2, 2008)


The purpose of this computer exercise is to give an introduction to DNA structure and to DNA binding motifs in proteins. You will also have a further look at the building blocks of macromolecules and examine some evolutionary aspects of protein sequence.

Lab report Displaying molecules.

Just like in practical 1, you will use the graphics program called Swiss PDB Viewer for visualizing macromolecules. The Swiss PDB Viewer User Guide is the best place to look if you have problems.

The practical

The practical is divided into three parts.
  1. DNA building blocks and structure
  2. Nuclear hormone receptors
  3. Leucine zippers


All the files that you need can be downloaded here.

The DNA building blocks and structure

DNA is built from four different nucleotides, each carrying one of the bases guanine (G), cytosine (C), adenine (A) or thymine (T). By the early 1950s, X-ray fibre diffraction had revealed two forms of DNA: A-DNA and B-DNA. While A-DNA is obtained under dehydrated, nonphysiological conditions, B-DNA is obtained when the DNA is fully hydrated, as it is in vivo. It was already known that DNA formed a helical structure. What James Watson and Francis Crick discovered was that DNA forms a double helix and that the base pairing follows a strict pattern: cytosine pairs only with guanine, and thymine pairs only with adenine. This is the so-called Watson-Crick base pairing (see the figure to the right). The fantastic thing about this structure was that the specific G-C and A-T base pairing revealed how the genetic information could be copied: the two strands are complementary, so each strand carries the information needed to rebuild the other.
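The pairing rule can be written down in a few lines of code. The following is a minimal sketch, not part of the practical itself; the function names and the example sequence are made up for illustration. It shows that the partner strand is fully determined by the first strand, and that copying the copy gives back the original.

```python
# Watson-Crick partners: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the partner strand, written 5'->3' like the input strand."""
    # The two strands run antiparallel, so the partner is read backwards.
    return "".join(PAIR[base] for base in reversed(strand))


template = "ATGCGT"  # made-up sequence, written 5'->3'
partner = reverse_complement(template)

print(partner)
# Copying the partner strand restores the original sequence.
print(reverse_complement(partner) == template)
```

Because the two strands run in opposite directions, the complement has to be reversed before it can be read 5' to 3'.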

Indicated with arrows in the figure to the right are the positions where the deoxyriboses are attached to the bases. Note that these positions are located asymmetrically with respect to the hydrogen bonding between the bases, so that both are located on the same side. This asymmetry creates a major and a minor groove in the double helix, which in turn is the basis for sequence-specific binding of proteins to DNA.

Each nucleotide also contains a deoxyribose sugar and a phosphate group, which together form the "sugar-phosphate backbone". Many of the interactions between a protein and a DNA molecule occur between positively charged residues on the protein surface and the negatively charged phosphate groups of DNA. This type of interaction is very important in non-sequence-specific binding between a DNA-binding protein and its target DNA molecule.

*Now start Swiss-PdbViewer and open the file "DNA.pdb". Open the Control panel.

A short stretch of DNA should appear in the graphics window. This is a piece of DNA corresponding to a sequence recognised by a human peptide hormone receptor.

Chains C and D (in the left part of the Control panel) denote the two DNA chains. They are numbered from the 5' end to the 3' end, the conventional direction of a DNA or RNA chain. The 5' end is the end carrying the deoxyribose 5' hydroxyl group, and the 3' end is the end carrying the deoxyribose 3' hydroxyl group. Check that you can find these.
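In the PDB file format, the chain identifier and the residue number are stored in fixed columns of each ATOM record (chain in column 22, residue number in columns 23 to 26). The sketch below reads these columns from a few toy records and lists the residues per chain in file order. The helper names, the atom coordinates and the residues are invented for illustration and are not taken from DNA.pdb.

```python
def make_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a fixed-column PDB ATOM record (toy helper for this example)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def residues_by_chain(pdb_lines):
    """Map each chain ID to its (residue name, residue number) pairs in file order."""
    chains = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain_id = line[21]              # column 22
        resname = line[17:20].strip()    # columns 18-20
        resseq = int(line[22:26])        # columns 23-26
        residues = chains.setdefault(chain_id, [])
        if (resname, resseq) not in residues:
            residues.append((resname, resseq))
    return chains


# Toy records: two residues in chain C, one in chain D (hypothetical data).
toy_lines = [
    make_atom_line(1, "P", "DC", "C", 1, 0.0, 0.0, 0.0),
    make_atom_line(2, "O5'", "DC", "C", 1, 1.0, 0.0, 0.0),
    make_atom_line(3, "P", "DG", "C", 2, 2.0, 0.0, 0.0),
    make_atom_line(4, "P", "DA", "D", 1, 3.0, 0.0, 0.0),
]

for chain_id, residues in residues_by_chain(toy_lines).items():
    print(chain_id, residues)
```

To try it on a real file, pass the lines of that file (for example `open("DNA.pdb").read().splitlines()`) instead of `toy_lines`.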

*Move the molecule around and look at the sugar-phosphate backbone. Try removing the "sidechains" (the nucleotide bases) from the view.
*Try to identify the major and minor grooves.
*Display only the G-C base pair formed by G10 of chain C and C9 of chain D. Center this base pair for a closer look.
*Select "H-bonds detection threshold" in the Prefs menu, and change to 3.5 Å for max distance and 30° for min angle.
*Compute hydrogen bonds (in the Tools menu; if not visible after computation, select to show them in the "Display" menu).
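A distance and angle threshold of this kind can be expressed as a simple geometric test. The sketch below is only an illustration of the idea: how Swiss-PdbViewer measures the angle internally is not described here, so the code assumes that the distance is measured between donor and acceptor atoms and that the angle is measured at the hydrogen (donor-H...acceptor). The function names and the coordinates are made up.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at point b."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=30.0):
    """Assumed criterion: donor-acceptor distance and donor-H...acceptor angle."""
    close_enough = math.dist(donor, acceptor) <= max_dist
    angle_ok = angle_deg(donor, hydrogen, acceptor) >= min_angle
    return close_enough and angle_ok


# Made-up coordinates in Å: a linear N-H...O arrangement, 2.9 Å apart.
print(is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
# The same geometry with the acceptor moved out to 4.0 Å fails the distance test.
print(is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)))
```

Loosening the thresholds, as you did in the Prefs menu, makes the program report more candidate hydrogen bonds; tightening them reports fewer.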

Q1. Where will the carbonyl group of the cytosine be located, in the major or minor groove?

Even if you already have answered the question:

*Turn the molecule around so that the carbonyl group of the cytosine points towards you when you look into the plane of the G-C base pair.
*Select only this G-C base pair in the Control panel and colour it red.
*Add the whole DNA molecule to the view and zoom out slowly, trying to keep track of "your" G-C base pair.

Look at the sugar-phosphate backbone and note especially the double-helix. If not before, you should be able to answer the question now.

*Select and display any A-T base-pair.

Q2. Which chemical groups on an A-T base pair do you think provide the specificity when "seen" by a protein from the major groove, i.e., how can a protein distinguish an A-T base pair from a C-G base pair?

Nuclear (steroid) hormone receptors

The glucocorticoid receptor (GR) belongs to a group of transcription factors called nuclear (steroid) hormone receptors. All steroid hormones are based on the "skeleton" of cholesterol, and among these are glucocorticoids, estrogen and progesterone. Common "modules" among the receptors for these hormones are an N-terminal region of variable length and sequence, a DNA-binding domain (DBD) with a high degree of conserved amino acids, and a ligand-binding domain (usually referred to as the LBD).
Many biochemical studies have been performed to learn more about these receptors. The N-terminal regions in particular are very difficult to work with in the laboratory, but they are also very interesting, since many of the functional differences between the receptor types (besides the different ligand specificities) are thought to reside in these regions. For example, these regions are thought to recruit other DNA-binding proteins, such as the TATA-box binding protein, to activate transcription. The Swedish medical company Karo Bio AB was founded based on research on this class of receptors, especially the estrogen receptor (ER).

The DNA-binding domain has proven to be much easier to work with. It is folded into a stable domain and can easily be studied in isolation. A lot of biochemical, biophysical and structural studies have been performed on the DBDs from both GR and ER.
In the late 1980s, it was shown that the GR-DBD contains two zinc ions. It was suggested that the DBD contains two zinc fingers of a different type than the "classical" zinc finger, which is a DNA-binding motif that contains a zinc ion coordinated by two histidines and two cysteines.

All steroid hormone receptors are thought to have evolved from a common ancestor. In some early eukaryotic organisms, gene duplications must have occurred. Through mutations in the genes, these proteins gradually evolved into what we see today (remember that we only see the "survivors"): receptors with different specificities and physiological roles. Not only the proteins have evolved. There are also differences in where these receptors are found (in which cells) and to what extent, which means that the control mechanisms of gene expression have evolved as well.

We can learn a lot from sequence comparisons within a group of similar proteins. For instance, some residues might be exactly the same within the group, indicating that the amino acid at this position is very important for the function or the structure of the protein. Such examples are called conserved residues. The level of conservation does not need to be such that the amino acid residue is exactly the same. It is also possible that even in an important position, the protein function can "survive" a mutation where e.g., a hydrophobic residue is replaced by another hydrophobic residue of similar size or a charged residue is replaced by another charged residue with the same charge.
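The idea of marking conserved positions can be sketched in code. The example below is only an illustration: the three aligned sequences are invented, and the groups of "similar" residues are a deliberately small, assumed selection (real alignment programs use larger tables). It prints `*` under columns where all residues are identical and `:` where all residues belong to the same assumed group.

```python
# Assumed, simplified groups of residues with similar properties.
SIMILAR_GROUPS = [set("ILVM"), set("KR"), set("DE")]


def conservation_line(aligned):
    """Return one mark per alignment column: '*' identical, ':' similar, ' ' otherwise."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        if "-" in residues:
            marks.append(" ")  # a gap means the position is not conserved
        elif len(residues) == 1:
            marks.append("*")
        elif any(residues <= group for group in SIMILAR_GROUPS):
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


# Invented toy alignment ('-' marks a gap).
toy_alignment = [
    "CKVCGDKA",
    "CRVCSDRA",
    "CKICGD-A",
]

for seq in toy_alignment:
    print(seq)
print(conservation_line(toy_alignment))
```

In the toy alignment, the second column contains only lysine and arginine, so it is marked as similar rather than identical.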

*Have a look at this multiple sequence alignment of a group of 11 different nuclear hormone receptor sequences from different organisms. Note that all except one of these are full-length sequences of receptors but there is also one sequence of only a DNA-binding domain (DBD). This sequence is thus much shorter than the others.
*Stars below the alignment indicate conserved amino acid residues.
*Find the sequence of the glucocorticoid receptor DNA-binding domain.

Q3. How many totally conserved histidines (H) and cysteines (C) can you find within the DBD of these receptors?

1gdc.pdb contains coordinates of an NMR structure of the DNA-binding domain of the glucocorticoid receptor.
*Before loading this file, close the DNA session and select the "General…" option in the Prefs menu and tick the box "Center molecule upon loading".
*Open 1gdc.pdb in Swiss-PdbViewer.
*Center on one of the zinc ions.

The zinc ions usually appear as grey dots, small balls or crosses.

Q4. Which residues coordinate the zinc ions? Were these among the conserved residues in the multiple sequence alignment? Why/why not?

The DNA binding site, the response element

The DNA sequence that a sequence specific DNA-binding protein recognizes can be called the "response element".
The response element for the GR-DBD is as follows. "N" means any base.


Note that this is a palindromic sequence: it reads the same backwards on the complementary chain, just like the sentence "Was it a car or a cat I saw?"
A palindromic binding site can be viewed as two binding sites in opposite directions linked together.
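What "palindromic" means for double-stranded DNA can be checked in code: a sequence is palindromic if it is identical to its own reverse complement. The sketch below is an illustration only. It uses the recognition site of the restriction enzyme EcoRI (GAATTC) as a known palindrome, and a made-up half-site to show how two binding sites in opposite directions, separated by a spacer of arbitrary bases, form a palindromic site. The half-site is hypothetical and is not the GR response element.

```python
# Watson-Crick partners; 'N' (any base) pairs with 'N'.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq):
    """Return the complementary strand, read 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def is_palindromic(seq):
    """A DNA palindrome reads the same 5'->3' on both strands."""
    return seq == reverse_complement(seq)


def inverted_repeat(half_site, spacer_length):
    """Join a half-site, a spacer of N's and the half-site in the opposite direction."""
    return half_site + "N" * spacer_length + reverse_complement(half_site)


print(is_palindromic("GAATTC"))   # EcoRI recognition site
print(is_palindromic("GAATTA"))   # one base changed

site = inverted_repeat("GGATCA", 3)  # hypothetical half-site
print(site, is_palindromic(site))
```

Note that a DNA palindrome is not the same as a text palindrome: the comparison is made with the complementary strand, not with the same strand read backwards.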

Q5. How would you solve the problem of binding a protein (the DBD) to a palindromic binding site? You will soon see how nature has solved this problem.

The DNA binding of GR
1r4r.pdb contains coordinates of the crystal structure of the GR-DBD in complex with a piece of DNA consisting of the GR response element and some extra base pairs.
This structure should give you the correct answer to Q5.
The two protein chains are called A and B. You can select a chain by clicking the chain name in the Control panel.

Q6. Which region of GR (residue numbers) is involved in the dimerization?

Display one of the monomers of the dimer and the part of the DNA that it binds to:

*Display chain A as ribbon.
*Display also C2 -> G10 of chain C and C11 -> G19 of chain D.
*Display hydrogen bonds (Tools =>Compute hydrogen bonds)

One helix is usually referred to as the "recognition helix", since it makes most of the contacts with the DNA.

Q7. Find the recognition helix (give residue numbers). Which DNA groove does it bind to?
Q8. Which residues in this helix interact with DNA? Can you find sequence specific or non-sequence specific interactions?

The GR is monomeric without DNA and dimerizes only upon DNA binding in a cooperative manner.

Q9. What would you expect to happen if base pairs were added or removed between the two response elements? Justify your answer.

Leucine zippers

In the third part of this practical you are going to study another DNA-binding motif, called the bZIP. Many transcription factor proteins, for example GCN4, cJun, AP-1, cFos-cJun, contain this motif. In most cases, these proteins are found as dimers (both homodimers and heterodimers), but some can also be found as monomers.

You will look at GCN4 and cJun, which are examples of homodimers, and the heterodimer of cJun-cFos. You will investigate the interactions between the proteins as well as the interactions between protein and DNA. Just like the nuclear hormone receptor DBDs, these structures are part of much larger proteins in vivo and were divided into smaller, more stable parts for structure determination.

The homodimer of GCN4
*Load the structure model of the GCN4 (mono_zip.pdb).

Look at the double alpha-helical structure of GCN4. This motif is called the bZIP motif. It contains one DNA binding region and one dimerization region.

Q10. Identify these two regions: make a simple drawing of the structure and show where the dimerization and DNA binding regions are.

You have probably noticed that the two helices in this structure are "wrapping" around each other. This structural motif is called a coiled coil.

Have a closer look at the dimerization region. (It might be easier if you display ‘Ca Atoms Only’ and colour the residues you want to look at in a different colour.)

Q11. Which amino acid residues dominate in the interface between the helices?
Q12. What type of interactions are they involved in?

We will now take a closer look at the DNA binding region.

*Move the structure so you can see this region.
*Display the whole structure from the control panel.
*Colour the structure by "CPK" (different colours for the different atom types).
*Select: Group property: Basic
*Click on the columns “show” and “side” in the control panel.

Repeat the last two commands selecting the other "properties" (acidic, non-polar, polar).
By holding down the <shift> key when clicking on the words “show” and “side” in the control panel, the selected items (residues) will be added to what is displayed.
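Grouping residues by property, as you just did in the Control panel, can also be done on a sequence. The sketch below uses a common textbook grouping of the 20 amino acids; this is an assumption, and Swiss-PdbViewer may assign a few residues (for example histidine, cysteine or tyrosine) to different groups. The toy sequence is invented and is not taken from GCN4.

```python
from collections import Counter

# Assumed textbook grouping by one-letter code (program-specific groupings may differ).
PROPERTY = {}
for residues, label in [
    ("KRH", "basic"),
    ("DE", "acidic"),
    ("STNQYC", "polar"),
    ("AVLIMFWPG", "non-polar"),
]:
    for residue in residues:
        PROPERTY[residue] = label


def composition(sequence):
    """Count how many residues of a sequence fall into each property group."""
    return Counter(PROPERTY[residue] for residue in sequence)


toy_sequence = "KRARNTEAL"  # invented sequence for illustration
for label, count in composition(toy_sequence).most_common():
    print(label, count)
```

Applying `composition` to a stretch of a real sequence gives a quick numerical summary of which kind of residue dominates in that region.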

Q13. Which amino acids dominate in this region of the protein? What is the common property of these residues?
Q14. Can you guess which part of the DNA these residues bind to?

The ‘b’ in the bZIP motif stands for ‘basic’.

Q15. Can you guess what the ‘ZIP’ stands for?

The DNA binding of GCN4

*Close the GCN_1 session (but keep the SwissPdbViewer program open).
*Open GCN4.pdb to display the GCN4-DNA complex.

Interactions with DNA can be either sequence specific or non-sequence specific, depending on which part of the DNA the residues interact with. As you will see, this protein makes many interactions with the DNA.

*Select menu: "Groups close to another chain", tick "Display only groups that are within" and type 4 Å.
*Move the structure to focus on the DNA binding domain.
*In the Tools menu: "Compute H-bonds"

*Find hydrogen bonds between the protein and the DNA.
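The selection of groups within 4 Å of another chain is a simple distance filter. The sketch below illustrates the idea under stated assumptions: a residue counts as "close" if any of its atoms lies within the cutoff of any atom in the other chain. How Swiss-PdbViewer implements the selection internally is not described here, and the residue labels and coordinates are made up.

```python
import math


def close_groups(chain_a, chain_b, cutoff=4.0):
    """Return labels of residues in chain_a with any atom within cutoff Å of chain_b."""
    close = []
    for label, atoms in chain_a.items():
        if any(
            math.dist(atom, other_atom) <= cutoff
            for atom in atoms
            for other_atoms in chain_b.values()
            for other_atom in other_atoms
        ):
            close.append(label)
    return close


# Hypothetical residues, each a list of atom coordinates in Å.
protein = {
    "ARG243": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "LEU250": [(10.0, 0.0, 0.0)],
}
dna = {
    "G5": [(4.5, 0.0, 0.0)],
    "C6": [(20.0, 0.0, 0.0)],
}

print(close_groups(protein, dna))        # protein residues near the DNA
print(close_groups(dna, protein))        # DNA residues near the protein
```

A residue that passes this filter is only a candidate for an interaction; whether it actually forms a hydrogen bond still has to be checked, as in the steps above.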

Q16. Which interactions are DNA sequence specific and which are non-sequence specific?

The heterodimer between cJun and cFos
*Open the file called 1FOS.pdb, which contains the heterodimer of human cJun and cFos (in the file, cFos is chain E and cJun is chain F, and they appear one after the other in the Control panel).

Q17. Look at the dimerization region. Which amino acid type(s) dominate?

Compare with what you saw in the dimerization region of GCN4.

Q18. What are the similarities?

cJun can make both homodimers and heterodimers (with cFos). cFos, on the other hand, cannot form a homodimer, only the heterodimer with cJun.

*Select and colour the basic and acidic residues differently and look at the distribution.

Q19. Looking at the structure of the cJun-cFos dimer, can you guess why cFos cannot form homodimers?

If you have problems with the question, you can download this file with some important sidechains already coloured (cfos_cjun.spdbv).
You can also download a file with a theoretical homodimer cFos-cFos (cfos_cfos.spdbv).

This practical was originally written in 2003 by Henrik Hansson & Evalena Andersson, Uppsala Universitet.
Edited in 2006 by Mats Sandgren, Uppsala Universitet.
Edited in 2007-2008 by Maria Selmer, Uppsala Universitet.