David F. Hickok Memorial Cancer Research Laboratory
Abbott Northwestern Hospital
800 E. 28th St., Minneapolis, MN 55407
Correspondence should be addressed to: Lester F. Harris, PhD.
Submitted for publication: December 1997
Keywords: site-specific DNA recognition, steroid receptors, estrogen response element, genetic code origins.
We present findings of genetic information conservation between estrogen response element (ERE) DNA and the cDNA encoding the estrogen receptor (ER) DNA binding domain (DBD). The regions of nucleotide subsequence similarity to the ERE in the ER DBD occur specifically at nucleotide sequences on the ends of exons 2, 3, and 4 at their splice junction sites. These sequences encode the DNA recognition helix in exon 2, a beta strand in exon 3, and a predicted alpha helix in exon 4, respectively. The nucleotide sequence of exon 4 that encodes the predicted alpha helix shares genetic information with the flanking nucleotide regions of the ERE. In addition, this same ER exon 4 subsequence encodes a putative nuclear localization domain. We generated a computer model of the ER DBD using atomic coordinates derived from nuclear magnetic resonance (NMR) spectroscopy, to which we attached the exon 4 encoded predicted alpha helix. We docked this ER DBD structure at 29 base pair ERE and flanking nucleotide sequences from the Xenopus laevis Vitellogenin A1 and A2 genes which contained conserved genetic information. We observed that ER DBD amino acids of the exon 2 encoded DNA recognition helix, the exon 3 encoded beta strand and the exon 4 encoded predicted alpha helix are spatially aligned with trinucleotides identical to their cognate codons within the ERE DNA major groove halfsites and flanking nucleotides.
Steroid hormone binding proteins are members of a superfamily of DNA regulatory proteins (1). The proteins specifically interact with DNA at nucleotide sequences termed hormone response elements (HREs) and regulate transcription (2). The proteins contain a DNA binding domain (DBD) consisting of two zinc binding motifs whose amino acid sequences are highly conserved among the steroid receptor proteins (3). However, there are differences in the amino acid sequences among the proteins' DNA recognition helices located adjacent to the first zinc binding motif within the DBDs (4). Likewise, the nucleotide sequences of the HREs to which the proteins specifically bind differ (5). These differences are instrumental in DNA recognition and suggest the existence of a recognition code.
Our laboratory has long been interested in site specific DNA recognition by DNA regulatory proteins and has made several key observations (4, 6-9). Earlier we compared nucleotide sequences containing known hormone response elements (HREs) with cDNA sequences encoding the DBD of steroid receptor proteins (4). We observed that the cDNA sequences encoding the steroid receptor DBDs contained regions which shared maximal nucleotide subsequence similarity with HREs. These cDNA subsequences encoded predicted alpha helical structures. We proposed that these predicted alpha helices may serve as DNA recognition helices (4). Subsequently, our prediction of the glucocorticoid receptor (GR) DNA recognition alpha helix amino acid sequence, its location within the GR DBD and its orientation toward the DNA within the glucocorticoid response element (GRE) DNA major groove halfsites was confirmed by NMR (10) and X-ray crystallography (11). Likewise, our prediction of the estrogen receptor (ER) DNA recognition alpha helix amino acid sequence and location within the ER DBD was confirmed by NMR (12).
Recently the genomic structure of the human ER gene was determined (13); the two zinc binding motifs of the DBD are separately encoded by two of the eight exons, namely exons 2 and 3. The DNA recognition helix encoded in exon 2 is located at the carboxyl terminus of the first zinc binding motif (4). Adjacent to the DNA recognition helix is a structure which has been determined by NMR (12) to be a beta strand. This beta strand encoded in exon 3 is located on the amino terminus of the second zinc binding motif at the splice junction site of exons 2 and 3. This splice site occurs at a conserved glycine residue within the steroid receptor family which connects the ER DNA recognition helix and beta strand structures as a bridge which joins the two zinc binding motifs. The carboxyl terminus of the ER DBD contains a predicted alpha helix structure. This predicted alpha helix is encoded in exon 4 at the exon 3 and 4 splice junction site.
We report herein that genetic information is conserved between estrogen response elements (EREs) and their flanking nucleotides upstream of the Xenopus laevis Vitellogenin A1 and A2 genes (GENBANK loci XLVITA15 and XLVIT2A) and the nucleotide subsequences within exons 2, 3 and 4 of the ER DBD (GENBANK locus HSERR). These ER DBD cDNA subsequences that are maximally similar to the EREs and flanking nucleotides encode the ER DNA recognition helix, a beta strand and a predicted alpha helix located at the splice junction sites joining exons 2, 3 and 4, respectively.
MATERIALS AND METHODS
Nucleotide sequence data was taken from cited references and GENBANK, a computer database of DNA and RNA sequences. LOCAL is a program which searches for maximally similar subsequences between any two amino acid or nucleic acid sequences using a dynamic programming matrix algorithm (14). Gap weighting and mismatch values used were: unity for matches, 0.9 for mismatches and -(0.9 + 1.01 * length) for gaps. LOCAL is an academic software package distributed by the Harvard Medical School Molecular Biology Computer Research Resource (MBCRR), Dana-Farber Cancer Institute, Harvard School of Public Health, 44 Binney Street JF815, Boston, Massachusetts 02115.
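The kind of search described above can be illustrated with a minimal sketch of a local (Smith-Waterman type) alignment score using affine gaps. This is not the LOCAL program itself. The function name and test sequences are hypothetical, and the sketch assumes that the stated mismatch value of 0.9 acts as a penalty and that a gap of a given length costs 0.9 + 1.01 * length.

```python
def local_align_score(a, b, match=1.0, mismatch=-0.9,
                      gap_open=0.9, gap_extend=1.01):
    """Return (best score, (end_i, end_j)) of the best local alignment.

    Assumption: a gap of length k costs gap_open + gap_extend * k,
    and a mismatch subtracts 0.9 from the running score.
    """
    n, m = len(a), len(b)
    neg = float("-inf")
    # H: best alignment ending at (i, j); E/F: best ending in a gap.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an existing gap or open a new one (length 1).
            E[i][j] = max(E[i][j - 1] - gap_extend,
                          H[i][j - 1] - gap_open - gap_extend)
            F[i][j] = max(F[i - 1][j] - gap_extend,
                          H[i - 1][j] - gap_open - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are never allowed below zero.
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    # Toy sequences sharing the subsequence TGACC.
    print(local_align_score("TTGACCAA", "GGTGACCGG"))
```

The end coordinates mark where the maximally similar subsequence terminates in each sequence; a traceback step, omitted here for brevity, would recover the aligned subsequences themselves.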
All computer models were created using QUANTA software running on Silicon Graphics Inc. IRIS 4D 320-GTX, Octane and O2 graphics workstations. QUANTA is a molecular modeling and display tool developed by Molecular Simulations Inc., 9685 Scranton Road, San Diego, California 92121-3752, which allows the construction of molecular models of DNA sequences, point mutations of existing models and the modeling of small peptides with a selected secondary structure. The model of the ER DBD was derived from an average of 30 NMR coordinate sets from Brookhaven Protein Databank entry 1HCP, and docking to the DNA was based on Brookhaven Protein Databank entry 1HCQ. However, critical residues following ER Gly 252 on the carboxyl flank of the ER DBD in the NMR (15) and X-ray crystallography (16) structural determinations were disordered, and no coordinates were reported. Since the amino acid sequence ranging from Gly 252 to Gly 262 contained our predicted alpha helix of exon 4, we created an alpha helix of the exon 4 encoded amino acids ranging from 253 to 262 and attached this structure to Gly 252 in our computer model. Atomic coordinates of the ER exon 4 encoded putative DNA binding alpha helix were computed using the SEQUENCE BUILDER module of the QUANTA program. This module allows the construction of molecular models of small peptides and folds them into a selected secondary structure. This module was also used to generate coordinates for the 29 base pair B-type DNA ERE nucleotide sequences upstream of the Vitellogenin A1 and A2 genes from GENBANK loci XLVITA15 and XLVIT2A used in figures 3, 4 and 5.
Using the genomic structure of the ER gene (13) as a guide, we conducted computer based nucleotide sequence similarity searches between a nucleotide sequence (-428 to -278) upstream of the Vitellogenin A1 (VITA1) gene and a cDNA sequence (898 to 1116) of ER DBD exons 2 and 3 which encode the first and second zinc binding motifs, respectively. The results are shown as a schematic in figure 1a. The maximally similar subsequence within VITA1 included two EREs, ERE1 and ERE2, and within the ER DBD a nucleotide sequence of exon 2 which encodes the first zinc binding motif and the DNA recognition helix, see figure 1b.
We also compared the nucleotide sequence (954 to 1005) of ER DBD exon 2 that encodes the ER DNA recognition helix to the same VITA1 sequence as above, see figure 1c. The maximal subsequence similarity for the cDNA of the ER DNA recognition helix occurred within ERE2 of VITA1 as seen above. Likewise, we compared the VITA1 nucleotide sequence to the nucleotide sequence of ER DBD (964 to 1023) which spans the splice junction site of ER DBD exon 2 and exon 3 encoding the ER DNA recognition helix of exon 2 and a beta strand of exon 3; the results are shown in figure 1d. The maximally similar subsequence occurred within ERE2 for the exon 3 encoded beta strand. We also compared to VITA1 a nucleotide sequence (1120 to 1206) of ER exon 4 which encodes a predicted alpha helix, see figure 1e. The maximal nucleotide subsequence similarity between the nucleotide sequence upstream of VITA1 and the putative alpha helix encoded region of exon 4 occurred within the same ERE2 site as above. It is significant that this ERE2 site, of the two present in the sequence (-428 to -278) upstream of the VITA1 gene, contains a perfect palindrome of the consensus TGACC recognition motif in its right and left DNA major groove halfsites.
Finally, we compared a nucleotide sequence upstream of the Vitellogenin A2 gene (VITA2) (-385 to -256) with the same cDNA ER DBD sequence (898 to 1116) as above. The results are shown as a schematic in figure 2a. The maximal ER DBD cDNA subsequence similarity for the VITA2 ERE occurs within the nucleotide sequence of the ER DBD encoding the DNA recognition helix, see figure 2b. The ERE of VITA2 contains an inverted repeat of the consensus TGACC in both major groove halfsites as seen for VITA1 ERE2 above.
The structural determination of the ER DBD has allowed us to create a model to study exon 2, 3 and 4 encoded structures of the ER DBD in relationship to genetic information conservation at ERE sites. A model of the ER DBD was constructed from NMR atomic coordinates of the ER DBD with a putative exon 4 encoded alpha helix attached. We also created 29 bp B-type DNA computer models of the nucleotide sequences upstream of VITA1 and VITA2 genes containing EREs and flanking regions which showed genetic sequence similarity to the ER DBD exons 2, 3 and 4. Computer models of the ER DBD docked at these 29 bp B-DNA sequences with areas of conserved genetic information highlighted in the protein and EREs and flanking nucleotides are shown in figures 3, 4 and 5. The VITA1 ERE1 sequence described above did not show primary nucleotide sequence similarity with the cDNA encoding the ER DNA recognition helix, see figure 1a, although this ERE sequence is reported to specifically bind the ER protein (19). However, since there are multiple codons for the majority of the 20 amino acids it is necessary to look at every possible reading frame 5' to 3' on both strands in order to determine the extent of genetic information within a given nucleotide sequence. Using this approach we observed codon sites embedded within ERE1 and ERE2 nucleotide sequences of VITA1 and the ERE of VITA2 for amino acids of the ER DNA recognition helix encoded by exon 2, the beta strand of exon 3 and the putative alpha helix of exon 4. The results are shown in figure 3d for ERE1 of VITA1, 4d for ERE2 of VITA1 and 5d for the ERE of VITA2.
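The reading-frame approach described above can be sketched in a few lines. The following is a minimal illustration, not the procedure actually used for figures 3d, 4d and 5d. The function names are hypothetical, the standard genetic code is assumed, and the example sequences are toy inputs rather than the VITA1 or VITA2 sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translate(seq):
    """Translate all three reading frames on both strands."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = (s[i:i + 3] for i in range(offset, len(s) - 2, 3))
            frames[(strand, offset)] = "".join(CODON_TABLE[c] for c in codons)
    return frames


def codon_sites(seq, amino_acid):
    """List (strand, start, codon) for every trinucleotide, in any
    overlapping frame on either strand, that encodes amino_acid.
    Start positions are counted 5' to 3' on the strand named."""
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - 2):
            if CODON_TABLE[s[i:i + 3]] == amino_acid:
                hits.append((strand, i, s[i:i + 3]))
    return hits


if __name__ == "__main__":
    print(six_frame_translate("ATGGCC"))
    print(codon_sites("TGACC", "D"))
```

Scanning every offset rather than every third position is what allows codon sites embedded in overlapping reading frames to be found.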
Remarkably, the DNA recognition helix and beta strand structures encoded by exons 2 and 3 are aligned with areas of conserved genetic information within the DNA major groove halfsites of the EREs while the putative alpha helix structure encoded by exon 4 aligns with conserved genetic information in the flanking nucleotide regions of the EREs, see figure 3 a-d for ERE 1 of VITA1, figure 4 a-d for ERE 2 of VITA1 and figure 5 a-d for the ERE of VITA2.
It has been reported that estrogen receptor protein site specific DNA recognition and high affinity binding occurs within adjacent ERE DNA major groove halfsites containing a perfect palindrome of the consensus TGACC recognition motif (5). The ER protein also specifically binds at EREs consisting of imperfect palindromes of the TGACC motif but with lower affinity (19). Nucleotides flanking the ERE DNA major groove halfsites are also involved in site specific DNA binding (20). Recently the minimal ER DBD amino acid sequence was determined using a gel retardation assay (21). The minimal ER DBD sequence consists of the 66 amino acids comprising the two zinc binding motifs (a.a 185-250) followed by a 12 amino acid sequence rich in basic residues (a.a 251-262).
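The distinction between perfect and imperfect palindromes of the TGACC motif can be made concrete with a short sketch that counts departures from an inverted repeat. The function names are hypothetical, and the three nucleotide spacer between halfsites is an assumption of this sketch rather than something stated in the text; the input sequences are toy examples.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def palindrome_mismatches(window, motif="TGACC", spacer=3):
    """Count mismatches between a window and an inverted repeat of motif.

    The left halfsite is compared with the reverse complement of the
    motif and the right halfsite with the motif itself.
    Assumption: halfsites are separated by `spacer` nucleotides.
    """
    k = len(motif)
    left, right = window[:k], window[k + spacer:k + spacer + k]
    left_diff = sum(x != y for x, y in zip(left, revcomp(motif)))
    right_diff = sum(x != y for x, y in zip(right, motif))
    return left_diff + right_diff


def best_ere(seq, motif="TGACC", spacer=3):
    """Return (mismatches, start) of the window closest to a perfect
    palindrome; ties resolve to the earliest start position."""
    width = 2 * len(motif) + spacer
    if len(seq) < width:
        raise ValueError("sequence shorter than one ERE-sized window")
    return min((palindrome_mismatches(seq[i:i + width], motif, spacer), i)
               for i in range(len(seq) - width + 1))


if __name__ == "__main__":
    print(best_ere("AAGGTCACAGTGACCTT"))   # perfect inverted repeat
    print(best_ere("GGTCACAGTGATC"))       # one substitution in a halfsite
```

A mismatch count of zero corresponds to a perfect palindrome; larger counts correspond to imperfect palindromes of the kind reported to bind with lower affinity.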
We report herein that genetic information is conserved within nucleotides of the ERE DNA major groove halfsites and flanking regions for ER DNA recognition helix amino acids (a.a 203-215) encoded by exon 2, for a beta strand structure containing amino acids (a.a 216-221) encoded by exon 3, and for a predicted alpha helix containing amino acids (a.a 253-262) encoded by exon 4, respectively. These exon 4 encoded amino acids have been reported to be important for specific DNA binding by the ER protein to ERE sites (21) and are putatively responsible for nuclear localization of the ER protein (22). However, critical residues following ER Gly 252 on the carboxyl flank of the ER DBD in the NMR (12) and X-ray crystallography (16) structural determinations were disordered, and no coordinates were reported giving no indication as to the possible role of amino acids 253-262 in DNA binding. Our ER DBD model with the exon 4 encoded putative alpha helix attached as reported herein offers a means for further study.
In this regard, we recently reported that genetic information is conserved between a glucocorticoid response element (GRE) and its flanking nucleotides and the cDNA of the glucocorticoid receptor (GR) DBD at splice junction sites in exons 3, 4 and 5 encoding the GR DNA recognition helix in exon 3, a beta strand in exon 4 and a predicted alpha helix in exon 5, respectively (7). Similar to the reported ER DBD structure (15, 16), critical residues following Arg 510 on the carboxyl flank of the GR DBD in the NMR (10) and X-ray crystallography (11) structural determinations were disordered, and no coordinates were reported giving no indication as to the possible role of exon 5 encoded amino acids 511-517 in DNA binding. By conducting molecular dynamics simulations in solvent using a model of the GR DBD with the putative exon 5 encoded alpha helix attached in complex with a GRE and flanking nucleotides, we observed that amino acids of the GR DBD exon 3 encoded DNA recognition helix, the exon 4 encoded beta strand and the exon 5 encoded putative DNA binding alpha helix interacted with their cognate codon/anticodon nucleotides conserved within the GRE DNA major groove halfsites and flanking regions (9).
Codon recognition has also been observed in Tetrahymena group I self-splicing intronic RNA by arginine (23). The arginine sidechain shows stereo-selective binding for its codons AGA, CGA and AGG, which are conserved at the catalytic site in 66 group I sequences (24). These observations of specific amino acid-codon interactions are consistent with our earlier findings (4, 6-9) as well as those we report herein. It is interesting to note that the putative ER DBD exon 4 encoded DNA binding alpha helix is rich in arginine residues which are aligned with arginine codons within the ERE DNA major groove halfsites and flanking nucleotides (see figure 3 c, d, 4 c, d and 5 c, d). Our observations of genetic information conservation within the EREs and flanking nucleotides for the ER DBD amino acids offer an explanation for site specific DNA recognition among EREs, including those consisting of imperfect palindromic sequences differing by one or more nucleotides from the consensus TGACC motif.
Recently Muller-Hill and Kolkhof (25) reported that the specific DNA binding site for the gal S repressor is identical in nucleotide sequence to the cDNA sequence coding for residues 1-6 of the DNA recognition helix of this repressor. These findings support our original findings and hypothesis of a relationship between site specific DNA recognition by regulatory proteins and the genetic code (4, 6-9). However, in contrast to the findings of Muller-Hill and Kolkhof (25), we did not find any wild type operator or response element with identical nucleotide sequence to the coding region for amino acids of its cognate protein's DNA recognition helix. Instead, by converting the nucleotide sequences of the operators or HREs to amino acids in all three reading frames on both strands, we located codon sites embedded in overlapping reading frames, for amino acids of the regulatory proteins' DNA recognition helices. By model building of protein/DNA complexes, we observed that amino acids within regulatory proteins' DNA recognition helices are consistently found oriented toward and lining up with trinucleotides identical to their cognate codon-anticodon sites within their operator or HRE DNA binding major groove halfsites (4, 6-9).
Our findings suggest that regulatory proteins' DNA recognition helices and their cognate DNA binding sites may be conserved remnants of primordial structures capable of molecular recognition. The spatial alignments of amino acids of the exon 2, 3 and 4 encoded structures of the ER DBD with trinucleotides identical to their cognate codons within the EREs and flanking nucleotides as reported herein are consistent with our findings for amino acids of exon 3, 4 and 5 encoded GR DBD structures and their codons within GREs and flanking nucleotides (7). Our observations suggest that these structures may have been template dependent in their evolution (i.e. peptides acting as templates for nucleotide polymerization or vice versa) (26, 27). Therefore we propose that prebiotic, template directed autocatalytic synthesis of mutually cognate peptides and polynucleotides resulted in their amplification and evolutionary conservation in contemporary prokaryotic and eukaryotic organisms as a genetic regulatory apparatus.
Recently, we reported findings from molecular dynamics simulations in solvent of the ER DBD in complex with the 29 base pair "non-consensus" ERE1 and flanking nucleotide sequence of VITA1, as shown in figure 3 a-d above. We observed that ER DBD amino acids of the exon 2 encoded DNA recognition helix, the exon 3 encoded beta strand and the exon 4 encoded predicted alpha helix interacted with their cognate codon-anticodon nucleotides within the ERE1 DNA major groove halfsites and flanking regions, respectively (28).
Our findings are consistent with our hypothesis that the origin of the genetic code and a site specific DNA recognition code have the same underlying mechanism. The basic mechanism of this recognition is stereochemical complementarity between amino acid sidechains and their cognate codon/anticodon nucleotides. Furthermore, our results indicate that our hypothesis, applied to genetic sequence analysis, secondary structural prediction and molecular model building, can be used as a predictive tool for determining sites on DNA regulatory proteins which recognize cognate DNA binding sites.
We thank Molecular Simulations Inc. staff for software support with QUANTA and CHARMm, Michael Fenton of Fentonnet Inc. for data reduction programs, the Minnesota Supercomputer Institute Scientific Director, Don Truhlar for support and encouragement, the Minnesota Supercomputer Center user services representatives for technical support on the CRAY-C-90, and special thanks are due to Charlie Larson of Silicon Graphics Inc. for hardware support with the IRIS 4D 320-GTX, Octane and O2 workstations. We are sincerely grateful to Professor Thomas C. Spelsberg of the Department of Biochemistry and Molecular Biology, Mayo Foundation, Rochester, MN for preliminary review of the manuscript and encouragement. This work was supported in part by a research grant from the Minnesota Supercomputer Institute, Minneapolis MN. This work was also supported by a research fellowship dedicated to the memory of William Lang Jr.