special communication
Am J Physiol Regul Integr Comp Physiol 284: R1147–R1150, 2003;
10.1152/ajpregu.00448.2002.
An evolutionary approach for identifying potential
transcription factor binding sites: the renin gene as an example
Ralf Mrowka, Karola Steinhage, Andreas Patzak, and Pontus B. Persson
Johannes-Müller-Institut für Physiologie, Charité,
Humboldt-Universität zu Berlin, D-10117 Berlin, Germany
Submitted 24 July 2002; accepted in final form 25 December 2002
ADVANCES in the sequencing projects of closely related
organisms have opened the door for sequence comparisons. Currently, the complete human and mouse genomes and
parts of the rat genome are available. Of the approximately three billion base pairs of the human genome,
only a very small fraction contains sequences coding
for proteins. Our understanding of the functions and
importance of the remaining large fraction is rapidly
increasing: noncoding regions carry out promoter and regulatory functions as well as previously unknown functions. For example, noncoding sequences are
thought to prevent the ends of the chromosome from
fraying during cell division (27).
Evolutionary pressure has resulted in the conservation of certain nucleotide sequences. The conservation
of these regions may be taken as an indication of their
potential functional importance (6). Noncoding motifs
can be conserved over 800 million years, as has been
shown for the HOX gene cluster that is important in
developmental processes (8). Moreover, noncoding sequences can influence gene expression of very distant
genes that are separated by 120 kb, which was demonstrated in an impressive study based on computational identification of conserved sequences (11).
Obtaining candidates for gene regulatory sites is
more difficult than identifying exons, because regulatory sites are
small and may be situated far from their target
genes (11). A first step toward their verification is the
identification of conserved noncoding sequences among closely
related organisms, such as humans, mice, and rats. If a
noncoding sequence is important for the organism, one
expects a certain evolutionary pressure on that sequence. As a consequence, this sequence evolves at a
slower rate than regions whose function is
not constrained by sequence. The feasibility of this method for identifying functionally important noncoding sequences is established. For instance,
Wasserman et al. (29) found that 74 of 75 transcription
factor (TF) binding sites (TFBS) are located within
these conserved noncoding sequences. They applied a type of
consensus searching algorithm (Gibbs sampler) to
flanking regions of skeletal muscle-specific genes and
concluded that the identified consensus motifs are only
biologically meaningful if the search is restricted to
conserved noncoding regions.
To identify conserved noncoding sequences of the renin gene,
we compared the upstream (5′) noncoding DNA. In this
study we focus on the renin gene because it has great
importance for cardiovascular and renal homeostasis
(2–5, 7, 12, 17, 25, 26, 28). To explore regions that may be important for the regulation of the human renin gene
(hREN), we used a bioinformatics approach comparing human, mouse, and rat noncoding sequences
upstream of the gene. Our approach further combines
this comparison with independent database information on weight matrices for TFBS.
Mrowka, Ralf, Karola Steinhage, Andreas Patzak,
and Pontus B. Persson. An evolutionary approach for identifying potential transcription factor binding sites: the renin
gene as an example. Am J Physiol Regul Integr Comp Physiol
284: R1147–R1150, 2003; 10.1152/ajpregu.00448.2002.
Evolutionary pressure has resulted in the conservation of
certain nucleotide sequences. These conserved regions are
potentially important for certain functions. Here we give an
example of a comparison between noncoding sequences combined with other independent database information to shed
light onto the regulation of the renin gene, a gene that has
great importance for cardiovascular and renal homeostasis.
To combine the information regarding conservation and
weight matrices of transcription factor (TF) binding sites, an
algorithm was developed (TFprofile). Notably, a local peak in
the resulting binding profile coincides with a previously experimentally identified regulatory region for the renin gene.
The existence of further peaks in the binding profile in the
conserved 3.9-kb-long hRENc DNA block upstream of the
renin gene suggests additional regions of potential importance for gene regulation. The algorithm TFprofile may be
used to integrate information on cross-species evolutionary
conservation and aspects of TF binding characteristics to
provide putative regulatory DNA regions for experimental
verification.
noncoding sequences; cross-species conservation
Address for reprint requests and other correspondence: R. Mrowka,
Johannes-Müller-Institut für Physiologie, Humboldt-Universität zu
Berlin, Tucholskystr. 2, D-10117 Berlin, Germany (E-mail: ralf.
[email protected]).
The costs of publication of this article were defrayed in part by the
payment of page charges. The article must therefore be hereby
marked “advertisement” in accordance with 18 U.S.C. Section 1734
solely to indicate this fact.
http://www.ajpregu.org
0363-6119/03 $5.00 Copyright © 2003 the American Physiological Society
We estimated the homology of noncoding DNA between the human, the mouse, and the rat DNA sequences around the renin gene, which are presented as
a percent identity plot (PIP, Fig. 1) (20). About 11–15
kb upstream of the human renin gene, a 3.9-kb-long
block of human DNA hRENc was identified that contained
a large number of conserved elements (for detailed description please refer to the APPENDIX). We
calculated the percent identity estimates of hRENc to
the corresponding DNA regions for mouse and rat
using blastz (20). The hRENc DNA block was searched
for TFBS using MatInspector (16) with matrices for
vertebrates. To combine the information regarding
conservation and binding sites, a special algorithm was
developed (TFprofile).
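The kind of weight-matrix search mentioned here can be illustrated with a minimal Python sketch. It is not MatInspector and does not reproduce its core or matrix similarity definitions (16); the toy matrix, the min-max normalization, and the names `TOY_MATRIX`, `normalized_score`, and `scan` are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical 4-position frequency matrix (one dict per position).
# Illustrative values only; not taken from any real matrix library.
TOY_MATRIX: List[Dict[str, float]] = [
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.8, "C": 0.1, "G": 0.05, "T": 0.05},
    {"A": 0.8, "C": 0.05, "G": 0.1, "T": 0.05},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def normalized_score(site: str, matrix: List[Dict[str, float]]) -> float:
    """Scale the summed matrix score to 0 (worst site) .. 1 (best site)."""
    score = sum(column[base] for column, base in zip(matrix, site))
    best = sum(max(column.values()) for column in matrix)
    worst = sum(min(column.values()) for column in matrix)
    return (score - worst) / (best - worst)


def scan(sequence: str, matrix: List[Dict[str, float]],
         threshold: float = 0.8) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for windows scoring >= threshold.

    Coordinates are 0-based and end-exclusive.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(base not in "ACGT" for base in window):
            continue  # skip masked or ambiguous bases such as N
        score = normalized_score(window, matrix)
        if score >= threshold:
            hits.append((start, start + width, score))
    return hits


hits = scan("GGCAATGG", TOY_MATRIX)
print(hits)  # one hit covering the CAAT window at positions 2..6
```

Each hit carries a position range and a score, which is the information the later weighting steps need.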
TFprofile calculates a weighted TF binding
profile to identify regulatory regions as follows. We
combine three independent parameters that suggest a
functional regulatory relevance of a DNA sequence.
These parameters are 1) the number of TFs that
have a putative binding site at a given location, which
provides valuable information because enrichment of
putative TFBS in conserved noncoding regions (9) indicates functional importance; 2) the quality of the
match of each putative TF, quantified by the core and
matrix similarity of the TF with the DNA, because
a high score of the binding matrix corresponds
to a higher probability of binding (18, 19), and a TF that
does not bind may not have functional relevance at
the given position, so that a higher functional relevance
may be associated with stronger binding; and 3) the
degree of conservation, because regulatory elements
are strongly enriched in conserved noncoding DNA (29).
These three parameters were combined, resulting in
the following mathematical procedure.
First, TFprofile computed the TF density, which is
the number of different TFs whose putative binding sites span each particular
base position. This was done for the hRENc DNA
sequence. In a second step, taking conservation and
weight matrix information into account, TFprofile calculated the
weighted TF density by multiplying the TF density
by the product of core and matrix similarity of the
weight matrix and by the product of the identity
scores in each species. Identities of <50% were excluded, i.e., set to zero. This
procedure yielded a binding profile as shown in Fig. 2.
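The two steps can be sketched in Python as follows. This is one possible reading of the procedure, not the published C++ implementation. It assumes that each putative site contributes its core similarity times its matrix similarity at every base it spans, that each TF has at most one site per position, and that identity is given per base as a fraction between 0 and 1 in human coordinates. The names `Site`, `apply_identity_cutoff`, and `tf_profile` are hypothetical.

```python
from typing import List, Sequence, Tuple

# One putative site: (start, end, core_similarity, matrix_similarity).
# Coordinates are 0-based, end-exclusive. This layout is an assumption.
Site = Tuple[int, int, float, float]


def apply_identity_cutoff(identity: Sequence[float],
                          cutoff: float = 0.5) -> List[float]:
    """Set identities below the cutoff (50% in the text) to zero."""
    return [value if value >= cutoff else 0.0 for value in identity]


def tf_profile(length: int, sites: Sequence[Site],
               identities: Sequence[Sequence[float]],
               cutoff: float = 0.5) -> Tuple[List[int], List[float]]:
    """Return (TF density, conservation-weighted TF density) per base."""
    density = [0] * length
    similarity = [0.0] * length  # density weighted by core * matrix similarity
    for start, end, core, matrix in sites:
        for pos in range(max(start, 0), min(end, length)):
            density[pos] += 1
            similarity[pos] += core * matrix

    # Product of the identity scores of all species at each position.
    conservation = [1.0] * length
    for track in identities:
        for pos, value in enumerate(apply_identity_cutoff(track, cutoff)):
            conservation[pos] *= value

    weighted = [similarity[pos] * conservation[pos] for pos in range(length)]
    return density, weighted


sites = [(0, 3, 1.0, 0.8), (2, 5, 0.5, 1.0)]
mouse = [0.9, 0.9, 0.4, 0.8, 1.0, 1.0]
rat = [1.0, 0.5, 0.9, 0.9, 0.6, 0.2]
density, weighted = tf_profile(6, sites, [mouse, rat])
print(density)   # [1, 1, 2, 1, 1, 0]
print(weighted)  # approximately [0.72, 0.36, 0.0, 0.36, 0.3, 0.0]
```

In the toy data, position 2 has the highest TF density but a weighted value of zero because the mouse identity there is below 50%. This is the intended effect of the conservation weighting.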
Fig. 1. Percent identity plot (PIP) for the
renin gene (hREN) on chromosome
1q32 and the 2 human neighboring
genes KISS1 and FLJ10861. Each plot
shows the position in the human sequence (horizontal axis) and the percent identity (vertical axis) of each
aligning sequence of mouse (mREN1,
mREN2) and rat (rnREN). This plot
was used to identify the region of a
3.9-kb DNA block hRENc containing
conserved sequences approximately
11–15 kb upstream of hREN. (Note:
The x-axis refers to the human sequence, i.e., distances refer to the human and do not reflect distances for the
other organisms. To see the distance
relationships, please refer to the dotplots in supplementary material at
http://www.charite.de/bioinformatics/tfprofile). UTR, untranslated region.
The binding profile is not uniformly distributed across the hRENc block; instead, several local
peaks exist. One peak coincides with a known, experimentally verified regulatory region (13, 14, 21–24).
This regulatory region was first identified in mouse
by Petrovic et al. (15) and in humans by Yan et al. (30).
Shi et al. (21) first reported on component elements in
detail. They determined by gel competition and supershift analysis that nuclear factor-Y (NF-Y), a ubiquitous CAAT-box binding protein, binds to a part of that
sequence. Furthermore, Shi et al. (21) detected a loss-of-function mutation in the human conserved sequence
that can restore its trans-activating function when
reverted to match the mouse sequence. Further experimental studies have shown that the TF NF-Y is involved in the blocking of stimulatory TFs (22). In addition, it has been demonstrated by electrophoretic
mobility shift assays that TFs bind to the cAMP-responsive element (CRE) box and E-box of the conserved
region, and further experiments indicate that they are
important for activation (13). Interestingly, the conserved renin enhancer contains a putative vitamin D
receptor binding site, and based on clinical observations and mouse knockout experiments, Li et al. (10)
suggest that renin expression and blood pressure are
directly dependent on vitamin D3 (24).
The potential functional relevance of the two remaining peaks in the binding profile remains to be assessed by
future studies.
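Local peaks in a binding profile can be located programmatically, for example with the minimal Python sketch below. The paper does not state how peaks were identified, so the strict local-maximum rule, the optional height threshold, and the function name `local_peaks` are assumptions for illustration only.

```python
from typing import List, Sequence


def local_peaks(profile: Sequence[float], min_height: float = 0.0) -> List[int]:
    """Return indices of interior local maxima with value >= min_height.

    A flat-topped peak (plateau) is reported once, at its middle index.
    Values at the two ends of the profile are never reported.
    """
    peaks = []
    n = len(profile)
    i = 1
    while i < n - 1:
        if profile[i] > profile[i - 1] and profile[i] >= min_height:
            j = i
            # Walk to the end of a possible plateau.
            while j + 1 < n and profile[j + 1] == profile[i]:
                j += 1
            # It is a peak only if the profile drops after the plateau.
            if j + 1 < n and profile[j + 1] < profile[i]:
                peaks.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return peaks


print(local_peaks([0, 1, 3, 1, 0, 2, 2, 0, 5]))        # [2, 5]
print(local_peaks([0, 1, 3, 1, 0, 2, 2, 0, 5], 2.5))   # [2]
```

Applied to a smoothed profile, the returned indices give candidate positions that can be compared with known regulatory regions.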
Our TFprofile approach differs from the algorithm
applied by Wasserman et al. (29). Their approach
involved a consensus search in conserved flanking regions of many coexpressed genes. However, considering the specific function of the renin gene, it is unlikely
that many genes with similar expression patterns can be found.
Hence our approach is not comparable with that of the
study analyzing muscle-specific genes (29).
In conclusion, we describe the results of a computational method (TFprofile) that combines information
from weight matrices of TF binding with the information
of evolutionary conservation, resulting in a binding profile. Notably, a local peak in the binding profile coincides
with a previously experimentally identified regulatory
region for the renin gene. The existence of further peaks
in the binding profile in the conserved 3.9-kb-long hRENc
DNA block upstream of the renin gene suggests additional regions of potential importance. The human,
mouse, and rat sequencing efforts, computational tools such as PipMaker, MatInspector, and TFprofile, and databases such as GenBank make a first-step
DNA analysis possible in silico. Genome sequencing of
further closely related species such as the monkey will
provide further information and may improve the basis
for statistical analysis related to the search for regulatory
elements. The algorithm TFprofile may be used to integrate information on cross-species evolutionary conservation and aspects of TF binding characteristics to provide
putative regulatory DNA regions for experimental verification.
APPENDIX
Identification of hRENc, a 3.9-kb human DNA sequence
upstream of the renin gene containing conserved elements. An
80-kb DNA sequence (hRcS) containing the renin gene from
human chromosome 1q32 was extracted from GenBank. Repetitive elements in the human genomic sequence were masked
using the RepeatMasker program (A. Smit and P. Green, unpublished
work). We identified the orthologous mouse genomic sequence
using BLAST (1) against a local mouse chromosome database downloaded from the National Center for Biotechnology Information. We
found two genomic DNA blocks harboring the mREN sequence,
whereby the latter block contains two tandem duplications of
mREN1 and mREN2. The corresponding rat sequence rnREN
was found in the renin clone CH230–198L8 of Rattus norvegicus. Where appropriate, sequences were converted to their reverse complement so that all sequences had the same
orientation. We estimated the homology between the human,
the two mouse, and the rat sequences (20), which is presented
as a PIP (Fig. 1). About 11–15 kb upstream of the human renin
Fig. 2. Top: weighted transcription factor (TF) binding profile in the 3.9-kblong hRENc DNA block, upstream of
the renin gene. TFprofile computes for
each position the number of possible
different TFs spanning that region
weighted by TF core and matrix similarity to the sequence (gray), and this is
additionally weighted by the degree of
conservation, giving the final result
(bold). One peak of the calculated binding profile coincides with an experimentally verified regulatory region of
the renin gene. The curves represent a
150 bp moving average of each profile.
Bottom: corresponding percent identity
of each aligning sequence from mouse
(mRENc; black) and rat (rnRENc; red).
Calculations and this figure do not contain the mouse tandem duplications.
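The 150-bp moving average mentioned in the caption can be sketched in Python as below. The caption does not say how the window is positioned or how sequence ends are treated. This sketch assumes a centered window that is truncated at both ends, and the function name `moving_average` is hypothetical.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 150) -> List[float]:
    """Centered moving average; the window is truncated at the sequence ends."""
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(values)
    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for value in values:
        prefix.append(prefix[-1] + value)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


print(moving_average([0, 3, 6, 9], window=3))  # [1.5, 3.0, 6.0, 7.5]
```

For a profile of a few thousand bases, `moving_average(profile)` with the default window yields one smoothed value per base.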
REFERENCES
1. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z,
Miller W, and Lipman DJ. Gapped BLAST and PSI-BLAST: a
new generation of protein database search programs. Nucleic
Acids Res 25: 3389–3402, 1997.
2. Bergeron R, Kjaer M, Simonsen L, Bulow J, Skovgaard D,
Howlett K, and Galbo H. Splanchnic blood flow and hepatic
glucose production in exercising humans: role of renin-angiotensin system. Am J Physiol Regul Integr Comp Physiol 281:
R1854–R1861, 2001.
3. Brown R, Ollerstam A, Johansson B, Skott O, Gebre-Medhin S, Fredholm B, and Persson AE. Abolished tubuloglomerular feedback and increased plasma renin in adenosine A1
receptor-deficient mice. Am J Physiol Regul Integr Comp Physiol
281: R1362–R1367, 2001.
4. Cheng HF, Wang SW, Zhang MZ, McKanna JA, Breyer R,
and Harris RC. Prostaglandins that increase renin production in
response to ACE inhibition are not derived from cyclooxygenase-1.
Am J Physiol Regul Integr Comp Physiol 283: R638–R646, 2002.
5. Cholewa BC and Mattson DL. Role of the renin-angiotensin
system during alterations of sodium intake in conscious mice.
Am J Physiol Regul Integr Comp Physiol 281: R987–R993, 2001.
6. Hardison RC. Conserved noncoding sequences are reliable
guides to regulatory elements. Trends Genet 16: 369–372, 2000.
7. Kammerl MC, Richthammer W, Kurtz A, and Kramer BK.
Angiotensin II feedback is a regulator of renocortical renin,
COX-2, and nNOS expression. Am J Physiol Regul Integr Comp
Physiol 282: R1613–R1617, 2002.
8. Kim CB, Amemiya C, Bailey W, Kawasaki K, Mezey J,
Miller W, Minoshima S, Shimizu N, Wagner G, and Ruddle
F. Hox cluster genomics in the horn shark, Heterodontus francisci. Proc Natl Acad Sci USA 97: 1655–1660, 2000.
9. Levy S, Hannenhalli S, and Workman C. Enrichment of
regulatory signals in conserved non-coding genomic sequence.
Bioinformatics 17: 871–877, 2001.
10. Li YC, Kong J, Wei M, Chen ZF, Liu SQ, and Cao LP.
1,25-Dihydroxyvitamin D3 is a negative endocrine regulator of
the renin-angiotensin system. J Clin Invest 110: 229–238, 2002.
11. Loots GG, Locksley RM, Blankespoor CM, Wang ZE, Miller
W, Rubin EM, and Frazer KA. Identification of a coordinate
regulator of interleukins 4, 13, and 5 by cross-species sequence
comparisons. Science 288: 136–140, 2000.
12. Marsh AC, Gibson KJ, Wu J, Owens PC, Owens JA, and
Lumbers ER. Chronic effect of insulin-like growth factor I on
renin synthesis, secretion, and renal function in fetal sheep.
Am J Physiol Regul Integr Comp Physiol 281: R318–R326, 2001.
13. Pan L, Black TA, Shi Q, Jones CA, Petrovic N, Loudon J,
Kane C, Sigmund CD, and Gross KW. Critical roles of a cyclic
AMP responsive element and an E-box in regulation of mouse
renin gene expression. J Biol Chem 276: 45530–45538, 2001.
14. Pan L, Xie Y, Black TA, Jones CA, Pruitt SC, and Gross
KW. An Abd-B class HOX.PBX recognition sequence is required
for expression from the mouse Ren-1c gene. J Biol Chem 276:
32489–32494, 2001.
15. Petrovic N, Black TA, Fabian JR, Kane C, Jones CA, Loudon JA, Abonia JP, Sigmund CD, and Gross KW. Role of
proximal promoter elements in regulation of renin gene transcription. J Biol Chem 271: 22499–22505, 1996.
16. Quandt K, Frech K, Karas H, Wingender E, and Werner T.
MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res 23: 4878–4884, 1995.
17. Sayago CM and Beierwaltes WH. Nitric oxide synthase and
cGMP-mediated stimulation of renin secretion. Am J Physiol
Regul Integr Comp Physiol 281: R1146–R1151, 2001.
18. Schneider TD. Information content of individual genetic sequences. J Theor Biol 189: 427–441, 1997.
19. Schneider TD, Stormo GD, Gold L, and Ehrenfeucht A.
Information content of binding sites on nucleotide sequences. J
Mol Biol 188: 415–431, 1986.
20. Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck
J, Gibbs R, Hardison R, and Miller W. PipMaker: a web
server for aligning two genomic DNA sequences. Genome Res 10:
577–586, 2000.
21. Shi Q, Black TA, Gross KW, and Sigmund CD. Species-specific differences in positive and negative regulatory elements
in the renin gene enhancer. Circ Res 85: 479–488, 1999.
22. Shi Q, Gross KW, and Sigmund CD. NF-Y antagonizes renin
enhancer function by blocking stimulatory transcription factors.
Hypertension 38: 332–336, 2001.
23. Shi Q, Gross KW, and Sigmund CD. Retinoic acid-mediated
activation of the mouse renin enhancer. J Biol Chem 276: 3597–
3603, 2001.
24. Sigmund CD. Regulation of renin expression and blood pressure by vitamin D3. J Clin Invest 110: 155–156, 2002.
25. Skott O. Renin. Am J Physiol Regul Integr Comp Physiol 282:
R937–R939, 2002.
26. Todorov V, Muller M, Schweda F, and Kurtz A. Tumor
necrosis factor-α inhibits renin gene expression. Am J Physiol
Regul Integr Comp Physiol 283: R1046–R1051, 2002.
27. Vogel G. The human genome. Objection #2: Why sequence the
junk? Science 291: 1184, 2001.
28. Wagner KD, Essmann V, Mydlak K, Wirth M, Gmehling G,
Bohlender J, Stauss HM, Günther J, Schimke I, and
Scholz H. Decreased susceptibility of cardiac function to hypoxia-reoxygenation in renin-angiotensinogen transgenic rats.
Am J Physiol Regul Integr Comp Physiol 283: R153–R160, 2002.
29. Wasserman WW, Palumbo M, Thompson W, Fickett JW,
and Lawrence CE. Human-mouse genome comparisons to locate regulatory sites. Nat Genet 26: 225–228, 2000.
30. Yan Y, Jones CA, Sigmund CD, Gross KW, and Catanzaro
DF. Conserved enhancer elements in human and mouse renin
genes have different transcriptional effects in As4.1 cells. Circ
Res 81: 558–566, 1997.
gene, a 3.9-kb-long block of human DNA hRENc was identified.
It contains a large number of conserved elements. The corresponding sequence parts mRENc of mREN1 and rnRENc of
rnREN in the mouse and rat sequences were then obtained,
respectively. The tandem duplications of the mouse renin gene
were not used for further analysis. Finally, the percent identity
between hRENc and [m,rn]RENc was assessed (Fig. 2, bottom)
using blastz (20) on a local Linux computer.
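For readers unfamiliar with percent identity, the following minimal Python sketch computes it for two already aligned sequences. It is not blastz and performs no alignment. The gap character `-`, the decision to ignore gapped columns, and the function name `percent_identity` are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical bases among alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # gapped columns are not counted in this sketch
        compared += 1
        if base_a == base_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Four of the five ungapped columns are identical.
print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0
```

Evaluating this over successive aligned segments and plotting the values against the human coordinate gives a plot of the kind shown in Fig. 1 and Fig. 2, bottom.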
Specification of sources of the DNA elements. The sources
of the DNA elements were as follows: 1) hRcS: gi
22044063, position 529046–609046, reverse complement; 2)
hRENc: gi 22044063, position 529046–609046, reverse complement, position 25789–29689; 3) rnREN: gi 23321661, reverse complement, position 80000–150000; 4) rnRENc: gi
23321661, reverse complement, position 105150–108660; 5)
mREN1: gi 20340684, position 47000–122000; 6) mREN2: gi
20340631, reverse complement, position 75000–175000; 7)
mRENc: gi 20340684, position 82900–86360.
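Coordinates of this kind can be applied to a downloaded sequence as in the Python sketch below. It assumes that positions are 1-based and inclusive and that, where both are given, the reverse complement is taken after the first extraction and before the second. The text does not state either convention explicitly. The function names `reverse_complement` and `extract` are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def extract(sequence: str, start: int, end: int, reverse: bool = False) -> str:
    """Extract positions start..end (1-based, inclusive).

    If reverse is True, the extracted piece is reverse complemented.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates outside the sequence")
    piece = sequence[start - 1:end]
    return reverse_complement(piece) if reverse else piece


print(extract("AAACCCGGGT", 2, 5))                # AACC
print(extract("AAACCCGGGT", 2, 5, reverse=True))  # GGTT
```

Under these assumptions, hRcS would correspond to `extract(contig, 529046, 609046, reverse=True)` and hRENc to `extract(hRcS, 25789, 29689)`, where `contig` stands for the GenBank record sequence. The resulting lengths of 80,001 and 3,901 bases agree with the 80-kb and 3.9-kb sizes given in the text.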
Neighboring relationships between the genes. Please note
that the PIP identity scores in Fig. 1 are projected on the human
sequence. To estimate distances and neighboring relationships
of the renin gene of human, mouse, and rat, please refer to the
dotplots at http://www.charite.de/bioinformatics/tfprofile.
Availability of the algorithm. The C++ source code of the
TFprofile implementation is freely available for the Linux/
Unix platform (GNU General Public License; www.gnu.org).
Second example of application of TFprofile. To provide
further evidence of the potential usefulness of the described
algorithm, we have calculated in a second example the
weighted TF binding profile for a 2-kb noncoding DNA region
~10 kb upstream of the human IL-4 gene containing conserved elements. Like the hRENc, this 2-kb DNA region was
identified using PipMaker (20). Again, the peak in the profile
of this 2-kb DNA segment coincides with a DNA region,
which has been previously experimentally verified to have a
functional relevance to gene expression (11). This additional
binding profile calculated with our algorithm TFprofile may be
found at http://www.charite.de/bioinformatics/tfprofile.