Splicing signals and factors in plant intron removal

Biochemical Society Transactions (2002) Volume 30, part 2
Splicing signals and factors in plant intron removal
J.W. S. Brown', C. G. Simpson, G. Thow, G. P. Clark, S. N. Jennings, N. Medina-Escobar,
S. Haupt, S. C. Chapman and K. J. Oparka
Scottish Crop Research Institute, Invergowrie, Dundee DD2 SDA, Scotland, U.K.
Abstract
recruited to the 5' splice site where U1 small
nuclear RNA (snRNA) base pairs with the premRNA. Branchpoint bridging protein (BBP/SFl)
and U2 auxiliary factor of 65 kDa (U2AF65)bind
co-operatively to the branchpoint and polypyrimidine tract, after which U2snRNP is recruited
and U2snRNA base pairs with to the branchpoint
sequence. Interactions between protein factors at
the 5' splice site and branchpoint/poly-pyrimidine
tract/3' splice site lead to formation of the commitment complex and spliceosome [l-31. I n
addition, splice-site selection is modulated by
other factors interacting with intron and exon
splicing enhancer and silencer elements. Thus,
splice-site selection in multiple intron pre-mRNA
transcripts is co-ordinated through the activity
of a number of splicing regulatory factors, including serine-arginine-rich (SR) proteins, that
permit co-operative interaction between splice
sites along the pre-mRNA transcript and lead to
highly regulated splice-site selection [l-61.
In the absence of plant nuclear in vitro
splicing extracts, characterization of plant intron splicing signals such as branchpoint, polypyrimidine tract and UA-rich sequences, and the
determination of the roles of putative splicing
factors and RNA-binding proteins, has been extremely difficult. T h e requirement for strong
splicing signals in the invertase mini-exon system
makes it highly sensitive to mutation and provides
a system to investigate, in detail, the sequence and
spatial requirements of intron signals, and the
effects of overexpression of splicing factors. This
paper describes the utility of the mini-exon system
in such characterizations and more general
approaches to identifying proteins involved in
RNA processing in plants.
Constitutive splicing of the potato invertase miniexon 2 (9 nt long) requires a branchpoint sequence
positioned around 50 nt upstream of the 5' splice
site of the adjacent intron and a U,, element found
just downstream of the branchpoint in the upstream intron [Simpson, Hedley, Watters, Clark,
McQuade, Machray and Brown (2000) RNA 6,
4224331. T h e sensitivity of this in vivo plant
splicing system has been used to demonstrate exon
scanning in plants, and to characterize plant
intronic elements, such as branchpoint and polypyrimidine tract sequences. Plant introns differ
from their vertebrate and yeast couterparts in
being UA- or U-rich (up to 85 yo UA). One of the
key differences in splicing between plants and
other eukaryotes lies in early intron recognition,
which is thought to be mediated by UA-binding
proteins. We are adopting three approaches to
studying the RNA-protein interactions in plant
splicing. First, overexpression of plant splicing
factors and, in particular, UA-binding proteins, in
conjunction with a range of mini-exon mutants.
Secondly, the sequences of around 6 5 % of vertebrate and yeast splicing factors have high-quality
matches to Arabidopsis proteins, opening the door
to identification and analysis of gene knockouts.
Finally, to discover plant-specific proteins involved in splicing and in, for example, rRNA or
small nuclear RNA processing, green fluorescent
protein-cDNA fusion libraries in viral vectors
are being screened.
Introduction
Precursor mRNA (pre-mRNA) splicing requires
the early recognition of pre-mRNA splicing
signals. In metazoa, the 5' and 3' splice sites, and
an internal branchpoint associated with a downstream poly-pyrimidine tract, are essential intron
signals. Early in spliceosome assembly, U1 small
nuclear ribonucleoprotein particle (snRNP) is
The potato invertase mini-exon
system
Key words: mini-exon, pre-mRNA splicing, splicing factor.
Abbreviations used: pre-mRNA, precursor mRNA; SnRNP, small
nuclear ribonucleoprotein particle; SR, serine-arginine-rich; UBP,
U-rich-bindingprotein; GFP, green fluorescent protein.
'To whom correspondence should be addressed (e-mail
[email protected]).
T h e potato invertase mini-exon is constitutively
included in invertase transcripts and is dependent
on strong constitutive splicing signals [7]. T h e key
elements which determine mini-exon inclusion
are a branchpoint sequence, an adjacent downstream U-rich region and the distance between
0 2002 Biochemical Society
I46
RNA-Protein Machines
these signals and the 5’ splice site downstream of
the mini-exon. T h e model for mini-exon splicing
is that exon-bridging interactions, between factors at the branchpoint/polypyrimidine tract in
intron 1 and the 5’ splice site of intron 2, promote
splicing of intron 2. This is then followed by removal of intron 1 (Figure la) [7]. T h e increased
distance of the branchpoint/poly-pyrimidinetract
from the mini-exon is needed to allow the exonbridging interactions to occur. Mutation of
the splice sites, branchpoint or polypyrimidine
tract, or reduction of the distance between the
branchpoint/polypyrimidine tract and the miniexon, leads to increased or complete skipping of
the mini-exon. T h e skipping phenotype is readily
identifiable and quantifiable by reverse transcriptase PCR.
is summarized by the sequences CIJRAY or
YURAY, where the underlined U and A are
essential nucleotides, and purines and pyrimidines
are preferred nucleotides at positions 3 and 5
respectively. In position 1 , pyrimidines are preferred but also C is preferred to U (Figure lb).
These results represent the first detailed characterization of a branchpoint in a plant intron.
Furthermore, the preferred sequences match
consensus branchpoint sequences of both vertebrate and plant introns [8,11].
Plants do not, in general, contain strong polypyrimidine tracts as found in vertebrate introns,
but tend to have U-rich sequences between putative branchpoints and the 3’ splice site. U- or UArichness is a general feature of plant introns,
essential for efficient splicing [12], and mutations
to U-rich regions in a number of plant introns
alter splicing efficiency and can activate cryptic
splice sites ([13-171 for reviews; [7,18]). Although
the position of U-rich sequences, adjacent and
downstream of branchpoints, in plant invertase
genes suggests that they may be polypyrimidine
tracts, it was necessary to distinguish between a
polypyrimidine tract function and that of a UAor U-rich intronic element required for efficient
plant intron splicing. Mutational series of the U,,
sequence in potato invertase intron 1, where
increasing numbers of Us were replaced by Cs or
As (‘C’ and ‘A’ series respectively), were tested
Branchpoint and polypyrimidine tract
sequences
Various lines of evidence suggest that branchpoint
sequences that share similarity with those of
vertebrate introns may be important in splicing
and splice-site selection in plants [8,9]. However,
the contribution of sequence context to the
strength of plant branchpoints has not been determined. A complete mutational analysis of the
branchpoint sequence in intron 1 of the mini-exon
system identified the relative requirements for
particular nucleotides in every position [lo]. This
Figure I
Structure of potato invertase mini-exon and characterization of branchpoint
and polypyrimidine tract
(a) Model ofthe exon-definition process that promotes splicing ofthe 9 nt invertase mini-exon
[7]. Exons are indicated as boxes and introns (IVS) by lines. Splice sites are shown as GU and
AG. The branchpoint (A) and the associated polypyrimidine tract (pV; a U, sequence) is over
50 nt from the downstream 5’ splice site in all dicotyledon invertase genes isolated so far.
(b) Mutational analysis of the branchpoint and polypyrimidine tract generates the CURAY
branchpoint sequence, where the size of the letters represents the contribution of particular
nucleotides to splicing. The minimum polypyrimidine sequence for efficient splicing is two
groups of Us separated by three t o six pyrimidines.
,
Branchpoint
CURAY
Poiypytlmldlnetract
I47
0 2002 Biochemical Society
Biochemical Society Transactions (2002) Volume 30, part 2
for splicing of the mini-exon. Inclusion of > 2As
drastically reduced splicing efficiency, while Cs
could compensate to a large extent for Us [lo].
These results suggested that the U-rich region
functioned as a polypyrimidine tract rather than a
U-rich intronic element. T o function efficiently,
the poly-pyrimidine tract had to contain a minimum of two groups of three or four Us up to 6 nt
apart (Figure lb). Thus at least in some plant
introns, polypyrimidine tracts are required for
splicing, as is generally the case in vertebrate
introns.
U-rich intronic elements, were used in conjunction with overexpression of UBP-1 and other
proteins, following co-transfection of tobacco
protoplasts. Despite their similar affinities for
U-rich sequences in vitro, differential splicing efficiencies were observed for each of the proteins
with the C and A series of mutants (C. G. Simpson
and J. W. S. Brown, unpublished work). This
suggests that, although quite similar in structure
and in some cases sequence, subtle differences
exist among the proteins in terms of specificity
of sequence binding and their effects on splicing of
the mini-exon.
U-rich-binding proteins (UBPs)
Splicing factor orthologues in
Archidopsis
Since the initial realization that UA-rich sequences were required for efficient splicing in
plants [12], UA- or U-rich-binding proteins have
been postulated to interact with and identify intron
sequences within a pre-mRNA transcript [13-171.
RNA-binding proteins with affinity for U-rich
sequences have been identified in Arabidopsis
[ 19-21]. All have been shown to have high affinity
for U-rich sequences in vitro and therefore represent candidates for proteins involved in early
intron recognition. The most studied of these
is UBP-1, which can enhance splicing of poorly
spliced introns when overexpressed in protoplasts,
but also appears to have a role in mRNA stability
[20]. We have made use of the range of mutants in
the mini-exon splicing system to begin to investigate differential interactions of such proteins.
The A and C series of mutants, designed to
distinguish between the polypyrimidine tract and
A number of splicing factors have been isolated
from plants. These include snRNP, SR and heterogeneous ribonucleoprotein particle (‘ hnRNP ’)
proteins (see [17] for review). The Arabidopsis
genome sequence has greatly aided the identification of splicing proteins that are conserved
among eukaryotes. In an analysis of such proteins,
splicing factors with significant homology to
counterparts in yeast and vertebrates were identified for around 6 5 % of the eukaryotic proteins
(C. G. Simpson and J. W. S. Brown, unpublished
work). In some cases sequence similarity was
striking, while in others similarity resided in only
portions of the protein, such that identification
was tentative. In particular, the majority of
protein factors involved in mammalian branchpoint, polypyrimidine-tract and 3’-splice-site
selection have well-conserved orthologues in
Figure 2
Proteins identified to interact at the branchpoindpolypyrimidinetracd3’ splice site during early spliceosomal
complex assembly in mammalian introns
E, complex E, the earliest ATP-independent complex: A, complex A, the ATP-dependent pre-spliceosomal complex PI. The sizes of
the human proteins in amino acids are given in brackets. Human and Arabidopsis orthologues were aligned using LALIGN
(http://w.ch.embnet.ot@ and the percentage identity over overlapping regions is given (the size ofthe overlap is in brackets). UZAF, U2
auxiliary factor; mBBP, mammalian branchpoint binding protein; SAP, splicing associated protein.
Complex E
Human
mBBP (639aa)
U 2 A P (475aa)
U 2 A P (UOM)
p14
(12Saa)
SAP166 (1-a)
SAP145 (872aa)
SAP130 (1217aa)
SAP114 (793aa)
SAW2 (-a)
SAPS1 (SOlaa)
SAP49 (424aa)
Comolex A
0 2002 Biochemical Society
I48
Arabidopsis
32% ( m a )
40% (491aa)
s6% (247aa)
64% (121aa)
68% (13osaa)
42% 1-(
58% (1219aa)
31$6 (773aa)
62% (m)
45% (Sl2aa)
6896 (378aa)
RNA-Protein Machines
Arabidopsis (Figure 2 ) . That many yeast and
vertebrate splicing factors failed to identify similar
proteins in Arabidopsis suggests that these genes
and proteins have diversified in plants or that
their functions have been taken over by other
plant-specific proteins. T o begin to address the
functions of identified genes, various tools are
available : overexpression of proteins with miniexon mutants, transfer-DNA or transposon-gene
knockouts in Arabidopsis, or virus-induced or
RNA-induced gene silencing.
such as the potato mini-exon, provide new and
exciting opportunities to study constitutive and alternative splicing in plants.
References
I
2
3
4
5
6
Identification of nuclear proteins
7
Plant viral vectors have been developed for ectopic
expression of RNA and proteins in plant cells. In
particular, tobacco mosaic virus-based vectors
have been designed for efficient, high-throughput
expression in tobacco hosts. cDNA libraries have
been constructed from Nicotiana benthamiana
mRNA in tobacco mosaic virus vectors, where
partial cDNAs are expressed as 3'- or 5'-green
fluorescent protein (GFP) fusion proteins. Infection of leaves of tobacco plants produced many
hundreds of individual infection sites per leaf,
each of which represents a different fusion product. Lesions are rapidly screened by fluorescence
microscopy and those showing nuclear or nucleolar localization of GFP are selected. Following
isolation of the lesion and passaging through
another infection cycle, the cDNA is cloned and
sequenced. This allows large-scale screening of
cDNAs for particular cellular localization. Thus
far, GFP localization has been observed throughout the whole nucleoplasm, to putative nuclear
pore structures, strands within the nucleus, nuclear speckles and the nucleolus (K. J. Oparka,
unpublished work). T h e lack of an in vitro splicing
extract from plants has hampered progress in
understanding the nuances and subtleties of plant
intron splicing in comparison with other eukaryotic systems. T h e identification of splicing factors
conserved in plants and of novel plant proteins, the
technologies for examining gene function in plant
cells and sensitive in vivo splicing assay systems,
8
9
10
II
I2
13
14
15
16
17
18
19
20
21
Kr'dmer, A. ( I 996) Annu. Rev. Biochem. 65, 367409
Reed, R (2000) C u r . Opin. Cell Biol. 12,340-345
Hastings, M. L. and Krainer, A. R. (2001) Cur. Opin.
Cell Biol. 13,302-309
Black D. L. (1995) RNA I, 763-771
Blencow, B. J. (2000) Trends Biochern. Sci. 25, 106- I I0
Smith, C. W. J. and Valcirel, J. (2000) Trends Biochem. Sci.
25, 38 1-388
Simpson, C. G., Hedley, P. E., Watten, J. A,, Clark G. P.,
McQuade, C., Machray, G. C. and Brown, J. W. S. (2000)
RNA 6,422433
Simpson, C. G., Clark. G. P.. Davidson, D., Smith, P. and
Brown, J. W. S. ( I 996) Plant J. 9,369-380
Liu, H.-X. and Filipowicz, W. ( I 996) Plant J. 9,38 1-389
Simpson, C. G., Thow, G., Clark G. P., Jennings, S. N.,
Watten, J. A. and Brown, J.W. S. (2002) RNA. 8,47-56
Tolstrup, N., Rouze, P. and Brunak S. ( I 997) Nucleic Acids
Res. 25, 3 I 59-3 I63
Goodall, G. J. and Filipowicz, W. ( I 989) Cell 58, 473483
Luehen. K. R and Walbot, V. ( I 994) Plant Mol. Biol. 24,
449463
Simpson. G. G. and Filipowicz, W. ( I 996) Plant Mol. Biol.
32,141
Brown, J. W. S. and Simpson, C. G. ( I 998) Annu. Rev.
Plant Physiol. Plant Mol. Biol. 49,77-95
Schuler, M.A. ( I 998) in A Look Beyond Transcription:
Mechanisms Determining mRNA Stability and Translation
in Plants (Bailey-Serres, J. and Gallie, D., eds), pp. 1-19,
American Society of Plant Physiologists, Rockeville, MD
Lorkovit, Z. J., Kirk D. A. W., Larnbetmon, M. H. L. and
Filipowicz, W. (2000) Trends Plant Sci. 5, 160-1 67
Latijnhouwen, M. J., Pairoba, C. F., Brendel, V.. Walbot, V.
and Carle-Urioste, J, C. ( I 999) Plant Mol. Biol. 41,637-644
Gniadkowski, M., Hemmings-Mieszcak M., Klahre, U., Liu,
H.-X. and Filipowicz, W. ( I 996) Nucleic Acids Res. 24,
6 19-627
Lambermon, M. H. L, Simpson, G. G., Wieczorek-Kirk
D. A,, Hemmings-Mieszczak M., Klahre, U. and Filipowicz,
W. (2000) EMBO J. 19, 1638- I 649
Lorkovit, Z. J., Wieczorek-Kirk D. A,, Klahre, U.,
Hemmings-Mieszczak M. and Filipowicz,W. (2000) RNA
6, 1610-1624
Received 2 I November 200 I
I49
0 2002 Biochemical Society