Carcinogenesis vol.19 no.4 pp.557–566, 1998

Spectrum of point mutations in the coding region of the hypoxanthine-guanine phosphoribosyltransferase (hprt) gene in human T-lymphocytes in vivo

Andrej Podlutsky1, Anne-May Österholm1, Sai-Mei Hou1, Andreas Hofmaier2 and Bo Lambert1,3

1The Karolinska Institute, Department of Biosciences, CNT/Novum, 141 57 Huddinge, Sweden and 2BIBRA International, Woodmansterne Road, Carshalton, Surrey, SM5 4DS, UK

3To whom correspondence should be addressed. Email: [email protected]

The hypoxanthine-guanine phosphoribosyl transferase (hprt) locus in 6-thioguanine (TG) resistant T-lymphocytes is a useful target for the study of somatic in vivo mutagenesis, since it provides information about a broad spectrum of mutation. Mutations in the hprt coding region were studied in 124 TG-resistant T-cell clones from 38 healthy, non-smoking male donors from a previously studied population of bus maintenance workers, fine-mechanics and laboratory personnel. Their mean age was 43 years (range 23–64) and their hprt mutant frequency was 9.3 ± 5.2 × 10^-6 (mean ± SD, range 1.4–22.6 × 10^-6). Sequence analysis of hprt cDNA identified 115 unique mutations; 76% were simple base substitutions, 10% were ±1 bp frameshifts, and 10% were small deletions within exons (3–52 bp). In addition, two tandem base substitutions and one complex mutation were observed. Simple base substitutions were observed at 55 (20%) of 281 sites known to be mutable in the hprt coding sequence. The distribution of these mutations differed significantly from that expected under a Poisson distribution (P < 0.0001), suggesting the existence of ‘hotspots’. All of the 87 simple base substitutions occurred at known mutable sites, but eight were substitutions of a kind that have not previously been reported at these sites. The most frequently mutated sites were cDNA positions 197 and 146, with six and five independent mutations respectively.
Four mutations were observed at position 131, and three each at positions 143, 208, 508 and 617. Transitions (52%) were slightly more frequent than transversions (48%), and mutations at GC base pairs (56%) were more common than mutations at AT base pairs (44%). GC > AT was the most common type of base pair substitution (37%). The majority of the mutations at GC base pairs (78%) occurred at sites with G in the non-transcribed strand. All but one of eight mutations at CpG sites were of the kind expected from deamination of methylated cytosine. Deletion of a single base pair (–1 frameshift) was three times more frequent than insertion of a single bp (+1 frameshift). Almost half (6/13) of the small (3–52 bp) deletions within the coding sequence clustered in the 5′ end of exon 2. Short repeats and other sequence motifs that have been associated with replication error were found in the flanking regions of most of the frameshifts and small deletions. However, several differences in the local sequence context between ±1 frameshift and deletion mutations were also noticed. The present results identify positions 197, 146 and possibly 131 as hotspots for base substitution mutations, and confirm previously reported hotspots at positions 197, 508 and 617. In addition, the earlier notion of a deletion hotspot in the 5′ end of exon 2 was confirmed. The observation of these mutational cluster regions in different human populations suggests that they are due to endogenous mechanisms of mutagenesis, or to ubiquitous environmental influences.

*Abbreviations: hprt, hypoxanthine-guanine phosphoribosyl transferase; LNS, Lesch-Nyhan syndrome; TG, 6-thioguanine; GSTM1, glutathione transferase; NAT2, N-acetyltransferase; TCR, T-cell receptor; RFLP, restriction fragment length polymorphism; RT-PCR, reverse transcription-polymerase chain reaction.

© Oxford University Press

The emerging background spectrum of somatic in vivo mutation in the human hprt gene provides a useful basis for comparisons with radiation or chemically induced mutational spectra, as well as with gene mutations in human tumors. Introduction Knowledge about the mechanisms of mutagenesis in human somatic cells in vivo is important for the understanding of cancer and other diseases. In vivo mutations are also of potential use as biomarkers of genetic effects of occupational and other environmental exposures (1,2). The study of gene specific mutational spectra has provided valuable information about mechanisms of mutagenesis in human germ cells (reviewed in 3) and tumors (4,5). However, there is still limited information on the mechanisms of mutagenesis in human somatic cells in vivo, and the relationship between mutations arising in normal cells and mutations detected in tumors. The background spectrum of somatic mutation in healthy people in vivo is likely to be the result of many different causes, including spontaneous events as well as long time exposure to environmental mutagens, and is probably modified by nonmutagenic life style factors and endogenous host factors affecting metabolism, DNA damage formation and repair. Few endogenous and environmental causes of human in vivo mutation have been identified, and it is still largely unknown to what extent individuals differ in susceptibility toward environmental mutagens. In addition to the elucidation of molecular mechanisms of mutagenesis, the analysis of background mutation in people may reveal important in vivo mutagens as well as constitutional susceptibility factors, and give clues to the early steps of genetic change in cancer development. 
The usefulness of the X-linked hypoxanthine-guanine phosphoribosyl transferase (hprt*) locus in T-lymphocytes for the analysis of gene-specific mutations in vivo has been demonstrated in several studies of healthy people and patients receiving chemotherapy and other treatments (reviewed in 6). The individual frequency of mutant T-lymphocytes can be determined by cloning in medium containing 6-thioguanine (TG) (7,8). The molecular nature of the hprt mutations in T-cells can be studied using standard PCR and DNA-sequencing methods. The crystal structure of the human hprt protein has been determined (9), and the complete 55 kb nucleotide sequence of the human hprt gene is known (10). Thus, almost all different kinds of hprt mutation can be detected, and the frequency distribution of different types of alterations in hprt DNA (mutation spectrum) can be studied. Chemical-specific mutation spectra at the hprt locus can be studied in T-cells in vitro (11–13) and compared with mutations arising in T-cells in vivo (14), as well as with germ line mutations in patients with Lesch-Nyhan syndrome (LNS) and hprt-related gout (reviewed in 15). However, since hprt is an X-linked gene with only one functional copy in somatic cells, this locus is not suitable for studies of mechanisms involving both alleles, e.g. gene conversion and recombination. Human hprt mutations have been compiled in a comprehensive database (16–18), which, however, contains relatively few in vivo mutations representing true somatic background mutations. Preferably, such mutations should be studied in well characterized populations of healthy, non-smoking individuals of both sexes. The first extensive study of this kind, including 217 hprt mutations in T-lymphocyte clones from 172 male and female smokers and non-smokers, was recently published by Burkhart-Schultz et al. (14).
However, considering the relatively large target and diverse nature of genetic alterations at the hprt locus (19), many more mutations have to be identified to establish a complete somatic in vivo spectrum, and to disclose any characteristic features of background and induced mutations in healthy people. We have identified 115 point mutations and small deletions in the hprt coding region of T-lymphocytes from 38 healthy, non-smoking males employed as bus maintenance workers, fine-mechanics or laboratory personnel. This study population has been characterized previously with regard to hprt mutant frequency and aromatic DNA adduct levels in peripheral blood lymphocytes (20,21). The spectrum of mutation, including several tentative base substitution hotspots, shows striking similarities with the spectrum published by Burkhart-Schultz et al. (14). Furthermore, our present results support the previous observation of a cluster of small deletion breakpoints in hprt exon 2 (22). The recurrence of mutations at specific positions in cells from individuals of widely separated study populations suggests that these are true, independent background mutations, probably caused by endogenous mechanisms of mutagenesis, or ubiquitous environmental influences. This identification of hotspots for somatic in vivo mutation provides a basis for interesting comparisons with mutational spectra in tumor cells, and germ line genes causing heritable diseases.

Materials and methods

Study population and isolation of mutant T-cell clones

Hprt mutant T-cell clones were obtained by direct TG selection of peripheral blood lymphocytes from healthy, non-smoking males who were either bus maintenance workers exposed to diesel exhaust, or non-exposed fine-mechanics and laboratory personnel.
This study population has been characterized earlier with regard to genotypes for glutathione transferase (GSTM1) and N-acetyltransferase (NAT2), as well as hprt mutant frequency and aromatic DNA adduct levels in peripheral lymphocytes, and the results have been published (20,21). The exposed workers showed increased levels of aromatic DNA adducts, but the hprt mutant frequency was not different from that of the unexposed group. A weak but significant correlation between adduct levels and hprt mutant frequency was observed when data from both groups were combined (20). Initially, a total of 462 directly selected TG resistant T-cell clones were available from 43 individuals (29 exposed and 14 non-exposed) (23). In the preliminary screening for mutations by multiplex and RT-PCR, cDNA was obtained from 323 clones. The reasons for the failure to obtain RT-PCR products from one third of the clones have been discussed and ascribed to missing or unstable transcripts, poor quality of the cell sample, or methodological difficulties (23). The RT-PCR positive mutants were classified by PAGE analysis as splice mutants (74 mutants with cDNA of abnormal length), coding errors (241 mutants with cDNA of normal length) or genomic deletions (8 mutants with abnormal multiplex PCR product). Four mutants were genomic deletions by multiplex PCR, and gave no PCR product (23). After 2 years of freeze-storage, repeated RT-PCR and successful sequence analysis of cDNA were carried out on 183 of the 315 clones with splicing and coding errors. Thus, mutations were identified in 58% (183/315) of the mutants under these circumstances. Sequencing of the cDNA revealed 59 splicing mutations, which will be reported separately (A.-M.Österholm et al., in preparation). The remaining base substitutions and small deletions within the exon sequences in a total of 124 T-cell clones from 38 individuals are reported here.
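The clone accounting in the paragraph above can be followed as simple arithmetic. The short Python sketch below only restates the counts given in the text; the variable names are illustrative and are not taken from the original study.

```python
# Clone counts as stated in the text (names are illustrative assumptions).
rt_pcr_positive = {
    "splice": 74,            # cDNA of abnormal length
    "coding": 241,           # cDNA of normal length
    "genomic_deletion": 8,   # abnormal multiplex PCR product
}

# Clones from which cDNA was obtained in the preliminary screening.
total_cdna = sum(rt_pcr_positive.values())

# Clones with splicing or coding errors, i.e. candidates for cDNA sequencing.
candidates = rt_pcr_positive["splice"] + rt_pcr_positive["coding"]

sequenced = 183            # successfully re-amplified and sequenced
splicing_mutations = 59    # reported separately

# Clones with base substitutions or small deletions within exons.
reported_here = sequenced - splicing_mutations

# Fraction of candidate clones in which a mutation was identified.
success_rate = sequenced / candidates

print(total_cdna, candidates, reported_here, f"{success_rate:.0%}")
```

The figures reproduce the 323 cDNA-positive clones, the 315 candidate clones, the 58% success rate and the 124 clones analysed in this paper.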
Reverse transcription-polymerase chain reaction (RT-PCR) Pellets of 6000 cells were thawed and hprt cDNA was synthesised essentially according to Yang et al. (24), with primers and modifications as described in Österholm et al. (23). Cells were incubated in 20 µl of cDNA cocktail (50 mM Tris–HCl, pH 8.5; 75 mM KCl; 3 mM MgCl2; 2.5% NP-40; 10 mM DTT), 500 µM of each dNTP (Pharmacia, Uppsala, Sweden); 1.6 µM of reverse primer 2, 1 U/µl RNAsin (Pharmacia); 2.5 U/µl M-MLV reverse transcriptase (Promega, Madison, WI) for 1 h at 37°C. Five µl of the cDNA reaction mixture was used in PCR with 0.16 µM of forward primer 1 and 0.08 µM of reverse primer 2, 200 µM of each dNTP, PCR buffer (15 mM Tris–HCl, pH 8.5; 60 mM KCl; 1.5 mM MgCl2) and 1 U of Taq polymerase (Promega). After initial denaturation for 4 min at 94°C, 30 cycles were run at 94°C for 30 s, 50°C for 30 s, 72°C for 1 min, followed by 7 min polymerisation at 72°C. Nested PCR was performed with 2–10% of the first PCR product and 0.2 µM each of forward primer 3 and reverse primer 4 (biotinylated), using the same reaction conditions as for the first PCR. The PCR was performed in DNA-Engine™ (MJ Research, Watertown, MA, PTC200) or GeneAmp 2400® (Perkin Elmer, Foster City, CA) thermal cyclers. Ten percent (5 µl) of the reaction product was visualised on a 3.75% polyacrylamide gel. Direct sequencing Biotinylated PCR product (45 µl) was immobilised on streptavidin-coated magnetic beads (Dynal, Oslo, Norway). DNA strands were separated in alkali (1 M NaOH, 0.075% Tween 20) with the help of magnetic concentrator (Dynal). Non-biotinylated DNA strand was precipitated with 1/10 volume of 3 M sodium acetate, pH 5.2 and 2 vol of ice-cold 95% ethanol, washed twice with 70% ethanol, vacuum dried and dissolved in water. 
Sequencing reactions used PRISM Sequenase® terminator single-stranded DNA sequencing kit (Applied Biosystems, Foster City, CA), and 0.2 µM of one of the following primers:

Forward primer 5: (124) 5′-ATTATGGACAGGACTGAA-3′ (141)
Forward primer 6: (166) 5′-GAGATGGGAGGCCATCACAT-3′ (185)
Reverse primer 7: (302) 5′-CTGATAAAATCTACAGTCAT-3′ (283)
Reverse primer 9: (373) 5′-AAGTTGAGAGATCTTCTCCAC-3′ (353)

The reactions were run on a 373A or 377A Automated Sequencer (Applied Biosystems). Most of the clones were sequenced in a two-step procedure. First, one of the two forward primers was used, giving a readable sequence from cDNA position 200 to 670, i.e. about two thirds of the target sequence. If no mutation was found in this region of the cDNA, the sequencing was completed with one of the reverse primers. In 41 mutants (indicated in Table III), the whole coding region was sequenced. With one exception (discussed in the text) all clones showed one mutation only.

Analysis of TCR-pattern for clonal identity

Clones from the same donor which were found to have identical mutations were studied with regard to T-cell receptor (TCR) γ-gene rearrangement essentially as described by de Boer et al. (25). Clonal cell lysates were used in a two-step, nested PCR reaction with primers originally described by Bourguin et al. (26), and the nested PCR product was subjected to restriction fragment length polymorphism (RFLP) analysis as described by Bastlova and Podlutsky (13).

Statistical analysis

The Goodness of Fit Test (27) was used to study the probability that the observed simple base substitutions (listed in Table III) were randomly distributed in the hprt coding sequence (the null hypothesis). The underlying assumptions were that (i) all observed mutations are independent and (ii) there are 300 mutable sites in the hprt coding sequence (see text for explanation and references). The expected frequency of sites with 1, 2, 3 etc.
mutations was calculated, using the Poisson distribution, and compared to the observed frequency distribution. Differences between expected and observed frequencies were tested for statistical significance (P < 0.05) using the chi-square test. The probability that two or more base substitutions in a set of random mutations would occur at the same site was calculated using the Poisson distribution. Bonferroni correction was applied (28) to take into consideration the 300 assumed mutable sites, and thus provides a conservative method of identifying probable hotspots. For a set of 87 simple base substitutions, as in the present work, the probability after Bonferroni correction of observing 4 or more, 5 or more and 6 or more mutations at any single site is 0.07, 0.004 and 0.0002 respectively. These probabilities were used to define clusters of five or six mutations as indicating hotspots in the present data set.

Table I. Summary of the study population, mutant clones and mutations

Group (no. of individuals) | Mean age (range) | Mutant frequency (range)a | No. of clones | No. of mutationsb
Bus maintenance workers (26) | 45 (28–64) | 9.6 ± 5.3 (3.2–22.6) | 70 | 67
Laboratory personnel & fine mechanics (12) | 37 (23–54) | 8.3 ± 4.7 (1.4–18.0) | 54 | 48
All workers (38) | 43 (23–64) | 9.3 ± 5.2 (1.4–22.6) | 124 | 115

aMean ± SD × 10^-6. Data from (21).
bUnique mutations within the coding region.

Results

Point mutations and small deletions in the hprt coding region were identified in mutant T-cell clones from 38 individuals belonging to a previously studied group of workers exposed to diesel exhaust and a non-exposed control group (20,23). All subjects were non-smoking, healthy males. Our previous results showed increased levels of aromatic DNA adducts in the exposed group, but no significant difference between the groups with regard to hprt mutant frequency.
The distributions of age and mutant frequency in the present study group (Table I) were similar to the results for the entire study population (20,23). A total of 124 mutant clones were studied. The number of clones per individual ranged from one to eight, with a mean and median of three. Sibling clones (i.e. clones with identical mutations and TCR-rearrangements) were observed in three subjects (three doublets), and non-sibling clones (i.e. clones with the same mutation but different TCR-rearrangements) were observed in five subjects (one triplet and four doublets) (indicated in Table III). Considering the uncertainty with regard to the origin of the mutation in these clones (i.e. a single event in a cell prior to TCR-rearrangement or separate events in clones with different TCR-rearrangements) and to avoid bias in the frequency distribution of mutations, only one mutation of each kind from each individual was regarded as a unique mutation. In one clone, K-36-3 (Table III), two mutations were identified. One of these occurred at position 135, the first base of exon 3, where G>C as well as G>T substitutions have been observed previously in the hprt mutation database (16; here and below in the text and tables, this reference relates to database release six, August 1996). The other mutation in this clone, a G>C transversion at position 176, is the first one to be reported at this site. The predicted amino acid substitution caused by this mutation is Gly58>Ala, a change from a polar to a non-polar residue. Nevertheless, until other mutations are reported to occur at this site, this base substitution is regarded as a silent mutation, and is therefore not included among the unique mutations. Thus, a total of 115 unique mutations were identified, 89 (77%) base substitutions and 26 (23%) frameshifts and small deletions (Table II). The nature and sequence context of each mutation is shown in Tables III and VIII (V-clones derive from bus maintenance workers, and K-clones from fine-mechanics and laboratory personnel).

Table II. Types and frequencies of mutations in the coding region of the hprt gene

Type of mutation | No. | (%)
Base substitution, simple | 87 | (76)
Base substitution, tandem | 2 | (2)
Frameshift, +1 bp | 3 | (3)
Frameshift, –1 bp | 9 | (8)
Frameshift, –2 bp | 1 | (1)
Deletion, 3–52 bp | 12 | (10)
Complex | 1 | (1)
Total | 115 | (100)

Among the base substitutions, 87 were simple and two were tandem mutations (Table III). Both tandem mutations were of a kind (CC>TT or GG>AA) that has been observed previously at a low frequency (1–2% of base substitutions) at other positions in the hprt gene. One of the simple base substitutions was a change of the translation initiation codon, 78 (88%) were missense and 10 (11%) were nonsense mutations. Two mutations occurred at the extreme ends of exons: a G>C transversion of the first base of exon 3 (K-36-3), and a T>G transversion of the last base of exon 7 (K-37-1). In both cases, the mutation would be expected to cause little if any attenuation of the corresponding splicing sequence, and no exon skipping or intron inclusion was observed in the cDNA of these mutants. All of the base substitutions occurred at sites where mutations have been reported previously in the hprt mutation database (16). Eight new mutations were observed at sites where other mutations have been reported earlier (Table IV). One of these is a nonsense mutation (A>T) at position 307. With the addition of these new mutations to the ones already existing in the hprt mutation database, a total of 439 different simple base substitutions have been recorded at 281 sites in the hprt coding region. The distribution of base substitutions among the nine exons of the hprt coding region was roughly proportional to the distribution of mutable sites (Table V).
In agreement with earlier observations (19), mutations were relatively common in the 5′-half of exon 3 (positions 143–220), and there was a relative paucity of mutations in exon 1, the 3′-half of exon 3 (positions 221–318), exon 4 and exon 5 (Table V, and Figure 1). These results indicate an overall consistency with regard to the frequency distributions of mutations in the present data set and the hprt mutation database. There was a non-uniform distribution of the 87 simple base substitutions among the 55 positions in the coding region (Figure 1). Different mutations at the same site were observed at seven positions (Table VI).

Table III. Simple and tandem base substitutions in the hprt coding sequence

Mutant ID | cDNA positiona | Exon | Base changesb | Target sequenceb | Amino acid changes
V-143-12 (c) | 3 | 1 | G>C | ttAT G GCGA | Initiation codon
K-29-13 (c) | 29 | 2 | T>G | GTGA T TAGT | Ile9>Ser
K-32-3 (c) | 47 | 2 | G>A | CCAG G TTAT | Gly15>Asp
K-34-8/10 (c,e) | 47 | 2 | G>A | CCAG G TTAT | Gly15>Asp
V-136-7 (c) | 64 | 2 | T>G | TTTA T TTTG | Phe21>Val
V-140-9 | 95 | 2 | T>C | GATT T GGAA | Leu31>Ser
V-131-6 (c) | 109 | 2 | A>T | GTTT A TTCC | Ile36>Phe
K-33-9 (c) | 110 | 2 | T>G | TTTA T TCCT | Ile36>Ser
V-143-8 (c) | 119 | 2 | G>A | CATG G ACTA | Gly39>Glu
V-150-13 (c) | 119 | 2 | G>A | CATG G ACTA | Gly39>Glu
V-131-5 (c) | 122 | 2 | T>C | GGAC T AATT | Leu40>Pro
K-25-5 | 131 | 2 | A>G | ATGG A CAGG | Asp43>Gly
V-119-8 | 131 | 2 | A>G | ATGG A CAGG | Asp43>Gly
V-127-6 (c) | 131 | 2 | A>T | ATGG A CAGG | Asp43>Val
V-141-7 (c) | 131 | 2 | A>T | ATGG A CAGG | Asp43>Val
K-36-3 (c,f) | 135 | 3 | G>C | gtag G ACTG | Arg44>Ser
V-127-8 (c) | 143 | 3 | G>A | GAAC G TCTT | Arg47>His
V-129-2 (c) | 143 | 3 | G>A | GAAC G TCTT | Arg47>His
V-144-4 (c) | 143 | 3 | G>T | GAAC G TCTT | Arg47>Leu
K-25-13 (c) | 146 | 3 | T>C | CGTC T TGCT | Leu48>Pro
K-26-16 (c) | 146 | 3 | T>C | CGTC T TGCT | Leu48>Pro
V-123-6 (c) | 146 | 3 | T>C | CGTC T TGCT | Leu48>Pro
V-131-3 (c) | 146 | 3 | T>C | CGTC T TGCT | Leu48>Pro
V-140-5/15 (c,e) | 146 | 3 | T>C | CGTC T TGCT | Leu48>Pro
V-133-14 | 151 | 3 | C>T | TGCT C GAGA | Arg50>TERM
V-141-15 (c) | 151 | 3 | C>T | TGCT C GAGA | Arg50>TERM
K-25-2 (c) | 158 | 3 | T>G | GATG T GATG | Val52>Gly
K-34-2 | 170 | 3 | T>A | GAGA T GGGA | Met56>Lys
V-148-5 (c) | 173 | 3 | G>A | ATGG G AGGC | Gly57>Glu
K-33-6 | 194 | 3 | T>A | GCCC T CTGT | Leu64>His
K-26-14/18 (c,d) | 197 | 3 | G>A | CTCT G TGTG | Cys65>Tyr
K-28-6 (c) | 197 | 3 | G>A | CTCT G TGTG | Cys65>Tyr
K-29-2 (c) | 197 | 3 | G>T | CTCT G TGTG | Cys65>Phe
K-37-8 (c) | 197 | 3 | G>T | CTCT G TGTG | Cys65>Phe
V-136-6 (c) | 197 | 3 | G>A | CTCT G TGTG | Cys65>Tyr
V-150-4 (c) | 197 | 3 | G>A | CTCT G TGTG | Cys65>Tyr
V-124-10 (c) | 203 | 3 | T>G | GTGC T CAAG | Leu67>Arg
K-29-10 | 208 | 3 | G>A | CAAG G GGGG | Gly69>Arg
V-144-1 (c) | 208 | 3 | G>C | CAAG G GGGG | Gly69>Arg
V-150-8 (c) | 208 | 3 | G>A | CAAG G GGGG | Gly69>Trp
K-25-14 (c) | 209 | 3 | G>A | AAGG G GGGC | Gly69>Glu
V-123-8 (c) | 209 | 3 | G>A | AAGG G GGGC | Gly69>Glu
V-145-11 | 216 | 3 | T>A | GCTA T AAAT | Tyr71>TERM
V-130-4 | 220 | 3 | T>G | TAAA T TCTT | Phe73>Val
K-27-10 | 236 | 3 | T>C | CTGC T GGAT | Leu77>Pro
V-145-1 | 236 | 3 | T>C | CTGC T GGAT | Leu77>Pro
K-26-3 | 299 | 3 | T>G | TTTA T CAGA | Ile99>Ser
V-141-11 | 307 | 3 | A>T | ACTG A AGAG | Lys102>TERM
V-133-2 | 344 | 4 | A>T | ATAA A AGTA | Lys114>Ile
V-146-3 | 389 | 5 | T>A | AATG T CTTG | Val129>Asp
K-25-12 (c) | 418 | 6 | G>C | CACT G GCAA | Gly139>Arg
K-27-8 | 418 | 6 | G>C | CACT G GCAA | Gly139>Arg
K-27-5 (c) | 419 | 6 | G>T | ACTG G CAAA | Gly139>Val
V-141-13 | 424 | 6 | A>C | CAAA A CAAT | Thr141>Pro
V-139-4 | 428 | 6 | T>G | ACAA T GCAG | Met142>Arg
V-141-9 | 454 | 6 | C>T | CAGG C AGTA | Gln151>TERM
K-33-3/8 (d) | 463 | 6 | C>T | TAAT C CAAA | Pro154>Ser
V-137-10/13 (e) | 463 | 6 | C>T | TAAT C CAAA | Pro154>Ser
V-133-8 | 464 | 6 | C>T | AATC C AAAG | Pro154>Leu
V-129-6 | 475 | 6 | A>G | GGTC A AGGT | Lys158>Glu
V-124-6 | 508 | 7 | C>T | CCCA C GAAG | Arg169>TERM
V-130-2 | 508 | 7 | C>T | CCCA C GAAG | Arg169>TERM
V-137-3/11 (d) | 508 | 7 | C>T | CCCA C GAAG | Arg169>TERM
K-29-12 | 527 | 7 | C>G | AAGC C AGAC | Pro175>Arg
K-36-1/6/8 (d) | 529 | 7 | G>T | GCCA G ACTg | Asp176>Tyr
K-36-5 | 530 | 7 | A>T | CCAG A CTgt | Asp176>Val
V-132-7 | 530 | 7 | A>T | CCAG A CTgt | Asp176>Val
K-37-1 (c) | 532 | 7 | T>G | AGAC T gtaa | Phe177>Ala
K-29-11 | 539 | 8 | G>A | GTTG G ATTT | Gly179>Glu
K-37-4 (c) | 539 | 8 | G>A | GTTG G ATTT | Gly179>Glu
K-32-6 | 541 | 8 | T>G | TGGA T TTGA | Phe180>Val
K-34-4 (c) | 568 | 8 | G>A | TGTA G GATA | Gly189>Arg
K-34-9 | 568 | 8 | G>T | TGTA G GATA | Gly189>TERM
V-145-7 | 569 | 8 | G>C | GTAG G ATAT | Gly189>Ala
K-33-10 | 581 | 8 | A>T | CTTG A CTAT | Asp193>Val
K-27-9 | 599 | 8 | G>A | TTCA G GGAT | Arg199>Lys
V-131-7 | 599 | 8 | G>C | TTCA G GGAT | Arg199>Thr
V-150-1 | 602 | 8 | A>T | AGGG A TTTG | Asp200>Gly
K-28-3 | 606 | 8 | G>C | ATTT G AATg | Leu201>Phe
V-119-6 | 606 | 8 | G>T | ATTT G AATg | Leu201>Phe
V-135-1 | 611 | 9 | A>G | tagC A TGTT | His203>Arg
V-137-1 | 612 | 9 | T>A | agCA T GTTT | His203>Gln
K-39-4 | 614 | 9 | T>G | CATG T TTGT | Val204>Gly
V-127-3 | 617 | 9 | G>A | GTTT G TGTC | Cys205>Tyr
V-132-4 | 617 | 9 | G>A | GTTT G TGTC | Cys205>Tyr
V-143-6 | 617 | 9 | G>A | GTTT G TGTC | Cys205>Tyr
V-141-3 | 648 | 9 | C>G | AATA C AAAG | Tyr215>TERM
K-29-3/6 (c,d) | 112-113 | 2 | CC>TT | TATT CC TCAT | Pro37>Phe
V-143-4 | 399-400 | 5 | GG>AA | TTGT GG AAgt | Val132>Val, Glu133>Lys

aThe A of the ATG start codon is number 1.
bThe non-transcribed strand sequence is shown.
cIndicates mutants in which the whole cDNA was sequenced.
dNon-sibling clone(s) with same mutation but different TCR-pattern was observed in this donor.
eSibling clone(s) with same mutation and TCR pattern was observed in this donor.
fA second mutation, G>C at position 176, was observed in this clone, but regarded as a silent mutation (see text).

Table IV. New mutable sites and new mutations at previously mutated sites

Site | New mutationsa: type (no.) | aa-substitution | Previously observed mutations (no.)b
64 | T>G (1) | Phe>Val | T>A (1)
110 | T>G (1) | Ile>Ser | T>A (4), T>C (2)
158 | T>G (1) | Val>Gly | T>A (1), T>C (1)
194 | T>A (1) | Leu>His | T>C (4), T>G (1)
307 | A>T (1) | Lys>stop | A>G (1)
418 | G>C (2) | Gly>Arg | G>A (1)
532 | T>G (1) | Phe>Ala | T>C (2)
599 | G>C (1) | Arg>Thr | G>A (10), G>T (3)

aThis work.
bData from the human hprt mutation database (16).

Fig. 1. Distribution of single base substitutions at the hprt locus in T-cells of healthy, non-smoking males. Data from Table III.

Table V. Distribution of mutable sites and mutations in the hprt coding sequence

Hprt exon (no. of bp) | A: Mutable sitesa | B: Mutated sitesb | C: No. of bp substitutionsb | B/A | C/B
1 (27) | 9 | 1 | 1 | 0.11 | 1
2 (107) | 49 | 9 | 14 | 0.18 | 1.6
3 (184) | 76 | 17 | 33 | 0.22 | 1.9
4 (66) | 16 | 1 | 1 | 0.06 | 1
5 (18) | 10 | 1 | 1 | 0.10 | 1
6 (83) | 37 | 8 | 10 | 0.21 | 1.3
7 (47) | 23 | 5 | 8 | 0.21 | 1.6
8 (77) | 42 | 8 | 12 | 0.19 | 1.5
9 (45) | 19 | 5 | 7 | 0.26 | 1.4
Total (654) | 281 | 55 | 87 | 0.20 | 1.55

aNo. of known mutable sites in the human hprt mutation database (16).
bThis work, data from Table III.

Table VI. Sites with three or more mutations of the same kind, or two different mutations

Site | This work | The database | Ranka
131 | A>G (2), A>T (2) | A>G (4) | 99
143 | G>A (2), G>T | G>A (4), G>C, G>T | 60
146 | T>C (5) | T>G (5), T>C | 59
197 | G>T (2), G>A (4+1b) | G>A (23), G>C (5), G>T (9) | 1
208 | G>A (2), G>C | G>A (24), G>C (4), G>T (6) | 2
508 | C>T (3+1b) | C>T (29) | 3
568 | G>A, G>T | G>A (13), G>C (4), G>T (4) | 7
599 | G>A, G>C | G>A (10), G>T (3) | 17
606 | G>T, G>C | G>C (3), G>T (7) | 25
617 | G>A (3) | G>A (9), G>T (7) | 13

aIndicates rank order among the most frequently mutated sites in the human hprt mutation database (16).
bNon-sibling mutant in same individual.

Table VII. Simple base substitutions in the coding region

Type of mutationa | This work, No. (%) | Data baseb, No. (%) | All, No. (%)
All mutations | 87 (100) | 87 (100) | 174 (100)
Transition (all) | 45 (52) | 45 (52) | 90 (52)
G>A | 23 (26) | 22 (25) | 45 (26)
C>T | 9 (10) | 13 (15) | 22 (13)
A>G | 4 (5) | 6 (7) | 10 (6)
T>C | 9 (10) | 4 (5) | 13 (7)
Transversion (all) | 42 (48) | 42 (48) | 84 (48)
G>T | 7 (8) | 6 (7) | 13 (7)
C>A | 0 | 2 (2) | 2 (1)
G>C | 8 (9) | 10 (11) | 18 (10)
C>G | 2 (2) | 5 (6) | 7 (4)
A>T | 8 (9) | 3 (4) | 11 (6)
T>A | 5 (6) | 8 (9) | 13 (7)
A>C | 1 (1) | 0 | 1 (1)
T>G | 11 (13) | 8 (9) | 19 (11)
Mutations at AT | 38 (44) | 29 (33) | 67 (39)
Mutations at GC | 49 (56) | 58 (67) | 107 (61)

aBases listed are on the non-transcribed strand.
bData from the human hprt mutation database (16) including non-smokers only, and excluding mutations at splice sites.

The goodness of fit test was used to compare the distribution of the 87 observed mutations among the estimated 300 mutable sites (281 identified so far, see Discussion) within the hprt gene with an expected Poisson distribution with the parameter m = 0.29 (87/300). The underlying null hypothesis (H0) was that the observed frequencies of occurrence of base pair substitutions at a single site do not differ significantly from those expected from a Poisson random distribution. A significant test result indicates a lack of fit and may suggest the existence of hotspots within the observed mutable sites. The test statistic shows an over-representation of base pair substitutions occurring two or more times at a single site (18 observed, 9.9 expected), an over-representation of sites showing no mutation (244 observed, 224.5 expected) and an under-representation of sites showing only a single base pair substitution (38 observed, 65.1 expected). The chi-square test statistic was highly significant (chi-square value 105.8, P < 0.0001). The observed frequencies of base pair substitutions at a single site, and the lack of fit of this distribution to a Poisson random distribution, give a strong indication of the existence of mutational hotspots. The probability that four or more, five or more and six or more mutations would occur by chance at the same site in the present set of 87 mutations was calculated, using a Poisson distribution followed by Bonferroni correction, to be 0.07, 0.004 and 0.0002 respectively (see Materials and methods section).
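The Poisson and Bonferroni figures quoted above can be reproduced with a few lines of Python. This is a minimal sketch under the assumptions stated in the text (300 mutable sites, 87 independent simple base substitutions); the function and variable names are illustrative and the code is not the authors' original calculation. The chi-square value of 105.8 is not recomputed here, because the binning used for that test is not specified in the text.

```python
import math

N_SITES = 300       # assumed number of mutable sites (from the text)
N_MUTATIONS = 87    # simple base substitutions in the present data set
m = N_MUTATIONS / N_SITES   # Poisson mean per site, 0.29


def poisson_pmf(k, mean):
    """Probability of exactly k mutations at one site."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def tail_prob(k_min, mean, terms=60):
    """P(X >= k_min), summed directly to avoid cancellation in 1 - CDF."""
    return sum(poisson_pmf(k, mean) for k in range(k_min, k_min + terms))


def bonferroni(k_min):
    """Per-site tail probability multiplied by the number of sites tested."""
    return min(1.0, N_SITES * tail_prob(k_min, m))


# Expected numbers of sites with no mutation and with exactly one mutation.
expected_zero = N_SITES * poisson_pmf(0, m)
expected_one = N_SITES * poisson_pmf(1, m)

for k in (4, 5, 6):
    print(f">= {k} mutations at one site: corrected P = {bonferroni(k):.4f}")
print(f"expected sites with 0 / 1 mutation: {expected_zero:.1f} / {expected_one:.1f}")
```

Under these assumptions the corrected probabilities round to 0.07, 0.004 and 0.0002, and the expected numbers of sites with zero and one mutation round to 224.5 and 65.1, in agreement with the values reported in the text.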
This suggests that position 146 with five mutations and position 197 with six mutations are hotspots, while position 131 with four mutations is close to the criterion of P < 0.05 after Bonferroni correction (Figure 1). Positions 197, 508 and 617 were previously identified as in vivo mutation hotspots in a different study population (14). In the present data set, positions 508 and 617 showed three mutations each. Thus, in addition to identifying positions 197 and 146 as hotspots, our present results confirm the hotspots at positions 508 and 617, and indicate position 131 as a tentative hotspot for base substitution mutations in the hprt coding region of T-cells in vivo. Almost a quarter (21/87) of the base substitutions were found to occur at these five positions. The hotspot position 508 is at a CpG site. In addition to the three mutations at position 508, five other mutations were found to occur at CpG sites: three at position 143 and two at position 151. All but one of the CpG mutations were of the kind expected from spontaneous deamination of methylated cytosine; seven were GC>AT and one was GC>TA (Table III). Together, these eight mutations represent 9% (8/87) of the simple base substitutions. There are seven positions in four CpG sites where mutations have been shown to occur in the hprt coding region (16). If the 87 simple base substitutions were randomly distributed among the 281 mutable sites, two would be expected to occur at a CpG site, which is significantly less than the eight mutations that were observed. This result, showing a relative excess of hprt mutations at CpG sites in T-cells in vivo, is in agreement with previous observations (19). Among the 87 simple base substitutions, 52% were transitions, and 48% transversions (Table VII). More mutations occurred at GC base pairs (56%), despite the fact that there are fewer mutable GC base pairs (48%) than AT base pairs. The frequencies of transitions and transversions differed significantly between GC and AT base pairs; transitions were more frequent than transversions at GC base pairs, whereas the opposite was observed at AT base pairs. The predominant base substitutions were GC>AT transitions (36%) and AT>TA transversions (15%). In 78% (38/49) of the substitutions at GC base pairs, guanine was in the non-transcribed strand, which is more than expected, considering that only 63% (85/134) of guanines at mutable sites are in the non-transcribed strand. Thus, a possible strand bias for in vivo mutation at GC base pairs was indicated by the results in this limited data set, which possibly reflects a higher damage frequency and/or less efficient repair at guanine bases in the non-transcribed strand. Substitutions at thymines in the non-transcribed strand (25/38) were twice as common as substitutions at thymines in the transcribed strand (13/38). However, this is predicted in a random distribution, because mutable thymines are approximately twice as common in the non-transcribed strand as in the transcribed strand. Overall, the frequencies of the different base substitutions in the present data set were very similar to previous results from non-smoking donors (Table VII).

Table VIII. Frameshifts and deletions in the hprt coding sequence

Mutant ID | cDNA positiona | Exon | Base changes | Target sequenceb | Remarksc
K-34-5 | 144 | 3 | +A | GAACG (A) TCTTGCT | IR, CTY
V-143-2 | 196-197 | 3 | +T | CCCTC (T) TGTGTGC | (CT)2, (TG)3, CTY
K-26-20 | 223-225 | 3 | +T | AATTC (T) TTTGCTG | (T)4, CTY
V-141-2 | 294-297 | 3 | –T | GTAGA T TTTAT | (T)4
K-26-17 | 318 | 3 | –T | TATTG T gtgagta | (TG)3
V-135-11 | 368 | 4 | –C | ATCTCT C AACTT | (TC)3, CTY
K-28-5 | 503-506 | 7 | –C | AAGGA C CCCAC | (C)4
V-126-6 | 536-537 | 8 | –T | tttagTTG T TGGAT | (TG)2
V-127-12 | 552 | 8 | –A | ATTCC A GACAA | Palindrome
V-147-5 | 589 | 8 | –G | ATAAT G AATAC | (AAT)2, DCS
K-33-2 | 595-596 | 8 | –T | AATAC T TCAGG | (T)2, CTY
V-127-10 (e) | 610 | 9 | –C | tttatag C ATGTTT | AT-repeat
K-36-9 | 617-619 | 9 | –2 bp | GTTT GT GTCATT | (TG/GT)2
K-37-3 (d) | 29-32 | 2 | –19 bp | cagA TTA...AGG TTAT | TTA-repeat
V-151-1 (d) | 45 | 2 | –52 bp | AACC AGGTT...TTG GAA | IR, Palindrome
K-34-7 | 45-46 | 2 | –5 bp | AACC AGGTT ATGAC | Palindrome, TGA
V-141-12 (d) | 44-46 | 2 | –35 bp | GAAC CAG...AAT CATT | CA-repeat
V-145-6 | 48-50 | 2 | –8 bp | CAGG TTATGACC TTGA | TT-repeat, CTY, TGA
K-28-7 | 49-51 | 2 | –22 bp | AGGT TAT...GCA TACC | TA repeat
V-145-14 | 80-82 | 2 | –3 bp | AATC ATT ATGC | AT-repeat, Palindrome
V-132-2 | 299-300 | 3 | –6 bp | TTTA TCAGAC TGAAGA | Palindrome, DCS
V-136-5 (e) | 400 | 5 | –3 bp | TGTG GAA gtaa | (G)2, (TG)2
V-131-1 | 405, 421 | 6 | –16 bp, +G | aaGA TAT...GGC (G)AAA | GA-repeat
K-29-1 | 496-497 | 7 | –3 bp | GGTG AAA AGGA | (A)4, TGAA
V-138-2 | 546 | 8 | –4 bp | TTGA AATT CCAG | Palindrome, TGAA
K-30-10 | 561 | 8 | –26 bp | GTT TGT...ATA ATGAATA | TG-repeat, DCS

aThe position of the first altered base and any ambiguity with regard to this position is indicated. Base 1 is the A of the initiation codon.
bThe altered base and sequence (in bold) of the non-transcribed strand is spaced according to the most 5′ position. In deletions longer than 8 bp, the first and last few bases are separated by dots. Coding bases are in upper case, intron bases in lower case letters. Underlined sequence refers to the first remark in the last column.
cIR refers to inverted repeat. CTY refers to the vertebrate topoisomerase I consensus cleavage site CTT or CTC. DCS refers to the deletion consensus sequence TG A/G A/G G/T A/C, as described in (3). In addition, TGA(A) motifs within two positions from the altered base are indicated.
dSequenced in genomic DNA and reported in (22). K-28-7 contains a double deletion, only the intra-exonic one is shown here.
eVerified by sequencing of genomic DNA.

Frameshifts and small deletions were identified in 26 mutants (Table VIII). Deletions of one or several bp at the 5′ or 3′ end of exons were observed in three mutants (K-26-17, V-127-10 and V-136-5), with no evidence of associated splicing error.
All of these alterations tend to increase the consensus value of the corresponding splice site by changing the last or first exon base into a G or an A. Among the ±1 frameshift mutations, deletion of a single base pair was three times as common as insertion of a single base pair. Several of the frameshift mutations were identical to mutations in the hprt mutation database (16), e.g. at positions 368, 503 and 617. The latter mutation, as well as the insertion at position 196/197, where a –1 frameshift and a 3 bp deletion have been reported earlier, coincide with base substitution hotspots. The 1 bp insertion at position 144 follows base no. 143, where three independent base substitutions were recorded in the present data set (Figure 1). At positions 294 and 503, where –1 frameshifts were observed in the present work (Table VIII), +1 frameshifts have also been reported. Position 536, which is the site of a –1 frameshift in the present data set, is included in several small deletions in the database (16). In total, eight of the 13 frameshifts, including the 2 bp deletion, were identical to or occurred at positions where frameshifts or small deletions have been reported previously. In contrast, no frameshift mutation was observed at position 207, the first in a run of GC base pairs where 7 in vivo and 15 in vitro ±1 frameshift mutations are recorded in the database (16).

Almost 50% (6/13) of the intra-exonic small deletions (≥3 bp) had at least one breakpoint in the previously defined deletion hotspot region in the 5′ end of exon 2 (positions 40–51). This work adds two new mutations to this region (K-34-7 and V-145-6); four have been reported previously (22, see Table VIII). The 3 bp deletion at position 80 in exon 2 (clone V-145-14) is identical to a mutation reported in the hprt mutation database (16) and its 5′ breakpoint coincides with another 15 bp deletion in the database.
The breakpoints of the –16 bp/+G compound mutation (V-131-1) coincide with three previously described ±1 frameshift mutations, and the small deletions at positions 496, 546 and 561 occurred at positions where –1 frameshifts have been reported in the database (16). These observations suggest that frameshift and small deletion mutations are non-randomly distributed, and may tend to cluster in the same regions of hprt DNA.

Most of the frameshift mutations and deletions occurred in a sequence context of short nucleotide repeats, or such repeats were created by the mutation. Some of these structural features are indicated in Table VIII, and further discussed below. The base composition of the sequences in which deletions and frameshifts occurred showed an excess of AT base pairs in comparison with the whole hprt coding region. The AT content of the hprt coding sequence is 59%. The mean AT content of the three bases flanking the frameshifts and deletions (first 3 bp deleted + 3 bp downstream of the deletion according to Table VIII, total 3 × 2 × 26 = 156 bp) is 66%. Two thirds (9/13) of the frameshifts and 77% (20/26) of the deletion ends occurred at an AT bp. In the 3 bp sequences flanking the deletions, 32% (25/78) of the bases in the non-transcribed strand were thymines and 40% (31/78) were adenines. In contrast, the corresponding sequences flanking the frameshift mutations were significantly different; only 21% (16/78) were adenines and 40% (31/78) were thymines.

The sequence context of the frameshift mutations seemed to differ from that of the small deletions in some other respects as well. Many (8/12) frameshifts occurred in a non-transcribed sequence of 4–7 pyrimidines or thymines, and thymine was the deleted or inserted base in half of these mutations. The only G-deletion (V-147-5) was not in a polypyrimidine run, but between AAT-repeats.
These features were not prominent among the small deletions; only two of them occurred in a run of four pyrimidines, none of which contained more than two thymines (Table VIII). The CTT/C trinucleotide motif, which is a consensus cleavage sequence for vertebrate topoisomerase I, was found within 2 bp from the altered base in five of the 13 frameshift mutations, but only in one of the small deletion mutations. Moreover, TGTG motifs occurred more often at or close to frameshift mutations (4/13) than deletions (1/13). In contrast, TGA(A) motifs were more common at or near deletion breakpoints (6/13) than at frameshift positions (1/13). Thus, it seems that the local sequence context of frameshift mutations may be different from that of small intra-exonic deletions, in spite of their tendency to occur in the same regions of the hprt coding sequence.

Discussion

The analysis of mutational spectra in mammalian cells has the potential to reveal the mechanisms of mutagenesis and to provide clues to the identification of human somatic and germ line mutagens (29,30). One important step in this research is the study of in vivo mutation in well characterized human populations. The observable background spectrum of in vivo mutation in healthy people is influenced by a number of factors, such as (i) the selection system used to isolate the mutant cells, (ii) the rates of different kinds of spontaneous and induced mutations and (iii) host factors modifying the effect of environmental mutagen exposures.

The selection system restricts the genetic target for mutagenesis, i.e. the types of mutation that will be observed and the distribution of mutable sites. The vast majority of TG-selected T-cells have a severely reduced or undetectable HPRT-enzyme activity (31). Few mutants with residual HPRT-enzyme activity are likely to survive TG-selection and contribute to the mutational spectrum, which therefore is biased toward null mutations. Thus, hprt mutations which cause deletions, defective RNA splicing, frameshifts and stop codons are expected to give rise to a selectable phenotype, regardless of their position in the gene (nonsense mutations have been reported at position 648, close to the end of the coding sequence). However, not all missense mutations will do so.

Fig. 2. Alignment of functional regions and positions for missense mutations in the hprt protein. The small bars at the bottom of the top graph show all positions where amino acid substitutions are known to occur, as predicted from all sites where single base substitutions have been recorded in the human hprt mutation database (16). The tall bars represent those sites where two or more missense mutations were recorded in the present data set (data from Table III). Asp43 corresponds to hprt-cDNA position 131, Leu48 to hotspot position 146, Cys65 to hotspot position 197, Gly69 to cDNA positions 208–209, Gly139 to positions 418–419, Pro154 to positions 463–464, Asp176 to positions 529–530 and Cys205 to hotspot position 617. The lower graph represents the functional domains as described in (9).

Since the observed mutation spectrum is filtered by phenotypic selection, the distribution of amino acid substitutions (as predicted from the distribution of missense mutations) is expected to reflect functionally important regions of the protein. The crystal structure of the human HPRT-protein indicates several functional domains, including amino acid residues in the catalytic and substrate binding sites (9). Amino acid substitutions that give rise to a TG-selectable or LNS-phenotype have been observed in as many as 70% (152/217) of the residues in the HPRT-protein monomer (Figure 2).
They seem to distribute in the functional domains, such as the GMP-binding sites (GBS), but also in regions where no particular function has been allocated. The amino acid substitutions corresponding to the most frequent missense mutations in the present data set are indicated in Figure 2. None of these amino acids has been associated with a specific protein function, except for Gly139, which is part of the phosphoribosepyrophosphate-binding motif (PRPP) (9). The central part of the protein (amino acid residues 82–128, corresponding to the 3′ half of exon 3 and exon 4) shows a relatively low density of mutations. This part includes the so-called flexible loop domain (9), and it is likely that many amino acid substitutions in this region do not give rise to a selectable phenotype. With the exception of this region, substitutions of most of the amino acids in the highly conserved HPRT-monomer seem to produce a non-functional protein, as defined by TG-selection. This suggests that the spectrum of missense mutation that is observed at the hprt locus is not much restricted by phenotypic selection, and has the potential of truly reflecting a broad range of molecular events involved in spontaneous as well as induced mutations.

The human hprt mutation database (16, release 6, August 1996) contains information on 1166 simple base substitutions causing 910 missense and 256 nonsense mutations at 281 different bp positions in the coding region. The mutations derive from a variety of TG-selected human cell lines and from patients with LNS and gout. The phenotype in LNS is characterized by severely reduced HPRT-enzyme activity, like the TG-selected cells, whereas gout patients often show residual enzyme activity (reviewed in 15). Consequently, there is a considerable overlap between the mutations in LNS and TG-selected cells, whereas some mutations are unique for gout (32).
Specifically, there are nine bp positions (46, 155, 157, 160, 232, 239, 329, 396 and 472) for which only gout mutations have been reported in the database (16), and where mutations may not lead to a TG-selectable phenotype. Excluding these, there are 272 bp in the hprt coding region that are known to be able to mutate to a TG-selectable or LNS-phenotype by a simple base substitution, which corresponds to almost 50% (272/570) of all the base pairs in the coding region of the hprt gene that theoretically could produce an amino acid exchange by a single base substitution. However, the spectrum with regard to base substitutions is not yet saturated. This work adds seven new missense mutations and one new nonsense mutation. The estimate that >300 mutable sites will eventually be identified in the hprt coding region (19) seems reasonable, and we therefore used this number in our statistical calculations (see Materials and methods).

In the present work, mutations were identified in all 124 mutants from which RT-PCR product could be obtained for sequence analysis. This is approximately half the number of mutant clones that were classified as having point mutations in the first screening analysis shortly after collection (23). The difficulty in obtaining high quality RT-PCR products two years later is probably related to the long storage time and the shortage of material in some of the clones. It cannot be excluded that this has caused some bias in the spectrum of mutations, but there is no evidence that this is the case. On the contrary, the similarity between the present and previous results strongly indicates that these spectra are representative of the true in vivo spectrum of hprt mutation in T-cells of non-smoking adults.
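As a rough numerical illustration of the hotspot criterion used in the Results, the following Python sketch assumes that the 87 simple base substitutions fall independently on about 300 equally mutable sites, so that the count at any one site is approximately Poisson with mean 87/300, and that the Bonferroni correction multiplies the single-site tail probability by the number of sites. Whether this reproduces the calculation described in Materials and methods exactly is an assumption, and the function names are hypothetical.

```python
import math

N_SUBSTITUTIONS = 87   # simple base substitutions in the present data set
N_SITES = 300          # assumed number of mutable sites (estimate cited from ref. 19)


def poisson_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


def bonferroni_p(k: int, n_mut: int = N_SUBSTITUTIONS, n_sites: int = N_SITES) -> float:
    """Bonferroni-corrected probability that one given site shows >= k mutations.

    Assumption: every site is equally mutable, so the per-site mean is n_mut / n_sites.
    """
    lam = n_mut / n_sites
    return min(1.0, n_sites * poisson_tail(k, lam))


if __name__ == "__main__":
    for k in (3, 4, 5, 6):
        print(f"{k} or more mutations at one site: corrected P = {bonferroni_p(k):.4f}")
```

Under these assumptions, five or more mutations at a single site fall below the corrected 0.05 level, whereas four mutations fall just above it, which is in line with position 131 being described only as a tentative hotspot.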
Approximately half of the mutations in the human hprt database (16) are derived from a coherent study of a large and well characterized study population (14), similar to the one in this work, whereas the other half is composed of results from many smaller studies, in which the donor status and mutant collection were not always well characterized. Nevertheless, the hotspot positions (14) and frequency distribution of mutations in these data sets are very similar (Table VII).

The overall distribution of mutations per mutable site was found to deviate significantly from a random Poisson distribution (Figure 1). The main contribution to the significant chi-square value was the over-representation of sites showing three or more base pair substitutions. Burkhart-Schultz et al. (14) identified three hotspots at positions 197, 508 and 617 in a mixed population of smokers and non-smokers, but did not find statistically significant differences between the mutation spectra of the two subpopulations. Our present data confirm the existence of these hotspots in a quite distant and unrelated non-smoking study population. Moreover, our results identify another independent hotspot at position 146, and a possible hotspot at position 131.

The mutated guanine base at position 197 is the first G in a (TG)3-repeat, which is preceded by a CTC-trinucleotide conforming to the vertebrate topoisomerase I consensus cleavage site (CTC or CTT), which has been implicated in frameshift and small deletion mutations (see below). Both G > A and G > T substitutions were found to occur at this site. The mutated thymine at position 146 is the second base of a CTT-trinucleotide. This hotspot is flanked by two other frequently mutated sites: 143, which is a CpG-site spaced by one base from the CTT-trinucleotide, and 151, which is part of a CTC-trinucleotide. Altogether seven of the 55 mutated sites occurred in or close to a CTT or CTC trinucleotide.
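The proximity of an altered base to a CTT or CTC trinucleotide, as noted above and in the remarks of Table VIII, can be checked with a simple scan. The sketch below is only an illustration: the function name `motif_within` is hypothetical, only the non-transcribed strand is scanned, and "within 2 bp" is interpreted here as a gap of at most two positions between the altered base and the nearest base of the motif, which is an assumption about how the criterion was applied. The example sequences are the target sequences given in Table VIII for positions 196–197 (without the inserted T) and 496–497.

```python
def motif_within(seq: str, index: int, motifs=("CTT", "CTC"), window: int = 2) -> bool:
    """Return True if any motif overlaps seq[index] or lies within `window` bases of it.

    `index` is a 0-based position in `seq`. The distance is counted from the
    altered base to the nearest base of the motif (0 if the base is inside it).
    """
    seq = seq.upper()
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            end = start + len(motif) - 1
            if index < start:
                gap = start - index
            elif index > end:
                gap = index - end
            else:
                gap = 0
            if gap <= window:
                return True
            start = seq.find(motif, start + 1)
    return False


# Non-transcribed strand around cDNA position 197 (Table VIII, V-143-2, inserted T omitted)
context_197 = "CCCTCTGTGTGC"
g197 = context_197.index("G")   # first G of the (TG)3 repeat

# Sequence around the 3 bp deletion at positions 496-497 (Table VIII, K-29-1)
context_496 = "GGTGAAAAGGA"

print(motif_within(context_197, g197))   # the CTC ends two positions upstream of this G
print(motif_within(context_496, 4))      # no CTT or CTC in this stretch
```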
Position 131 is in a sequence resembling a ‘consensus deletion sequence’ denoted by Cooper and Krawczak (3), which was also observed close to eight other mutated sites. Position 617 involves a (TG)2-repeat, and position 508 is a CpG-site. Twelve of the 55 mutated sites were located within 2 bp from a frameshift mutation or deletion breakpoint, which may indicate that some structural features are common to these types of mutations, as suggested by Rodriguez and Loechler (33).

Cariello and Skopek (19) identified five tentative hairpin structures in the hprt DNA, three in exon 3 and two in exon 8, which together contain 83 base pairs. Although only 24% (68/281) of the mutable sites are located in these structures, they contained 38% (33/87) of the base substitutions in the present data set, including 11 hotspot mutations at positions 146 and 197. This indicates that these palindromes may promote mutagenesis, or it may simply reflect the functional importance of this part of exon 3. There is a need for more mutations and proper statistical methods for the analysis of the possible contribution of these and other structural features to the observed spectrum of in vivo mutation.

With the possible exception of the two tandem mutations, the present spectrum of base substitutions does not highlight any particular kind of exogenous factor involved in the causation of these mutations. Both tandem mutations (CC > TT at positions 112–113 and GG > AA at positions 399–400) were of the kind that has been associated with UV-induced mutagenesis in skin tumors (34), and with mutations induced by reactive oxygen species in vitro (35). On the other hand, positions 399–400 overlap a 3 bp deletion mutation adjacent to a (TG)2 repeat (Table VIII), and positions 112–113 coincide with a CTC-trinucleotide (see above). This suggests that other factors may contribute to sequence instability at these sites as well.
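The enrichment of base substitutions in the tentative hairpin structures described above can be gauged with an exact one-sided binomial test. This is a sketch under a strong simplifying assumption that the text does not make, namely that all 281 mutable sites are equally likely to be hit; `binomial_tail` is a hypothetical helper name. The same helper is applied to the CpG comparison from the Results (eight mutations observed at seven of the 281 mutable positions).

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_SUBSTITUTIONS = 87     # simple base substitutions in the present data set
N_MUTABLE_SITES = 281    # mutable sites recorded in the database

# Hairpin structures: 68 of 281 mutable sites, 33 of 87 substitutions observed
p_site_hairpin = 68 / N_MUTABLE_SITES
expected_hairpin = N_SUBSTITUTIONS * p_site_hairpin
p_hairpin = binomial_tail(33, N_SUBSTITUTIONS, p_site_hairpin)

# CpG sites: 7 of 281 mutable positions, 8 of 87 substitutions observed
p_site_cpg = 7 / N_MUTABLE_SITES
expected_cpg = N_SUBSTITUTIONS * p_site_cpg
p_cpg = binomial_tail(8, N_SUBSTITUTIONS, p_site_cpg)

print(f"hairpins:  expected {expected_hairpin:.1f}, observed 33, one-sided P = {p_hairpin:.4f}")
print(f"CpG sites: expected {expected_cpg:.1f}, observed 8, one-sided P = {p_cpg:.4f}")
```

Because hotspots, phenotypic selection and local sequence context all violate the equal-probability assumption, such a P value is at most a rough guide, which is consistent with the call in the text for more data and more appropriate statistical methods.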
Two different mutations at the same site were observed in seven positions, all but one at GC base pairs (Table VI). This indicates that different mechanisms operate at these sites, either as a result of different types of DNA damage or sequence instability. In general, transitions were slightly more frequent than transversions (Table VII), as is the case for human germ line mutations causing genetic diseases, where transitions also dominate (3). Mutations at GC bp, especially with G in the non-transcribed strand, were more common than mutations at AT bp, in spite of the fact that more mutable sites exist at AT bp in the hprt coding sequence. This preference for mutagenesis at GC bp is mainly caused by the predominance of GC-mutations among the few hotspots and frequently mutated sites; the base composition among all the 55 mutated sites in the present data set (46% GC) is not much different from that of the entire hprt coding region. Therefore, it is not clear whether the preference for mutagenesis at GC bp is due to an increased rate of (spontaneous or induced) modifications at G-bases (possibly combined with a slower repair of G's in the non-transcribed strand), or simply related to the local sequence context around the hotspot positions. It is interesting to note, however, that while base substitution mutations showed a preference for GC bp, the frameshift and deletion mutations showed a preference for AT bp.

Among human germ line mutations, the predominance of CG > TA mutations has been attributed to the deamination of 5-methylcytosine at CpG-sites (3). It is not known which of the eight CpG-sites are methylated in the hprt coding region, but mutations have been reported at all seven mutable positions in four CpG-sites, 142/143, 151/152, 481 and 508/509. In the present data set, three C > T transitions were found at site 508.
This in vivo mutation hotspot (see above) is the third most frequently mutated position in the human hprt mutation database (16). All mutations are C > T, creating a TGA stop codon (Table VI). Two C > T transitions were found at position 151, the sixth most frequently mutated site in the database. Also at this position the mutation gives rise to a stop codon, and all known mutations are C > T transitions. Taken together, these results strongly suggest that the cytosines at positions 508 and 151 are methylated, and that spontaneous deamination of 5-methylcytosine is a likely mechanism for the frequent occurrence of mutation at these sites. Interestingly, G > A transitions are much less often observed at these sites compared to C > T transitions, suggesting a pronounced strand bias for 5-methylcytosine mediated mutagenesis in the hprt gene (36). In contrast, it seems unlikely that 5-methylcytosine deamination is involved in mutagenesis at the CpG-site at position 142/143. Three mutations occurred at this site, two G > A and one G > T (Table III). The human mutation database (16) contains seven mutations at position 142/143, four of them being G > A, but no C > T mutation has been reported to occur at this CpG-site.

We have recently characterized a deletion hotspot in the 5′ end of hprt exon 2 (22). Almost half (6/13) of the deletion mutations in the present data set had one or both breakpoints within a 9-base pair palindromic structure (positions 41–49), flanked by a number of TGA-repeats. In addition to the many small deletion breakpoints in this region, there are also four independent ±1 frameshift mutations in the human hprt mutation database (16), suggesting that it may be a hotspot region for frameshift mutation as well. Previous analyses of somatic mutation in mammalian cells (37) and human germ line mutations (3) have suggested that the mononucleotide composition and DNA sequence context surrounding deletions are different from those of bulk DNA.
The present deletion mutations seemed to occur in sequences with a possible excess of AT base pairs and a bias in the distribution of A and T bases between the transcribed and non-transcribed DNA strands. Several more or less specific sequences have been associated with frameshifts and small deletions in human genes (discussed in 3). The most frequently recurring sequence features appear to be short di- or trinucleotide repeats, inverted repeats, runs of 4–5 pyrimidines, the so-called deletion consensus sequence, which resembles certain polymerase arrest sites, symmetric elements, and the consensus cleavage sequence for vertebrate topoisomerase I (CTT/C). The latter sequence motif was found to be associated with the breakpoints of large somatic hprt deletions in human cells (38). As shown in Table VIII and indicated in the Results section, several of the present frameshift and deletion mutations seem to be associated with one or more of these characteristic sequence features. But there are also important differences between the frameshift and deletion mutations with regard to these sequence characteristics, suggesting that the mechanisms involved are different. Overall, the structural features of these mutations conform with earlier studies of spontaneous gene deletions in mammalian cell lines in vitro (reviewed in 39) and human germ line mutations (reviewed in 3). The further evaluation of the biological significance of these sequence motifs and their relation to mechanisms of base substitution, frameshift and deletion mutagenesis requires larger data sets and improved statistical methods allowing the analysis and comparison of variations in the local sequence context in regions where mutations occur and in other parts of the gene.

The consistency of the present mutational spectrum with that of Burkhart-Schultz et al.
(14) and the human hprt mutation database (16) provides strong evidence that these mutations truly reflect the background spectrum of hprt mutation in T-cells of healthy, non-smoking adults. However, it is also obvious that any mutation spectrum, being composed of one or a few mutations from each of a large number of individuals, conceals a considerable heterogeneity with regard to individual differences in mutation rates, metabolism and life-style related exposures. So far, the analysis of mutation spectra in smokers and non-smokers has not provided evidence of significant differences (14,40), which may be due to the limited size of and variability within the study populations. In addition to smoking, age and some occupational and therapeutic exposures (reviewed in 6), and possibly genetic variations in the metabolism of xenobiotics (41), have been associated with increased hprt mutation frequencies. Further studies are needed to elucidate the possible influence of these and other host and life-style related factors on the background spectrum of hprt mutation in T-cells of healthy people. The presently compiled database will be a useful basis for such studies and for future comparisons with hprt mutations in people exposed to environmental mutagens and carcinogens, as well as with mutation spectra in tumors and inherited genetic diseases.

Acknowledgements

Andrej Podlutsky and Anne-May Österholm contributed equally to this work. Andrej Podlutsky was supported by stipends from the European Science Foundation (ESF) and the Royal Academy of Sciences (KVA). Andreas Hofmaier is a recipient of a EUCAHM-fellowship on leave from the School for Medical Documentation at the University of Ulm. We are grateful to Dr William Thilly and Dr Aoy Tomita, MIT, Cambridge, USA, and Dr David Lovell, BIBRA, UK, for statistical advice.
Financial contributions were received from the Swedish Cancer Society (1179-B96–1XAB), The Swedish Work Environmental Fund, The Swedish Environment Protection Board and Swedish Match AB. Part of this work was conducted within the framework of the EU-BioMed 2 Project on Occupational and Environmental Mutagenesis: Validation and Application of the HPRT in vivo Mutation Assay for Risk Assessment in Humans (EUCAHM, BMH4 CT96 0120, http://www.ulst.ac.uk./faculty/science/EUCAHM).

References

1. Albertini,R.J., Nicklas,J.A., O'Neill,J.P. and Robison,S.H. (1990) In vivo somatic mutations in humans: Measurement and analysis. Annu. Rev. Genet., 24, 305–326.
2. Albertini,R.J. (1994) Why use somatic mutations for human biomonitoring. Env. Molec. Mutag., 23/S24, 18–22.
3. Cooper,D.N. and Krawczak,M. (1993) Human Gene Mutation. Bios Scientific Publishers, Oxford, UK.
4. Greenblatt,M.S., Bennett,W.P., Hollstein,M. and Harris,C.C. (1994) Mutations in the p53 tumor suppressor gene: clues to cancer etiology and molecular pathogenesis. Cancer Res., 54, 4855–4878.
5. Krawczak,M., Smith-Sørensen,B., Schmidtke,J., Kakkar,V.V., Cooper,D.N. and Hovig,E. (1995) Somatic spectrum of cancer-associated single basepair substitutions in the TP53 gene is determined mainly by endogenous mechanisms of mutation and by selection. Human Mutation, 5, 48–57.
6. Cole,J. and Skopek,T. (1994) Somatic mutant frequency, mutation rates and mutational spectra in the human population in vivo. Mutat. Res., 304, 33–106.
7. Albertini,R.J., Castle,K.L. and Borcherding,W.R. (1982) T-cell cloning to detect the mutant 6-thioguanine resistant lymphocytes present in human peripheral blood. Proc. Natl Acad. Sci. USA, 79, 6617–6621.
8. Morley,A.A., Trainor,K.J., Seshadri,R. and Ryall,R.B. (1983) Measurement of in vivo mutations in human lymphocytes. Nature, 302, 155–156.
9. Eads,J.C., Scapin,G., Xu,Y., Grubmeyer,C. and Sacchettini,J.C.
(1994) The crystal structure of human hypoxanthine-guanine phosphoribosyltransferase with bound GMP. Cell, 78, 325–334.
10. Edwards,A., Voss,H., Rice,P., Civitello,A., Stegemann,J., Schwager,C., Zimmermann,J., Erfle,H. and Caskey,C.T. (1990) Automated DNA sequencing of the human hprt-locus. Genomics, 6, 593–608.
11. Andersson,B., Fält,S. and Lambert,B. (1992) Strand specificity for mutations induced by (+)-anti BPDE in the hprt gene in human T-lymphocytes. Mutat. Res., 269, 129–140.
12. McGregor,W.G., Maher,V.M. and McCormick,J.J. (1994) Kinds and location of mutations induced in the hypoxanthine-guanine phosphoribosyltransferase gene of human T-lymphocytes by 1-nitrosopyrene, including those caused by V(D)J recombinase. Cancer Res., 54, 4207–4213.
13. Bastlova,T. and Podlutsky,A. (1996) Molecular analysis of styrene oxide-induced hprt mutation in human T-lymphocytes. Mutagenesis, 11, 581–591.
14. Burkhart-Schultz,K., Thompson,C.L. and Jones,I.M. (1996) Spectrum of somatic mutation at the hypoxanthine phosphoribosyltransferase (hprt) gene of healthy people. Carcinogenesis, 17, 1871–1883.
15. Sculley,D.G., Dawson,P.A., Emmerson,B.T. and Gordon,R.B. (1992) A review of the molecular basis of hypoxanthine-guanine phosphoribosyltransferase (HPRT) deficiency. Hum. Genet., 90, 195–207.
16. Cariello,N.F. (1994) Software for the analysis of mutations at the human hprt gene. Mutat. Res., 312, 173–185. (Data base release no. six, August 1996.)
17. Cariello,N. (1996) Relational data-base model for DNA mutations and software program for implementation of the model. Mutat. Res., 359, 103–117.
18. Cariello,N.F., Douglas,G.R., Dycaico,M.J., Gorelick,N.J., Provost,G.S. and Soussi,T. (1997) Data-bases and software for the analysis of mutations in the human p53 gene, the human hprt gene and both the lacI and lacZ gene in transgenic rodents. Nucl. Acids Res., 25, 136–137.
19. Cariello,N.F. and Skopek,T.R. (1993) Analysis of mutations occurring at the human hprt locus. J. Mol.
Biol., 231, 41–57.
20. Hou,S.-M., Lambert,B. and Hemminki,K. (1995a) Relationship between hprt mutant frequency, aromatic DNA adducts and genotypes for GSTM1 and NAT2 in bus maintenance workers. Carcinogenesis, 16, 1913–1917.
21. Hou,S.-M., Fält,S. and Steen,A.-M. (1995b) HPRT mutant frequency and GSTM1 genotype in non-smoking healthy individuals. Env. Molec. Mutag., 25, 97–105.
22. Österholm,A.-M., Bastlova,T., Meijer,A., Podlutsky,A., Zanesi,N. and Hou,S.-M. (1996) Sequence analysis of deletion mutations at the HPRT locus of human T-lymphocytes: association of a palindromic structure with a breakpoint cluster in exon 2. Mutagenesis, 11, 511–517.
23. Österholm,A.-M., Fält,S., Lambert,B. and Hou,S.-M. (1995) Classification of mutations at the human hprt-locus in T-lymphocytes of bus maintenance workers by multiplex-PCR and reverse transcriptase-PCR analysis. Carcinogenesis, 16, 1909–1912.
24. Yang,J.L., Maher,V.M. and McCormick,J.J. (1989) Amplification and direct nucleotide sequencing of cDNA from the lysate of low numbers of diploid human cells. Gene, 83, 347–354.
25. de Boer,J.G., Curry,J.D. and Glickman,B.W. (1993) A fast and simple method to determine the clonal relationship among human T-cell lymphocytes. Mutat. Res., 288, 173–180.
26. Bourguin,A., Tung,R., Galili,N. and Sklar,J. (1990) Rapid, nonradioactive detection of clonal T-cell receptor gene rearrangements in lymphoid neoplasms. Proc. Natl Acad. Sci. USA, 87, 8536–8540.
27. Snedecor,G.W. and Cochran,W.G. (1967) Sampling from the binomial distribution. In Snedecor,G.W. and Cochran,W.G. Statistical Methods, 6th Edn. The Iowa State University Press, IA, pp. 223–227.
28. Adams,W.T. and Skopek,T.R. (1987) Statistical tests for the comparison of samples from mutational spectra. J. Mol. Biol., 194, 391–396.
29. Keohavong,P. and Thilly,W.G. (1992) Mutational spectrometry: A general approach for hot-spot mutations in selectable genes. Proc. Natl Acad. Sci. USA, 89, 4623–4627.
30. Kat,A.G. and Thilly,W.G.
(1994) Mutational spectra of endogenous genes in mammalian cells. In Hemminki,K. et al. DNA Adducts: Identification and Biological Significance, IARC Scientific Publications No. 125. International Agency for Research on Cancer, Lyon, France, pp. 371–383.
31. Steen,A.-M., Sahlén,S., Hou,S.-M. and Lambert,B. (1993) Hprt-activities and RNA phenotypes in 6-thioguanine resistant human T-lymphocytes. Mutat. Res., 286, 209–215.
32. Lambert,B., Marcus,S., Andersson,B., Hou,S.-M., Steen,A.-M. and Hellgren,D. (1992) Missense mutations and evolutionary conserved amino acids at the human hypoxanthine phosphoribosyl-transferase locus. Pharmacogenetics, 2, 329–336.
33. Rodriguez,H. and Loechler,E.L. (1995) Are base substitution and frameshift mutagenesis pathways interrelated? Mutat. Res., 326, 29–37.
34. Brash,D.E., Rudolph,J.A., Simon,J.A., Lin,A., McKenna,G.J., Baden,H.P., Halperin,A.J. and Pontén,J. (1991) A role for sunlight in skin cancer: UV-induced p53 mutations in squamous cell carcinoma. Proc. Natl Acad. Sci. USA, 88, 10124–10128.
35. Reid,T.M. and Loeb,L.A. (1993) Tandem double CC→TT mutations are produced by reactive oxygen species. Proc. Natl Acad. Sci. USA, 90, 3904–3907.
36. Skandalis,A., Ford,B.N. and Glickman,B.W. (1994) Strand bias in mutation involving 5-methylcytosine deamination in the human hprt gene. Mutat. Res., 314, 21–26.
37. Nalbantoglu,J., Hartley,D., Phear,G., Tear,G. and Meuth,M. (1986) Spontaneous deletion formation at the aprt locus of hamster cells: the presence of short sequence homologies and dyad symmetries at deletion termini. EMBO J., 5, 1199–1204.
38. Monnat,R.,Jr, Hackmann,A.F. and Chiaverotti,T.A. (1992) Nucleotide sequence analysis of human hypoxanthine phosphoribosyltransferase (HPRT) gene deletions. Genomics, 13, 777–787.
39. Meuth,M. (1990) The structure of mutation in mammalian cells. Biochim. Biophys. Acta, 1032, 1–17.
40. Vrieling,H., Thijssen,J.C.P., Rossi,A.M., van Dam,F.J., Natarajan,A.T., Tates,A.D. and van Zeeland,A.A.
(1992) Enhanced hprt mutant frequency but no significant difference in mutation spectrum between a smoking and a non-smoking human population. Carcinogenesis, 13, 1625–1631.
41. Lambert,B., Bastlova,T., Österholm,A.-M. and Hou,S.-M. (1995) Analysis of mutation at the hprt locus in human T-lymphocytes. Toxicol. Lett., 82/83, 323–333.

Received on July 16, 1997; revised on November 7, 1997; accepted on December 10, 1997