Speed of adaptation and genomic signatures in arms race and trench warfare models of hostparasite coevolution Aurélien TELLIER1 Stefany MORENO-GÁMEZ2 Wolfgang STEPHAN2 1 Section of Population Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, 85354 Freising, Germany 2 Section of Evolutionary Biology, LMU University of Munich, LMU BioCenter, Grosshaderner Str. 2, 82152 Planegg-Martinsried, Germany Corresponding author: Aurélien TELLIER Section of Population Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, 85354 Freising, Germany Email: [email protected] Running head: Genomic signatures of coevolution Keywords: genetic drift, selective sweeps, balancing selection, frequency-dependent selection ABSTRACT Host and parasite population genomic data are increasingly used to discover novel major genes underlying coevolution, assuming that natural selection generates two distinguishable polymorphism patterns: selective sweeps and balancing selection. These genomic signatures would result from two coevolutionary dynamics, the “trench warfare” with fast cycles of allele frequencies and the “arms race“ with slow recurrent fixation of alleles. However, based on genome scans for selection, few genes for coevolution have yet been found in hosts. To address this issue, we build a gene-for-gene model with genetic drift, mutation and integrating coalescent simulations to study observable genomic signatures at host and parasite loci. In contrast to the conventional wisdom, we show that coevolutionary cycles are not faster under the trench warfare model compared to the arms race, except for large population sizes and high values of coevolutionary costs. Based on the generated SNP frequencies, the expected balancing selection signature under the trench warfare dynamics appears to be only observable in parasite sequences in a limited range of parameter, if effective population sizes are sufficiently large (>1,000) and if selection has been acting for a long time (> 4N generations). On the other hand, the typical signature of the arms race dynamics, i.e. selective sweeps, can be detected in parasite and to a lesser extent in host populations even if coevolution is recent. We suggest to study signatures of coevolution via population genomics of parasites rather than hosts, and caution against inferring coevolutionary dynamics based on the speed of coevolution. 1 INTRODUCTION Diseases are major agents of natural selection. In both natural and domesticated species, parasites limit plant and animal growth, alter development, and reduce seed and offspring production. There is selection pressure on hosts for resistance to parasites and equally on parasites to overcome host defenses. This confrontation drives coevolution, in which gene frequencies in one species determine the fitness of genotypes of the other species, and should result in genetic diversity for resistance and tolerance in hosts and in infectivity and virulence in parasites. Progress in our molecular understanding of the genetic basis of resistance in hosts (humans, animals, or plants) and infectivity in parasites (bacteria, fungi, viruses) reveal that few major defense genes underlie these traits (1-3). Numerous theoretical analyses of gene-for-gene (GFG) or matchingallele models describe the coevolutionary dynamics at these loci based on the phenotypic outcome of infection determined by host genotype × parasite genotype (G×G) interactions. These models describe the coevolutionary dynamics driven by negative indirect frequency-dependent selection (niFDS): rare alleles exhibit a fitness advantage as selection in the host population depends on allele frequencies in the parasite population, and vice versa (4). Two types of coevolutionary dynamics (5, 6) are predicted: 1) recurrent fixation of alleles and transient polymorphism (the so called arms race dynamics, ref. 7), 2) continuous cycling of allele frequency changes (the trench warfare dynamics, ref. 8). The trench warfare dynamics is expected to exhibit faster coevolutionary cycles than the arms race (5, 9). Studying genomic signatures at host and parasite genes of coevolution has been used to infer the type of coevolutionary dynamics (5). Based on deterministic models of coevolution with infinite host and parasite population sizes, the current paradigm states that the two types of coevolutionary dynamics should result in different observable signatures of polymorphism (5, 6). The arms race dynamics is characterized by recurrent fixation of alleles in host and parasite populations, and is expected to generate recurrent selective sweeps (10) at the genes of coevolution (5, 7). The typical hitch-hiking signatures of selective sweeps include an excess of low and high frequency variants (thus negative Tajima’s D (11) values) and a region of reduced polymorphism around the selected site (10). On the other hand, the trench warfare dynamics (8) is expected to maintain polymorphism at both host and parasite coevolutionary loci, resulting in the occurrence of balancing selection with two or more alleles (5, 8). Typically, this would be observed as high levels of polymorphism and an excess of intermediate frequency variants around the selected site reflected by positive Tajima’s D values (12). With technological advances and the sequencing of numerous plant and animal genomes among and within populations, e.g. in Arabidopsis thaliana, humans, and Drosophila, it becomes feasible to detect host genes under selection which underlie host-parasite coevolution (3, 13-15). However, results from recent studies suggest that few new genes underlying coevolution are detected in hosts based on selective sweeps and balancing selection signatures (for example in 13-15). In fact, most genes underlying coevolution showing a genomic signature of selection had been identified based on a limited set of candidate genes and sequenced individuals (7, 8, 16, 17). The relative lack of additional evidence for genes under coevolution may lie in 1) the difficulty to disentangle the genomic signatures of selection from demographic effects (18) and/or 2) the inadequacy of the theoretical coevolutionary models to predict the observable complex patterns of molecular diversity. Predicting the polymorphism signatures at loci underlying coevolution is also of importance for parasite population genomics studies, as scans for genomic signatures of selection are becoming key methods to discover novel genes with essential function for infection (19-21). A logical link, potentially based on naïve premises, has traditionally been suggested between three levels of study of coevolution: 1) the mathematical behavior of the deterministic coevolutionary model, i.e. the stability of the polymorphic equilibrium point, 2) the occurrence of the arms race or trench warfare dynamics, and 3) the incidence of selective sweeps or balancing selection in population genomic data (5). Most theoretical models of coevolution have thus focused on the ecological and epidemiological conditions for the stability of the polymorphic equilibrium while assuming infinite population sizes (22-24). It has been shown that in such models costs of resistance and infectivity are necessary but not sufficient for the stability of the polymorphic equilibrium point. A mathematical condition, negative direct frequency-dependent selection (ndFDS), is however required for stability of the equilibrium and is promoted by various life-history traits such as parasite polycyclicity (23, 24). Nonetheless, theoretical studies have not investigated so far the parameter conditions and model assumptions under which ndFDS would generate 1) a trench warfare dynamics for a realistic finite 2 population model, and 2) balancing selection signatures in the genome. Understanding the link between mathematical stability, coevolutionary dynamics and genomic signatures is essential for predicting the occurrence of genomic signatures of coevolution in natural populations, but also for future statistical inference of coevolutionary parameters based on host and parasite sequence data. To address this major shortcoming of coevolutionary theory, we construct here realistic models of coevolution in finite size populations and generate polymorphism data at coevolution genes. We focus in the following on the gene-for-gene (GFG) relationship which receives strong support from empirical studies in plants and their parasites (1), as well as in invertebrates (2). We propose a clear distinction between three levels of coevolutionary theory. First, the stability or instability of a model is defined as the mathematical behavior of the deterministic coevolutionary model (in the form of a dynamical system) around the polymorphic equilibrium point. Second, in finite populations of hosts and parasites, the allele frequency dynamics can be defined as arms race dynamics when alleles are recurrently fixed in the populations (or at frequency below a threshold of detection), or as a trench warfare dynamics, when there is maintenance of alleles in the host and parasite populations at observable frequencies. Finally, selective sweep and balancing selection are defined as the genomic signatures of polymorphism to be observed at these genes in a sample of several host and parasite DNA sequences. We aim to demonstrate the connection between these three levels of coevolutionary theory, and to predict the genomic signatures at genes of coevolution. We first quantify the properties of the coevolutionary dynamics by comparing finite population size versions of models without and with ndFDS (23). It appears that the speed of the arms race dynamics, measured as the period of cycles, is not slower than the trench warfare models for a wide range of parameter values. Moreover, stable polymorphism is found to generate sensu stricto trench warfare dynamics only for high costs of resistance and infectivity. We then simulate observable patterns of genetic polymorphism (SNPs) at host and parasite interacting loci, using a coalescent simulator (25). We show that the typical signatures of selective sweeps and balancing selection are only observed in a very small range of parameter values, and that balancing selection is more likely to be detected in parasite populations. RESULTS Arms race versus trench warfare dynamics We compare two GFG models with resistance (RES) and susceptible (res) alleles in the host and infectivity (INF) and non-infectivity (ninf) in the parasite. The deterministic “unstable model” assuming only niFDS has an unstable polymorphic equilibrium point, and is thus expected to generate an arms race dynamics in a finite population (23). Four costs are associated with the simple GFG model. The costs of resistance (u) and infectivity (b) are fitness costs associated with the corresponding allele (where 0 < u, b <1). Hosts can also exhibit a fitness cost of being diseased (denoted by s, with 0 < s <1). Finally, c is the cost to the non-infective parasite of not being able to infect the host (where c ≈ 1 in GFG models). On the other hand, the deterministic “stable model” assumes that the parasites undergo two generations per host generation, which generates ndFDS for some parameter values (23). The mathematical condition of stability of the non-trivial polymorphic equilibrium is expected to give rise to a trench warfare dynamics over a large parameter range. Finite population sizes (N) and mutation (µGFG), as well as the population mutation rate (θGFG = 2NµGFG) are introduced for both models. Genetic drift is caused by binomial sampling based on allele frequencies at each generation. Parasites undergo thus once genetic drift in the unstable model, but twice in the stable model. In the unstable model, the main determinant of the speed of coevolution is the population mutation rate (θGFG), while the costs (u, b, s) play a minor role. The speed of coevolution increases with higher values of θGFG but does not depend on the population size alone (Figure S1). Population size, i.e. genetic drift, determines thus only the fixation of alleles and the time alleles spend near fixation (Figure S2), while the population mutation rate (θGFG) determines the rate at which new mutations arise in the population and thus the incidence of coevolutionary cycles. Importantly, fixation of resistance alleles can occur only for a small population size and an intermediate to high cost of disease, i.e. when drift and selection for resistance are strong (N < 5,000, Figure S2). In the absence of costs (u = b = 0) infective and susceptible alleles are fixed. Following expectations from the deterministic unstable model (23), increasing values of the costs u and s move the equilibrium point 3 away from the boundaries in the phase plan (SI Appendix section 1), which increases the strength of selection for resistance and infective alleles, and the speed of coevolution (Figures S1 and S2). For the stable model, by fixing the population mutation rate θGFG we show that genetic drift per se determines the probability of allele fixation leading to dynamics similar to the arms race in small populations (Figure 1). In the stable model, smaller population sizes, i.e. stronger genetic drift, pulls away the allelic frequencies from the interior equilibrium, thus increasing the fixation probability of alleles even under high mutation rates. Stable long-term polymorphism, defined as a near zero probability of allele fixation, arises only for large population size (N = 10,000) and a limited range of cost values (intermediate to high costs u and b, low to intermediate cost s, Figure 1). In this range of parameter values, natural selection driving allele frequencies towards the stable equilibrium point is stronger than genetic drift. The area of trench warfare dynamics sensu stricto represents thus only a small proportion of the parameter space for which stability of the internal equilibrium occurs in the deterministic model (Figure 1). Genetic drift is thus a key factor determining the transition from the trench warfare to an arms race dynamics under the stable model, in addition to the known effects of the cost values on the stability of the interior equilibrium point (22, 23). We then compare the speed of coevolution between the two models assuming a fixed population mutation rate (θGFG =0.02). With large population size (N = 10,000) the arms race dynamics appears to run more slowly than the trench warfare only for high values of the cost u = b (u, b around 0.2) and intermediate values of the cost of disease (s = φ = 0.95; Figure 2). Noticeably, the difference in speed of coevolution occurs for parameter values (u, b, s) for which fixation of alleles never occurs in the stable model, the so-called trench warfare sensu stricto (compare Figure 1 and 2). For small population size (N = 1,000) the speed of evolution is only slightly faster in the stable model for costs u = b higher than 0.1 (Figure S3). Our results narrow down the generality of the claim that the arms race model is always slower than trench warfare. In fact, for most parameter combinations the speed of coevolution is driven by the population mutation rate (θGFG), whereas the strength of natural selection is higher than that of genetic drift only for high costs u and b, and an intermediate cost of disease (s), where selection drives the speed of coevolution. Genomic signatures of coevolution We generate expected genomic signatures at the host and parasite coevolution loci based on the frequency dynamics of the host and parasite alleles computed under the stable and unstable models with finite population sizes. We use the coalescent simulator msms (25) to generate neutral polymorphism segregating at these loci (parameters and set-up described below and in Table S1, Figures S6-S11). We summarize the expected genomic signature as the Tajima’s D (DT) for a given population mutation parameter θGFG = 0.02 assuming two haploid population sizes: N = 1,000 (Figures S4 and S5) and N = 10,000 (Figures 3 and S6) and various parameter combinations (u = b, s = φ). Balancing selection is not observed under N=1,000 due to the action of genetic drift in the stable model (Figures S4, S5). In both host and parasite unstable models, negative Tajima’s D (DT < -1) is observed for intermediate to high costs u = b and intermediate cost of disease (s), corresponding to the characteristic signature of recurrent selective sweeps (dark blue in Figures S4, S5). As expected, the unstable model exhibits more negative Tajima’s D than the unstable model. However, for small and potentially realistic values of the costs u and b (u, b < 0.05), the site frequency spectrum does not appear different from that expected under neutrality (light blue in Figures S4, S5, DT ≈ 0). This occurs because fixations of alleles occur very rapidly so that and host and parasite populations are monomorphic, i.e. have most of the time one fixed or nearly fixed allele (Figures 1 and S1). As a consequence, the polymorphic neutral SNPs at the ceovolutionary loci do not segregate following a structured coalescent. For small population sizes, N=1,000, the observed site frequency spectrum at coevolutionary loci cannot be distinguished from that of loci under neutral evolution for most of the parameter combinations (especially in host populations). We increase the population size N but keep θGFG constant (Figures 3 and S6). First, for the unstable model, similar values of Tajima’s D and thus signatures of selective sweeps are observed as for N = 1,000 (Figures S4, S5). For the unstable model, genetic drift therefore does not determine the resulting genomic signature. Second, a balancing selection signature, i.e. high Tajima’s D (DT > 1), is observed for intermediate to high values of costs u = b and an intermediate cost of disease s (Figures 3 and S6). Importantly, this signature is only observed in the parasite population but not in host sequences. 4 In the stable model (Figures 3 and S6), balancing selection is observed for parameter combinations for which the trench warfare sensu stricto is observed in Figures 1 and 2. When the costs u and b are small (below 5%) the stable model does not generate a trench warfare dynamics or balancing selection. In fact, as allele fixation takes place for such cost values (Figure 1), the structured coalescent with two alleles in host and parasite occurs only over short periods of time, thus balancing selection is not observed. Importantly, balancing selection is generated only in parasite populations in a limited area of the parameter space (Figure 3), even though the trench warfare dynamics is observed in host and parasite populations when N=10,000 (Figure 1). When comparing Figure 3 and Figure S6, we note that higher Tajima’s D (DT > 1) is observed under balancing selection if the mutation rate is smaller. Finally, the signatures of selection may be observed at different time scales. Selective sweep signatures, i.e. more negative Tajima’s D and a smaller number of segregating sites than expected under neutrality, are observable in host and parasite sequences even if selection is as recent as N haploid host generations (Figure S7). In contrast, balancing selection, i.e. more positive Tajima’s D and a higher number of segregating sites than expected under neutrality, is detectable only if selection has been acting for at least 4N generations (Figure S8). Finally, even if balancing selection is not reflected in the host site frequency spectrum (DT), it can be distinguished from neutrality by a higher amount of segregating sites (Figure S8). We explain these outcomes as follows. The trench warfare dynamics occurs for intermediate to high cost values of u and b, meaning that both host and parasite alleles are maintained in the populations without fixation. However, for such parameter values, the equilibrium frequency of the host resistance (RES) allele is small (in fact ≈ b). The effective population size of the RES type is therefore small and experiences regular bottlenecks due to allele frequency change around the equilibrium value. When generating a coalescent genealogy at the host locus, most coalescent lineages associated with the RES type may be lost due to the low frequency. Gene flow from the res class (via the mutation rate µGFG) which has a higher effective size, into the RES class homogenizes thereafter the genomic signatures between types. As a result, despite the maintenance of both RES and res types in the population, balancing selection (DT > 1) is not observed in the host population. On the contrary, as the equilibrium frequencies for parasite types are both higher than for RES types, bottlenecks due to frequency changes in parasites are less extreme than in hosts. As a result, at least two coalescent lineages for each type (ninf and INF) are maintained over long periods of time, generating typical balancing selection signatures in parasite populations. DISCUSSION A large body of the theoretical literature on coevolution, e.g. for GFG interactions, has focused on studying the stability of the internal polymorphic equilibrium in deterministic infinite size models (22, 23). We show that the stability of the internal equilibrium point in the deterministic model is a sufficient, but not necessary condition to observe the trench warfare coevolutionary dynamics. Moreover, it may be too naïve to draw a direct link between genomic signatures at host and parasite loci and coevolutionary dynamics (5, 24). We study here a simple but widely applicable genetic model of host-parasite coevolution for genes with major phenotypic effects on the outcome of infection. In animal hosts, these genes may be i) upstream or downstream components of the innate immunity system (MHC, Interferons, Toll like receptors in mammals; ref. 3), or targets of parasite effectors (3, 15), or ii) genes involved in RNA silencing pathways for virus resistance (2, 16). In plants, these are found to be i) involved in basal defense and non-host resistance (26), ii) intra-cellular targets of parasite effector molecules (guardee; ref. 27), or iii) R-genes interacting directly or indirectly with parasite effectors (1, 8). By generating expected genomic signatures at these loci, we show that balancing selection may occur only in a limited range of coevolutionary parameter values, and be mainly observable in parasite genomic data if population sizes are large enough (here N > 1,000). We thus expect genome scans to be successful to detect balancing selection in parasite genomes based on the site frequency spectrum (using Tajima’s D or Fay and Wu’s H). In hosts, population genomics studies based on the site frequency spectrum would be unsuccessful to discover genes under balancing selection as those may not deviate from neutrality. A fruitful approach, however, would be to scan for significantly high values of genetic diversity at defense genes, for example using the number of segregating sites and the Hudson-Kreitman-Aguadé (HKA) test. Our results shed thus light on the relative paucity of immunity genes under balancing selection and selective sweeps in host model organisms: Drosophila 5 melanogaster, Arabidopsis thaliana, and humans (3, 14, 15, 17). In addition, we show that selective sweeps can be observed at genes undergoing recent coevolution taking place only since N host generations. Selective sweeps may thus likely be observed at key genes for infection of crop pathogens (20, 21, 28) because selection is recent and coevolutionary dynamics in agriculture most likely follows an unstable model (24), supporting the scans for positive selection to detect genes with novel function (19-21). In contrast, balancing selection observed at a locus would have been acting for a long time and be very strong (12, 29). Our theoretical results support the findings of old balancing selection signatures, for example trans-specific at MHC loci in vertebrates (3) or between copies of a duplicated gene in plants (27). Based on this time scale difference for the detection of selection, population genomic studies would thus inflate the chances to detect arms race compared to trench warfare dynamics, even if these are comparatively common scenarios in natural populations. Population sizes and various fitness costs determine the rate of allele fixation, which occurs especially when the costs of resistance (u) and infectivity (b) are realistically small (< 5%; refs. 30, 31 but see 32). Two different scenarios of allele fixation are of interest. First, fixation of the resistance allele only occurs for small population sizes in both the stable and unstable models. Natural populations are thus expected to comprise higher frequencies of susceptible plants (33). We predict that proliferation of resistance genes in the genome by duplication may arise preferentially in small populations under strong parasite pressure (high disease incidence and prevalence, and high cost of disease). Second, fixation of infectivity occurs for a large range of parameter values in parasite populations, favoring the increase in the number of effectors in parasite genomes following gene duplication (34). By varying the population mutation rate (θGFG) we disentangle in our study the influence of genetic drift and mutation on the maintenance of polymorphism in coevolutionary GFG models (35, 36). Mutations between types (µGFG) prevent allelic fixation and shorten the waiting time for new alleles to arise in the population. Drift has on the other hand different roles in the two models: 1) it influences only the time that an allele spends at fixation (or near fixation) in the unstable model, while 2) it determines the probability of allele fixation in the stable model. As a result, the maintenance of polymorphism in the presence of genetic drift and mutation (36), is a necessary but not sufficient condition to generate observable balancing selection signatures at these genes. In fact all but one coalescent lineage may be lost for a given allelic type (RES or res, INF or ninf) due to allele frequency variation around a small equilibrium value. The cost of the resistance (u) and infectivity (b) alleles determine the value of the equilibrium values (24, 31), and are thus key in coevolutionary models to generate observable non-neutral genomic signatures. In contrast to previous suggestions (5, 9), we find that the arms race dynamics in our GFG model may not be slower compared to trench warfare. Defining the trench warfare dynamics sensu stricto by the absence of allele fixation, fast coevolutionary cycles in the stable model occurs for only for large population sizes and when costs u = b are intermediate to high, and the cost of disease is intermediate (Figures 1 and 2). We suggest that distinguishing between the two coevolutionary dynamics based on measures of host and parasite population fitness at different time points (37) is restricted to this subset of the parameter values and to populations with large sizes. As coevolution in the arms race model becomes faster when increasing the mutational input (θGFG), predicting the speed of the coevolutionary dynamics requires measuring both mutation rates and effective population size in host and parasite populations. Under our conservative model assumptions the classical expected genomic signatures of coevolution, selective sweeps or balancing selection, may often not be observed at coevolutionary loci. Combining several ecological and epidemiological characteristics which individually promote ndFDS (24) may increase stability in a deterministic model, and potentially enhance the likelihood to observe balancing selection. Our models assume for simplicity constant host and parasite population sizes, even though these sizes may vary in time due to random demographic events and density-dependent disease transmission. Such demographic changes are known random processes affecting the genome wide diversity and frequency spectrum, which would decrease the likelihood to detect genes under selection (18) in natural populations or experimental coevolution studies. 6 METHODS GFG models with finite population sizes In the classic GFG model (22) the outcome of infection is determined by one locus in haploid hosts and haploid parasites. Two alleles, resistance (RES) and susceptibility (res), are present at the host locus (similar notations as in 23). The parasite has two alleles: for infectivity (INF) and non-infectivity (ninf). Note that in the plant pathology literature, the infectivity allele is called virulent (avr), and the non-infectivity is called avirulent (AVR; see ref. 23). Infection occurs if the host is susceptible or if the parasite is infective. The first model denoted as the “unstable model” assumes one parasite generation per host generation, and all hosts receive parasite spores at every generation with frequency-dependent disease transmission. Recursion equations for the allelic frequencies and values of the equilibrium points can be formulated based on the host and parasite fitness costs (SI Appendix Section 1, ref. 23). The polymorphic equilibrium point is always unstable (23). The second model denoted as the “stable model” assumes that the parasites undergo two generations per host generation. As a plant grows between the two parasite generations, each new leaf may be infected by a spore produced either on the same plant or on another plant (called auto-infection and allo-infection, respectively, 38). We define ψ as the auto-infection rate. The recursion equations are found in the SI Appendix Section 2 (23). The fitness costs of being diseased after one or two parasite generations are ε and φ, respectively. For simplicity when comparing the stable and unstable models we assume that φ = s, and that non-infective parasites cannot infect resistant hosts (c =1). In this model, the stability of the polymorphic equilibrium point is generated by negative direct frequency-dependent selection (ndFDS; (23); SI Appendix section 2). In the deterministic model, when auto-infection is high (we use thereafter ψ = 0.95), the stability of the equilibrium point occurs over a wide range of parameter values: for 0.02 < s < 0.31, when u = b = 0.02, and for 0.05 < s < 0.55 when u = b = 0.05 (see Figure 3 in ref. 23). Host and parasite populations have sizes NH and NP, respectively, and genetic drift is introduced by binomial sampling based on allele frequencies in each host or parasite generation. Population size is assumed to be constant over time and for simplicity in our simulations N = NH = NP. Mutations between allelic types are introduced from RES to res or INF to ninf alleles and vice versa (35). Backward and forward mutations occur at a symmetric rate µ GFG per generation following a Poisson distribution. Mutation (µ GFG) and population size (N) determine the population mutation rate parameter θGFG (θGFG = 2Nµ GFG) which defines the rate of appearance of new alleles. To disentangle the influence of mutation from that of drift on the dynamics of allele frequencies, we compare characteristics of the coevolutionary dynamics which are relevant for population genetics (see below) for fixed values of the population mutation parameter (θGFG), while varying the population size between small (N = 1,000) or high (N = 10,000) values, and mutation rate µ GFG taking values of 10-4, 10-5 and 10-6. A summary of all parameters can be found in Table S1. Statistical analysis of the dynamics We analyze both models by simulating the allele dynamics over 10,000 host generations using R codes. We measure the percentage of time that host or parasite alleles are fixed. High mutation rates can prevent the fixation of alleles even if the equilibrium is unstable (35, 36). For this reason, we compute the percentage of time that allele frequencies are near fixation or extinction assuming a 5% threshold for allele detection. The speed of coevolution is measured by counting the total number of cycles for both the resistance and infectivity alleles over 10,000 generations. A cycle is defined between two consecutive maximum values of the host (parasite) allele frequencies (including allele fixation). This computation is realized by fitting a smooth spline curve to the trajectories of allele frequency with the “smooth.spline” function from the R package “stats” (smoothing parameter equal to 0.15). Various smoothing parameters (0.05 to 0.5) are preliminarily tested to find the value 0.15 which ensures robustness and accuracy of computations. The initial allele frequencies (a0 and R0 at the start of simulations) may affect the behaviour of the system due to the existence of unstable limit cycles (35). Even if we do not expect to find limit cycles in our models, all statistics are averages over 100 runs with varying initial frequencies sampled from a uniform distribution (a0 and R0 in the interval [0.01, 0.5]). We study the coevolutionary dynamics over a realistic range of parameters with u = b by varying between no costs (0) to high costs (0.3), and the costs of disease s and φ range from low (0.01) to high (0.6) values (Table S1). 7 Generating expected genomic signatures We assume constant population size, and that the INF or RES type is caused by a single SNP located in the center of each locus. Using the above described R scripts, we generate the host and parasite allele frequency dynamics for a given parameter combination assuming that selection acts over 6N haploid host generations. This is referred to as the path of allele frequency. Allele frequency paths are generated assuming an initial allele frequency of R0 = a0 = 0.1. These paths are used as separate inputs for msms. In msms, a neutral coalescent tree is generated backward in time for each locus under the allele frequency path given and the sample size n. We assume a sample size of n = 40 haploid hosts and n = 40 haploid parasites at present for which we obtain sequences. Note that similar results are obtained with larger sample size (n = 200, SI Appendix Section 3). The haploid host and parasite population sizes are NP = NH = N = 1,000 and 10,000, and two mutation rates are defined for the coalescent simulations. First, the coevolution mutation rate µ GFG defines as above the symmetrical mutation rate with values of 10-4, 10-5 and 10-6 at the central SNP determining the types (RES and res, and INF and ninf). Second, a neutral mutation rate, µ neutral, determines the appearance of neutral SNPs within each locus. The neutral population mutation rate is set to θneutral = 20 per locus (with θneutral = 2Nµ neutral). This means that for N = 1,000 and assuming a locus length of 2kb, the mutation rate is µ neutral = 2.5×10-6 (for N = 10,000, µ neutral = 2.5×10-7). This mutation rate is chosen to generate approximately 85 segregating sites per locus under a neutral model without selection (for n = 40), which allows us to be confident in the statistical comparisons to be drawn between loci. A smaller neutral mutation rate would generate smaller numbers of SNPs and thus decrease the statistical power to distinguish between different genomic signatures. Using a set of C++ codes, we compute summary statistics from our samples such as the number of segregating sites (S), the site frequency spectrum and Tajima’s D (DT, 11). For a chosen parameter combination, two levels of stochasticity have to be accounted for: 1) stochasticity among allele frequency paths due to genetic drift and mutation in the finite population size coevolutionary models, and 2) stochasticity of the coalescent process for a given frequency path. Preliminary analyses show that the variability in genomic signatures is higher among frequency paths than among replicates of the coalescent process. Therefore we simulate for each parameter combination, 2,000 host and parasite frequency paths with one coalescent simulation per path. The mean of the distributions of S and DT over the 2,000 simulations is computed to represent the expected polymorphism signature at the locus. Note that our coevolution model can be described as a structured coalescent in each host and parasite populations with the two types (RES and res, INF and ninf) linked by migration of lineages defined by the mutation rate µ GFG. Within the host and parasite populations, allele frequencies vary in time due to the coevolution process, generating a variable size for each allelic type. Preliminary analyses are used to evaluate the simulation conditions (SI Appendix Section 3). The major determinants of observable genomic signatures are the strength of selection, and the time during which selection occurs (29). As our aim is to compare the genomic signatures for different strength of selection (coevolution), the time of selection is fixed for all simulations to 6N haploid host generations. Under a simple model of strong balancing selection with two allelic types at a fixed frequency of 0.5, an excess of intermediate frequency variants in the site frequency spectrum (and high DT values) are only observed when selection occurs for at least 4N generations (SI Appendix Section 3). This corresponds to the necessary time for the structured coalescent to generate distinct SNP frequencies within allelic types. An increase of migration between allelic types due to mutation (µ GFG) or intra-locus recombination (ρ) decreases the difference in neutral allele frequencies between types, and DT converges towards zero as expected under neutrality. Being conservative, we fix the coevolution time to 6N host generations and assume no intra-locus recombination (ρ = 0). ACKNOWLEDGMENTS AT acknowledges support from DFG grant HU 1776/1 to Stephan Hutter and the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr “Synbreed – Synergistic plant and animal breeding” (FKZ: 0315528I). WS was funded by DFG grants HU 1776/1 and STE 325/14. SM was funded by the European Union through the Erasmus Mundus Master Program in Evolutionary Biology. 8 REFERENCES 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. Dodds PN & Rathjen JP (2010) Plant immunity: towards an integrated view of plantpathogen interactions Nat. Rev. Genet. 11, 539-548. Wilfert L & Jiggins FM (2010) Host-parasite coevolution: genetic variation in a virus population and the interaction with a host gene J. Evol. Biol. 23, 1447-1455. Quintana-Murci L & Clark AG (2013) Population genetic tools for dissecting innate immunity in humans Nat. Rev. Immunol. 13, 280-293. Clarke B (1976) in Genetic aspects of host-parasite relationships, Symposium of the British Society for Parasitology, eds. Taylor AER & Muller R (Blackwell Scientific Publications, Oxford, UK), pp. 97-103. Woolhouse MEJ, Webster JP, Domingo E, Charlesworth B, & Levin BR (2002) Biological and biomedical implications of the co-evolution of pathogens and their hosts Nat. Genet. 32, 569-577. Holub EB (2001) The arms race is ancient history in Arabidopsis, the wildflower Nat. Rev. Genet. 2, 516-527. Bergelson J, Kreitman M, Stahl EA, & Tian DC (2001) Evolutionary dynamics of plant R-genes Science 292, 2281-2285. Stahl EA, Dwyer G, Mauricio R, Kreitman M, & Bergelson J (1999) Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis Nature 400, 667671. Ebert D (2008) Host-parasite coevolution: Insights from the Daphnia-parasite model system Curr. Opin. Microbiol. 11, 290-301. Maynard Smith J & Haig J (1974) The hitch-hiking effect of a favourable gene Genet. Res. (Cambridge) 23, 23-35. Tajima F (1989) Statistical-method for testing the neutral mutation hypothesis by DNA polymorphism Genetics 123, 585-595. Charlesworth D (2006) Balancing selection and its effects on sequences in nearby genome regions PLoS Genet. 2, e64. Cao J, Schneeberger K., Ossowski S., Günther T., Bender S., Fitz J., Koenig D., Lanz C., Stegle O., Lippert C., et al. (2011) Whole-genome sequencing of multiple Arabidopsis thaliana populations Nat. Genet. 43, 956-963. Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A, Muliyati NW, Platt A, Sperone FG, Vilhjalmsson BJ, et al. (2012) Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel Nat. Genet. 44, 212-216. Obbard DJ, Welch JJ, Kim KW, & Jiggins FM (2009) Quantifying adaptive evolution in the Drosophila immune system PLoS Genet 5, e1000698. Obbard DJ, Jiggins FM, Halligan DL, & Little TJ (2006) Natural selection drives extremely rapid evolution in antiviral RNAi genes Cur. Biol. 16, 580-585. Bakker EG, Toomajian C, Kreitman M, & Bergelson J (2006) A genome-wide survey of R gene polymorphisms in Arabidopsis Plant Cell 18, 1803-1818. Pavlidis P, Hutter S, & Stephan W (2008) A population genomic approach to map recent positive selection in model species Mol. Ecol. 17, 3585-3598. McCann HC, Nahal H, Thakur S, & Guttman DS (2012) Identification of novel innate immunity elicitors using molecular signatures of natural selection Proc. Natl Acad. Sci. U.S.A. 109, 4215-4220. Brunner PC, Torriani SFF, Croll D, Stukenbrock EH, & McDonald BA (2013) Coevolution and life cycle specialization of plant cell wall degrading enzymes in a hemibiotrophic pathogen Mol Biol Evol 30, 1337-1347. 9 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. Yoshida K, Schuenemann VJ, Cano LM, Pais M, Mishra B, Sharma R, Lanz C, Martin FN, Kamoun S, Krause J, et al. (2013) The rise and fall of the Phytophthora infestans lineage that triggered the Irish potato famine eLife 2, e00731-e00731. Leonard KJ (1977) Selection pressures and plant pathogens Ann. NY Acad. Sci. 287, 207-222. Tellier A & Brown JKM (2007) Stability of genetic polymorphism in host-parasite interactions Proc. R. Soc. B-Biol. Sci. 274, 809-817. Brown JKM & Tellier A (2011) Plant-parasite coevolution: bridging the gap between genetics and ecology Annu. Rev. Phytopathol. 49, 345-367. Ewing G & Hermisson J. (2010) MSMS: A coalescent simulation program including recombination, demographic structure, and selection at a single locus Bioinformatics 26, 2064-2065. Vetter MM, Kronholm I, He F, Haweker H, Reymond M, Bergelson J, Robatzek S, & de Meaux J (2012) Flagellin perception varies quantitatively in Arabidopsis thaliana and its relatives Mol. Biol. Evol. 29, 1655-1667. Hörger AC, Ilyas M, Stephan W, Tellier A, van der Hoorn RAL, & Rose LE (2012) Balancing selection at the tomato RCR3 gene family maintains variation in strength of pathogen defence PLoS Genet 7, e1002813. Stukenbrock EH, Bataillon T, Dutheil JY, Hansen TT, Li R, Zala M, McDonald BA, Wang J, & Schierup MH (2011) The making of a new pathogen: Insights from comparative population genomics of the domesticated wheat pathogen Mycosphaerella graminicola and its wild sister species Genome Res. 21, 2157-2166. Barton NH & Etheridge AM (2003) The effect of selection on genealogies Genetics 186, 1115-1131. Brown JKM (2003) A cost of disease resistance: paradigm or peculiarity? Trends Genet. 19, 667-671. Bergelson J, Dwyer G, & Emerson JJ (2001) Models and data on plant-enemy coevolution Annu. Rev. Genet. 35, 469-499. Tian D, Traw MB, Chen JQ, Kreitman M, & Bergelson J (2003) Fitness costs of Rgene-mediated resistance in Arabidopsis thaliana Nature 423, 74-77. Thrall PH, Laine A-L, Ravensdale M, Nemri A, Dodds PN, Barrett LG, & Burdon JJ (2012) Rapid genetic change underpins antagonistic coevolution in a natural hostpathogen metapopulation Ecol Letters 15, 425-435. Spanu PD, Abbott JC, Amselem J, Burgis TA, Soanes DM, Stüber K, Ver Loren van Themaat E, Brown JKM, Butcher S. A., Gurr S. J., et al. (2010) Genome expansion and gene loss in powdery mildew fungi reveal functional tradeoffs in parasitism Science 330, 1543-1546. Kirby GC & Burdon JJ (1997) Effects of mutation and random drift on Leonard's gene-for-gene coevolution model Phytopathology 87, 488-493. Salathe M, Scherer A, & Bonhoeffer S (2005) Neutral drift and polymorphism in gene-for-gene systems Ecol. Letters 8, 925-932. Gandon S, Buckling A, Decaestecker E, & Day T (2008) Host-parasite coevolution and patterns of adaptation across time and space J. Evol. Biol. 21, 1861-1866. Barrett JA (1980) Pathogen evolution in multilines and variety mixtures Z. Pflanzenk. Pflanzens.-J. Plant Dis. Prot. 87, 383-396. 10 FIGURE LEGENDS Figure 1: Percentage of time that susceptible alleles (res) are fixed or near fixation (frequency > 95%) in the stable model, as a function of the costs of resistance and infectivity (u = b) and cost of being diseased (s = φ). Other parameters are: c = 1; ψ = 0.95. The GFG population mutation rate is fixed (θGFG = 0.02), but population size N varies (N = 1,000 in the left graph, N = 10,000 in the right graph). 11 Figure 2: Number of coevolutionary cycles per 10,000 host generations for the arms race (unstable model; left graph) and trench warfare (stable model; right graph) dynamics as a function of the costs of resistance and infectivity (u = b) and cost of being diseased (s = φ). Other parameters are: c = 1; ψ = 0.95. The GFG population mutation rate is fixed (θGFG = 0.02), N = 10,000 and µGFG = 10-6. 12 Figure 3: Mean of Tajima’s D (DT) for the arms race (unstable model; left graph) and trench warfare (stable model; right graph) dynamics as a function of the costs of resistance and cost of infectivity (u = b) and the cost of being diseased (s = φ). Population size is N = 10,000, mutation rate is fixed to µGFG=10-5 (so θGFG = 0.2), c = 1; ψ = 0.95; θneutral = 20, selection acts for 6N generations and the sample size is n = 40 haploid hosts and haploid parasites. A) host locus, and B) parasite locus. 13
© Copyright 2026 Paperzz