Speed of adaptation and genomic signatures in

Speed of adaptation and genomic signatures in arms race and trench warfare models of hostparasite coevolution
Aurélien TELLIER1
Stefany MORENO-GÁMEZ2
Wolfgang STEPHAN2
1
Section of Population Genetics, Center of Life and Food Sciences Weihenstephan, Technische
Universität München, 85354 Freising, Germany
2
Section of Evolutionary Biology, LMU University of Munich, LMU BioCenter, Grosshaderner Str. 2,
82152 Planegg-Martinsried, Germany
Corresponding author:
Aurélien TELLIER
Section of Population Genetics, Center of Life and Food Sciences Weihenstephan, Technische
Universität München, 85354 Freising, Germany
Email: [email protected]
Running head: Genomic signatures of coevolution
Keywords: genetic drift, selective sweeps, balancing selection, frequency-dependent selection
ABSTRACT
Host and parasite population genomic data are increasingly used to discover novel major genes
underlying coevolution, assuming that natural selection generates two distinguishable polymorphism
patterns: selective sweeps and balancing selection. These genomic signatures would result from two
coevolutionary dynamics, the “trench warfare” with fast cycles of allele frequencies and the “arms
race“ with slow recurrent fixation of alleles. However, based on genome scans for selection, few genes
for coevolution have yet been found in hosts. To address this issue, we build a gene-for-gene model
with genetic drift, mutation and integrating coalescent simulations to study observable genomic
signatures at host and parasite loci. In contrast to the conventional wisdom, we show that
coevolutionary cycles are not faster under the trench warfare model compared to the arms race, except
for large population sizes and high values of coevolutionary costs. Based on the generated SNP
frequencies, the expected balancing selection signature under the trench warfare dynamics appears to
be only observable in parasite sequences in a limited range of parameter, if effective population sizes
are sufficiently large (>1,000) and if selection has been acting for a long time (> 4N generations). On
the other hand, the typical signature of the arms race dynamics, i.e. selective sweeps, can be detected
in parasite and to a lesser extent in host populations even if coevolution is recent. We suggest to study
signatures of coevolution via population genomics of parasites rather than hosts, and caution against
inferring coevolutionary dynamics based on the speed of coevolution.
1
INTRODUCTION
Diseases are major agents of natural selection. In both natural and domesticated species, parasites limit
plant and animal growth, alter development, and reduce seed and offspring production. There is
selection pressure on hosts for resistance to parasites and equally on parasites to overcome host
defenses. This confrontation drives coevolution, in which gene frequencies in one species determine
the fitness of genotypes of the other species, and should result in genetic diversity for resistance and
tolerance in hosts and in infectivity and virulence in parasites.
Progress in our molecular understanding of the genetic basis of resistance in hosts (humans,
animals, or plants) and infectivity in parasites (bacteria, fungi, viruses) reveal that few major defense
genes underlie these traits (1-3). Numerous theoretical analyses of gene-for-gene (GFG) or matchingallele models describe the coevolutionary dynamics at these loci based on the phenotypic outcome of
infection determined by host genotype × parasite genotype (G×G) interactions. These models describe
the coevolutionary dynamics driven by negative indirect frequency-dependent selection (niFDS): rare
alleles exhibit a fitness advantage as selection in the host population depends on allele frequencies in
the parasite population, and vice versa (4). Two types of coevolutionary dynamics (5, 6) are predicted:
1) recurrent fixation of alleles and transient polymorphism (the so called arms race dynamics, ref. 7),
2) continuous cycling of allele frequency changes (the trench warfare dynamics, ref. 8). The trench
warfare dynamics is expected to exhibit faster coevolutionary cycles than the arms race (5, 9).
Studying genomic signatures at host and parasite genes of coevolution has been used to infer
the type of coevolutionary dynamics (5). Based on deterministic models of coevolution with infinite
host and parasite population sizes, the current paradigm states that the two types of coevolutionary
dynamics should result in different observable signatures of polymorphism (5, 6). The arms race
dynamics is characterized by recurrent fixation of alleles in host and parasite populations, and is
expected to generate recurrent selective sweeps (10) at the genes of coevolution (5, 7). The typical
hitch-hiking signatures of selective sweeps include an excess of low and high frequency variants (thus
negative Tajima’s D (11) values) and a region of reduced polymorphism around the selected site (10).
On the other hand, the trench warfare dynamics (8) is expected to maintain polymorphism at both host
and parasite coevolutionary loci, resulting in the occurrence of balancing selection with two or more
alleles (5, 8). Typically, this would be observed as high levels of polymorphism and an excess of
intermediate frequency variants around the selected site reflected by positive Tajima’s D values (12).
With technological advances and the sequencing of numerous plant and animal genomes
among and within populations, e.g. in Arabidopsis thaliana, humans, and Drosophila, it becomes
feasible to detect host genes under selection which underlie host-parasite coevolution (3, 13-15).
However, results from recent studies suggest that few new genes underlying coevolution are detected
in hosts based on selective sweeps and balancing selection signatures (for example in 13-15). In fact,
most genes underlying coevolution showing a genomic signature of selection had been identified
based on a limited set of candidate genes and sequenced individuals (7, 8, 16, 17). The relative lack of
additional evidence for genes under coevolution may lie in 1) the difficulty to disentangle the genomic
signatures of selection from demographic effects (18) and/or 2) the inadequacy of the theoretical
coevolutionary models to predict the observable complex patterns of molecular diversity. Predicting
the polymorphism signatures at loci underlying coevolution is also of importance for parasite
population genomics studies, as scans for genomic signatures of selection are becoming key methods
to discover novel genes with essential function for infection (19-21).
A logical link, potentially based on naïve premises, has traditionally been suggested between
three levels of study of coevolution: 1) the mathematical behavior of the deterministic coevolutionary
model, i.e. the stability of the polymorphic equilibrium point, 2) the occurrence of the arms race or
trench warfare dynamics, and 3) the incidence of selective sweeps or balancing selection in population
genomic data (5). Most theoretical models of coevolution have thus focused on the ecological and
epidemiological conditions for the stability of the polymorphic equilibrium while assuming infinite
population sizes (22-24). It has been shown that in such models costs of resistance and infectivity are
necessary but not sufficient for the stability of the polymorphic equilibrium point. A mathematical
condition, negative direct frequency-dependent selection (ndFDS), is however required for stability of
the equilibrium and is promoted by various life-history traits such as parasite polycyclicity (23, 24).
Nonetheless, theoretical studies have not investigated so far the parameter conditions and model
assumptions under which ndFDS would generate 1) a trench warfare dynamics for a realistic finite
2
population model, and 2) balancing selection signatures in the genome. Understanding the link
between mathematical stability, coevolutionary dynamics and genomic signatures is essential for
predicting the occurrence of genomic signatures of coevolution in natural populations, but also for
future statistical inference of coevolutionary parameters based on host and parasite sequence data.
To address this major shortcoming of coevolutionary theory, we construct here realistic
models of coevolution in finite size populations and generate polymorphism data at coevolution genes.
We focus in the following on the gene-for-gene (GFG) relationship which receives strong support
from empirical studies in plants and their parasites (1), as well as in invertebrates (2). We propose a
clear distinction between three levels of coevolutionary theory. First, the stability or instability of a
model is defined as the mathematical behavior of the deterministic coevolutionary model (in the form
of a dynamical system) around the polymorphic equilibrium point. Second, in finite populations of
hosts and parasites, the allele frequency dynamics can be defined as arms race dynamics when alleles
are recurrently fixed in the populations (or at frequency below a threshold of detection), or as a trench
warfare dynamics, when there is maintenance of alleles in the host and parasite populations at
observable frequencies. Finally, selective sweep and balancing selection are defined as the genomic
signatures of polymorphism to be observed at these genes in a sample of several host and parasite
DNA sequences. We aim to demonstrate the connection between these three levels of coevolutionary
theory, and to predict the genomic signatures at genes of coevolution.
We first quantify the properties of the coevolutionary dynamics by comparing finite
population size versions of models without and with ndFDS (23). It appears that the speed of the arms
race dynamics, measured as the period of cycles, is not slower than the trench warfare models for a
wide range of parameter values. Moreover, stable polymorphism is found to generate sensu stricto
trench warfare dynamics only for high costs of resistance and infectivity. We then simulate observable
patterns of genetic polymorphism (SNPs) at host and parasite interacting loci, using a coalescent
simulator (25). We show that the typical signatures of selective sweeps and balancing selection are
only observed in a very small range of parameter values, and that balancing selection is more likely to
be detected in parasite populations.
RESULTS
Arms race versus trench warfare dynamics
We compare two GFG models with resistance (RES) and susceptible (res) alleles in the host and
infectivity (INF) and non-infectivity (ninf) in the parasite. The deterministic “unstable model”
assuming only niFDS has an unstable polymorphic equilibrium point, and is thus expected to generate
an arms race dynamics in a finite population (23). Four costs are associated with the simple GFG
model. The costs of resistance (u) and infectivity (b) are fitness costs associated with the
corresponding allele (where 0 < u, b <1). Hosts can also exhibit a fitness cost of being diseased
(denoted by s, with 0 < s <1). Finally, c is the cost to the non-infective parasite of not being able to
infect the host (where c ≈ 1 in GFG models). On the other hand, the deterministic “stable model”
assumes that the parasites undergo two generations per host generation, which generates ndFDS for
some parameter values (23). The mathematical condition of stability of the non-trivial polymorphic
equilibrium is expected to give rise to a trench warfare dynamics over a large parameter range. Finite
population sizes (N) and mutation (µGFG), as well as the population mutation rate (θGFG = 2NµGFG) are
introduced for both models. Genetic drift is caused by binomial sampling based on allele frequencies
at each generation. Parasites undergo thus once genetic drift in the unstable model, but twice in the
stable model.
In the unstable model, the main determinant of the speed of coevolution is the population
mutation rate (θGFG), while the costs (u, b, s) play a minor role. The speed of coevolution increases
with higher values of θGFG but does not depend on the population size alone (Figure S1). Population
size, i.e. genetic drift, determines thus only the fixation of alleles and the time alleles spend near
fixation (Figure S2), while the population mutation rate (θGFG) determines the rate at which new
mutations arise in the population and thus the incidence of coevolutionary cycles. Importantly, fixation
of resistance alleles can occur only for a small population size and an intermediate to high cost of
disease, i.e. when drift and selection for resistance are strong (N < 5,000, Figure S2). In the absence of
costs (u = b = 0) infective and susceptible alleles are fixed. Following expectations from the
deterministic unstable model (23), increasing values of the costs u and s move the equilibrium point
3
away from the boundaries in the phase plan (SI Appendix section 1), which increases the strength of
selection for resistance and infective alleles, and the speed of coevolution (Figures S1 and S2).
For the stable model, by fixing the population mutation rate θGFG we show that genetic drift
per se determines the probability of allele fixation leading to dynamics similar to the arms race in
small populations (Figure 1). In the stable model, smaller population sizes, i.e. stronger genetic drift,
pulls away the allelic frequencies from the interior equilibrium, thus increasing the fixation probability
of alleles even under high mutation rates. Stable long-term polymorphism, defined as a near zero
probability of allele fixation, arises only for large population size (N = 10,000) and a limited range of
cost values (intermediate to high costs u and b, low to intermediate cost s, Figure 1). In this range of
parameter values, natural selection driving allele frequencies towards the stable equilibrium point is
stronger than genetic drift. The area of trench warfare dynamics sensu stricto represents thus only a
small proportion of the parameter space for which stability of the internal equilibrium occurs in the
deterministic model (Figure 1). Genetic drift is thus a key factor determining the transition from the
trench warfare to an arms race dynamics under the stable model, in addition to the known effects of
the cost values on the stability of the interior equilibrium point (22, 23).
We then compare the speed of coevolution between the two models assuming a fixed
population mutation rate (θGFG =0.02). With large population size (N = 10,000) the arms race
dynamics appears to run more slowly than the trench warfare only for high values of the cost u = b (u,
b around 0.2) and intermediate values of the cost of disease (s = φ = 0.95; Figure 2). Noticeably, the
difference in speed of coevolution occurs for parameter values (u, b, s) for which fixation of alleles
never occurs in the stable model, the so-called trench warfare sensu stricto (compare Figure 1 and 2).
For small population size (N = 1,000) the speed of evolution is only slightly faster in the stable model
for costs u = b higher than 0.1 (Figure S3). Our results narrow down the generality of the claim that
the arms race model is always slower than trench warfare. In fact, for most parameter combinations
the speed of coevolution is driven by the population mutation rate (θGFG), whereas the strength of
natural selection is higher than that of genetic drift only for high costs u and b, and an intermediate
cost of disease (s), where selection drives the speed of coevolution.
Genomic signatures of coevolution
We generate expected genomic signatures at the host and parasite coevolution loci based on the
frequency dynamics of the host and parasite alleles computed under the stable and unstable models
with finite population sizes. We use the coalescent simulator msms (25) to generate neutral
polymorphism segregating at these loci (parameters and set-up described below and in Table S1,
Figures S6-S11). We summarize the expected genomic signature as the Tajima’s D (DT) for a given
population mutation parameter θGFG = 0.02 assuming two haploid population sizes: N = 1,000 (Figures
S4 and S5) and N = 10,000 (Figures 3 and S6) and various parameter combinations (u = b, s = φ).
Balancing selection is not observed under N=1,000 due to the action of genetic drift in the stable
model (Figures S4, S5). In both host and parasite unstable models, negative Tajima’s D (DT < -1) is
observed for intermediate to high costs u = b and intermediate cost of disease (s), corresponding to the
characteristic signature of recurrent selective sweeps (dark blue in Figures S4, S5). As expected, the
unstable model exhibits more negative Tajima’s D than the unstable model. However, for small and
potentially realistic values of the costs u and b (u, b < 0.05), the site frequency spectrum does not
appear different from that expected under neutrality (light blue in Figures S4, S5, DT ≈ 0). This occurs
because fixations of alleles occur very rapidly so that and host and parasite populations are
monomorphic, i.e. have most of the time one fixed or nearly fixed allele (Figures 1 and S1). As a
consequence, the polymorphic neutral SNPs at the ceovolutionary loci do not segregate following a
structured coalescent. For small population sizes, N=1,000, the observed site frequency spectrum at
coevolutionary loci cannot be distinguished from that of loci under neutral evolution for most of the
parameter combinations (especially in host populations).
We increase the population size N but keep θGFG constant (Figures 3 and S6). First, for the
unstable model, similar values of Tajima’s D and thus signatures of selective sweeps are observed as
for N = 1,000 (Figures S4, S5). For the unstable model, genetic drift therefore does not determine the
resulting genomic signature. Second, a balancing selection signature, i.e. high Tajima’s D (DT > 1), is
observed for intermediate to high values of costs u = b and an intermediate cost of disease s (Figures 3
and S6). Importantly, this signature is only observed in the parasite population but not in host
sequences.
4
In the stable model (Figures 3 and S6), balancing selection is observed for parameter
combinations for which the trench warfare sensu stricto is observed in Figures 1 and 2. When the costs
u and b are small (below 5%) the stable model does not generate a trench warfare dynamics or
balancing selection. In fact, as allele fixation takes place for such cost values (Figure 1), the structured
coalescent with two alleles in host and parasite occurs only over short periods of time, thus balancing
selection is not observed. Importantly, balancing selection is generated only in parasite populations in
a limited area of the parameter space (Figure 3), even though the trench warfare dynamics is observed
in host and parasite populations when N=10,000 (Figure 1). When comparing Figure 3 and Figure S6,
we note that higher Tajima’s D (DT > 1) is observed under balancing selection if the mutation rate is
smaller. Finally, the signatures of selection may be observed at different time scales. Selective sweep
signatures, i.e. more negative Tajima’s D and a smaller number of segregating sites than expected
under neutrality, are observable in host and parasite sequences even if selection is as recent as N
haploid host generations (Figure S7). In contrast, balancing selection, i.e. more positive Tajima’s D
and a higher number of segregating sites than expected under neutrality, is detectable only if selection
has been acting for at least 4N generations (Figure S8). Finally, even if balancing selection is not
reflected in the host site frequency spectrum (DT), it can be distinguished from neutrality by a higher
amount of segregating sites (Figure S8).
We explain these outcomes as follows. The trench warfare dynamics occurs for intermediate
to high cost values of u and b, meaning that both host and parasite alleles are maintained in the
populations without fixation. However, for such parameter values, the equilibrium frequency of the
host resistance (RES) allele is small (in fact ≈ b). The effective population size of the RES type is
therefore small and experiences regular bottlenecks due to allele frequency change around the
equilibrium value. When generating a coalescent genealogy at the host locus, most coalescent lineages
associated with the RES type may be lost due to the low frequency. Gene flow from the res class (via
the mutation rate µGFG) which has a higher effective size, into the RES class homogenizes thereafter
the genomic signatures between types. As a result, despite the maintenance of both RES and res types
in the population, balancing selection (DT > 1) is not observed in the host population. On the contrary,
as the equilibrium frequencies for parasite types are both higher than for RES types, bottlenecks due to
frequency changes in parasites are less extreme than in hosts. As a result, at least two coalescent
lineages for each type (ninf and INF) are maintained over long periods of time, generating typical
balancing selection signatures in parasite populations.
DISCUSSION
A large body of the theoretical literature on coevolution, e.g. for GFG interactions, has focused on
studying the stability of the internal polymorphic equilibrium in deterministic infinite size models (22,
23). We show that the stability of the internal equilibrium point in the deterministic model is a
sufficient, but not necessary condition to observe the trench warfare coevolutionary dynamics.
Moreover, it may be too naïve to draw a direct link between genomic signatures at host and parasite
loci and coevolutionary dynamics (5, 24). We study here a simple but widely applicable genetic model
of host-parasite coevolution for genes with major phenotypic effects on the outcome of infection. In
animal hosts, these genes may be i) upstream or downstream components of the innate immunity
system (MHC, Interferons, Toll like receptors in mammals; ref. 3), or targets of parasite effectors (3,
15), or ii) genes involved in RNA silencing pathways for virus resistance (2, 16). In plants, these are
found to be i) involved in basal defense and non-host resistance (26), ii) intra-cellular targets of
parasite effector molecules (guardee; ref. 27), or iii) R-genes interacting directly or indirectly with
parasite effectors (1, 8).
By generating expected genomic signatures at these loci, we show that balancing selection
may occur only in a limited range of coevolutionary parameter values, and be mainly observable in
parasite genomic data if population sizes are large enough (here N > 1,000). We thus expect genome
scans to be successful to detect balancing selection in parasite genomes based on the site frequency
spectrum (using Tajima’s D or Fay and Wu’s H). In hosts, population genomics studies based on the
site frequency spectrum would be unsuccessful to discover genes under balancing selection as those
may not deviate from neutrality. A fruitful approach, however, would be to scan for significantly high
values of genetic diversity at defense genes, for example using the number of segregating sites and the
Hudson-Kreitman-Aguadé (HKA) test. Our results shed thus light on the relative paucity of immunity
genes under balancing selection and selective sweeps in host model organisms: Drosophila
5
melanogaster, Arabidopsis thaliana, and humans (3, 14, 15, 17). In addition, we show that selective
sweeps can be observed at genes undergoing recent coevolution taking place only since N host
generations. Selective sweeps may thus likely be observed at key genes for infection of crop pathogens
(20, 21, 28) because selection is recent and coevolutionary dynamics in agriculture most likely follows
an unstable model (24), supporting the scans for positive selection to detect genes with novel function
(19-21). In contrast, balancing selection observed at a locus would have been acting for a long time
and be very strong (12, 29). Our theoretical results support the findings of old balancing selection
signatures, for example trans-specific at MHC loci in vertebrates (3) or between copies of a duplicated
gene in plants (27). Based on this time scale difference for the detection of selection, population
genomic studies would thus inflate the chances to detect arms race compared to trench warfare
dynamics, even if these are comparatively common scenarios in natural populations.
Population sizes and various fitness costs determine the rate of allele fixation, which occurs
especially when the costs of resistance (u) and infectivity (b) are realistically small (< 5%; refs. 30, 31
but see 32). Two different scenarios of allele fixation are of interest. First, fixation of the resistance
allele only occurs for small population sizes in both the stable and unstable models. Natural
populations are thus expected to comprise higher frequencies of susceptible plants (33). We predict
that proliferation of resistance genes in the genome by duplication may arise preferentially in small
populations under strong parasite pressure (high disease incidence and prevalence, and high cost of
disease). Second, fixation of infectivity occurs for a large range of parameter values in parasite
populations, favoring the increase in the number of effectors in parasite genomes following gene
duplication (34).
By varying the population mutation rate (θGFG) we disentangle in our study the influence of
genetic drift and mutation on the maintenance of polymorphism in coevolutionary GFG models (35,
36). Mutations between types (µGFG) prevent allelic fixation and shorten the waiting time for new
alleles to arise in the population. Drift has on the other hand different roles in the two models: 1) it
influences only the time that an allele spends at fixation (or near fixation) in the unstable model, while
2) it determines the probability of allele fixation in the stable model. As a result, the maintenance of
polymorphism in the presence of genetic drift and mutation (36), is a necessary but not sufficient
condition to generate observable balancing selection signatures at these genes. In fact all but one
coalescent lineage may be lost for a given allelic type (RES or res, INF or ninf) due to allele frequency
variation around a small equilibrium value. The cost of the resistance (u) and infectivity (b) alleles
determine the value of the equilibrium values (24, 31), and are thus key in coevolutionary models to
generate observable non-neutral genomic signatures.
In contrast to previous suggestions (5, 9), we find that the arms race dynamics in our GFG
model may not be slower compared to trench warfare. Defining the trench warfare dynamics sensu
stricto by the absence of allele fixation, fast coevolutionary cycles in the stable model occurs for only
for large population sizes and when costs u = b are intermediate to high, and the cost of disease is
intermediate (Figures 1 and 2). We suggest that distinguishing between the two coevolutionary
dynamics based on measures of host and parasite population fitness at different time points (37) is
restricted to this subset of the parameter values and to populations with large sizes. As coevolution in
the arms race model becomes faster when increasing the mutational input (θGFG), predicting the speed
of the coevolutionary dynamics requires measuring both mutation rates and effective population size
in host and parasite populations.
Under our conservative model assumptions the classical expected genomic signatures of
coevolution, selective sweeps or balancing selection, may often not be observed at coevolutionary loci.
Combining several ecological and epidemiological characteristics which individually promote ndFDS
(24) may increase stability in a deterministic model, and potentially enhance the likelihood to observe
balancing selection. Our models assume for simplicity constant host and parasite population sizes,
even though these sizes may vary in time due to random demographic events and density-dependent
disease transmission. Such demographic changes are known random processes affecting the genome
wide diversity and frequency spectrum, which would decrease the likelihood to detect genes under
selection (18) in natural populations or experimental coevolution studies.
6
METHODS
GFG models with finite population sizes
In the classic GFG model (22) the outcome of infection is determined by one locus in haploid hosts
and haploid parasites. Two alleles, resistance (RES) and susceptibility (res), are present at the host
locus (similar notations as in 23). The parasite has two alleles: for infectivity (INF) and non-infectivity
(ninf). Note that in the plant pathology literature, the infectivity allele is called virulent (avr), and the
non-infectivity is called avirulent (AVR; see ref. 23). Infection occurs if the host is susceptible or if the
parasite is infective.
The first model denoted as the “unstable model” assumes one parasite generation per host
generation, and all hosts receive parasite spores at every generation with frequency-dependent disease
transmission. Recursion equations for the allelic frequencies and values of the equilibrium points can
be formulated based on the host and parasite fitness costs (SI Appendix Section 1, ref. 23). The
polymorphic equilibrium point is always unstable (23).
The second model denoted as the “stable model” assumes that the parasites undergo two
generations per host generation. As a plant grows between the two parasite generations, each new leaf
may be infected by a spore produced either on the same plant or on another plant (called auto-infection
and allo-infection, respectively, 38). We define ψ as the auto-infection rate. The recursion equations
are found in the SI Appendix Section 2 (23). The fitness costs of being diseased after one or two
parasite generations are ε and φ, respectively. For simplicity when comparing the stable and unstable
models we assume that φ = s, and that non-infective parasites cannot infect resistant hosts (c =1). In
this model, the stability of the polymorphic equilibrium point is generated by negative direct
frequency-dependent selection (ndFDS; (23); SI Appendix section 2). In the deterministic model,
when auto-infection is high (we use thereafter ψ = 0.95), the stability of the equilibrium point occurs
over a wide range of parameter values: for 0.02 < s < 0.31, when u = b = 0.02, and for 0.05 < s < 0.55
when u = b = 0.05 (see Figure 3 in ref. 23).
Host and parasite populations have sizes NH and NP, respectively, and genetic drift is
introduced by binomial sampling based on allele frequencies in each host or parasite generation.
Population size is assumed to be constant over time and for simplicity in our simulations N = NH = NP.
Mutations between allelic types are introduced from RES to res or INF to ninf alleles and vice versa
(35). Backward and forward mutations occur at a symmetric rate µ GFG per generation following a
Poisson distribution. Mutation (µ GFG) and population size (N) determine the population mutation rate
parameter θGFG (θGFG = 2Nµ GFG) which defines the rate of appearance of new alleles. To disentangle
the influence of mutation from that of drift on the dynamics of allele frequencies, we compare
characteristics of the coevolutionary dynamics which are relevant for population genetics (see below)
for fixed values of the population mutation parameter (θGFG), while varying the population size
between small (N = 1,000) or high (N = 10,000) values, and mutation rate µ GFG taking values of 10-4,
10-5 and 10-6. A summary of all parameters can be found in Table S1.
Statistical analysis of the dynamics
We analyze both models by simulating the allele dynamics over 10,000 host generations using R
codes. We measure the percentage of time that host or parasite alleles are fixed. High mutation rates
can prevent the fixation of alleles even if the equilibrium is unstable (35, 36). For this reason, we
compute the percentage of time that allele frequencies are near fixation or extinction assuming a 5%
threshold for allele detection. The speed of coevolution is measured by counting the total number of
cycles for both the resistance and infectivity alleles over 10,000 generations. A cycle is defined
between two consecutive maximum values of the host (parasite) allele frequencies (including allele
fixation). This computation is realized by fitting a smooth spline curve to the trajectories of allele
frequency with the “smooth.spline” function from the R package “stats” (smoothing parameter equal
to 0.15). Various smoothing parameters (0.05 to 0.5) are preliminarily tested to find the value 0.15
which ensures robustness and accuracy of computations. The initial allele frequencies (a0 and R0 at the
start of simulations) may affect the behaviour of the system due to the existence of unstable limit
cycles (35). Even if we do not expect to find limit cycles in our models, all statistics are averages over
100 runs with varying initial frequencies sampled from a uniform distribution (a0 and R0 in the interval
[0.01, 0.5]). We study the coevolutionary dynamics over a realistic range of parameters with u = b by
varying between no costs (0) to high costs (0.3), and the costs of disease s and φ range from low (0.01)
to high (0.6) values (Table S1).
7
Generating expected genomic signatures
We assume constant population size, and that the INF or RES type is caused by a single SNP located
in the center of each locus. Using the above described R scripts, we generate the host and parasite
allele frequency dynamics for a given parameter combination assuming that selection acts over 6N
haploid host generations. This is referred to as the path of allele frequency. Allele frequency paths are
generated assuming an initial allele frequency of R0 = a0 = 0.1. These paths are used as separate inputs
for msms. In msms, a neutral coalescent tree is generated backward in time for each locus under the
allele frequency path given and the sample size n. We assume a sample size of n = 40 haploid hosts
and n = 40 haploid parasites at present for which we obtain sequences. Note that similar results are
obtained with larger sample size (n = 200, SI Appendix Section 3). The haploid host and parasite
population sizes are NP = NH = N = 1,000 and 10,000, and two mutation rates are defined for the
coalescent simulations. First, the coevolution mutation rate µ GFG defines as above the symmetrical
mutation rate with values of 10-4, 10-5 and 10-6 at the central SNP determining the types (RES and res,
and INF and ninf). Second, a neutral mutation rate, µ neutral, determines the appearance of neutral SNPs
within each locus. The neutral population mutation rate is set to θneutral = 20 per locus (with θneutral =
2Nµ neutral). This means that for N = 1,000 and assuming a locus length of 2kb, the mutation rate is
µ neutral = 2.5×10-6 (for N = 10,000, µ neutral = 2.5×10-7). This mutation rate is chosen to generate
approximately 85 segregating sites per locus under a neutral model without selection (for n = 40),
which allows us to be confident in the statistical comparisons to be drawn between loci. A smaller
neutral mutation rate would generate smaller numbers of SNPs and thus decrease the statistical power
to distinguish between different genomic signatures. Using a set of C++ codes, we compute summary
statistics from our samples such as the number of segregating sites (S), the site frequency spectrum
and Tajima’s D (DT, 11).
For a chosen parameter combination, two levels of stochasticity have to be accounted for: 1)
stochasticity among allele frequency paths due to genetic drift and mutation in the finite population
size coevolutionary models, and 2) stochasticity of the coalescent process for a given frequency path.
Preliminary analyses show that the variability in genomic signatures is higher among frequency paths
than among replicates of the coalescent process. Therefore we simulate for each parameter
combination, 2,000 host and parasite frequency paths with one coalescent simulation per path. The
mean of the distributions of S and DT over the 2,000 simulations is computed to represent the expected
polymorphism signature at the locus. Note that our coevolution model can be described as a structured
coalescent in each host and parasite populations with the two types (RES and res, INF and ninf) linked
by migration of lineages defined by the mutation rate µ GFG. Within the host and parasite populations,
allele frequencies vary in time due to the coevolution process, generating a variable size for each
allelic type.
Preliminary analyses are used to evaluate the simulation conditions (SI Appendix Section 3).
The major determinants of observable genomic signatures are the strength of selection, and the time
during which selection occurs (29). As our aim is to compare the genomic signatures for different
strength of selection (coevolution), the time of selection is fixed for all simulations to 6N haploid host
generations. Under a simple model of strong balancing selection with two allelic types at a fixed
frequency of 0.5, an excess of intermediate frequency variants in the site frequency spectrum (and high
DT values) are only observed when selection occurs for at least 4N generations (SI Appendix Section
3). This corresponds to the necessary time for the structured coalescent to generate distinct SNP
frequencies within allelic types. An increase of migration between allelic types due to mutation (µ GFG)
or intra-locus recombination (ρ) decreases the difference in neutral allele frequencies between types,
and DT converges towards zero as expected under neutrality. Being conservative, we fix the
coevolution time to 6N host generations and assume no intra-locus recombination (ρ = 0).
ACKNOWLEDGMENTS
AT acknowledges support from DFG grant HU 1776/1 to Stephan Hutter and the German Federal
Ministry of Education and Research (BMBF) within the AgroClustEr “Synbreed – Synergistic plant
and animal breeding” (FKZ: 0315528I). WS was funded by DFG grants HU 1776/1 and STE 325/14.
SM was funded by the European Union through the Erasmus Mundus Master Program in Evolutionary
Biology.
8
REFERENCES
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
Dodds PN & Rathjen JP (2010) Plant immunity: towards an integrated view of plantpathogen interactions Nat. Rev. Genet. 11, 539-548.
Wilfert L & Jiggins FM (2010) Host-parasite coevolution: genetic variation in a virus
population and the interaction with a host gene J. Evol. Biol. 23, 1447-1455.
Quintana-Murci L & Clark AG (2013) Population genetic tools for dissecting innate
immunity in humans Nat. Rev. Immunol. 13, 280-293.
Clarke B (1976) in Genetic aspects of host-parasite relationships, Symposium of the
British Society for Parasitology, eds. Taylor AER & Muller R (Blackwell Scientific
Publications, Oxford, UK), pp. 97-103.
Woolhouse MEJ, Webster JP, Domingo E, Charlesworth B, & Levin BR (2002)
Biological and biomedical implications of the co-evolution of pathogens and their
hosts Nat. Genet. 32, 569-577.
Holub EB (2001) The arms race is ancient history in Arabidopsis, the wildflower Nat.
Rev. Genet. 2, 516-527.
Bergelson J, Kreitman M, Stahl EA, & Tian DC (2001) Evolutionary dynamics of
plant R-genes Science 292, 2281-2285.
Stahl EA, Dwyer G, Mauricio R, Kreitman M, & Bergelson J (1999) Dynamics of
disease resistance polymorphism at the Rpm1 locus of Arabidopsis Nature 400, 667671.
Ebert D (2008) Host-parasite coevolution: Insights from the Daphnia-parasite model
system Curr. Opin. Microbiol. 11, 290-301.
Maynard Smith J & Haig J (1974) The hitch-hiking effect of a favourable gene Genet.
Res. (Cambridge) 23, 23-35.
Tajima F (1989) Statistical-method for testing the neutral mutation hypothesis by
DNA polymorphism Genetics 123, 585-595.
Charlesworth D (2006) Balancing selection and its effects on sequences in nearby
genome regions PLoS Genet. 2, e64.
Cao J, Schneeberger K., Ossowski S., Günther T., Bender S., Fitz J., Koenig D., Lanz
C., Stegle O., Lippert C., et al. (2011) Whole-genome sequencing of multiple
Arabidopsis thaliana populations Nat. Genet. 43, 956-963.
Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A, Muliyati
NW, Platt A, Sperone FG, Vilhjalmsson BJ, et al. (2012) Genome-wide patterns of
genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap
panel Nat. Genet. 44, 212-216.
Obbard DJ, Welch JJ, Kim KW, & Jiggins FM (2009) Quantifying adaptive evolution
in the Drosophila immune system PLoS Genet 5, e1000698.
Obbard DJ, Jiggins FM, Halligan DL, & Little TJ (2006) Natural selection drives
extremely rapid evolution in antiviral RNAi genes Cur. Biol. 16, 580-585.
Bakker EG, Toomajian C, Kreitman M, & Bergelson J (2006) A genome-wide survey
of R gene polymorphisms in Arabidopsis Plant Cell 18, 1803-1818.
Pavlidis P, Hutter S, & Stephan W (2008) A population genomic approach to map
recent positive selection in model species Mol. Ecol. 17, 3585-3598.
McCann HC, Nahal H, Thakur S, & Guttman DS (2012) Identification of novel innate
immunity elicitors using molecular signatures of natural selection Proc. Natl Acad.
Sci. U.S.A. 109, 4215-4220.
Brunner PC, Torriani SFF, Croll D, Stukenbrock EH, & McDonald BA (2013)
Coevolution and life cycle specialization of plant cell wall degrading enzymes in a
hemibiotrophic pathogen Mol Biol Evol 30, 1337-1347.
9
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
Yoshida K, Schuenemann VJ, Cano LM, Pais M, Mishra B, Sharma R, Lanz C,
Martin FN, Kamoun S, Krause J, et al. (2013) The rise and fall of the Phytophthora
infestans lineage that triggered the Irish potato famine eLife 2, e00731-e00731.
Leonard KJ (1977) Selection pressures and plant pathogens Ann. NY Acad. Sci. 287,
207-222.
Tellier A & Brown JKM (2007) Stability of genetic polymorphism in host-parasite
interactions Proc. R. Soc. B-Biol. Sci. 274, 809-817.
Brown JKM & Tellier A (2011) Plant-parasite coevolution: bridging the gap between
genetics and ecology Annu. Rev. Phytopathol. 49, 345-367.
Ewing G & Hermisson J. (2010) MSMS: A coalescent simulation program including
recombination, demographic structure, and selection at a single locus Bioinformatics
26, 2064-2065.
Vetter MM, Kronholm I, He F, Haweker H, Reymond M, Bergelson J, Robatzek S, &
de Meaux J (2012) Flagellin perception varies quantitatively in Arabidopsis thaliana
and its relatives Mol. Biol. Evol. 29, 1655-1667.
Hörger AC, Ilyas M, Stephan W, Tellier A, van der Hoorn RAL, & Rose LE (2012)
Balancing selection at the tomato RCR3 gene family maintains variation in strength of
pathogen defence PLoS Genet 7, e1002813.
Stukenbrock EH, Bataillon T, Dutheil JY, Hansen TT, Li R, Zala M, McDonald BA,
Wang J, & Schierup MH (2011) The making of a new pathogen: Insights from
comparative population genomics of the domesticated wheat pathogen
Mycosphaerella graminicola and its wild sister species Genome Res. 21, 2157-2166.
Barton NH & Etheridge AM (2003) The effect of selection on genealogies Genetics
186, 1115-1131.
Brown JKM (2003) A cost of disease resistance: paradigm or peculiarity? Trends
Genet. 19, 667-671.
Bergelson J, Dwyer G, & Emerson JJ (2001) Models and data on plant-enemy
coevolution Annu. Rev. Genet. 35, 469-499.
Tian D, Traw MB, Chen JQ, Kreitman M, & Bergelson J (2003) Fitness costs of Rgene-mediated resistance in Arabidopsis thaliana Nature 423, 74-77.
Thrall PH, Laine A-L, Ravensdale M, Nemri A, Dodds PN, Barrett LG, & Burdon JJ
(2012) Rapid genetic change underpins antagonistic coevolution in a natural hostpathogen metapopulation Ecol Letters 15, 425-435.
Spanu PD, Abbott JC, Amselem J, Burgis TA, Soanes DM, Stüber K, Ver Loren van
Themaat E, Brown JKM, Butcher S. A., Gurr S. J., et al. (2010) Genome expansion
and gene loss in powdery mildew fungi reveal functional tradeoffs in parasitism
Science 330, 1543-1546.
Kirby GC & Burdon JJ (1997) Effects of mutation and random drift on Leonard's
gene-for-gene coevolution model Phytopathology 87, 488-493.
Salathe M, Scherer A, & Bonhoeffer S (2005) Neutral drift and polymorphism in
gene-for-gene systems Ecol. Letters 8, 925-932.
Gandon S, Buckling A, Decaestecker E, & Day T (2008) Host-parasite coevolution
and patterns of adaptation across time and space J. Evol. Biol. 21, 1861-1866.
Barrett JA (1980) Pathogen evolution in multilines and variety mixtures Z. Pflanzenk.
Pflanzens.-J. Plant Dis. Prot. 87, 383-396.
10
FIGURE LEGENDS
Figure 1: Percentage of time that susceptible alleles (res) are fixed or near fixation (frequency > 95%)
in the stable model, as a function of the costs of resistance and infectivity (u = b) and cost of being
diseased (s = φ). Other parameters are: c = 1; ψ = 0.95. The GFG population mutation rate is fixed
(θGFG = 0.02), but population size N varies (N = 1,000 in the left graph, N = 10,000 in the right graph).
11
Figure 2: Number of coevolutionary cycles per 10,000 host generations for the arms race (unstable
model; left graph) and trench warfare (stable model; right graph) dynamics as a function of the costs of
resistance and infectivity (u = b) and cost of being diseased (s = φ). Other parameters are: c = 1; ψ =
0.95. The GFG population mutation rate is fixed (θGFG = 0.02), N = 10,000 and µGFG = 10-6.
12
Figure 3: Mean of Tajima’s D (DT) for the arms race (unstable model; left graph) and trench warfare
(stable model; right graph) dynamics as a function of the costs of resistance and cost of infectivity (u =
b) and the cost of being diseased (s = φ). Population size is N = 10,000, mutation rate is fixed to
µGFG=10-5 (so θGFG = 0.2), c = 1; ψ = 0.95; θneutral = 20, selection acts for 6N generations and the
sample size is n = 40 haploid hosts and haploid parasites. A) host locus, and B) parasite locus.
13