BIOINFORMATICS LETTER TO THE EDITOR
Vol. 25 no. 1 2009, pages 147–149
doi:10.1093/bioinformatics/btn539
Phylogenetics
Comment on ‘A congruence index for testing topological
similarity between trees’
Anne Kupczok∗ and Arndt von Haeseler
Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of
Vienna, University of Veterinary Medicine Vienna, Dr. Bohr-Gasse 9/6, A-1030 Vienna, Austria
Received on August 21, 2008; revised and accepted on October 13, 2008
Advance Access publication October 14, 2008
Associate Editor: Martin Bishop
Contact: [email protected]
Testing the congruence of trees is a major task in phylogenetic
research. Comparing trees makes it possible to assess the
similarity of trees derived from different genes, or of trees inferred
from the same data with different methods. Another
important application of tree comparison is the study of cospeciation.
Cospeciation refers to the simultaneous speciation of
ecologically associated lineages, e.g. hosts and their parasites. Since
cospeciation is not the only reason for congruent host and parasite
trees, the task of assessing their congruence is called cophylogenetic
analysis (de Vienne et al., 2007b).
Recently, de Vienne et al. (2007a) suggested a novel topological
test for cophylogenetic analysis: given two phylogenies with the
same number of taxa and a one-to-one mapping between their taxa (the
host–parasite mapping), they test whether the two trees are more
congruent than expected by chance. Their test is based on the maximum
agreement subtree (MAST) of the two trees. The MAST is the
largest possible subtree identical in both input phylogenies (Finden
and Gordon, 1985). Here, a subtree is obtained by pruning taxa
from a phylogeny and collapsing inner nodes of degree two, and
its size is the number of taxa it contains. Thus, the MAST size
is the maximal number of taxa that can be retained in a subtree shared
by both input phylogenies. The larger the MAST of two trees, the more
congruent the trees are.
First, we outline the method of de Vienne et al. (2007a), which
we will call the MAST test. The null distribution of the MAST
size is obtained by generating pairs of trees, where all trees are
assumed to be equally likely, and computing their MAST size. From this
null distribution, de Vienne et al. (2007a) estimated functions for the
mean and SD of the MAST size depending on the number of taxa n. In
doing so, they confirmed the result of Bryant et al. (2003) that
the mean MAST size grows proportionally to the square root of n
(Fig. 1). For example, for n = 50, the mean MAST size is 10; thus, on
average, 40 taxa (4/5 of all taxa) are pruned from random trees. The
test statistic for two trees of n taxa is the MAST size centered by the
mean for n and rescaled by the SD for n. The resulting standardized
distribution for 7 ≤ n ≤ 50 is then used to fit an analytical curve to
the left tail of the distribution. With this curve, P-values up to 0.05
can be estimated.
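The standardization described above can be written compactly. The symbols below are our own shorthand and do not come from de Vienne et al. (2007a): M_n denotes the MAST size of a pair of trees on n taxa, and μ(n) and σ(n) denote the fitted null mean and SD for n taxa.

```latex
% Assumed notation (not from the original paper):
%   M_n       MAST size of two trees on n taxa
%   \mu(n)    fitted mean of the null MAST size for n taxa
%   \sigma(n) fitted SD of the null MAST size for n taxa
Z_n = \frac{M_n - \mu(n)}{\sigma(n)}
```

Because M_n takes only integer values, Z_n can take only a few distinct values for any fixed n, even though the pooled distribution over 7 ≤ n ≤ 50 looks much smoother.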
∗To whom correspondence should be addressed.
Using the centered and rescaled MAST size as a test statistic
causes two inherent problems. First, only the taxa in the MAST
contribute to the significance, while the topological information of
the other taxa is ignored. In a biological framework, however, this
means that not all the information present in the topologies is used.
Second, the mean MAST size increases only with the square root of the
number of taxa n. Hence, the mean relative MAST size (the MAST size
divided by n) approaches zero with increasing n. That is, for
two large trees, a high proportion of taxa is pruned on average to
obtain the MAST.
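The second point follows in one line from the square-root growth. As a sketch, write M_n for the MAST size on n taxa and assume the mean behaves like c√n for some constant c > 0; both symbols are our own notation, and the exact fitted mean function is not reproduced here.

```latex
% Assumption: E[M_n] \approx c \sqrt{n} for a constant c > 0 (square-root growth)
\frac{\mathbb{E}[M_n]}{n} \approx \frac{c\,\sqrt{n}}{n} = \frac{c}{\sqrt{n}}
\;\longrightarrow\; 0 \quad \text{as } n \to \infty
```

With the figures quoted in the text for n = 50, the mean relative MAST size is 10/50 = 1/5, which matches the 4/5 of taxa that are pruned on average.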
Our main concern when applying the MAST test is, however,
a statistical one. When a statistical test is applied at a significance
level α (e.g. 5%), the assumption is that no more than 5% of the
tests reject when the null hypothesis is true. The actual fraction of
significant results under the null hypothesis for a predefined α is
known as the size of the test. For discrete tests, the size will rarely
match α exactly, since the sum of probabilities for the extreme cases
grows in discrete steps. This behavior is well known for discrete
distributions such as the binomial distribution. The binomial
distribution describes the
number of successes in a sequence of n independent experiments,
each of which yields success with probability P. For example, for
n = 7 and P = 0.5, seven successes occur with a probability of
0.78%, whereas six or seven successes occur with a probability of
6.25%. In such a case, one has to choose whether the test is to be
conservative, i.e. the size never exceeds α, or liberal,
i.e. there are more significant results than the predefined significance
level allows. To be sure that a significant or more extreme result
occurs under the null hypothesis no more often than the significance
level, the test must be conservative. For the example with
the binomial distribution for n = 7, the conservative critical value
for a significance level of at most 5% is 7, and thus the size is
only 0.78%. The significance could also be computed in analogy
to the MAST test. In that case, the mean and SD are computed for
7 ≤ n ≤ 50, and significance is assigned to the values in the 5%
quantile of the distribution of the centered and rescaled values
combined over all n. The critical value for n = 7 is then 6, and the
resulting size of 6.25% is too liberal.
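The binomial figures above can be checked directly. The following is a minimal sketch; the function names `upper_tail` and `conservative_critical_value` are our own and are not part of any published implementation.

```python
from math import comb


def upper_tail(n: int, p: float, k: int) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def conservative_critical_value(n: int, p: float, alpha: float) -> int:
    """Return the smallest k with P(X >= k) <= alpha (n + 1 if none exists)."""
    for k in range(n + 1):
        if upper_tail(n, p, k) <= alpha:
            return k
    return n + 1


n, p, alpha = 7, 0.5, 0.05
k = conservative_critical_value(n, p, alpha)
print(k, upper_tail(n, p, k))  # 7 0.0078125  (size of the conservative test)
print(upper_tail(n, p, 6))     # 0.0625       (size if 6 is used as critical value)
```

No critical value yields a size of exactly 5% here: the attainable sizes jump from 0.78% to 6.25%, which is the discreteness effect discussed above.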
To determine the size of the MAST test, we first compute the
critical value of the MAST size for α = 0.05 (Fig. 1). For example,
for n = 50, the critical value is 13; thus, when 37 (≈ 3/4) or
fewer of the taxa are pruned, the trees are considered to be congruent.
This high proportion is counterintuitive, but it results from the vast
number of trees for large n and the fact, already mentioned, that the
mean MAST size grows more slowly than n.
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
The size of the MAST test cannot be computed analytically, as it can for
the binomial distribution. We determine it by simulating 10 000 pairs
of random trees, where all trees are equally likely. For the resulting
pairs of trees, the MAST size is computed with the algorithm of
Goddard et al. (1994) as implemented in PAUP* (Swofford, 2002).
The average size of the test over all n and over 7 ≤ n ≤ 50 is 0.043
and 0.048, respectively, and thus below the significance level of 0.05.
However, Figure 2 shows that the size exceeds the significance
level for some n. In these cases, the estimate of the conservative size
is much smaller, although the conservative critical value differs by
only one taxon from the critical value obtained from the MAST test
(Fig. 1). For example, for
n = 7, the critical value of the MAST test is 5, but the corresponding
size is 12.7%, whereas a MAST size of 6 or 7 was observed in only 0.8%
of the cases. Thus, 0.8% is the correct size of the conservative test
with a critical value of 6. We observe that the original size exceeds
the significance level when the number of taxa approaches the right
boundary of an area delimited by vertical lines. Within these areas,
all n have the same critical value (cf. Fig. 1). For instance, for
40 ≤ n ≤ 47 the critical value is 12; thus, a maximum of 28 (n = 40)
to 35 (n = 47) taxa can be pruned while the pairs of trees are still
significant. While the critical value remains constant between two
lines, the number of taxa allowed to be pruned increases, and thus more
pairs show significance.
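Given simulated null MAST sizes for a fixed n, the size of the test for any critical value is an empirical tail fraction. The sketch below is not the authors' code. The toy counts are made up purely so that the two tails match the 12.7% and 0.8% quoted above for n = 7; the split among the smaller MAST sizes is arbitrary. The conservative critical value is taken here as the smallest threshold whose empirical size does not exceed α.

```python
from typing import Sequence


def empirical_size(null_stats: Sequence[int], critical_value: int) -> float:
    """Fraction of null replicates with a statistic >= critical_value."""
    return sum(s >= critical_value for s in null_stats) / len(null_stats)


def conservative_critical_value(null_stats: Sequence[int], alpha: float) -> int:
    """Smallest integer threshold whose empirical size is <= alpha."""
    # The loop always returns: at max(null_stats) + 1 the empirical size is 0.
    for c in range(min(null_stats), max(null_stats) + 2):
        if empirical_size(null_stats, c) <= alpha:
            return c
    raise ValueError("alpha must be non-negative")


# HYPOTHETICAL toy data (1000 replicates), not simulation output:
# chosen so that P(MAST >= 5) = 0.127 and P(MAST >= 6) = 0.008.
null_mast = [3] * 300 + [4] * 573 + [5] * 119 + [6] * 7 + [7] * 1

print(empirical_size(null_mast, 5))                  # original size for critical value 5
print(conservative_critical_value(null_mast, 0.05))  # conservative critical value
print(empirical_size(null_mast, 6))                  # conservative size
```

In a real analysis, `null_mast` would hold the MAST sizes of the simulated tree pairs for one value of n, and the calculation would be repeated for every n.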
To evaluate the behavior of the MAST test in a more realistic
setting, we used real trees but random taxa mappings. To this end,
we downloaded all 5023 trees from TreeBASE
(http://www.treebase.org/treebase/data/Tree.txt, April 2008). Of these,
we investigated the 4610 trees comprising between 7 and 100 taxa.
Unfortunately, the number of available trees varies strongly with the
number of taxa. In particular, for each n ≥ 94 there are fewer than 10
trees available, but for n ≤ 50 there are always more than 30 trees
present. Two different
trees with the same number of taxa are drawn randomly. If a tree
contains multifurcations, these are randomly resolved each time the
tree is drawn, where each resolution is equally likely. The resulting
bifurcating trees are relabeled randomly with the same taxa set. This
corresponds to random host–parasite mappings.
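The relabeling step can be sketched as follows. This is an illustration under our own assumptions, not the procedure actually used: trees are represented as nested tuples with string leaves, and all function names are hypothetical. Random resolution of multifurcations and the MAST computation itself are not covered.

```python
import random
from typing import List, Sequence, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of subtrees


def leaves(tree: Tree) -> List[str]:
    """Return the leaf labels of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def relabel(tree: Tree, mapping: dict) -> Tree:
    """Return a copy of the tree with every leaf renamed via mapping."""
    if isinstance(tree, str):
        return mapping[tree]
    return tuple(relabel(child, mapping) for child in tree)


def random_relabel(tree: Tree, taxa: Sequence[str], rng: random.Random) -> Tree:
    """Keep the tree shape but assign the given taxa to its leaves at random."""
    old = leaves(tree)
    if len(old) != len(taxa):
        raise ValueError("tree and taxa set must have the same size")
    shuffled = list(taxa)
    rng.shuffle(shuffled)  # uniform random permutation of the taxa
    return relabel(tree, dict(zip(old, shuffled)))


rng = random.Random(1)
taxa = ["t1", "t2", "t3", "t4"]
host = random_relabel((("a", "b"), ("c", "d")), taxa, rng)
parasite = random_relabel(((("w", "x"), "y"), "z"), taxa, rng)
print(host)
print(parasite)
```

Because both trees receive the same taxon set in independent random orders, identical labels in the two trees define a random one-to-one mapping between them.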
In Figure 3, the fraction of significant results is shown for each
number of taxa. On average, the MAST test is slightly too liberal, with
a size of 0.064 and 0.058 for all n and for 7 ≤ n ≤ 50, respectively.
This may be because the assumption of the null hypothesis, that all
trees are equally likely, does not hold (see e.g. Blum and François,
2006, for a study about tree shape distributions on a similar data set).
Note that we weakened this effect by resolving the multifurcations
in the trees randomly. We observe that the size of the test depends
strongly on the number of taxa, as already observed for random trees
(Fig. 2).
Fig. 1. Mean MAST size and critical value of the MAST size for α = 0.05:
the mean is given by equation (1) in de Vienne et al. (2007a) and the critical
value is computed with equation (6) in de Vienne et al. (2007a). The vertical
lines indicate the steps in the critical value.
Fig. 2. Size of the test: evaluation of the test statistic for 10 000 random pairs
of trees. ‘Original size’ is obtained by using the critical values of the MAST
test (Fig. 1). ‘Conservative size’ is obtained by using the largest critical value
which yields a size of 5% or smaller. The vertical lines are the same as in
Figure 1. The horizontal line displays the significance level at 0.05.
Fig. 3. Simulation results: results are considered significant if P < 0.05. A
total of 1000 repetitions for each number of taxa. The vertical lines are the
same as in Figure 1. The horizontal line displays the significance level at 0.05.
We have shown that a number of pitfalls exist when using the
MAST test introduced by de Vienne et al. (2007a) to test whether
two phylogenetic trees are congruent. First, by using the MAST
size as the basis of the test statistic, the positions of the taxa pruned
from the trees are completely ignored, and any positional information,
e.g. whether they were in the same subtrees, is discarded. When
applying the test in a biological framework, the taxa in the maximum
agreement subtree should be considered, not only their number. Second,
a high number of taxa can be pruned from the phylogenies while
the pair remains significant. Our third and major concern is that tree
topologies are discrete, as is the MAST size of two trees. One pitfall
of the discreteness of the MAST size is the strongly varying size of
the test for different numbers of taxa. The MAST test is too liberal
for a considerable number of n. Therefore, we recommend adjusting the
critical value such that the test is conservative for all n. Finally,
the test is more liberal using random phylogenies from TreeBASE, which
indicates that the assumption of equally likely trees may not be an
appropriate null model.
ACKNOWLEDGEMENTS
The authors would like to thank Heiko Schmidt for helpful
comments on the article and the three reviewers for valuable
feedback.
Funding: Wiener Wissenschafts-, Forschungs- and Technologiefonds (WWTF).
Conflict of Interest: none declared.
REFERENCES
Blum,M.G.B. and François,O. (2006) Which random processes describe the tree of life?
A large-scale study of phylogenetic tree imbalance. Syst. Biol., 55, 685–691.
Bryant,D. et al. (2003) The size of a maximum agreement subtree for random binary
trees. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 61,
55–65.
de Vienne,D.M. et al. (2007a) A congruence index for testing topological similarity
between trees. Bioinformatics, 23, 3119–3124.
de Vienne,D.M. et al. (2007b) When can host shifts produce congruent host and parasite
phylogenies? A simulation approach. J. Evol. Biol., 20, 1428–1438.
Finden,C.R. and Gordon,A.D. (1985) Obtaining common pruned trees. J. Classif., 2,
255–276.
Goddard,W. et al. (1994) The agreement metric for labeled binary trees. Math. Biosci.,
123, 215–226.
Swofford,D.L. (2002) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other
Methods). Version 4. Sinauer Associates, Sunderland, MA.