Trost, Donald C. (1988). "Comparison of the Probabilities of Misclassification for the Estimated Linear, Quadratic, and Unbiased-Density Discriminant Functions Using Asymptotic Expansions."

COMPARISON OF THE PROBABILITIES OF MISCLASSIFICATION FOR
THE ESTIMATED LINEAR, QUADRATIC, AND UNBIASED-DENSITY
DISCRIMINANT FUNCTIONS USING ASYMPTOTIC EXPANSIONS
by
Donald C. Trost
A Dissertation submitted to the faculty of The University
of North Carolina at Chapel Hill in partial fulfillmen~ of the
requirements for the degree of Doctor of Philosophy in the
Department of Biostatistics
Chapel Hill
1988
ABSTRACT

DONALD C. TROST. Comparison of the Probabilities of Misclassification for the Estimated Linear, Quadratic, and Unbiased-Density Discriminant Functions Using Asymptotic Expansions (Under the direction of PRANAB KUMAR SEN.)
Standard discriminant functions tend to perform well only with large training samples. This study developed a theoretical foundation for deriving the asymptotic expansions of the probabilities of misclassification for a broad class of discriminant functions. Three of these were evaluated: the linear discriminant function (LDF), the quadratic discriminant function (QDF), and the unbiased-density discriminant function (UDF). The UDF is based on the UMVU estimator of the multivariate normal density function and is asymptotically equivalent to the QDF. A class of polynomials with multiple matrix arguments as well as concepts of discrimination potential, efficiency, and deficiency were defined.

Under the assumption of equal covariances in multivariate normal populations, the deficiencies of the QDF and UDF relative to the LDF indicated that both functions are inferior to the LDF and that the UDF is always inferior to the QDF. An evaluation of efficiencies with smaller sample sizes showed that when the distance between populations is small, the relative efficiencies of the QDF and UDF are quite poor compared to the LDF, with the UDF always slightly worse. Under the assumption of unequal covariances, the UDF continued to perform poorly, while the QDF outperformed the LDF only for large sample sizes and large differences in covariances. An application to medical diagnosis using clinical laboratory tests was studied.

The main conclusions from the research were that the UDF is inferior to the QDF and that optimal density estimation does not improve the discriminant function under the conditions studied. An approach that deserves further research is the optimal estimation of the partition between populations. The field of discriminant analysis needs a firmer theoretical basis instead of the current empirical approaches.
ACKNOWLEDGEMENTS
Firstly, I would like to express my sincere gratitude to my
adviser, Professor P.K. Sen, for giving me the opportunity to work under
his guidance and for his endless patience.
Professor Sen has guided me
into theoretical depths that are far beyond what I thought I could
achieve.
This theoretical background will provide me with important
tools to solve the many practical problems with which I am faced.
Along
with Professor Sen, I would like to give my special thanks to Professor
N.L. Johnson for his guidance and insightful review of this work.
I
would also like to thank Professors J.E. Grizzle, C.E. Davis, B. Switzer,
and H.A. Tyroler for their time and effort in serving on my committee.
My two sons, Jason and Brian, have spent their entire lives wondering when I was going to finish this work so that I could play with
them.
I just hope that I can make it up to them from this point onward
and that they will understand when they get older.
My wife, Roberta, will
never understand, but I will thank her for tolerating any inconvenience
that may have occurred as a result of this work.
To my parents, Donald
and Dorothy Trost, I would like to express my deepest gratitude for
their guidance and many sacrifices that permitted me to attend college
and for their patience while wondering if I were ever going to get
a job.
During my twenty-six years of formal education, I have had many
teachers who influenced my life. Of those, I would like to single
out one, Mrs. Pauline Miller. Mrs. Miller was my teacher in at least
five different courses during my junior and senior high school years.
The courses that she taught, such as English, were generally my least favorite.
In hindsight, she was the single greatest influence on my decision to
attend college, and the learning tools that she has provided me have been
critical in my education and my work.
Finally, I would like to thank Diana West and Theresa Van Reymersdal
for their excellent efforts in typing and proofreading this dissertation.
The financial support for this research was provided in part by the
Department of Biostatistics, by the National Heart, Lung, and Blood
Institute training grant T32-HL0700S-07, by the University of Florida
Department of Pathology, and by E.R. Squibb and Sons.
D. C. T.
TABLE OF CONTENTS

                                                                  Page

ACKNOWLEDGEMENTS ..................................................  v

Chapter

1.  INTRODUCTION AND REVIEW OF THE LITERATURE
    1.1. Introduction .............................................  1
    1.2. Notation .................................................  3
    1.3. Some Approaches to Discriminant Analysis .................  3
    1.4. Unequal Covariance Matrices ..............................  5
    1.5. Procedures for Evaluating the Discriminant Function ......  7
    1.6. Density Estimation ....................................... 10
    1.7. Research Plan ............................................ 12

2.  ASYMPTOTIC EXPANSIONS OF THE CHARACTERISTIC FUNCTIONS
    2.1. Introduction ............................................. 15
         Lemma 2.1.1 .............................................. 18
         Lemma 2.1.2 .............................................. 19
    2.2. Characteristic Functions ................................. 23
         Lemma 2.2.1 .............................................. 23
         Lemma 2.2.2 .............................................. 24
         Lemma 2.2.3 .............................................. 25
         Lemma 2.2.4 .............................................. 27
         Theorem 2.2.1 ............................................ 29
         Theorem 2.2.2 ............................................ 32
    2.3. Asymptotic Expansions .................................... 35
         Theorem 2.3.1 ............................................ 35
         Corollary 2.3.1 .......................................... 36
         Theorem 2.3.2 ............................................ 37
         Corollary 2.3.2 .......................................... 39
         Lemma 2.3.1 .............................................. 42
         Theorem 2.3.3 ............................................ 43
         Corollary 2.3.3 .......................................... 45

3.  PROBABILITY DISTRIBUTIONS
    3.1. Introduction ............................................. 48
    3.2. Inverse Fourier Transforms ............................... 49
    3.3. Properties of the Limiting Distributions ................. 56
    3.4. Accuracy of the Asymptotic Approximations ................ 58

4.  MEASURES OF DISCRIMINANT FUNCTION PERFORMANCE
    4.1. Introduction ............................................. 66
    4.2. Measures of Efficiency ................................... 66
    4.3. Measures of Deficiency ................................... 68
    4.4. Performance under Equal Covariances ...................... 69
    4.5. Performance under Unequal Covariances .................... 76

5.  AN APPLICATION TO MEDICAL DIAGNOSIS
    5.1. Introduction ............................................. 82
    5.2. Methods .................................................. 83
    5.3. Results .................................................. 84
    5.4. Discussion ............................................... 90

6.  CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH
    6.1. Summary of Findings ...................................... 91
    6.2. The Search for Better Methods ............................ 91
    6.3. A Unified Approach to the Study of
         Discriminant Functions ................................... 93
    6.4. Applications to Medicine ................................. 94
    6.5. Conclusions .............................................. 95

APPENDIX A.  MATHEMATICAL ANALYSIS
    A.1. Introduction ............................................. 96
    A.2. Asymptotic Expansions and Approximations ................. 96
    A.3. Taylor Series Expansions and Approximations .............. 98
    A.4. Properties of Power Series ............................... 99
    A.5. Hypergeometric Functions ................................. 100

APPENDIX B.  INTERMEDIATE RESULTS OF CHAPTER 2
    Lemma B.1 ..................................................... 101
    Lemma B.2 ..................................................... 105
    Lemma B.3 ..................................................... 117
    Lemma B.4 ..................................................... 118
    Lemma B.5 ..................................................... 119
    Lemma B.6 ..................................................... 121
    Lemma B.7 ..................................................... 123
    Lemma B.8 ..................................................... 123
    Lemma B.9 ..................................................... 125
    Lemma B.10 .................................................... 127

APPENDIX C.  SELECTED PROOFS FROM CHAPTER 2
    Proof of Theorem 2.3.1 ........................................ 129
    Proof of Theorem 2.3.2 ........................................ 134
    Proof of Lemma 2.3.1 .......................................... 146
    Proof of Theorem 2.3.3 ........................................ 149

APPENDIX D.  NUMERICAL PROCEDURES FOR CHAPTER 3
    Lemma D.1 ..................................................... 152
    Lemma D.2 ..................................................... 153
    Lemma D.3 ..................................................... 154
    Lemma D.4 ..................................................... 156
    Lemma D.5 ..................................................... 157
    Lemma D.6 ..................................................... 158

APPENDIX E.  INTERMEDIATE RESULTS FOR CHAPTER 3
    Lemma E.1 ..................................................... 160
    Lemma E.2 ..................................................... 164
    Lemma E.3 ..................................................... 165
    Lemma E.4 ..................................................... 169
    Lemma E.5 ..................................................... 171

BIBLIOGRAPHY ...................................................... 172
CHAPTER 1
INTRODUCTION AND REVIEW OF THE LITERATURE
1.1
Introduction
Discriminant analysis is a collection of techniques which attempt
to classify an unknown observation, or observations, into one of several
distinct populations.
The concept of using statistical procedures for
the purpose of classifying unknowns is intuitively appealing.
In medicine, applications exist in such areas as early diagnosis, prediction of
disease or toxicity, and identification of populations at risk. There
are numerous applications in engineering, education, psychology, economics, criminology, and many other disciplines. Unfortunately, these
classification techniques have not been entirely successful.
A common assumption used to develop the theory of discriminant
functions is that the underlying populations have multivariate normal
(Gaussian) distributions with equal covariance matrices.
One problem is
that many of the data are categorical in nature and could never be
multivariate normal.
Although specific techniques have been developed
which deal exclusively with categorical data, most of the problems in
practice involve a mixture of continuous and discrete data.
Other problems which have not been solved or addressed fully have
contributed to the lack of success.
Many techniques in current use
employ functions of the covariance matrices and assume that the population covariance matrices are equal. This assumption may not be
reasonable in many situations, even when normality holds. Some procedures are inconvenient to compute in practice.
For example, the entire
set of data used to obtain the classification rule may need to be retained to classify future observations.
Incomplete observations occur
frequently but little has been developed to deal with this issue.
This dissertation addresses the problem of unequal covariance
matrices when the data arise from Gaussian distributions with unknown
population parameters.
The recommended discriminant function for this
situation is known as the estimated quadratic discriminant function
(QDF).
Its use is based on its asymptotic optimality.
Likewise, the
estimated linear discriminant function (LDF) is recommended when the
covariances are equal. Since neither the QDF nor the LDF has acceptable
performance using small samples from populations with unequal covariances (Anderson and Bahadur, 1962), a new discriminant function was
proposed that is asymptotically optimal and is derived from the UMVU
estimator of the density function.
This function was called the
estimated unbiased-density discriminant function (UDF).
The aim in
choosing the UDF was to achieve better performance with smaller sample
sizes and as a candidate for future studies that investigate robustness
properties under nonnormality assumptions.
Because the distributions
are extremely complicated, asymptotic expansions were used to compare
the probabilities of misclassification of these three discriminant
functions. The accuracy of the expansions was checked by Monte Carlo
methods.
The development of the concepts of discrimination potential,
efficiency, and deficiency permitted a thorough evaluation of the
statistical properties of the discriminant functions in both
large-sample and moderate-sample situations. The techniques presented
herein have general application to the statistical theory and application of discriminant functions.
1.2
Notation
Some general notation is presented here to facilitate the presentation. Lower case letters correspond to scalar quantities while upper
case letters correspond to matrices. To indicate population parameters,
Greek letters are used. Greek letters are occasionally used to designate sets. Random variables are implied by the context and not explicitly indicated. Expectation, variance, and probability are represented
by E[·], V[·], and P[·], respectively. The transpose of a matrix A is
given by A′, the determinant of A by |A|, the trace of A by tr(A), and
exp[tr(A)] by etr(A). The letter i indicates the complex number √(−1), r
is the dimension of the multivariate sample space, m is the number of
independent columns in a random matrix, n is the sample size, and n_j
represents the proportion of the training sample coming from the Π_j
population. To simplify the notation in a compound algebraic expression, multiplication takes precedence over division. The notation O_p
represents O(n^{-p}), which is defined in Appendix A. Other notation is
developed as needed.
1.3
Some Approaches to Discriminant Analysis
Fisher (1936) was the first to define the classification problem
for two normal populations with covariance matrices that are known and
equal and to propose an optimum solution using a linear combination of
variables.
His solution,

(1.3.1)    A = Σ^{-1}(M_1 − M_2),

where A and the M_i are r×1 vectors and Σ is an r×r matrix, maximizes the separation between two populations, Π_1 and Π_2, when projected into a single dimension, assuming that

(1.3.2)    X_ij ~ N_r(M_i, Σ),   j = 1, 2, ..., n_i,

for i = 1 or 2 (corresponding to Π_1 or Π_2). The classification rule,

(1.3.3)    R_1 = {Z : Z′A > c}   and   R_2 = {Z : Z′A ≤ c},

is to choose Π_1 if R_1 is true or Π_2 if R_2 is true, where Z, an r×1 vector, is an unknown observation from Π_1 or Π_2 and

(1.3.4)    c = ½(M_1 + M_2)′Σ^{-1}(M_1 − M_2).

He suggested substituting the estimators

(1.3.5)    X̄_i = n_i^{-1} Σ_{j=1}^{n_i} X_ij   and

(1.3.6)    Σ̂ = (n_1 + n_2 − 2)^{-1} Σ_{i=1}^{2} Σ_{j=1}^{n_i} (X_ij − X̄_i)(X_ij − X̄_i)′

for M_i and Σ, respectively, when the population parameters are unknown.
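The plug-in version of Fisher's rule can be sketched numerically. The following Python/NumPy fragment (a modern illustration with hypothetical function names, not the dissertation's own computation) estimates A and the midpoint cutoff c from two training samples and applies the rule of (1.3.3):

```python
import numpy as np

def fisher_ldf(x1, x2):
    """Plug-in Fisher rule: estimate A = Sigma^{-1}(M1 - M2) from two
    training samples x1, x2 (each an (n_i, r) array) using the sample
    means (1.3.5) and the pooled covariance estimator (1.3.6)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # Pooled covariance: within-sample scatter matrices over n1 + n2 - 2.
    s1 = (x1 - m1).T @ (x1 - m1)
    s2 = (x2 - m2).T @ (x2 - m2)
    sigma_hat = (s1 + s2) / (n1 + n2 - 2)
    a = np.linalg.solve(sigma_hat, m1 - m2)
    c = 0.5 * (m1 + m2) @ a  # midpoint cutoff between the projected means
    return a, c

def classify(z, a, c):
    """Choose population 1 when Z'A > c, population 2 otherwise."""
    return 1 if z @ a > c else 2
```

An unknown observation Z is then assigned by `classify(z, a, c)`, the sample analogue of choosing Π_1 when R_1 holds.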
Within a few years, Welch (1939) derived the Bayes rule and minimax
Bayes rule for known distributions. The Bayes rule is

(1.3.7)    R_1 = {Z : f_1(Z)/f_2(Z) > c}   and   R_2 = {Z : f_1(Z)/f_2(Z) ≤ c},

where f_1 and f_2 stand for the densities under Π_1 and Π_2, π_i is the proportion belonging to Π_i, and d_ij is the cost of misclassifying into Π_i given that Z is really from Π_j (i ≠ j). The Bayes rule minimizes the total cost of misclassification. The minimax rule is obtained when c satisfies

(1.3.8)    d_21 P[R_2 | Π_1] = d_12 P[R_1 | Π_2],

where R = (R_1, R_2) is the classification rule. When π_1 = π_2 = ½ and
d_12 = d_21, the Bayes rule is equivalent to the likelihood ratio rule.
Anderson (1951, 1958) and Blackwell and Girshick (1954) have made
contributions to this solution. When a 0-1 loss function is used, the
minimax Bayes rule is equivalent to the minimum-Mahalanobis-distance
rule if the underlying distributions are multinormal,

(1.3.9)    R_i = {Z : d(Z, M_i) = min_j d(Z, M_j)},

where d(Z, M_i) = (Z − M_i)′Σ^{-1}(Z − M_i).
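Welch's rule lends itself to a direct numerical sketch. In the fragment below (illustrative only; the densities are passed in as callables), the threshold c = (π_2 d_12)/(π_1 d_21) is the standard minimum-expected-cost cutoff for the two-population problem:

```python
def bayes_classify(z, f1, f2, p1, p2, d12, d21):
    """Welch's Bayes rule (1.3.7): choose population 1 when the likelihood
    ratio f1(z)/f2(z) exceeds c = (p2 * d12) / (p1 * d21), which minimises
    the expected cost of misclassification.  p_i are the prior proportions
    and d_ij is the cost of classifying into i when z is really from j."""
    c = (p2 * d12) / (p1 * d21)
    return 1 if f1(z) / f2(z) > c else 2
```

With equal priors and equal costs, c = 1 and the rule reduces to the likelihood ratio rule, as noted above.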
Other rules have been developed, many of which are nonparametric.
A discussion of these can be found elsewhere (Das Gupta, 1973).
These
rules are based on density estimation, nearest neighbors, distance
between empirical distribution functions, tolerance regions, rank tests,
and empirical Bayes estimation among others.
When the distributions or parameters are unknown, estimates are
substituted into the above rules. These are called plug-in rules. Wald
(1944) suggested using maximum likelihood estimators. Hoel and Peterson
(1949) have shown that under mild assumptions, these estimators are
asymptotically optimal. Others who have contributed to this area
include Fix and Hodges (1951), Das Gupta (1964), Van Ryzin (1966), Bunke
(1967), and Glick (1972).
1.4
Unequal Covariance Matrices
Most of the work related to populations with unequal covariance
matrices has been done under the assumption of multivariate normality.
The Bayes rule in Equation 1.3.7 has been generalized to populations
with unequal covariance matrices. Smith (1947) was the first to study
the problem of unequal covariances in the multivariate normal case. The
rule based on the likelihood ratio is a quadratic function,

(1.4.1)    log(f_1(Z)/f_2(Z)) = ½ log(|Σ_2|/|Σ_1|) − ½(Z − M_1)′Σ_1^{-1}(Z − M_1) + ½(Z − M_2)′Σ_2^{-1}(Z − M_2).
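The quadratic rule (1.4.1) is straightforward to evaluate for known parameters. A minimal sketch (Python/NumPy, names illustrative) in which positive values favour Π_1:

```python
import numpy as np

def qdf(z, m1, sig1, m2, sig2):
    """Log likelihood ratio of (1.4.1) for two multivariate normal
    populations with unequal covariance matrices Sigma_1, Sigma_2.
    Positive values favour population 1."""
    def mahal(z, m, sig):
        d = np.asarray(z) - m
        return d @ np.linalg.solve(sig, d)  # (z-m)' Sigma^{-1} (z-m)
    _, logdet1 = np.linalg.slogdet(sig1)
    _, logdet2 = np.linalg.slogdet(sig2)
    return (0.5 * (logdet2 - logdet1)
            - 0.5 * mahal(z, m1, sig1)
            + 0.5 * mahal(z, m2, sig2))
```

When Σ_1 = Σ_2 the log-determinant term vanishes and the function is linear in Z, recovering the LDF situation.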
Using information theory, Kullback (1952) suggested maximizing the
divergence between two multivariate normal populations with unequal
covariance matrices.
Han (1968, 1969, 1970, 1974) has studied situa-
tions for patterned covariance matrices such as the intraclass correlation model, proportional matrices, and circular matrices.
The optimal
linear function has been developed by Clunies-Ross and Riffenburgh
(1960) and by Anderson and Bahadur (1962).
Asymptotic expansions of the
distributions for the linear and special cases of the quadratic discriminant functions have been obtained by Okamoto (1961, 1963).
Some work
using nonnormal distributions has been done by Cooper (1963, 1965).
He
found that quadratic functions are optimal for multivariate generalizations of the Pearson Type II and Type VII distributions.
For the
equal-means case, results have been obtained by Okamoto (1961) and
Bartlett and Please (1963).
Simulations have been conducted to study the performance of discriminant functions in the multivariate normal situation.
Gilbert (1969)
used proportional covariance matrices and found that Fisher's linear
discriminant function performs well based on optimum error rates when
the difference between means is large and when the covariance matrices
are nearly equal.
Marks and Dunn (1974) compared the LDF and the "best"
linear discriminant function (BLDF) to the optimal quadratic discriminant function using probabilities of misclassification. For large sample
sizes, they found that the QDF performs better than the LDF for large
covariance differences but does only slightly better for small differences. The QDF does worse than the LDF for small sample sizes and small
covariance differences. When the covariance differences are large and
the sample sizes small, the BLDF is better than the LDF but the QDF is
usually better than the LDF.
The above findings suggest that estimated discriminant functions
behave differently depending on the sample sizes and population parameters.
This demonstrates a need for further study of the causes of
these variations.
Comparisons with the optimal discriminant functions
give a good measure of performance of the estimated functions.
1.5
Procedures for Evaluating the Discriminant Function
Lachenbruch (1975) described three characteristics which can be
used to evaluate the performance of a discriminant function.
These
include testing for between-group differences, testing for a sufficient
subset of variables, and estimation of error probabilities.
Only the
latter will be described here.
Five methods are described by Giri (1977) for estimating error
probabilities, or probabilities of misclassification. Let π_1 be the
probability of classifying an unknown observation into population Π_2
given that the true classification is population Π_1, and let π_2 be the
probability of classifying into Π_1 given Π_2. The variables v and S
will be defined as follows to simplify the discussion:

(1.5.1)    v(Z) = [Z − ½(X̄_1 + X̄_2)]′S^{-1}(X̄_1 − X̄_2)

and

(1.5.2)    S = (n_1 + n_2 − 2)^{-1} Σ_{i=1}^{2} Σ_{j=1}^{n_i} (X_ij − X̄_i)(X_ij − X̄_i)′,

where X̄_i is defined by (1.3.5), X_ij is the jth observation from Π_i, and
Z is an unknown observation. Estimators of π_1 and π_2 are P_1 and P_2,
respectively.
respectively.
Method 1 (apparent error) uses the discriminant function constructed from the sample of known observations to classify each observation from the same sample.
The proportion of misclassified observa-
tions is an estimate of the misclassification error.
This method is
crude and underestimates the error.
Method 2 (estimated actual error) uses the Mahalanobis distance, δ,
and misclassification errors are based on the normal distributions
N_r(M_i, Σ), where

(1.5.3)    δ² = (M_1 − M_2)′Σ^{-1}(M_1 − M_2),

(1.5.4)    π_1 = Φ(−δ/2),

(1.5.5)    π_2 = 1 − Φ(δ/2),

and Φ is the standard normal cumulative distribution function. The
estimators P_1 and P_2 are obtained by estimating δ and substituting.
When the estimator

(1.5.6)    d² = (X̄_1 − X̄_2)′S^{-1}(X̄_1 − X̄_2)

is used, the error is underestimated because d is biased. An unbiased
estimator of δ² is

(1.5.7)    d*² = [(n_1 + n_2 − r − 3)/(n_1 + n_2 − 2)] d² − r(n_1^{-1} + n_2^{-1}).

The misclassification errors are estimated by substituting d* for δ in
(1.5.4) and (1.5.5).
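Method 2 can be sketched numerically as follows (Python/NumPy/SciPy; the function name is illustrative, and the bias-correction constants are the usual unbiased adjustment for the squared distance, stated here as an assumption rather than the dissertation's own derivation):

```python
import numpy as np
from scipy.stats import norm

def method2_errors(x1, x2):
    """Method 2 (estimated actual error): estimate the squared Mahalanobis
    distance from the training samples, plug it into Phi(-delta/2), and
    also apply a bias-corrected distance estimate.
    Returns (naive_error, corrected_error)."""
    n1, n2 = len(x1), len(x2)
    r = x1.shape[1]
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # Pooled covariance estimate.
    s = ((x1 - m1).T @ (x1 - m1) + (x2 - m2).T @ (x2 - m2)) / (n1 + n2 - 2)
    d2 = (m1 - m2) @ np.linalg.solve(s, m1 - m2)   # biased estimate
    # Unbiased adjustment (shrink, then subtract a dimension penalty).
    d2_star = ((n1 + n2 - r - 3) / (n1 + n2 - 2)) * d2 - r * (1 / n1 + 1 / n2)
    d2_star = max(d2_star, 0.0)
    return norm.cdf(-np.sqrt(d2) / 2), norm.cdf(-np.sqrt(d2_star) / 2)
```

Since the correction shrinks the distance estimate, the corrected error is never smaller than the naive one, which is the direction of the bias described above.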
For Method 3 (jackknifing), v is calculated with one observation
omitted and the omitted observation is then classified using that
particular v.
The misclassification error is the proportion of obser-
vations misclassified after each observation has been omitted once.
Jackknifing is a technique used to reduce bias.
This technique is
thought to be insensitive to departures from normality but this is
untested.
Method 4 (leaving-one-out) is similar to Method 3. The estimated
misclassification probabilities are given by

(1.5.8)    P_1 = Φ(−u_1/s_1)

and

(1.5.9)    P_2 = Φ(u_2/s_2),

where

(1.5.10)    u_i = n_i^{-1} Σ_{j=1}^{n_i} v(X_ij)

and

(1.5.11)    s_i² = (n_i − 1)^{-1} Σ_{j=1}^{n_i} [v(X_ij) − u_i]².

The function v(X_ij) is v with observation X_ij omitted, as in Method 3.
Giri suggests substituting the pooled estimate of the variance for s_i².
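The hold-one-out refitting that underlies Methods 3 and 4 can be sketched generically (Python/NumPy; `make_rule` and the nearest-mean rule used in the test are illustrative assumptions, not the dissertation's procedure):

```python
import numpy as np

def holdout_error(x1, x2, make_rule):
    """Jackknife-style error estimate (Method 3): for each training
    observation, refit the classification rule with that observation left
    out, classify it with the refitted rule, and report the overall
    misclassification fraction.  make_rule(x1, x2) must return a
    callable z -> 1 or 2."""
    wrong = total = 0
    for pop in (1, 2):
        xa = x1 if pop == 1 else x2
        for j in range(len(xa)):
            reduced = np.delete(xa, j, axis=0)
            # Refit with the j-th observation of population `pop` omitted.
            rule = make_rule(reduced, x2) if pop == 1 else make_rule(x1, reduced)
            wrong += int(rule(xa[j]) != pop)
            total += 1
    return wrong / total
```

Because each observation is classified by a rule that never saw it, this estimate avoids the optimism of the apparent error of Method 1.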
Method 5 uses the asymptotic expansion of the distribution of v.
When the dimension is small, this method requires only a moderate sample
size; but, when the dimension is large, a very large sample is needed to
obtain a good approximation for an approximation error of the order n^{-1}.
Okamoto has derived approximations for π_1 and π_2 with an approximation
error of order n^{-3}.
Lachenbruch and Mickey (1968) studied these methods. Methods 1 and
2 were found to give relatively poor results. If approximate normality
holds, Methods 4 and 5 do better, but Method 5 requires large sample
sizes. Methods 3 and 4 appear to be useful for all sample sizes but do
better with larger samples.
1.6
Density Estimation
The estimation of densities is useful when the classification
procedure is the likelihood ratio rule, i.e.,

(1.6.1)    R_1 = {Z : f̂_1(Z)/f̂_2(Z) > 1}   and   R_2 = {Z : f̂_1(Z)/f̂_2(Z) ≤ 1}.
The usual methods of estimation such as maximum likelihood, method of
moments, and uniformly minimum variance unbiased can be used to obtain
estimators of the density function when the distribution is known.
These techniques are described in many textbooks of statistics.
Distribution-free methods have been developed and are described by
Wegman (1972) and by Das Gupta (1973).
The maximum likelihood estimators (MLE) and the uniformly minimum
variance unbiased estimators (UMVUE) are used most frequently because of
their properties.
MLE's are usually easier to compute than UMVUE's.
UMVUE's are always unbiased while MLE's are usually biased.
A function
of MLE's is an MLE while a function of UMVUE's is generally not a UMVUE.
UMVUE's are always consistent and MLE's are usually so.
Minimum vari-
ance is attained for all sample sizes with UMVUE's but is attained only
asymptotically for MLE's.
The MLE of the density f(Z, θ) is f(Z, θ̂), where θ̂ is a vector of
MLE's of the parameters, θ, of the distribution. For the Gaussian
distribution, the MLE is

(1.6.2)    f̂(Z, θ̂) = (2π)^{-½r} |Σ̂|^{-½} etr[−½Σ̂^{-1}(Z − X̄)(Z − X̄)′],

where X̄ is the sample mean and

(1.6.3)    Σ̂ = n^{-1} Σ_{i=1}^{n} (X_i − X̄)(X_i − X̄)′.

The logarithm of the likelihood ratio is essentially the QDF in this
situation.
The UMVUE of the density could be used instead of the MLE to eliminate bias and to improve efficiency with small samples. Although these
estimators do not exist in many cases or may be extremely complicated to
use, Ghurye and Olkin (1969) have derived UMVUE's for some common multivariate densities. Their method applies to any distribution which
admits sufficient statistics. In particular, the UMVUE of the multivariate normal density with unknown mean and covariance matrix is shown
to be
(1.6.4)    f̂(Z) = π^{-½r} [n/(n−1)]^{½r} {Γ[½(n−1)]/Γ[½(n−r−1)]} |S|^{-½(n−r−2)} {g[S − n(n−1)^{-1}(Z − X̄)(Z − X̄)′]}^{½(n−r−3)},

where

(1.6.5)    S = Σ_{i=1}^{n} (X_i − X̄)(X_i − X̄)′,

(1.6.6)    X̄ = n^{-1} Σ_{i=1}^{n} X_i,

(1.6.7)    g(A) = |A| if A is positive definite, and g(A) = 0 otherwise,

and Γ(x) is defined by A.2.5.
The discriminant function based on the
logarithm of the likelihood ratio of the density estimators simplifies
to

(1.6.8)    log[f̂_1(Z)/f̂_2(Z)] = log{Γ[½(n_1−1)]Γ[½(n_2−r−1)]}
               − log{Γ[½(n_2−1)]Γ[½(n_1−r−1)]}
               + ½r·log[n_1(n_2−1)/n_2(n_1−1)]
               + ½ log(|S_2|/|S_1|)
               + ½(n_1−r−3) log[1 − n_1(n_1−1)^{-1}(Z − X̄_1)′S_1^{-1}(Z − X̄_1)]
               − ½(n_2−r−3) log[1 − n_2(n_2−1)^{-1}(Z − X̄_2)′S_2^{-1}(Z − X̄_2)]

when the argument of g is positive definite in both the numerator and
the denominator of the ratio.
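Evaluating this log ratio can be sketched numerically (Python/NumPy/SciPy; the function names and the positivity guard are illustrative assumptions). Since S − n(n−1)^{-1}(Z − X̄)(Z − X̄)′ differs from S by a rank-one term, positive definiteness reduces to the scalar condition n(n−1)^{-1}(Z − X̄)′S^{-1}(Z − X̄) < 1:

```python
import numpy as np
from scipy.special import gammaln

def udf(z, x1, x2):
    """Evaluate the log ratio (1.6.8) of the UMVU density estimates from
    two training samples x1, x2 (each an (n_i, r) array).  Returns None
    when either g-argument fails to be positive definite, i.e. when an
    estimated density vanishes and the ratio is undefined."""
    def pieces(x):
        n, r = x.shape
        xbar = x.mean(axis=0)
        s = (x - xbar).T @ (x - xbar)      # sum-of-squares matrix S, (1.6.5)
        d = np.asarray(z) - xbar
        q = n / (n - 1) * (d @ np.linalg.solve(s, d))
        return n, r, np.linalg.slogdet(s)[1], q
    n1, r, logdet_s1, q1 = pieces(x1)
    n2, _, logdet_s2, q2 = pieces(x2)
    if q1 >= 1.0 or q2 >= 1.0:
        return None
    return (gammaln(0.5 * (n1 - 1)) + gammaln(0.5 * (n2 - r - 1))
            - gammaln(0.5 * (n2 - 1)) - gammaln(0.5 * (n1 - r - 1))
            + 0.5 * r * np.log(n1 * (n2 - 1) / (n2 * (n1 - 1)))
            + 0.5 * (logdet_s2 - logdet_s1)
            + 0.5 * (n1 - r - 3) * np.log1p(-q1)
            - 0.5 * (n2 - r - 3) * np.log1p(-q2))
```

Positive values favour Π_1; the log-gamma terms cancel when n_1 = n_2, leaving the data-dependent terms to drive the classification.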
1.7
Research Plan
Three discriminant functions were compared on the basis of the
asymptotic expansions of the misclassification probabilities under
conditions of heteroscedasticity and multivariate normality with unknown
population parameters.
These functions are the LDF
(1.3.3), the QDF (1.4.1), and the new function (UDF) defined above (1.6.8).
The purpose
of this research was to build a foundation for the development of better
approaches to discriminant analysis and to evaluate a new discriminant
function (UDF) in comparison to the standard functions LDF and QDF.
The decision to use asymptotic expansions with respect to n^{-1} was
based on three reasons:
closed-form characteristic functions were not
readily available, the first order term was needed to compute the deficiency of asymptotically equivalent discriminant functions, and the
second order term was predicted to give additional information about the
small sample properties. Extensive use of certain areas of mathematical
theory was necessary to develop the asymptotic expansions.
These areas
included zonal polynomials, hypergeometric functions, Laguerre polynomials, Laplace and Fourier transforms.
Since the first three are
somewhat uncommon in the statistical literature, some background is
provided in later chapters. The latter two are common and are not
discussed.
Some relevant elements of mathematical analysis are pre-
sented in Appendix A.
Chapter 2 describes the development of the asymptotic expansions of
the characteristic functions for the LDF, QDF, and UDF to an approximation error of order n^{-3}.
A unified approach employing generalizations
of zonal and Laguerre polynomials was used. A polynomial of multiple
matrix arguments was developed to facilitate the unification.
It is
expected that this approach will apply to any competing discriminant
function that is a function of the sufficient statistics.
Some tech-
nical details and lengthy proofs from this chapter are located in
Appendices Band C.
In Chapter 3 the inversion of the characteristic functions was used
to obtain representations of the asymptotic expansions for the probabilities of misclassification for these discriminant functions.
An assessment of the approximation errors of the asymptotic expansions of order
n^{-1} was made using Monte Carlo methods. Technical details required for
the inversions are presented in Appendices D and E.
Comparisons of the discriminant functions are presented in Chapter 4.
Measures of efficiency and deficiency were defined and used for
these comparisons.
Under the assumption of equal covariances, defi-
ciencies were computed for the QDF and UDF relative to the LDF.
To
assess the small sample properties, sample size estimates to attain a
given level of efficiency were calculated.
For unequal covariances,
evaluations consisted of the comparison of the asymptotic relative
efficiency of the LDF and QDF and the computation of the deficiencies of
the LDF and UDF relative to the QDF.
Chapter 5 gives an application of discriminant functions in the
area of medical diagnosis using clinical laboratory data.
The dis-
criminant functions are compared using a full set of clinical chemistry
tests as well as a subset of tests.
Directions for future research are discussed in Chapter 6.
Many
aspects of discriminant functions are still unknown because the distribution theory is very complicated.
Several alternative discriminant
functions are suggested for further evaluation.
CHAPTER 2
ASYMPTOTIC EXPANSIONS OF THE CHARACTERISTIC FUNCTIONS
2.1
Introduction
The characteristic function is a very useful tool for studying
distribution theory. It is defined as the Fourier transform given by
φ(t) = E[e^{itX}], where X is the random variable of interest. In this
chapter, the characteristic functions are derived for the three discriminant functions: LDF, QDF, and UDF.
Since these characteristic func-
tions are very complicated, asymptotic expansions were used to obtain
practical results.
The derivation of these expansions required the use
of several mathematical concepts, such as power series and hypergeometric functions, which are described below and in Appendix A.
In
short, the asymptotic expansions used here are unique convergent or
divergent power series in terms of n^{-1}, truncated in such a way as to
provide reasonable approximations to the functions from which they are
derived when n is sufficiently large.
The development of zonal polynomials by James (1961, 1964) and
Constantine (1963, 1966) has provided a very powerful tool for studying
exact distributions and their asymptotic expansions when sampling from
multivariate normal distributions.
From a practical point of view, the
zonal polynomial C_κ(Z) can be defined as a symmetric homogeneous polynomial of degree k in the eigenvalues of the r×r matrix Z, where κ is a
lexicographic partition of k. That is, κ = (k_1, ..., k_r) such that
k_1 ≥ k_2 ≥ ··· ≥ k_r and k = Σ_j k_j. A more extensive discussion of these
polynomials is given by Muirhead (1982).
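The index set over which these polynomials range can be enumerated directly. The following short sketch (illustrative Python, not part of the dissertation's numerical work) generates the partitions of k in the reverse-lexicographic order commonly used to index C_κ(Z):

```python
def partitions(k, largest=None):
    """Generate the partitions kappa = (k_1 >= k_2 >= ...) of k, the
    index set of the zonal polynomials C_kappa(Z) of degree k, in
    reverse-lexicographic order."""
    if largest is None:
        largest = k
    if k == 0:
        yield ()
        return
    # Choose the leading part first, then partition the remainder with
    # parts no larger than it, preserving the weakly decreasing order.
    for first in range(min(k, largest), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest
```

For example, `list(partitions(4))` yields (4), (3,1), (2,2), (2,1,1), (1,1,1,1), the five partitions of 4.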
Hypergeometric functions of matrix argument were first defined by
Herz (1955). Constantine (1963) defined these functions in terms of
zonal polynomials, i.e.,

(2.1.1)    _pF_q(a_1, ..., a_p; b_1, ..., b_q; Z) = Σ_{k=0}^{∞} Σ_κ {[(a_1)_κ ··· (a_p)_κ] / [(b_1)_κ ··· (b_q)_κ]} C_κ(Z)/k!,

where

(2.1.2)    (a)_κ = Π_{j=1}^{r} (a − ½(j−1))_{k_j}.

Some special cases of the hypergeometric functions are

(2.1.3)    _0F_0(Z) = etr(Z),

(2.1.4)    _1F_0(a; Z) = |I − Z|^{-a},

(2.1.5)    _pF_q(a_1, ..., a_p; b_1, ..., b_q; X, Y) = ∫_{O(m)} _pF_q(a_1, ..., a_p; b_1, ..., b_q; XHYH′) dH,

where dH is the normalized invariant (Haar) measure on the orthogonal
group O(m) of matrices.
Recently the zonal polynomials have been extended by Davis (1979,
1980) to a class of polynomials with two r×r matrix arguments. These
polynomials are invariant under the simultaneous transformations

X → H′XH,   Y → H′YH,   H ∈ O(r).

Chikuse (1980) has generalized these polynomials to any number of matrix
arguments such that they are invariant under the transformations

X_j → H′X_jH,   H ∈ O(r),   j = 1, ..., s.
The following properties hold:

(2.1.6)    ∫_{O(r)} Π_{j=1}^{s} C_{κ_j}(A_j H′X_j H) dH = Σ_{φ∈κ_1·…·κ_s} C_φ^{κ_1,…,κ_s}(A_1, …, A_s) C_φ^{κ_1,…,κ_s}(X_1, …, X_s) / C_φ(I)

and

(2.1.7)    Π_{j=1}^{s} C_{κ_j}(X_j) = Σ_{φ∈κ_1·…·κ_s} θ_φ^{κ_1,…,κ_s} C_φ^{κ_1,…,κ_s}(X_1, …, X_s),

where

(2.1.8)    θ_φ^{κ_1,…,κ_s} = C_φ^{κ_1,…,κ_s}(I, …, I) / C_φ(I).
The generalized Laguerre polynomial is another special function which has considerable utility in multivariate distribution theory. The original work was done by Herz (1955). Since that time several different generalizations have been proposed. These include the following:

(2.1.9)  L_\kappa^\gamma(X) = \mathrm{etr}(X) \int_{R>0} \mathrm{etr}(-R)\,|R|^\gamma\, C_\kappa(R)\, A_\gamma(RX)\,dR  (Constantine, 1966),

(2.1.10)  L_\kappa^\gamma(X, Y) = \mathrm{etr}(X) \int_{R>0} \mathrm{etr}(-R)\,|R|^\gamma\, C_\kappa(RY)\, A_\gamma(RX)\,dR  (Khatri, 1977),

(2.1.11)  L_\kappa^\gamma(X, Y, Z) = \mathrm{etr}(X + ZZ') \int\!\!\int_{R>0} \mathrm{etr}(-R)\,\mathrm{etr}(-UU' - 2iZU')\,|R|^\gamma\, C_\kappa(RUYU')\, A_\gamma(RX)\,dR\,dU  (Khatri, 1977),

(2.1.12)  L_{\kappa,\lambda;\phi}^{\gamma,\delta}(X, Y) = \mathrm{etr}(X + Y) \int_{R>0}\int_{S>0} \mathrm{etr}(-R - S)\,|R|^\gamma |S|^\delta\, C_\phi^{\kappa,\lambda}(R, S)\, A_\gamma(RX)\, A_\delta(SY)\,dS\,dR  (Davis, 1979),

(2.1.13)  L_{\kappa,\lambda;\phi}^{\gamma}(X, Y) = \mathrm{etr}(X) \int_{R>0} \mathrm{etr}(-R)\,|R|^\gamma\, C_\phi^{\kappa,\lambda}(R, Y)\, A_\gamma(RX)\,dR  (Davis, 1979).
These polynomials have at least two important properties. One is that they are orthogonal with respect to a specific weight function. Secondly, each can be expressed in terms of the invariant polynomials described above.

The development of a class of polynomials analogous to the Laguerre polynomials discussed above was useful in providing a more unified approach to the three discriminant functions in this paper. Even though these polynomials may be another form of the Laguerre type, the establishment of that fact was not essential for the theory below and will not be presented here.
This polynomial was defined as

(2.1.14)  P_\phi^{\kappa_1,\ldots,\kappa_s}(A; B_1,\ldots,B_s) = \pi^{-\frac{1}{2}rm}\,\mathrm{etr}(-AA') \int \mathrm{etr}(-ZZ' + 2AZ')\, C_\phi^{\kappa_1,\ldots,\kappa_s}(B_1ZZ',\ldots,B_sZZ')\,dZ.

The next two lemmas provide the tools for alternative representations of this polynomial.
Lemma 2.1.1

The following relationship holds:

(2.1.15)  \int_{O(r)} C_\phi^{\kappa_1,\ldots,\kappa_s}(A_1HXH',\ldots,A_sHXH')\,dH = C_\phi^{\kappa_1,\ldots,\kappa_s}(A_1,\ldots,A_s)\, C_\phi(X)/C_\phi(I),

where dH is the invariant Haar measure over the group of r \times r orthogonal matrices.
Proof of Lemma 2.1.1

Equations 1.1 and 2.7 of Davis (1979) state that

(2.1.16)  \int_{O(r)} C_{\kappa_1}(A_1HX_1H') \cdots C_{\kappa_s}(A_sHX_sH')\,dH = \sum_{\phi \in \kappa_1 \cdot \ldots \cdot \kappa_s} C_\phi^{\kappa_1,\ldots,\kappa_s}(A_1,\ldots,A_s)\, C_\phi^{\kappa_1,\ldots,\kappa_s}(X_1,\ldots,X_s)/C_\phi(I)

and

(2.1.17)  C_\phi^{\kappa_1,\ldots,\kappa_s}(X,\ldots,X) = \theta_\phi^{\kappa_1,\ldots,\kappa_s}\, C_\phi(X).

This implies that

(2.1.18)  \int_{O(r)} \prod_{j=1}^{s} C_{\kappa_j}(A_jHXH')\,dH = \sum_{\phi \in \kappa_1 \cdot \ldots \cdot \kappa_s} \theta_\phi^{\kappa_1,\ldots,\kappa_s}\, C_\phi^{\kappa_1,\ldots,\kappa_s}(A_1,\ldots,A_s)\, C_\phi(X)/C_\phi(I).

Comparison of the coefficients gives the result.  Q.E.D.
Lemma 2.1.2

If A is an r \times m complex matrix, B_1,\ldots,B_s are complex r \times r matrices, and P_\phi^{\kappa_1,\ldots,\kappa_s}(A; B_1,\ldots,B_s) is defined by Equation 2.1.14, then

(2.1.19)  P_\phi^{\kappa_1,\ldots,\kappa_s}(A; B_1,\ldots,B_s) = (-1)^f\, 2^{\frac{1}{2}r(r-1)}\, (2\pi i)^{-\frac{1}{2}r(r+1)}\, (\tfrac{1}{2}m)_\phi\, \Gamma_r(\tfrac{1}{2}m)\, \mathrm{etr}(AA') \int_{\mathrm{Re}(S)>Y_0} \mathrm{etr}(S)\,|S|^{-\frac{1}{2}m}\, C_\phi^{\kappa_1,\ldots,\kappa_s}(\cdots)\,dS,

where the C_\phi^{\kappa_1,\ldots,\kappa_s} are the invariant polynomials defined by Davis (1980) and Chikuse (1980).
Proof of Lemma 2.1.2

The orthogonal transformation Z \to ZH' gives

(2.1.20)  \cdots.

Equation (29) of Constantine (1963) produces

(2.1.21)  \cdots \int_{\mathrm{Re}(T)>X_0}\!\!\int \mathrm{etr}(T)\,|T|^{-\frac{1}{2}m}\,|I - T^{-1}AA'|^{-\frac{1}{2}m}\,\mathrm{etr}(-ZZ')\, C_\phi^{\kappa_1,\ldots,\kappa_s}\big[B_1(I - T^{-1}AA')^{-\frac{1}{2}}B_1'(I - T^{-1}AA')^{-\frac{1}{2}}ZZ', \ldots, B_s(I - T^{-1}AA')^{-\frac{1}{2}}B_s'(I - T^{-1}AA')^{-\frac{1}{2}}ZZ'\big]\,dZ\,dT.

The application of another orthogonal transformation, Z \to HZ using Lemma 2.1.1, and the integration with respect to Z using the relationship given by Khatri (1977),

(2.1.22)  \pi^{-\frac{1}{2}rm} \int \mathrm{etr}(-ZZ')\, C_\kappa(XZYZ')\,dZ = (\tfrac{1}{2}m)_\kappa\, C_\kappa(X)\,C_\kappa(Y)/C_\kappa(I),

produces

(2.1.23)  P_\phi^{\kappa_1,\ldots,\kappa_s}(A; B_1,\ldots,B_s) = (-1)^f\, 2^{\frac{1}{2}r(r-1)}\, (2\pi i)^{-\frac{1}{2}r(r+1)}\, (\tfrac{1}{2}m)_\phi\, \Gamma_r(\tfrac{1}{2}m) \int_{\mathrm{Re}(T)>X_0} \mathrm{etr}(T)\,|T|^{-\frac{1}{2}m}\,|I - T^{-1}AA'|^{-\frac{1}{2}m}\, C_\phi^{\kappa_1,\ldots,\kappa_s}(\cdots)\,dT.

The result follows from the transformation T \to S + AA'.  Q.E.D.
The calculation of a selected group of these polynomials in terms of traces is given in the appendix (Lemma B.2). Since the present application required only polynomials of degree 4, the higher-order polynomials were not derived. In the interest of time and labor, the polynomials derived from invariant polynomials with more than one matrix argument were restricted to matrices of rank one.
In this study three discriminant functions were examined. It was assumed that only two equally likely populations, \pi_1 and \pi_2, were sampled, having multivariate normal distributions with unequal means and unequal covariance matrices. The following analogs of 1.3.3, 1.4.1, and 1.6.9 were used:

(2.1.24)  D_Q(Z) = \tfrac{1}{2}mr \log[(n_1-1)/(n_2-1)] + \tfrac{1}{2}m \log(|S_2|/|S_1|) - \tfrac{1}{2}(n_1-1)\,\mathrm{tr}[S_1^{-1}(Z-\bar X)(Z-\bar X)'] + \tfrac{1}{2}(n_2-1)\,\mathrm{tr}[S_2^{-1}(Z-\bar Y)(Z-\bar Y)'],
(2.1.26)  D_U(Z) = \log\{\Gamma_r[\tfrac{1}{2}(n_1-m-1)]\,\Gamma_r[\tfrac{1}{2}(n_2-1)]/\Gamma_r[\tfrac{1}{2}(n_2-m-1)]\,\Gamma_r[\tfrac{1}{2}(n_1-1)]\} + \tfrac{1}{2}r \log[n_1(n_2-1)/n_2(n_1-1)] + \tfrac{1}{2}m \log(|S_2|/|S_1|) - \tfrac{1}{2}(n_1-r-m-2)\log|I - n_1(n_1-1)^{-1}S_1^{-1}(Z-\bar X)(Z-\bar X)'| + \tfrac{1}{2}(n_2-r-m-2)\log|I - n_2(n_2-1)^{-1}S_2^{-1}(Z-\bar Y)(Z-\bar Y)'|  for I_{\Delta_1}I_{\Delta_2} = 1,
  = +\infty  for I_{\Delta_1} = 1 and I_{\Delta_2} = 0,
  = -\infty  for I_{\Delta_1} = 0 and I_{\Delta_2} = 1,
  = undefined  for I_{\Delta_1} = 0 and I_{\Delta_2} = 0,
where

X_j \sim N_{rm}(M_1, I_m \otimes \Sigma_1)  (j = 1,\ldots,n_1),
Y_k \sim N_{rm}(M_2, I_m \otimes \Sigma_2)  (k = 1,\ldots,n_2),
Z \sim N_{rm}(M_0, I_m \otimes \Sigma_0),
\bar X = n_1^{-1} \sum_{j=1}^{n_1} X_j,
\bar Y = n_2^{-1} \sum_{k=1}^{n_2} Y_k,
S_1 = \sum_{j=1}^{n_1} (X_j - \bar X)(X_j - \bar X)',
S_2 = \sum_{k=1}^{n_2} (Y_k - \bar Y)(Y_k - \bar Y)',
\Delta_1 = \{(Z, \bar X, S_1) : |S_1 - n_1(n_1-1)^{-1}(Z-\bar X)(Z-\bar X)'| > 0\},
\Delta_2 = \{(Z, \bar Y, S_2) : |S_2 - n_2(n_2-1)^{-1}(Z-\bar Y)(Z-\bar Y)'| > 0\},

I_\Delta is the indicator function for the set \Delta, \otimes is the Kronecker product operator, r is the dimension of the sample space, and m is the number of independent columns in the random matrix.
In subsequent sections an integral form of the exact characteristic functions for D_L and D_Q is presented. The asymptotic expansions of the characteristic functions for all three discriminant functions were derived. The optimal discriminant function for populations from multivariate Gaussian distributions with known means and known unequal covariances,

(2.1.27)  D_0(Z) = \tfrac{1}{2}m \log(|\Sigma_2|/|\Sigma_1|) - \tfrac{1}{2}\mathrm{tr}[\Sigma_1^{-1}(Z-M_1)(Z-M_1)'] + \tfrac{1}{2}\mathrm{tr}[\Sigma_2^{-1}(Z-M_2)(Z-M_2)'],

was used as a basis for comparison.
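The optimal rule assigns z to \pi_1 when D_0(z) > 0 under equal priors. A minimal sketch of that rule for m = 1 and diagonal covariances (the setting of the tables in Chapter 3) follows; the function and variable names are illustrative assumptions, not from the dissertation.

```python
import math

# Sketch of the optimal discriminant D_0 of (2.1.27) for m = 1 and
# diagonal covariances; with equal priors, z is assigned to pi_1 when
# D_0(z) > 0.
def D0(z, m1, m2, s1, s2):
    """s1, s2 are the diagonals of Sigma_1, Sigma_2."""
    val = 0.5 * sum(math.log(b / a) for a, b in zip(s1, s2))
    val -= 0.5 * sum((zi - mi) ** 2 / a for zi, mi, a in zip(z, m1, s1))
    val += 0.5 * sum((zi - mi) ** 2 / b for zi, mi, b in zip(z, m2, s2))
    return val

# Equal covariances: D_0 reduces to the population linear rule and the
# decision boundary is the hyperplane midway between the means.
m1, m2 = [0.0, 0.0], [1.0, 0.0]
s = [1.0, 1.0]
print(D0([0.5, 0.0], m1, m2, s, s))  # 0.0 on the midpoint boundary
```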
2.2  Characteristic Functions

The derivation of the characteristic functions in an integral form was successful for D_L and D_Q but not for D_U. With simple transformations, asymptotic expansions were readily obtained from these integrals. For D_U, an approximation was necessary at an early stage of the derivation; therefore, the development was restricted to m = 1 to keep the algebraic manipulations to a minimum.

The first lemma provides a relationship upon which all of the asymptotic expansions of the characteristic functions are based. It was used in a paper by Hsu (1940).
Lemma 2.2.1

If A is an r \times r real symmetric matrix, X is a real r \times m matrix, T is a complex r \times m matrix with r > m, and \alpha is a complex scalar, then

(2.2.1)  |A|^{-\frac{1}{2}m}\,\mathrm{etr}(\tfrac{1}{2}\alpha^2 A^{-1}XX') = (2\pi)^{-\frac{1}{2}rm} \int \mathrm{etr}(-\tfrac{1}{2}ATT' + \alpha XT')\,dT.

Proof of Lemma 2.2.1

The result follows from completing the square on the right-hand side and noting that

(2.2.2)  (2\pi)^{-\frac{1}{2}rm} \int \mathrm{etr}(-\tfrac{1}{2}ATT')\,dT = |A|^{-\frac{1}{2}m}.  Q.E.D.
The next three lemmas provide the foundation for each discriminant function derivation, respectively.

Lemma 2.2.2

If S is an r \times r random matrix with the Wishart distribution W_r(\nu, \Sigma), Y is an r \times m random matrix with stochastically independent columns each having the multivariate normal distribution N_r(M_j, \gamma\Sigma) (j = 1,\ldots,m), \alpha and \beta are complex scalar constants, M is an r \times m matrix with the vectors M_j as columns, and S and Y are independent, then

(2.2.3)  E[|S|^\alpha\,\mathrm{etr}(\beta S^{-1}YY')] = \{\Gamma_r[\tfrac{1}{2}(\nu+m+2\alpha)]/\Gamma_r(\tfrac{1}{2}\nu)\}\,|2\Sigma|^\alpha\,\pi^{-\frac{1}{2}rm} \int |I + RR'|^{-\frac{1}{2}(\nu+m+2\alpha)}\,\mathrm{etr}(\beta\gamma RR' + \cdots RM'\Sigma^{-\frac{1}{2}})\,dR.

Proof of Lemma 2.2.2

From Lemma 2.2.1,

(2.2.4)  E[|S|^\alpha\,\mathrm{etr}(\beta S^{-1}YY')] = E[(2\pi)^{-\frac{1}{2}rm} \int |S|^{\alpha+\frac{1}{2}m}\,\mathrm{etr}(-\tfrac{1}{2}STT' + \cdots YT')\,dT].

The independence of S and Y gives

(2.2.5)  E[|S|^\alpha\,\mathrm{etr}(\beta S^{-1}YY')] = \int\!\!\int_{S>0}\!\!\int [(2\pi)^{-\frac{1}{2}rm}|S|^{\alpha+\frac{1}{2}m}\,\mathrm{etr}(-\tfrac{1}{2}STT' + \cdots YT')] \cdot [\{\Gamma_r(\tfrac{1}{2}\nu)\}^{-1}\,|2\Sigma|^{-\frac{1}{2}\nu}\,|S|^{\frac{1}{2}(\nu-r-1)}\,\mathrm{etr}(-\tfrac{1}{2}\Sigma^{-1}S)] \cdot [(2\pi)^{-\frac{1}{2}mr}|\gamma\Sigma|^{-\frac{1}{2}}\,\mathrm{etr}\{-\tfrac{1}{2}(\gamma\Sigma)^{-1}(Y-M)(Y-M)'\}]\,dY\,dS\,dT.

Following a change of variables, A = \Sigma^{-\frac{1}{2}}S\Sigma^{-\frac{1}{2}} (\Sigma^{\frac{1}{2}} symmetric), B = (\gamma\Sigma)^{-\frac{1}{2}}Y, and R = \Sigma^{\frac{1}{2}}T, with the corresponding Jacobian

(2.2.6)  J(S,Y,T \to A,B,R) = |\Sigma|^{\frac{1}{2}(r+1)}\,|\gamma\Sigma|^{\frac{1}{2}}\,|\Sigma|^{-\frac{1}{2}} = \gamma^{-\frac{1}{2}r}\,|\Sigma|^{\frac{1}{2}(r+1)},

and some rearrangement of terms, the expression becomes

(2.2.7)  E[|S|^\alpha\,\mathrm{etr}(\beta S^{-1}YY')] = (2\pi)^{-\frac{1}{2}r}\,[\Gamma_r(\tfrac{1}{2}\nu)]^{-1}\,|\Sigma|^\alpha\, 2^{-\frac{1}{2}\nu r} \int\!\!\int_{A>0} |A|^{\frac{1}{2}(\nu-r+m+2\alpha-1)}\,\mathrm{etr}[-\tfrac{1}{2}(I + RR')A]\,\mathrm{etr}(\beta\gamma RR' + \cdots \Sigma^{-\frac{1}{2}}MR')\,dA\,dR.  Q.E.D.
Lemma 2.2.3

Let S_1 \sim W_r(\nu_1, \Sigma_1) and S_2 \sim W_r(\nu_2, \Sigma_2) be independent and let Z be any complex r \times r matrix; then

(2.2.8)  \cdots\, {}_2F_1(\cdots).

Proof of Lemma 2.2.3

The proof is straightforward matrix integration using Equations 2.9 and 6.3 of Herz (1955). Let U = S_1 + S_2 and V = S_2; then J(S_1, S_2 \to U, V) = 1 and the right-hand side becomes

(2.2.9)  \cdots\,dV\,dU.

The transformation W = U^{-\frac{1}{2}}VU^{-\frac{1}{2}} with J(V \to W) = |U|^{\frac{1}{2}(r+1)} allows the use of Equation 2.9 of Herz (1955), i.e.,

(2.2.10)  \cdots.

From the Herz generalization of the Euler formula (Eq. 6.3),

(2.2.11)  \cdots.

The result follows after some rearrangement.  Q.E.D.
Lemma 2.2.4

Let S \sim W_r(n-1, \Sigma), Y \sim N_r(M, I_m \otimes n^{-1}\Sigma), S and Y be independent, \alpha and \beta be complex scalars, and I_\Delta be the indicator function for the set \Delta; then

(2.2.12)  E[|S|^\alpha\,|S - n(n-1)^{-1}YY'|^\beta\, I_\Delta] = \pi^{-\frac{1}{2}r}\,[(n-1)/n]^{\frac{1}{2}r}\,|2\Sigma|^{\alpha+\beta}\,\{\Gamma_r[\tfrac{1}{2}(n+m-1+2\alpha+2\beta)]/\Gamma_r[\tfrac{1}{2}(n-1)]\} \int |I + XX'|^{-\frac{1}{2}(n+m-1+2\alpha+2\beta)}\,\mathrm{etr}[-\tfrac{1}{2}\Sigma^{-1}MM' + (n-1)n^{-1}\cdots + \delta(n-1)n^{-1}XX'] \cdot \big\{\sum_{k=0}^{t}\sum_{\kappa} (-1)^k (n-1)^k n^{-k}\, b_\kappa\, P_\kappa[\sqrt{(n-1)/2n}\,(\sqrt{n}\,\Sigma^{-\frac{1}{2}}M + \cdots X); (I + XX')^{\frac{1}{2}}]/[-\tfrac{1}{2}(n+m-r-2+2\alpha+2\beta)]_\kappa + R_{t,n}\big\}\,dX,

for t < \tfrac{1}{2}(n+m-r-2) + \mathrm{Re}(\alpha+\beta), where \Delta = \{(S,Y) : |S - n(n-1)^{-1}YY'| > 0\}, P_\kappa is defined by Equation 2.1.14, the remainder R_{t,n} is bounded and converges to zero, and the b_\kappa are defined by Equation 2.2.15.
Proof of Lemma 2.2.4

Let the function \Psi represent the expectation defined above; then

(2.2.13)  \Psi = E_Y\big[\int_{S>0} \{\Gamma_r[\tfrac{1}{2}(n-1)]\}^{-1}\,|2\Sigma|^{-\frac{1}{2}(n-1)}\,|S - n(n-1)^{-1}YY'|^\beta\,|S|^{\frac{1}{2}(n+2\alpha-r-2)}\,\mathrm{etr}(-\tfrac{1}{2}\Sigma^{-1}S)\,dS\big]
  = E_Y\big[\int_{T>0} \{\Gamma_r[\tfrac{1}{2}(n-1)]\}^{-1}\,|2\Sigma|^{-\frac{1}{2}(n-1)}\,|T|^{\frac{1}{2}(n-r-2+2\alpha+2\beta)}\,|I + n(n-1)^{-1}T^{-1}YY'|^{\frac{1}{2}(n-r-2+2\alpha)}\,\mathrm{etr}[-\tfrac{1}{2}n(n-1)^{-1}\Sigma^{-1}YY']\,\mathrm{etr}(-\tfrac{1}{2}\Sigma^{-1}T)\,dT\big].

The term |I + n(n-1)^{-1}T^{-1}YY'|^{\delta(n-1)+\varepsilon} can be asymptotically expanded using Lemma B.4 and then modified using Lemma 2.2.1; transforming the traces to zonal polynomials gives

(2.2.14)  |I + n(n-1)^{-1}T^{-1}YY'|^{\delta(n-1)+\varepsilon} = (2\pi)^{-\frac{1}{2}r} \int |T|^{\frac{1}{2}}\,\mathrm{etr}(-\tfrac{1}{2}TRR' + \sqrt{2\delta n}\,YR')\,dR \cdot \{1 + (4n)^{-1}[4\varepsilon C_{(1)}(nT^{-1}YY') - 2\delta C_{(2)}(nT^{-1}YY') + \delta C_{(1^2)}(nT^{-1}YY')] + (96n^2)^{-1}[96\varepsilon C_{(1)}(nT^{-1}YY') + 48(\varepsilon^2-\varepsilon-\delta)C_{(2)}(nT^{-1}YY') + 24(2\varepsilon^2+\varepsilon+\delta)C_{(1^2)}(nT^{-1}YY') - 16\delta(3\varepsilon-2)C_{(3)}(nT^{-1}YY') - 8\delta(\varepsilon+1)C_{(21)}(nT^{-1}YY') + 8\delta(3\varepsilon+1)C_{(1^3)}(nT^{-1}YY') + 12\delta^2 C_{(4)}(nT^{-1}YY') - 2\delta^2 C_{(31)}(nT^{-1}YY') + 7\delta^2 C_{(2^2)}(nT^{-1}YY') - 2\delta^2 C_{(21^2)}(nT^{-1}YY') + 3\delta^2 C_{(1^4)}(nT^{-1}YY')]\} + O_3.

If S = \Sigma^{-\frac{1}{2}}T\Sigma^{-\frac{1}{2}}, Z = \cdots Y, and X = \Sigma^{\frac{1}{2}}R, then J(T,Y,R \to S,Z,X) = \cdots, and

(2.2.15)  \Psi = \cdots \int |S|^{\frac{1}{2}(n+m-r-2+2\alpha+2\beta)}\,\mathrm{etr}[-\tfrac{1}{2}(I + XX')S]\,\mathrm{etr}[-\tfrac{1}{2}n(n-1)^{-1}ZZ' + \cdots Z' - \tfrac{1}{2}n\Sigma^{-1}MM']\,\mathrm{etr}(\cdots ZX')\big[\sum_{k=0}^{t}\sum_{\kappa} b_\kappa C_\kappa(S^{-1}ZZ') + R_{t,n}\big]\,dS\,dZ\,dX,

where the b_\kappa are the coefficients associated with the above expansion and R_{t,n} is a remainder term whose size depends on the choice of t and n. Equation (10) of Constantine (1966) can be used for the inside integral as long as \mathrm{Re}[\tfrac{1}{2}(n+m-1+2\alpha+2\beta)] > k_1 - \tfrac{1}{2}(r-1); since k_1 \le k, choosing t < \mathrm{Re}[\tfrac{1}{2}(n+m-1+2\alpha+2\beta)] + \tfrac{1}{2}(r-1) will guarantee that the integral exists. The expectation becomes

(2.2.16)  \Psi = 2^{-\frac{1}{2}mr}\,\pi^{-mr}\,\{\Gamma_r[\tfrac{1}{2}(n+m-1+2\alpha+2\beta)]/\Gamma_r[\tfrac{1}{2}(n-1)]\}\,|2\Sigma|^{\alpha+\beta} \int\!\!\int |I + XX'|^{-\frac{1}{2}(n+m-1+2\alpha+2\beta)}\,\mathrm{etr}[-\tfrac{1}{2}n(n-1)^{-1}ZZ' + (\sqrt{n}\,\Sigma^{-\frac{1}{2}}M + \cdots X)Z' - \tfrac{1}{2}n\Sigma^{-1}MM'] \cdot \big\{\sum_{k=0}^{t}\sum_{\kappa} (-\tfrac{1}{2})^k\, b_\kappa\, C_\kappa[(I + XX')ZZ']/[-\tfrac{1}{2}(n+m-r-2+2\alpha+2\beta)]_\kappa + R_{t,n}\big\}\,dZ\,dX.

The result follows from Equation 2.1.14.  Q.E.D.
Theorem 2.2.1

The characteristic function of the quadratic discriminant function given by Equation 2.1.24 is

(2.2.17)  \phi_Q(t) = \cdots \int |I + 2(n_1-1)^{-1}A'ZZ'A|^{-\frac{1}{2}(n_1+m-1-imt)}\,|I + 2(n_2-1)^{-1}B'ZZ'B|^{-\frac{1}{2}(n_2+m-1+imt)}\,\mathrm{etr}(-it\delta ZZ' + \cdots Z')\,dZ,

where A' = [I_r \;\; 0], B' = [0 \;\; I_r], and \cdots.

Proof of Theorem 2.2.1

By conditioning on Z, the characteristic function can be factored as follows:

(2.2.18)  \phi_Q(t) = E[\exp(itD_Q(Z))] = [(n_1-1)/(n_2-1)]^{\frac{1}{2}imrt}\, E_Z\{E[|S_1|^{-\frac{1}{2}imt}\,\mathrm{etr}\{-\tfrac{1}{2}(n_1-1)it\,S_1^{-1}(Z-\bar X_1)(Z-\bar X_1)'\} \mid Z] \cdot E[|S_2|^{\frac{1}{2}imt}\,\mathrm{etr}\{\tfrac{1}{2}(n_2-1)it\,S_2^{-1}(Z-\bar X_2)(Z-\bar X_2)'\} \mid Z]\}.

From Lemma 2.2.2,

(2.2.19)  \phi_1(t) = \{\Gamma_r[\tfrac{1}{2}(n_1+m-1-imt)]/\Gamma_r[\tfrac{1}{2}(n_1-1)]\}\,\pi^{-\frac{1}{2}r}\,|2\Sigma_1|^{-\frac{1}{2}imt} \int |I + XX'|^{-\frac{1}{2}(n_1+m-1-imt)}\,\mathrm{etr}[-\tfrac{1}{2}n_1^{-1}(n_1-1)it\,XX' + \cdots it\,X(Z-M_1)'\Sigma_1^{-\frac{1}{2}}]\,dX

and

(2.2.20)  \phi_2(t) = \{\Gamma_r[\tfrac{1}{2}(n_2+m-1+imt)]/\Gamma_r[\tfrac{1}{2}(n_2-1)]\}\,\pi^{-\frac{1}{2}mr}\,|2\Sigma_2|^{\frac{1}{2}imt} \int |I + YY'|^{-\frac{1}{2}(n_2+m-1+imt)}\,\mathrm{etr}[\tfrac{1}{2}n_2^{-1}(n_2-1)it\,YY' + \cdots it\,Y(Z-M_2)'\Sigma_2^{-\frac{1}{2}}]\,dY.

Integration with respect to Z produces (2.2.21), and letting Z' = (X' \; Y') establishes the theorem.  Q.E.D.
Theorem 2.2.2

The characteristic function of the linear discriminant function given by Equation 2.1.25 is

(2.2.22)  \phi_L(t) = \cdots \int |I + 2(n-2)^{-1}A'ZZ'A + \cdots|^{\cdots}\,(I + 2(n-2)^{-1}A'ZZ'A + 2(n-2)^{-1}\cdots)\cdots\,dZ,

where A' = [I_r \;\; 0], B' = [0 \;\; I_r], and \cdots.

Proof of Theorem 2.2.2

The characteristic function can be written as

(2.2.23)  \phi_L(t) = E\{\mathrm{etr}[-\tfrac{1}{2}i(n-2)t(S_1+S_2)^{-1}(Z-\bar X_1)(Z-\bar X_1)'] \cdot \mathrm{etr}[\tfrac{1}{2}i(n-2)t(S_1+S_2)^{-1}(Z-\bar X_2)(Z-\bar X_2)']\}.

Lemma 2.2.1 gives

(2.2.24)  \phi_L(t) = (2\pi)^{-mr}\, E\big\{\int\!\!\int |S_1+S_2|^m\,\mathrm{etr}[-\tfrac{1}{2}(S_1+S_2)RR' + \cdots(n-2)it\,YR' - \tfrac{1}{2}(S_1+S_2)TT' + i\cdots(n-2)it\,XT']\,dR\,dT\big\},

where X = Z - \bar X_1 and Y = Z - \bar X_2. The variables S_1 and S_2 can be integrated out, using Lemma 2.2.3, to produce

(2.2.25)  \phi_L(t) = (2\pi)^{-mr}\, E\big\{\int\!\!\int \{\Gamma_r[\tfrac{1}{2}(n+2m-2)]/\Gamma_r[\tfrac{1}{2}(n-2)]\}\cdots\,{}_2F_1[\cdots; (\cdots + RR' + TT')^{-1}]\,dR\,dT\big\}.

Following integration with respect to \bar X_1 and \bar X_2,

(2.2.26)  \phi_L(t) = E_Z\big[(2\pi)^{-mr}\,\{\Gamma_r[\tfrac{1}{2}(n+2m-2)]/\Gamma_r[\tfrac{1}{2}(n-2)]\}\,|2\Sigma_2|^m \int\!\!\int |I + \Sigma_1RR' + \Sigma_1TT'|^{-\frac{1}{2}(n_1-1)}\,|I + \Sigma_2RR' + \Sigma_2TT'|^{-\frac{1}{2}(n_2+2m-1)}\,\mathrm{etr}[i\cdots(n-2)it(Z-M_1)T' + \cdots(n-2)it(Z-M_2)R' - \tfrac{1}{2}n_1^{-1}(n-2)it\,\Sigma_1TT' + \tfrac{1}{2}n_2^{-1}(n-2)it\,\Sigma_2RR'] \cdot {}_2F_1[\tfrac{1}{2}(n_1-1), -m; \tfrac{1}{2}(n-2); (\Sigma_1^{-1}-\Sigma_2^{-1})(\Sigma_1^{-1} + RR' + TT')^{-1}]\,dR\,dT\big].

Finally, the expectation with respect to Z is taken in (2.2.27), and letting Z' = [X' \; Y'] gives the final result.  Q.E.D.

The preceding two theorems provide a somewhat unified approach to the characteristic functions. An analogous approach gives the characteristic function for the optimal discriminant function,

(2.2.28)  \phi_0(t) = \cdots,

where \cdots.
2.3  The Asymptotic Expansions

Closed-form representations of the exact characteristic functions of the QDF and LDF were not obvious and may not lead to easily calculable probabilities. The integral form of the characteristic function of the UDF could not be obtained. Since the integral of an asymptotic expansion is the sum of the integrals of each term, approximations of the characteristic functions were readily obtained using asymptotic expansions. The three theorems in this section provide the expansions for the QDF, LDF, and UDF. The proofs of these theorems are given in Appendix C.

Theorem 2.3.1

The asymptotic expansion of the characteristic function of the quadratic discriminant function is

(2.3.1)  \phi_Q(t) = \cdots\{1 - (4n_1)^{-1}[mr\{r-m+1+mt^2 - i(r-2m+1)t\} + \cdots] - (4n_2)^{-1}[mr\{r-m+1+mt^2 + i(r-2m+1)t\} + 4\{m\cdots\}] + (16n_1n_2)^{-1}\sum_{k=0}^{2}\sum_{\ell=0}^{2}\sum_{\kappa}\sum_{\lambda}\sum_{\phi \in \kappa\cdot\lambda} a_{\kappa,\lambda}^{\phi}\, P_\phi^{\kappa,\lambda}[\cdots(I + it\Delta)^{-\frac{1}{2}}\Gamma; (I + it\Delta)^{-\frac{1}{2}}A, (I + it\Delta)^{-\frac{1}{2}}B] + (96n_1^2)^{-1}\sum_{k=0}^{4}\sum_{\kappa} b_{1\kappa}P_{1\kappa}(t) + (96n_2^2)^{-1}\sum_{k=0}^{4}\sum_{\kappa} b_{2\kappa}P_{2\kappa}(t)\} + O_3,

where A, B, \Gamma, and \Delta are defined by Equation 2.2.18, the a_{\kappa,\lambda}^{\phi} and b_{j\kappa} are given in the proof in Appendix C, and

P_{1\kappa}(t) = P_\kappa[\cdots t(I + it\Delta)^{-\frac{1}{2}}\Gamma; (I + it\Delta)^{-\frac{1}{2}}A],
P_{2\kappa}(t) = P_\kappa[\cdots t(I + it\Delta)^{-\frac{1}{2}}\Gamma; (I + it\Delta)^{-\frac{1}{2}}B].

The corollary given below is for the case when m = 1. Without loss of generality, the sample space can be transformed so that \pi_1 has mean 0 and covariance I while \pi_2 has a diagonal covariance matrix. This property is used to simplify the resulting equations.
Corollary 2.3.1

If m = 1, then

(2.3.2)  \phi_Q(t) = \cdots + t^4 + 4r\{r + i(r-1)t\}P_{\cdots}(t) + 4r\{r - i(r-1)t\}\cdots - 6r + 14)t + (3r^2 - 3r + 4)t^3]\} + 24\{r^2 \cdots + 4)t^2 + 3rt^4 + 2i[(3r^3 - 7r^2 \cdots - 4 + rt^2 \cdots - 6r + 14)t + (3r^2 \cdots,

where \delta = 0 for Z \in \pi_1 and \delta = 1 for Z \in \pi_2.
Theorem 2.3.2
The asymptotic expansion of the characteristic function of D_L(Z) is

(2.3.3)  \phi_L(t)
= In2I
+
n1I1I2-1ImII2I1-1Iml~
+
ital-~etr(TT')
+ 2(n -n )(p100000 + p000100) + 2(2m-n +n2)(p010000
1
2
(1)
(1)
1
(1)
- n
2
2
(2p020000 _ pOl 0000 + 4pOllOOO _ 2pOllOOO
(2)
(12)
(12)
(12)
• a K1 ,··· ,K6 pK 1 ,···K6
Q>
Q>
+ 96 m{nl-n2
~
- 96m(2m - 1)n l n2{p(1)[T;A l (8Q2) ] +
- 48m{2P(2)(T;Ae~) - P(12)(T;Ale~)
where
1
P(1)[T;B2(8Q2)~]}
T
=
1
~[it(~ + itd)-l]~,
Corollary 2.3.2
then
1
+ n2A~)-2}{1 - (2n)-1[r(r-l) + 2(nl-n2)tr(O)
+ p000200(t)} _ 2n {p020000(t) + 2pOll000(t)
(2)
2
(2)
(2)
+ p000300(t) + 3p200100(t) + 3pl00200(t)}
(3)
(3)
(3)
+ 3p021000(t) + 3p012000(t)} +
(3)
(3)
96n~{p200010(t)
(3)
+ p200001(t) + p000210(t) + p000201(t) + 2pl00ll0(t)
(3)
(3)
(3)
(3)
+ 2plOOl0l(t)} + 96n n {p020010(t) + p020001(t)
(3)
1 2 (3)
(3)
+ p002010(t) + p002001(t) + 2pOl1010(t) + 2pOll001(t)}
(3)
(3)
(3)
(3)
- 48n2(n -n ){p120000(t) + pl02000(t) + p020100(t)
1 2
(3)
(3)
(3)
+ p001200(t) + 2pl10100(t) + 2plOllOO(t)} +
(3)
(3)
+ 4pl00300(t)} +
(4)
(3)
24n~{p040000(t)
(4)
24n~
+ p 00 4000(t)
(4)
+ 4p031000(t) + 6p022000(t) + 4p013000(t)}
(4)
(4)
(4)
+ 48n n {p220000(t) + p202000(t) + p020200(t)
1 2
(4)
(4)
(4)
\
+ P(l)[TiB2(OO)]}
- 96nln2{p(l)[TiAl(002) \ ]
+ P(1)[Ti B2(OO2)\]} - 96{P(2)(Ti A1S\)
where
= (1-A)0,
Q
o
= [I
- nl(1-A)]-l,
Theorem 2.3.2 and Appendix C.
Lemma 2.3.1

The asymptotic expansion of the expectation given by Equation 2.2.12 when m = 1 is
(2.3.5)  \cdots\{\cdots - 3(r^3 - 8r^2 + 9r + 36)t^2 + 3rt^4 + i[4(2r^2 - 15r - 61)t - 2(3r^2 + 3r + 4)t^3]\} + 2\{5r^3 + 73r^2 + 452r + 528 - 8(r^2 + 37r + 60)t^2 - i[(5r^3 + 93r^2 + 760r + 912)t + 12(r^2 - 3r - 8)t^3]\}\,\mathrm{tr}(\Sigma^{-1}MM') + \{25r^2 + 292r + 564 - 2(7r^2 + 237r + 622)t^2 + 48t^4 - i[(45r^2 + 648r + 1552)t - \cdots]\}\cdots + \{\cdots - (40r + 247)t^2 - i[(50r + 333)t - (10r + \cdots)t^3]\}\cdots,
where \Delta = \{(S,Y) : |S - n(n-1)^{-1}YY'| > 0\}.
Theorem 2.3.3

The asymptotic expansion of the characteristic function for the discriminant function defined by Equation 2.1.26 when m = 1 and I_{\Delta_1}I_{\Delta_2} = 1 is given by
(2.3.6)  \phi_U(t) = |\Sigma_1^{-1}\Sigma_2|^{\frac{1}{2}it}\,|\cdots|^{-\frac{1}{2}}\{P_0(t) - (4n_1)^{-1}[5irtP_0(t) + 4(t^2 + it)P_{1(1)}(t) + (t^2 + it)P_{1(2)}(t)] + (4n_2)^{-1}[5irtP_0(t) + 4(t^2 - it)P_{2(1)}(t) + (t^2 - it)P_{2(2)}(t)] + (16n_1n_2)^{-1}[r^2(2r+3)^2t^2P_0(t) - 4r(2r+3)(t^2 - it^3)P_{1(1)}(t) - 4r(2r+3)(t^2 + it^3)P_{2(1)}(t) + 16(t^2 + t^4)P_{11}(t) - r(2r+3)(t^2 - it^3)P_{1(2)}(t) - r(2r+3)(t^2 + it^3)P_{2(2)}(t) + 4(t^2 + t^4)P_{12}(t) + 4(t^2 + t^4)P_{21}(t) + (t^2 + t^4)P_{22}(t)] + (96n_1^2)^{-1}[r\{32(r^2 + 6r + 8) - 3(4r^3 + 10r^2 + 9r + 36)t^2 + 3rt^4 + i[4(4r^2 - 3r - 31)t - 2(3r^2 + 3r + 4)t^3]\}P_0(t) + 2\{5r^3 + 73r^2 + 452r + 528 + 4(r^2 - 56r - 120)t^2 - i[(5r^3 + 93r^2 + 760r + 912)t + 12(3r^2 + 9r - 8)t^3]\}P_{1(1)}(t) + \{25r^2 + 292r + 564 - 2(4r^2 + 219r + 622)t^2 + 48t^4 - i[(45r^2 + 648r + 1552)t + 2(6r^2 - 11r - 176)t^3]\}P_{1(2)}(t) + \{20r + 47 - (40r + 247)t^2 - i[(50r + 333)t - (10r + 67)t^3]\}P_{1(3)}(t) + \{5 - 18t^2 + 3t^4 - i(15t - 11t^3)\}P_{1(4)}(t)] + (96n_2^2)^{-1}[r\{32(r^2 + 6r + 8) - 3(4r^3 - 10r^2 - 9r + 36)t^2 + 3rt^4 - i[4(4r^2 - 3r - 31)t - 2(3r^2 + 3r + 4)t^3]\}P_0(t) + 2\{5r^3 + 73r^2 + 452r + 528 + 4(r^2 - 56r - 120)t^2 + i[(5r^3 + 93r^2 + 760r + 912)t + 12(3r^2 + 9r - 8)t^3]\}P_{2(1)}(t) + \{25r^2 + 292r + 564 - 2(4r^2 + 219r + 622)t^2 + 48t^4 + i[(45r^2 + 648r + 1552)t + 2(6r^2 - 11r - 176)t^3]\}P_{2(2)}(t) + \{20r + 47 - (40r + 247)t^2 + i[(50r + 333)t - (10r + 67)t^3]\}P_{2(3)}(t) + \{5 - 18t^2 + 3t^4 + i(15t - 11t^3)\}P_{2(4)}(t)] + O_3\}

where
Corollary 2.3.3

(2.3.7)  \phi_U(t) = \cdots + 4(t^2 + t^4)P_{12}(t) + (t^2 + t^4)P_{22}(t)] + (96n_1^2)^{-1}[r\{32(r^2 + 6r + 8) - 3(4r^3 - 10r^2 - 9r + 36)t^2 + 3rt^4 + i[4(4r^2 - 3r - 31)t - 2(3r^2 + 3r + 4)t^3]\} + 4\{5r^3 + 73r^2 + 452r + 528 + 4(r^2 - 56r - 120)t^2 - i[(5r^3 + 93r^2 + 760r + 912)t + 12(3r^2 + 9r - 8)t^3]\}P_{1(1)}(t) + 4\{25r^2 + 292r + 564 - 2(4r^2 + 219r - 622)t^2 + 48t^4 - i[(45r^2 + 648r + 1552)t + 2(6r^2 - 11r - 176)t^3]\}P_{1(2)}(t) + 8\{20r + 47 - (40r + 247)t^2 - i[(50r + 333)t - (10r + 67)t^3]\}P_{1(3)}(t) + 16\{5 - 18t^2 + 3t^4 - i(15t - 11t^3)\}P_{1(4)}(t)] + (96n_2^2)^{-1}[r\{32(r^2 + 6r + 8) - 3(4r^3 - 10r^2 - 9r + 36)t^2 + 3rt^4 - i[4(4r^2 - 3r - 31)t - 2(3r^2 + 3r + 4)t^3]\} + 4\{5r^3 + 73r^2 + 452r + 528 + 4(r^2 - 56r - 120)t^2 + i[(5r^3 + 93r^2 + 760r + 912)t + 12(3r^2 + 9r - 8)t^3]\}P_{2(1)}(t) + 4\{25r^2 + 292r + 564 - 2(4r^2 + 219r + 622)t^2 + 48t^4 + i[(45r^2 + 648r + 1552)t + 2(6r^2 - 11r - 176)t^3]\}P_{2(2)}(t) + 8\{20r + 47 - (40r + 247)t^2 + i[(50r + 333)t - (10r + 67)t^3]\}P_{2(3)}(t) + \cdots,

where \cdots and \cdots.
The previous theorem deals with the region defined by the intersection of \Delta_1 and \Delta_2. The total misclassification probability is given by

(2.3.8)  P[\text{classify } \pi_2 \mid Z \in \pi_1] = P[D_U(Z) < k \text{ and } I_{\Delta_1}I_{\Delta_2} = 1 \mid Z \in \pi_1] + P[\Delta_2 \mid Z \in \pi_1] - P[\Delta_1 \cap \Delta_2 \mid Z \in \pi_1] + \theta\{1 - P[\Delta_1 \cup \Delta_2 \mid Z \in \pi_1]\},

where \theta is determined by randomization and will be assumed to be \tfrac{1}{2} here. Only the first term in Equation 2.3.8 does not vanish for large n. The probability above can be rewritten as

(2.3.9)  P[\text{classify } \pi_2 \mid Z \in \pi_1] = P[D_U(Z) < k \text{ and } I_{\Delta_1}I_{\Delta_2} = 1 \mid Z \in \pi_1] + \tfrac{1}{2}\{1 + P[\Delta_1 \cap \Delta_2 \mid Z \in \pi_1] - P[\Delta_1 \mid Z \in \pi_1] - P[\Delta_2 \mid Z \in \pi_1]\}.

The probability of the intersection of \Delta_1 and \Delta_2 is determined by setting t = 0 in Equation 2.3.6, and

(2.3.10)  P[\Delta_j \mid Z \in \pi_1]  (j = 1, 2)

can be obtained from Equation 2.3.5 in a similar manner. Examination of Equations 2.3.5 and 2.3.6 reveals
CHAPTER 3

PROBABILITY DISTRIBUTIONS

3.1  Introduction

In the previous chapter, characteristic functions were derived for the distributions of the discriminant functions. To be useful, these characteristic functions must be converted to distribution functions. The formula

(3.1.1)  F^{(n)}(x) = (2\pi)^{-1} \int_{-\infty}^{\infty} (-it)^{n-1} e^{-itx}\,\phi(t)\,dt,

where \phi is the characteristic function of the cumulative distribution function F and F^{(n)} is the nth derivative, was used repeatedly to obtain the desired probabilities.

This chapter shows the derivations of the general techniques for inverting the characteristic functions and describes the distributional properties of each discriminant function. Monte Carlo studies were used to evaluate the accuracy of the asymptotic approximations. The results in this chapter are the basis from which the discriminant functions are evaluated in later chapters.
3.2  Inverse Fourier Transforms

The characteristic functions under consideration have the general form of

(3.2.1)  \phi(t) = \cdots,

where any combination of the three terms may be present. These combinations represent seven distributions.
The first is the normal distribution,

(3.2.2)  F_1(\mu, \sigma; x) = \Phi[(x - \mu)/\sigma]  (-\infty < x < \infty),

where \Phi is the standard normal distribution function. Properties of this distribution include the cumulants \kappa_1 = \mu, \kappa_2 = \sigma^2, \kappa_j = 0 (j > 2), and the derivatives are

(3.2.3)  F_1^{(n+1)}(\mu, \sigma; x) = (-1)^n\,\pi^{-\frac{1}{2}}\,(2\sigma^2)^{-\frac{1}{2}(n+1)}\, H_n[(x - \mu)/\sqrt{2}\sigma]\, e^{-\frac{1}{2}(x-\mu)^2/\sigma^2}  (n \ge 0),

where H_n is the Hermite polynomial defined by Equation D.7.
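The Hermite form of the normal-cdf derivatives can be checked numerically. The sketch below writes the identity with the probabilists' Hermite polynomials He_n (the document's H_n of Equation D.7 uses a scaled argument, so the two forms differ only by the substitution He_n(u) = 2^{-n/2} H_n(u/\sqrt{2})); a finite-difference comparison for n = 1:

```python
import math

# Derivatives of the normal cdf through Hermite polynomials:
#   F1^(n+1)(x) = (-1)^n He_n(u) phi(u) / sigma^(n+1),  u = (x - mu)/sigma,
# with He_n the probabilists' Hermite polynomial and phi the standard
# normal density.  Spot check for n = 1 (second derivative of the cdf).

def phi(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def cdf_deriv(n, mu, sigma, x):
    u = (x - mu) / sigma
    he = {0: 1.0, 1: u, 2: u * u - 1.0}[n]       # He_0, He_1, He_2
    return (-1.0) ** n * he * phi(u) / sigma ** (n + 1)

mu, sigma, x, h = 0.5, 2.0, 1.3, 1e-4
pdf = lambda t: phi((t - mu) / sigma) / sigma     # F'(t)
numeric = (pdf(x + h) - pdf(x - h)) / (2.0 * h)   # F''(x) by differences
print(abs(numeric - cdf_deriv(1, mu, sigma, x)) < 1e-6)  # True
```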
The second distribution is the gamma distribution,

(3.2.4)  F_2(\alpha, \beta, \mu; x) = \gamma[\alpha, (x - \mu)/\beta]/\Gamma(\alpha)  (x > \mu),

where \gamma is defined by Equation A.2.7. Its properties include the cumulants \kappa_1 = \alpha\beta + \mu, \kappa_j = (j-1)!\,\alpha\beta^j (j > 1), and the derivatives

(3.2.5)  F_2^{(n+1)}(\alpha, \beta, \mu; x) = n!\,(x - \mu)^{\alpha-n-1}\, e^{-(x-\mu)/\beta}\, L_n^{\alpha-n-1}[(x - \mu)/\beta]/[\Gamma(\alpha)\beta^\alpha]  (n \ge 0).
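The Laguerre form of the gamma-cdf derivatives in (3.2.5) rests on the Rodrigues-type identity d^n/du^n [u^{\alpha-1}e^{-u}] = n!\,u^{\alpha-1-n}e^{-u}L_n^{(\alpha-1-n)}(u). A finite-difference spot check for n = 1, using L_1^{(c)}(u) = 1 + c - u (the helper names are illustrative):

```python
import math

# Check of (3.2.5) for n = 1: the second derivative of the gamma cdf
# equals n! (x-mu)^(a-n-1) e^(-(x-mu)/b) L_n^(a-n-1)[(x-mu)/b] / (Gamma(a) b^a).

def F2_deriv(n, a, b, mu, x):
    u = (x - mu) / b
    lag = {0: 1.0, 1: (a - n - 1) + 1.0 - u}[n]   # L_0, L_1^(a-n-1)
    return (math.factorial(n) * (x - mu) ** (a - n - 1)
            * math.exp(-u) * lag / (math.gamma(a) * b ** a))

def pdf(a, b, mu, x):                              # F2' = gamma density
    u = (x - mu) / b
    return u ** (a - 1.0) * math.exp(-u) / (math.gamma(a) * b)

a, b, mu, x, h = 3.5, 2.0, 1.0, 4.0, 1e-5
numeric = (pdf(a, b, mu, x + h) - pdf(a, b, mu, x - h)) / (2.0 * h)
print(abs(numeric - F2_deriv(1, a, b, mu, x)) < 1e-8)  # True
```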
The next one is a reflection of the gamma distribution about \mu,

(3.2.6)  F_3(\alpha, \beta, \mu; x) = 1 - \gamma[\alpha, (\mu - x)/\beta]/\Gamma(\alpha)  (x < \mu),

with cumulants \kappa_1 = -\alpha\beta + \mu, \kappa_j = (-1)^j(j-1)!\,\alpha\beta^j (j > 1), and derivatives

(3.2.7)  F_3^{(n+1)}(\alpha, \beta, \mu; x) = (-1)^n n!\,(\mu - x)^{\alpha-n-1}\, e^{-(\mu-x)/\beta}\, L_n^{\alpha-n-1}[(\mu - x)/\beta]/[\Gamma(\alpha)\beta^\alpha]  (n \ge 0).
Since products of characteristic functions correspond to sums of independent random variables, the remaining distributions can be thought of as linear combinations of independent normal and gamma variables. The fourth distribution comes from the sum of a gamma and a normal random variable,

(3.2.8)  F_4(\alpha, \beta, \mu, \sigma; x) = \int_0^\infty v^{\alpha-1} e^{-v/\beta}\, \Phi[(x - v - \mu)/\sigma]\,dv/[\Gamma(\alpha)\beta^\alpha]  (-\infty < x < \infty)

with cumulants \kappa_1 = \alpha\beta + \mu, \kappa_2 = \alpha\beta^2 + \sigma^2, \kappa_j = (j-1)!\,\alpha\beta^j (j > 2), and derivatives

(3.2.9)  F_4^{(n+1)}(\alpha, \beta, \mu, \sigma; x) = (-1)^n\,\pi^{-\frac{1}{2}}\,(2\sigma^2)^{-\frac{1}{2}(n+1)} \int_0^\infty v^{\alpha-1} e^{-v/\beta}\, e^{-\frac{1}{2}(x-v-\mu)^2/\sigma^2}\, H_n[(x - v - \mu)/\sqrt{2}\sigma]\,dv/[\Gamma(\alpha)\beta^\alpha]  (n \ge 0).
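The cumulants of F_4 follow from adding those of the gamma and normal components. A seeded Monte Carlo spot check of \kappa_1 = \alpha\beta + \mu and \kappa_2 = \alpha\beta^2 + \sigma^2 (a sketch with illustrative parameter values, not the dissertation's computations):

```python
import random

# F4 is the law of G + N with G ~ Gamma(alpha, beta) and N ~ Normal(mu,
# sigma^2) independent; its leading cumulants are
#   k1 = alpha*beta + mu,   k2 = alpha*beta^2 + sigma^2.

random.seed(12345)
alpha, beta, mu, sigma = 2.0, 1.5, 0.5, 1.0
n = 200_000
draws = [random.gammavariate(alpha, beta) + random.gauss(mu, sigma)
         for _ in range(n)]
mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / n

print(abs(mean - (alpha * beta + mu)) < 0.05)              # k1 = 3.5
print(abs(var - (alpha * beta ** 2 + sigma ** 2)) < 0.15)  # k2 = 5.5
```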
Subtracting a gamma variate from a normal one produces the next distribution,

(3.2.10)  F_5(\alpha, \beta, \mu, \sigma; x) = \int_0^\infty v^{\alpha-1} e^{-v/\beta}\, \Phi[(x + v - \mu)/\sigma]\,dv/[\Gamma(\alpha)\beta^\alpha]  (-\infty < x < \infty).

Its cumulants are \kappa_1 = -\alpha\beta + \mu, \kappa_2 = \alpha\beta^2 + \sigma^2, \kappa_j = (-1)^j(j-1)!\,\alpha\beta^j (j > 2), and its derivatives are

(3.2.11)  F_5^{(n+1)}(\alpha, \beta, \mu, \sigma; x) = (-1)^n\,\pi^{-\frac{1}{2}}\,(2\sigma^2)^{-\frac{1}{2}(n+1)} \int_0^\infty v^{\alpha-1} e^{-v/\beta}\, e^{-\frac{1}{2}(x+v-\mu)^2/\sigma^2}\, H_n[(x + v - \mu)/\sqrt{2}\sigma]\,dv/[\Gamma(\alpha)\beta^\alpha]  (n \ge 0).
The difference of two gamma variables leads to the distribution

(3.2.12)  F_6(\alpha_1, \alpha_2, \beta_1, \beta_2, \mu; x) = \int_{-\infty}^{x} |v - \mu|^{\alpha_1+\alpha_2-1}\, \exp\{-\tfrac{1}{2}[(1/\beta_1 + 1/\beta_2) + \mathrm{sgn}(v - \mu)(1/\beta_1 - 1/\beta_2)]|v - \mu|\}\, U[\tfrac{1}{2}(\alpha_1+\alpha_2) - \tfrac{1}{2}\mathrm{sgn}(v - \mu)(\alpha_1 - \alpha_2), \alpha_1 + \alpha_2; (1/\beta_1 + 1/\beta_2)|v - \mu|]\,dv/\{\Gamma[\tfrac{1}{2}(\alpha_1+\alpha_2) + \tfrac{1}{2}\mathrm{sgn}(v - \mu)(\alpha_1 - \alpha_2)]\,\beta_1^{\alpha_1}\beta_2^{\alpha_2}\},

where U is defined by Equation D.14. The cumulants are \kappa_1 = \alpha_1\beta_1 - \alpha_2\beta_2 + \mu and \kappa_j = (j-1)!\,[\alpha_1\beta_1^j + (-1)^j\alpha_2\beta_2^j] (j > 1). The derivatives are

(3.2.13)  F_6^{(n+1)}(\alpha_1, \alpha_2, \beta_1, \beta_2, \mu; x) = n!\,[\mathrm{sgn}(x - \mu)]^n\, \exp\{-\tfrac{1}{2}[(1/\beta_1 + 1/\beta_2) + \mathrm{sgn}(x - \mu)(1/\beta_1 - 1/\beta_2)]|x - \mu|\} \sum_k \cdots\, U[\cdots, \alpha_1 + \alpha_2 + k; (1/\beta_1 + 1/\beta_2)|x - \mu|]\, L_n^{\alpha_1+\alpha_2-n+k-1}[\tfrac{1}{2}\{(1/\beta_1 + 1/\beta_2) + \mathrm{sgn}(x - \mu)(1/\beta_1 - 1/\beta_2)\}|x - \mu|]\cdots  (n \ge 0).
The final distribution can be derived by adding a normal random variable to the difference of two gamma random variables. The resulting distribution function is

(3.2.14)  F_7(\alpha_1, \alpha_2, \beta_1, \beta_2, \mu, \sigma; x) = \beta_1^{-\alpha_1}\beta_2^{-\alpha_2}\big\{\int_0^\infty v^{\alpha_1+\alpha_2-1} e^{-v/\beta_1}\, \Phi[(x - v - \mu)/\sigma]\, U[\alpha_2, \alpha_1+\alpha_2; (1/\beta_1 + 1/\beta_2)v]\,dv/\Gamma(\alpha_1) + \int_0^\infty v^{\alpha_1+\alpha_2-1} e^{-v/\beta_2}\, \Phi[(x + v - \mu)/\sigma]\, U[\alpha_1, \alpha_1+\alpha_2; (1/\beta_1 + 1/\beta_2)v]\,dv/\Gamma(\alpha_2)\big\}  (-\infty < x < \infty).

Additional properties of the distribution include the cumulants \kappa_1 = \alpha_1\beta_1 - \alpha_2\beta_2 + \mu, \kappa_2 = \alpha_1\beta_1^2 + \alpha_2\beta_2^2 + \sigma^2, \kappa_j = (j-1)!\,[\alpha_1\beta_1^j + (-1)^j\alpha_2\beta_2^j] (j > 2), and the derivatives

(3.2.15)  F_7^{(n+1)}(\alpha_1, \alpha_2, \beta_1, \beta_2, \mu, \sigma; x) = (-1)^n\,\pi^{-\frac{1}{2}}\,(2\sigma^2)^{-\frac{1}{2}(n+1)}\cdots  (n \ge 0).
The numerical procedures for computing these distribution functions and
their derivatives are given in Appendix D.
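The dissertation inverts its characteristic functions with the series representations of this section. As a generic numerical cross-check (not the author's method), the Gil-Pelaez formula F(x) = 1/2 - (1/\pi)\int_0^\infty \mathrm{Im}[e^{-itx}\phi(t)]/t\,dt recovers a cdf from any characteristic function. A sketch for the standard normal, \phi(t) = e^{-t^2/2}:

```python
import math

# Gil-Pelaez inversion of a characteristic function, illustrated on the
# standard normal; phi returns (Re, Im) so complex arithmetic is explicit.

def gil_pelaez_cdf(x, phi, t_max=40.0, steps=4000):
    h = t_max / steps
    total = 0.0
    for k in range(1, steps + 1):          # simple midpoint rule, skip t = 0
        t = (k - 0.5) * h
        re, im = phi(t)
        # Im[e^{-itx} phi(t)] = im*cos(tx) - re*sin(tx)
        total += (im * math.cos(t * x) - re * math.sin(t * x)) / t
    return 0.5 - total * h / math.pi

phi_normal = lambda t: (math.exp(-0.5 * t * t), 0.0)
print(abs(gil_pelaez_cdf(0.0, phi_normal) - 0.5) < 1e-6)      # True
print(abs(gil_pelaez_cdf(1.96, phi_normal) - 0.975) < 1e-3)   # True
```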
The characteristic function common to all three of the discriminant functions,

(3.2.16)  \phi_\delta(t) = \cdots,

can be rewritten as an infinite sum of the characteristic functions shown in Equation 3.2.1. Two relationships are needed to construct the sum:

(3.2.17)  \cdots

and

(3.2.18)  (aB + a^2C)(I + aA)^{-1} = aCA^{-1} + (B - CA^{-1})A^{-1} - (B - CA^{-1})A^{-1}(I + aA)^{-1}.

The application of these to Equation 3.2.16 gives

(3.2.19)  \phi_\delta(t) = \phi_{\delta 1}(t)\,\phi_{\delta 2}(t)\,\mathrm{etr}\{-\tfrac{1}{2}[t^2 - (-1)^\delta it]B_{\delta 33}\}\,\mathrm{etr}\{\tfrac{1}{2}B_{\delta jj}[(-1)^\delta I - A_{\delta j}^{-1}]A_{\delta j}^{-1} + \tfrac{1}{2}itB_{\delta jj}A_{\delta j}^{-1}\cdots\},

where

B_\delta = \begin{pmatrix} B_{\delta 11} & B_{\delta 12} & B_{\delta 13} \\ B_{\delta 21} & B_{\delta 22} & B_{\delta 23} \\ B_{\delta 31} & B_{\delta 32} & B_{\delta 33} \end{pmatrix}

and

A_{\delta j} = \mathrm{diag}(\lambda_{\delta j1}, \ldots, \lambda_{\delta jr_j}),  (-1)^j\lambda_{\delta jk} > 0,  for j = 1, 2 and k = 1, \ldots, r_j.

The technique of Khatri and Srivastava (1971),

(3.2.20)  \cdots,

where \lambda_{\delta j1} and \lambda_{\delta jr_j} are the largest and smallest roots of A_{\delta j}, respectively, gives a reasonable rate of convergence.
Inversion of the characteristic function \phi_\delta(t) in 3.2.19, using Lemma D.5, produces the cumulative distribution function

(3.2.21)  F(A_\delta, B_\delta; x) = |a_{\delta 1}A_{\delta 1}|^{-\frac{1}{2}}\,|a_{\delta 2}A_{\delta 2}|^{-\frac{1}{2}}\,\mathrm{etr}\{\tfrac{1}{2}B_{\delta 11}\cdots\} \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} \cdots\, G_{jk}(A_\delta, B_\delta; x)

with

G_{jk}(A_\delta, B_\delta; x) = F_1(\mu, \sigma; x)  for r_1 = r_2 = 0,
  = F_2(\tfrac{1}{2}r_1 + j, -a_{\delta 1}, \mu; x)  for r_1 > 0, r_2 = 0, \mathrm{tr}(B_{\delta 33}) = 0,
  = F_3(\tfrac{1}{2}r_2 + k, a_{\delta 2}, \mu; x)  for r_1 = 0, r_2 > 0, \mathrm{tr}(B_{\delta 33}) = 0,
  = F_4(\tfrac{1}{2}r_1 + j, -a_{\delta 1}, \mu, \sigma; x)  for r_1 > 0, r_2 = 0, \mathrm{tr}(B_{\delta 33}) > 0,
  = F_5(\tfrac{1}{2}r_2 + k, a_{\delta 2}, \mu, \sigma; x)  for r_1 = 0, r_2 > 0, \mathrm{tr}(B_{\delta 33}) > 0,
  = F_6(\tfrac{1}{2}r_1 + j, \tfrac{1}{2}r_2 + k, -a_{\delta 1}, a_{\delta 2}, \mu; x)  for r_1 > 0, r_2 > 0, \mathrm{tr}(B_{\delta 33}) = 0,
  = F_7(\tfrac{1}{2}r_1 + j, \tfrac{1}{2}r_2 + k, -a_{\delta 1}, a_{\delta 2}, \mu, \sigma; x)  for r_1 > 0, r_2 > 0, \mathrm{tr}(B_{\delta 33}) > 0,

where

\mu = \tfrac{1}{2}[\mathrm{tr}(B_{\delta 11}A_{\delta 1}^{-1}) + \mathrm{tr}(B_{\delta 22}A_{\delta 2}^{-1}) + (-1)^\delta\,\mathrm{tr}(B_{\delta 33})],
\sigma = [\mathrm{tr}(B_{\delta 33}\cdots)]^{\frac{1}{2}},

and when r_j = 0, then g_{k\ell}(A_{\delta j}, B_{\delta j}) = 0 for all k, \ell > 0 and |a_{\delta j}A_{\delta j}| = 1. Traces of matrices with rank zero are considered to be identically zero.
The asymptotic expansions of the characteristic functions can be written in terms of traces by applying Lemma B.2. With the definition

(3.2.22)  \cdots = \prod_{j=1}^{m} \mathrm{tr}[\Gamma_{\delta j}(I + itA_\delta)^{-n_j}],

the terms of the expansion become

(3.2.23)  \cdots = \phi_{\delta\ell}(t) \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} h_j(n_1, \ldots, n_{m\ell}; A_{\delta\ell}; \Gamma_{\delta\ell 11}, \ldots, \Gamma_{\delta m\ell 11})\cdots.

The inverse transform of Equation 3.2.23 is

(3.2.24)  F_{\delta\ell}(M, \Lambda; x) = |a_{\delta\ell}\cdots|\cdots,

and the asymptotic expansion of the cumulative distribution function is

(3.2.25)  F_\delta(M, \Lambda; x) \approx F_{\delta 0}(M, \Lambda; x) + n^{-1} \sum_{j=0}^{p}\sum_{k=0}^{2} a_{\delta jk}\, F_{\delta j}^{(k)}(M, \Lambda; x) + n^{-2} \sum_{j=0}^{q}\sum_{k=0}^{4} b_{\delta jk}\, F_{\delta j}^{(k)}(M, \Lambda; x).

The coefficients a_{\delta jk} and b_{\delta jk} are defined in Appendix D.

3.3  Properties of the Limiting Distributions
The distributional properties of each of the discriminant functions can be studied by inspecting the characteristic functions and the associated cumulants. The optimal discriminant function (see Equation 2.1.27) has the characteristic function

(3.3.1)  \phi_0(t) = \cdots.

When the covariance matrices are equal (\Sigma_1 = \Sigma_2 = \Sigma), the characteristic function becomes

(3.3.2)  \phi_0(t) = \mathrm{etr}\{\tfrac{1}{2}it\Sigma^{-1}[2M_0(M_1 - M_2)' - M_1M_1' + M_2M_2'] + \tfrac{1}{2}(it)^2\Sigma^{-1}(M_1 - M_2)(M_1 - M_2)'\}.

Whereas the general case leads to a complicated distribution, the case of equal covariance matrices leads to a normal distribution. The first four cumulants from Equation 3.3.1 are

(3.3.3)  \kappa_1 = \cdots,
(3.3.4)  \kappa_2 = \cdots,
(3.3.5)  \kappa_3 = \cdots,

and

(3.3.6)  \kappa_4 = \cdots.

A comparison of the characteristic functions reveals that D_Q has a limiting distribution that is optimal. Under the conditions of Corollary 2.3.1, D_Q has the limiting cumulants

(3.3.7)  \kappa_1 = \tfrac{1}{2}[\log|\Lambda| + \mathrm{tr}[\Lambda_\delta(I - \Lambda^{-1})] + (-1)^\delta\,\mathrm{tr}(\Lambda_\delta \cdots MM')],
(3.3.8)  \kappa_2 = \tfrac{1}{2}\mathrm{tr}[\Lambda_\delta(I - \Lambda^{-1})]^2 + \cdots,
(3.3.9)  \kappa_3 = -\mathrm{tr}[\Lambda_\delta(I - \Lambda^{-1})]^3 + \cdots,

and

(3.3.10)  \kappa_4 = \cdots\,\mathrm{tr}(\Lambda_\delta^3 \cdots 2MM')\cdots,

and D_Q and D_U are asymptotically equivalent. The limiting distribution of D_L is normal with

(3.3.11)  \kappa_1 = \cdots

and

(3.3.12)  \kappa_2 = \cdots.

These cumulants simplify to

(3.3.13)  \cdots

and

(3.3.14)  \cdots

under the conditions of Corollary 2.3.2. This discriminant function is optimal in the limit only if the covariance matrices are equal.
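With equal covariances the optimal rule is normal, and the probability of misclassifying an observation from either population is \Phi(-\Delta/2), where \Delta^2 = (M_1 - M_2)'\Sigma^{-1}(M_1 - M_2) is the squared Mahalanobis distance. A quick sketch reproducing the "Optimal" entries of Tables 3.1, 3.2, 3.6, and 3.7:

```python
import math

# Optimal misclassification probability under equal covariances:
# P[error] = Phi(-Delta/2), Delta = Mahalanobis distance between means.

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Table 3.1 / 3.6 setting: M' = (1 0 ... 0), Sigma = I  ->  Delta = 1.
print(round(Phi(-0.5), 3))   # 0.309
# Table 3.2 / 3.7 setting: M' = (4 0 ... 0), Sigma = I  ->  Delta = 4.
print(round(Phi(-2.0), 3))   # 0.023
```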
3.4  Accuracy of the Asymptotic Approximations

To obtain estimates of the approximation error of the asymptotic expansions, Monte Carlo methods were employed. The pseudorandom numbers were generated by programs written in the C programming language. These programs were executed under the MS-DOS operating system on IBM and IBM-compatible microcomputers using Intel 8087 math coprocessors. The method of Ahrens and Dieter (1972) was used to generate normal deviates.

Tables 3.1 - 3.7 show the results. The parameters for each table were chosen so that the computational accuracy of Equation 3.2.25 could be assessed for each F_j described in Equation 3.2.21. To achieve some gain in computational speed, 500 unknowns were classified from each population using the same training samples. This process iterated until the estimated standard errors were less than 0.001 for all of the discriminant functions. One half of the unknown observations in the undefined region of the UDF were considered to be misclassified. Each pair of sample sizes required from 3 days of continuous computation time for r = 2 to 30 days for r = 20.

Prior to general publication, these computations will be rerun on a VAX 8700. A broader range of parameters will be examined. A sample run showed that the computation time ranges from 10 minutes for r = 2 to 24 hours for r = 20.
Table 3.1
Probabilities of Misclassification Comparing Monte Carlo
Simulation with Asymptotic Expansion for
r = 2, M' = (1 0), and diag Λ = (1 1).

DISCRIMINANT        POPULATION π1                  POPULATION π2
FUNCTION        Monte Carlo   Asymptotic       Monte Carlo   Asymptotic
                Estimate      Approximation    Estimate      Approximation

n1 = 10, n2 = 10
Optimal           0.309         0.309            0.308         0.309
LDF               0.341         0.335            0.342         0.335
QDF               0.370         0.395            0.372         0.395
UDF               0.378         0.397            0.381         0.397

n1 = 10, n2 = 100
Optimal           0.309         0.309            0.308         0.309
LDF               0.343         0.313            0.304         0.313
QDF               0.397         0.388            0.304         0.324
UDF               0.415         0.372            0.307         0.395

n1 = 100, n2 = 100
Optimal           0.311         0.309            0.310         0.309
LDF               0.314         0.312            0.314         0.312
QDF               0.317         0.318            0.317         0.318
UDF               0.318         0.318            0.317         0.318
Table 3.2
Probabilities of Misclassification Comparing Monte Carlo
Simulation with Asymptotic Expansion for
r = 2, M' = (4 0), and diag Λ = (1 1).

DISCRIMINANT        POPULATION π1                  POPULATION π2
FUNCTION        Monte Carlo   Asymptotic       Monte Carlo   Asymptotic
                Estimate      Approximation    Estimate      Approximation

n1 = 10, n2 = 10
Optimal           0.023         0.023            0.023         0.023
LDF               0.033         0.029            0.032         0.029
QDF               0.043         0.032            0.042         0.032
UDF               0.073         0.036            0.073         0.036

n1 = 10, n2 = 100
Optimal           0.023         0.023            0.023         0.023
LDF               0.027         0.024            0.026         0.024
QDF               0.046         0.034            0.025         0.022
UDF               0.118         0.029            0.023         0.034

n1 = 100, n2 = 100
Optimal           0.023         0.023            0.023         0.023
LDF               0.024         0.023            0.024         0.023
QDF               0.025         0.023            0.024         0.023
UDF               0.023         0.024            0.027         0.024
Table 3.3
Probabilities of Misclassification Comparing Monte Carlo
Simulation with Asymptotic Expansion for
r = 2, M' = (4 0), and diag Λ = (4 4).

DISCRIMINANT        POPULATION π1                  POPULATION π2
FUNCTION        Monte Carlo   Asymptotic       Monte Carlo   Asymptotic
                Estimate      Approximation    Estimate      Approximation

n1 = 10, n2 = 10
Optimal           0.049         0.043            0.102         0.102
LDF               0.037         0.024            0.181         0.171
QDF               0.098         0.092            0.110         0.098
UDF               0.125         0.049            0.136         0.160

n1 = 10, n2 = 100
Optimal           0.047         0.043            0.102         0.102
LDF               0.027         0.023            0.161         0.161
QDF               0.109         0.072            0.094         0.074
UDF               0.151         0.046            0.089         0.142

n1 = 100, n2 = 10
Optimal           0.049         0.043            0.102         0.102
LDF               0.030         0.022            0.174         0.159
QDF               0.048         0.068            0.121         0.125
UDF               0.048         0.047            0.187         0.127

n1 = 100, n2 = 100
Optimal           0.049         0.043            0.102         0.102
LDF               0.024         0.043            0.160         0.103
QDF               0.049         0.048            0.105         0.102
UDF               0.050         0.049            0.105         0.103
Table 3.4
Probabilities of Misclassification Comparing Monte Carlo
Simulation with Asymptotic Expansion for
r = 2, M' = (1 0), and diag Λ = (1 4).

DISCRIMINANT        POPULATION π1                  POPULATION π2
FUNCTION        Monte Carlo   Asymptotic       Monte Carlo   Asymptotic
                Estimate      Approximation    Estimate      Approximation

n1 = 10, n2 = 10
Optimal           0.226         0.222            0.289         0.289
LDF               0.310         0.335            0.365         0.341
QDF               0.291         0.321            0.335         0.392
UDF               0.281         0.298            0.371         0.420

n1 = 10, n2 = 100
Optimal           0.225         0.222            0.289         0.289
LDF               0.319         0.313            0.313         0.314
QDF               0.297         0.312            0.290         0.329
UDF               0.295         0.290            0.313         0.365

n1 = 100, n2 = 10
Optimal           0.225         0.222            0.288         0.289
LDF               0.284         0.313            0.396         0.313
QDF               0.222         0.256            0.361         0.408
UDF               0.211         0.302            0.423         0.400

n1 = 100, n2 = 100
Optimal           0.225         0.222            0.289         0.289
LDF               0.309         0.312            0.315         0.312
QDF               0.221         0.232            0.301         0.299
UDF               0.218         0.242            0.304         0.298
Table 3.5
Probabilities of Misclassification Comparing Monte Carlo
Simulation with Asymptotic Expansion for
for r = 2, M' = (2 0), and diag A = (4 0.25).
POPULATION
DISCRIMINANT
FUNCTION
Monte
Carlo
Estimate
n1
Asymptotic
Approximation
Monte
Carlo
Estimate
n1=10
n2
Asymptotic
Approximation
n2=10
Optimal
0.128
0.125
0.290
0.287
LDF
0.246
0.191
0.306
0.337
QDF
0.213
0.254
0.284
0.398
UDF
0.241
0.287
0.300
0.427
n2=100
n1=10
Optimal
0.128
0.125
0.288
0.28-7
LDF
0.173
0.165
0.314
0.314
QDF
0.151
0.148
0.305
0.307
UDF
0.156
0.154
0.344
0.312
n2=10
n1=100
Optimal
0.128
0.125
0.288
0.289
LDF
0.173
0.165
0.314
0.314
QDF
0.151
0.148
0.305
0.307
UDF
0.156
0.154
0.344
0.312
n2=100
n1=100
Optimal
0.128
0.125
0.287
0.287
LDF
0.172
0.162
0.306
0.312
QDF
0.136
0.138
0.284
0.298
UDF
0.136
0.141
0.284
0.301
Table 3.6
Probabilities of Misclassification Comparing Monte Carlo
Simulation with Asymptotic Expansion for
r = 20, M′ = (1 0 … 0), and diag Λ = (1 … 1).
DISCRIMINANT FUNCTION
(columns: population π1 Monte Carlo Estimate, π1 Asymptotic Approximation; population π2 Monte Carlo Estimate, π2 Asymptotic Approximation)
n1=50
n2=50
Optimal
0.309
0.309
0.309
0.309
LDF
0.372
0.400
0.374
0.400
QDF
0.453
0.559
0.452
0.559
UDF
0.456
0.619
0.455
0.619
n2=100
n1=50
Optimal
0.309
0.309
0.308
0.309
LDF
0.387
0.370
0.331
0.370
QDF
0.681
0.454
0.225
0.352
UDF
0.731
0.450
0.190
0.415
n2=100
n1=100
Optimal
0.309
0.309
0.309
0.309
LDF
0.346
0.355
0.347
0.355
QDF
0.428
0.434
0.426
0.434
UDF
0.428
0.464
0.426
0.464
Table 3.7
Probabilities of Misclassification Comparing Monte Carlo
Simulation with Asymptotic Expansion for
r = 20, M′ = (4 0 … 0), and diag Λ = (1 … 1).
DISCRIMINANT FUNCTION
(columns: population π1 Monte Carlo Estimate, π1 Asymptotic Approximation; population π2 Monte Carlo Estimate, π2 Asymptotic Approximation)
n1=50
n2=50
Optimal
0.023
0.023
0.023
0.023
LDF
0.044
0.043
0.044
0.043
QDF
0.111
0.081
0.111
0.081
UDF
0.160
0.087
0.159
0.087
Optimal
0.023
0.023
0.022
0.023
LDF
0.038
0.036
0.034
0.036
QDF
0.208
0.088
0.029
0.045
UDF
0.366
0.081
0.019
0.066
n2=100
n1=100
Optimal
0.023
0.023
0.023
0.023
LDF
0.033
0.033
0.032
0.033
QDF
0.058
0.052
0.058
0.052
UDF
0.058
0.055
0.058
0.055
CHAPTER 4
MEASURES OF DISCRIMINANT FUNCTION PERFORMANCE
4.1
Introduction
Large-sample techniques are frequently used to compare the performance of a new statistic with those of commonly used statistics. In this chapter a measure of efficiency is constructed and used to study the discriminant functions as n approaches infinity. Unfortunately, when two discriminant functions have the same asymptotic efficiency, it is difficult to learn anything about their relative behavior with smaller sample sizes. To address this problem, a measure of deficiency has been constructed. Under the conditions of both equal and unequal covariances, the functions D_0, D_L, D_Q, and D_U are compared using these measures.
4.2
Measures of Efficiency
The objective of discriminant analysis is to classify unknown
objects using data from previously classified objects.
The usual
strategy for choosing a discriminant function is to minimize the cost,
or probability, of misclassification.
To expand this concept, consider a quantity called the discrimination potential, defined as

(4.2.1)  DP(D) = (C_R − C_D)/C_R,

where D is the discriminant function, C_R is the cost of misclassification when an unknown object is classified randomly, and C_D is the cost of misclassification when an unknown object is classified using D. If C_R ≤ C_D, the discriminant function is not useful, and the discrimination potential of D will be considered to be identically zero. By letting C_0 be the additional overhead of using a discriminant function, C_1 be the cost of misclassifying an observation Z from π_1, and C_2 be the cost of misclassifying an observation from π_2, and by defining

(4.2.2)  θ_1 = P[misclassifying a randomly classified Z | Z ∈ π_1],

(4.2.3)  θ_2 = P[misclassifying a randomly classified Z | Z ∈ π_2],

(4.2.4)  q_1 = P[D(Z) < k | Z ∈ π_1],

(4.2.5)  q_2 = P[D(Z) ≥ k | Z ∈ π_2],

(4.2.6)  w_1 = P[Z ∈ π_1],

and

(4.2.7)  w_2 = P[Z ∈ π_2],

then

(4.2.8)  C_R = w_1 θ_1 C_1 + w_2 θ_2 C_2

and

(4.2.9)  C_D = C_0 + w_1 q_1 C_1 + w_2 q_2 C_2.

It follows that

(4.2.10)  DP(D) = [w_1(θ_1 − q_1)C_1 + w_2(θ_2 − q_2)C_2 − C_0] / [w_1 θ_1 C_1 + w_2 θ_2 C_2].

A natural definition for the absolute efficiency of D becomes

(4.2.11)  eff(D) = DP(D)/DP(D_0).
The relative efficiency of D_1 with respect to D_2 can be obtained using

(4.2.12)  eff(D_1, D_2) = DP(D_1)/DP(D_2).

By taking the limit of the efficiency as n approaches infinity, the asymptotic efficiency is obtained. For this study it was assumed that C_0 = 0, C_1 = C_2 = 1, and w_1 = w_2 = θ_1 = θ_2 = 1/2. After simplification, this gives the formula

(4.2.13)  eff(D) = [1 − q_1(D) − q_2(D)] / [1 − q_1(D_0) − q_2(D_0)],

and the relative efficiency becomes

(4.2.14)  eff(D_1, D_2) = [1 − q_1(D_1) − q_2(D_1)] / [1 − q_1(D_2) − q_2(D_2)].

Chang (1980) examined a number of different measures of efficiency using a different approach and chose the one described by Equation 4.2.14 above. The approach presented here gives a more general measure of efficiency than those of Chang.
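The cost and efficiency algebra above is simple enough to sketch directly. The following Python fragment (a minimal sketch with hypothetical helper names; it is not part of the dissertation's C or SAS programs) computes the discrimination potential from the cost definitions and the relative efficiency under the study's assumptions:

```python
def discrimination_potential(q1, q2, theta1=0.5, theta2=0.5,
                             w1=0.5, w2=0.5, c0=0.0, c1=1.0, c2=1.0):
    """DP(D) = (C_R - C_D)/C_R, taken as zero when D is no better
    than random classification (Equation 4.2.1)."""
    c_r = w1 * theta1 * c1 + w2 * theta2 * c2   # cost of random classification
    c_d = c0 + w1 * q1 * c1 + w2 * q2 * c2      # cost of classifying with D
    return max(0.0, (c_r - c_d) / c_r)

def relative_efficiency(q1_a, q2_a, q1_b, q2_b):
    """eff(D_A, D_B) under C0 = 0, C1 = C2 = 1, w_i = theta_i = 1/2,
    which reduces to (1 - q1A - q2A)/(1 - q1B - q2B)."""
    return discrimination_potential(q1_a, q2_a) / discrimination_potential(q1_b, q2_b)
```

With the default arguments, discrimination_potential(q1, q2) is simply 1 − q1 − q2, so the ratio reproduces Equation 4.2.14.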
4.3
Measures of Deficiency
For the case when the asymptotic relative efficiency is one, Hodges and Lehmann (1970) have developed a measure of deficiency. This quantity represents the number of additional observations needed to make one statistic equivalent to another statistic (as n → ∞) with respect to some parameter such as the variance. This section defines an analogous measure of deficiency appropriate for discriminant functions.
The total discrimination potential, nDP(D), of the discriminant function D is a measure of the effectiveness of a training sample of size n. By letting n′ = n + d_n and by setting

(4.3.1)  n DP(D_1) = n′ DP(D_2),

where D_1 is constructed from n samples and D_2 is constructed from n′ samples, the equation can be solved for d_n. The variable d_n is a signed quantity and represents the increment needed to obtain an equivalent total potential. Equation 4.3.1 can be rewritten as

(4.3.2)  n C_{D_1} = (n + d_n) C_{D_2} − d_n C_R,

since C_R is unrelated to the choice of discriminant function. By substitution,

(4.3.3)  n{w_1[e_1 + a_{11}/n + b_{11}/n² + o(n⁻²)]C_1 + w_2[e_2 + a_{12}/n + b_{12}/n² + o(n⁻²)]C_2}
         = (n + d_n){w_1[e_1 + a_{21}/(n + d_n) + b_{21}/(n + d_n)² + o(n⁻²)]C_1
           + w_2[e_2 + a_{22}/(n + d_n) + b_{22}/(n + d_n)² + o(n⁻²)]C_2} − d_n C_R.

Simplification leads to

(4.3.4)  d_n[C_R − (w_1 e_1 C_1 + w_2 e_2 C_2)] = w_1(a_{21} − a_{11})C_1 + w_2(a_{22} − a_{12})C_2 + O(1/n).

As n → ∞,

(4.3.5)  d = [w_1(a_{21} − a_{11})C_1 + w_2(a_{22} − a_{12})C_2] / [w_1(θ_1 − e_1)C_1 + w_2(θ_2 − e_2)C_2].

Under the conditions above,

(4.3.6)  d = [(a_{21} − a_{11}) + (a_{22} − a_{12})] / (1 − e_1 − e_2).
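As a numerical sketch (Python, with illustrative argument names; the original computations were done in purpose-written C programs), the limiting deficiency in Equation 4.3.6 is just a ratio of the O(1/n) expansion coefficients:

```python
def deficiency(a11, a12, a21, a22, e1, e2):
    """Limiting deficiency of D2 relative to D1 (Equation 4.3.6), assuming
    C0 = 0, C1 = C2 = 1, and w_i = theta_i = 1/2.  a_{1j} are the 1/n
    coefficients for D1, a_{2j} those for D2, and e_j the common limits."""
    return ((a21 - a11) + (a22 - a12)) / (1.0 - e1 - e2)
```

A positive value means D2 needs that many additional training observations to match the total discrimination potential of D1.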
4.4
Performance under Equal Covariances
Under the assumption of homoscedasticity, the QDF, LDF, and UDF are asymptotically equivalent. To study the large-sample properties of the discriminant functions, the relative deficiencies defined by Equation 4.3.6 can be used. These are presented in Table 4.1, comparing the QDF and UDF to the LDF for the dimensions r = 2, 5, 10, 20 and for distances between populations μ = 1, 2, 3, 4. The relative performance of the QDF and UDF improves with decreasing dimension and increasing μ. Performance also decreases as the sampling proportions deviate from 0.5. The UDF performs uniformly more poorly than the QDF under these conditions.
Sample sizes can be estimated for the discriminant function D by the equation

(4.4.1)  n(D) = (a_{11} + a_{12}) / {[1 − eff(D, D_0)](1 − e_1 − e_2)},

where D_0 is the optimal discriminant function and a_{11}, a_{12}, e_1, e_2 are defined in Equation 4.3.3. Estimated sample sizes are presented in Tables 4.2 and 4.3 under various conditions. To achieve efficiencies in the region of 90-95%, a useful rule of thumb for the LDF might be to choose a sample size of at least 4r².
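This sample-size calculation can be sketched in a few lines of Python (a minimal illustration under the study's assumptions, not the dissertation's own code). Solving eff(D, D_0) = 1 − (a_{11} + a_{12})/[n(1 − e_1 − e_2)] for n gives:

```python
def required_sample_size(a11, a12, e1, e2, efficiency):
    """Training-sample size at which D attains the stated efficiency
    relative to the optimal rule D0 (round up to an integer in practice)."""
    return (a11 + a12) / ((1.0 - efficiency) * (1.0 - e1 - e2))
```

For example, doubling the target shortfall (1 − efficiency) halves the required n, which is why the 97.5% and 99% columns of Tables 4.2 and 4.3 grow so quickly.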
Table 4.1
Deficiencies of D_Q and D_U Relative to D_L When M′ = [μ 0 … 0] and Λ = I

                 μ = 1           μ = 2           μ = 3           μ = 4
  r    π1      D_Q    D_U      D_Q    D_U      D_Q    D_U      D_Q    D_U
  2   0.50     6.3    6.5      1.8    1.8      0.6    0.8      0.1    0.3
      0.25     9.3   13.7      2.7    3.5      0.9    1.4      0.3    0.6
      0.10    22.3   36.5      6.4    9.1      2.3    3.6      0.8    1.6
  5   0.50    23.2   26.9      6.1    6.0      1.9    2.0      0.4    0.6
      0.25    33.4   48.5      8.7   10.6      2.8    3.5      0.8    1.1
      0.10    77.3  121.5     20.3   26.6      6.9    9.1      2.2    3.2
 10   0.50    66.6   82.1     15.8   16.7      4.6    4.8      0.8    1.0
      0.25    94.8  135.9     22.7   27.8      6.9    8.3      1.6    2.2
      0.10   216.7  327.9     52.5   68.0     16.9   21.3      5.0    6.6
 20   0.50   213.9  276.1     46.4   53.6     12.5   14.3      1.8    2.5
      0.25   301.0  424.8     66.1   83.6     18.8   23.4      3.9    5.3
      0.10   678.7  987.7    151.6  197.9     46.1   58.4     12.6   16.4
Table 4.2
Sample Sizes Required to Achieve a Fixed Level of Efficiency
When M′ = [μ 0 … 0] and Λ = I (π1 = 0.50)
(rows: r = 2 and 5, each with μ = 1, 2, 3, 4 and D = L, Q, U; columns: efficiency (%) = 50, 75, 80, 85, 90, 95, 97.5, 99)
L
11
14
18
27
55
110
274
Q
18
36
45
60
90
180
360
900
U
18
37
46
62
92
185
369
923
15
31
77
L
Q
10
13
17
26
51
103
257
U
10
13
17
26
52
104
260
17
42
L
Q
10
20
39
98
U
12
24
48
120
L
10
26
Q
16
40
12
23
58
U
5
1
2
3
4
L
14
29
36
48
71
143
286
715
Q
61
122
152
203
304
608
1216
3039
U
68
136
170
227
341
681
1362
3406
10
13
20
39
78
196
L
Q
16
32
40
53
80
160
321
802
U
16
32
40
53
79
158
317
792
10
19
39
97
L
Q
19
28
57
114
284
U
20
29
59
118
294
L
12
24
60
Q
20
41
102
U
23
47
117
Table 4.2 (cont'd)
(rows: r = 10 and 20, each with μ = 1, 2, 3, 4 and D = L, Q, U; columns: efficiency (%) = 50, 75, 80, 85, 90, 95, 97.5, 99)
L
36
71
89
119
178
357
713
1783
Q
169
338
422
563
844
1689
3377
8444
U
200
400
500
666
999
1999
3998
9995
L
10
19
24
32
48
96
193
482
Q
41
83
103
138
206
413
826
2064
U
43
86
108
143
215
430
860
2150
12
16
24
47
94
236
L
Q
28
35
46
69
139
277
694
U
29
36
48
72
144
287
718
L
15
31
61
153
Q
31
63
125
313
U
37
74
148
370
L
95
190
238
317
476
952
1905
4761
Q
523
1046
1307
1743
2615
5229
10459
26147
U
647
1295
1618
2158
3237
6473
12947
32367
L
26
51
64
85
128
256
511
1278
Q
118
237
296
394
592
1183
2367
5918
U
133
266
332
443
664
1328
2656
6640
L
25
32
42
63
126
253
632
Q
75
94
126
189
377
755
1887
U
83
103
138
206
413
825
2063
L
42
84
169
421
Q
61
121
242
606
U
67
134
267
668
Table 4.3
Sample Sizes Required to Achieve a Fixed Level of Efficiency
When M′ = [μ 0 … 0] and Λ = I (π1 = 0.25)
(rows: r = 2 and 5, each with μ = 1, 2, 3, 4 and D = L, Q, U; columns: efficiency (%) = 50, 75, 80, 85, 90, 95, 97.5, 99)
L
11
14
18
27
55
110
274
Q
24
48
60
80
120
240
480
1200
U
33
66
82
109
164
328
656
1640
15
31
77
L
Q
14
17
23
34
68
137
342
U
17
21
28
43
85
170
425
17
42
L
Q
13
26
52
131
U
18
36
73
182
10
26
L
Q
11
21
53
U
17
33
83
L
14
29
36
48
71
143
286
715
Q
81
162
203
270
405
811
1621
4053
U
111
222
278
371
556
1112
2225
5562
10
13
20
39
78
196
L
Q
21
43
53
71
107
214
427
1069
U
25
50
63
83
125
250
501
1252
10
19
39
97
L
Q
19
25
38
76
151
379
U
22
30
45
90
179
448
12
24
60
L
Q
14
27
54
136
U
17
34
69
172
Table 4.3 (cont'd)
(rows: r = 10 and 20, each with μ = 1, 2, 3, 4 and D = L, Q, U; columns: efficiency (%) = 50, 75, 80, 85, 90, 95, 97.5, 99)
L
36
71
89
119
178
357
713
1783
Q
225
450
563
751
1126
2252
4503
11258
U
307
615
768
1025
1537
3074
6148
15370
L
10
19
24
32
48
96
193
482
Q
55
110
138
183
275
550
1101
2752
U
65
130
163
217
326
652
1304
3261
12
16
24
47
94
236
L
Q
37
46
62
92
185
370
925
U
43
53
71
107
214
427
1068
L
15
31
61
153
Q
31
63
125
313
U
37
74
148
370
L
95
190
238
317
476
952
1905
4761
Q
697
1395
1743
2324
3486
6973
13945
34863
U
945
1890
2362
3194
4724
9448
18897
47242
L
26
51
64
85
128
256
511
1278
Q
158
316
394
526
789
1578
3156
7890
U
193
386
482
643
964
1928
3856
9641
L
13
25
32
42
63
126
253
632
Q
50
101
126
168
252
503
1006
2515
U
59
119
149
198
297
594
1189
2972
L
21
28
42
84
169
421
Q
40
54
81
162
323
808
U
48
64
95
191
382
954
4.5
Performance under Unequal Covariances
The LDF is clearly superior to the QDF and UDF under the homoscedasticity assumption. To evaluate the LDF when the covariances are unequal, the equation

(4.5.1)  eff(D_L, D_Q) = k

was solved for λ_1 and λ_2, the variances, for two values of μ, the distance between the populations. The constant k was set at 0.90 and 0.95. Polar coordinates were used, and a solution was obtained every π/8 radians. These results are presented in Table 4.5 and are plotted in Figures 4.1 and 4.2. SAS/GRAPH software (SAS Institute Inc.) was used to generate the plots, and the points were connected using the parametric spline option. The LDF appears to be generally robust against heteroscedasticity when

(4.5.2)  2^{−2μ} < λ_1 < 2^{μ}.

The performance tends to be relatively unaffected by large deviations perpendicular to the first component, on the order of

(4.5.3)  2^{−μ} < λ_2 < 2^{μ}.

Deficiencies were computed for various conditions and are shown in Tables 4.6 and 4.7. The UDF again was more deficient than the QDF, and the QDF showed an advantage over the LDF for variances much different from one. The QDF loses its advantage when the distance between the populations increases.
Table 4.5
Asymptotic Isoefficient Sets for LDF Relative to QDF and UDF with r = 2
and Origin at λ1 = 1, λ2 = 1.
Angle (multiples of π/8, counterclockwise from the λ1 axis):
0, π/8, π/4, 3π/8, π/2, 5π/8, 3π/4, 7π/8, π, 9π/8, 5π/4, 11π/8, 3π/2, 13π/8, 7π/4, 15π/8
k = 0.95
k = 0.90
2.04
1.84
1.49
1.25
1.00
0.78
0.54
0.17
0.09
0.27
0.60
0.78
1.00
1.22
1.53
1.89
1.00
1.29
1.49
1.71
1.67
1.84
1.85
2.09
1.00
0.58
0.60
0.55
0.60
0.62
0.65
0.77
4.90
3.89
2.32
1.53
1.00
0.46
0.16
0.08
0.06
0.13
0.30
0.59
1.00
1.78
2.78
4.32
1.00
1.76
2.32
2.78
3.40
6.38
6.15
2.83
1.00
0.43
0.30
0.27
0.29
0.25
0.36
0.55
2.53
2.26
1.73
1.32
1.00
0.68
0.21
0.05
0.04
0.08
0.42
0.72
1.00
1.37
1.90
2.35
1.00
1.40
1.73
1.94
2.19
2.54
4.85
3.41
1.00
0.34
0.42
0.45
0.46
0.47
0.53
0.70
6.51
5.22
3.12
1.83
1.00
0.12
0.05
0.04
0.04
0.05
0.09
0.29
1.00
2.04
4.15
5.83
1.00
1.98
3.12
4.27
7.24
1.72
1.95
3.83
1.00
0.33
0.09
0.05
0.14
0.18
0.24
0.48
FIGURE 4.1
ASYMPTOTIC ISOEFFICIENT SETS
FOR
LDF RELATIVE TO QDF (UDF)
[Plot of the isoefficient curves in the (λ1, λ2) plane]
FIGURE 4.2
ASYMPTOTIC ISOEFFICIENT SETS
FOR
LDF RELATIVE TO QDF (UDF)
[Plot of the isoefficient curves in the (λ1, λ2) plane, with M′ = (μ 0) and Λ = diag(λ1, λ2)]
Table 4.6
Deficiencies of D_L and D_U Relative to D_Q for r = 2
When M′ = [μ 0] and Λ = λI
D
L
0.25
1.00
0.25
~=2
D
U
D
L
1.0
-6.3
0.2
D
U
0.5
-1.8
0.0
4.00
11.8
1.0
0.25
0.2
0.5
1.00
0.10
0] and Λ = λI
μ=1
π1
0.50
[μ
=2
-9.3
4.4
-2.7
0.8
4.00
4.7
1.4
0.25
0.9
2.3
1.00
4.00
-22.3
14.3
4.1
-6.4
2.7
3.1
Table 4.7
Deficiencies of D_L and D_U Relative to D_Q for r = 10
When M′ = [μ 0 … 0] and Λ = λI
0.50
29.6
-66.6
15.5
2.00
37.5
0.50
39.2
1.00
0.10
0 … 0] and Λ
0.50
1.00
0.25
= [μ
-94.8
41.1
2.00
56.8
0.50
81.3
1.00
2.00
-216.7
111.2
126.7
CHAPTER 5
AN APPLICATION TO MEDICAL DIAGNOSIS
5.1
Introduction
Many examples of the use of discriminant functions are present in
the medical literature.
Clearly, at least some physicians accept the
concept of discriminant analysis.
Unfortunately, many articles have used discriminant analysis computer programs for hypothesis testing. Only rarely has the use of discriminant functions to diagnose new cases of disease been demonstrated. It is disappointing that physicians' ignorance of quantitative methods has created an unnecessary barrier between medical practice and the use of mathematical and statistical techniques in diagnosis.
The study presented below is a
subset of a larger project to redefine physiological normality and to
apply statistical methods to make medical decisions about individual
patients.
This study examines the use of some clinical chemistry tests to
determine a patient's gender.
Gender was selected here because it has
two primary classifications that are equally likely, its phenotype is
readily apparent, and the phenotype is a very reliable measure of the
genotype.
This is an example where the outcome is easily verified,
where the training samples are not likely to be misclassified, and where
the physician is likely to do considerably worse than the discriminant
function.
5.2
Methods
The data were obtained from a clinical laboratory of a university
teaching hospital.
The training samples were from volunteers between
the ages of 20 and 50 years who were considered to be healthy based on
their medical history and physical examination.
The validation sample
came from 18 to 60 year-old "normal" patients seen at the hospital
during a five month period.
The patients were a subset of a larger
prospective study on multivariate diagnosis where 1516 consecutive
patients were identified as being eligible by having essentially complete hematology, clinical chemistry, and urinalysis testing at admission.
Each patient's chart was reviewed at least two weeks after
admission to determine the diagnosis related to the present illness.
A
total of 1038 patients were excluded because the patient was taking more
than four drugs, had a history of more than two unrelated diagnoses, the
diagnosis was unknown, or the chart was never available.
From the
eligible patients, 40 patients had a diagnosis of "normal" or had a
disease that is not considered to alter clinical chemistry results.
The variables that were measured in serum included sodium (Na),
potassium (K), chloride (Cl), carbon dioxide (C02), urea nitrogen (BUN),
creatinine (Cr), uric acid (UA), calcium (Ca), phosphorus (P0 4 ) , glucose
(Glc), total protein (Prot), albumin (Alb), total bilirubin (TB), direct
bilirubin (DB), alkaline phosphatase (AP), lactate dehydrogenase (LD),
aspartate aminotransferase (AST), and alanine aminotransferase (ALT).
Transformations were used to achieve approximate multivariate normality.
A logarithmic transformation was used with Alb, TB, DB, AP, LD, AST, and
ALT.
Since the values of TB and DB were rounded to one decimal place,
one was added to avoid the singularity at zero.
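The transformation step can be sketched as follows (a Python illustration with hypothetical names; the original work used SAS). The shift of one applies only to TB and DB, whose rounded values can be zero:

```python
import math

def normalize(value, log_scale=False, shift=0.0):
    """Normalizing transform of Section 5.2: log(value + shift) for the
    log-scaled analytes (shift = 1 for TB and DB), identity otherwise."""
    return math.log(value + shift) if log_scale else value
```

For example, a total bilirubin of 0.4 mg/dL would enter the analysis as normalize(0.4, log_scale=True, shift=1.0).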
The methods for estimating the probabilities of misclassification
are described in Section 1.5.
The results were obtained using PROC
DISCRIM, PROC LOGIST, and PROC MATRIX from SAS (SAS Institute Inc.)
and the C programs developed for asymptotic expansions.
The asymptotic
expansion estimator was constructed by substituting unbiased estimators
for the parameters and the frequencies displayed for the asymptotic
expansion estimator were estimated from the probabilities.
The actual
probabilities were obtained by using the training samples to
classify the validation samples.
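As a minimal illustration of the "actual" error-rate computation (Python, one-dimensional case only; the study itself used 18 and 7 variables with PROC DISCRIM and purpose-written C programs):

```python
from statistics import mean, variance

def ldf_classify(train1, train2, z):
    """One-dimensional plug-in LDF: assign z to population 1 when
    (xbar - ybar)/s2 * (z - (xbar + ybar)/2) >= 0, s2 the pooled variance."""
    xbar, ybar = mean(train1), mean(train2)
    n1, n2 = len(train1), len(train2)
    s2 = ((n1 - 1) * variance(train1) + (n2 - 1) * variance(train2)) / (n1 + n2 - 2)
    return 1 if (xbar - ybar) / s2 * (z - (xbar + ybar) / 2.0) >= 0 else 2

def actual_error_rates(train1, train2, valid1, valid2):
    """'Actual' probabilities of misclassification: classify held-out
    validation samples with a rule built from the training samples."""
    p1 = sum(ldf_classify(train1, train2, z) != 1 for z in valid1) / len(valid1)
    p2 = sum(ldf_classify(train1, train2, z) != 2 for z in valid2) / len(valid2)
    return p1, p2
```

The apparent and jackknife estimators of Table 5.3 differ only in which observations are classified: the training data themselves, or each training observation after it is withheld from the fitted rule.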
5.3
Results
Data from 77 males and 25 females were available for the training
samples and data from 24 males and 16 females were available for the
validation sample.
The means and standard deviations are given in Table
5.1 and the correlations are in Table 5.2 for the training samples. A
univariate comparison of the means suggests that Cl is lower in males
than females while CO 2 , BUN, Cr, UA, and TB appear to be higher in
males.
Based on the standard deviations, men seem to have more
variability in UA and Glc and less variability in Na, K, and Ca than
women.
From 153 pairs of correlation coefficients, only a few appear to differ between the sexes: CO2-UA, CO2-AP, Ca-DB, Ca-Cr, Cl-Prot, Cl-AP, Alb-K, and Alb-UA. The subtleties of these differences cannot be fully explained with current medical knowledge.
The usual tendency in this type of situation is to reduce the dimension (r) of the problem. Logistic regression is a commonly applied procedure. After subjecting these data to logistic regression employing a backward elimination approach, seven variables appear to explain the difference between male and female: CO2, Cr, UA, BUN, Prot, Alb, and AST.
The estimated probabilities of misclassification shown in Table 5.3 suggest several trends. First, the reduction in dimension led to a reduction in performance for all three discriminant functions, based on estimators that rely solely on the training samples. The QDF outperforms the LDF and UDF with these estimators. The results from the validation samples are quite different: dimension reduction has little effect, and the LDF is the best performer.
Table 5.1
Means and Standard Deviations of the Training Samples
Test
Mean ± 2 SEM
Male
(n=77)
Na
K
141.5
± 0.6
4.24 ± 0.09
Standard
Deviation
Female
(n=25)
141.7
± 1.3
4.35 ± 0.20
Male
Female
2.62
3.32
0.397
0.492
Cl
103.2
± 0.5
105.1
± 0.9
2.13
2.33
CO 2
26.2
± 0.5
24.1
± 1.0
2.28
2.55
BUN
13.7
± 0.7
10.9
± 1.2
3.22
3.04
0.122
0.113
1.17
0.988
Cr
1.19 ± 0.03
UA
6.0
Ca
9.52 ± 0.08
9.43 ± 0.18
0.349
0.471
P04
3.62 ± 0.09
3.84 ± 0.21
0.402
0.527
± 2
9.72
6.00
± 0.5
0.430
0.504
Glc
96
± 0.3
± 2
± 0.1
0.98 ± 0.05
4.3
96
7.1
± 0.4
Prot
7.0
Alb*
1.50 ± 0.01
1.47 ± 0.02
0.0591
0.0585
TB*
0.45 ± 0.04
0.31 ± 0.05
0.196
0.118
DB*
0.07 ± 0.01
0.05 ± 0.02
0.0577
0.0600
AP*
4.42 ± 0.05
4.37 ± 0.11
0.238
0.267
LD*
5.17 ± 0.04
5.14 ± 0.05
0.154
0.137
AST*
3.06 ± 0.07
2.88 ± 0.11
0.317
0.285
ALT*
3.3
0.680
0.776
* Transformed variable
± 0.2
2.9
± 0.3
Table 5.2
Product-Moment Correlation Coefficients of the Training Samples
FEMALE
Na
Na
K
0.47
0.40
Cl
0.31
CO2
BUN
0.13
0.06
UA
Ca
0.16
0.30
0.27
-0.01
-0.09
b
0.37
0.30
GLc
0.00
-0.02
c
b
-0.08
0.17
CO 2
BUN
Cr
UA
Ca
0.12
0.10
-0.12
-0.08
0.51
-0.07
a
0.53
-0.23
0.20
-0.03
-0.36
0.04
0.01
0.12
0.09
0.37
0.03
-0.05
-0.09
0.10
-0.05
-0.21
-0.05
-0.32
-0.09
0.15
-0.13
-0.21
-0.16
-0.39
a
0.27
0.15
b
0.14
0.35
0.08
0.04
0.02
0.13
-0.04
0.15
-0.03
-0.17
0.02
0.12
-0.02
-0.11
-0.14
0.20
0.17
0.21
0.10
0.24 a
-0.03
-0.11
0.14
- 0.20
-0.00
0.03
c
0.43
0.19
-0.16
c
-0.04
-0.38
-0.08
0.18
0.02
0.28
0.10
0.21
0.06
0.05
0.06
-0.04
0.00
0.11
0.12
TB
0.00
-0.13
-0.07
-0.11
-0.20
-0.01
0.38
DB
0.10
-0.07
-0.11
-0.17
0.07
0.19
AP
-0.18
0.11
-0.01
-0.14
-0.12
0.10
-0.06
-0.09
a
a
-0.21
-0.11
-0.01
0.17
0.26
-0.31
b
-0.07
-0.18
b
b
-0.31
-0.14
-0.14
P04
0.28
-0.05
0.10
0.10
Alb
b
0.13
0.21
0.15
0.57
0.10
b
P04
Prot
a
c
0.19
Cr
Cl
K
LD
0.23
AST
0.03
0.07
-0.16
-0.03
0.16
0.11
-0.05
0.14
0.26
ALT
0.07
-0.05
0.06
-0.05
0.20
0.08
0.07
-0.09
0.23
MALE
a)
0.01 < P < 0.05
b)
0.001 < P < 0.01
c)
P < 0.001
b
Table 5.2 (Cont.)
FEMALE
Glc
Prot
Alb
Na
0.23
0.39 a
0.46
K
0.36
0.11
0.34
a
TB
DB
AP
LD
AST
ALT
0.11
0.03
-0.24
0.37
0.16
-0.13
0.11
0.14
0.02
0.11
0.32
-0.09
0.09
0.01
-0.05
Cl
-0.06
0.01
0.25
-0.24
-0.15
-0.45 a
CO2
0.17
0.17
-0.09
0.05
0.03
0.34
0.12
0.19
0.16
BUN
-0.22
0.17
-0.12
-0.38
-0.06
-0.03
-0.18
-0.20
0.25
Cr
-0.21
0.01
-0.09
0.02
-0.07
-0.12
0.25
0.08
0.25
0.31
0.11
-0.14
0.07
-0.16
0.14
0.14
0.06
-0.27
0.10
0.23
0.35
UA
Ca
P0 4
0.09
0.44
0.20
a
0.27
Glc
Prot
Alb
-0.00
-0.11
0.00
0.03
0.50
a
0.54
b
a
c
0.05
0.32
0.07
-0.02
-0.06
0.09
0.08
-0.33
0.00
0.34
-0.04
-0.01
0.21
0.01
-0.07
0.32
0.16
-0.17
0.26
0.09
0.18
-0.17
-0.18
-0.07
0.39
-0.09
0.27
0.08
-0.01
0.25
0.34
0.36
c
-0.07
-0.09
0.35
-0.01
0.14
0.10
0.09
-0.05
0.13
0.04
-0.06
-0.14
0.11
-0.20
-0.20
a
0.16
-0.09
-0.01
MALE
a)
0.01 < P < 0.05
b)
0.001 < P < 0.01
c)
P < 0.001
b
0.08
0.13
-0.08
0.44
a
0.52
0.38
AP
-0.05
-0.26
0.01
DB
ALT
0.58
0.45
0.09
0.23
AST
0.22
-0.11
TB
LD
-0.23
b
-0.11
0.28
b
-0.14
0.22
0.25
0.08
a
-0.07
0.32
0.22
b
a
0.20
0.29
a
Table 5.3
Estimated Probabilities (Frequencies) of Misclassification
Estimator
n
True
Class
LDF
QDF
r = 18
Apparent
77
25
Jackknife
77
25
UDF
M
F
0.026(2)
0.000(0)
0.065(5)
0.080(2)
M
F
0.065(5)
0.120(3)
0.143(11) 0.013(1)
0.280(7) 0.000(0)
0.013(1)
0.000(0)
Asymptotic
Expansion
77
25
M
F
0.005 (1)
0.146(4)
0.038(3)
0.197(5)
0.019(1)
0.512(13)
Actual
24
16
M
F
0.042(1) 0.083(2)
0.813(13) 0.188(3)
0.000(0)
1.000(16)
r = 7
Apparent
Jackknife
Asymptotic
Expansion
Actual
77
M
25
F
77
M
25
F
77
M
25
F
24
16
M
F
=7
0.078(6)
0.000(0)
0.104(8)
0.120(3)
0.091(7)
0.000(0)
0.091(7)
0.160(4)
0.117(9)
0.120(3)
0.104(8)
0.240(6)
0.071(5)
0.153(4)
0.103(8)
0.087(2)
0.113(9)
0.230(6)
0.083(2)
0.250(4)
0.125(3)
0.063(1)
0.083(2)
0.250(4)
5.4
Discussion
The results presented above are difficult to interpret because the
true population parameters are unknown and the validation samples were
obtained at a different time period than the training samples.
Under these conditions, no clear-cut conclusions can be drawn. However, the results provide a starting point for studying this problem. Additional studies should make the situation clearer.
CHAPTER 6
CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH
6.1
Summary of Findings
This study was designed to examine the performance of the estimated linear, quadratic, and unbiased-density discriminant functions when the training samples are from multivariate normal distributions with unknown, unequal means and covariance matrices. Several new tools for studying discriminant functions were developed, including a class of polynomials with matrix arguments, the concept of discrimination potential, a generalization of a measure of efficiency, and a measure of deficiency for when the asymptotic efficiencies are equal. By all measures the UDF is not superior to the QDF, and the LDF outperforms the QDF except when the covariance matrices are notably different.
6.2
The Search for Better Methods
Discriminant analysis is not widely used, at least in part because of the large sample sizes needed to obtain near-optimal results. The UDF studied herein does not improve this situation. However, the UDF may be more robust to nonnormality, and since it is not greatly different from the QDF, it should be reexamined under other conditions. Several other known discriminant functions may be better.
The logistic discriminant function is available in several computer packages. It is based on a transformation of the likelihood ratio, i.e.,

(6.2.1)  log[P(Z ∈ π_1 | Z = z) / P(Z ∈ π_2 | Z = z)] = β_0 + β′z.

However, the logistic function is used in a more general form than this, and the parameters can be estimated by several techniques, usually maximum likelihood.
Anderson (1984) suggests a discriminant function based on the likelihood ratio test criterion:

(6.2.2)  −(1/2) r log(n_1/n_2) + (1/2) log(|S_2|/|S_1|)
         − (1/2)(n_1 + 1) log[1 + (Z − X̄)′S_1⁻¹(Z − X̄)]
         + (1/2)(n_2 + 1) log[1 + (Z − Ȳ)′S_2⁻¹(Z − Ȳ)].
Lemma 4 of Ghurye and Olkin (1969) can be used to construct a UMVU estimator of the optimal QDF under the assumption of unequal covariance matrices. The discriminant function is

(6.2.3)  D_3(Z) = (1/2) Σ_{j=1}^{r} {ψ[(n_1 − j)/2] − ψ[(n_2 − j)/2]} + (1/2) log(|S_2|/|S_1|)
                  − (1/2)(n_1 − r − 2) tr[S_1⁻¹(Z − X̄)(Z − X̄)′]
                  + (1/2)(n_2 − r − 2) tr[S_2⁻¹(Z − Ȳ)(Z − Ȳ)′],

where ψ(x) = d log Γ(x)/dx. Under the assumption of equal covariance matrices, Equation 6.2.3 simplifies to

(6.2.4)  D_4(Z) = (1/2) r (1/n_1 − 1/n_2)
                  + (1/2)(n_1 + n_2 − r − 3) tr[S⁻¹{(Z − X̄)(Z − X̄)′ − (Z − Ȳ)(Z − Ȳ)′}].
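In the one-dimensional case the equal-covariance simplification is easy to compute directly. The Python sketch below is illustrative only and follows the sign convention of the equation as printed:

```python
from statistics import mean, variance

def d4(z, sample1, sample2):
    """Equation 6.2.4 with r = 1, s2 being the pooled sample variance."""
    n1, n2 = len(sample1), len(sample2)
    xbar, ybar = mean(sample1), mean(sample2)
    s2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    return (0.5 * (1.0 / n1 - 1.0 / n2)
            + 0.5 * (n1 + n2 - 1 - 3)            # r = 1 here
            * ((z - xbar) ** 2 - (z - ybar) ** 2) / s2)
```

With this convention the statistic is large and positive for observations far from the first sample's mean and close to the second's, so the cutoff k separates the two populations.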
It is apparent that essentially every method of construction for test statistics and estimators can be applied to the construction of discriminant functions. However, what is optimal for testing and estimating is not necessarily optimal for discriminating.
6.3
A Unified Approach to the Study of Discriminant Functions
Several aspects of discriminant function theory need to be devel-
oped.
The foremost aspect is the definition of optimality.
The defini-
tion should incorporate the following properties:
1)
Optimality should be based on maximizing the discrimination
potential.
2)
The discriminant function should be "optimal" for all sample
sizes.
3)
The method of construction should apply to both discrete and
continuous random variables.
Additional computational considerations should be examined:
1)
The information contained in training samples should be
representable as parameter estimates or sufficient statistics.
2)
The probabilities of misclassification should be easy to
estimate.
Suggested steps for developing a unified approach should be as follows:
1)
Definition of optimality,
2)
Development of methods of constructing the optimal discriminant
functions from training samples,
3)
Thorough study of the multivariate normal case with equal
covariance matrices,
4)
Thorough study of the multivariate normal case with unequal
covariance matrices,
5)
Study of other continuous multivariate distributions
6)
Thorough study of the multinomial distribution and other
discrete distributions,
7)
Extension of the findings to mixtures of discrete and
continuous variables, and
8)
Extension of the findings to sequential discrimination.
The classical approach to discrimination has been to choose D so that the total probability of misclassification (cost) is minimized when the training samples are large. The total probability of misclassification is

P[misclassification] = w_1 q_1 + w_2 q_2.

In the past, most of the research effort has focused on using the likelihood ratio to construct the discriminant function. The traditional approach is based on the theory of hypothesis testing. An approach using estimation theory may be more productive, especially with smaller sample sizes.
6.4
Applications to Medicine
Voluminous amounts of data are being collected on patients receiving medical care. On a given patient, each measurement and laboratory test is intertwined with every other. This creates very complex multidimensional observations. Unfortunately, the human mind has great difficulty dealing with multidimensional problems. Most medical decisions are made by interpreting the situation with multiple univariate or stepwise approaches. Each physician develops an intuitive approach to these problems but tends to depend heavily on unusual cases (outliers) and can share experiences with other physicians only inefficiently. A technique for storing these experiences and applying them objectively would greatly enhance the practice of medicine.
For many of the discrete variables, artificial intelligence (AI) techniques are being developed in the form of expert systems. There are many cases where AI cannot decide between several choices. The application of discriminant analysis to these cases could greatly improve the abilities of these expert systems. The use of these systems is inevitable. It is important that the field of discriminant analysis be
thoroughly studied so that the results can be applied to these systems.
Heuristic approaches should only be used as a last resort.
The sequential problem is much more interesting. Acute disease processes tend to follow defined pathways through a multidimensional space of those factors that are affected by the disease. Patients who fully recover will return to their starting point. Those patients who do not fully recover, or who die, may deviate from the usual pathway for the disease. Early detection of these deviations from the recovery path may have a significant impact on medical management of the patient. The pathways should be disease- or organ-specific and will provide diagnostic regions in n-space that can be discriminated.
6.5
Conclusions
Haphazard approaches to these problems could lead to much wasted
time and inefficient use of data.
Discriminant analysis is not being
enthusiastically studied because past research has not been rewarding
and because the distribution theory is very difficult.
Because of the
tremendous potential mentioned above and in other areas, the lack of
utility of discriminant analysis should not be assumed until the topic
has been thoroughly studied with a unified theory designed to extract
the maximum amount of information from the data.
APPENDIX A
MATHEMATICAL ANALYSIS
A.1
Introduction
This appendix provides the mathematical tools needed to evaluate
the properties of the discriminant functions.
The results here are
presented without proof because they are commonly available in textbooks.
A.2
Asymptotic Expansions and Approximations
To begin, some definitions are needed. An asymptotic expansion of g(z) is defined as a convergent or divergent power series where z is a complex variable and the a_k are real constants such that g(z) is approximately equal to the given power series as |z| approaches infinity in some region of the complex plane. When the series diverges, only a finite number of terms approximate g(z). Asymptotic expansions are written as

(A.2.1)  g(z) ~ Σ_{k=0}^{∞} a_k z^{−k}    (|z| → ∞).
When g(x)/h(x) → 1 as x → ∞, h(x) is said to be an asymptotic approximation of g(x), and this is written as

(A.2.2)  g(x) ~ h(x)    (x → ∞).

If g(x)/h(x) → 0 as x → ∞, this is represented by

(A.2.3)  g(x) = o(h(x))    (x → ∞),

and, if |g(x)/h(x)| is bounded, this is represented as

(A.2.4)  g(x) = O(h(x))    (x → ∞).

Operations with asymptotic expansions such as addition, multiplication, division, and integration are the same as for other power series (see Section A.4). Differentiation is the notable exception: the derivative g′(x) must be continuous and its asymptotic expansion must exist before this operation can be used. A detailed description of the properties of asymptotic expansions is given by Olver (1974).
The first asymptotic expansion that will be described is for the gamma function

(A.2.5)  Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt.

This expansion is well known, and its approximation is known as Stirling's formula. It is

(A.2.6)  Γ(x) ~ (2π)^{1/2} x^{x−1/2} e^{−x} [1 + 1/(12x) + ε_1(x)]    (|x| → ∞, |arg x| < π).

According to Gradshteyn and Ryzhik (1965),

|ε_1(x)| < 1/(288 x²)

when x is real and positive. Bounds when x is complex are described by Olver (1974).
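Stirling's formula with the 1/(12x) correction is accurate enough to check numerically in a few lines (Python sketch; math.gamma plays the role of the exact Γ):

```python
import math

def stirling(x):
    """Stirling approximation of Gamma(x), keeping the 1/(12x)
    correction term of Equation A.2.6."""
    return (math.sqrt(2.0 * math.pi) * x ** (x - 0.5) * math.exp(-x)
            * (1.0 + 1.0 / (12.0 * x)))
```

For real x > 0 the relative error stays below the stated bound 1/(288 x²); at x = 10 both the error and the bound are of order 3 × 10⁻⁵.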
An asymptotic expansion for the incomplete gamma function

(A.2.7)  γ(a, x) = ∫_0^x t^{a−1} e^{−t} dt

is also useful. It is easier to obtain the expansion for the complementary incomplete gamma function

(A.2.8)  Γ(a, x) = ∫_x^∞ t^{a−1} e^{−t} dt

and to use the relationship

(A.2.9)  γ(a, x) = Γ(a) − Γ(a, x).

The asymptotic expansion is given by

(A.2.10)  Γ(a, x) ~ e^{−x} x^{a−1} Σ_{k=0}^{∞} (a−1)(a−2)⋯(a−k) x^{−k}.
A.3
Taylor Series Expansions and Approximations
Any function f(x) which is analytic inside and on a circle centered at x = a can be represented by a Taylor series,

(A.3.1)  f(x) = f(a) + f^(1)(a)(x−a) + ⋯ + (1/k!) f^(k)(a)(x−a)^k + ⋯,

where f^(k)(a) is the kth derivative of f(x) evaluated at a. This series can also be represented as a finite power series with a remainder,

(A.3.2)  f(x) = f(a) + f^(1)(a)(x−a) + ⋯ + [1/(k−1)!] f^(k−1)(a)(x−a)^{k−1} + R_k(x−a).

This remainder has several forms, one of which is

(A.3.3)  R_k(x−a) = (1/k!) f^(k)(a + θ(x−a))(x−a)^k    (0 < θ < 1).

Two special cases of the Taylor series will be given here. The first is

(A.3.4)  e^x = 1 + x + (1/2)x² + ε_2(x),

where ε_2(x) = (1/6)x³e^{θx} < (1/6)x³e^x for x > 0. The second is

(A.3.5)  log(1−x) = −x − (1/2)x² − (1/3)x³ − ε_3(x),

where ε_3(x) = (1/4)(1−θx)^{−4} x⁴. Both ε_2(x) and ε_3(x) converge to zero as x converges to zero.
A.4
Properties of Power Series

In general, a power series in x has the form

Σ_{k=0}^∞ a_k x^k,

where {a_k}_{k=0}^∞ are constants. A power series converges for some |x| < R; R may be infinite. The main properties of power series are given without proof as follows:

(A.4.1)  Σ_{k=0}^∞ a_k x^k + Σ_{k=0}^∞ b_k x^k = Σ_{k=0}^∞ c_k x^k, where c_k = a_k + b_k   (|x| < min(R₁, R₂)),

where R₁ and R₂ are the respective radii of convergence;

(A.4.2)  ∫_a^b (Σ_{k=0}^∞ a_k x^k) dx = Σ_{k=0}^∞ ∫_a^b a_k x^k dx   (−R < a < b < R);

(A.4.3)  (d/dx) Σ_{k=0}^∞ a_k x^k = Σ_{k=1}^∞ k a_k x^{k−1}   (|x| < R);

(A.4.4)  (Σ_{k=0}^∞ a_k x^k)(Σ_{k=0}^∞ b_k x^k) = Σ_{k=0}^∞ c_k x^k, where c_k = Σ_{j=0}^k a_j b_{k−j};

(A.4.5)  (Σ_{k=0}^∞ b_k x^k)/(Σ_{k=0}^∞ a_k x^k) = Σ_{k=0}^∞ c_k x^k, where c_k = a₀^{−1}[b_k − Σ_{j=1}^k a_j c_{k−j}]   (a₀ ≠ 0).

Other properties are given in Gradshteyn and Ryzhik (1965) and in many advanced calculus textbooks.
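The product and quotient rules (A.4.4)–(A.4.5) translate directly into code. The sketch below (an illustration, not part of the dissertation) builds the Cauchy product and the division recurrence and recovers the first coefficients of tan x = sin x / cos x:

```python
from math import factorial

def cauchy_product(a, b):
    """Coefficients of (sum a_k x^k)(sum b_k x^k), rule (A.4.4)."""
    n = min(len(a), len(b))
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(n)]

def series_divide(b, a):
    """Coefficients of (sum b_k x^k)/(sum a_k x^k), recurrence (A.4.5); needs a[0] != 0."""
    c = []
    for k in range(min(len(a), len(b))):
        c.append((b[k] - sum(a[j] * c[k - j] for j in range(1, k + 1))) / a[0])
    return c

N = 6
sin_c = [0 if k % 2 == 0 else (-1) ** (k // 2) / factorial(k) for k in range(N)]
cos_c = [0 if k % 2 == 1 else (-1) ** (k // 2) / factorial(k) for k in range(N)]

tan_c = series_divide(sin_c, cos_c)
# tan x = x + x^3/3 + 2x^5/15 + ...
assert abs(tan_c[1] - 1) < 1e-12
assert abs(tan_c[3] - 1 / 3) < 1e-12
assert abs(tan_c[5] - 2 / 15) < 1e-12

# Multiplying back recovers the sine coefficients (exact up to degree N-1).
prod = cauchy_product(cos_c, tan_c)
assert all(abs(p - s) < 1e-12 for p, s in zip(prod, sin_c))
```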
A.5
Hypergeometric Functions

A hypergeometric function of a single argument is defined as

(A.5.1)  ₚF_q(a₁, …, a_p; β₁, …, β_q; x) = Σ_{k=0}^∞ [(a₁)_k ⋯ (a_p)_k / ((β₁)_k ⋯ (β_q)_k)] x^k/k!.

The convergence of these functions depends on p, q, the a's, the β's, and x. Many well-known functions are special cases of hypergeometric functions.
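Definition (A.5.1) can be transcribed directly as partial sums. The sketch below is illustrative (the test identity ₂F₁(1,1;2;x) = −log(1−x)/x is a standard special case, not taken from the text):

```python
import math

def pFq(a, b, x, terms=60):
    """Partial sum of the generalized hypergeometric series (A.5.1)."""
    total, term = 0.0, 1.0   # term holds (a1)_k...(ap)_k / ((b1)_k...(bq)_k) * x^k / k!
    for k in range(terms):
        total += term
        num = math.prod(ai + k for ai in a)          # extends each rising factorial (ai)_k
        den = math.prod(bj + k for bj in b) * (k + 1)
        term *= num / den * x
    return total

x = 0.4
# Special case: 2F1(1, 1; 2; x) = -log(1 - x)/x for |x| < 1.
assert math.isclose(pFq([1.0, 1.0], [2.0], x), -math.log(1 - x) / x, rel_tol=1e-9)
```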
APPENDIX B
INTERMEDIATE RESULTS OF CHAPTER 2
The purpose of this appendix is to present mathematical concepts
that are not essential to the main text of Chapter 2.
The first lemma
deals with the integration of traces of quadratic forms.
The function G_A[·] is defined as

(B.1)  G_A[·] = π^{−mr/2} etr(−AA') ∫ etr(−ZZ' + 2AZ')[·] dZ.

Lemma B.1

Let (X) = tr(X) and Λ = A'XX'A; then the following identities hold:
(B.3)–(B.13)  [Displayed matrix identities in the original: they express G_D[(Λ)²], G_D[(Λ²)], G_D[(Λ)³], …, G_D[(Λ)⁴], G_D[(Λ)(B)], G_D[(ΛB)], G_D[(Λ)²(B)], G_D[(Λ²B)], G_D[(ΛB)(C)], G_D[(ΛB)(ΛC)], and G_D[(ΛBΛB)] as linear combinations, with fixed rational coefficients (e.g., 1/4, 1/48), of the invariant polynomials P_(κ)(D;Λ), P_(κ)(D;Λ,B), and P_(κ)(D;Λ,B,C) over partitions κ of 2, 3, and 4.]
Lemma B.2
The following identities hold:
(B.17)  P_(0)(A) = 1,

(B.18)  P_(1)(A;B) = −½[m·tr(BB') + 2tr(AA'BB')],

(B.19)  P_(2)(A;B) = 1/12{m(m+2)tr²(BB') + 2m(m+2)tr(BB')² + 4(m+2)tr(BB')tr(AA'BB') + 8(m+2)tr[AA'(BB')²] + 4tr²(AA'BB') + 8tr(AA'BB')²},

(B.20)  P_(1²)(A;B) = 1/6{m(m−1)tr²(BB') − m(m−1)tr(BB')² + 4(m−1)tr(BB')tr(AA'BB') − 4(m−1)tr[AA'(BB')²] + 4tr²(AA'BB') − 4tr(AA'BB')²},
(B.2l)
P(3)(AjB)
= _1/12~{m(m+2)(m+4)tr3(BB')
+ 6m(m+2)(m+4)tr(BB')tr(BB')2 + 8m(m+2)(m+4)tr(BB')3
+ 6(m+2)(m+4)tr 2 (BB')tr(AA'BB') + l2(m+2)(m+4)
• tr(BB')2 tr (AA'BB') + 24(m+2)(m+4)tr(BB')tr[AA'(BB')2]
+ 48(m+2)(m+4)tr[AA'(BB')3] + l2(m+4)tr(BB')tr 2 (AA'BB')
+ 24(m+4)tr(BB')tr(AA'BB')2 + 48(m+4)tr(AA'BB')
• tr[AA'(BB')2] + 96(m+4)tr[(AA'BB')2BB'] + 8tr 3 (AA'BB')
+ 48tr(AA'BB)tr(AA'BB')2 + 64tr(AA'BB')3},
(B.22)
P(21)(A;B)
= _1/40{3m(m-1)(m+2)tr 3 (BB')
+ 18m(m-1)(m+2)tr(BB')tr(BB')2 - 6m(m-1)(m+2)tr(BB')3
+ 18(m-l)(m+2)tr 2 (BB')tr(AA'BB') + 6(m-l)(m+2)tr(BB')2
• tr(AA'BB') + 12(m-l)(m+2)tr(BB')tr[AA'(BB')2]
- 36(m-l)(m+2)tr[AA'(BB')3] + 32(m+2)tr(BB')tr 2 (AA'BB')
+ 4(m+2)tr(BB')tr(AA'BB')2 + 24(m+2)tr(AA'BB')
• tr[AA'(BB')2] - 72(m+2)tr[(AA'BB')2BB']
+ 24tr 3 (AA'BB') + 24tr(AA'BB')tr(AA'BB')2
- 48tr(AA'BB')3},
(B.23)
P(13)(A;B)
= _1/24{m(m-l)(m-2)tr 3 (BB')
- 3m(m-l)(m-2)
• tr(BB')tr(BB')2 + 2m(m-l)(m-2)tr(BB')3
+ 6(m-l)(m-2)tr 2 (BB')tr(AA'BB') - 6(m-l)(m-2)tr(BB')2
- 12(m-l)(m-2)tr(BB')tr[AA'(BB')2] + 12(m-2)tr(BB')
• tr 2 (AA'BB') - 12(m-2)tr(BB')tr(AA'BB')2 - 24(m-2)
• tr(AA'BB')tr[AA'(BB')2] + 24(m-2)tr[(AA'BB')2BB']
+ 8tr 3 (AA'BB') - 24tr(AA'BB')tr(AA'BB')2
+ 16tr(AA'BB')3},
(B.24)  P_(4)(A;B) = 1/1680{m(m+2)(m+4)(m+6)tr⁴(BB') + 12m(m+2)(m+4)(m+6)tr²(BB')tr(BB')² + 12m(m+2)(m+4)(m+6)tr²(BB')² + 32m(m+2)(m+4)(m+6)tr(BB')tr(BB')³ + 48m(m+2)(m+4)(m+6)tr(BB')⁴ + 8(m+2)(m+4)(m+6)tr³(BB')tr(AA'BB') + 48(m+2)(m+4)(m+6)tr(BB')tr(BB')²tr(AA'BB') + 64(m+2)(m+4)(m+6)tr(BB')³tr(AA'BB') + 48(m+2)(m+4)(m+6)tr²(BB')tr[AA'(BB')²] + 96(m+2)(m+4)(m+6)tr(BB')²tr[AA'(BB')²] + 192(m+2)(m+4)(m+6)tr(BB')tr[AA'(BB')³] + 384(m+2)(m+4)(m+6)tr[AA'(BB')⁴] + 24(m+4)(m+6)tr²(BB')tr²(AA'BB') + 48(m+4)(m+6)tr²(BB')tr(AA'BB')² + 48(m+4)(m+6)tr(BB')²tr²(AA'BB') + 96(m+4)(m+6)tr(BB')²tr(AA'BB')² + 192(m+4)(m+6)tr²[AA'(BB')²] + 384(m+4)(m+6)tr[AA'(BB')²]² + 192(m+4)(m+6)tr(BB')tr(AA'BB')tr[AA'(BB')²] + 384(m+4)(m+6)tr(BB')tr[(AA'BB')²BB'] + 384(m+4)(m+6)tr(AA'BB')tr[AA'(BB')³] + 768(m+4)(m+6)tr[(AA'BB')²(BB')²] + 32(m+6)tr(BB')tr³(AA'BB') + 192(m+6)tr(BB')tr(AA'BB')tr(AA'BB')² + 256(m+6)tr(BB')tr(AA'BB')³ + 192(m+6)tr[AA'(BB')²]tr²(AA'BB') + 768(m+6)tr(AA'BB')tr[(AA'BB')²BB'] + 384(m+6)tr[AA'(BB')²]tr(AA'BB')² + 1536(m+6)tr[(AA'BB')³BB'] + 16tr⁴(AA'BB') + 192tr²(AA'BB')tr(AA'BB')² + 192tr²(AA'BB')² + 512tr(AA'BB')tr(AA'BB')³ + 768tr(AA'BB')⁴},
(B.25)  P_(31)(A;B) = 1/1260{15m(m−1)(m+2)(m+4)tr⁴(BB')
+ 75m(m-l)(m+2)(m+4)tr 2 (BB')tr(BB')2
-
30m(~-1)(m+2)(m+4)tr2(BB')2
+ 60m(m-1)(m+2)(m+4)
• tr(BB')tr(BB')3 - 120m(m-l)(m+2)(m+4)tr(BB')4
+ 120(m-l)(m+2)(m+4)tr 3 (BB')tr(AA'BB')
+ 300(m-l)(m+2)(m+4)tr(BB')tr(BB')2 tr (AA'BB')
+ 120(m-1)(m+2)(m+4)tr(BB')3tr(AA'BB')
+ 300(m-l)(m+2)(m+4)tr 2 (BB')tr[AA'(BB')2]
- 240(m-l)(m+2)(m +4)tr(BB')2 tr [AA'(BB')2]
+ 360(m-l)(m+2)(m+4)tr(BB')tr[AA'(BB')3]
- 960(m-l)(m+2)(m+4)tr[AA'(BB')4] + 4/ 3 (m+4)(17m-lSS)
• tr 2 (BB')tr 2 (AA'BB') + 10/ 3 (m+4)(19m-7)tr(BB')2
~
tr 2 (AA'BB') + 40/ 3 (m+4)(32m-3S)tr 2 (BB')tr(AA'BB')2
+ 10/ 3 (m+4)(11m-23)tr(BB')2 tr (AA'BB')2 - 480(m-l)(m+4)
• tr 2 [AA'(BB')2] - 960(m-l)(m+4)tr[AA'(BB')2]
+ 80(m+4)(m-21)tr(BB')tr(AA'BB')tr[AA'(BB')2]
+ 40(m+4)(46m-S)tr(BB')tr[(AA'BB')2BB'] + 240(m+4)(3m+ll)
• tr(AA!BB')tr[AA'(BB')3] - 80(m+4)(19m+23)tr[(AA'BB')
• (BB')2] + 240(2m+S)tr(BB')tr 3 (AA'BB') + 240(Sm+2)
• tr(BB')tr(AA'BB')tr(AA'BB')2 + 240(Sm+23)tr[AA'(BB')2
• tr 2 (AA'BB') + 1440(m+6)tr(AA'BB')tr[AA'(BB')3]
- 480(2m+5)tr[AA'(BB')2]tr(AA'BB')2 - 1920(2m+S)
• tr[(AA'BB')3BB'] + 80/ 7tr 4(AA'BB') + 400/ 7tr 2 (AA'BB')
tr(AA'BB')2 - 160/ 7tr 2(AA'BB')2 + 320/ 7tr (AA'BB')
• tr(AA'BB')3
- 640/ 7tr(AA'BB')4},
(B.26)
P(2 2 )(A;B)
= 1/120{m(m-1)(m+1)(m+2)tr 4 (BB')
+ 2m(m-l)(m+l)(m+2)tr 2 (BB')tr(BB')2 + 7m(m-l)(m+1)(m+2)
· tr 2 (BB')2 - 8m(m-l)(m+l)(m+2)tr(BB')tr(BB')3
- 2m(m-l)(m+l)(m+2)tr(BB')4 + 8(m-l)(m+l)(m+2)
• tr 3 (BB')tr(AA'BB') + 8(m-l)(m+l)(m+2)tr(BB')
• tr(BB')2 tr (AA'BB') - 16(m-l)(m+l)(m+2)tr(BB')3
• tr(AA'BB') + 8(m-l)(m+l)(m+2)tr 2 (BB')tr[AA'(BB')2]
+ 56(m-l)(m+l)(m+2)tr(BB')2tr [AA'(BB')2]
- 48(m-l)(m+l)(m+2)tr(BB')tr[AA'(BB')3]
- 16(m-l)(m+l)(m+2)tr[AA'(BB')4] + 8(m+l)(3m+2)
• tr 2 (BB')tr 2 (AA'BB') + 4(m+l)(m-6)tr 2 (BB')tr(AA'BB')2
+ 4(m+l)(m-6)tr(BB')2tr2(AA'BB') + 8(m+l)(7m-2)tr(BB)2
• tr(AA'BB')2 + 24(m+l)(3m+2)tr 2 [AA'(BB')2]
+ 24(m+l)(m-6)tr[AA'(BB')2]2 + 32(m+l)(m+4)tr(BB')
• tr(AA'BB')tr[AA'(BB')2] - 96(m+l)(m+2)tr(BB')
• tr[(AA'BB')2BB'] - 224(m+l)(3m+2)tr(AA'BB')
• tr[(AA'BB)2(BB')2] + 96(m+l)tr(BB')tr 3 (AA'BB')
+
96(m+~)tr(BB')tr(AA'BB')tr(AA'BB')2
- 192(m+l)tr(BB')
• tr(AA'BB')3 + 480(m+l)tr[AA'(BB')2]tr 2 (AA'BB')
+ 320(m+l)tr(AA'BB')tr[(AA'BB')BB'] + 1760(m+l)
• tr[AA'(BB')2]tr(AA'BB')2 - 4160(m+l)tr[(AA'BB')3BB']
+ tr 4 (AA'BB') + 2tr 2 (AA'BB')tr(AA'BB')2
+ 7tr 2 (AA'BB')2 - 8tr(AA'BB')tr(AA'BB')3 - 2tr(AA'BB')4},
(B.27)
P(21 2 )(A;B)
= 1/30{m(m-1)(m-2)(m+2)tr 4 (BB')
- m(m-l)(m-2)(m+2)tr 2 (BB')tr(BB')2 - 2m(m-1)(m-2)(m+2)
• tr 2 (BB')2 - 2m(m-1)(m-2)(m+2)tr(BB')tr(BB')3
+ 4m(m-l)(m-2)(m+2)tr(BB')4 + 8(m-l)(m-2)(m+2)
• tr 3 (BB')tr(AA'BB') - 4(m-l)(m-2)(m+2)tr(BB')
• tr(BB')2 tr (AA'BB') - 4(m-l)(m-2)(m+2)tr(BB')3
• tr(AA'BB') + 4(m-l)(m-2)(m+2)tr 2 (BB')tr[AA'(BB')2]
- 16(m-l)(m-2)(m+2)tr(BB')2tr [AA'(BB')2] - 12(m-l)
• (m-2)(m+2)tr(BB')tr[AA'(BB')3] + 32(m-l)(m-2)(m+2)
• tr[AA'(BB')4] + 4(m-2)(6m+7)tr 2 (BB')tr 2 (AA'BB')
+ 4(m-2)(m-12)tr(BB')2 tr2(AA'BB') - 4/ 3 (m-2)(4m-7)
• tr(BB')2 tr2(AA'BB') - 4/ 3 (m-2)(11m-8)
• tr(BB')2 tr (AA'BB')2 - 32(m-l)(m-2)tr 2 [AA'(BB')2]
+ 32(m-2)(m+2)tr[AA'(BB')2]2 - 8(m-2)(Sm+2)tr(AA'
- 8(m-2)(3m+ll)tr(AA'BB')tr[AA'(BB')3 + 16(m-2)(4m+3)
• tr[(AA'BB')2(BB')2] + 48/ s (3m-l)tr(BB')tr(AA'BB')
• tr(AA'BB')2 + 24/ s (11m+8)tr(BB')tr(AA'BB')3
- 48(m-S)tr[AA'(BB')2]tr 2 (AA'BB') - 144(m+2)tr(AA'BB')
• tr[(AA'BB')2BB'] - 96(2m-l)tr[AA'(BB')2]tr(AA'BB')2
+ 192(2m-l)tr[(AA'BB')3BB'] + 48tr 4 (AA'BB')
- 48tr 2 (AA'BB')tr(AA'BB')2 - 96tr 2 (AA'BB')2
- 96tr(AA'BB')tr(AA'BB')3 + 192tr(AA'BB')4},
(B.28)
P(14)(A;B)
= 1/ 360 {3m(m-l)(m-2)(m-3)tr 4(BB')
- l8m(m-l)(m-2)(m-3)tr 2 (BB')tr(BB')2
+
9m(m-l)(m-2)(m-3)tr 2 (BB')2 + 24m(m-l)(m-2)(m-3)
• tr(BB')tr(BB')3 - l8m(m-l)(m-2)(m-3)tr(BB')4
+ 24(m-l)(m-2) (m-3)tr 3 (BB')tr(AA'BB')
- 72(m-l)(m-2)(m-3)tr(BB')tr(BB')2 tr (AA'BB')
+ 48(m-l)(m-2)(m-3)tr(BB')3 tr (AA'BB')
- 72(m-l)(m-2)(m-3)tr 2 (BB')tr[AA'(BB')2]
+ 72(m-l)(m-2)(m-3)tr(BB')2tr [AA'(BB')2]
+ 144(m-l)(m-2)(m-3)tr(BB')tr[AA'(BB')3]
- 144tr[AA'(BB')4] + 72(m-2)(m-3)tr 2 (BB')tr 2 (AA'BB')
- 72(m-2)(m-3)tr 2 (BB')tr(AA'BB')2 - 72(m-2)(m-3)
• tr(BB')2 tr2(AA'BB') + 72(m-2)(m-3)tr(BB')2
• tr(AA'BB')2 + 144(m-2)(m-3)tr 2 [AA'(BB')2]
- 144(m-2)(m-3)tr[AA'(BB')2j2
- 288(m-2)(m-3)tr(BB')tr(AA'BB')tr[AA'(BB')2]
+ 288(m-2)(m-3)tr(BB')tr[(AA'BB')2BB']
+
288(m~2)tr(AA'BB')tr[AA'(BB')3]
- 288(m-2)(m-3)
• tr[(AA'BB')2(BB')2] + 96(m-3)tr(BB')tr 3 (AA'BB')
- 288 (m-3)tr(BB')tr(AA'BB')tr(AA'BB')2 + 192(m-3)
• tr(BB')tr(AA'BB')3 - 288(m-3)tr 2 (AA'BB')tr[AA'(BB')2]
+ 576(m-3)tr(AA'BB')tr[(AA'BB')2BB']
+ 288(m-3)tr[AA'(BB')2]tr(AA'BB')2
- 576(m-3)tr[(AA'BB')3BB'] + tr 4 (AA'BB')
- 6tr 2 (AA'BB')tr(AA'BB')2 + 3tr 2 (AA'BB')2
+ 8tr(AA'BB')tr(AA'BB')3 - 6tr(AA'BB')4},
For rank(D) = 1,

(B.29)  P_(1)(D;A,B) = 1/12[3tr(AA')tr(BB') + 6tr(AA'BB') + 6tr(BB')tr(DD'AA') + 6tr(AA')tr(DD'BB') + 12tr(DD'AB'BA') + 12tr(DD'BA'AB') + 4tr(DD'AA')tr(DD'BB') + 8tr²(DD'AB')],
(B.30)  P_(3)(D;A,B,C) = −1/120{15tr(AA')tr(BB')tr(CC')
+ 30tr(AA')tr(BB'CC') + 30tr(BB')tr(AA'CC')
+ 30tr(CC')tr(AA'BB') + 120tr(AA'BB'CC')
+ 30[tr(BB')tr(CC') + 2tr(BB'CC')]tr(DD'AA')
+ 30[tr(AA')tr(CC') + 2tr(AA'CC')]tr(DD'BB')
+ 30[tr(AA')tr(BB') + 2tr(AA'BB')]tr(DD'CC')
+ 60tr(CC')tr(DD'AB'BA') + 60tr(CC')tr(DD'BA'AB')
+ 60tr(BB')tr(DD'AC'CA') + 60tr(BB')tr(DD'CA'AC')
+ 60tr(AA')tr(DD'BC'CB') + 60tr(AA')tr(DD'CB'BC')
+ 240tr(DD'AB'BC'CA') + 240tr(DD'BC'CA'AB')
+ 240tr(DD'CA'AB'BC') + 20tr(CC')tr(DD'AA')tr(DD'BB')
+ 20tr(BB')tr(DD'AA')tr(DD'CC') + 20tr(AA')tr(DD'BB')
• tr(DD'CC') + 40tr(CC')tr 2 (DD'AB')
+ 40tr(BB')tr 2 (DD'AC') + 40tr(AA')tr 2 (DD'BC')
+ 40tr(DD'AA')tr(DD'BC'CB') + 40tr(DD'AA')tr(DD'CB'BC')
+ 40tr(DD'BB')tr(DD'AC'CA') + 40tr(DD'BB')tr(DD'CA'AC')
+ 40tr(DD'CC')tr(DD'AB'BA') + 40tr(DD'CC')tr(DD'BA'AB')
+ 160tr(DD'AB')tr(DD'BC'CA') + 160tr(DD'BC')tr(DD'CA'AB')
+ 160tr(DD'CA')tr(DD'AB'BC') + 8tr(DD'AA')tr(DD'BB')
· tr(DD'CC') + 16tr(DD'AA')tr 2 (DD'BC') + 16tr(DD'BB')tr 2
· (DD'AC') + 16tr(DD'CC')tr 2 (DD'AB') + 64tr(DD'AB')
• tr(DD'BC')tr(DD'AC')},
(B.31)  P_(4)(D;A,B,C) = −1/1680{105tr²(AA')tr(BB')tr(CC')
+ 210tr(AA')2
• tr(BB')tr(CC') + 210tr 2 (AA')tr(BB'CC')
+ 420tr(AA')tr(BB')tr(AA'CC') + 420tr(AA')tr(CC')
• tr(AA'BB') + 420tr(AA')2 tr (BB'CC') + 840tr(AA'BB')
• tr(AA'CC') + 840tr(BB')tr[(AA')2CC'] + 840tr(CC')
· tr[(AA')2BB'] + 1680tr(AA')tr(AA'BB'CC')
+ 1680tr(AA'BB'AA'CC') + 3360tr[(AA')2BB'CC']
+ 420[tr(AA')tr(BB')tr(CC') + 2tr(AA')tr(BB'CC')
+ 2tr(BB')tr(AA'CC') + 2tr(CC')tr(AA'BB')
+ 8tr(AA'BB'CC')]tr(DD'AA') + 210[tr 2 (AA')tr(CC')
+ 2tr(AA')2 tr (CC') + 4tr(AA')tr(AA'CC')
+ 8tr{(AA')2(CC')}]tr(DD'BB') + 210[tr 2 (AA')tr(BB')
+ 2tr(AA')2 tr (BB') + 4tr(AA')tr(AA'BB')
+ 8tr{(AA')2BB'}]tr(DD'CC') + 840[tr(BB')tr(CC')
+ 2tr(BB'CC')]tr[(DD'AA')2] + 840[tr(AA')tr(CC')
+ 2tr(AA'CC')]tr(DD'AB'BA') + 840[tr(AA')tr(CC')
+ 2tr(AA'CC')]tr(DD'BA'AB') + 840[tr(AA')tr(BB')
+ 2tr(AA'BB')]tr(DD'AC'CA') + 840tr[tr(AA')tr(BB')
+ 2tr(AA'BB')]tr(DD'CA'AC') + 420[tr 2 (AA')
+ 2tr(AA')2]tr(DD'BC'CB') + 420[tr 2 (AA')
+ 2tr(AA')2]tr(DD'CB'BC') + 1680tr(CC')tr(DD'AB'BA'AA')
+ 1680tr(CC')tr(DD'AA'AB'BA') + 1680tr(CC')
• tr(DD'BA'AA'AB') + 1680tr(BB')tr(DD'AA'AC'CA')
+ 1680tr(BB')tr(DD'AC'CA'AA') + 1680tr(BB')
· tr(DD'CA'AA'AC') + 3360tr(AA')tr(DD'AB'BC'CA')
+ 3360tr(AA')tr(DD'BC'CA'AB') + 3360tr(AA')
• tr(DD'CA'AB'BC') + 3360tr(DD'AA'BB'AA'CC')
+ 3360tr(DD'CC'AA'BB'AA') + 3360tr(DD'AA'CC'AA'BB')
+ 3360tr(DD'BB'AA'CC'AA') + 6720tr[DD'(AA')2BB'CC']
+ 6720tr[(DD'CC'(AA')2BB'] + 6720tr[DD'BB'CC'(AA')2]
+ 6720tr(DD'AA'BB'CC'AA') + 420[tr(BB')tr(CC')
+ 2tr(BB'CC')]tr 2 (DD'AA') + 280[tr(AA')tr(CC')
+ 2tr(AA'CC')]tr(DD'AA')tr(DD'BB') + 280[tr(AA')tr(BB')
• 2tr(AA'BB')]tr(DD'AA')tr(DD'CC') + 140[tr 2 (AA')
+ 2tr(AA')2]tr(DD'BB')tr(DD'CC') + 560[tr(AA')tr(CC')
+ 2tr(AA'CC')]tr 2 (DD'AB') + 560[tr(AA'tr(BB')
+ 2tr(AA'BB')]tr 2 (DD'AC') + 280[tr 2 (AA')
+ 2tr(AA')2]tr 2 (DD'BC') + 1680tr(CC')tr(DD'AA')
• tr(DD:AB'BA') + 560tr(CC')tr(DD'AA')tr(DD'BA'AB')
+ 1680tr(BB')tr(DD'AA')tr(DD'AC'CA') + 560tr(BB')
• tr(DD'AA')tr(DD'CA'AC') + 560tr(AA')tr(DD'AA')
• tr(DD'BC'CB') + 560tr(AA')tr(DD'AA')tr(DD'CB'BC')
+ 560tr(CC')tr(DD'BB')tr[DD'(AA')2] + 560tr(AA')
• tr(DD'BB')tr(DD'AC'CA') + 560tr(AA')tr(DD'BB')
tr(DD'CA'AC') + 560tr(AA')tr(DD'CC')tr(DD'AB'BA')
+ 560tr(AA')tr(DD'CC')tr(DD'BA'AB') + 560tr(BB')
• tr(DD'CC')tr[DD'(AA')2] + 1120tr(CC')tr(DD'AB')
• tr(DD'AA'AB') + 1120tr(CC')tr(DD'AB')tr(DD'BA'AA')
+ 1120tr(BB')tr(DD'AC')tr(DD'AA'AC') + 1120tr(BB')
• tr(DD'AC')tr(DD'CA'AA') + 2240tr(AA')tr(DD'AB')
• tr(DD'BC'CA') + 2240tr(AA')tr(DD'BC')tr(DD'CA'AB')
+ 2240tr(AA')tr(DD'AC')tr(DD'AB'BC') + 2240tr(DD'AA')
• tr(DD'AB'BC'CA') + 2240tr(DD'AA')tr(DD'BC'CA'AB')
+ 2240tr(DD'AA')tr(DD'CA'AB'BC') + 1120tr(DD'BB')
• tr(DD'AA'AC'CA') + 1120tr(DD'BB')tr(DD'AC'CA'AA')
+ 1120tr(DD'BB')tr(DD'CA'AA'AC') + 1120tr(DD'CC')
• tr(DD'AA'AB'BA') + 1120tr(DD'CC')tr(DD'AB'BA'AA')
+ 1120tr(DD'CC')tr(DD'BA'AA'AB') + 2240tr(DD'AA')
• tr(DD'CC'AA'BB') + 2240tr(DD'AA')tr(DD'BB'AA'CC')
+ 4480tr(DD'AA')tr(DD'BB'CC'AA') + 4480tr(DD'AA')
• tr(DD'AA'BB'CC') + 2240tr(DD'BB')tr(DD'AA'CC'AA')
+ 4480tr(DD'BB')tr[DD'CC'(AA')2] + 2240tr(DD'CC')
• tr(DD'AA'BB'AA') + 4480tr(DD'CC')tr[DD'(AA')2BB']
+ 1120tr[DD'(AA')2]tr(DD'BC'CB') + 1120tr[DD'(AA')2]
•
tr(DD~CB'BC')
+ 1120tr(DD'AB'BA')tr(DD'AC'CA')
+ 1120tr(DD'AB'BA')tr(DD'CA'AC') + 1120tr(DD'BA'AB')
• tr(DD'AC'CA') + 1120tr(DD'BA'AB')tr(DD'CA'AC')
+ 2240tr(DD'AA'BB')tr(DD'AA'CC') + 2240tr(DD'BB'AA')
• tr(DD'CC'AA') + 4480tr[DD'(AA')2]tr(DD'BB'CC')
+ 4480tr(DD'AA'BB')tr(DD'CC'AA') + 168tr(CC')tr 2 (DD'AA')
• tr(DD'BB') + 168tr(BB')tr2 (DD'AA')tr(DD'CC')
+ 112tr(AA')tr(DD'AA')tr(DD'BB')tr(DD'CC') + 672tr(CC')
· tr(DD'AA')tr 2 (DD'AB') + 672tr(BB')tr(DD'AA')
· tr 2 (DD'AC') + 224tr(AA')tr(DD'AA')tr 2 (DD'BC')
+
224tr(AA')tr(DD'BB')tr 2 (DD'AC') + 224tr(AA')tr(DD'CC')
· tr 2 (DD'AB') + 896tr(AA')tr(DD'AB')tr(DD'BC')tr(DD'CA')
+ 1792tr 2 (DD'AA')tr(DD'BB'CC') + 336tr 2 (DD'AA')
• tr(DD'BC'CB') + 336tr 2 (DD'AA')tr(DD'CB'BC')
+ 672tr(DD'AA')tr(DD'BB')tr(DD'AC'CA') + 224tr(DD'AA')
• tr(DD'BB')tr(DD'CA'AC') + 896tr(DD'AA')
· tr(DD'BB')tr(DD'AA'CC') + 2688tr(DD'AA')tr(DD'BB')
• tr(DD'CC'AA') + 672tr(DD'AA')tr(DD'CC')tr(DD'AB'BA')
+ 224tr(DD'AA')tr(DD'CC')tr(DD'BA'AB') + 2688tr(DD'AA')
• tr(DD'CC')tr(DD'AA'BB') + 896tr(DD'AA')tr(DD'CC')
• tr(DD'BB'AA') + 2016tr(DD'BB')tr(DD'CC')
· tr[DD'(AA')2] + 896tr(DD'AA')tr(DD'AB')tr(DD'BC'CA')
+ 896tr(DD'AA')tr(DD'BC')tr(DD'CA'AB')896tr(DD'AA')
· tr(DD'AC')tr(DD'AB'BC') + 448tr(DD'BB')tr(DD'AC')
· tr(DD'AA'AC') + 448tr(DD'BB')tr(DD'AC')
• tr(DD'CA'AA') + 448tr(DD'CC')tr(DD'AB')tr(DD'AA'AB')
+ 448tr(DD'CC')tr(DD'AB')tr(DD'BA'AA') + 448tr 2 (DD'AB')
• tr(DD'AC'CA') + 448tr 2 (DD'AB')tr(DD'CA'AC')
+ 448tr 2 (DD'AC')tr(DD'AB'BA') + 448tr 2 (DD'AC')
• tr(DD'BA'AB') + 448tr 2 (DD'BC')tr[DD'(AA')2]
+ 816tr 2 (DD'AA')tr(DD'BB')tr(DD'CC') + 96tr 2 (DD'AA')
• tr 2 (DD'BC') + 192tr(DD'AA')tr(DD'BB')tr 2 (DD'AC')
+ 192tr(DD'AA')tr(DD'CC')tr 2 (DD'AB') + 256tr(DD'AA')
• tr(DD'AB')tr(DD'BC')tr(DD'CA') + 128tr 2 (DD'AB')
• tr 2 (DD'AC')}.
Proof of Lemma B.2
The results come from Lemma 2.1.2 by converting the polynomial, C, to traces, isolating the S⁻¹ terms, reconverting to invariant polynomials, finding the inverse Laplace transforms with respect to S, and transforming the polynomials to traces.
Q.E.D.
Lemma B.3

For all complex h, the following asymptotic expansion holds:

(B.32)  log Γ(½n + h) = log(2π)^{1/2} + (½n + h − ½)log(½n) − ½n − Σ_{k=1}^m (−2)^k B_{k+1}(h)/[k(k+1)n^k] + O(n^{−m−1}),

where the B_j(a) are Bernoulli polynomials defined by the generating function

(B.33)  z e^{az}(e^z − 1)^{−1} = Σ_{j=0}^∞ B_j(a) z^j/j!   (|z| < 2π).

Proof of Lemma B.3

The proof is given by Barnes (1899).

Q.E.D.
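Truncating (B.32) at m = 2 requires only the Bernoulli polynomials B₂(h) = h² − h + 1/6 and B₃(h) = h³ − (3/2)h² + ½h. The sketch below (a numerical check, not part of the dissertation) compares the truncated right-hand side with Python's math.lgamma:

```python
import math

def B2(h):  # Bernoulli polynomial B_2(h)
    return h * h - h + 1 / 6

def B3(h):  # Bernoulli polynomial B_3(h)
    return h ** 3 - 1.5 * h ** 2 + 0.5 * h

def log_gamma_expansion(n, h):
    """Right-hand side of (B.32), truncated at m = 2."""
    z = n / 2
    s = math.log(math.sqrt(2 * math.pi)) + (z + h - 0.5) * math.log(z) - z
    s -= (-2) ** 1 * B2(h) / (1 * 2 * n ** 1)   # k = 1 term of the sum
    s -= (-2) ** 2 * B3(h) / (2 * 3 * n ** 2)   # k = 2 term of the sum
    return s

n, h = 40.0, 0.3
# Remaining error is O(n^{-3}), far below the tolerance used here.
assert abs(log_gamma_expansion(n, h) - math.lgamma(n / 2 + h)) < 1e-5
```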
Lemma B.4

For any symmetric matrix Z,

(B.34)  |I − n⁻¹Z|^{−(αn+β)} = etr(αZ){1 + (2n)⁻¹[α tr(Z²) + 2β tr(Z)] + (24n²)⁻¹[3α²tr²(Z²) + 8α tr(Z³) + 12αβ tr(Z²)tr(Z) + 12β tr(Z²) + 12β²tr²(Z)] + O₃}.

Proof of Lemma B.4

It is well known that

(B.35)  −log|I − n⁻¹Z| = Σ_{j=1}^r tr(Z^j)/(jn^j) + O(n^{−r−1}).

From this relationship,

(B.36)  −(αn + β)log|I − n⁻¹Z| = (αn + β)[Σ_{j=1}^3 tr(Z^j)/(jn^j) + O₄]
= α Σ_{j=1}^3 tr(Z^j)/(jn^{j−1}) + β Σ_{j=1}^2 tr(Z^j)/(jn^j) + O₃
= α tr(Z) + α(2n)⁻¹tr(Z²) + α(3n²)⁻¹tr(Z³) + βn⁻¹tr(Z) + β(2n²)⁻¹tr(Z²) + O₃
= α tr(Z) + (2n)⁻¹[α tr(Z²) + 2β tr(Z)] + (6n²)⁻¹[2α tr(Z³) + 3β tr(Z²)] + O₃.

Exponentiation gives

(B.37)  |I − n⁻¹Z|^{−(αn+β)} = etr(αZ)exp{(2n)⁻¹[α tr(Z²) + 2β tr(Z)] + (6n²)⁻¹[2α tr(Z³) + 3β tr(Z²)] + O₃}
= etr(αZ){1 + (2n)⁻¹[α tr(Z²) + 2β tr(Z)] + (6n²)⁻¹[2α tr(Z³) + 3β tr(Z²)] + (8n²)⁻¹[α²tr²(Z²) + 4αβ tr(Z²)tr(Z) + 4β²tr²(Z)] + O₃}
= etr(αZ){1 + (2n)⁻¹[α tr(Z²) + 2β tr(Z)] + (24n²)⁻¹[3α²tr²(Z²) + 8α tr(Z³) + 12αβ tr(Z²)tr(Z) + 12β tr(Z²) + 12β²tr²(Z)] + O₃}.

Q.E.D.
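Because (B.34) involves only a determinant and traces, it can be checked on a small symmetric matrix without any linear-algebra library. The 2×2 helpers and the test matrix below are ours (a sketch, not part of the dissertation):

```python
import math

# Arbitrary 2x2 symmetric test matrix, stored as nested lists.
Z = [[0.2, 0.1], [0.1, 0.3]]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def tr(X):
    return X[0][0] + X[1][1]

def det_I_minus(X, n):
    """det(I - X/n) for a 2x2 matrix X."""
    a, b = 1 - X[0][0] / n, -X[0][1] / n
    c, d = -X[1][0] / n, 1 - X[1][1] / n
    return a * d - b * c

alpha, beta, n = 1.0, 0.5, 500.0
Z2 = mat_mul(Z, Z)
Z3 = mat_mul(Z2, Z)

lhs = det_I_minus(Z, n) ** (-(alpha * n + beta))
rhs = math.exp(alpha * tr(Z)) * (
    1
    + (alpha * tr(Z2) + 2 * beta * tr(Z)) / (2 * n)
    + (3 * alpha ** 2 * tr(Z2) ** 2 + 8 * alpha * tr(Z3)
       + 12 * alpha * beta * tr(Z2) * tr(Z) + 12 * beta * tr(Z2)
       + 12 * beta ** 2 * tr(Z) ** 2) / (24 * n ** 2)
)
# The neglected O(n^{-3}) remainder is negligible at this n.
assert abs(lhs - rhs) / lhs < 1e-6
```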
Lemma B.5

The following asymptotic expansion holds:

(B.38)  Γ_r[½(n+α)]/Γ_r[½(n+β)] = (½n)^{½(α−β)r}{1 + (4n)⁻¹(β−α)r(r−α−β+1) + (96n²)⁻¹(α−β)r[3(α−β)r³ − 2{4 + 3(α−β)(α+β−1)}r² + 3(α+β−1){4 + (α−β)(α+β−1)}r − 4{2(α² + αβ + β²) − 3(α+β) − 1}] + O₃}.

Proof of Lemma B.5

Let a(n) = log{Γ_r[½(n+α)]/Γ_r[½(n+β)]}; then

(B.39)  a(n) = Σ_{k=1}^r [log{Γ[½(n+α−k+1)]} − log{Γ[½(n+β−k+1)]}].

The application of Lemma B.3 gives

(B.40)  a(n) = Σ_{k=1}^r {log(2π)^{1/2} + ½(n+α−k)log(½n) − ½n + n⁻¹B₂[½(α−k+1)] − 2(3n²)⁻¹B₃[½(α−k+1)] − log(2π)^{1/2} − ½(n+β−k)log(½n) + ½n − n⁻¹B₂[½(β−k+1)] + 2(3n²)⁻¹B₃[½(β−k+1)] + O₃}
= Σ_{k=1}^r {½(α−β)log(½n) + n⁻¹[B₂{½(α−k+1)} − B₂{½(β−k+1)}] − 2(3n²)⁻¹[B₃{½(α−k+1)} − B₃{½(β−k+1)}] + O₃}.

Since B₂[(α−k+1)/2] = [(α−k+1)/2]² − ½(α−k+1) + 1/6 and B₃[(α−k+1)/2] = [(α−k+1)/2]³ − (3/2)[(α−k+1)/2]² + ½[(α−k+1)/2], then

Σ_{k=1}^r B₂[(α−k+1)/2] = 1/24[2r³ − 3(2α − 1)r² + (6α² − 6α − 1)r]

and

Σ_{k=1}^r B₃[(α−k+1)/2] = −1/32[r⁴ + 2(1 − 2α)r³ + (6α² − 6α − 1)r² − 2(2α³ − 3α² − α + 1)r].

The substitution of these into Equation B.40 produces

(B.41)  a(n) = ½(α−β)r log(½n) + (4n)⁻¹[(β−α)r² + (α² − β² − α + β)r] − (24n²)⁻¹[2(α−β)r³ − 3(α² − β² − α + β)r² + (2α³ − 2β³ − 3α² + 3β² − α + β)r] + O₃.

This implies that

(B.42)  exp[a(n)] = (½n)^{½(α−β)r}exp{(4n)⁻¹[(β−α)r² + (α² − β² − α + β)r] − (24n²)⁻¹[2(α−β)r³ − 3(α² − β² − α + β)r² + (2α³ − 2β³ − 3α² + 3β² − α + β)r] + O₃}
= (½n)^{½(α−β)r}{1 + (4n)⁻¹[(β−α)r² + (α² − β² − α + β)r] − (24n²)⁻¹[2(α−β)r³ − 3(α² − β² − α + β)r² + (2α³ − 2β³ − 3α² + 3β² − α + β)r] + (32n²)⁻¹[(β−α)²r⁴ + 2(β−α)(α² − β² − α + β)r³ + (α² − β² − α + β)²r²] + O₃}
= (½n)^{½(α−β)r}{1 + (4n)⁻¹(β−α)r(r−α−β+1) + (96n²)⁻¹(α−β)r[3(α−β)r³ − 2{4 + 3(α−β)(α+β−1)}r² + 3(α+β−1){4 + (α−β)(α+β−1)}r − 4{2(α² + αβ + β²) − 3(α+β) − 1}] + O₃}.

Q.E.D.
Lemma B.6

If α, β, γ, and δ are complex numbers such that Re(α), Re(γ) ≠ 0 and

(B.43)  (a)_κ = Π_{j=1}^r Γ[a + k_j − ½(j−1)]/Γ[a − ½(j−1)],

where κ = (k₁, …, k_r) is a partition of k into r parts such that Σ_{j=1}^r k_j = k, then

(B.44)  (αn + β)_κ/(γn + δ)_κ = (α/γ)^k{1 + (2n)⁻¹[2(α⁻¹β − γ⁻¹δ)k + (α⁻¹ − γ⁻¹)a₁(κ)] + (24n²)⁻¹[{α⁻² − γ⁻² − 24γ⁻¹δ(α⁻¹β − γ⁻¹δ)}k + 12(α⁻¹β − γ⁻¹δ)²k(k−1) − 12(α⁻²β − γ⁻²δ)a₁(κ) + 3(α⁻¹ − γ⁻¹)²a₁²(κ) − (α⁻² − γ⁻²)a₂(κ)]} + O₃,

where

a₁(κ) = Σ_{j=1}^r k_j(k_j − j)

and

a₂(κ) = Σ_{j=1}^r k_j(4k_j² − 6k_j j + 3j²).

Proof of Lemma B.6

The proof of this asymptotic expansion is similar to the one for Lemma B.5. The following expressions are needed:

(B.45)  Σ_{j=1}^r B₂[a + k_j − ½(j−1)] = 1/24[2r³ + 3(1 − 4a)r² + (24a² − 12a − 1)r + 48ak + 24a₁(κ)] and

Σ_{j=1}^r B₃[a + k_j − ½(j−1)] = −1/32[r⁴ + 2(1 − 4a)r³ + (24a² − 12a − 1)r² − 2(16a³ − 12a² − 2a + 1)r + 8(1 − 12a²)k + ⋯].

Q.E.D.
Lemma B.7

If U is any complex r×r matrix, then

(B.46)  (I + n⁻¹U)⁻¹ = I − n⁻¹U + n⁻²U² + O₃

for large n.

Proof of Lemma B.7

The left-hand side can be written as I − n⁻¹U(I + n⁻¹U)⁻¹. This relationship is applied recursively to obtain the result.

Q.E.D.
Lemma B.8

Let Z = A(I + n⁻¹U)⁻¹, where A and U are complex r×r matrices, let B = (I − A)⁻¹, and let g be any complex scalar; then the following asymptotic expansions hold:

(B.47)  Σ_{k=1}^∞ Σ_κ (g)_κ C_κ(Z)/(k−1)! = ⋯,

(B.48)  Σ_{k=0}^∞ Σ_κ (g)_κ a₁(κ)C_κ(Z)/k! = ½g|B|^g{tr²(AB) + (2g + 1)tr(AB)² − n⁻¹[g{tr²(AB) + (2g + 1)tr(AB)²}tr(ABU) + 2tr(AB)tr(ABUB) + 2(2g + 1)tr{(AB)²UB}] + O₂},

(B.49)  Σ_{k=2}^∞ Σ_κ (g)_κ C_κ(Z)/(k−2)! = ⋯,

(B.50)  Σ_{k=0}^∞ Σ_κ (g)_κ a₁²(κ)C_κ(Z)/k! = ½g|B|^g{[2(2g + 1)tr²(AB) + 2(2g + 3)tr(AB)² + 4tr³(AB) + 12(2g + 1)tr(AB)tr(AB)² + 8(2g² + 3g + 2)tr(AB)³ + g·tr⁴(AB) + 2(2g² + g + 2)tr²(AB)tr(AB)² + (2g + 1)(2g² + g + 2)tr²(AB)² + 8(2g + 1)tr(AB)tr(AB)³ + 2(8g² + 10g + 5)tr(AB)⁴] + O₁}, and

(B.51)  Σ_{k=0}^∞ Σ_κ (g)_κ a₂(κ)C_κ(Z)/k! = ½g|B|^g{[2tr(AB) + 3(2g + 1)tr²(AB) + 3(2g + 3)tr(AB)² + 2tr³(AB) + 6(2g + 1)tr(AB)tr(AB)² + 4(2g² + 3g + 2)tr(AB)³] + O₁}.

Proof of Lemma B.8

The basic relationships are given in Lemma 3 of Fujikoshi (1970). The expansions follow from Lemmas B.4, B.6, and B.7. Let V = Z(I − Z)⁻¹; then

(B.52)  V = AB − n⁻¹ABUB + n⁻²A(BU)²B + O₃,

(B.53)  V² = (AB)² − n⁻¹[(AB)²UB + ABUBAB] + n⁻²[ABA(BU)²B + A(BU)²BAB + (ABUB)²] + O₃,

(B.54)  V³ = (AB)³ − n⁻¹[(AB)³UB + ABUB(AB)² + (AB)²UBAB] + O₂,

(B.55)  V⁴ = (AB)⁴ − n⁻¹[(AB)⁴UB + (AB)²UB(AB)² + (AB)³UBAB + ABUB(AB)³] + O₂,

(B.56)  tr(V) = tr(AB) − n⁻¹tr(ABUB) + O₂,

(B.57)  tr²(V) = tr²(AB) − 2n⁻¹tr(AB)tr(ABUB) + O₂,

(B.58)  tr(V²) = tr(AB)² − 2n⁻¹tr[(AB)²UB] + O₂,

(B.59)  tr³(V) = tr³(AB) + O₁,

(B.60)  tr(V)tr(V²) = tr(AB)tr(AB)² + O₁,

(B.61)  tr(V³) = tr(AB)³ + O₁,

(B.62)  tr⁴(V) = tr⁴(AB) + O₁,

(B.63)  tr²(V)tr(V²) = tr²(AB)tr(AB)² + O₁,

(B.64)  tr²(V²) = tr²(AB)² + O₁,

(B.65)  tr(V)tr(V³) = tr(AB)tr(AB)³ + O₁,

(B.66)  tr(V⁴) = tr(AB)⁴ + O₁, and

(B.67)  |I − Z|^{−g} = |B|^g{1 − gn⁻¹tr(ABU) + ½gn⁻²[g tr²(ABU) − tr(U²) + tr(BU)²] + O₃}.

The results are produced by substituting the above expressions.

Q.E.D.
Lemma B.9

Let α, β, γ, δ, and g be complex scalars such that Re(α), Re(γ) ≠ 0, let A and U be complex r×r matrices, and let B = (I − αγ⁻¹A)⁻¹; then, if g is a negative integer or the absolute value of the largest root of αγ⁻¹A(I + n⁻¹U)⁻¹ is less than one,

(B.68)  ₂F₁[g, αn+β; γn+δ; A(I + n⁻¹U)⁻¹]
= |B|^g{1 + (4n)⁻¹g[4tr(U) − 4tr(BU) + 4αγ⁻¹(α⁻¹β − γ⁻¹δ)tr(AB) + α²γ⁻²(α⁻¹ − γ⁻¹){tr²(AB) + (2g + 1)tr(AB)²}]
+ (96n²)⁻¹g[−96αγ⁻²δ(α⁻¹β − γ⁻¹δ)tr(AB)
+ 6α²γ⁻²{8g(α⁻¹β − γ⁻¹δ)² − 4(α⁻²β − γ⁻²δ) − (2g + 1)(α⁻¹ − γ⁻¹)(α⁻¹ + γ⁻¹ − 1)}tr²(AB)
+ 6α²γ⁻²{8(α⁻¹β − γ⁻¹δ)² − 4(2g + 1)(α⁻²β − γ⁻²δ) − (2g + 3)(α⁻¹ − γ⁻¹)(α⁻¹ + γ⁻¹ − 1)}tr(AB)²
− 4α³γ⁻³(α⁻¹ − γ⁻¹)(α⁻¹ + γ⁻¹ − 3){tr³(AB) + 3(2g + 1)tr(AB)tr(AB)² + 2(2g² + 3g + 2)tr(AB)³}
+ 3α⁴γ⁻⁴(α⁻¹ − γ⁻¹){g·tr⁴(AB) + 2(2g² + g + 2)tr²(AB)tr(AB)² + (2g + 1)(2g² + g + 2)tr²(AB)² + 8(2g + 1)tr(AB)tr(AB)³ + 2(8g² + 10g + 5)tr(AB)⁴}
− 48αγ⁻¹{2(α⁻¹β − γ⁻¹δ) + αγ⁻¹(α⁻¹ − γ⁻¹)tr(AB)}tr(ABUB)
− 24α²γ⁻²{4g(α⁻¹β − γ⁻¹δ)tr(AB) + αγ⁻¹g(α⁻¹ − γ⁻¹)[tr²(AB) + (2g + 1)tr(AB)²]}tr(ABU)
− 48α²γ⁻²(2g + 1)(α⁻¹ − γ⁻¹)tr{(AB)²UB} + 48g·tr²(U) + 48g·tr²(BU) − 96g·tr(U)tr(BU) − 48tr(U)² + 48tr(BU)²] + O₃}.

Proof of Lemma B.9

By definition,

(B.69)  ₂F₁[g, αn+β; γn+δ; A(I + n⁻¹U)⁻¹] = Σ_{k=0}^∞ Σ_κ (g)_κ(αn+β)_κ C_κ[A(I + n⁻¹U)⁻¹]/[(γn+δ)_κ k!].

The right-hand side of Equation B.68 can be expanded in two stages, by applying Lemma B.6 first and then by applying Lemma B.8.

Q.E.D.
Lemma B.10

The following asymptotic expansion holds for large n:

(B.70)  [(αn + β)_κ]⁻¹ = (αn)^{−k}{1 − (2αn)⁻¹[2βk + a₁(κ)] + (24α²n²)⁻¹[(12β² − 1)k + 12β²k² + 12β(k+1)a₁(κ) + ⋯] + O₃}   (Re(α) ≠ 0).

Proof of Lemma B.10

From the definition of the generalized hypergeometric coefficient,

(B.71)  [(αn + β)_κ]⁻¹ = Π_{j=1}^r Π_{ℓ=1}^{k_j} [αn + β − ½(j−1) + ℓ − 1]⁻¹ = Π_{j=1}^r (αn)^{−k_j} Π_{ℓ=1}^{k_j} {1 + (2αn)⁻¹[2β − j + 2ℓ − 1]}⁻¹.

The binomial expansion gives

(B.72)  [(αn + β)_κ]⁻¹ = Π_{j=1}^r (αn)^{−k_j} Π_{ℓ=1}^{k_j} {1 − (2αn)⁻¹[2β − j + 2ℓ − 1] + (2αn)⁻²[4β² + j² + 4ℓ² − 4βj + 8βℓ − 4β − 4jℓ + 2j − 4ℓ + 1] + O₃}.

This relationship can be rewritten as

(B.73)  [(αn + β)_κ]⁻¹ = exp{Σ_{j=1}^r [−k_j log(αn) + Σ_{ℓ=1}^{k_j} log{1 − (2αn)⁻¹[2β − j + 2ℓ − 1] + (2αn)⁻²[4β² + j² + 4ℓ² − 4βj + 8βℓ − 4β − 4jℓ + 2j − 4ℓ + 1] + O₃}]}.

Expansion of the logarithmic term allows

(B.74)  [(αn + β)_κ]⁻¹ = (αn)^{−k} exp[Σ_{j=1}^r Σ_{ℓ=1}^{k_j} {−(2αn)⁻¹[2β − j + 2ℓ − 1] + (8α²n²)⁻¹[4β² + j² + 4ℓ² − 4βj + 8βℓ − 4β − 4jℓ + 2j − 4ℓ + 1] + O₃}],

which simplifies to give the desired result.

Q.E.D.
APPENDIX C
SELECTED PROOFS FROM CHAPTER 2
The proofs of Theorems 2.3.1, 2.3.2, and 2.3.3 and of Lemma 2.3.1
are presented below.
Proof of Theorem 2.3.1
The asymptotic expansion of the characteristic function given in
Theorem 2.2.1 can be obtained by expanding the individual terms and
taking the product of the expansions.
The first term can be expanded
using Lemma B.5 and the binomial expansion giving
n1-lj
(C.l)
~imrt
[ -n2-1
.
r r [~(nl + m - 1 - ~mt)]r r [~(n2 + m - 1 + imt)]
r
=1
r
[~(nl-l)]r [~(n2-l)][~(nl-l)(n2-l)]
~r
r
- (4nl)-lmr[r-m+l + mt 2 - i(r - 2m + l)t]
-(4n2)-lmr[r-m+l + mt 2 - i(r - 2m + l)t]
+ (16nln2)-lm2r2[(r-m+1)2 - {r 2 - 2(m-1)r + 2m 2
- 2m + 1}t 2 + m2 t 4 ] + (96ni 2 )-lmr[3mr 3 - 2(3m 2
- 3m + 4)r 2 + 3(m 3 - 2m 2 + sm - 12)r - 4(2m 2 - 9m
+ 5) - 3m{r 3 - 2(3m-1)r 2 + (6m 2 - 6m + s)r
- 4(2m - 3)}t 2 + 3m 3 rt 4
-
2i{[3mr 3
-
(9m 2 - 6m + 4)r 2
+ 3(2m3 - 3m 2 + Sm - 6)r - 2(6m 2 - 18m + S)]t
+ m2 [3r 2 - 3(2m - l)r + 4]t 3 }] + (96n22)-lmr[3mr3
- 2(3m 2 - 3m + 4)r 2 + 3(m3
-
2m2 + Sm - 12)r
- 4(2m 2 - 9m + 5) - 3m{r 3 - 2(3m - 1)r 2 + (6m 2 - 6m
+ S)r - 4(2m - 3)}t 2 + 3m 3 rt 4 + 2i{[3mr 3
-
(9m 2 - 6m
+ 4)r 2 + 3(2m 3 - 3m 2 + Sm - 6)r - 2(6m 2 - 18m + 5)]t
+ m2 [3r 2 - 3(2m - l)r + 4]t 3 }] + 03
The second term under consideration can be expanded using Lemma B.4 to
produce
(C.2)
II + 2(nl_l)-lA'ZZ'A,-\(n l +m-l-imt)
II + 2(n2_l)-lB'ZZ'BI-\(n2+m-l+imt)
= etr(-ZZ'){l
- nil[m(l - it)tr(A'ZZ'A) - tr(A'ZZ'A)2]
- n2 l [m(1 + it)tr(B'ZZ'B) - tr(B'ZZ'B)2]
+ (nln2)-l[m 2 (1+t 2 )tr(A'ZZ'A)tr(B'ZZ'B)
- m(l - it)tr(A'ZZ'A)tr(B'ZZ'B)2
- m(l + it)tr(A'ZZ'A)2 tr (B'ZZ'B) + tr(A'ZZ'A)2
• tr(B'ZZ'B)2] - (6ni 2 )-1[6m(1 - it)tr(A'ZZ'A)
- 3m 2 (1 - it)2 tr2(A'ZZ'A) - 6(m + 1 - imt)tr(A'ZZ'A)2
+ 6m(1 - it)tr(A'ZZ'A)tr(A'ZZ'A)2 + 8tr(A'ZZ'A)3
- 3tr 2 (A'ZZ'A)2] - (6n22)-l[6m(1 + it)tr(B'ZZ'B)
- 3m2 (1 + it)2 tr2(B'ZZ'B) - 6(m + 1 + imt)tr(B'ZZ'B)2
+ 6m(1 + it)tr(B'ZZ'B)tr(B'ZZ'B)2 + 8tr(B'ZZ'B)3
- 3tr 2 (B'ZZ'B)2]} + 03
The last term to be expanded is
(C.3)
etr(-ini1tA'ZZ'A + in21tB'ZZ'B)
= 1 - ni1(it)tr(A'ZZ'A) + n2- l (it)tr(B'ZZ'B)
+ (nln2)-lt 2tr(A'ZZ'A)tr(B'ZZ'B) - \ni 2t 2tr 2 (A'ZZ'A)
\n22t2tr2(B'ZZ'B) + 0 3
Let
(C.4)
g(Z)
= II
+ 2(nl_1)-lA'ZZ'AI-~(nl+m-1-imt)
· II + 2(n2-1)-lB'ZZ'BI-~(n2+m-1+imt)etr(-iniltA'ZZ'A
+ in21tB'ZZ'B),
then
(C.5)
g(Z)
= etr(-ZZ'){1
- ni1[{m - i(m-1)t}tr(A'ZZ'A)
- tr(A'ZZ'A)2} - n2 1 [{m + i(m-l)t}tr(B'ZZ'B)
- tr(B'ZZ'B)2} + (nln2)-1[{m 2 + (m-l)2t 2 }
· tr(A'ZZ'A)tr(B'ZZ'B) - {m - i(m - l)t}tr(A'ZZ'A)
• tr(B'ZZ'B)2 - {m + i(m-l)t}tr(A'ZZ'A)2tr (B'ZZ'B)
+ tr(A'ZZ'A)2 tr (B'ZZ'B)2} - (6ni 2 )-1[6m(1 - it)
• tr(A'ZZ'A) - 3{m2 - (m-l)2 t 2 - 2im(m-l)t}tr 2 (A'ZZ'A)
- 6(m + 1 - imt)tr(A'ZZ'A)2 + 6{m - i(m-l)t}
• tr(A'ZZ'A)tr(A'ZZ'A)2 + 8tr(A'ZZ'A)3 - 3tr 2 (A'ZZ'A)2]
- (6n22)-1[6m(1 + it)tr(B'ZZ'B) - 3{m 2 - (m-l)2 t 2
+ 2im(m-l)t}tr 2 (B'ZZ'B) - 6(m + 1 + imt)tr(B'ZZ'B)2
+ 6{m + i(m-l)t}tr(B'ZZ'B)tr(B'ZZ'B)2 + 8tr(B'ZZ'B)3
- 3tr 2 (B'ZZ'B)2}} + 0 3
Lemma B.1 implies that
(C.6)
n-mrf g(Z)etr(-itAZZ' +
~rZ') dZ
- 2P(2){\JIt(I + it~)-\r;(I + ita)-~A}
+ P(12){~(I + it~)-\r;CI + it~)-~A}}
-
~n21[2{m
+ i(m-l)t}P(I){\JIt(I +
it~)-\r;
(I + it~)-\B}] + \(01 0 2)-1
2
· L
2
L L L
k=1 Q=1 K A
L
$~K'A
a~,A
~
· P;,A[\J!t(1 + it~)-\r;(1 + it~)-\A;(1 + it~)-~]
4
- ( 240 12 )-1 L I b P [\J!tC1 + it~)-\r;(I + it~)-\A]
k=1 K 1K K
4
- (24022)-1 I I b
2K
~1 K
• PK[\JIt(1 + it~)-\r;(1 + it~)-~) + 03}
where
a(~) = ;
ct)
= 4mr{m(r-m+l) + (m-l)(r-m-l)t 2 + i[(r-m 2+m-I)t
- m(m-l)t 3 ]},
_
2
- a 02 - -2a1 02
a 20
(2)- (2) (1 )
= - 4mr[r - m + 1 + mt 2 + i(r-2m+l)t),
= _2a l123 - -a21
(1 ) - (3)
- -a21
- (21)
- 2-a 121
- - (21)
= a 22
(31)
b 1 (O)
--
a 22
(2 2 )
2
21
- 2a
. (31)
--
= -b 2 (O) = mr[3mr 3
--
2(3m 2 - 3m + 4)r 2 + 3(m 3 - 2m 2 + 5m - 12)r
- 4(2m 2 - 9m + 5) - 3m{r 3 - 2(3m-1)r 2 + (6m 2 - 6m
+ 5)r - 4(2m - 3)}t 2 + 3m 3 rt 4 - 2i{[3mr 3 - (9m 2
-
- 6m + 4)r 2 + 3(2m 3
-
- 18m + 5)]t + m2 [3r 2
~
b 1 (1)
= b 2 (1) = 24m{mr 2
-
b 1 (2)
= b 2(Z) = -6{mr 2
-
b 1 (1 2 )
=-b Z(1 2) = 1Z{mr 2
bl(3)
= b 2 (3) = -3Z[3m
bl(21)
= b Z(Zl) = -16[m
- 2 - i(m-1)t],
b 1 (I3)
= bZ(I3) = 16[3m
- 2 - Zi(m-l)t],
3m 2 + 5m - 6)r - 2(6m2
-
3(2m-1)r + 4]t 3 }]t
(m-1)r - 4 + r[(m-1)r
m2 + 3m - 1]t 2
- i[{(2m - 1)r 2 -(m 2 - 3m + l)r - 4}t + m(m-1)rt 3 ]}t
m(m-1)r - 8(m 2 + 2m + Z)+[m 2 r + 8(m-1)2]t 2
- im[r 2 - (Zm - l)r - 16(m-2)]t}t
m(m-1)r + 4m 2 - m - 1 + [m 2r
- im[r 2 - (2m-1)r + (Zm-3)]t}.
~
~
-
4(m-1)2]t 2
+ 4 - 3i(m-1)t],
= 48,
and
a is the complex conjugate of a.
~
The characteristic function is ob-
tained by multiplying the right-hand sides of Equations C.1 and C.6.
Q.E.D.
Proof of Theorem 2.3.2
The first two terms of the characteristic function given by Equation 2.2.20 can be expanded in a manner similar to those in Theorem 2.3.1 using Lemmas B.4 and B.5 to obtain
(C.7)
r [\(n+2m-2)]/r [\(n-2)][\(n_2)]mr
r
r
=1
- (2n)-lmr(r-2m+I) + (24n- 2 )-lmr[3mr 3
- 2(6m 2 - 3m - 2)r 2 + 3(4m 3
-
4m 2 + Sm
- IO)r - 2(8m 2 - 30m + 11)] + 0 3
and
(C.8) II + 2(n-2)-lA'ZZ'A + 2(n-2)-1!?!~\a'ZZ'B!~\!?I-\(nl-l)
II + 2(n-2)-1!~!~\A'ZZ'A!~\!~
= etr(-n1Sl
+ 2(n-2)-lB'ZZ'BI-\(n2+ 2m - 1)
- n2 S2){1 - n- 1 [(nl-n2)tr(Sl)
+ (2m- n l+n2)tr(S2) - nltr(Sl)2 - n2tr(S2)2]
- (6n 2 )-1[I2(nl-n2)tr(Sl) + 12(2m-n t+n2)tr(S2)
- 6(nt- n 2)(2m- n t+n2)tr(St)tr(S2) - 3(nt-n2)2 tr2(St)
- 3(2m-n t+n2)2 tr2(S2) - 6(4nl-l)tr(St)2 - 6(2m
+ 4n2 - I)tr(S2)2 + 6nt(nl-n2)tr(St)tr(St)2
+ 6n2(2m-nt+n2)tr(S2)tr(S2)2 + 8nltr(St)3
+ 8n2~r(S2)3 + 6nt(2m-nt+n2)tr(St)2tr(S2)
+ 6n2(nt-n2)tr(St)tr(S2)2 - 3n¥tr 2 (St)2 - 3n~tr2(S2)2
- 6ntn2tr(St)2tr(S2)2] + °3}
where
and
The expansion of the hypergeometric term comes directly from Lemma B.9
and is
1
(C.9)
1
2Fl[-m,~(nl-l);~(n-~);(I - ltL21Lt){I + 2(n-2)-lSl}-1]
= In2I
+ nlLll211{1 - (2n)-lm[2(nl-n2)tr(O)nln2{tr 2 (O)
- (2m-l)tr(O)2} - 4nltr(OSl)] - (48n- 2 )-lm[96(nl-n2)
• tr(Q) - 6{4m(nl-n2)2 + 4(nl-n2) + (2m-9)nln2
+ 2(2m-l)n2}tr 2 (O) + 6{4(nl-n2)2 + 4(2m-l)(nl-n2)
- (14m-5)nln2 + 2(2m-3)n2}tr(Q)2 - 4nln2(2-nl){tr3(Q)
- 3(2m-l)tr(Q)tr(O)2 + 2(2m 2 - 3m + 2)tr(Q)3}
- 3n13n2{mtr4(Q) - 2(2m 2 - m + 2)tr 2 (Q)tr(Q2)
+ (2m-l)(2m 2 - m + 2)tr 2 (Q2) + 8(2m-l)tr(Q)tr(03)
- 2(8m 2 - 10m + 5)tr(Q4)} - 192nltr(QSl) - 96{(nl-n2)
+ nln2tr(Q)}tr(OOSl) + 24{ 4JD1t l(nl-n2)tr(Q)
+ 2n¥n2[tr 2 (Q) - (2m-1)tr(Q)2]}tr(OSl)
+ 96(2m-1)nln2tr(eo2S1) - 96JD1t¥tr2(QSl) - 96tr(Sl)2
+ 96tr(6S 1)2] + °3}
where
and
Let
(C.10)
g(Z)
= II
+ 2(n-2)-lSll-~(nl-1)II + 2(n_2)-lS21-~(n2+2m-l)
'2Fl{-m,~(nl-1);~(n-2);(I - I?I 2 1I?)[I + 2(n-2)-lSl]-1},
then from Equations C.8 and C.9,
(C.11)  g(Z) = ··· {1 − (2n)⁻¹[2m(n₁−n₂)tr(Q) + mn₁n₂h_m(Q) + 2(n₁−n₂)tr(S₁) + 2(2m−n₁+n₂)tr(S₂) − 4mn₁tr(QS₁) − 2n₁tr(S₁)² − 2n₂tr(S₂)²] − (48n²)⁻¹[96m(n₁−n₂)tr(Q) − 6m{4m(n₁−n₂)² + 4(n₁−n₂) + (2m−9)n₁n₂ + 2(2m−1)n₂}tr²(Q) + 6m{4(n₁−n₂)² + 4(2m−1)(n₁−n₂) − (14m−5)n₁n₂ + 2(2m−3)n₂}tr(Q²) − 4mn₁n₂(2−n₁){tr³(Q) − 3(2m−1)tr(Q)tr(Q²) + 2(2m² − 3m + 2)tr(Q³)} − 3mn₁²n₂{m tr⁴(Q) − 2(2m² − m + 2)tr²(Q)tr(Q²) + (2m−1)(2m² − m + 2)tr²(Q²) + 8(2m−1)tr(Q)tr(Q³) − 2(8m² − 10m + 5)tr(Q⁴)} + 24(n₁−n₂){4 − 2m(n₁−n₂)tr(Q) − mn₁n₂h_m(Q)}tr(S₁) + 24(2m−n₁+n₂){4 − 2m(n₁−n₂)tr(Q) − mn₁n₂h_m(Q)}tr(S₂) − 48(n₁−n₂)(2m−n₁+n₂)tr(S₁)tr(S₂) − 24(n₁−n₂)²tr²(S₁) − 24(2m−n₁+n₂)²tr²(S₂) − 24{2(2m+4n₁−1) − 2mn₁(n₁−n₂)tr(Q) − mn₁²n₂h_m(Q)}tr(S₁)² − 24{2(2m+4n₂−1) − 2mn₂(n₁−n₂)tr(Q) − mn₁n₂²h_m(Q)}tr(S₂)² − 48mn₁{4 − 2m(n₁−n₂)tr(Q) − n₁n₂h_m(Q)}tr(QS₁) − 96m{(n₁−n₂) + n₁n₂tr(Q)}tr(ΘQS₁) + 96m(2m−1)n₁n₂tr(ΘQ²S₁) − 96m²n₁²tr²(QS₁) + 96m tr(ΘS₁)² + 96mn₁(n₁−n₂)tr(S₁)tr(QS₁) + 96mn₁(2m−n₁+n₂)tr(S₂)tr(QS₁) + 48n₁(n₁−n₂)tr(S₁)tr(S₁)² + 48n₂(2m−n₁+n₂)tr(S₂)tr(S₂)² + 64n₁tr(S₁)³ + 64n₂tr(S₂)³ − 96n₁²tr(QS₁)tr(S₁)² + 96n₁n₂tr(QS₁)tr(S₂)² + 48n₁(2m−n₁+n₂)tr(S₁)²tr(S₂) + 48n₂(n₁−n₂)tr(S₁)tr(S₂)² − 24n₁²tr²(S₁)² − 24n₂²tr²(S₂)² − 48n₁n₂tr(S₁)²tr(S₂)²] + O₃}
where
h_m(Q) = tr²(Q) − (2m−1)tr(Q²).
Let U₁ = A'ZZ'A, U₂ = Σ₂^{−½}Σ₁^{½}A'ZZ'AΣ₁^{½}Σ₂^{−½}, V₁ = B'ZZ'B, and V₂ = Σ₁^{−½}Σ₂^{½}B'ZZ'BΣ₂^{½}Σ₁^{−½};
then
(C.12)  tr(S₁) = tr(U₁) + tr(V₂),
(C.13)  tr(QS₁) = tr(QU₁) + tr(QV₂),
(C.14)  tr(ΘQS₁) = tr(ΘQU₁) + tr(ΘQV₂),
(C.15)  tr(ΘQ²S₁) = tr(ΘQ²U₁) + tr(ΘQ²V₂),
(C.16)  tr²(S₁) = tr²(U₁) + 2tr(U₁)tr(V₂) + tr²(V₂),
(C.17)  tr(S₁)² = tr(U₁)² + 2tr(U₁V₂) + tr(V₂)²,
(C.18)  tr²(QS₁) = tr²(QU₁) + 2tr(QU₁)tr(QV₂) + tr²(QV₂),
(C.19)  tr(ΘS₁)² = tr(ΘU₁)² + 2tr(ΘU₁ΘV₂) + tr(ΘV₂)²,
(C.20)  tr(S₁)tr(S₂) = tr(U₁)tr(U₂) + tr(U₁)tr(V₁) + tr(U₂)tr(V₂) + tr(V₁)tr(V₂),
(C.21)  tr(S₁)tr(QS₁) = tr(U₁)tr(QU₁) + tr(U₁)tr(QV₂) + tr(V₂)tr(QU₁) + tr(V₂)tr(QV₂),
(C.22)  tr(S₂)tr(QS₁) = tr(U₂)tr(QU₁) + tr(U₂)tr(QV₂) + tr(V₁)tr(QU₁) + tr(V₁)tr(QV₂),
(C.23)  tr(S₁)tr(S₁)² = tr(U₁)tr(U₁)² + 2tr(U₁)tr(U₁V₂) + tr(U₁)tr(V₂)² + tr(V₂)tr(U₁)² + 2tr(V₂)tr(U₁V₂) + tr(V₂)tr(V₂)²,
(C.24)  tr(S₁)³ = tr(U₁)³ + 3tr(U₁²V₂) + 3tr(U₁V₂²) + tr(V₂)³,
(C.25)  tr(QS₁)tr(S₁)² = tr(QU₁)tr(U₁)² + 2tr(QU₁)tr(U₁V₂) + tr(QU₁)tr(V₂)² + tr(QV₂)tr(U₁)² + 2tr(QV₂)tr(U₁V₂) + tr(QV₂)tr(V₂)²,
(C.26)  tr(QS₁)tr(S₂)² = tr(QU₁)tr(U₂)² + 2tr(QU₁)tr(U₂V₁) + tr(QU₁)tr(V₁)² + tr(QV₂)tr(U₂)² + 2tr(QV₂)tr(U₂V₁) + tr(QV₂)tr(V₁)²,
(C.27)  tr(S₁)tr(S₂)² = tr(U₁)tr(U₂)² + 2tr(U₁)tr(U₂V₁) + tr(U₁)tr(V₁)² + tr(V₂)tr(U₂)² + 2tr(V₂)tr(U₂V₁) + tr(V₂)tr(V₁)²,
(C.28)  tr²(S₁)² = tr²(U₁)² + 4tr²(U₁V₂) + tr²(V₂)² + 4tr(U₁)²tr(U₁V₂) + 4tr(V₂)²tr(U₁V₂) + 2tr(U₁)²tr(V₂)²,
(C.29)  tr(S₁)²tr(S₂)² = tr(U₁)²tr(U₂)² + tr(U₁)²tr(V₁)² + tr(U₂)²tr(V₂)² + tr(V₁)²tr(V₂)² + 2tr(U₁)²tr(U₂V₁) + 2tr(U₂)²tr(U₁V₂) + 2tr(V₁)²tr(U₁V₂) + 2tr(V₂)²tr(U₂V₁) + 4tr(U₁V₂)tr(U₂V₁),
and the remaining terms are obtained by interchanging the subscripts in
the above equations.
With these equations, Lemma B.1 can be used to
produce the following relationship:
(C.30)
n- mr
f
g(Z)etr(-it~Z' + ~rZ')dZ
+ 2(2m-n +n
1
2
){p010000 + p001000 - 4mn {p000010
(1)
(1)
1
(1)
+ p000001} _ n {2p200000 _ p120000 + 4pl00100
(1)
1
(2)
(2)
(2)
2
"{2p020000
_ 2pl00100 + 2p000200 _ pOOOl OO}
2
(1 )
(}2)
(2)
- "2
(2)
2
_ pOl 0000 + 4pOllOOO _ 2pOllOOO + 2p002000
(2)
(2)
(2)
(1 2 )
2
_ pOOl OOO}]
(I2)
4
.[
L
Kl,· •• ,KG
1
1
+ P(1)[T;B2(eo)~]} + 96m(2m-1)nln2{p(1)[T;Al(eo2)~]
where
Al
A2
Bl
= ('I'
= ('I'
= ('I'
+ it~)-\A,
+ it~) -\U~\L~,
+ it~)-~,
B2 = ('I' +
B3
h (0)
m
it~)
-~ L2-\II'
\
2 I ,
= ('I' + it~)-\BI\I-\
= tr 2 (0) - (2m-I) tr CO 2) ,
The result is obtained by multiplying Equations C.7 and C.30.
The coefficients a become
- 2m(4m 2
-
4m + l)r - 14m + 5}nln2
- 3(2m-1)tr(Q)tr(Q2) + 2(2m 2
- 2(8m 2
alOOOOO
= a000100
(1)
a010000
= a001000
(1)
(1)
(1)
-
10m + 5)tr(Q4)},
-
3m + 2)tr(Q3)}
a000010
(1)
= a000001
(1)
a200000
(2)
2
a1 ~OOOO
=
a 020000
= a002000
(2)
a000020
= a000002
(2)
(1 )
(2)
(2)
al000l0
(2)
-
-
L
~
a 011000
(2)
= al0g0lo
= al0000l
(1 )
(2)
= a000101
= aoogl0l
(2)
(1 )
2
= aOOOOOl
(12)
a010010
(2)
= a010010= a010001 = a010001 =
(1 2 )
(1 2 )
(2)
= aOO~010 = a001001 = aOO~OOl
(1 )
(2)
(1 )
= -96mn1(2m-n1+n2) ,
a 300000 = a000300 -
(3)
(3)
-
1/
a200100
3
a2100000
(21)
= 1/
3
2
a1 00100 =
(21)
3
a13goooo = a0001 00 = 1/
(I )
a030000
(3)
a0210000
(21)
(I3)
2
3
a1 00100 =
(I3)
a001010
(2)
,
a200010
(3)
2
= a200010 = -2a1 00010 =
(21)
(21)
2
= a200001
(21)
2
= a000210 = a000210
= -2a1 00001 = -2a1 00010
(21)
(1 3 )
(3)
(21)
2
= _2a0001 10 =
(21)
= a000201 = a000201
(3)
(21)
2
= _2a0001 01 =
(21)
-- - a100110
(I3)
--
1/
3
1/ 2 a 1(03 0)101 = 1/
3
a100110
(21)
a100101
(21)
= 96n~,
a020010
(3)
2
a020001
(3)
= a020010 = -2a01 0010
(21)
(21)
= a020001
(21)
2
= -2a01 0001 =
(21)
2
2
2
2
= _2a001 010 = -2aO(01!)010 = a002001 = a002001
(3)
(21)
(21)
= _2a001 001 = _2a001 001 = 1/ aOll0l0 = 1/ aOll0l0
(1 3 )
(3)
3
(1 3 )
2
(21)
1/
=
a120000
(3)
2
a011001
(3)
96n1n2,
2
2
0000 - a102000
= a120000 = -2a11 0000 = -2a11
(I3)
- (3)
(21)'
(21)
2
2
000
= al02000 = _2a101 000 = _2a101
(I3)
(21)
(21)
=a020100
(3)
=a020100
(21)
2
2
0100 - a002100
= -2a01 0100 = -2a01
(I3)
- (3)
(21)
=
a002100
(21)
= 1/ 3
2
2
100
= _2a001
-2aOO!
= (I ) 100
(21)
1
a111000 = -a 11 000
(21)
(I )
= _a 01 pOO
(I )
= 1/ 2
-
aOl1100
(3)
1/
2
a111000
(3)
= 1/ 3
a011100
(21)
a210000
(3)
2
2
= a210000
=
-2a1 10000 = -2a1 §OOOO = a201000 = a201000
(21)
(21)
(1 )
(3)
(21)
2
= -2a1(21)
01000
2
-2a1 01000
(1 3 )
2
= a010200
=
a010200 = _2a0101 00
(3)
(21)
(21)
2
= -2a 0101
00 = a001200 = a012000
(13)
(3)
(21)
= 1/ 2
a400000
(4)
a110100
(3)
= _6a310000
(31)
-
-
= 1/ 3
12/
7
2
_2a0011 00
(21)
=
= _a110100
= 1/ 2
(1 3 )
a110100
(21)
a101100
(3)
4
= -6a21 2 ~OOOO = 4a1(14)
00000
(21 )
a2200000
(2 2 )
2
= _6a00021
00
2
(21 )
4
= 4a0001
00
4
(1 )
= a1300100
(1 4 )
-
-
1/
1/
6
4
a300100 (4)
-
a200200 -
(4)
-
_3/
2
a300100 (31)
-
_a200200
(31)
= 2/ 7
_3/
2
a2100100
(31)
a200200
(2 2 )
2
2
2
= _a2001
00 = _a2001 00 = -a1 00200
(31)
(31)
(21 2 )
= 3/ 7
a1002100 -
(22)
-
_3/
2
a1002100
(212)
2
= -6a021
000
2
(21 )
= a004000
= _6a0031000
= 12/ 7
(4)
(31)
aOO~
2
(2 )
000
=
_6a002~
2
(21 )
000
3
1000
= a01
= 1/ 6
(I4)
= _a022000
= 2/ 7
(31)
a022000
(4)
2
2
2
aOO~200
(2)
2
000
000
2000
= _a021
= _a021
= _a01
= _a01
(31)
(21 2 )
(21
(31)
2 2
a01 1 000
(2 2 )
= 2/ 7
2 2
= _a01
~ 000
= 2/ 3
(21 )
~OOO
)
a01212000
(I4)
= 24n~,
a220000
(4)
2
= a220000
=
a22~000 = -2a21 0000
(31)
(2 )
(31)
2 2
4a1 ~ 0000
2
= -2a1(31)
20000
(2 )
= 4a1(12 41 2) 0000
= a202000
(2 2 )
-
-
-
a202000 (4)
-
2
_2a201 000
(31)
2 2
= 4a1(21
O~ 000
)
2
= -2a1(31)
02000
2
= 4ah~)
2
000
a202000
(31)
-
-
a020200 -
(4)
-
a020200
(31)
2
= a020200
=
_2a0201 00
2
(2 )
(31)
2
= 4a01(2 22 01
00
)
2
2
= _2a020~
00 = -2a01 0200
(31)
(21 )
= 4a01 24 01 2 00 = a002200 = a002200 = aOO~200
(1
)
(4)
(31)
(2 )
= _2a0021
(31)
2
01
= 4aOl
(2 2 )
= 1/ 22
2
2
2
00
00
2
2
00
200
200
= -2a002~
= -2aOOl
= _2aOOl
(21 )
(31)
(21 2 )
2 2
1
= 4aOOl
(21 2 )
a211000
(31)
= 1/ 2
2
= _1/13
= -a1(2 pOOO
)
2 2
1
= 4aOOl
(I4)
00
a21~000
(2)
2
a 1 1~000
(21 )
= 1/ 2
00
= 1/ 2
a211000
(212)
a211000
(4)
2
11000
= -a 1(31)
2
11000
= -a 1(14)
= 1/ 2
a120100
(4)
= _all 2 0100
(31)
2
2
= _a101
100 = -a10~ 100 =
(31)
(2 )
2
= _a 101
100
4
(1 )
-
1/ 2 a011200 -
(4)
-
1/
22
aOl1200 _ 1/
(31)
= -a(~~)
2
00
-
2
a011200
(22)
=
Q.E.D.
Proof of Lemma 2.3.1
Substitute a = −½(n−r−1−it), b = ½i(n−r−2)t, and p = ½i(n−r−3)t in Lemma 2.2.4, and let w = |S|^{−½i(n−r−2)t} |S − n(n−1)⁻¹ȲȲ'|^{½i(n−r−3)t} |Σ|^{½it},
then
(C.3l)
E[w]
= [2(n-l)/n]\r I2I /-\it{r r [\(n-it)]/r r [\(n-l)]}(nn)-\r
- f II
+ n- 1XX'I-\(n-it)etr[_\I- 1MH' + (n-l)n-1~1-itI-\
4
-MX' + \(1-it)(n-l)n- 2 XX']{ I (_l)k(n_l)kn-k
k=O
-Pk[~(n-l)72n(~-~
+
~(1-it)7n X)j(I
+ (n-l)XX')\]
The following terms can be obtained directly from Lemmas 2.2.4, B.l, and
B.10:
(C.32)
[(a)o]-lb o
= 1,
(C.33)
[(a)1]-lb 1
= n- 1 {n- 1 (r+1)(1-it)
+ n- 2 [(r+1)(r+2) + (r+1)t 2
- (r 2 + 2r + 5)t 2 - 2i(r 2+r-1)t] + 03},
(C.35)
[(a)3]-lb 3
= -1/3
(C.36)
[(a)4]-lb 4
= 1/2
(C.37)
Po
(C.38)
(n-1)n- 1P1
n- S [3r + 7 - 3(r+1)t 2
-
2i(3r + 5)t + 0 1 ],
n- 6 [(1-it)2 + 0 1 ],
= 1,
= -1/2[r
1
+ (n-2)tr(I- 1MM') + 2~1-it tr(I-~')
+ tr(I-~'I-\xx') + 0 1 ],
(C.39)
(n-1)2n-2P 2
= 1/4
n[2(r+2)tr(I- 1 MM') + (n-4)tr 2 (I- 1 MM')
-2tr(I- 1MM') + tr(I-~'I-\xx')
+ 2~1-it tr(I-~'I-1MX') + 0 1 ],
(C.40)
(n-1)3 n -3P 3
(C.41)
(n-1)4 n -4P 4
= _1/ 8 n3 [tr 3 (I- 1MM') + 0 1],
= 1/6 n 4 [tr 4 (I- 1MM') + 0 1],
and
Substitution of these terms into Equation C.31 gives
(C.42)
E[w]
= [(n-1)/n]~rI2II-~it{r r [~(n-it)]/r r [~(n-1)]}(nn)-~r
. f II + n-1XX'I-~(n-it)etr[-~I-1MM'
+ (n-1)n-1~1-itI-~' + ~(1-it)(n-1)n-2XX']{1
- (4n)-1[2(r+1)(1-it)tr(I- 1MM') + (1-it)tr 2 (I- 1MM')]
- (96n 2 )-1[48r(r+1)(1-it) + 48{r 2 + 2r + 2 + (r+1)t 2
- i(r 2+r+1)t}tr(I- 1 MM') - 12{r 2 - 9 - (r 2 + 2r + 5)t 2
- 2ir(r+1)t}tr 2 (I- 1 MM') - 4{3r + 7 - 3(r+1)t 2
- 2i(3r + 5)t}tr 3 (I-IMM') - 3(1-it)2tr 4 (I-IMM')
+
96(r+1)(1-it)3/2tr(I-~')
148
- 48(1-it)
3/
+ 48{r + 1 +
The term n
-\r
1
2tr(~-~'~-lMX')
1
1
tr(~-lMM')}(l-it)tr(~-~'~-~')]
+ 03} dX
r r [\(n-it))/r r [\(n-1)) can be expanded to give
2-\r(1-it)n-\irt{1 - (4n)-lr[r + 2 - i(r+1)t] + (96n2)-lr[3r3
using Lemma A.5.
Expansion of
II + n- 1 XX'I-\(n-it)etr[-\(n-1)n- 2 (1_it)XX')
produces
etr(-\XX'){l -(4n)-1[2(1-2it)tr(XX') - tr 2 (XX')]
+ (96n 2 )-1[48(1-it)tr(XX') + 12(1 - 4t 2 - 6it)tr 2 (XX')
- 4(7 - 6it)tr 3 (XX') + 3tr 4 (XX')] + 03}
from Lemma B.4 and a Taylor series expansion of eX.
The substitution of these expansions into Equation C.42 along with
some simplification leads to
(C.43)
E[w]
= (2n)-~rl~I-~it
+
etr(-~~-lMM') f etr[-\XX'
(n-1)n-l~-it~-~']{1 - (4n)-1[r{r + 4 - i(r+1)t}
+ 2(r+1)(1-it)tr(~-lMM') + (l-it)tr2(~-lMM')
+ 2(1 - 2it)tr(XX') - tr 2 (XX')] + (96n 2 )-1[r{3r 2
+ 16r 2 - 24r - 88 - 3(r 3+r+4)t 2 + 3rt 4 - 2i[(3r 3 + 11r 2
- 18r - 22)t + (3r 2 + 3r + 4)t 3 ]} + 12{r 3 + r 2 - 4r - 8
- (r 3 + 2r 2 + sr + 4)t 2 - i(2r 3 + 3r 2 + r
- 4)t}tr(~-lMM') + 6{3r 2 + 4r - 18 - (3r 2 + sr + 10)t 2
- 3ir(2r + 3)t}tr2(~-lMM') + 4{3r + 7 - 3(r+1)t 2 - 2i(3r
+ s)t}tr3(~-lMM') + 3(I-it)2tr4(~-lMM')
-
96(r+l)(1-it)3/2tr(~-~')
+
48(I-it)3/2tr(~-~'~-lMX')
149
+ 2{6[r 2 + 4r + 4 - 2r(r+1)t 2 - i(3r 2 + 9r + 4)t]
+ 2(r+1)(1 - 2t 2
-
-
3it)tr(~-lMM')
3it)tr2(~-lMM')}tr(XX')
1
+ (1 - 2t 2
- 48{r + 1 +
tr(~-lMM')}(l
1
-it)tr(~-~'~-~') - {6[r 2 + 4r - 2 + 8t 2
- i(r 2 +r-12)t] +
+
2(r+1)(1-it)tr(~-lMM')
(1-it)tr2(~-lMM')}tr2(XX')
- 4(7 - 6it)tr 3 (XX')
+ 3tr 4 (XX')] + 03} dX
Equation 2.1.14 and Theorem 1.10.1 of Graybill (1976) can be used to
integrate with respect to X, giving the desired result.
Q.E.D.
Proof of Theorem 2.3.3
From Equation 2.1.26, the characteristic function of DU(Z) can be
written as
(C.44)
where
(C.45)
K(t)
= [{r r [\(nl-m-1)]r r [\(n2-1)]/r r [\(n2-m-1)]r r [\(nl-1)]}
0{nl(n2- 1 )/n2(nl-1)}\r]it,
(C.46)
$l(t)
= E-Xl,Sl [IS11-\imt
°11 - nl(nl-1)-lSil(Z-Xl)(Z-Xl)'I\i(nl-r-m-2)tI~1]'
(C.47)
$2(t)
= E-X2,S2 [IS21\imt
011 - n2(n2-1)-lS21(Z-X2)(Z-X2)'I-\[i(nl-r-m-2)t]I~2]'
(C.48)
~l
= {(Xl,Sl,Z)
II - nl(nl-1)-lSi l (Z-X 1)(Z-X l )'1 > OJ,
From Lemma 2.3.1,
(C.50)
$1(t)$2(t)
= (n2/nl)\irtII2Il1,\itetr[-\it(Sl-S2)]{1
+ \nl 1
0[ir(r-3)t - 4(t 2 + it)tr(Sl) - (t 2 + it)tr 2 (Sl)]
- ( 4n 2)-1[ir(r-3)t + 4(t 2 - it)tr(S2) + (t 2 - it)tr 2 (S2)]
+ (16nln2)-1[r2(r-3)2t2 - 4r(r-3)(t 2 - it 3 )tr(Sl)
- 4r(r-3)(t 2 + it 3 )tr(S2) + 16(t 2+t 4 )tr(Sl)tr(S2)
- r(r-3)(t 2 - it 3 )tr 2 (Sl) - r(r-3)(t 2 + it 3 )tr 2 (S2)
+ 4(t 2+t 4 )tr 2 (Sl)tr(S2) + 4(t 2+t 4 )tr(Sl)tr 2 (S2)
+ (t 2+t 4 )tr 2 (Sl)tr 2 (S2)] + (96ni 2 )-l[r{32(r 2 + 6r + 8)
- 3(r 3
-
8r 2 + 9r + 36)t 2 + 3rt 4 + i[4(2r 2
-
l5r - 6l)t
- 2(3r 2 + 3r + 4)t 3 ]} + 2{5r 3 + 73r 2 + 452r + 528
- 8(r 2 + 37r + 60)t 2 - i[(5r 3 + 93r 2 + 760r + 9l2)t
+ l2(r 2 - 3r - 8)t 3 ]}tr(Sl) + {25r 2 + 292r + 564
- 2(7r 2 + 237r + 622)t 2 + 48t 4
-
i[(45r 2 + 648r + l552)t
+ 2(3r 2 - 29r - l76)t 3 ]}tr 2 (Sl) + {20r + 47 - (40r
+ 247)t 2 - i[(50r + 333)t - (lOr+67)t 3 ]}tr 3 (Sl) + {5
-18t 2 + 3t 4
-
i(15t - llt 3 )}tr 4 (Sl)] + (96n22)-1[r{32(r2
+ 6r + 8) - 3(r 3
-
8r 2 + 9r + 36)t 2 + 3rt 4
-
i[4(2r 2
- l5r - 6l)t - 2(3r 2 + 3r + 4)t 3 ]} + 2{5r 3 + 73r 2
+ 452r + 528 - 8(r 2 + 37r + 60)t 2 + i[(5r 3 + 93r 2 + 760r
+ 9l2)t + l2(r 2 - 3r - 8)t 3 ]}tr(S2) + {25r 2 + 292r + 564
- 2(7r 2 + 237r + 622)t 2 + 48t 4 + i[(45r 2 + 648r
+ l552)t + 2(3r 2 - 29r - l76)t 3 ]}tr 2 (S2) + {20r + 47
- (40r + 247)t 2 + i[(50r + 333)t - (lOr + 67)t 3 ]}tr S (S2)
+ {5 - l8t 2 + 3t 4 + i(15t - llt 3 )}tr 4 (S2)] + Os}
where
S.
J
= I:l(Z-M.)(Z-M.)'.
J
J
J
The expansion of K(t) is
(n₂/n₁)^{½irt}{1 + (4n₁)⁻¹ir(r+6)t − (4n₂)⁻¹ir(r+6)t + (16n₁n₂)⁻¹r²(r+6)²t² − (96n₁²)⁻¹[3r²(r+6)²t² − 8ir(r² + 6r + 15)t] − (96n₂²)⁻¹[3r²(r+6)²t² + 8ir(r² + 6r + 15)t] + O₃},
which can be obtained using Lemma B.5 and the binomial expansion.
The
result is obtained by combining terms and integrating with respect to Z
using Equation 2.1.14.
Q.E.D.
APPENDIX D
NUMERICAL PROCEDURES FOR CHAPTER 3
This appendix gives several results that are needed to compute the probabilities described in Chapter 3. F₁(x) and F₂(x) were computed using the standard power series of the error function and continued fractions for the incomplete gamma function, respectively, as well as the asymptotic expansions for large x. F₄(x) was derived from Lemmas D.1 and D.2. Lemma D.3 provides the formula for F₆(x). Gaussian integration was used to obtain F₇(x). The coefficients g_k from Equation 3.2.24 were computed using Lemmas D.4 and D.5. Lemma D.6 provided the method to calculate h_k.
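The two computational devices just mentioned can be sketched as follows (Python, standard library only; function names and tolerances are illustrative choices, not from the text): a Maclaurin series for erf gives F₁, and the standard continued fraction for the regularized upper incomplete gamma function, evaluated by the modified Lentz method, gives the tail probabilities needed for F₂.

```python
import math

def erf_series(x, terms=60):
    """erf(x) = (2/sqrt(pi)) * sum_{k>=0} (-1)^k x^(2k+1) / (k! (2k+1))."""
    s, term = 0.0, x
    for k in range(terms):
        s += term / (2 * k + 1)
        term *= -x * x / (k + 1)      # advance (-1)^k x^(2k+1)/k! in place
    return 2.0 / math.sqrt(math.pi) * s

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F1(x) = Phi[(x - mu)/sigma] via the erf series."""
    return 0.5 * (1.0 + erf_series((x - mu) / (sigma * math.sqrt(2.0))))

def upper_gamma_q(a, x, eps=1e-14, itmax=200):
    """Regularized upper incomplete gamma Q(a, x) = Gamma(a, x)/Gamma(a)
    via its continued fraction (best for x > a + 1), modified Lentz method."""
    tiny = 1e-300
    b = x + 1.0 - a
    c, d, h = 1.0 / tiny, 1.0 / b, 1.0 / b
    for i in range(1, itmax):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h

print(normal_cdf(1.2), upper_gamma_q(1.0, 3.0))
```

A quick sanity check is Q(1, x) = e^{−x}, and the erf series can be compared against `math.erf`.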
Lemma D.1
For ½r₁ even,
(D.1)  F₄(x) = σF₁(x)e^{½y²} Σ_{k=0}^{∞} (σ/α)^{½r₁+k} D_{−½r₁−k−1}(y),
with a corresponding expression when ½r₁ is odd, where y = σ/α − (x−μ)/σ, F₁(x) = Φ[(x−μ)/σ], and
(D.2)  D_a(x) = e^{−¼x²} ∫₀^∞ t^{−a−1} e^{−xt−½t²} dt / Γ(−a),   a < 0,
is known as a parabolic cylinder function.
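For a < 0 this integral representation can be evaluated directly by quadrature. The sketch below (Python, standard library only; the cutoff and step count are ad hoc choices) applies composite Simpson's rule to the standard representation D_a(x) = e^{−x²/4} Γ(−a)⁻¹ ∫₀^∞ t^{−a−1} e^{−xt−t²/2} dt and checks the known closed form D₋₁(x) = e^{x²/4} √(π/2) erfc(x/√2):

```python
import math

def pcf_D(a, x, upper=12.0, n=4800):
    """Parabolic cylinder function for a < 0 via the integral representation,
    integrated by composite Simpson's rule on [0, upper]."""
    assert a < 0 and n % 2 == 0
    h = upper / n
    def f(t):
        # integrand t^{-a-1} e^{-x t - t^2/2}; guard the t=0 endpoint
        if t == 0.0 and -a - 1 < 0:
            return 0.0
        return t ** (-a - 1) * math.exp(-x * t - 0.5 * t * t)
    s = f(0.0) + f(upper)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    integral = s * h / 3.0
    return math.exp(-0.25 * x * x) / math.gamma(-a) * integral

x = 0.7
closed_form = math.exp(x * x / 4.0) * math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))
print(pcf_D(-1.0, x), closed_form)
```

The a = −1 case is exact because ∫₀^∞ e^{−xt−t²/2} dt = e^{x²/2}√(π/2) erfc(x/√2).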
Proof of Lemma D.1
From Equation 3.2.8,
(D.3)  F₄(x) = ½ + ∫₀^∞ v^{½r₁−1} e^{−v/α} erf[(x−v−μ)/√2σ] dv / Γ(½r₁)α^{½r₁}.
Using integration by parts and Equation 3.462.1 of Gradshteyn and Ryzhik (1965), the result follows. When ½r₁ is even,
(D.4)
and when ½r₁ is odd,
(D.5)
Q.E.D.
Lemma D.2
(D.6)
F4
(n+1)
(a,~,~,a;x)
= aa-~1(1)(~,a;x)e\y2
n
I
~
I
j=O k=O
(~)(_1)n+~a(j)+(j
mod 2) 2-n+(j mod 2)
J
.[(x-~)/a](-~a(j))k
n-J
• {a(j)!/[\a(j)]!}H
• r[(a+2k+(j mod 2)]
a
• D_ a - 2k -(j mod 2)(y)/[\+(j mod 2)]kr(a)~ k!
where a(j) = j − (j mod 2), H is the Hermite polynomial defined by
(D.7)  H_n(x) = 2ⁿπ^{−½} ∫_{−∞}^{∞} (x + it)ⁿ e^{−t²} dt,
and D is the parabolic cylinder function defined by Equation D.2.
Proof of Lemma D.2
The function H_n(x) can be represented by a confluent hypergeometric function, i.e.,
(D.8)  H_n(x) = (−1)^{½a(n)} {n!/[½a(n)]!} (2x)^{n mod 2} ₁F₁[−½a(n); ½ + (n mod 2); x²],
where a(n) = n − (n mod 2).
Equation 8.958.2 of Gradshteyn and Ryzhik (1965),
(D.9)  H_n(x+y) = 2^{−½n} Σ_{k=0}^{n} C(n,k) H_{n−k}(x√2) H_k(y√2),
can be used to isolate the variable of integration. Integration is then performed term by term, and the result follows from the definition of D.
Q.E.D.
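The addition theorem (D.9) is easy to verify numerically. A minimal sketch (Python; the degree and arguments are arbitrary test values) builds the physicists' Hermite polynomials from the three-term recurrence and compares the two sides:

```python
import math

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the recurrence
    H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)."""
    h0, h1 = 1.0, 2.0 * x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2.0 * x * h1 - 2.0 * k * h0
    return h1

def addition_rhs(n, x, y):
    """Right-hand side of (D.9):
    2^{-n/2} sum_k C(n,k) H_{n-k}(x*sqrt2) H_k(y*sqrt2)."""
    s2 = math.sqrt(2.0)
    return 2.0 ** (-n / 2.0) * sum(
        math.comb(n, k) * hermite(n - k, x * s2) * hermite(k, y * s2)
        for k in range(n + 1))

n, x, y = 5, 0.3, -0.7
print(hermite(n, x + y), addition_rhs(n, x, y))
```

Both sides agree to machine precision for any degree, since (D.9) is a polynomial identity.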
Lemma D.3
The inverse Fourier transform of
(D.10)
is
(D.11)  f(x) = |x|^{a₁+a₂−1} exp{−½[(1/β₁ + 1/β₂) + sgn(x)(1/β₁ − 1/β₂)]|x|}
· U[½(a₁+a₂) − ½sgn(x)(a₁−a₂), a₁+a₂; (1/β₁ + 1/β₂)|x|] / Γ[½(a₁+a₂) + ½sgn(x)(a₁−a₂)]β₁^{a₁}β₂^{a₂}.
Proof of Lemma D.3
From Equation 3.384.9 of Gradshteyn and Ryzhik (1965), the inverse Fourier transform of φ(t) is
(D.12)  f(x) = β₁^{−a₁}β₂^{−a₂}(1/β₁ + 1/β₂)^{−½(a₁+a₂)} x^{½(a₁+a₂−2)} exp[−½(1/β₁ − 1/β₂)x]
· W_{½(a₁−a₂),½(1−a₁−a₂)}[(1/β₁ + 1/β₂)x] / Γ(a₁),   (x > 0)
= β₁^{−a₁}β₂^{−a₂}(1/β₁ + 1/β₂)^{−½(a₁+a₂)} (−x)^{½(a₁+a₂−2)} exp[½(1/β₁ − 1/β₂)x]
· W_{−½(a₁−a₂),½(1−a₁−a₂)}[−(1/β₁ + 1/β₂)x] / Γ(a₂),   (x < 0)
where W is the Whittaker function defined as
(D.13)
and where U is the Kummer function defined in Section 13.1 of Abramowitz and Stegun (1964) as
(D.14)  U(a,β;z) = [₁F₁(a,β;z)/Γ(1+a−β)Γ(β) − z^{1−β}₁F₁(1+a−β,2−β;z)/Γ(a)Γ(2−β)]π/sin(πβ)
if β is not an integer, and
U(a,β;z) = [··· + Σ_{k=0}^{∞} (a)_k{ψ(a+k) − ψ(1+k) − ψ(β+k)}z^k/(β)_k k!]
if β is a positive integer, where
(D.15)  ψ(x) = (d/dx) log Γ(x).
The relationship U(a,β;z) = z^{1−β}U(1+a−β,2−β;z) can be applied when β is a negative integer. The result follows by substitution.
Q.E.D.
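The Kummer-function machinery above can be sketched in a few lines (Python, standard library only; the truncation order and test values are arbitrary choices): ₁F₁ is summed from its Maclaurin series, U is assembled from the non-integer-β formula (D.14), and the reduction relation U(a,β;z) = z^{1−β}U(1+a−β,2−β;z) is then checked numerically.

```python
import math

def hyp1f1(a, b, z, terms=80):
    """Confluent hypergeometric 1F1(a; b; z) by its Maclaurin series."""
    s, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) / (b + k) * z / (k + 1)
        s += term
    return s

def kummer_U(a, b, z):
    """U(a, b; z) via Equation D.14; valid when b is not an integer."""
    pre = math.pi / math.sin(math.pi * b)
    t1 = hyp1f1(a, b, z) / (math.gamma(1.0 + a - b) * math.gamma(b))
    t2 = z ** (1.0 - b) * hyp1f1(1.0 + a - b, 2.0 - b, z) / (math.gamma(a) * math.gamma(2.0 - b))
    return pre * (t1 - t2)

a, b, z = 0.3, 0.4, 1.5
lhs = kummer_U(a, b, z)
rhs = z ** (1.0 - b) * kummer_U(1.0 + a - b, 2.0 - b, z)
print(lhs, rhs)
```

The reduction relation holds exactly, so the two evaluations agree up to series truncation and rounding.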
Lemma D.4
The following relationship holds:
(D.16)  etr[−axSB(I − xB)⁻¹] = Σ_{j=0}^{∞} Σ_{k=1}^{j} Σ_κ (−a)^k [Π_{ℓ=1}^{j} tr^{k_ℓ}(SB^ℓ)] x^j / k₁!···k_j!,
where the inner summation is over all partitions κ = (k₁,···,k_j) such that Σ_{n=1}^{j} k_n = k and Σ_{n=1}^{j} n k_n = j.
Proof of Lemma D.4
Let f(y) = e^y and g(x) = tr[−axSB(I − xB)⁻¹]; then from Equation 0.430 of Gradshteyn and Ryzhik (1965)
(D.17)
where the second summation is over all partitions of k such that k₁ + ··· + k_n = k and 1k₁ + 2k₂ + ··· + nk_n = n, and where
(D.18)
A Maclaurin series is used to achieve the result.
Q.E.D.
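The partition sum in (D.16) is just the Maclaurin expansion of the exponential of the power series −aΣ_{n≥1} tr(SBⁿ)xⁿ, so its coefficients can be generated by the standard recurrence for exponentiating a series, c_j = j⁻¹Σ_{k=1}^{j} k g_k c_{j−k}. A sketch (Python; the 2×2 matrices and the point x are arbitrary test data) compares the truncated series with direct evaluation of etr[−axSB(I − xB)⁻¹]:

```python
import math

def mat2_mul(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def mat2_trace(A):
    return A[0][0] + A[1][1]

def mat2_inv(A):
    (a, b), (c, d) = A
    det = a*d - b*c
    return ((d/det, -b/det), (-c/det, a/det))

def series_coeffs(a, S, B, jmax):
    """Maclaurin coefficients c_j of etr[-a x S B (I - xB)^{-1}]:
    g_n = -a * tr(S B^n) and c_j = (1/j) * sum_{k=1}^{j} k g_k c_{j-k}."""
    g = [0.0]
    P = B
    for n in range(1, jmax + 1):
        g.append(-a * mat2_trace(mat2_mul(S, P)))
        P = mat2_mul(P, B)
    c = [1.0]
    for j in range(1, jmax + 1):
        c.append(sum(k * g[k] * c[j - k] for k in range(1, j + 1)) / j)
    return c

a = 0.8
S = ((1.0, 0.2), (0.2, 0.5))
B = ((0.3, 0.1), (0.0, 0.2))        # spectral radius < 1
x = 0.4
c = series_coeffs(a, S, B, 30)
series_val = sum(cj * x**j for j, cj in enumerate(c))

I_minus_xB = ((1.0 - x*B[0][0], -x*B[0][1]), (-x*B[1][0], 1.0 - x*B[1][1]))
direct_val = math.exp(-a * x * mat2_trace(mat2_mul(mat2_mul(S, B), mat2_inv(I_minus_xB))))
print(series_val, direct_val)
```

The recurrence is equivalent to summing over the partitions in (D.16), but runs in O(jmax²) time.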
Lemma D.5
If ||B|| < 1, then
(D.19)  |I − xB|^{−a} etr[−bxSB(I − xB)⁻¹] = Σ_{j=0}^{∞} c_j x^j,
where
with the latter summation being taken over all partitions of ℓ such that ℓ₁ + ··· + ℓ_k = ℓ and 1ℓ₁ + 2ℓ₂ + ··· .
Proof of Lemma D.5
From Equation 0.42 of Gradshteyn and Ryzhik (1965)
(D.20)
where
and f^{(0)}(x) = f(x). Let
(D.21)
and
(D.22)
then
(D.23)  f₁^{(k)}(0) = Σ_K (a)_K C_K(B)
from the definition of ₁F₀(a;x), and
(D.24)
from Lemma D.4. A Maclaurin series gives the desired result.
Q.E.D.
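The same recurrence also produces Lemma D.5's coefficients, since |I − xB|^{−a} etr[−bxSB(I − xB)⁻¹] = exp(Σ_{n≥1}[a·tr(Bⁿ)/n − b·tr(SBⁿ)]xⁿ). A self-contained sketch (Python; the matrices, a, b, and x are arbitrary test data, with ||B|| < 1 as the lemma requires):

```python
import math

def mul(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def tr(A):
    return A[0][0] + A[1][1]

def coeffs(a, b, S, B, jmax):
    """c_j with |I - xB|^{-a} etr[-b x S B (I - xB)^{-1}] = sum_j c_j x^j,
    using log|I - xB| = -sum_n tr(B^n) x^n / n and the exp-series recurrence."""
    g = [0.0]
    P = B
    for n in range(1, jmax + 1):
        g.append(a * tr(P) / n - b * tr(mul(S, P)))
        P = mul(P, B)
    c = [1.0]
    for j in range(1, jmax + 1):
        c.append(sum(k * g[k] * c[j - k] for k in range(1, j + 1)) / j)
    return c

a, b, x = 1.7, 0.6, 0.35
S = ((0.9, 0.1), (0.1, 0.4))
B = ((0.25, 0.05), (0.10, 0.30))    # eigenvalues 0.35 and 0.2
c = coeffs(a, b, S, B, 40)
series_val = sum(cj * x**j for j, cj in enumerate(c))

# direct evaluation for the 2x2 case
M = ((1 - x*B[0][0], -x*B[0][1]), (-x*B[1][0], 1 - x*B[1][1]))
det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
Minv = ((M[1][1]/det, -M[0][1]/det), (-M[1][0]/det, M[0][0]/det))
direct_val = det ** (-a) * math.exp(-b * x * tr(mul(mul(S, B), Minv)))
print(series_val, direct_val)
```

This avoids enumerating zonal polynomials; for scalar checks of the lemma the trace series is all that is needed.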
Lemma D.6
Let T₀ be defined as in Equation 3.2.22; then
(D.25)  ··· = Σ_{j=0}^{∞} Σ_{k=0}^{∞} h_j(n₁,···,n_m; A₀₁; Γ₀₁₁₁,···,Γ₀ₘ₁₁) ···
where α₀ℓ and A₀ℓ are defined in Equation 3.2.19 and
[ ···  Γ₀₁₂  Γ₀₁₃ ]
[ ···  Γ₀₂₂  Γ₀₂₃ ]
[ ···  Γ₀₃₂  Γ₀₃₃ ]
with the partitions coinciding with those of A₀.
Proof of Lemma D.6
By definition of A₀ and Γ₀ⱼ,
(D.26)  tr[Γ₀ⱼ(I + itA₀)^{−nⱼ}] = tr[Γ₀ⱼ₁₁(I + itA₀₁)^{−nⱼ}] + tr[Γ₀ⱼ₂₂(I + itA₀₂)^{−nⱼ}] + tr(Γ₀ⱼ₃₃).
The first two terms can be represented as power series by applying Equation 3.2.17 and then by using a binomial expansion of the matrix (I − α₀ℓA₀ℓ⁻¹)^k (1 + itα₀ℓ)^{−k−nⱼ}. With the use of indicator functions, the series becomes
Σ_{p=0}^{∞} Σ_{q=0}^{∞} a_{jp} b_{jq} (1 + itα₀₁)^{−p} (1 + itα₀₂)^{−q}.
A sequence of power series multiplications produces the coefficients h_ℓ.
Q.E.D.
APPENDIX E
INTERMEDIATE RESULTS FOR CHAPTER 3
Lemmas E.1 and E.2 were derived to calculate the terms of the
asymptotic expansion for the QDF, Lemmas E.3 and E.4 for the LDF, and
Lemma E.5 for the UDF.
These results were obtained directly from Lemma
B.2.
Lemma E.1
Let T be defined by Equation 3.2.22 and Δ and Γ by Equation 2.2.15, and let A' = [I_r  0], B' = [0  I_r], M₁ = 0, M₂ = M, Σ₁ = I, Σ₂ = Λ, and Σ_δ = Λ^δ(I − Λ⁻¹); then the following identities hold:
(E. 1 )
(E.2)
(E.3)
(E.4)
tr[BB'(I +
it~)-1]2
= T1 (2,Io ,I)
+ Zit T1 (Z,I ,Ao) + (it)2T 1 (Z,I ,A
o
o
zo ),
(E.5)
tr[AA'(I + itA)-1]3
(E.6)
tr[BB'(I + itA)-l]3
= T1(3,IO,I)
+ 3itT 1(3,I ,AO) + 3(it)2T 1(3,I ,A20 )
O
O
30
+ (it)3T 1(3,I ,A ),
O
(E.7)
tr[AA'(I + itA)-1]4
- 4itT 1 (4,I ,AO- 1) + 6(it)2Tl(4,IO,A20-2)
O
·)3
(
30-3)
- 4 ( ~t T1 4,Io ,A
+ (.~t) 4T1(4,I ,A 40-4 ),
6
=T1(4,IO,I)
(E.8)
tr[BB'(I+itA)-1]4
+ 4itT 1(4,I ,Ao) + 6(it)2T 1(4,I ,A20 )
o
o
30
46
+ 4(it)3T 1(4,I ,A ) + (it)4T 1 (4,I ,A ),
o
o
= T1(4,Io ,I)
(E.9)
tr[rr' (I+itA)-IAA' (I+itA)-I]
- 20itT 1(2,Io ,MM'Ao- 1)
20 - 2 )]
+ (·t)2T
I.
1 (2 , I 6' MM'A
,
=- 2[oT 1(2,Io ,MM')
(E.10)
tr[rr'(I + itA)-lBB'(I + itA)-I]
= 2[(1-0)T
1 (2,I O,MM'A-
1) + 2(1-6)itT 1 (2,I ,MM'AO- 1
O
+ (it)2Tl(2,IO,MM'A26-1)],
(E.11)
tr[rr'(I +
it~)-l{AA'(I
+
it~)-1}21
2{OT 1(3,IO,MM') - 30itT 1(3,I ,MM'A O- 1)
O
+ (20+1)(it)2Tl(3,IO,MM'A20-2)
=-
- (it)3Tl(3,IO,MM'A30-3)},
(E.12)
tr[rr'(I +
it~)-l{BB'(I
+
it~)-1}21
+ 3(1-O)itT 1(3,I ,MM'AO- 1 )
O
+ (3-20)(it)2Tt(3,IO,MM'A26-1)
= 2{(l-O)T 1(3,IO,MM'A- 1)
+ (it)3Tt(3,I6,MM'A36-1)},
(E.l3)
tr[rr'(I +
it~)-t{AA'(I
+
=- 2{6T t (4,I6 ,MM')
it~)-t}3]
- 46(it)T t (4,I ,MM'A6- 1)
6
+ (1+sO)(it)2Tt(4,I6,MM'A26-2)
- 2(1+O)(it)3Tt(4,I6,MM'A36-3)
+ ('t)4T
(4, I 0' MM'A 46 - 4)} ,
~
1
(E.14)
tr[rr'(I +
i~)-t{BB'CI
+
it~)-t}3]
=2{(1-0)T t C4,IO,MM'A- t )
+ 4(1-0)Cit)T t (4,I ,MM'AO- 1)
O
+ (6-S0)(it)2Tt(4,IO,MM'A20-1)
+ 2CZ-O)(it)3TtC4,IO,MM'A30-1)
+ (it)4Tt(4,IO,MM'A40-1)},
(E. 15)
tr[rr'CI +
i~)-l{AA'CI
+
=- 2{oT t Cs,Io ,MM')
i~)-t}4]
- so(it)Tt(s,IO,MM'AO- 1 )
+ (1+96)(it)2Tt(s,IO,MM'A20-Z)
- (3+70)(it)3Tt(S,IO,MM'A36-3)
+ (3+26)(it)4Tt(S,IO,MM'A40-4)
- (it)5Tt(S,IO,MM'As~-s)},
(E.16)
tr[ff'(I +
it~)-l{BB'(I + it~)-1}4]
= 2{(1-0)T 1(S,IO,MM'A- 1)
+ S(1-0)itT 1 (S,I ,MM'A O- 1)
O
+ (10-90)(it)2Tl(S,IO,MM'A20-1)
+ (10-70)(it)3Tl(S,IO,MM'A30-1)
+ (S-20)(it)4Tl(S,IO,MM'A40-1)
+ (it)5Tl(S,IO,MM'ASO-1)},
(E.17)
tr[ff'(I +
it~)-lAA'(I
+ it~)-l]2
= 4{OT2(2,2,IO,MM'MM')
- 40itT 2 (2,2,IO,MM'MM'A O- 1)
+ 20(it)2[2T2(2,2,IO,MM'AO-l,MM'AO-l)
+ T2(2,2,IO,MM/MM'A20-2)
- 40(it)3T2(2,2,IO,MM'AO-1,MM'A20-2)
+ (it)4T2(2,2,IO,MM/A20-2,MM'A20-2)},
= 4{(1-0)T2(2,2,IO,MM /A- 1 ,MM /A-l)
+ 4(1-0)itT2(2,2,IO,MM'A-l,MM/AO-l)
+ 2(1-0)(it)2[2T2(2,2,IO,MM/AO-l,MM/AO-1)
+ T2 (2. ' 2 , I 0' MM/A-l , MM / A20 - 1)]
+ 4(it)3(1-0)T2(2,2,IO,MM'AO-l,MM/A20-1)
+ (it)4T2(2,2,IO,MM'A20-1,MM'A20-1)},
Lemma E.2
Under the same conditions as Lemma E.1, the following identities
hold:
(E.19)
P(1)[~(I + ita)-~;(I + ita)-~A]
O
- it[T 1(1,lo,A - 1) + oT 1(2,lo,MM')]
+ 20(it)2T 1 (2,lo,MM'AO- 1 ) - (it)3T 1(2,lo,MM'A20 - 2 )},
= -~{T1(1,lo,I)
(E.20)
P(I)[~(it)~(I+ita)-~;(I+ita)-~B]
= -~{T1(I,lo,I)
O
- it[T 1 (I,lo,A )
+ (1-0)T 1 (2,I o ,MM'A- 1 ;t)]
+ 2(1-0)(it)2T 1(2,I ,HM'A o- 1)
o
+ (it)3 + T1(2,Io,HM'A20-1)},
(E.21)
P(2)[~(I + ita)-~;CI + ita)-~A]
= -~{2T1(2,Io,I)
+ T2 (1,I,Io ,I,I)
.
0-1 ) + 2T (2,I ,A6-1 )
- 2It[T
1
2 (1,I,I o ,I,A
o
+ 20T 1(3,lo,MM') + oT 2 (1,2,Io ,I,MM')]
~ C3,I ,MM ,6-1)
+ CI. t )2[ 2T 1( 2,I ,A20-2) + 12uT
A
1
o
o
0-1 0-1
+ T2(1,I,I o ,A
,A
) + OT2(2,2,lo,MM' ,MM')
+ 46T2(1,2,lo,I,MM'A0-1 ) + 20T2(1,2,Io ,A6-1 ,MM')]
- 2(it)3[2(20+1)Tl(3,Io,MM'A20-2)
20-2
0-1
0-1
+ T2(1,2,lo,I,MM'A
) + 26T2(1,2,lo,A
,MM'A
)
+ 20T2(2,2,lo,MM' ,MM'AO- 1 )]
+ 2(it)4[2T1C3,Io,MM'A30-3)
+ T2 (1 "
20
l ' AO- 1 , MM'A 20 - 2)
+ 20T 2 (2 "
20
I ' MM'Ao- 1 , MM'A o- 1)
+ OT 2 (2,2,lo,MM' ,MM'A 20 - 2 )]
•
- 40(it)5T2(2,2'~O,MM'AO-1,MM'A20-2)
(it)6T2(2,2'~O,MM'A20-2,MM'A20-2)}
+
(E.22)
P(2)[~(I
+.
it~)-\r;(I
= -\{2T 1 (2,LO,I)
+
+
+
+
+
it~)-~B]
+ T2 (1,1,L ,I,I)
O
2it[2T 1 (2,L ,AO) + 2(1-O)T I (3,L ,MM'A- 1 )
O
O
T2 (1,1,L ,I,AO) + (1-O)T 2 (1,2,L ,I,MM'A- 1 )]
O
O
20
(it)2[2T I (2,L ,A ) + 12(1-0)T I (3,I ,MM'A O- 1 )
O
O
T2 (1,1,Io,A ,A ) + 4(1-0)T 2 (1,2,I ,I,MM A0-1 )
°°
,
o
+ 2(1-0)T 2 (1,2,L ,AO,MM'A-l)
O
+ (1-0)T 2 (2,2,I ,MM'A-I,MM'A-l)]
o
+ 2(it)3[2(3-20)TI(3,IO,MM'A20-1)
°
+ T2 (1,2,I . ,I,MM , A20-1 ) + 2(1-0)T2(1,2,I ,A ,MM ,0-1
A )
o
o
+ 2(1-0)T2(2,2,IO,MM'A-l,MM'AO-1)]
+ 2(it)4[2TI(3,IO,MM'A30-1)
°
+ T2(1,2,I ,A ,MM ,20-1
A
)
o
+ 2(1-0)T2(2,2,IO,MM'AO-1,MM'AO-1)
+ (1-0)T2(2,2,IO,MM'A-l,MM'A20-1)]
+ 4(it)5(1-0)T2(2,2,IO,MM'AO-1,MM'A20-1)
+ ('t)6T
(2" 2 I 0' MM'A20 - 1 , MM'A20 - 1)}
~
2
Lemma E.3
Let a, f, and
B'
= (0
B), MI
= 0,
~
be defined by Equation 2.2.20, A'
M2
= M,
II
= I,
I2
= A,
~I
= nIA- I
= (A
0),
(E.24)
tr[BB'(~ + it~)-l]
(E.25)
tr[AA'(~
=
(E.26)
= tr(~2LB)
it~)-1]2
+
tr(~¥~2A2) - 2(it)tr(~lAO-l~2A2) + (it)2tr(A20-2~2A2),
tr[BB'(~ + i~)-1]2
= tr(~~~2B2)
(E.27)
+ (it)tr(AOLB),
tr[AA'(~
- 2(it)tr(~2AO~2B2) + (it)2tr(A26~2B2),
+ it~)-1]3
= tr(~f~3A3)
- 3(it)tr(~¥AO-l~3A3)
•
+ 3(it)2tr(~lA20-2~3A3) _ (it)3tr(A30-3~3A3),
(E.28)
tr[BB'(~ + it~)-1]3
= tr(~~~3B3)
+
3(it)tr(~~AO~3B3)
+
3(it)2tr(~2A20~3B3)
+ (it)3tr(A30~3B3),
(E.29)
tr[AA'(~
+
it~)-1]4
= tr(~1~4A4)
- 4(it)tr(~fAO-l~4A4)
+ 6(it)2tr(~¥A20-2~4A4) _ 4(it)3tr(~lA30-3~4A4)
+ (it)4tr(A40-4~4A4),
(E.30)
tr[BB'(~
+ i~)-1]4
= tr(~i~4B4)
+
+
4(it)tr(~tA6~4B4)
+
6(it)2tr(~~A26~4B4)
4(it)3tr(~2A30~4B4) + (it)4tr(A46~4B),
(E.31)
tr[rr'(~ + it~)-lAA'(~ + it~)-l]
- - 2[6tr(MM'~~L2A) - 26(it)tr(MM'~lA6-1L2A)
+ (it)2tr(MM'A26-2L2A)
(E.32)
tr[rr'(~
+
it~)-lBB'(~
+
i~)-l]
= 2[(1-6)tr(MM'A-l~~L2B)
+ 2(1-6)(it)tr(MM'~2A6-1L2B)
+ (it)2tr(MM'A26-1L2B),
(E.33)
tr[rr'(~ + it~)-lAA'(~ + it~)-1]2
= 4{6tr2(MM'~¥L2A)
- 46(it)tr(MM'~¥L2A)tr(MM'~lA6-1L2A)
+ 26(it)2[2tr2(MM'~lA6-1L2A)
. tr(MM'~¥L2A)tr(MM'A26-2L2A)]
+ 46(it)3tr(MM'~lA6-1L2A)tr(MM'A26-2L2A)
+ (it)4tr2(MM'A26-2L2A)},
(E.34)
tr[rr'(~
+ it~)-lBB'(~ + it~)-1]2
= 4{(1-6)tr2(MM'~~A-IL2B)
+ 4(1-6)(it)tr(MM'~~A-IL2B)tr(MM'~2A6-1L2B)
+ 2(1-6)(it)2[2tr2(MM'~2A6-1L2B)
+ tr(MM'~~A-I12B)tr(MM'A26-112B)
+ 4(it)3tr(MM'~2A6-1L2B)tr(MM'A26-1L2B)
+ (it)4tr2(MM'A26-112B)},
(E.36)
tr[rr'(~
+ i~)-lAB'(~ + it~)-lBA'(~ + it~)-l]
=-
2{6tr(MM'~112A2B2)-6(it)tr(MM'I2A2B2)
(it)2tr[MM'{(1-0)~1-O~2}L3A2B2]
+
+ (it)3tr(MM'A30-ZL3A2B2)},
(E.37)
tr[rr'(~
+
itA)-lBA'(~
+
itA)-lAB'(~
+ itA)-l]
= Z{(1-O)tr(MM'~lL2A2B2)
+ (1-o)(it)tr(MM'A- 1L2A2B2) +
(it)2tr[MM'A-l{OA~2
- (1-O)~1}L3A2B2] - (it)3tr(MM'A30-ZL3A2B2)},
(E.38)
tr2[rr'(~
+
=-
itA)-lAB'(~
+ itA)-l]
4{(it)2tr2[MM'AO-~lL2AB)
+ Z(-1)O(it)3tr(MM'A20_3/2L2AB)tr[MM'AO-~lL2AB]
+ (it)4tr2(MM'A20_3/2L2AB)},
(E.39)
tr[rr'(~
+
=-
it~)-l{AA'(~
+
it~)-1}2]
Z[otr(MM'~iL3A2) - 30(it)tr(MM'~¥AO-IL3A2)
+ (20+1)(it)2tr(MM'~lA20-2L3A2)
- (it)3tr(MM'A30-3L3A2)],
(E.40)
tr[rr'(~
+
itA)-l{BB'(~
+
it~)-1}2]
= 2[(1-o)tr(HM'~~A-II3B)
+ 3(1-O)(it)tr(MM'~~AO-IL3B2)
+ (3-20)(it)2tr(MM'~2A20-1L3B2)
- (it)3tr(MM'A30-1L3B2)],
Lemma E.4
Under the conditions of Lemma E.3, the following identities hold:
(E.41)
Pl[~(~ + it~)-~,(~ + it~)-\A]
= -\{tr(~lLA)
- it[tr(AO-1LA) + otr(MM'~lI2A)J
+ 20(it)2tr(MM'~lAO-II2A) - (it)3tr(MM'A20-212A)},
(E.42)
Pl[~(~ + it~)-\r,(~ + i~)-\a]
= -~{tr(~2IB)
- it[tr(AOIB) + (1-6)tr(MM'A-l~~I2B)]
+ 2(1-o)(it)2tr(MH'~2A6-112B) + (it)3tr(MM'A20-112B)},
(E.43)
P2[~(~
+
i~)-~,(~
= -\{tr(~lLA)
+
i~)-~A]
+ 2tr(~¥I2A2) - 2it(tr(~lLA)] + tr(AO-1IA)
+ 2tr(~lAI-OI2A2) + otr(~lLA)tr(MM'~¥I2A)
+ 26tr(MM'~¥I3A2)] + (it)2[tr 2 (AO- 1LA)
+ 2tr(A26-212A2) + 46tr(~lLA)tr(MM'~lAO-II2A)
+ 26tr(A6-1!A)tr(MM'~¥I2A) + 12otr(MM'~'¥A6-113A2)
+ 6tr2(MM'~¥I2A)] - (it)3[2tr(~lIA)tr(MM'A20-212A)
+ 4otr(A6-1LA)tr(MM'~lA6-112A)
+ 4(26+1)tr(MM'~lA26-213A2)
+ 4otr(MM'~¥I2A)tr(MM'Ao-II2A)]
+ (it)4[2tr(A6-1LA)tr(MM'A26-2I2A)
+ 4tr(MM'A 30 - 3I3A2) + 4otr2(MM'~lA6-1I2A)
+ 2otr(MM'~¥I2A)tr(MM'A26-2I2A)]
- 4(it)Str(MM'~lA6-1I2A)tr(MM'A20-2I2A)
+ (it)6tr 2 (MM'A26 - 212A)},
(E.44)
P2~(~
+
itA)-~r,(~
= -\{tr2(~2LB)
+ tr(AOLB) +
+
itA)-~B]
+ 2tr(~~I2B2) + 2(it)[tr(~2LB)]
2tr(~2AOL2B2)
+
(1-0)tr(~2LB)tr(MM'~~I2B)
+ 2(1-0) tr(MM '~~A -lL 3B2) I + (it) 2[tr 2 (A°LB)
+ 2tr(A
+
20 L2B2 ) + 4(1-0)tr(~2LB)tr(MM'~2AO-IL2B)
2(1-0)tr(AOIB)tr(MM'~~A-II2B)
+ 12(1-0)tr(MM'~~AO-II3B2) + (1-0)tr2(MM'~~A-IL2B)]
- (it)3[2tr(~2LB)tr(MM'A20-112B)
+
4(1-0)tr(AOLB)tr(MM'~2AO-II2B)
+ 4(3-20)tr(MM'~2A20-113B2)
+
4(1-0)tr(MM/~~A-I12B)tr(MM'~2AO-II2B)]
+ (it)4[2tr(AOLB)tr(MM'A20-112B) + 4tr(MM'A30-1L3B21
+
.
4(1-0)tr2(MM'~2AO-II2B)
+ 2(1-O)tr(MM'~~A-II2B)tr(MH'A20-1L2B)]
+ 4(it)5tr(MM'~2AO-II2B)tr(MM'A20-112B)
+ (it)6tr2(MM/A20-112B)},
(E.45)
p(~)[~(~ + itA)-\r;(~ + itA)-~A,(~ + itA)-~]
= 1/12{3tr(~lLA)tr(~2LB)
+
3it(tr(~lLA)tr(AOLB)
- tr(AO-lLA)tr(~2LB) - otr(~2LB)tr(MM'~¥I2A)
+
(1-0)tr(~lLA)tr(MMA-l~~L2B) + 2(-1)Otr(MM'~lI2A2B2)]
- 3(it)2[tr(AO- 1LA)tr(AoIB) + 2tr(Ao- 1I2AB)
- 2otr(~2LB)tr(MM'Ao-l~112A)
+
otr(AoIB)tr(MM'~¥I2A)
- 2(1-O)tr(~lLA)tr(MH'Ao-l~212B)
+
(1-0)tr(Ao-lLA)tr(MM'A-l~~I2B)
- 2tr(MM'AO- 112A2B2)) + 3(it)3[tr(~lLA)tr(MM'A20-112B)
- tr(~2IB)tr(MM'A20-2L2A) + 2otr(AOLB)tr(MM'AO-1~lL2A)
- 2(1-0)tr(AO-1LA)tr(MM'AO-1~2L2B)
-
2tr(MM'{(1-0)~1
-
o~2}I3A2B2)
+ 2tr(MM'{oA~2 - (1-0)~1}I3A2B2))
- (it)4[3tr(AOLB)tr(MM'A20-2I2A)
+ 3tr(AO-lLA)tr(MM'A20-112B)
+ 12tr(MM'A30-2I3A2B2) + otr(MM'~¥I2A)tr(MM'A20-1I2B)
+ (1-0)tr(MM'A20-2I2A)tr(MM'A-l~~I2B)
+
2tr2(MM'A\{O~1 - (1-a)~2}I2AB)]
+ (it)S[2otr(MM'~lAO-I12A)tr(MM'A20-112B)
- 2(1-O)tr(MM'A20-212A)tr(MM'AO-l~212B)
+
4tr(MM'A\I2AB)tr(MM'A\{O~1 - (1-a)~2}I2AB)]
- (it)6[tr(MM'(A20-212A)tr(MM'A26-112B)
1
+ 2(2o-1)tr2(MM'A~I2AB)]},
Lemma E.5
Let T be defined by Equation 3.2.22 and P_{jκ}(t) as in Equation 2.3.7, and let a = δ − j + 1 and Σ_a = Λ^a(I − Λ⁻¹);
then
= [T2(1,1,L6 ,Aa ,Aa ) + 2Tl(2,IaA2a )]
a
a-2
+ 2(a-it)2T2(1,1,Ia,A MMA
)
+ 4(a-it) 2Tl(3,LO,MMA 3a-2 )
. 4
+ (a-2t)
T2(1,1,Ia ,MMA«-2 ,MMA«-2 )],
BIBLIOGRAPHY
Ahrens, J. H. and Dieter, U. (1972). Computer Methods for Sampling
from Exponential and Normal Distributions. Communications of the
ACM 15, 873-882.
Anderson, T. W. (1951). Classification by multivariate analysis.
Psychometrika 16, 31-50.
Anderson, T. W. (1958). An Introduction to Multivariate Statistical
Analysis, 1st ed., John Wiley and Sons, Inc., New York.
Anderson, T. W. (1984). An Introduction to Multivariate Statistical
Analysis, 2nd ed., John Wiley and Sons, Inc., New York.
Anderson, T. W., and Bahadur, R. R. (1962). Classification into Two
Multivariate Normal Distributions with Different Covariance
Matrices. Annals of Mathematical Statistics 33, 420-431.
Arvesen, J. N. (1969). Jackknifing U-statistics. Annals of Mathematical Statistics 40, 2076-2100.
Bartlett, M. S., and Please, N. W. (1963). Discrimination in the Case
of Zero Mean Differences. Biometrika 50, 17-21.
Barnes, E. W. (1899). The Theory of the Gamma Function. Messenger of Mathematics 29, 64-128.
Blackwell, D., and Girshick, M. A. (1954). Theory of Games and
Statistical Decisions, John Wiley and Sons, Inc., New York.
Bunke, O. (1967). Stabilität statistischer Entscheidungsprobleme
und Anwendung in der Diskriminanzanalyse. Z. Wahrschein. Theorie
und Verwandte Gebiete 7, 131-146.
Chang, Y., (1980). Discriminant Analysis with Categorical Data.
Dissertation: Department of Preventive Medicine and Environmental
Health, University of Iowa.
Chikuse, Y. (1980). Invariant Polynomials with Real and Complex
Matrix Arguments and Their Applications. Technical Report No. 80-3,
Institute for Statistics and Applications, University of Pittsburgh.
Clunies-Ross, C. W., and Riffenburgh, R. H. (1960). Geometry and
Linear Discrimination. Biometrika 47, 185-189.
Constantine, A. G. (1963). Some Non-central Distribution Problems in
Multivariate Analysis. Annals of Mathematical Statistics 34,
1270-1285.
Constantine, A. G. (1966). The Distribution of Hotelling's Generalized
T₀². Annals of Mathematical Statistics 37, 215-225.
Cooper, P. W. (1963). Statistical Classification with Quadratic Forms.
Biometrika 50, 439-448.
Cooper, P. W. (1965). Quadratic Discriminant Functions in Pattern
Recognition. IEEE Transactions on Information Theory IT-11,
313-315.
Crowther, N. A. S. (1975). The Exact Non-central Distribution of a
Quadratic Form in Normal Vectors. South African Statistical
Journal 9, 27-36.
Das Gupta, S. (1964). Nonparametric Classification Rules. Sankhya 26, 25-30.
Das Gupta, S. (1973). Theories and Methods in Classification: A
Review. Discriminant Analysis and Applications, T. Cacoullos,
ed., Academic Press, New York.
Davis, A. W. (1979). Invariant Polynomials with Two Matrix Arguments Extending the Zonal Polynomials: Applications to Multivariate Distribution Theory. Annals of the Institute of Statistical Mathematics 31, Part A, 465-485.
Davis, A. W. (1980). Invariant Polynomials with Two Matrix Arguments,
Extending the Zonal Polynomials. Multivariate Analysis V,
P. R. Krishnaiah, ed., Academic Press, New York, 287-299.
Day, N. E., and Kerridge, D. F. (1967). A general Maximum Likelihood
Discriminant. Biometrics 23, 313-323.
Day, N. E. (1969). Linear and Quadratic Discrimination in Pattern
Recognition. IEEE Transactions on Information Theory IT-15,
419-421.
Fisher, R. A. (1936). The Use of Multiple Measurement in Taxonomic
Problems. Annals of Eugenics 7, 179-188.
Fix, E., and Hodges, J. L. (1951). Nonparametric Discrimination:
Consistency Properties. USAF School of Aviation Medicine,
Project No. 21-49-004, Report No.4, Randolph Field, Texas.
Fujikoshi, Y. (1970). Asymptotic Expansions of the Distributions of
Test Statistics in Multivariate Analysis. Journal of Science of
Hiroshima University, Series A-I 34, 73-144.
Ghurye, S. G., and Olkin, I. (1969). Unbiased Estimation of Some
Multivariate Probability Densities and Related Functions. Annals
of Mathematical Statistics, 40, 1261-1271.
Gilbert, E. S. (1968). On Discrimination Using Qualitative Variables.
Journal of the American Statistical Association 63, 1399-1412.
Gilbert, E. S. (1969). The Effect of Unequal Variance-covariance
Matrices on Fisher's Linear Discriminant Function. Biometrics 25,
505-516.
Giri, N. C. (1977). Multivariate Statistical Inference, Academic Press, New York.
Glick, N. (1972). Sample Based Classification Procedures Derived from
Density Estimators. Journal of the American Statistical
Association 67, 116-122.
Goldstein, M., and Dillon, W. R. (1978). Discrete Discriminant
Analysis, John Wiley and Sons, Inc., New York.
Gradshteyn, I. S., and Ryzhik, I.M. (1965). Table of Integrals, Series,
and Products, Academic Press, New York.
Gray, H. L. (1972). Generalized Jackknife Statistics, Dekker, New York.
Graybill, F. A. (1976). Theory and Application of the Linear Model,
Duxbury Press, North Scituate, Massachusetts.
Han, C. (1968). A Note on Discrimination in the Case of Unequal
Covariance Matrices. Biometrika 55, 586-587.
Han, C. (1969). Distribution of Discriminant Function When Covariances
Are Proportional. Annals of Mathematical Statistics 40, 979-985.
Han, C. (1970). Distribution of Discriminant Function in Circular
Models. Annals of the Institute of Statistical Mathematics, Tokyo
22, 117-125.
Han, C. (1974). Asymptotic Distribution of Discriminant Function When
Covariance Matrices are Proportional and Unknown. Annals of the
Institute of Statistical Mathematics, Tokyo 26, 127-133.
Herz, C. S. (1955). Bessel Functions of Matrix Argument. Annals of
Mathematics 61, 474-523.
Hodges, J. L., and Lehmann, E. L. (1970). Deficiency. Annals of
Mathematical Statistics 41, 783-801.
Hoel, P. G., and Peterson, R. P. (1949). A Solution to the Problem
of Optimum Classification. Annals of Mathematical Statistics 20,
433-438.
Hsu, P. L. (1940). On Generalized Analysis of Variance (I).
Biometrika 31, 221-237.
James, A. T. (1961). Zonal Polynomials of the Real Positive Definite
Symmetric Matrices. Annals of Mathematics 74, 456-469.
James, A. T. (1964). Distributions of Matrix Variates and Latent Roots
Derived from Normal Samples. Annals of Mathematical Statistics
35, 475-501.
Johnson, N. L., and Kotz, S. (1972). Distributions in Statistics:
Continuous Multivariate Distributions, John Wiley and Sons, Inc.,
New York.
Khatri, C. G. (1977). Distribution of a Quadratic Form in Noncentral
Normal Vectors Using Generalized Laguerre Polynomials. South
African Statistical Journal 11, 167-179.
Khatri, C. G., and Srivastava, M. S. (1971). On Exact Non-null
Distributions of Likelihood Ratio Criteria for Sphericity Test
and Equality of Two Covariance Matrices. Sankhya A 33, 201-206.
Kullback, S. (1952). An Application of Information Theory to
Multivariate Analysis. Annals of Mathematical Statistics
23, 88-102.
Lachenbruch, P. A., and Mickey, M. R. (1968). Estimation of Error Rates
in Discriminant Analysis. Technometrics 10, 1-11.
Lachenbruch, P. A., Sneeringer, C., and Revo, L. T. (1973). Robustness
of the Linear and Quadratic Discriminant Functions to Certain Types
of Nonnormality. Communications in Statistics 1, 39-56.
Lachenbruch, P. A. (1975). Discriminant Analysis, Hafner Press,
New York.
Mallows, C. L. (1953). Sequential Discrimination. Sankhya 12, 321-338.
Marks, S., and Dunn, O. J. (1974). Discriminant Functions When
Covariance Matrices Are Unequal. Journal of the American
Statistical Association 69, 555-559.
Moore, D. H. (1973). Evaluation of Five Discriminant Procedures for
Binary Variables. Journal of the American Statistical Association
68, 399-404.
Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory,
John Wiley and Sons, Inc., New York.
Okamoto, M. (1961). Discrimination for Variance Matrices.
Osaka Mathematics Journal 13, 1-39.
Okamoto, M. (1963). An Asymptotic Expansion for the Distribution of the
Linear Discriminant Function. Annals of Mathematical Statistics 34,
1286-1301.
Olver, F. W. J. (1974). Asymptotics and Special Functions, Academic
Press, New York.
Revo, L. T. (1970). On Classifying with Certain Types of Ordered
Qualitative Variates. North Carolina Institute of Statistics
Mimeo Series 708.
Smith, C. A. B. (1947). Some Examples of Discrimination. Annals of
Eugenics 13, 272-283.
Specht, D. F. (1967). Generation of Polynomial Discriminant Functions
for Pattern Recognition. IEEE Transactions on Electronics and
Computers EC-16, 308-319.
Van Ryzin, J. (1966). Bayes Risk Consistency of Classification
Procedures Using Density Estimation. Sankhya A 28, 261-270.
Wald, A. (1944). On a Statistical Problem Arising in the Classification
of an Individual into One of Two Groups. Annals of Mathematical
Statistics 15, 145-162.
Wald, A. (1947). Sequential Analysis, John Wiley and Sons, Inc.,
New York.
Wegman, E. J. (1972). Nonparametric Probability Density Estimation:
I. A Summary of Available Methods. Technometrics 14, 533-546.
Welch, B. L. (1939). Note on Discriminant Functions. Biometrika 31,
218-220.