St. Pierre, Jacques; (1954)Distribution of linear contrasts of order statistics." (Air Research and Dev. Command)

DISTRIBUTION OF LINEhR CONTRASTS OF ORDER STATISTICS l
by
Jacques
St~Pierre
University of North Carolina, Chapel Hill
Institute of Statistics
Mimeograph Series No. 110
JUly, 1954
1. This research was supported in part by the United
States Air Force, through the Office of Scientific Research
of the Air Research and Development Command.
11
ACKNOHLEDGMENT
The author wishes to thank heartily Professor
R. C. Bose, who suggested the problem, and without
whose enthusiasm and inspiring guidance the work
would not have been completed.
The financial help of the Ministere de la
Sa~te
of Canada, through its federal-provincial
plan with the Province of Quebec, and the grants
in aid from Universite de Montreal, are acknowledged
with gratitude.
Thanks are also due to the United States
Air .1"orce for support 'through the Office of
Scientific Research of the Air Research and
Development Command.
iii
TABLE OF CONTENTS
Page
ACK1\JO\, LEDGMl!.'NT
ii
INTRODUCTION
v
NlJ.LL DISTRIBUTION OF LINEnR CONTRH.STS OF ORDER
STfl.TISTICS. Cfl.Sh: OF THImE. ~.NDFOUR VAR~B:::"ES
1
Chapter
I.
II.
III.
1.
Case of Three Variables
2.
C~se
13
of Four Variables
NULL DISTRIBUTION OF LINEAR CONTful.STS OF ORDER
STATISTICS. CASE OF n+l VlillIABLES.
34
1.
General Linear Contrast
34
2.
Null Distribution of the Difference
between the two largest sample values
in the general case
43
MOm;NTS OF THE NULL DISTRIBUTION OF LINEAR
CONTRtI.STS OF ORDER STATISTICS
1.
General expression for
of those dimensions
2.
Moments of Low Order
~
in the case
53
53
60
3. Brief study of the skewness and kurtosis
of the distribution
IV •
NON NULL DISTRIBUTION OF LIN&.R CONTful.STS
OF ORDER SThTIS'I'ICS
1.
Case of Three Dimeusionsj General
Considerations
78
Case of three dimensions under two
particular hypotheses
91
A DECISION RULE TO PICK OUT THE POPULi-loTION
t\ITH THE LARGEST MEI-.N
100
2.
V.
71
iv
Table of Contents (Continued)
1.
General Considerations
100
2.
Some Properties of the Suggested
Decision Rule in the Case of Three
Populations
105
3. Suggestions for Further Research, and
Concluding Remarks
113
BIBLIOG&~PHY
115
INTRODUCTION
It can be said that, up to the last decades or so,
two different types of approaches have been used in the
tackling of practical and theoretical problems of statistical
nature.
The first, mostly applied to data of the quantitative
type, relied upon properties of the assumed distribution of
the parent population, and on the properties of the eampledistributions ·of the statistics derived from the observations.
That type may be called the parametric approach.
The second,
mostly applied to data of qualitative flavor, relied more
heavily upon the observations and their properties, especially
the rankings.
That is the nonparametric approach.
It was felt that a mixture of the two
~ethods,
using,
for instance, the information contained in the ordered (or
ranked) sample values, and some properties of the assumed
distribution of the parent population, would provide
answers that could not be obtained some other way.
Many authors, working abng that line, have brought
1
forth a great wealth of precious information.
1.
Wilks [10]
Numbers in square brackets refer to bibliography.
e.
vi
presented a comprehensive study of order statistics including
a rather complete bibliography up to the time of its publication.
Mosteller [12] , Godwin [5], and Nair [15], studied the
properties of particular combinations of 1I 0rdered" sample
values, called, by Mosteller, "systematic statistics ll •
However, no unified treatment of a class of ordered
statistics has ever been published, especially in non null
situations, and it was thought worthy of investigation to
study, in more detail, the properties of the distribution of
order statistics.
In the following work, some properties of the distribution
of linear contrasts of order statistics are investigated.
The null distribution of linear contrasts of order
statistics is obtained, first, (Chapter I) in the cases of
three and four variables; then (Chapter II), in the general
case of n variables.
Chapter III, consists of a detailed study of the
properties of the moments, and related quantities, of the
linear contrasts of ordered statistics in the case of three
dimens ions.
In Chapter IV, the non null distribution of linear
contrasts of order statistics, in the case of three variables,
is derived, and two particular hypotheses are considered.
vii
Finally, on the basis of the information gathered
in the first four chapters, a decision rule to pick out
the population with the largest mean is suggested in the
fifth chapter, and some properties of the decision rule are
discussed.
CHAPTER I
NULL DISTRIBUTION OF LINEAR CONTRASTS
OF ORDER STATISTICS.
CASE OF THREE AND FOUR VARIABLES.
Case of three variables. Suppose we are given xo' Xl and x2 '
three independent random variables, normally distributed with
1,
unknown means, mO' ml and m , respectively, and with a known
2
2
common variance 0 , We shall assume, for the present, that
0
2
•
= 1,
We may order the set of values, in a random sample of the
above variates, from greatest to least, and denote the ranked
values by X(O)
> X(l) > x(2)' It is to be noted that the proba-
bility of ties is zero in the continuous case; however, because
of the limitations of the measuring instruments, ties will occur
in experimental cases.
In those situations, the tied sample
values will be "ranked ll using a random procedure assigning equal
probability to each ranking.
Any combination of the ordered sample values of random
variables is called "order statistic,"
In this work, we shall
be concerned with linear contrasts of order statistics.
contrast is defined as follows:
consider a set of random vari-
n
+ a y 1s a linear
n n
A linear
contras~provided
L a , = 0,
1=1
1
2
In the case of three random variables, we may consider the
2
following contrast
cOx(O) + cIX(l) + c 2x(2)' where
It will be convenient to set
Co = Ij
1:0 c i
= 1.
eo we shall, henceforth,
consider the expression
,
(1.1.1)
where c 1 + c = 1, c i ~ 0, i = 1, 2.
2
The above expression can be written, in terms of a single
unspecified parameter, as fol1ow3:
We shall) presently, derive the probability density function of
(1.1.2) und8r the null hypothesis:
(1.1.3)
The
join~
probability density of x(O)' x(l) and x(2) may
be written, (see ~lillts
C191),
(1.1.4 )
First, let us introduce the following transformation
(1.1.5)
3
In matrix notation, the above can be written,
u* =
(1.1.6)
,
AX
where U* and X are column vectors; i.e.,
U
o
U*
=
u
x(O}
,
1
X
=
x(l)
u
2
..,
,
x(2)
and where A is the square matrix:
..,
=
A
1
1
1
1
-1
o
o
1
-1
The Jacobian of the transformation is easily checked to be equal
to 1/3. We thus have
2
2
t X(")
i=O 1.
=
XIX
=
1
1
(A- U*)I A- U*
= U*t
B* U* , where B*
= (A- 1 )1
It is readily found that
B* = 1/3
1
o
o
2
0
o
1
:
]
=
1/3 B(say), the definition
of B being obvious.
We thus have, for the joint density of the u-variables, tho
following expression,
1
A- •
4
(1.1. 7)
= 2
L 2rc_7
-3/2
1
exp ( - b u* I B u* )
,
where
- 00
< Uo < +
00
The variable U
o is
j
+
u.~
>0
,
i
= 1,
2.
readily integrated out; and, since,
00
- 00
we get
where
c =
2
.,
1
U being a column vector, and C a symmetric matrix obtained from
the matrix B by deleting the first row and first column.
We are now in a position to introduce our linear contrast
given by (1.1.2).
Let
where 0 .$ c .$ 1.
(1.1.10)
,
The variate zl is precisely the linear contrast (1.1.2).
The
Jacobian of the transformation is obviously equal to one.
In
matrix notation, (1.1.10) becomes
(1.1.11)
Z
=D U
,
,
D
.
where
Z
=
=
It follows that
(1.1.12)
UI C U:= Z'(D-l), C D-
1Z
= ZI M Z,
where
M = (D- l ) 1 C D- l •
It is readily verified that the symmetric matrix M has the
following form
2
(1.1.1.3)
M
2c-l
=
The mapping of the u-space onto the z-space, involved in the
transformation (1.1.11), is rather simple; the region of variation
in the latter space being a wedge-shaped region, in the upper right
hand quadrR.nt, limited by the lines z 2
Thus, the joint density of z 1 and z 2 is
= Q. and z 2 = z 11(l-c )" c
~ 1-
6
where M is defined by (1.1.13); the limits of variation being
given by,
(1.1.15)
o < Z 2 < Z 11(l-c)
The density function of Z 1
,0
<Z 1
= x( 0)
j
c
f
1.
- eX(l) - (l-c )x(2)
will be obtained from (1.1.14) by integrating out the variable
z 2 ; formal'iy, we have
Zl/(l-c)
g(z 1)
=
[
o
In the present case, the integration is easily carried out.
Expanding the matrix M and collecting terms,we can write f(Zl'Z2)
as follows:
exp
- "31 ( c 2
- c
1
2 + (2c - 1) z 1 Zz J _
+ 1) Z 2
Since
- (2c -1) :2 Z 12/ 4 (c 2 -c+1) ,
'7
we get, after rearranging the terms,
Hence, we can write formally
Making use of the transformation
finally, write the density of zl
follows:
= x(o)
- cX(l) - (1-C)X(2) as
8
From expression(1.1.19)severa1 important particular
cases can be derived.
(i)
If we set c
(1.1.20)
= 0,
w
we get the case of the range
= x(o)-
x(2) , and the density comes out as
wi fb
,
or,
(1.1.21)
,wl (6
g(w)
dt
which is the form obtained by McKay and Pearson 1:11
(1i)
Setting c
= ~,
we get
1.
,
9
(1.1.22)
where
(1.1.23)
The statistic u has been studied by McKay tlO
studentized form of u by Nair t14
deviate from the sample mean".
J
J , and the
who called it the "extreme
It is readily seen that
v/2
2
exp{-t /2)
putting t =
/2
dt
I
t
J
and substituting in the above expression,
we get I (dropping the prime),
v
2
exp{-t)
o
dt
•
10
It follows that the density for the statistic u is given by
3u,/2
3
(1.1.24) g{u)
= (3) 2 (2)-~n-lexP(_3u2/4)
which is the form gotten by McKay tlO
(iii)
2
exp(-t ) dt ,
J.
The case of the difference between the two largest sample
values can be obtained,as a limiting case from (1.l.l9), allowing
c
->
1.
.. 00
Ordinates of g(zl) have been obtained for values of zl'
0(0.2)4., with the help of tables of the normal probability
function, [17], [18]
j
and, tables of the exponential
11
function C16]. Table I summarizes the results for several
values of the parameter c.
TABLE I
Table of ordinntes Ofg(zll, for various values of the constant
g(z)
z
c
a
0.1
0.2
0.4
0.6
0.8
0.9
1.0
0.0 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.84628
0.2
.10917
.12101
0.4
.21095
0.6
.29932
.32877
0.8
.36927
.1+0194 .43988 .53834 .64049 .63299 .58344 .53652
1.0
.41774
.44958
.48459
.55897
1.2
.44376
.47102
.49861
.54388 .53555
.45020 .40687 .36855
1.4
.44833
.46822
.48548
.49838 .45187
.36497 .32709
1.6
.43408 .44502 .45086 .43473
1.8
.40476
.40647 .40149
2.0
.36474
.35800 .34410 .29160 .22168
2.2
.31842
.30485
.13709 .17898
.26659
.49282
.73839
.78334
.23318 .26050 .33877 .47562
.70969 .75554
.70763
.36410 .45941
.71048 .67281
.62378
.28468
.59859
.60386
.36725
.36270 .28937
,22659
.16529
.54116 .49344
.28832
.45022
.29429
.25636 .22920
.22196 .19588 .17410
.16649 .14590
.12896
.12170 .10593
.09315
12
Table I (continued)
g(z.)
0
0.1
2.4
.26981
.25145
2.6
.22221
.20121 .17665
2.8
.17809 .15639
3.0
.13903 .11819 .09715
3.2
.10580 .08692
3.4
.07853
.06225
.04775
.02707 .01625
3.6
.05690 .04345
.03215
.01727 .01006 .00656 .00548 .00469
3.8
.04026 .02975
.02109 .01073 .00606 .00389 .00327 .00277
4.0
.02782
.01348 .00650 .00356 .00225
z.
c
.01971
0.2
0.4
.23115 .17064
0.6
0.8
0.9
1.0
.12000 .08668 .07497 .06560
.12479 .08484
.06016 .05171
.04504
.13290 .08871 .05840 .04068 .03477 .03016
.06135
.03918 .02679 .02278 .01968
.06904. .04130 .02556 .01720 .01455 .01252
.010'(6 .00905 .00777
.00188 .00159
13
xl' x 2 } and x ' norraally dis3
tributec1 HUh unknownneans, mO' ii11' v'l:? and m respectively, and
3
i.ndependent random variabl'3s
:;~o'
lV'I,th a !mOlm oormnon variance (} = 1 (say).
X(O) >
XCI)
Let us denote by
> x(2) > x(3)' the ordered s~nple values of the above
variates; 1'16 shall derive the pro')ability density function of
3
1: o.
i=l
c
l
=1
J.
+ O ~ 1,
2
a -<
a~
c. < 1, (i
J.-
Ci~ 1, (i
= 1,
= 1,
2, 3); or, more precisely, of
2),lli1der the null hypothesis
7irst, the joint density of x(O)' XCI)' x(2) and X(3) is
'riven h.'
Proceedj.n,:' as i-'1 the previous caSG,
formation
1:8
now make use of the trans-
(i = 1,2,3).
In inatrix notation, (1.2.2) becomes
..Thore U-l~ and X are col wnn vectors, i. e. ,
-uo
U=
1
-x(O)
~'
! u2
X
!
X(l)
::>
1
,
X(2)
Lt:3J
~C(3)
j
and A is the follOi"1ing matrix,
A
1
1
1
1
1
-1
0
0
0
1
-1
0
1
10
0
1
-1
=
L
The Jacobian of the transformation is eCjual to
3
?
.6 x(-.) =
i=O J..
that
-
1/4.
~}
-
.
Ool' B be:m" obyious.
l\. litt18 al co o'bra ShO"1S that
It follows
15
(102.4)B* = 4:J.
I
0
0
0
0
3 2
1
0
2 4 2
0
1
=
1
'4 B (say)
•
2 :;
Consequently, we can write
where
-00
< U o < +00; ui > 0 , (i=1,2,3). Since U o is orthogona.l
to ui ' (i=1,2,3), it can easily be integrated out.
But
!
,,,", 00
16
so. we have
where
3 2 1
U
1
U =
u
2
J
C =
1~3_
2
4
2
1
2
"
.
J
the matrix C being
obtained from B by deleting the first row and first column.
(1::::2,5); the variate
Z. =
~
being precisely the linear contrast we are interested in.
2
1
In
matrix nota,tion, the above tri3,nsformation can be written Z = DU,
where
Z;
and U are
C01Ulllrl
vectors, arid li is given by
17
D =
o
1
o
o
0
1
The Jacobian of the transformation is obviously equal to unity.
1ve thus have
r
r
1
U C U = Z (lJ-)
r
C D-
1
Z
= Z r Ivl
Z
where
Algebraic manipulation leads to the following form for the
symmetric matrix M:
3c 1 - 1
(3c -l)(c -l)+2(c +l)
1
1
1
3c 1+3c -2
2
(3c 1- 1 ) (c 1+c 2 -1 )+c 1+ 1
Dc 1+ 3C2 -2) (c 1+C 2 -1)+c 1+c +2
2
18
N.B.
Some of the elements of the matrix M can be expanded and
written down as polynomials; however, the above form sheds a
little more light on the way the elements are built up.
We now
can write
3
f(zl,z2,z3)
= 3l
2 (2fi)- 2
exp
1:- ~
Z'M
z]
In order to know the domain of variation of the variables
z, we have to investigate the mapping involved in the transformation (1.2.7).
The relevant region of variation in the
u-space consists of the octant limited by the positive portion
of the axes Ou i , (1=1,2,3).
The mapping of these three axes,
involved in the transformation (1.2.7), proceeds as follows:
(i)
The semi-aXis
O~,
determined by u = u = 0, is mapped
2
3
I
into the semi-axis 0 zl: z2= z3= O.
(ii)
The semi-axis OU2 is determined by ul = u = 0; the
3
transformation (1".2.1), in this case, implies
(1.2.10)
19
°IJ2 ,defined by the equations
I
Consequently 0u
2
is mapped into
(1.2.10).
(iii)
Finally, for the axis Ou), defined by u
l
= u2= 0,
the
transformation (1.2.7) implies
(1.2.11)
z2 = 0 ' )
z
I
hence, OU) is mapped
into the line 0 L) defined by (1.2.11).
ThUS, in the z-space, the region of variation is a wedgeshaped region generated by the lines
Since the equation of the plane determined by
the variables z have the following limits of variation:
20
(1.2.12)
In order to get an expression for the density of
Zl
= x(O)
- clx(l) - c 2x(2) - (1 - c l - c 2 )x(3)' we have to
integrate out the variables z2 and z,'
This is best accomplished
if, first, we reduce the matrix M to a diagonal form (except
possibly, the first row and first COlumrl).
That reduction 1s
obtained by making use of the following transformation
(1.2.13)
v
2
=
,=
v
the polynomial coefficients of z2 and z, being merely the
expanded form of the elements m and m of the matrix M,
22
23
(expression 1.2.8). In matrix notation, the above transformation
can be written V
= NZ,
where
1
(1.2.14)
N =
o
o
o
1
o
o
21
The Jacobian of the transformation is obviously equal to
2
.
)-1
2
, where 3cl - 2c l + 3 > 0, for all values
l - 2c l + 3
of c l • The expression Z'MZ in (1.2.9) becomes
( 3c
Z'MZ
= V'(N-l)'MN-lv = V'PV,
where P
= (N-l}'MN- l •
Actually,
the symmetric matrix P takes the form
(1.2.15)
p
= ~2=-=1_­
1
o
3cl-2cl+ 3
j
the relevant elements, only, having been written.
Consequently,
the transformation (1.2.13), together with the density (1.2.9),
lead to
or, to its expanded form
(1.2.17)
Formally, we can get,from (1.2.17), the density of
22
VI = X(O) - cIX(l) - c 2x(2) - (1 - c l - C2 )X(3),by integrating
out the variables v and v over the proper domain. This domain
2
3
will be given by the mapping, of the z-space onto the v-space,
brought forth by the transformation (1.2.13).
We shall, presently,
study the mapping involved in this case.
Transformation (1.2.13) can be written
VI = zl
(1.2.18)
v
v
2
=
aZ + bZ
2
3
3
=
z3
a
= 3c 2i
b
= 3c 2l +
where
j
- 2c l + 3
3c 1c 2
3c l - c 2 + 2.
and 01L , which
2
3
determine the wedge-shaped region in the z-space, are mapped
We shall now see how the three lines Olzl' 0lL
into the v-space.
(i)
Case of line O'zl defined by z2
= z3 = O.
The transformation (1.2.18) implies that v = v
2
3
hence, the line 0 I zi is mapped into the line 0" v1 •
(i1)
Case of line 0 11 2 defined by z3
= 0,
z2
= Zl/(l
= 0,
- c 1 ).
The transformation implies now
(1.2.20)
relations determining the image 0"L
(in the z-space).
2
(in the v-space) of 0'L
2
2)
( iii)
z3
C,:.;,se of the line 0lL
= z/(l
3
determined by z2
= 0,
- c1 - c2 ) •
This time, the transformation implies
(1.2.21)
relations defining OIlL3' the image of 01L " 'l'he orthogonal
3
projection, 0"L , of the line 0"1 on the plane v = 0, is
4
3
3
given by the equations
.,
consequentl~according as b/(I-c -c ) is greater than or less
l
2
than a/(l-c ), we have two analogous wedge-shaped regions, one
l
of them being convex"
It is quite evident that, for the purpose
of integrating out the variables v2 and v , the two regions
3
coalesce.
For the sake of discu6sion, we shall use the region
obtained when
This wedge-shaped convex region, determined by the axis O"v I ,
and by the straight lines 0"L and 0"L ,can be split up into
2
3
two regions; HI and R " RI is genei'ated by the axis O"v l ' and
2
the straight lines 0"L
straight lines O"L
2'
3
and 0"L 4; "'hile R is generated by the
2
O"L
3 and OIlL1+"
The equations of the flats bounding R1 are
(1)
v
3
= 0,
determined by O"v and 0"L4,
l
2 = bV 1/(1-c l -c 2 ), determined by 0"L 3 and 0"L4 ,
(ii) v
(iii)V
3
= V2/b
, b ~ 0, determined by O"Vl and 0"L •
3
In the case of R , the equations of the flats are
2
(i)
v = 0, determined by 0"L 4 and 0"L2 '
3
(ii) v = bV /(1-c -c ), determined by 0"L 4 and 0"L '
l
l 2
2
3
(1ii) aV l - (1-c l )v2 + ((a-b)c l + aC 2 + b - a} v = 0,
3
and 0"L •
2
3
Consequently we have the following limits for the variables;
determined by 0"L
case of R :
l
o < v3 <
inequalities defining the region T
~:
b
= 3c12
l
V
2
/b
(say).
+ 3clc2 - 3cl - c 2 + 2
> 0,
(1.2.24 )
inequalities defining the region T (say);
2
for 0
~
ci
~
1, i = 1,2.
25
Thus,forma11y, we have
I
,/
T +'f
1 2
Making use of the expanded form (1.2.17) of f(v 1 , v2 ' v,),
the above expression becomes
(1.2.26)
3~2 •
exp (-3vi/8)
= (2n)3!2(3c 21 _2C 1+3)
I
f
;
I
!
,I
I
v
r
I
I
o
"/0
ravl+(c1-l}v2
r
f
. tb-a)cl-ac2~a-b
7 1 (v
1
,V
2
)dV
I
2
!
'-./
o
-~
26
where
(1.~.28)
all c l ' c ; the density (1.2.26) becomes, after some obvious
2
simplifications,
27
o
o
I:J.V
av +(c -l)V
1
2
1
(b-a)c -ac +a-b
1
2
1
'Y
(v ,v )dV
3 1 2 2
J.
where
Setting
28
we can write
~.
!
h (v1 )
5
!
!
)
a(t )dt
2 2
h (v )
4 1
/
h2 (v ,t 2 )
1
...
I
j
a(tj)dt j
h1(v1 )
(1.2.jj) •
+(
)
f
h {V )
5 1
h j (v ,t )
1 2
r
h (V )
6 1
i
I
I
f
a(t )dt 2
2
I
a(tj)dt j
I
I
Jh (v )
1
1
-,
29
,
.where
50
Using numerical integration methods, graphs of f(V ) can
l
be obtained for various values of c l and c 2 ' 0
< cl '
c2
<
1,
We shall, presently, write down three particular cases of
the density function defined by (1.2.33) and (1.2.34).
(i)
Case of the range.
Setting c 1 = c 2
= 0,
we get w=x(O)-x(l)'
Then, expressions (1.2.33) and (1.2.34), after a few obvious
steps, simplify into
dt,
which is the form given by Nair
(ii)
Case of the "extreme deviate from the sample mean".
Letting, c
where
Ct4] .
l
= c
2
= 1/3
in our linear contrast, we get
51
- 1- , and
The statistic u was studied by McKay, '-10
studentized form by Nair, [14
C
l
= c 2 = 1/3
J.
its
The substitution of
in (1.2.33) and (1.2.34), yields
.vl
2vl fb
.(6
I
\
b
o
Relation (1.2~36) readily leads to the density of the statistic
u.
In fact,we get
(1.2.37)
where
l(u) is given by thetollow1ng expression:
32
I(u) =
8u/3.(6
-
,
where
. (iii)
Cdse of the difference between the two largest sample values"
Tl~
linear contrast, in this case, becomes Y=X(O)-Y(l)"
It is not permissab1e to put c l = 1, c = 0 directly into the
2
expressions (1.2.33) and (1.2.34)"
argument used previously, we obtain
However, carrying the
53
(2t 2-y)/
00
r
(1.2 • .59) fey): 3l2exp(_y2/4j)
{;
0
where, as before
a(t)
= (2n) _..i2
Q(t )dt '
3
3
Q(t )dt
2
2
y/2
2
exp (-t /2)
12
cnAPTER II
FULJ:, 'JISTRI0UTION OF
OIl!J""in STATISTIC.3.
1,
LIN·~Ji:'
COHTHASTS OF
CJl.Sn:: 0::' n + 1 Vfu:1IABLES.
Ci3neral linear contrast.
SUPI'OSO ue have
n + 1
independent
random variables, xO> xl' "., x ' nor.lally distributed l'lith unn
lmOlm ;ueans, m ' m , , .. , m , respectively) and \'1ith a kn01m
o
l
n
the ordered sample values of the
~)ove
variates, we shall indicate
how can be derived the probability density function of the linear
n
- cnx(n);
o .....<
c.J..< 1,
(i
= 1,
Z c. = 1,
i=l
1.
2, , .. , n); or, rlore precisely, of its equi-
valent form
...
(2,1,1)
n-1
Z c. < 1,
'11.J.=
under the
nUl~
o -<
hypothesis
c. < 1,
1.-
HO :: m =
O
U ::
'1
1
1, 2, ,., n-l),
= •. ,
:: In
n
= 0 (say).
The joint density of x(O)' x(l)' .,., x(n) can be written
as
35
Let us considor the transformation
,(i=l, 2, ... , n),
u.1 =
.,~
which can be written, in matrix notation, U
= AX, where U.,*" and
X are col1..urm vectors, io e. :
u
U~~
=
.x(O)
o
~
X
=
un
x(l)
;
x(n)
- -and the matrix A is given by
1
1.
1
1
1
1
1
-1
0
0
0
0
0
1
-1
0
0
0
0
0
0
1
-1
0
0
0
0
0
1
-1
A=
...
'rhe Jacobian of the transformation equals (n+l)-l.
36
It follows that
n
~
= U~~I B~~U*,
r'\
X
C
(•.• )
i=O
~
= X IX =
1'1herc ,;1\ =
,.
~~
Actually i t turns
that B"
1
=
-113,
n+
1'11'here the symmetric matrix B is given by
(2.1.1+)
-1
0
C
0
0
0
G
J
n
n-1
n-2
n-3
3
2
1
2(n-l)
2(n-2)
~J(n-3) ...
6
4
2
3(n-2)
.,(n-3)
9
6
>
,
•
•
(n-?)J
(n-2)2
0-2
(n-1)2
n-1
n
oro1y the re18vant terms havi.ng hem) l'1ritten.
The density (2.1. 2)
thus bccones
.f( u ","
L
11
1
, ... ,
Intogratinf: out tho
11 )
il
U
o
'-=
;]!
_r_ 2(n+1)
1
U~~IBU~~7.
(?H)-(n+1)/2 0xp
var'iab10, H()··'Jt
I
37
''\There
U
=
, and C is the symmetric matrix obtained from
u
n
B by ueleting the first row and first column.
:le are now in a position to introduce our statistic
(2.1.1).
Let us put
(2.1.7)
Z.
l
=
(i = 1, 2, ... , n).
We imu2diately notice that zl is )rocisoly the linear contrast we
,
are int<3rested in, and that the Jacobian of the transformation is
equal to unity.
wrHton
Z = DU,
In matrix notation, expression (2.1.7) can be
whore
1
(2~Ln)
J =
Z
l-c l
l-c 1 -c ?
-
l-c l - ... -c 1
n-
'-
O
1
0
0
0
0
1
0
o
o
o
... 1
o
C'
o
o
••. 0
1
;b
It now follows that
I
U C U
=Z
I
I
1 I
1
(D- ) C D- Z = Z. to. Z, where
The
1\i
sYmmetric matrix M has the form
,
i,J:::l,2, ••• ,n,
where
j
~
i,
i,j=1,2, ••• n.
The mapping involved in the transformation (2.1.7) is u
straightforward generalization of the one involved in the case
of four dimensions, (see Chapter I, Section 2).
The axis Qul' determined by ui=o, (i=2,5, ••. ,n) is mapped into
° zl:
I
the a.xis
The
~xes
Qu., i=2,3, .•• ,n,
I
mapped into the straight lines 0
.L.
1.
detel'mined by
1
39
i, j = 2,3, •.• ,n.
z.
J
= 0,
j
f
i
Consequcmtly: the region of variation of the variables
Z
''lill be
a 1-'lOdgo··shaped region, in the n-dimensional space, 1)ound0d by
flats passing through the origin.
Taking the variate z for inn
.
stance, we have that
o<
zn < -;-zl-(l-Cl)z~... -(l-cl-"·-C n- 2)z n- 17
r
-
l-cl-",-C n _l
I
0; for fixed zi'
i
/
-;-l-cl-···-cn- 1- 7,
= 2, 3, ... , n-l.
SimilarlY,we have
l-cl- ••• -c n _?
f
0, for fixed zi' i = 2,3, •• ', n-?.
This feature being general, we can write
(2.1.10 )
for fixed z., j
J
< i,
i
2,3,
<
•••
n.
Of course} we have also zl > O.
The above considerations allow us to write
:, ~"iven by (2.1.9)} and the roc-ion of variation, given by (2.1.10).
Integrating out the variables z2,z3"",zn from (2.1.11)
over the region (2.1.10) we get, for,ilally, the density of
which would reduce the luatrix 0 to a dia[onal matrix, (except posst
sibly for the 1
row), would simplify 'ilatters considerably. Such a
transformation exists and several practical methods yielding the
dJsired transformation are known; for instance, Lagrango's and Kronecker I S methods.
~)f'
Suppose the transformation is
VI
=
zl
v =
2
r 22 z
2
v =
+ r
r
3
vn
the form
=
z +
23 3
z
33 3
+
...
+ r
z
2n n
..
+ r
z
3n n
,
r
z ,
nnn
where the coefficients r .. are known functions of the elements
~J
m.. of the matrix M.
~J
In ~atrix notation, (2.1.12) can b~ written V = RZ, where
v :::
, and thJ triangular matrix h is given by
v
n
1
0
(2.1.13)
R
=
0
0
r
... r 2n
r
... r 3n
33
r?3
22
0
0
o
o
0
o
r
nn
The J acol;ian of the transformati.on is uqua1 to
-1
n
"IT
i=2
w· ",;r8 1" ::: ( lL... -1),
ntJ-
1,
II' 11
.. I
1-
.
tho sym.me t r1G
.
l;J...lL,J.'lX
'II t,eing 0 f the fOl'm
t~2
tIl'
tIl
t
t?2
(2.1.15)
T=
t
l3
tIn
0
0
0
33
t
nn
The problom of rnapping is rathor complicated, at least in the present fornmlation
A general discussion would be
point1es~since
too many liossibilities would have to l.Je considored.
Howevor, for
any given nu.mber of variates, the i":latrix R would be known; and,
the mapping, in that case, would oe as simple as in the cases discussed in tho first chapter.
Tho density (2.1.11) becomes
(2.1.16)
In expanded form} we have
f(vl, ••• ,vn ) = n!
In+l(2n)-n/2~jt~lriil_7-1 GXP~-tl1vi/2(n+l)_7
].=<..
. exp;--(t
-
2
v + 2t1 vlv )/2(n+l) 7.
n
n
-
nn n
43
It is obvious that by simple transformations we could introduce
the normal probability function.
Then the density of vI would be
obtained by integrating out the variables v 2 ' •• •,vn,over the proper
region.
Tho density of vI would be zivon by an expression involving
iterated integrals of the normal probability function, over a
wedge-shaped region, given essentially by the mapping involved in
the transformation (2.1.12).
We shall presently seo, in the next section, how things are
shaping up, by considering a special case of our linear contrast.
2.
Null distribution of the differonce between the two largest
sample values in the general case.
·We are now concerned with the
arc the two largest values from a smaple of n+l independent random
variables xO,xl, .•. ,xn '
We assume, as before, that the variates
are normally distributed with unknown ,noans,
mO,~,
tively, and with a known comnlon variance a?=l (say).
get the density of
(2.2.1)
it is not permissable to simply substitute
c.
~
= 0,
i
= 2,3,
... , n-l,
••• ,mn' respecIn order to
in the general oxpression (2.1.16).
hmvover, since the first.
fm" steps indicated in section 1 of this chapter do not impose
any rostrictions on the parameters
c1' ••. ,c n _1 ,
prossion (2.1.6) as a starting point.
we may use ex-
Thus we have
where
,
U=
u
n
"nd the symmetric matrix C is given by
n
n-l
n-2
3
2
1
2(n-l)
2(n-2)
6
4
2
3(n-2)
9
6
3
(n-2)3
(n-2)2
n-2
(n-l)?
n-l
(2.2.4) C =
11
45
We immediately notice that the variate U is precisely
1
th,] linear contrast (? 2.1); hence, to get the density of up we
only have to integrate out the variates u.,
~
tho proper region.
i = 2,3, •.. ,n, over
The process of integrating out the unwanted
variates is easier to carry through,if the C matrix,given by
(2.2.4),iS simplified.
The possibility of a reduction of the sym-
metric matrix to a simpler form is blOwn to exist.
Intensive in-
vestigations have shown that the fo1lowtng transformation will do
the trick.
Let us put
2un-1 + un
(n-2)u
zn =
that is, in matrix notation,
z = nu,
where
3
+ ••• + 2u _
n 1
+ un
,
z=
z
n
and, the triangular matrix D has the following form
1
(2,?6) D
=
o
o
o
o
o
n-l
n-2
n-3
2
1
n-2
n-3
2
1
2
1
1
It is roadj.ly
S83n
that the .J acobian 'Jf the transformation is equal
to l/(n-l)!
I t follows that
,.!')
, CD.
-1
= (-1)
D
Siflce jJ-l turns out to be, of the furl
4'1
0-1
1
0
0
n-l
0
0
1
0
0
1
.. n-1
=
-
1
n-2
0
•••
1
.. n:2
0
0
0
0
0
0
1
,
obvious alccbraic steps load to the following expression for tho
symmetric r.latrix N:
(2.2.8)
n
1
2
n-l
0
·..
0
0
0
0
• ••
0
0
0
n+l
(n-l){n-2)
·..
0
0
0
0
0
M ::::
n+l
h·3
n+l
3·2
0
n+l
2·1
48
lve may, at this stage, investigate tho mapping involved in tho
1118 axis oUI is rivon by u "" ... =un=0.
2
transformation (2.2.5).
The transformation (2.2.5) implies zl = ul ' zi = 0, i = ?,3, ... n.
Hence,ou
l
is mapped into O'Z
OU is Given by u
2
zl = z3 =
••• ==
l
z
n
=
u
3
= 0',
1
=
BO OU
J
i = 2,3, .•. ,n.
un = O.
Now,(2.2.5) implies
J.
".
The axis
z. = 0,
is mapped into a I z2'
2
In the other cases, it is readily seen that OU. is mapped
J.
into the line otL., i
J.
Zl
z2
= 3,4, •.• ,n;
= zi+l
= z3 = ...
==
...
= z. l'
J.-
whero o'L. is defined by
J.
=
Z = 0,
n
i = 3,4, .•• ,n.
One notices immodiately that o'L.J.-I is tho orthogonal projection
of
o'L.J_
on tho euclidian space of i-I dimensions, i
In fact, olz2
= 4,5, ... ,n.
itself is the orthogonal projection of oIL) on the
euclid ian tv..D -dimensional space.
Consequently, the variates z.,
J.
i
= 1,2, ..• ,n, have the
follouing domain of variat,ion
°<
Z.J. <
z.J.- l '
i
= 3,4, ... ,n,
The donsity (2,?,2) thus becomes
7
0''''(
·,r-;::rr(2 n )-n/'2,'~XP.Lr - :?(n+l)
1
( 2.2,1')
1 Zl"'.'Zn ) = 1l,{n+.
ZIMZ_ ,
whore M is
~3i vcn
by (2.? 8), and the domain of variation of
zl"",zn ~ivon by (2.2,9).
Since M is very close to a diagonal
matriX, it will be possible to introduce the nornlal probability
function without messing up the limits of variation,
(?,,2,ll)
Vi = z3 / /(n-i-l)(n-i+2) , i
= 3,4, ... ,no
In matr5.x notation, vJG have
v = NZ,
whero
V=
, and
v
n
Lot us sot
So
(2.2.12 )
1
/n-1
"
N
0
0
·..
0
0
- - ·..
/(n-IHn-2)
0
0
1
-,ff.3
0
0
0
/2
0
!2tn401J
/(n+1)(n-1)
0
0
0
0
0
0
0
0
1
=
·..
0
It is readily verified that
1
n+
(-1) I
R=-lN
-1
MN.
Simplo algebraic manipulation leads to
trlG
fo1lowiI!-g form for R,
-/2
1
51
-1
(2.2.14)
I{
2
0
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
C
1
=
ThG llinits of variation are easily seen to be
o < v. <
~
/n-i+3
v. 1 '
/n-i+l ~-
~::;::;
VI jn-1
--<v?'
/?(n+l)
-
i
= 4,5, .. ,n,
.
52
Consequently; we can write
00
( 2.2.16) f (vI)
~i
0.-, -17
= - -(n+l)!
exp.
ftn /~
4(vI !n-l)/ j2'(n+I)
/n-1+3 v.1-1
-'."""'~
a(v. )dv.
1
1
...
a(v )dv ,
n
o
aCt) =
(2n)
1
- ~
2
oxp (-t /2)
It is roudily suon that oxprussions (1.1.25) and (1.2.39) are
particular cases of thu general rosult (2.2.16).
n
MOVillNTS OF ;rRE NULL DISTRIBUTION
OF LINlJLR CONTRJ\STS OF ORDER S~ATISTICS.
1.
Genen"l eX:;Jression for
~
in the case of three dimensions.
Let us consider the linear c;)ntrust
(:;.1.1)
It was shown in Chapter I, Section 1, that, under the null
hypothesis H : m = 1111 ::: m = 0 (say), the probability density
O O
2
function of z is given by
f(z)
where I (z) has the form
1
r
(c+l)z/(l-c)
l6(c2_c-t~
I
exp
L -t 2 /z_7dt.
!
I
r:-:--?-2
c/(2c-l)z/ / 6(c _c+l)
Denoting by
k'
lJ'
the k-th moment abollt the origin, "We have
54
It is to be noted that the above integral exists for any k
since I 1 (Z) exists, ( ~
(
f2R ),
> 0,
&ld
00
(
x
k
2
<+
exp (_ax )dx
00
for
I:l.
> ° and
all k
> O•
../0
Let us now introduce a new variable v, using the relation
(3.1.4)
It follows that dt
(2c-1)/ !6(c2 _c+l)
t
=z
= vz,
for every fixed z.
dv, and the limits of v are respectively
and (c+l}/(l-c) /6(c 2 _c+l) • We thus can
write
. 00
zk+l exp Lr-
where I (z) has the fonn
2
Z
2-j!.~.(c 2-c-ll)_7 I (z) dz
2
1
55
r
(c+l)!(l-c) !6(c 2 _c+l)
The above can also be written
00
where
1 (Z)
3
has the form
Since, in the present case, it is permissable to interchange the
two integral signs, we can write
56
where
I
( v)
4
has the form
00
,
and where
Setting
z 2 i..r 1+2(c 2 -c+l)v27/
_ 4(c 2 -c+l)
t~.1.6)
=U
j
it follows that
k
k
2
z :; z (c -c+l)
k/2
k/2
U
/
2
2 _ k/2
Ll+2(c -c+l) v _I
Since
,00
k/2 e -u du,
U
we have
.
57
t
(3.1. 7) ~k==:
NOl'l
(k+1)
. k 2
- 2
k+2
3·2 (c -c+1)
f(-t""')
-(k+2)
2
2
2
/I+2(c -c+1)v 7
dv.
---n------
-
......
-
if we put
tan 9
k 2
k/2
r(~)
( 3..
1 8) ~kt = 3·2 n(c -c+1)
2
cos k 9 d 9
,
where
(3.1.9)
91 = arc tan{(2c-1)/ /3),9 2 = arc tan(c+l)/(l-c)
t
r
Note: He should write IJ.k = IJ.k(c), since
of tho paramoter c.
I
~k
/3) .
is, in fact, a function
58
From relations (3.1.8) and (3 1.9) wo c&n get expressions of
I
lJ.k for
'~he
three particular cases th2.t we have pointed out
previously.
(i)
For instance, we havo
case of the range, obtained from (3.1.1) by setting c =
I
lJ.k(c) bocomos
n/6
(3.1.10)
o
This is a result obtained by NcKa;r and Pearson [11
(ii)
J.
Case of the extreme deviate from the sample moan.
Sotting c = 1/2 in (3.1.1) we got
whore
- x-
-
, x
=
2
l: x(. )/3
i=O ~
o.
59
The kth moment of v is readily obtained from (3.1.8) and (3.1.9);
in fact, we have, after obvioLlS simplifications,
n/3
3(k+2)/2 r(~)
I
!J.k(1/2) =
--n---
cos
k
Q d Q
Now
= (2/3) k
' 1/2);
!J.k(
hence
.n/3
(J .1.11)
o
(iii)
v!ilm:.s.
Case of the difforence batW(:8j.1 tho tl.vO li:lt'gust sample
If
loW
let c - > 1, we got
60
cos k Q de
2.
lIoments of low order.
~··je
shall nmv consider a few moments
of low order, and study their properties, considering the moments as functions of ~he parameter c,
0 < c ~ 1.
from expressions (3.1.9), that
sin Q
1
= (2c-l)/2 /c2
-c+l ,
C3.2.1)
these four quantities
~eing
useful in the sequel.
It follows,
tt
61
f
1st moment about the origin ~l(c)
= ~l(c).
From 0.1.8), wo have
cos
e de;
the above, together with (3.2,1), lead to
It, is clear that
Values of
f.l.
l
~l(c)
is a monotone decreasing fUl1ction of c.
(c), for particular choicos of c, ar·J 1i.stod in
table II.
From (3.1,8)
WG
get
62
~I(C)
2
2 2
n
= 3·2 (c -c+l)f(Zl
cos
2
steps, involving Tol",tions (3.2.1),
~~(c)
(3.2.3)
2
=
3!(c -~+1)f(2)
9 d Q
HO
after obvious
obtain
- /3(2-2c-0 2 )
]
2
+ 9 2 - 91
4(0 -0+1)
J e shall now study the behavior of the function of
givon by
(J.?3). First,
all values of
OJ i,8.
found to bG n/3.
92- 9 is
1
'I'hus )
.
=
''1
0
0 for
cvns-(,cmt; its value is rt.iadily
63
= f(2)Lf4n-3 !3)c-(2n+3 /3)_7 / n
o =: c =:
of c,
< 0, for all
I
1; consequently, !J.2(c) is a monotone decreasing function
0:: c :: 1.
I
V31uGS of !J.2(c), for a few particular c's,
are listod in table II.
3rd momont about
~h3
origin:
I
!J.3(c).
F'rom (J.1.8), we get
ink gratin::, the function of 9, and -iwkinr: use of (3.2.1), we have
(3.2.4)
It is r)adily·fo1JIld that
64
,
d~ (0)
~O -
2
:=
f(5/2){ -15 c +42c-33} /
I
of c;
hence'~3(c)
o<
-
fC
0 for all valUl..;s
<
is a monotone decreasing function of c,
c < 1.
Tabla II cOlltains a few valuoo of
,
~3(c)
for particular
choicos of tho parametor c.
I
4th moment about tho oriei~: ~4(c).
Again, from (3.1.8) we have
~~(C) =J024(C2;c+l)2ru)
I
J9
fran th:i.s,
lITO
:" 9 2
cos 4 9 d 8
,
l
got, aftar a fmv obv::i.ous steps,
,
In ordor to study tho behavior of ~4(c),for 0 ~ c ~ 1,
•
65
1
df.L4(e)
Lt US consider
-~­
de
140 easily find
"" -r3 /3 g(e)- 7 / n
,
whoro
0.2.6)
Thus, we have
dg(c)
de
= g(c) = 3(8n-9 /3) e 2 -24nc+12n+9 j3
'1l1d,
1
"lorGov.r, g (e) > 0 for all vnlum3 of c.
i_S
a strictly
iner~aslnr' .fUll~tion
oIlly ono r::!al root.
gee) < 0,
° ~ c ~ 1.
of
0,
Cons.:;qucntly, gee)
and, hencG,
Dossess~;s
Now, since: gel) < 0, i t follows that
'1'h3 af)ovo argw.:-:.:nt t.hus shows that
is a monotono decreasinE
fun~ti0n
of e, 0 < c < 1.
I
1lo4 (c)
66
Using the above four mom:mts, we shall now d.. Jri V8 exprt)ssions
for tho !1lOmcnts about the me-9.n and for the corrosponding cumulants.
The caS0 of the first mom3nt about tho mean is trivial.
i"Jo always have
For tho
s~)c{)nd i~lomont
'fLth tho socond cUffiulallt.,
HQ
a'!o11t
[iii,:: ,..Jan,
idontical in fact
h~vo
I
snbstitutin~tbo v·11u..38 of \.11(0) and 1-1'2(0) given by (3.2.2) and
(-'1. ?. 3)
j
'V1O
gc t, aft.~.r si~nDlificatiions,
It is readi.ly vori..fiod th:1t I3n-6
13 -
9 > 0, and
67
consequently,
~2(c)
> 0, for all values of c, which is what we
would expect of an expression giving the variance of a variate.
Moreover, from
it follows that
<0
if 0
,
<c <
2(:;
If +
21t .; 9)
&-6/3-9
and
2(3
ff
+ 21t -
9)
<c <1
&-6/3-9
in fact
= 0,
Consequentl~ ~2(c)
a~
c
< co;
is a monotone decreasing function of c,
a monotone increasing function of c,
and a stationary function for c
c
o --
c = 2(3 ff + 21t - 9) ... 0.868.
87t .. 6!3-9 -
if and only if,
2(3 ff + 27t .. 9)
&-6/3-9
...
0.868
= cO'
Co < c
~
1;
where
in
fact'~2(c)
passes through
68
a minimum at c
= cO.
From the symmetry of
~2(c)
and the above
considerations, it follows that f.L (1) < ~2(O). Particular
2
values of f.L (C) are listed 1n Table II.
2
The third moment about the mean, identical with the third
cumulant, is given by
With the help of expressions (3.2.2), (3.2.3) and (3.2.4), we
can write
(3.2.10)
~5(C)
=
3 [[7rc - 9/3 - 9)c 3 + (54 - 15rc)c 2 - 3(36 - 18
+ 2(36 - 18
~
-
In order to study the properties of f.L3(c)
rc17/
4rc
ff - Jl) C
3/ 2
= K3 (C),
let
us consider its first derivative with respect to the parameter
c.
We
get
df.L ( C ) .
3
dc
= g(c) = 9 L(7rc
- 9 ./3 -
2
9)c
- (36 - 18
It 1s readily found that
g(c) < 0 ,
g(c)
> 0,
for 0
c
< cO;
<c
:::; 1;
~
for Co
+ (36 - 10Jt)c
13 - rc }_7/41(3/2 •
6~
and
g(c)
= 0,
for
c
= cO'
where
J
-(18 - 5n) - [(18 - 5n)
2
+ (7n - ~ ff - 9)(36 - 18
(7n - 51
ff -
13 -
)1)
::. 0.5201.
Moreover, ~3(cO)
>
0 ; and, since the coefficient of c 3 in
(3.2.10) is negative, we have
~3 (c)
So,
~3(c)
o~
c
~
0 for 0 ~ c ~ 1.
is a monotone decreasing function of c for
< cO'
Co < c
>
1.
and a monotone increasing function of c, for
Table II exhibits a few values of ~3(c)
= K,(C).
The fourth moment about the mean can be written as
follows
().2.11)
ilith the help of expressions (3.2.2), (j.2.3), (3.2.4), and
(3.2.5), we get
wheI'e
n)
2
7
70
(3.2.1;)
b0
= 16(4n 2
bl
=-
b2
= 24(8n 2
b;
= - 8(1~2
b4
= 64n 2
+ 9
/3 n
+ 54
13 -
81),
- 21n + 6
13 n
+ 27
13 -
81),
-
,Ore
2b O '
- ;n - 27 ~ - 81),
+ 24n - 72/3n - 108
In order to study the properties of
the b's being defined above.
dg(c)
dc
/3 -
~4(c),
81.
let us consider
Since
= g'(c) > 0, for all values of c, it follows that g(c)
is a strictly increasing function of c; hence, g(c) possesses
only one real root.
It is readily found that the value of the
root, co,say, is in the interval 0.75 < c < 0.8.
A more precise value of the root was not computed, since
an exact knowledge of it is not too important in the case.
sequently, we have, from the above considerations, that
is a monotone decreasing function for 0
increasing function for
Co < c
~
~.
Con-
~4(c)
c < cO' and a monotone
1; cO' the real root of (;.2.14),
being on the open interval (0.75, 0.8).
The value of K4 (C) can be otained from (;.2.8) and (3.2.12),
using the relation
71
~
little ulgebra leads to the following result:
(3.2.16)
where
(5.2.17)
dO
= 16(21t
d
=-
1
- 11
ff
ff -
63
J
2d OJ
u2 = 16(1111
21t
= -8(291t
- 41t
d:;
+ 36
d 4 = 500 + &
ff
+ 18
f3 -
13 -
72
ff -
54) ,
18/3 - 36) ,
ff -
~O
The properties of K4(C) have not been investigated at this
point, since K4 (C) will be considered, at a later stage, as
a component of the quantity 7
2
= K4/K; •
3. Brief study of the skewness and kurtosis of the distributions.
We may, for instance, use the coefficient
as a measure of the skewness of our distributions.
From the
expressions K (C) and K (C) previously derived, we get
2
3
,
where
~(2
() •.5.3) .
glee)
=
(7n-9 ~ -9)c 3+(54-l5n)c 2-3(;6-18 ~ -n)c+2(36-18
g2(c)
=
(&-6/3 -9)c 2+4(9-2Jt-3!3)c + 4()
j3i
-n),
/3 +2n-9).
Properties of r 1 {c).
(i)
rl(c)
> 0,
for 0
~
~
c
1.
This follows immediately from the
previous discussion where it was shown that both K (C) and K (C)
2
3
are positive, 0 ~ c ~ 1. Our distributions are thus skewed to
the right.
(ii)
rl(c) is a monotone increasing function of c,
= O.
it is stationary at c
0
<c
~ 1j
This is easily shown)by considering
the first derivative of rl(e) with respect to c.
After some
obvious algebraic steps and simplifications, one gets
(3.3.4)
'.lhere
Since, a
O
+ alc + a c
2
o<c
(11) •
~
2
>
0,
Ij and
for 0
~
c
~
1, it follows that
= 0,
c
= OJ
hence,property
The coefficient
measure of kurtosis.
r2
=
Since
distribution, then, if
r2 >
K4/K~ is ordinarily used as a
r2 = 0
in the case of the normal
0, we label the distribution as
"leptokurtic," meaning that the "peakedness" of the distribution
is greater than in the "normal" case.
When
r2
< 0, the distribu-
tion is labeled as "platykurtic," the overall "flatness" of the
curve being somewhat more than that of the "normal" curve.
In the present case, we shall study the properties of
r2 (c) ,
considered as a function of the unspecified parameter c.
To start with, we have
, where K (C) and K4(C) are given by the
2
expressions (3.2.8) a.nd (3.2.16), (3.2.17), respectively.
I'le
shall show that
(i)
r 2 (c) is a non decreasing function of c, 0
~
c
fact, r 2 (c) is a monotone increasing function of c,
and
~
1;
0<
in
C
< 1,
O<c<l.
-
In order to establish property (i), it suffices to
consider
dr (c)
2
dc
ing expression
Tedious algebraic steps lead to the follow-
where
a
a
a.
b.
o
l
2
3
= 160Jt 2
+ 32
/3 1t 2
= - 5041£2 + 23700
= l041t 2
+ 641t
2
- 2280st
ff -
ff -
ff
+ 2880Jt - 3240 + 2592
13 ,
1620Jt - 2916 ,
1321t
13 -
720Jt + 324 + 468
13 ,
::
and
where h2 (c)
>
° for 0
~ c ~ 1.
Some more algebra will show that
Hence we have
o< c
~
c
1,
= O.
These two
expressions establish property (1).
Property (1i) follows
the fact that
r2 (c) > 0,
immedi~tely
for c
from property (i) and
= 0.
Table II exhibits a few values of
r 2 (c).
In the cases of illl..)re than three dimensions, general
•
expressions for the K-th moments about the origin can be derived,
75
using methods similar to the one described previously.
However,
we cannot get exact results, since the expressions involve
iterated integrals.
The case of four dimensions would still be
within the realm of possibility, assuming calculators of the desk
type are available.
However the process, involving quadrature,
would be time consuming.
Further research in this direction
will be necessary before the results are worth the time employed
to get them.
TABLE II
Table of Moments and Related Quantities
for the Linear Contrast
I
1
c.::
0
2
C
= 0.5
3
c
=
1
0.8l~628
4
5
6
(c = 1)
u
(u)
L69257
1.26943
J.L'2
3.65399
2.12024 1.17301 (1.1730)
0.91~233
--
1J.'
9.30913
4.28431 2.11571 (2.1158)
1.26943
--
J.L4
~6.88588
10.00629 4.55706 (4.5575)
1.97655
--
K
2=j.l.2
0.78920
0.50880 0.45681 (0.4577)
0.22613
( .2261)
K3=j.l.3
0.45296
0.30105 0.3 4983 (0.3495)
0.08920
--
j.l.4
2.04688
0.96148 0.89690 (0.8988)
0.18992
1<4
0.17838
0.18484 0.27087
0.03651
---
1'1
0.63799
0.82949 1.13307 (1.28*)
0.82949
(.8296)
1'2
0.28640
0.71400 1.29807 (1.29)
0.71400
( .7135)
3
*
(0.8458)
0.84628 (.8463)
J.L'1
--
ObVious typographical error; the result
should read 1.128
77
Remarks.
(i)
columns 1, 2 and 3 contain results for the specified
values of Cj
(i1)
column 4 contains the results obtained by Irwin
1:9] ,
for the case of the difference between the two largest
sample values, using approximationsj
(iii)
column 5 contains the values of the moments of the
statistic u
= X(O)-(xO+x1+x2 )/3,
and later by Grubbs
(iv)
column
6 lists
studied by Nair
1:14] ,
[6] ;
the results obtained by Grubbs
1:6 J , for
the statistic u.
------------
-----------~
CHAPTER IV
NON NULL DISTRIBUTION OF LINEAR CONTRASTS
OF ORDER STATISTICS.
1.
______..-...__
_
Case of three dimensions; general considerations.
We assmue,
~
.-...:.....l;;.~_.
as before, that we are dealing with three independent random
variables) x o' xl and x 2' normally distributed with unknown means,
mO'
cr
2
and m , respectively, and with a known common variance
2
~
= 1 (say). Let us denote hy
values of the a"Jove variates.
x(O) > x(l) > x(2)
the sample
VIe shall be interested in finding
the density function of the linear contrast
(4.1.1)
where c is an arbitrary real number
t~cing
values on the closed
The joint denslty of x(O)' x(l) and x(2) is
given by
(4.1. 2)
"!-
where ~' stands for the sumnlation ovsr all the pernutations
79
i O' i l , i 2 of the numbers 0, 1 and 2.
Expanding the right hand side of
(4.1.?), and collecting
terms, we can write the density as follows
• l:~~exp (IJ.! X) ,
1
where IJ., X and IJ.. are the following column vectors
1
m
o
(4.1.4)
IJ. =
~
m0
c
m.
x(O)
,
X=
x(l)
,
10
lJ.i =
x(2)
Proceeding as in the null case, let us put
where
m.1
1
m.
1
2
80
U
o
U
=
u
,
1
A ==
u;:>
1
1
1
1
-1
0
0
1
-1
,'Ie have, as before,
" t .>L
XIX == -1 U"ll" BU'<
3
'
and
B ==
where
(4.1.3)
Consoquently,
1
o
o
o
2
1
o
1
2
becomes
1 I exp (- 1
1 6) f( uo' u ' u,., ) =
(4 •.
-2 Il. I Il. ) exp i\ - 7)1 U~PBU~~)
1 c
3( 2n)3 2
~~
.l: exp(jJ,! l\.
l
The variate u
o
-1
~~
U).
can be separated froY:l the others, and we can write
81
where
u
u
(4.1.8)
,
=
u
Since
f
-2
1
C
+oo
-00
it follows that
We may now set
(4.1.10)
Z
= DU
,
2m
,
=
,-1
2
1
2
vi =
i
-mi -mi
012
m1 +mi -2m.
o 1 1.2
,
82
where
zl
(4.1.11)
Z
=
D
l-e
0
1
=
z2
It is readily seen that zl
1
= u1 +
(1-e)u
2
linear contrast (4.1.1).
The
transfor~ation
(4.1.10) implies
where
2c-l
(4.1.12)
M
=
2
2c-l
2(c -c+l)
it also impl:i.es
VI
i
U
= VIi
D- 1 Z =
~! Z
1
where
2m.
1.
(4.1.13)
~1
=
-Ill
0
i1
-ro.
1
2
is precisely the
Consequently, (4.1.9) becomes
.r.*
1
I
exp{,AiZ).
The limits of variation are, as in the null case, given by
(4.1.15)
In order to get the density of the linear contrast
Zl = x{O) - eX{l) - (1-c)x(2) ,
we have to integrate out
the variate z2 over the proper region.
Expanding (4.1.14)
and rearranging the terms, we have
where
K
=
1
21c{3
2
exp j: !(~I~ - m /3)
-
-
7 ,
2
a=e-c+l,
•
b = (2e-l)zl - f(2c-1)m 1 + {2-c )m.
o
J.
- {l+c )m
1
i2
}
•
84
Now,us1ng the identity
2
at + bt =
2
f1,fat + 2 bra } 2 - 4a
b
'
and collecting terms, we have, after a few algebraic steps,
(4.1.18)
2
1
/
012
• exp[{ (2c-l)mi + (2-c )mi -( l+c )mi
, .xp ["-
~
( .j8 "2 + b/2.j8 )2J }
2
l2(c -c+l)_7
,
where K, a and b are defined by (4.1.17).
Setting
and formally integrating out the variable t, we have
(4.1.20)
f(zl)
-
= K1L* gl(zl,~,c)g2(mi'c)
f
h 2 (zl)
h1(zl)
(2n)-~exp(-t2/2)dt
,
where
(4.1.22)
gl(zl,mi ,c)
g2(m1 ,c)
= exp ~- ~i-2Z1(miO"cmi1-(1-C)~2)1
-
-
= exp ~
2
/ 4(c -c+1)-7>
2
J(2c-l)m.
[ 10
+ (2-c)mi
1
- (l+c)m.
12
2
] / 12(c .C+l)-7
(4.1.24 )
_
(C+l)Zl-(1-C)I-(2c-l)m. +(2-c)m. -(l+c)m.
_
J.
o
J.
J.
1
1
..J'-
2_
_.
I
It is to be noted that two expressions involving the
population parameters are closely related.
~l
(4.1.25)
= m.J.
O
- cm.1
1
In fact, if we put
- (l-c)m.1
2
'
86
expressions (4.1.22), (4.1.23) and (4.1.24) reduce to
g3(z,m i ,c)
(4.1.26)
= exp L={z2-2A1zJ/
2
4(c -c+l)_7,
2
g4 (mi , c) = exp [(')0'1+ 2A2 )2/12(c -c+ 1)_7
J
h (z,m1 ,c) = ((2c-1)z-(A1+2A2 ) J/!6(c 2-c+1) ,
3
h4 (z,m1 ,c)
Note:
= (C+1)Z-(1-C)(~+2~2) J/(l-c) 16(c 2 -c+1)
The subscript 1 has been dropped from the z for
•
s~mp1ic1ty.
The above quantities, A and A , are related by a very simple
1
2
cyclic permutation of i O' i l and i 2 •
The density (4.1.20),when expanded, takes the following
(4.1.27)
(C+1)z-(1-C)(Al+~A2)
(l-c)
j6(c2_c~~)
• r.*
exp(_t2/2)~ dt
/21(
!
(~c-1)z-(Al+2A2)
16(c 2 -c+1)
•
form~
the quantities
~,
m, h , and h having been defined previously.
1
2
From (4.l.27),we shall derive three cases of special
interest.
(i)
Case of the range.
Setting c
=0
in (4.1.1), we get
The density (4.1.27) simplifies out in the form
-
2
exp{(m.1 -m.1 )w/2) exp(m.1 -2m.1 +m.1 ) /12)
0
2
0
1
2
88
(ii)
Case of the extreme deviate from the sample mean.
Setting c = 1/2 in (4.1.1), we get
where
It is readily found that (4.1.27) becomes, in this case,
exp((2m. -m. -TIl. )v/3J exp((m. -m. )2/ 4 }
1
1
1
1
1
0
1
1
2
2
2v-m. +m.
1
1
1
2
r
I
I.
.
!!
j
V~(m. -ro. )
1
1
1
2
-----
/2
Finally, (4.1.30), together with (4.1.29), lead to
(4.1.31)
J
(iii)
Case of the difference between the two largest sample
values.
Setting c
f
~
lt
l
. (4.1.32)
=1
in (4.1.1), we get
90
It is not permissable to set c
=1
However, taking the limits, as c
(4.1.33)
f(y) = - 1
2.ji
in the density (4.1.27).
->
1, we get
r1'
1 exp (-y2 /4)
exp _-'2(J.1 J.1-m2 /3)_
exp(m -m )Y/2} exp(m. +m. -2m. )2/121
io i1
~O ~l
~2
+00
2
exp( -t /2} dt
f2i
y-m. -m i +2m.
1
1. 0
~2
91
2.
Case of three dimensions under two particular hypotheses. In
this section, the non null distribution of the linear contrast
(4.2.1)
will be obtained in two particular cases of special interest.
110 shall, first, take the case of the distribution of
(4.2.1) under the hypothesis
(4.2.2)
Under HI'
,,= -~
1
m
- 2m = ma +
~
+ m becomes m
2
= 5; so, we have
The quantities Xl and A , defined by (4.1.25), cannot bo written
2
directly ll1 terms of 5; however, takine thG summation indicated
-l~
by L: , we get
(4.2.3)
f(zIH
l) =
1
In( c 2_c+l)
92
where f(z
IH1 )
stands for the density of the statistic z, defined
in (4.2.1), under the hypothesis HI; and, gl' g? and g) are
functions of z and of the parameters 6 and c defined by the expressions
,;--()z 2+6(I-c)oz-(I+c)
.
202J/12(c 2-c+l)_rI)(z;o,c
~
)•
8 (Z;O,c) = oXPL
3
The functions II' I 2 and I) arc givon by
(c+l)z-(I-c)(2c-l)6
(1-0)/6(0 2-0+1)
I
1
,
=
(2c-l)z-( 20-1)0
/ 6(
0'
-0+1)
93
(c+l)z-(1-c)(2-c)5
(1-c>16(c 2-c+l)
(2c-lh~-( 2-cl2...
/6( c 2-0+1)
r
I
(0+1)z+(1-c)(1+c)5
2
(1-cl/6(c -c+ij
..;.;ex~p~(_-.;;.t:?~/..;.?.:..) dt •
ffn
( :?c-ili~,:c)5
16(c 2-c+l)
The density (4.2.3) has been evaluated, in the particular case
6
= 1,
for certain values of z,and for a few values of the para-
meter c.
The results are listed in Table III.
Let us, now, consiclor the distribution of (4.2.1) under the
hypothesis
(4.2.6)
Thon,the column vector
becomes
I-L
a1so}the quantity m becomes m = 36.
1
'2
I-L 'I-L - m
Consequently t
/3
The density (4.1.27) can now be written
2
1
exp (-6 )£r1 + r 2
2
2Jn( c -c+1)
where the functions,
r , .•• , r , arc definod by
6
1
+ ...
+ r6-7,.
95
(4.2.9)
the functions G1 , 02
and G being given by tho expressions
3
(c~l)z-36c(l-c)
(l-c)/6(c 2-c+l)
(c+l)z+ 35( i-e)
2
2
(l-e )/6(e -e+ 1)
2
exp (_t /2) dt
.j2;
,
(2c-l)z+ 35(1- c)
/6(c 2 _c+i)
(c+l)z-.35(1-c)
_(~~::.1)z-35
--2/ 6(c -c+l)
The density (4.2.8) has been evaluated, in the
particular case 5 = 1, for certain values of z, and for a
few values of the parameter e.
table IV.
The results are listed in
97
Tho non null distribution of linear
contrast~in
the cases
of four or norG order statistics, can bo obtained, 1lSing procedures sll2ilar to tho one givon in tho case of three variates.
Tho distributions were actually obtained; but the expressions are
so bulky that they have not beon included in the present work.
Some "lore research on possible recurronce formulae needs to be
done before the rosults can be put to usc.
TABLE III
Table of ordinates of the density of the linear contrast
Z
= X(O)
- CX( 1) - (l-c )X(2)' under the hypothesis
HI : IIt:J = 8 = 1; m1 = m2
Z
c
0
0.0
0.1
0.2
0.4
= O.
0.6
0.8
0.9
1.0
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.69550
0.2 .07843 .08707 .09783
.12988 .19255 .36249 .58377 .65223
0.4 .15340 .16984
.24916 .35644
.19015
.56737 .63842
.60265
0.6 .22169 .24434 .27187 .34847 .47043 .60539 .58628 .54862
0.8 .28049 .30717 .33879
1.0 .32764 .35584
.42129 .52678 .56566 .52858
.38797 .46387 .53322 .50692
.49195
.46915 .43436
1.2 .36168 .38868 .41801
.47716
.51027 .44557 .40985
.37744
1.4 .38200 .40550 .42904
.46589 .45370 .38510 .35211
.32259
1.6 .38882
.40689 .42265
.43486 .39556 .32714
.29733 .27098
1.8 .38314
.39449 .40156
.39238 .33636 .27293
.24660 .22357
2.0 .36658 .37079 .36936 .34034
.27999 .22364
.20067
.18002
2.2
.34126 .33851 .32948 .38770 .22839 .17958 .16014 .14373
2.4
.30954
.30284
.28546
.23706 .18255 .14116 .12632
2.6 .27387 .26024
.24117
.19065
2.8 .23387 .21939 .19834
.14980
3.0 .19953 .18053
3.2 .16449 .14500
3.4 .13256
.15882
.14288
.11184
99
TABLE IV
Table of ordinates of the density of the linear contrast
z = X(o) - cX(l) - (1-c)x(2)' under the hypothesis
H2 : mO = 28; m1 = 8,
0.0
0.1
0.2
~
0.4
= 0; 8
= 1.
0.6
0.8
0.9
1.0
0.0 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 .54'17
0.2 .04056 .0'960 .05069 .06759
0.4 .08109 .09037 .101"
0.6 .12140 .1,481
.13497
.101'7 .20168 .,8016 .52172
.20146 .'7844
.51800 .49788
.15152 .20107 .29509 .47802
.50288 .47151
0.8 .16106 .17860 .200'9 .26,60 .37'11 .49628 .47401 .44252
1.0 .199,4
.22058 .24647 .31926 .42638 .47505
.44270 .41092
1.2 ..23518 .25927 .28808 .36'77 .45112 .44210 .40851 .'7691
1.4 .267'1
.29'05 .32298 .39430 .44918 .40526 .'7183 .34096
1.6 .294'7 .'2027 .,4906 .40863
.42718 .,6591
.33"1 .'0'72
1.8 .'1476 .33859 . .,6482
.39249 .32495
.29381 .26591
2.0 .328'7 .34951 .36940 .38986 .'5119 .28344
.25435 .22887
2.2
.33,6, .35004 .36285
2.4 .3'070 .,412'
.30177 .29972
3.0 .27902
.27019
,.2 .25165
.2'777
3.4
.36159 .'0690 .23920 .21601 .19316
.,4610 .'2502
2.6 .'1996 .32398 .52083
2.8
.40664
.28390 .2179'
.28927 .24141
.25585
.22185
e
_____ u
.26260 .199,4 .17977 .15978
_
CHAPTER V
A
DECISION RULE TO PICK OUT THE POPULATIOV
lrITTH THE LARGEST HEAN.
1.
Gen~~al co~siderations.
It has been realized, in the last
decade or so, that the statistical techniques, based on the
classical concepts of Iltesting of hypotheses ll and "confidence
intervals", do not always give the experimenter the answer that
he was hoping to get.
This applies, in particular, to the
situations where the analysis of variance is routinely used.
For instance, testing procedures tell the experimenter that all
the "treatments", which are the subject
not identical; or, at least, the
of investigation, are
h3~othesis
of equality, if it
is true, has a probability a of beine rejected, where a is preassigned by the statistician.
Very often, an answer of this
tJ~e
is inadequate, as it may be conjectured, even before the experiment is carried out, that the treatrnonts could be shown to be
different from one another, by sufficiently increasing the size
of the sample.
In many instances the experimenter likes to take
some decision; for example, to rank the treatments according to
means, or to select the treatment with the largest mean.
The.
problems are real and complex, and it "lay very well be that a
unique tool does not exist which can provide an adequate answer
to all "decision problems".
101
Ue shall, in this chapter" discuss the bearing of too
work in the previous chapters, on the following problem.
Let
us assume that it is wanted to find the "best ll of several
populations, where, by the "best ll population, we shall mean:
the population having the greatest mean.
Let us consider n+l
independent normal populations, each one of them being characterized by its mean.
For the present, we assume that the
2
common variance of the above population is known, (cr = 1 say).
Let the unknown, ranked population means be m ~ m
l
O
>
-
m.
n
~
_
Suppose that a sample of size N is available from each
population, and let the sample means be xo' xl' •• ,-, x n sider the sronple point X
Con..
(xO' xl' •• _, x n ), defined by the
=
set of sample means, and let Wbe a region in the euclidian
(n+l)-dimensional space; then, the decision rule is of the following t;ype:
i)
••• J
X )
n
e W, decide the population
corresponding to the largest sample mean is
the best;
ii) ,if X = (xo ' xl' ..• , x n ) iF, do not decide (withhold
judgment) .
The region W is so determined
that the probability of
taking a wrong decision is always less than a preassigned value,
102
(say). The number l-a may be called the safety level of the
O
O
decision rule. In the following 11 will be defined by means of
a
,
an auxiliary statistic.
Let x(O) > x(l) > ... > x(n) be the ordered, (or ranked,)
sample means.
Of course, it is unknovm whether or not x(i)
comes from the population with mean m.,
(i
~
= 0,1, ••• , n). Let
us consider regions W based on the class of auxiliary statistics
composed of linear contrasts of the ordered sample values.
Let
n
where
~
i=l
c.
~
=1
, D. > 0, (i
~-
= 1,
2, ..• , n),and let the
critical region Wbe of the form g(x(O)' ••• , x(n»
>
k, where
k is a positive number determined in such a way that the decision
rule provides a given safety.
Now it is easy to show that all the linear contrasts of
the form (5.1.1), except
cannot, when used as an auxiliary statistic in our decision rule,
103
provide a reasonable safety.
In fact, consider the linear
contrasts
where
n
(5.1.4)
Z
c.
i=l
~
= 1,
c. > 0, (i
~-
= 1,
2, ... , n) •
Let the population means be
= mn
where e is a small positive number.
when m ~
when
-00,
e ~
°, for
==
m (say) ,
Then, it follows that
any given k.
(5.1.5) is true and m is tending to
This means that,
-00,
the probability
of taking a decision tends to unity and hence, the probability
104
of taking a "wrong decision ll tends to
0.5. Thus, a reasonable
saf'3ty is not provided by the class of auxiliary statistics
given by
w
= x(o)
(5.1.3) and (5.1.4).
In particular, the range
- X(n)' or Nair's "extreme deviate from the sample
moan II
n
X(o) - x ;
x
= . Z0
l=
x(.)/(n+l) ,
l
arc seon to belong to the discarded class of auxiliary statistics.
Consequently, only one member of
(5.1.1) is left; that is,
(5.1.6)
In order to use
(5.1.6) as all aUXiliary statistic in the
proposed dGcision rule, properties of the distribution of (5.1.6)
'tvill have to be studied.
wron~
The uppor bound of the probability of
decision, corresponding to all possible spacings of the
population means, has to be found. out.
been solved.
This problem has not yet
Howover, the case of three dimensions has been
studied in more detail since the distribution of x(o) - x(l) ,
'in the null and non-null cases, has previously been derived.
105
We shall indicatc,in the next section,the results that are
available at this moment.
2.
Some proDerties of the suggested decision rule in the case
of three populations.
We have, now, threo normal, independent
populations with unknown means, mO ~
conmon variance
,l = 1
(say).
~
.:: m2 ' and with a known
Let the observed means be xo'
Xl and x 2 ' and the ranked means be
x(O)
>
x(l)
>
x(2)'
As be-
fore,it is unknown whother or not xCi) comes from the population having the mean m. , (i = 0, 1, 2).
~
Our auxiliary statistic
will bo
(5.2.1)
'ihe decision rulo takes the form
if Y > k , decide x(O) belongs to tho IIbost ll popu-
(5.2.2)
lation,
ii) if Y ~ k , do not decido,
where k will be determined in such n way that the probability
of taking a wrong decision is less than a preassigned value,
•
106
ao > 0 (say).
Of course, a
O
being given, to determine k we will have to
find out the ·'worst" scattering of the true means; Le., the one
which will load to the greatest probability of wrong decision.
This invGstigation will be based on the non-null distribution of
our auxiliary statistic.
Denoting by f(y I FI )
the probability density of (5. ?l)
O
where the null hypothesis, H : mO
O
= ;~~
= m2 , is not true,
have, from 0.1-.1. 33),
(5.2.3)
"
• L;")\'
r
00
2
'I
··1 y-(
exp(-t /2) dt
rrn
+m. -2m. )) /
Hl.
1
1
1
1
1
2
f6
we
107
Setting up
Gxpression (5.?3), after a few obvious stops, becomes
exP~(2B+y)2/12_7coSh~
00
2
exp(-t /2) dt
pn
v-2y-5
/b
00
exp(-t:?/2) dt
/'Tn
-5+y
(b
108
The null distribution is readily obtained from (5.2.5), by
putting 6
= Y = O.
Now suppose, for the moment, that the constant k in
(5.?2) is known.
Thon the probability of taking a decision,
denoted by Pd(y, 6), is given by
00
(5.2.6)
k
This probability can be split up into two parts, (i) tho proba...
bility of taking a wrong decision, donoted by P (y, 6),
w
(ii) tho probability of taking a good decision, denoted by
Pg(Y, 6).
Explicitly, Pg(Y, 6) is the probability that y
and x(O) comes from tho population with mean mO'
> k
Formally, wo
have
,00
(5.2.7) P (y,6)
g
= -
1
2/ii
exp
r ... -3(6
1 2+Y 2+6y) 7
-
-
k
Hl (y;y,6) dy,
109
where
(0
(3y2_65Y_(2Y+5)2)/1~
~cxp(-t2/2) at
v-2y-o
f6
00
.1....cxp ( - t 2/ 2 )
(2n
v-6+y
SimilJrly,
F~(y,o)
is tho probability
th~t
not como from the population with nann mO.
write
P (y,6)
w
DS
y
>
k and x(O) dous
Formally, we can
follows:
00
;-.
1
p ( y,o) = ---oxp
w
2m
I
-/ "
122 .
r - -3(6
+5y) 7,
-
+Y
J
k
H.~.(y;y,6)
dy,
clt,
110
where
(5.2.10 )
00
v-2y-o
/b
,_ 00
00
1
2
--- cxp(-t /2) dt
ffn
00
1,
2
--- OXP\-t /2) dt •
/'Tn
III
It has not yet boon possiblo to find analytically tho
valuos of y and 6 which will
') wrong decision.
~aximize
thL probability of taking
However, we have numerical ovidence that 1 for
do
reasonuble safety,(.90 or more)15=O,r=+oo is the worst case.
Anyway it is easy to show that when 6=0, Pw(y,O) ~ P~(y,O);
in fact, frot'! symmetry considerations, i.t follows that
(5.2.11)
i{umerical evidence
vTaS
gathered in the following m'lnnor.
'Lhe constant k was temporarily doterminod by
= a = (... 075 ,
using the null distribution of y .• \vhose
ordin~ltes
-'lrc listed in
l''lblc I, ::lnd numerical integratio111othods C:Joddlo IS rulo);
was found to be approxirnat,uly equal to 1. 95700.
k
The value
a = C 075 was chosen bocause, in th5.s case, P (0,0) = -3?a '" 0.05.
vJ'
Using the above valu,; of k, tL.:: following six C3ses wore;
~unsidorod:
e·
112
5 '" 1, Y == 0
HI
,
.
H • 5 = 1, Y = 1 ,
2
H : 5 = 0, Y :: 0.6
3
,
,
H
4
5 = 0, Y :: 1.0
H
5 '" 0, Y = 1.4 ,
5
H : 5
6
= 0,
Y
~
+00
.
First, tho densities were obtained uitb the help of tables,
13.7]
and
1)8] j
and then tho qmmtitios
1}6] ,
Pd(Y, 5) and pw(Y, 5)
wore obtained by numerical integration :.lothods.
The results nrc
slllnm.arizod in Tabla V.
TABLE V
P,(y,6)
Q
P (y,5)
1'1
5 :: 1
5 = 1
5 == 0
5 :: 0
5 = 0
y=O
Y = 1
y=o.6
y=1.0
y=1.4
0.13476
0.20622
0.015
0.014
6 :: C
y~
+00
0.08768 ('.10196 0.12140 0.16642
0.01.+77
.0541
.0620
.0832
ll}
The above numerical evidence points out that H6 is the
"worst" case.
The first two cases, H and H , indicate that the
l
2
power of the procedure is rather good and that Pw(r, 1) is in
fact qUite small.
3. Suggestions for further research, and concluding relnarks.
It is conjectured, that, if a reascnable safety is
required, then, in the (n+l)-dimensional case, the maximum
value of .the probability of wrong decision, using a procedure
based on a critical region of the type x(O) - x(l)
occur when mO = ml , and m
2
negatively infinite.
> k,
= m} = ... = mn = m (say),
will still
where m is
Further investigation of the properties
of the non-null distribution in the general case, will be required.
The next step will be to studentize the auxiliary statistic,
using, in the analysis of variance situation, the available inde2
pendent estimate of the common variance a.
This step should be
easy to make, since methods of studentization are known; see, for
instance, Hartley
1:7]
and Nair l:l}
1,
to mention only a few.
In' conclusion, reference may be made to work along similar
lines by other authors.
A large number of testing procedures for
"outliers", or "stragglers", have been proposed, see, for instance,
Irwin
1:8]
and
Recently, Duncan
1:9],
1:4],
Grubbs
1:6],
Dixon [ ) ] and [}] •
has presented a procedure for ranking
114
means, based on successive applications of the range.
Bechhofer
1:1 J
Finally,
has tackled the decision problem along lines
somewhat different from ours.
115
BIBLIOGRAPHY
t)]
Bechhofer, R. E.,
"A Single-Sample Multiple Decision
Procedure for Ranking Means of
Normal Populations with Known Variances," Annals of Mathematical
Statistics, XXV (1954), 16-39.
[21
Dixon, W. J.,
"Analysis of Extreme Values," Annals
of Mathematical Statistics, XXI (1950),
488-506.
[3 1
Dixon,W. J.,
"Ratios Involving Extreme Values,"
hnnals of Mathematical 8tatistics,
XXII (195i), 68:78.
[41
Duncan, D. B.,
"A Significant Test for Differences
Between Ranked Treatments in an Analysis of Variance," Virginia Journal of
Science, II (1951), 171-189.
[5]
Goodwin, H.
[6]
Grubbs, F.
[7]
Hd.rtley, H. 0.,
"Studentization or the Elimination of
the Standard Deviation of the Parent
Population from the Random SampleDistribution Statistics," Biometrika,
XXXIII (1945), 173-180.
[8]
Irwin, J. 0.,
"The Further Theory of Francis Galton's
Individual Diffe~ence Problem," Bio- ..
metrika, XVII (1925), 100-128. ---
[9 1
Irwin, J. 0.,
"On a Criterion for the Rejection of
Outlying Observations," Biometrika,
XVII (1925), 238-250.
J.,
:lli.,
"On the Estimation of Dispersion by
Linear Systematic Statistics," Biometrika, XXXVI (1949), 92-100. --"Sample Criteria for Testing Outlying
Observations," Annals of Mathematical
Statistics, XXI (1950), 27-58.
.
n6
[)01 McKay, A. T.,
[11 1 ~.icKay, A. T.,
[12
I
liosteller, F.,
1)3 1 Nair, K. R.,
1
"The Distribution of the Difference Between the Zxtreme Observation and the
Sample Mean in Samples of n from a Normal Universe," Biometrika, XXVII (1935),
-466-471.
Pearson, E. S., "A Note on the Distribution of Range in Samples of Size n,"
~iometrika, ;~XV (1933), 415-420.
"On Some Useful 'Inefficient' Statistics," Annals of Hathematical Statis~, XVII (1946), 377-408.
"The Studentized Form of the Extreme
Mean Square in the Analysis of Variance,"
Biometrika, XXXV (1948), l6-?1.
Nair, K. R.,
"The Distribution of the Extreme Deviato
from the Sa;lple 1i ean and its Studentized
Form;" Biometrika, XXV (1948), 118-144.
1)5 1 Nair, K. R.,
"Efficiencies of Certain Linear Systematic Statistics for Estlinating Dispersion
from Normal Samples," Biometrika, XXXVII
(1950), 182-183.
----
[14
1}6
1
Jational Bureau of Standards, "Tables of the Exponential
Function eX, "Applied. l'1e.tbematics Ser"ies,
XIV (1951).
. _._-
1)7
1
Hational Bureau of Standards, "Tables of Normal Probability Functio:.l.s, 11 P..~;:Dl iad !let,hcrl9:.ic8 Serios,
XXIII (1953). -'~"--'-~'-'--'--'--'-
1}8 ]
1}9
I
Pearson, K.,
Uilks, S
s.,
I}o ] T.Tilks, S. s.,
"Tables for Statisticians and Biometricians," Part I, Cambridge University
.£!:.e..~.~, 193°.
Statistics," Princeton
Press, 19L~3.
-
1~''1ath8maticC11
1]ni'\T~~ity
"Order Statistics," Bulletin of American
Mathematical Society, LIV (1948), 6-50.