Winter 14 – AMS225 Homework 3 Solution

1. (a) By differentiation, we see that for a fixed B, an optimal µ must satisfy

       0 = A^T A (µ_Y − µ − B^T µ_X).

   This equation is satisfied if µ = µ_Y − B^T µ_X. Substituting into the objective function,
   we see that an optimal B minimizes

       E ||A[(Y − µ_Y) − B^T (X − µ_X)]||_2^2.

   Differentiating once more, we have that an optimal B satisfies

       0 = (Σ_XY − Σ_XX B) A^T A.

   This equation is satisfied if B = Σ_XX^{-1} Σ_XY. Looking back, we see that
   (µ*, B*) = (µ_Y − (B*)^T µ_X, Σ_XX^{-1} Σ_XY) satisfy the two equations above, so they minimize

       E ||A(Y − µ − B^T X)||_2^2.

   However, they may not be the unique minimizers.
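As a sanity check (my addition, not part of the graded solution), a short numpy simulation can confirm that the candidate pair µ* = µ_Y − (B*)^T µ_X, B* = Σ_XX^{-1} Σ_XY attains a lower objective value than a perturbed pair; the dimensions, seed, and matrix A below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 3, 2, 50_000   # dim(X), dim(Y), Monte Carlo sample size (arbitrary)

# Random joint covariance of (X, Y) and random means.
G = rng.normal(size=(p + q, p + q))
Sigma = G @ G.T + np.eye(p + q)                # positive definite
mu_X, mu_Y = rng.normal(size=p), rng.normal(size=q)
Sigma_XX, Sigma_XY = Sigma[:p, :p], Sigma[:p, p:]

B_star = np.linalg.solve(Sigma_XX, Sigma_XY)   # B* = Sigma_XX^{-1} Sigma_XY
mu_star = mu_Y - B_star.T @ mu_X               # mu* = mu_Y - B*^T mu_X

# Approximate E||A(Y - mu - B^T X)||_2^2 by a sample mean.
Z = rng.multivariate_normal(np.concatenate([mu_X, mu_Y]), Sigma, size=n)
X, Y = Z[:, :p], Z[:, p:]
A = rng.normal(size=(q, q))                    # an arbitrary fixed A

def objective(mu, B):
    R = (Y - mu - X @ B) @ A.T                 # row i is A(Y_i - mu - B^T X_i)
    return np.mean(np.sum(R**2, axis=1))

opt, worse = objective(mu_star, B_star), objective(mu_star + 0.5, B_star + 0.5)
print(opt <= worse)   # expected: True
```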
   (b) From the previous part, the optimal value of E ||A(Y − µ − B^T X)||_2^2 is attained at
   (µ*, B*), and

       E ||A(Y − µ* − (B*)^T X)||_2^2 = Tr{A(Σ_YY − Σ_YX Σ_XX^{-1} Σ_XY)A^T}.

   Since this value is optimal for every A ≥ 0,

       Tr{A(Σ_YY − Σ_YX Σ_XX^{-1} Σ_XY)A^T} ≤ E ||A(Y − µ − B^T X)||_2^2
                                            = E Tr{A(Y − µ − B^T X)(Y − µ − B^T X)^T A^T}

   for all A ≥ 0. Now let A = aa^T, where a ∈ R^q and ||a||_2 = 1. Then aa^T aa^T = aa^T,
   a^T a = 1, and the above inequality implies

       a^T (Σ_YY − Σ_YX Σ_XX^{-1} Σ_XY) a ≤ a^T [E(Y − µ − B^T X)(Y − µ − B^T X)^T] a.

   This is true for all unit vectors a and, by homogeneity, it remains true for all a ∈ R^q.
   So the above inequality implies

       Σ_YY − Σ_YX Σ_XX^{-1} Σ_XY ≤ E(Y − µ − B^T X)(Y − µ − B^T X)^T.
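This Loewner-order inequality can also be checked in closed form (an added illustration; the random dimensions, means, and choice of (µ, B) are my own): the second moment E(Y − µ − B^T X)(Y − µ − B^T X)^T equals the covariance of Y − B^T X plus the outer product of its mean shift, so its gap over the Schur complement should be positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 4, 3
G = rng.normal(size=(p + q, p + q))
Sigma = G @ G.T + np.eye(p + q)                 # joint covariance of (X, Y)
mu_X, mu_Y = rng.normal(size=p), rng.normal(size=q)

Sigma_XX, Sigma_XY = Sigma[:p, :p], Sigma[:p, p:]
Sigma_YX, Sigma_YY = Sigma[p:, :p], Sigma[p:, p:]
schur = Sigma_YY - Sigma_YX @ np.linalg.solve(Sigma_XX, Sigma_XY)

mu, B = rng.normal(size=q), rng.normal(size=(p, q))   # arbitrary (mu, B)
m = mu_Y - mu - B.T @ mu_X                      # mean of Y - mu - B^T X
cov = Sigma_YY - Sigma_YX @ B - B.T @ Sigma_XY + B.T @ Sigma_XX @ B
second_moment = cov + np.outer(m, m)            # E(...)(...)^T in closed form

gap = second_moment - schur
min_eig = np.linalg.eigvalsh(gap).min()
print(min_eig >= -1e-8)   # expected: True
```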
   (c) First, we will show that if C ≥ D and D > 0, then det(C) ≥ det(D).

   Proof. If C ≥ D, then x^T (C − D) x ≥ 0 for all x ∈ R^q. Since D is non-singular, we may
   take x = D^{-1/2} y for any y ∈ R^q and conclude that

       y^T (D^{-1/2} C D^{-1/2} − I_q) y ≥ 0

   for all y ∈ R^q. Of course, this means that the eigenvalues of D^{-1/2} C D^{-1/2} are no less
   than 1, and so

       1 ≤ det(D^{-1/2} C D^{-1/2}) = det(D^{-1}) det(C).

   Since D > 0 by assumption, det(D^{-1}) > 0, and the above inequality implies det(D) ≤ det(C).

   Now, specializing to our particular problem, we take C = E(Y − µ − B^T X)(Y − µ − B^T X)^T
   and D = Σ_YY − Σ_YX Σ_XX^{-1} Σ_XY, and verify that D > 0 because it is the Schur
   complement within a positive definite matrix. The result of the previous part is that
   C ≥ D, so the lemma gives

       det(Σ_YY − Σ_YX Σ_XX^{-1} Σ_XY) ≤ det{E(Y − µ − B^T X)(Y − µ − B^T X)^T}.
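The determinant lemma is easy to spot-check numerically (an added sketch with randomly generated matrices, not part of the original solution): build D positive definite and C = D plus a positive semi-definite matrix, so C ≥ D by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
q = 5
Gd = rng.normal(size=(q, q))
D = Gd @ Gd.T + np.eye(q)        # positive definite
Gp = rng.normal(size=(q, q - 2))
C = D + Gp @ Gp.T                # C = D + (PSD matrix), so C >= D

print(np.linalg.det(C) >= np.linalg.det(D))   # expected: True
```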
2. (a) Stacking the two estimated coefficient vectors, we have

       (b̂_1; b̂_2) ∼ MVN_{2q}( (b_1; b_2), (X^T X)^{-1} ⊗ Σ ).
   (b) Define

       δ̂ = (I_q ⊗ [1, −1]) (b̂_1; b̂_2).

   Then our δ̂ follows a multivariate normal distribution with

       E(δ̂) = δ = b_1 − b_2,

   and, by the mixed-product property of the Kronecker product,

       Var(δ̂) = (I_q ⊗ [1, −1]) {(X^T X)^{-1} ⊗ Σ} (I_q ⊗ [1, −1])^T
               = (X^T X)^{-1} ⊗ ([1, −1] Σ [1, −1]^T)
               = (X^T X)^{-1} (σ11 + σ22 − 2σ12),

   where

       Σ = [ σ11  σ12
             σ12  σ22 ].
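The Kronecker-product step can be verified numerically (an added check; X, Σ, and the dimensions below are arbitrary): the contrast matrix (I ⊗ [1, −1]) collapses (X^T X)^{-1} ⊗ Σ to (X^T X)^{-1}(σ11 + σ22 − 2σ12).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 4
X = rng.normal(size=(n, p))
K = np.linalg.inv(X.T @ X)                        # (X^T X)^{-1}

G = rng.normal(size=(2, 2))
Sigma = G @ G.T + np.eye(2)                       # 2x2 error covariance
s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]

L = np.kron(np.eye(p), np.array([[1.0, -1.0]]))   # p x 2p contrast matrix
lhs = L @ np.kron(K, Sigma) @ L.T                 # L {K ⊗ Sigma} L^T
rhs = K * (s11 + s22 - 2 * s12)
print(np.allclose(lhs, rhs))   # expected: True
```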
   (c) Since δ̂ ∼ MVN_q(δ, (X^T X)^{-1}(σ11 + σ22 − 2σ12)),

       (δ̂ − δ)^T (X^T X) (δ̂ − δ) / (σ11 + σ22 − 2σ12) ∼ χ²_q.

   We know that n S_n ∼ W_2(Σ, n − q) and that S_n is independent of δ̂, so

       n(σ̂11 + σ̂22 − 2σ̂12) / (σ11 + σ22 − 2σ12) ∼ W_1(1, n − q), which is the χ²_{n−q}
       distribution,

   where

       S_n = [ σ̂11  σ̂12
               σ̂12  σ̂22 ].

   So, taking the ratio of the two independent chi-squared variates divided by their degrees
   of freedom, we have

       ((n − q)/(nq)) · (δ̂ − δ)^T (X^T X) (δ̂ − δ) / (σ̂11 + σ̂22 − 2σ̂12) ∼ F_{q,n−q}.

   Now we invert this quantity to get a confidence region for δ. The set

       { δ : (δ̂ − δ)^T (X^T X) (δ̂ − δ) / (σ̂11 + σ̂22 − 2σ̂12) ≤ (nq/(n − q)) F_{q,n−q}(α) }

   is a (1 − α) × 100% confidence region for δ = b_1 − b_2.
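A Monte Carlo sketch (my addition; the design, dimensions, and seed are arbitrary, and S_n is scaled so that n S_n ∼ W_2(Σ, n − q)) can confirm that the region attains approximately its advertised coverage:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(4)
n, q, alpha, reps = 40, 3, 0.10, 2000

X = rng.normal(size=(n, q))
B = rng.normal(size=(q, 2))                    # true coefficients [b1, b2]
delta = B[:, 0] - B[:, 1]
G = rng.normal(size=(2, 2))
Sigma = G @ G.T + np.eye(2)                    # 2x2 error covariance
chol = np.linalg.cholesky(Sigma)
XtX = X.T @ X
crit = (n * q / (n - q)) * f.ppf(1 - alpha, q, n - q)

hits = 0
for _ in range(reps):
    Y = X @ B + rng.normal(size=(n, 2)) @ chol.T
    Bhat = np.linalg.solve(XtX, X.T @ Y)
    dhat = Bhat[:, 0] - Bhat[:, 1]
    Sn = (Y - X @ Bhat).T @ (Y - X @ Bhat) / n     # n*Sn ~ W2(Sigma, n-q)
    shat = Sn[0, 0] + Sn[1, 1] - 2 * Sn[0, 1]
    stat = (dhat - delta) @ XtX @ (dhat - delta) / shat
    hits += stat <= crit
print(abs(hits / reps - (1 - alpha)) < 0.03)   # expected: True
```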
3. The column span of the design matrix includes the vector of 1s, because each row of X sums to
1. So we should not include an (extra) intercept in this regression. Using R, we can compute
the least squares fits by the commands:
> library(MMST)
> data(root.stocks)
> obj <- lm(cbind(Y1, Y2, Y3, Y4) ~ 0 + type, data = root.stocks)
   (a) The estimated regression coefficient matrix is

       > obj

       Call:
       lm(formula = cbind(Y1, Y2, Y3, Y4) ~ 0 + type, data = root.stocks)

       Coefficients:
                  Y1       Y2       Y3      Y4
       typeI      113.75   2977.12  373.88   871.12
       typeII     115.75   3109.12  451.50  1280.50
       typeIII    110.75   2815.25  445.50  1391.37
       typeIV     109.75   2879.75  390.62  1039.00
       typeIX      85.87   1675.00  235.62   369.87
       typeV      108.00   2557.25  431.25  1181.00
       typeVI     103.62   2214.62  359.62   735.00
       typeVII    105.25   2899.37  325.00   659.00
       typeX      114.75   2998.25  371.12   841.12
       typeXII    135.62   4699.37  477.25  1897.37
       typeXIII   113.87   3055.50  391.50  1081.37
       typeXV     122.00   3628.88  448.37  1289.25
       typeXVI    121.37   4047.25  450.75  1695.62
   (b) An unbiased estimate of the error covariance matrix is [104/(104 − 13)] S_n:

       > round(crossprod(resid(obj)) / (104-13), digits=1)
              Y1       Y2      Y3      Y4
       Y1   64.6   4100.9   164.7   814.6
       Y2 4100.9 356106.1 13794.6 75982.3
       Y3  164.7  13794.6  1127.5  6565.0
       Y4  814.6  75982.3  6565.0 47838.2
4. We estimate the standard errors by the square roots of the diagonal of (X^T X)^{-1} ⊗ Σ̂. Since
   the design is orthogonal, i.e. X^T X = 8 I_13, the estimated standard errors are equal for
   each coefficient within each variate:

       > sqrt(diag((crossprod(resid(obj)) / (104-13))/8))
               Y1         Y2         Y3         Y4
         2.842083 210.981666  11.871782  77.329000

   Alternatively, you could use summary(obj) or sqrt(diag(vcov(obj))) to get the estimated
   standard errors.
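The same computation can be mimicked in numpy (an added illustration; the simulated one-way design with 13 levels and 8 replicates each is an assumption standing in for the root.stocks data): with an orthogonal design, the diagonal of (X^T X)^{-1} ⊗ Σ̂ repeats σ̂_jj / 8, so every coefficient within a variate shares one standard error.

```python
import numpy as np

rng = np.random.default_rng(5)
levels, reps, m = 13, 8, 4                        # 13 types, 8 reps, 4 responses
X = np.kron(np.eye(levels), np.ones((reps, 1)))   # 104 x 13 indicator matrix
assert np.allclose(X.T @ X, reps * np.eye(levels))   # orthogonal design

Y = X @ rng.normal(size=(levels, m)) + rng.normal(size=(levels * reps, m))
Bhat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ Bhat
Sigma_hat = resid.T @ resid / (levels * reps - levels)   # unbiased, df = 104 - 13

# Var(coefficients) = (X^T X)^{-1} kron Sigma_hat; its diagonal repeats s_jj / 8.
se = np.sqrt(np.diag(np.kron(np.linalg.inv(X.T @ X), Sigma_hat)))
print(np.allclose(se.reshape(levels, m), se[:m]))   # expected: True
```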