MATH 38061/MATH48061/MATH68061: MULTIVARIATE STATISTICS Solutions to Problems on Two-Sample Inference

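1. Suppose $n_1 = 45$, $n_2 = 55$,
\[
\bar{x}_1^T = [204.4,\ 556.6], \qquad \bar{x}_2^T = [130.0,\ 355.0],
\]
\[
S_1 = \begin{bmatrix} 13825.3 & 23823.4 \\ 23823.4 & 73107.4 \end{bmatrix}
\quad\text{and}\quad
S_2 = \begin{bmatrix} 8632.0 & 19616.7 \\ 19616.7 & 55964.5 \end{bmatrix}.
\]
Then
\[
S_{\text{pooled}} = \begin{bmatrix} 10963.69 & 21505.42 \\ 21505.42 & 63661.31 \end{bmatrix}.
\]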
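So,
\[
(\bar{x}_1 - \bar{x}_2)^T \left[ \left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{\text{pooled}} \right]^{-1} (\bar{x}_1 - \bar{x}_2)
\]
\[
= ([204.4, 556.6] - [130.0, 355.0]) \left[ \left( \frac{1}{45} + \frac{1}{55} \right) \begin{bmatrix} 10963.69 & 21505.42 \\ 21505.42 & 63661.31 \end{bmatrix} \right]^{-1} ([204.4, 556.6] - [130.0, 355.0])^T
= 16.06622.
\]
But
\[
\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\, n_1 + n_2 - p - 1}(\alpha) = \frac{98 \times 2}{97} F_{2,97}(0.05) = 6.244089.
\]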
So, since $16.06622 > 6.244089$, there is evidence at the 5% level against the hypothesis $\mu_1 - \mu_2 = 0$.
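The arithmetic in Problem 1 can be checked with a short pure-Python sketch. The helper names (`pooled_cov`, `quad_form_inv`, `f_upper_2`) are illustrative and not part of the course material. The critical value uses the closed form $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$, which holds only when the numerator degrees of freedom equal 2, as they do here because $p = 2$.

```python
# Sketch of the two-sample Hotelling T^2 test for Problem 1 (p = 2 only).

def pooled_cov(S1, S2, n1, n2):
    """Pooled covariance ((n1-1) S1 + (n2-1) S2) / (n1 + n2 - 2)."""
    w1, w2, dof = n1 - 1, n2 - 1, n1 + n2 - 2
    return [[(w1 * a + w2 * b) / dof for a, b in zip(r1, r2)]
            for r1, r2 in zip(S1, S2)]

def quad_form_inv(d, M):
    """d^T M^{-1} d for a 2x2 matrix M, using the adjugate formula."""
    (a, b), (c, e) = M
    det = a * e - b * c
    return (e * d[0] ** 2 - (b + c) * d[0] * d[1] + a * d[1] ** 2) / det

def f_upper_2(nu2, alpha):
    """Upper-alpha point of F(2, nu2); closed form valid for 2 numerator d.f. only."""
    return (nu2 / 2) * (alpha ** (-2 / nu2) - 1)

n1, n2, p = 45, 55, 2
xbar1, xbar2 = [204.4, 556.6], [130.0, 355.0]
S1 = [[13825.3, 23823.4], [23823.4, 73107.4]]
S2 = [[8632.0, 19616.7], [19616.7, 55964.5]]

Sp = pooled_cov(S1, S2, n1, n2)
d = [u - v for u, v in zip(xbar1, xbar2)]
scale = 1 / n1 + 1 / n2
T2 = quad_form_inv(d, [[scale * s for s in row] for row in Sp])

nu2 = n1 + n2 - p - 1
crit = (n1 + n2 - 2) * p / nu2 * f_upper_2(nu2, 0.05)
reject = T2 > crit  # True means evidence against mu_1 - mu_2 = 0

print(round(T2, 5), round(crit, 6), reject)
```

For general $p$ a numerical F quantile routine would be needed in place of `f_upper_2`.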
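The 95% simultaneous confidence intervals for the differences in the mean components are
\[
(\bar{x}_{11} - \bar{x}_{21}) \pm \sqrt{\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\, n_1 + n_2 - p - 1}(\alpha)} \sqrt{\left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{11,\text{pooled}}}
\equiv 74.4 \pm \sqrt{6.244089 \times \left( \frac{1}{45} + \frac{1}{55} \right) \times 10963.69}
\]
and
\[
(\bar{x}_{12} - \bar{x}_{22}) \pm \sqrt{\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\, n_1 + n_2 - p - 1}(\alpha)} \sqrt{\left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{22,\text{pooled}}}
\equiv 201.6 \pm \sqrt{6.244089 \times \left( \frac{1}{45} + \frac{1}{55} \right) \times 63661.31}.
\]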
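As a rough numerical check, the two intervals can be evaluated directly. This sketch takes the critical constant 6.244089 and the diagonal of $S_{\text{pooled}}$ from Problem 1 as given, and the variable names are illustrative.

```python
import math

# Inputs copied from Problem 1; names are illustrative only.
n1, n2 = 45, 55
crit = 6.244089                       # (98 * 2 / 97) * F_{2,97}(0.05)
diffs = [204.4 - 130.0, 556.6 - 355.0]
pooled_vars = [10963.69, 63661.31]    # diagonal entries of S_pooled

scale = 1 / n1 + 1 / n2
intervals = []
for d, s_ii in zip(diffs, pooled_vars):
    half = math.sqrt(crit * scale * s_ii)   # half-width of the interval
    intervals.append((d - half, d + half))

for lo, hi in intervals:
    print(f"({lo:.2f}, {hi:.2f})")
```

With these inputs both intervals lie entirely above zero, which agrees with the rejection of $\mu_1 - \mu_2 = 0$.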
2. Consider the data in problem 1. We have
\[
[\bar{x}_1 - \bar{x}_2]^T \left[ \frac{1}{45} S_1 + \frac{1}{55} S_2 \right]^{-1} [\bar{x}_1 - \bar{x}_2] = 15.65853.
\]
But $\chi^2_p(\alpha) = \chi^2_2(0.05) = 5.991$. So, there is evidence against the hypothesis $\mu_1 - \mu_2 = 0$.
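The large-sample statistic of Problem 2 can be sketched the same way. Since a chi-square variable with 2 degrees of freedom is exponential with mean 2, its upper $\alpha$ point is $-2\ln\alpha$; this shortcut is specific to $p = 2$. Variable names are illustrative.

```python
import math

n1, n2 = 45, 55
d = [204.4 - 130.0, 556.6 - 355.0]
S1 = [[13825.3, 23823.4], [23823.4, 73107.4]]
S2 = [[8632.0, 19616.7], [19616.7, 55964.5]]

# Unpooled covariance of the mean difference: V = S1/n1 + S2/n2
V = [[S1[i][j] / n1 + S2[i][j] / n2 for j in range(2)] for i in range(2)]

# d^T V^{-1} d via the 2x2 adjugate (V is symmetric)
det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
stat = (V[1][1] * d[0] ** 2 - 2 * V[0][1] * d[0] * d[1]
        + V[0][0] * d[1] ** 2) / det

crit = -2 * math.log(0.05)   # upper 5% point of chi-square with 2 d.f.
reject = stat > crit

print(round(stat, 5), round(crit, 3), reject)
```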
3. Municipal wastewater treatment plants are required by law to monitor their discharges into
rivers and streams on a regular basis. Concern about the reliability of data from one of these
self-monitoring programs led to a study in which samples of effluent were divided and sent
to two laboratories for testing. One half of each sample was sent to the Wisconsin State
Laboratory of Hygiene and one half was sent to a private commercial laboratory routinely
used in the monitoring program. Measurements of biochemical oxygen demand (BOD) and
suspended solids (SS) were obtained, for n = 11 sample splits, from the two laboratories. The
data are displayed below.
              Commercial lab                  State lab
Sample j   x_{11j} (BOD)   x_{12j} (SS)   x_{21j} (BOD)   x_{22j} (SS)
   1             6              27              25              15
   2             6              23              28              13
   3            18              64              36              22
   4             8              44              35              29
   5            11              30              15              31
   6            34              75              44              64
   7            28              26              42              30
   8            71             124              54              64
   9            43              54              34              56
  10            33              30              29              20
  11            20              14              39              21
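For this data,
\[
\bar{d} = [-9.363636,\ 13.272727],
\]
\[
S_d = \begin{bmatrix} 199.2545 & 88.30909 \\ 88.30909 & 418.6182 \end{bmatrix}
\quad\text{and}\quad
S_d^{-1} = \begin{bmatrix} 0.005536320 & -0.001167908 \\ -0.001167908 & 0.002635186 \end{bmatrix}.
\]
So,
\[
T^2 = n\, \bar{d}^T S_d^{-1} \bar{d} = 13.63931.
\]
But
\[
\frac{(n - 1)p}{n - p} F_{p,\, n - p}(\alpha) = \frac{20}{9} F_{2,9}(0.05) = 9.458877.
\]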
Hence, since $T^2 = 13.63931 > 9.458877$, there is evidence at the 5% level that the two laboratories' chemical analyses do not agree.
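The paired analysis can be reproduced from the raw data. In this sketch the differences are taken as commercial lab minus state lab, which matches the signs of $\bar{d}$ above. Variable names are illustrative, and the F critical value again relies on the closed form for 2 numerator degrees of freedom.

```python
# (BOD, SS) pairs for the 11 sample splits, copied from the data table.
commercial = [(6, 27), (6, 23), (18, 64), (8, 44), (11, 30), (34, 75),
              (28, 26), (71, 124), (43, 54), (33, 30), (20, 14)]
state = [(25, 15), (28, 13), (36, 22), (35, 29), (15, 31), (44, 64),
         (42, 30), (54, 64), (34, 56), (29, 20), (39, 21)]

# Paired differences d_j = commercial_j - state_j
diffs = [(c[0] - s[0], c[1] - s[1]) for c, s in zip(commercial, state)]
n, p, alpha = len(diffs), 2, 0.05

dbar = [sum(col) / n for col in zip(*diffs)]

def cov(i, j):
    """Sample covariance (divisor n - 1) of difference components i and j."""
    return sum((r[i] - dbar[i]) * (r[j] - dbar[j]) for r in diffs) / (n - 1)

Sd = [[cov(0, 0), cov(0, 1)], [cov(1, 0), cov(1, 1)]]

# T^2 = n * dbar^T Sd^{-1} dbar via the 2x2 adjugate
det = Sd[0][0] * Sd[1][1] - Sd[0][1] * Sd[1][0]
T2 = n * (Sd[1][1] * dbar[0] ** 2 - 2 * Sd[0][1] * dbar[0] * dbar[1]
          + Sd[0][0] * dbar[1] ** 2) / det

# ((n-1)p/(n-p)) * F_{2,n-p}(alpha); closed form valid for p = 2 only
nu2 = n - p
crit = (n - 1) * p / nu2 * (nu2 / 2) * (alpha ** (-2 / nu2) - 1)

print(dbar, round(T2, 5), round(crit, 6), T2 > crit)
```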
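4. A 95% joint confidence region for the mean difference vector $\delta$ using the effluent data is
\[
n (\bar{d} - \delta)^T S_d^{-1} (\bar{d} - \delta)
= ([-9.363636, 13.272727] - \delta^T)
\begin{bmatrix} 0.06089952 & -0.01284698 \\ -0.01284698 & 0.02898705 \end{bmatrix}
([-9.363636, 13.272727] - \delta^T)^T
\le 9.458877,
\]
where the matrix shown is $n S_d^{-1}$ with $n = 11$ and $9.458877 = \frac{20}{9} F_{2,9}(0.05)$. This can be rewritten as
\[
0.06089952(-9.363636 - \delta_1)^2 + 0.02898705(13.272727 - \delta_2)^2 - 2 \times 0.01284698(-9.363636 - \delta_1)(13.272727 - \delta_2) \le 9.458877.
\]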
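A small sketch for testing whether a candidate $\delta$ lies in the region $\{\delta : n(\bar{d} - \delta)^T S_d^{-1}(\bar{d} - \delta) \le 9.458877\}$ is given below. It assumes the matrix `A` holds $n S_d^{-1}$ with $n = 11$, that is, 11 times the inverse reported in Problem 3. The function names are illustrative.

```python
dbar = [-9.363636, 13.272727]
# A = n * S_d^{-1} with n = 11 (11 times the inverse from Problem 3)
A = [[0.06089952, -0.01284698],
     [-0.01284698, 0.02898705]]
crit = 9.458877   # (20/9) * F_{2,9}(0.05)

def quad(delta):
    """Value of (dbar - delta)^T A (dbar - delta)."""
    u0, u1 = dbar[0] - delta[0], dbar[1] - delta[1]
    return A[0][0] * u0 ** 2 + 2 * A[0][1] * u0 * u1 + A[1][1] * u1 ** 2

def in_region(delta):
    """True if delta lies inside the 95% joint confidence region."""
    return quad(delta) <= crit

print(quad([0.0, 0.0]), in_region([0.0, 0.0]), in_region(dbar))
```

At $\delta = 0$ the quadratic form reproduces $T^2 \approx 13.64$, so the origin falls outside the region, consistent with the test in Problem 3.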
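5. Fifty bars of soap are manufactured in each of two ways. Two characteristics $X_1$ = lather
and $X_2$ = mildness are measured. The summary statistics for bars produced by methods 1
and 2 are
\[
\bar{x}_1 = \begin{bmatrix} 8.3 \\ 4.1 \end{bmatrix}, \qquad
\bar{x}_2 = \begin{bmatrix} 10.2 \\ 3.9 \end{bmatrix},
\]
\[
S_1 = \begin{bmatrix} 2 & 1 \\ 1 & 6 \end{bmatrix}
\quad\text{and}\quad
S_2 = \begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix}.
\]
Then
\[
S_{\text{pooled}} = \begin{bmatrix} 2 & 1 \\ 1 & 5 \end{bmatrix}.
\]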
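So, a 95% confidence region for $\delta = \mu_1 - \mu_2$ is
\[
(\bar{x}_1 - \bar{x}_2 - \delta)^T \left[ \left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{\text{pooled}} \right]^{-1} (\bar{x}_1 - \bar{x}_2 - \delta)
\le \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\, n_1 + n_2 - p - 1}(\alpha)
\]
\[
\Leftrightarrow
\left( \begin{bmatrix} 8.3 \\ 4.1 \end{bmatrix} - \begin{bmatrix} 10.2 \\ 3.9 \end{bmatrix} - \delta \right)^T
\left[ \left( \frac{1}{50} + \frac{1}{50} \right) \begin{bmatrix} 2 & 1 \\ 1 & 5 \end{bmatrix} \right]^{-1}
\left( \begin{bmatrix} 8.3 \\ 4.1 \end{bmatrix} - \begin{bmatrix} 10.2 \\ 3.9 \end{bmatrix} - \delta \right)
\le \frac{98 \times 2}{97} F_{2,97}(0.05) = 6.244089
\]
\[
\Leftrightarrow
\left( \begin{bmatrix} -1.9 \\ 0.2 \end{bmatrix} - \delta \right)^T
\begin{bmatrix} 13.888889 & -2.777778 \\ -2.777778 & 5.555556 \end{bmatrix}
\left( \begin{bmatrix} -1.9 \\ 0.2 \end{bmatrix} - \delta \right)
\le 6.244089,
\]
which can be rewritten as $13.888889(-1.9 - \delta_1)^2 + 5.555556(0.2 - \delta_2)^2 - 2 \times 2.777778(-1.9 - \delta_1)(0.2 - \delta_2) \le 6.244089$.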
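The matrix inverse and the region in Problem 5 can be checked with the sketch below. It takes $S_{\text{pooled}}$ and the critical constant 6.244089 as given, and the names are illustrative.

```python
n1 = n2 = 50
Sp = [[2.0, 1.0], [1.0, 5.0]]      # pooled covariance from Problem 5
crit = 6.244089                    # (98 * 2 / 97) * F_{2,97}(0.05)

# M = (1/n1 + 1/n2) * S_pooled and its inverse via the 2x2 adjugate
scale = 1 / n1 + 1 / n2
M = [[scale * s for s in row] for row in Sp]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
Minv = [[M[1][1] / det, -M[0][1] / det],
        [-M[1][0] / det, M[0][0] / det]]

centre = [8.3 - 10.2, 4.1 - 3.9]   # xbar_1 - xbar_2

def in_region(delta):
    """True if delta lies in the 95% confidence region for mu_1 - mu_2."""
    u0, u1 = centre[0] - delta[0], centre[1] - delta[1]
    q = Minv[0][0] * u0 ** 2 + 2 * Minv[0][1] * u0 * u1 + Minv[1][1] * u1 ** 2
    return q <= crit

print(Minv, in_region([0.0, 0.0]))
```

With these inputs $\delta = 0$ is not in the region, so the data would also reject $\mu_1 = \mu_2$ at the 5% level.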
6. A researcher considered three indices measuring severity of heart attacks. The values of these
indices for n = 40 heart-attack patients arriving at a hospital emergency room produced the
summary statistics:
\[
\bar{x}^T = [46.1,\ 57.3,\ 50.4]
\quad\text{and}\quad
S = \begin{bmatrix} 101.3 & 63.0 & 71.0 \\ 63.0 & 80.2 & 55.6 \\ 71.0 & 55.6 & 97.4 \end{bmatrix}.
\]
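All three indices are evaluated for each patient. For this data,
\[
C = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{bmatrix}, \qquad
(C\bar{x})^T = [-11.2,\ -4.3],
\]
\[
C S C^T = \begin{bmatrix} 55.5 & 22.9 \\ 22.9 & 56.7 \end{bmatrix}
\quad\text{and}\quad
\left( C S C^T \right)^{-1} = \begin{bmatrix} 0.021621086 & -0.008732326 \\ -0.008732326 & 0.021163497 \end{bmatrix}.
\]
So,
\[
T^2 = n\, (C\bar{x})^T \left( C S C^T \right)^{-1} (C\bar{x}) = 90.49458.
\]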
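But, with $q = 3$ indices and $n = 40$,
\[
\frac{(n - 1)(q - 1)}{n - q + 1} F_{q - 1,\, n - q + 1}(\alpha) = \frac{39 \times 2}{38} F_{2,38}(0.05) = 6.6604.
\]
Hence, since $90.49458 > 6.6604$, there is evidence against the equality of mean indices.
Simultaneous 95% confidence intervals for the differences in pairs of mean indices are
\[
(1, -1, 0)\,\bar{x} \pm \sqrt{\frac{6.6604 \times (1, -1, 0)\, S\, (1, -1, 0)^T}{40}},
\]
\[
(1, 0, -1)\,\bar{x} \pm \sqrt{\frac{6.6604 \times (1, 0, -1)\, S\, (1, 0, -1)^T}{40}}
\]
and
\[
(0, 1, -1)\,\bar{x} \pm \sqrt{\frac{6.6604 \times (0, 1, -1)\, S\, (0, 1, -1)^T}{40}}.
\]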
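The contrast calculations in Problem 6 can be sketched as follows. The critical constant is evaluated from the formula $\frac{(n-1)(q-1)}{n-q+1} F_{q-1,\,n-q+1}(\alpha)$ with $q = 3$, so the F distribution has $(2, 38)$ degrees of freedom and the closed form for 2 numerator degrees of freedom applies. Helper names (`dot`, `quad`) are illustrative.

```python
import math

n, q, alpha = 40, 3, 0.05
xbar = [46.1, 57.3, 50.4]
S = [[101.3, 63.0, 71.0],
     [63.0, 80.2, 55.6],
     [71.0, 55.6, 97.4]]
C = [[1, -1, 0], [1, 0, -1]]       # contrast matrix

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def quad(u, M, v):
    """Bilinear form u^T M v."""
    return sum(u[i] * M[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

Cx = [dot(row, xbar) for row in C]                   # C xbar
V = [[quad(ri, S, rj) for rj in C] for ri in C]      # C S C^T (2x2)

# T^2 = n (C xbar)^T (C S C^T)^{-1} (C xbar) via the 2x2 adjugate
det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
T2 = n * (V[1][1] * Cx[0] ** 2 - 2 * V[0][1] * Cx[0] * Cx[1]
          + V[0][0] * Cx[1] ** 2) / det

# Critical constant; the closed form below needs q - 1 == 2
nu2 = n - q + 1
f_upper = (nu2 / 2) * (alpha ** (-2 / nu2) - 1)
crit = (n - 1) * (q - 1) / nu2 * f_upper

# Simultaneous intervals for the three pairwise differences
contrasts = [(1, -1, 0), (1, 0, -1), (0, 1, -1)]
intervals = []
for c in contrasts:
    centre = dot(c, xbar)
    half = math.sqrt(crit * quad(c, S, c) / n)
    intervals.append((centre - half, centre + half))

print(round(T2, 5), round(crit, 4))
for c, (lo, hi) in zip(contrasts, intervals):
    print(c, f"({lo:.2f}, {hi:.2f})")
```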
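7. Observations on two responses are collected for two treatments. The observation vectors
$(x_1, x_2)^T$ are
\[
\begin{bmatrix} 3 & 1 & 2 \\ 3 & 6 & 3 \end{bmatrix}
\]
for treatment 1, and
\[
\begin{bmatrix} 2 & 5 & 3 & 2 \\ 3 & 1 & 1 & 3 \end{bmatrix}
\]
for treatment 2. For this data, $n_1 = 3$, $n_2 = 4$,
\[
\bar{x}_1 = \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \qquad
\bar{x}_2 = \begin{bmatrix} 3 \\ 2 \end{bmatrix},
\]
\[
S_1 = \begin{bmatrix} 1 & -1.5 \\ -1.5 & 3 \end{bmatrix}
\quad\text{and}\quad
S_2 = \begin{bmatrix} 2 & -1.33333 \\ -1.33333 & 1.33333 \end{bmatrix}.
\]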
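Then
\[
S_{\text{pooled}} = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2} = \frac{2 S_1 + 3 S_2}{5}
= \begin{bmatrix} 1.6 & -1.4 \\ -1.4 & 2 \end{bmatrix}.
\]
So,
\[
(\bar{x}_1 - \bar{x}_2)^T \left[ \left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{\text{pooled}} \right]^{-1} (\bar{x}_1 - \bar{x}_2) = 3.870968.
\]
But
\[
\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\, n_1 + n_2 - p - 1}(\alpha) = \frac{10}{4} F_{2,4}(0.01) = 45.
\]
So, since $3.870968 < 45$, there is no evidence against the hypothesis $H_0 : \mu_1 - \mu_2 = 0$.
The 99% simultaneous confidence intervals for the differences $\mu_{1i} - \mu_{2i}$ for $i = 1, 2$ are
\[
(\bar{x}_{11} - \bar{x}_{21}) \pm \sqrt{\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\, n_1 + n_2 - p - 1}(\alpha)} \sqrt{\left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{11,\text{pooled}}}
\equiv -1 \pm \sqrt{45 \times \left( \frac{1}{3} + \frac{1}{4} \right) \times 1.6}
\]
and
\[
(\bar{x}_{12} - \bar{x}_{22}) \pm \sqrt{\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\, n_1 + n_2 - p - 1}(\alpha)} \sqrt{\left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{22,\text{pooled}}}
\equiv 2 \pm \sqrt{45 \times \left( \frac{1}{3} + \frac{1}{4} \right) \times 2}.
\]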
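Because Problem 7 supplies the raw observations, every quantity can be recomputed from scratch. The sketch below does so in pure Python. The helper `mean_cov` and the variable names are illustrative, and the F critical value uses the closed form for 2 numerator degrees of freedom, which gives $F_{2,4}(0.01) = 18$.

```python
import math

# Observation vectors (x1, x2) for each treatment, one tuple per unit.
t1 = [(3, 3), (1, 6), (2, 3)]
t2 = [(2, 3), (5, 1), (3, 1), (2, 3)]

def mean_cov(rows):
    """Sample size, mean vector and covariance matrix (divisor n - 1)."""
    n = len(rows)
    m = [sum(col) / n for col in zip(*rows)]
    S = [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
          for j in range(2)] for i in range(2)]
    return n, m, S

n1, m1, S1 = mean_cov(t1)
n2, m2, S2 = mean_cov(t2)
p, alpha = 2, 0.01

dof = n1 + n2 - 2
Sp = [[((n1 - 1) * S1[i][j] + (n2 - 1) * S2[i][j]) / dof
       for j in range(2)] for i in range(2)]

d = [m1[0] - m2[0], m1[1] - m2[1]]
scale = 1 / n1 + 1 / n2
M = [[scale * s for s in row] for row in Sp]

# T^2 = d^T M^{-1} d via the 2x2 adjugate
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
T2 = (M[1][1] * d[0] ** 2 - 2 * M[0][1] * d[0] * d[1]
      + M[0][0] * d[1] ** 2) / det

nu2 = n1 + n2 - p - 1
crit = dof * p / nu2 * (nu2 / 2) * (alpha ** (-2 / nu2) - 1)

# Half-widths of the 99% simultaneous intervals for each component
half_widths = [math.sqrt(crit * scale * Sp[i][i]) for i in range(2)]

print(Sp, round(T2, 6), round(crit, 6), T2 > crit)
print([(d[i] - half_widths[i], d[i] + half_widths[i]) for i in range(2)])
```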