BMED2803 BV
Two Sample t-Test
Dependent Samples
Formulas. The sample from population 1, $X_{11}, X_{12}, \dots, X_{1n}$, is paired with the sample from
population 2, $X_{21}, X_{22}, \dots, X_{2n}$, such that $(X_{1i}, X_{2i})$ represents the $i$th pair. Usually, the observations in
a pair are taken on the same subject or on dependent subjects (brother-sister, two subjects with the same IQ,
etc.). Examples are numerous: pretest-posttest, measure before treatment-measure after treatment, etc.
It is assumed that the samples come from normal populations with possibly different means (the subject of
the test) and with the same, but unknown, variances. Then the differences $d_i = X_{1i} - X_{2i}$ are also normal. Define
$$t = \frac{\bar{d}}{s_d/\sqrt{n}},$$
where $\bar{d}$ is the average of the differences $d_i$ and $s_d$ is the sample standard deviation of the differences. The
sample size $n$ is equal to the number of pairs.
We are interested in testing $H_0: \mu_1 = \mu_2$ versus one of the alternatives $H_1: \mu_1 >, \neq, < \mu_2$, and the
test statistic $t$ has Student t distribution with $n - 1$ degrees of freedom when $H_0$ is true.
This test coincides with the one-sample t-test, where the sample consists of all the differences and we are testing that the
mean in the population of differences is equal to 0.
Alternative                                              α-level rejection region                                             p-value
$H_1: \mu_1 > \mu_2$, i.e., $\mu_1 - \mu_2 > 0$          $[t_{n-1,1-\alpha}, \infty)$                                         1-tcdf(t, n-1)
$H_1: \mu_1 \neq \mu_2$, i.e., $\mu_1 - \mu_2 \neq 0$    $(-\infty, t_{n-1,\alpha/2}] \cup [t_{n-1,1-\alpha/2}, \infty)$      2*tcdf(-abs(t), n-1)
$H_1: \mu_1 < \mu_2$, i.e., $\mu_1 - \mu_2 < 0$          $(-\infty, t_{n-1,\alpha}]$                                          tcdf(t, n-1)
Controlling Blood Pressure. In the past, many bodily functions were thought to be beyond conscious
control. However, recent experimentation suggests that it may be possible for a person to control certain
body functions if that person is trained in a program of biofeedback exercises. An experiment is conducted
to show that blood pressure levels can be consciously reduced in people trained in this program. The blood
pressure measurements (in millimeters of mercury) listed in the table represent readings before and after the
biofeedback training of five subjects.
Subject    1     2     3     4     5
Before     137   201   167   150   173
After      130   180   150   153   162
(a) If we want to test whether the mean blood pressure decreases after the training, what are the appropriate null and alternative hypotheses?
(b) Perform the test in (a) with α = 0.05.
(c) What assumptions are needed to ensure the validity of the results?
[(a) $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$ or, in terms of differences, $H_0: \mu_1 - \mu_2 = 0$ versus
$H_1: \mu_1 - \mu_2 > 0$.
(b) To follow the alternative $H_1$, the differences $d_i$ should be taken as $X_{1i} - X_{2i}$. Here, $d_i = \{7, 21, 17, -3, 11\}$,
$\bar{d} = 10.6$, $s_d = 9.32$, $t = 10.6/(9.32/\sqrt{5}) = 2.54$, and $t_{4,0.95} = 2.131847$. Since $t = 2.54 > 2.13$, $H_0$ is rejected
at the 5% level.
(c) Variances are the same, normal distributions.]
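The computation in (b) can be reproduced with a few lines of MATLAB. This is a minimal sketch in the style of the other snippets in these notes; it assumes the Statistics Toolbox functions tinv and tcdf are available, and the variable names (before, after, tcrit, reject) are chosen here for illustration.

```matlab
% Paired t-test for the biofeedback data (values copied from the table above).
before = [137 201 167 150 173];
after  = [130 180 150 153 162];

d    = before - after;          % differences X1i - X2i, matching H1: mu1 > mu2
n    = length(d);               % number of pairs
dbar = mean(d);                 % average difference
sd   = std(d);                  % sample standard deviation of the differences
t    = dbar/(sd/sqrt(n));       % paired t statistic, n-1 degrees of freedom

tcrit  = tinv(1 - 0.05, n-1);   % one-sided critical value t_{n-1,0.95}
pval   = 1 - tcdf(t, n-1);      % one-sided p-value
reject = t > tcrit;             % true when H0 is rejected at alpha = 0.05

fprintf('dbar = %.2f, sd = %.2f, t = %.2f, tcrit = %.4f\n', dbar, sd, t, tcrit)
```

The decision can be read either from reject (rejection-region method) or by comparing pval with 0.05; both lead to the same conclusion.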
Marijuana. Investigators have studied the effects of marijuana on human physiology. One common belief
held by laypersons is that marijuana affects pupil size. Weil et al. (1968) studied a number of subjects. Each was
administered a high dose of marijuana by smoking a potent marijuana cigarette. The subjects were all males,
21 to 26 years of age, all of whom smoked tobacco cigarettes regularly but had never tried marijuana. In
this study, pupil size was measured with a millimeter rule under constant illumination with eyes focused on
an object at a constant distance. Pupil size was measured before and after smoking marijuana. Part of the
data is given below.
Individual          1   2   3   4   5   6
Before marijuana    6   5   3   3   5   3
After marijuana     6   7   9   5   9   9

1. Describe the hypotheses of interest for testing. (Hint: the alternative should be one-sided.)
2. What is the error of the second kind (type II error) in terms of this problem?
3. Perform the test at the 5% significance level.
4. You assumed the data come from normal populations. Why, then, can you not use z cut-points?
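One possible way to set up the computation for question 3 is sketched below. The direction of the differences (after minus before, so that an increase in pupil size gives a positive mean difference) is an assumption of this sketch, not something stated in the exercise; the variable names are illustrative, and tinv, tcdf and norminv are assumed available from the Statistics Toolbox.

```matlab
% Paired t statistic for the pupil-size data (values copied from the table above).
before = [6 5 3 3 5 3];
after  = [6 7 9 5 9 9];

% Assumption of this sketch: differences are after - before, so an increase
% in pupil size corresponds to a positive mean difference.
d    = after - before;
n    = length(d);
t    = mean(d)/(std(d)/sqrt(n));   % paired t statistic with n-1 df
pval = 1 - tcdf(t, n-1);           % one-sided p-value for H1: mean difference > 0

% t and z cut-points side by side: the standard deviation is estimated
% from only n = 6 differences, so the t cut-point is the larger one.
tcrit = tinv(0.95, n-1);
zcrit = norminv(0.95);
fprintf('t = %.3f, t cut-point = %.3f, z cut-point = %.3f\n', t, tcrit, zcrit)
```

Printing both cut-points illustrates question 4: with an estimated standard deviation and few pairs, the z cut-point would make the test reject too easily.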
IQ test pairing. In a study, children were first given an IQ test. The two lowest-scoring children were
randomly assigned, one to a “noun-first” task, the other to a “noun-last” task. The two next-lowest IQ
children were similarly assigned, one to a “noun-first” task, the other to a “noun-last” task, and so on until all
children were assigned. The data (scores on a word-recall task) are shown here, listed in order from lowest
to highest IQ score:
Noun-first    12   21   12   16   20   39   26   29   30   35   38   34
Noun-last     10   12   23   14   16    8   16   22   32   13   32   35
1. Are these two samples (Noun-first, Noun-last) independent?
2. Test the hypothesis that the population mean difference is 0 assuming the two sided alternative. Take
α = 10%. The following info may be useful: the difference sample mean is 6.583 and the difference sample
standard deviation is 11.041.
% Noun First Example
disp('Noun First Example')
nounfirst = [12 21 12 16 20 39 26 29 30 35 38 34];
nounlast  = [10 12 23 14 16  8 16 22 32 13 32 35];
d = nounfirst - nounlast;
dbar = mean(d)
%dbar = 6.5833
sd = std(d)
%sd = 11.0409
n = length(d)
%n = 12
t = dbar/(sd/sqrt(n))
%t = 2.0655
pval = 1-tcdf(t, n-1)
%pval = 0.0316 (one-sided)
pval2 = 2*tcdf(-abs(t), n-1)
%two-sided p-value asked for in question 2: twice the one-sided value, about 0.063

Weil, A. T., Zinberg, N. E., and Nelson, J. (1968). Clinical and psychological effects of marijuana in man. Science, 162,
1234-1242.
Fatigue. According to the article “Practice and Fatigue Effects on the Programming of a Coincident
Timing Response,” published in the Journal of Human Movement Studies in 1976, practice under fatigued
conditions distorts mechanisms which govern performance. An experiment was conducted using 15 college
males who were trained to make a continuous horizontal right-to-left arm movement from a micro-switch
to a barrier, knocking over the barrier coincident with the arrival of a clock sweephand to the 6 o’clock
position. The absolute value of the difference between the time, in milliseconds, that it took to knock over
the barrier and the time for the sweephand to reach the 6 o’clock position (500 msec) was recorded. Each
participant performed the task five times under pre-fatigue and post-fatigue conditions, and the sums of the
absolute differences for the five performances were recorded as follows:
              Absolute time differences (msec)
Subject       Pre-fatigue     Post-fatigue
1             158             91
2             92              59
3             65              215
4             98              226
5             33              223
6             89              91
7             148             92
8             58              177
9             142             134
10            117             116
11            74              153
12            66              219
13            109             143
14            57              164
15            85              100
An increase in the mean absolute time differences when the task is performed under post-fatigue conditions
would support the claim that practice under fatigued conditions distorts mechanisms that govern performance. Assuming the populations to be normally distributed, test this claim at level α = 0.01.
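A possible MATLAB setup for this test is sketched below. It assumes the table is read row by row as (pre-fatigue, post-fatigue) pairs and takes the differences as post minus pre, so that the claim corresponds to a positive mean difference; the variable names are illustrative, and tcdf and tinv are assumed available from the Statistics Toolbox.

```matlab
% Paired t-test for the fatigue data (values copied from the table above,
% read row by row as (pre, post) pairs).
pre  = [158 92 65 98 33 89 148 58 142 117 74 66 109 57 85];
post = [91 59 215 226 223 91 92 177 134 116 153 219 143 164 100];

d    = post - pre;               % positive d means a larger error after fatigue
n    = length(d);                % 15 subjects
dbar = mean(d);
sd   = std(d);
t    = dbar/(sd/sqrt(n));        % paired t statistic, n-1 = 14 df

alpha = 0.01;
pval  = 1 - tcdf(t, n-1);        % one-sided p-value for H1: mu_post > mu_pre
tcrit = tinv(1 - alpha, n-1);    % rejection region is [tcrit, Inf)
fprintf('dbar = %.4f, sd = %.4f, t = %.4f, pval = %.4f\n', dbar, sd, t, pval)
```

The claim is supported at level 0.01 if t exceeds tcrit, or equivalently if pval is below 0.01.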
Presidents. "No man who ever held the office of President would congratulate a friend on obtaining it." (John
Adams). In this partial list of American presidents two variables are recorded: X, life expectancy after 1st
inauguration, and Y, actual years lived after 1st inauguration. Test the hypothesis that the number of actual
years lived is significantly smaller than the life expectancy. Use α = 0.05.
Name                   X - life expectancy        Y - actual years lived
                       after 1st inauguration     after 1st inauguration
Andrew Johnson         17.2                       10.3
Ulysses Grant          22.8                       16.4
Rutherford Hayes       18.0                       15.9
James Garfield         21.2                       0.5
Chester Arthur         20.1                       5.2
Grover Cleveland       22.1                       23.3
Benjamin Harrison      17.2                       12.0
William McKinley       18.2                       4.5
Theodore Roosevelt     26.1                       17.3
William Taft           20.3                       21.2
Woodrow Wilson         17.1                       10.9
Warren Harding         18.1                       2.4
Calvin Coolidge        21.4                       9.4
Herbert Hoover         19.0                       35.6
Franklin Roosevelt     21.7                       12.1
Harry Truman           15.3                       27.7
Dwight Eisenhower      14.7                       16.2
John Kennedy           28.5                       2.8
Lyndon Johnson         19.3                       9.2
lexp = [17.2 22.8 18.0 21.2 20.1 22.1 17.2 18.2 26.1 20.3 17.1 ...
        18.1 21.4 19.0 21.7 15.3 14.7 28.5 19.3];
ylived = [10.3 16.4 15.9 0.5 5.2 23.3 12.0 4.5 17.3 21.2 10.9 ...
          2.4 9.4 35.6 12.1 27.7 16.2 2.8 9.2];
d = lexp - ylived;
n = length(d)
dbar = mean(d)
sd = std(d)
t = dbar/(sd/sqrt(n))
pval = 1 - tcdf(t, n-1)
% n = 19; dbar = 6.6000; sd = 10.3425; t = 2.7816; pval = 0.0062
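As a cross-check of the hand computation, the same test can be run with the built-in paired t-test. This sketch assumes the Statistics Toolbox function ttest with its 'Alpha' and 'Tail' options; the output names h, p, ci and stats are just the conventional ones.

```matlab
% Cross-check of the hand computation with the built-in paired t-test.
lexp   = [17.2 22.8 18.0 21.2 20.1 22.1 17.2 18.2 26.1 20.3 17.1 ...
          18.1 21.4 19.0 21.7 15.3 14.7 28.5 19.3];
ylived = [10.3 16.4 15.9 0.5 5.2 23.3 12.0 4.5 17.3 21.2 10.9 ...
          2.4 9.4 35.6 12.1 27.7 16.2 2.8 9.2];

% 'Tail','right' tests H1: mean(lexp - ylived) > 0
[h, p, ci, stats] = ttest(lexp, ylived, 'Alpha', 0.05, 'Tail', 'right');

% stats.tstat, stats.df and stats.sd correspond to t, n-1 and sd above
fprintf('t = %.4f, df = %d, sd = %.4f, p = %.4f, reject H0 = %d\n', ...
        stats.tstat, stats.df, stats.sd, p, h)
```

Here h = 1 means that the null hypothesis is rejected at the chosen level, which should agree with comparing pval to 0.05 in the manual version.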
Independent Samples
We consider testing equality of two normal means when the variances are not known and the populations/samples
are independent. Assume we observed $X_{11}, X_{12}, \dots, X_{1,n_1}$ from a population with distribution $N(\mu_1, \sigma_1^2)$
and $X_{21}, X_{22}, \dots, X_{2,n_2}$ from $N(\mu_2, \sigma_2^2)$. We are interested in testing the hypothesis $H_0: \mu_1 = \mu_2$ versus
one of the alternatives $H_1: \mu_1 >, \neq, < \mu_2$, at significance level $\alpha$.
There are two scenarios that depend on the population variances.
Scenario 1: Variances unknown but assumed equal. In this case the common $\sigma^2$ is estimated by both $s_1^2$ and $s_2^2$.
The weighted average of $s_1^2$ and $s_2^2$ is a better estimator than the individual $s^2$'s, and the weights depend on the sample
sizes:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{n_1 - 1}{n_1 + n_2 - 2}\, s_1^2 + \frac{n_2 - 1}{n_1 + n_2 - 2}\, s_2^2 = w s_1^2 + (1 - w) s_2^2.$$
One can show that when $H_0$ is true, i.e., when $\mu_1 = \mu_2$, the statistic
$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$$
has Student t distribution with $df = n_1 + n_2 - 2$ degrees of freedom.
Scenario 2: No assumption about the variances. In this case, when $H_0$ is true, i.e., when $\mu_1 = \mu_2$, the
statistic
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
has t distribution with approximately
$$df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}$$
degrees of freedom. This is a special case of the so-called Welch-Satterthwaite formula, which approximates the
degrees of freedom for a linear combination of chi-square distributions.
In both cases
Alternative                                              α-level rejection region                                         p-value
$H_1: \mu_1 > \mu_2$, i.e., $\mu_1 - \mu_2 > 0$          $[t_{df,1-\alpha}, \infty)$                                      1-tcdf(t, df)
$H_1: \mu_1 \neq \mu_2$, i.e., $\mu_1 - \mu_2 \neq 0$    $(-\infty, t_{df,\alpha/2}] \cup [t_{df,1-\alpha/2}, \infty)$    2*tcdf(-abs(t), df)
$H_1: \mu_1 < \mu_2$, i.e., $\mu_1 - \mu_2 < 0$          $(-\infty, t_{df,\alpha}]$                                       tcdf(t, df)
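The two scenarios can be compared side by side when only summary statistics are available. The following is a minimal sketch; the summary numbers are hypothetical and serve only to exercise the formulas, the variable names are chosen here for illustration, and tcdf is assumed available from the Statistics Toolbox.

```matlab
% Two-sample t statistics from summary data: pooled (Scenario 1) and Welch (Scenario 2).
% The numbers below are hypothetical and serve only to exercise the formulas.
n1 = 10; X1bar = 5.2; s1 = 1.1;
n2 = 14; X2bar = 4.5; s2 = 1.9;

% Scenario 1: pooled variance, df = n1 + n2 - 2
sp2      = ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1 + n2 - 2);
t_pooled = (X1bar - X2bar)/sqrt(sp2*(1/n1 + 1/n2));
df_pool  = n1 + n2 - 2;

% Scenario 2: separate variances, Welch-Satterthwaite approximate df
v1 = s1^2/n1;  v2 = s2^2/n2;
t_welch  = (X1bar - X2bar)/sqrt(v1 + v2);
df_welch = (v1 + v2)^2/(v1^2/(n1-1) + v2^2/(n2-1));

% Two-sided p-values, as in the table above
p_pooled = 2*tcdf(-abs(t_pooled), df_pool);
p_welch  = 2*tcdf(-abs(t_welch),  df_welch);
fprintf('pooled: t = %.4f, df = %d,   p = %.4f\n', t_pooled, df_pool, p_pooled)
fprintf('Welch:  t = %.4f, df = %.4f, p = %.4f\n', t_welch, df_welch, p_welch)
```

The Welch degrees of freedom are generally not an integer and always lie between min(n1-1, n2-1) and n1+n2-2, so Scenario 2 never uses more degrees of freedom than Scenario 1.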
Exposure to lead. To verify the hypothesis that blood lead levels tend to be higher for children whose parents
work in a factory that uses lead in the manufacturing process, researchers examined lead levels in the blood
of 12 children whose parents worked in a battery manufacturing factory.
The results for the “case children” $X_{11}, X_{12}, \dots, X_{1,12}$ are compared to a “control” sample $X_{21}, X_{22}, \dots, X_{2,15}$
consisting of children selected randomly from families where the parents did not work in a factory that
uses lead.
The resulting sample means and sample standard deviations were $\bar{X}_1 = 0.010$, $s_1 = 0.004$, $\bar{X}_2 = 0.006$,
and $s_2 = 0.006$.
(i) Formulate the hypotheses to be tested and use the one-sided alternative.
(ii) Perform the test of hypotheses from (i) at the level α = 0.05. To select the test analyze the equality
of population variances, also at α = 0.05 level.
(iii) Find 95% Confidence Interval for the difference in population means, µ1 − µ2 .
(iv) What is the power of this test against the alternative $H_1: \mu_1 - \mu_2 = 0.005$?
(v) In designing a future experiment to test the same phenomenon as in (i), it is desired that the α = 5% test
achieves a power of 1 − β = 90% against the alternative $H_1: \mu_1 - \mu_2 = 0.005$. What sample size is
needed?
Solution: (i) Only two research hypotheses make sense in this context, $H_1: \mu_1 > \mu_2$ and $H_1: \mu_1 \neq \mu_2$.
The one-sided alternative leads to a more precise analysis. Thus, we will test
$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$.
(ii) Before testing for the equality of means we need to make an assumption about the variances. This is guided
by an additional test of equality of variances. The test for variances is described in your text [pages 310-317].
To test $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \neq \sigma_2^2$, we compute the ratio $F = s_1^2/s_2^2$, which under $H_0$ has Fisher's F distribution
with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. Then we find the p-value as 2*fcdf(F, n1-1, n2-1) if
the statistic F < 1 and 2*(1-fcdf(F, n1-1, n2-1)) if the statistic F > 1. In our case F was less
than 1 and
n1 = 12; X1bar = 0.010; s1 = 0.004;
n2 = 15; X2bar = 0.006; s2 = 0.006;
Fstat = s1^2/s2^2
%% Fstat = 0.4444
% Since Fstat<1 the p-value is
pval = 2 * fcdf(Fstat, n1-1, n2-1)
%% pval = 0.1825
Guided by the test of variances, we assume that the population variances are the same and use the t statistic
with pooled standard deviation. The test statistic is
$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{1/n_1 + 1/n_2}}, \quad \text{where } s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},$$
and it is t-distributed with $n_1 + n_2 - 2$ degrees of freedom.
sp = sqrt( ((n1-1)*s1^2 + (n2-1)*s2^2 )/(n1 + n2 - 2) )
%% sp = 0.0052
df = n1 + n2 - 2 %% df = 25
tstat = (X1bar - X2bar)/(sp * sqrt(1/n1 + 1/n2)) %% tstat = 1.9803
pvalue = 1 - tcdf(tstat, n1+n2-2) %% pvalue = 0.0294, approx 3%
The null hypothesis is rejected at the 5% level.
If we wanted to use the rejection-region method, the alternative is one-sided and the rejection region is
$RR = [t_{n_1+n_2-2,1-\alpha}, \infty)$.
tinv(1-0.05, df)
%%% ans = 1.7081
By rejection-region arguments, the hypothesis $H_0$ is rejected since $t > t_{n_1+n_2-2,1-\alpha}$, that is, 1.9803 >
1.7081.
(iii) The expression for the confidence interval for the difference of population means $\mu_1 - \mu_2$ follows from
the form of the t statistic in (ii); see the text, pages 308-309. The $(1 - \alpha) \times 100\%$ confidence interval is
$$\left[\bar{X}_1 - \bar{X}_2 - t_{n_1+n_2-2,1-\alpha/2}\, s_p \sqrt{1/n_1 + 1/n_2},\ \ \bar{X}_1 - \bar{X}_2 + t_{n_1+n_2-2,1-\alpha/2}\, s_p \sqrt{1/n_1 + 1/n_2}\right].$$
LB = X1bar - X2bar - tinv(0.975, df)*sp * sqrt(1/n1 + 1/n2) %%LB=-0.00016
UB = X1bar - X2bar + tinv(0.975, df)*sp * sqrt(1/n1 + 1/n2) %%UB = 0.0082
(iv) Equation 8.28, modified for the one-sided alternative (text page 333, $z_\alpha = -z_{1-\alpha}$),
$$1 - \beta = \Phi\left(z_\alpha + \frac{\Delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right),$$
gives the power of the one-sided, level α test against the alternative $H_1: \mu_1 - \mu_2 = 0.005\ (= \Delta)$. The normal
approximation is used, and $s_1^2$ and $s_2^2$ are plugged in in place of $\sigma_1^2$ and $\sigma_2^2$.
power = normcdf( norminv(0.05) + 0.005/sqrt(s1^2/n1+s2^2/n2) )
%% power = 0.8271
Thus the power is about 83%.
(v) The sample size calculation for the new test is prospective in nature, and we assume that $\sigma_1^2$ and $\sigma_2^2$ are known
and equal to $s_1^2$ and $s_2^2$ from the study (now considered to be a pilot study). Formula 8.26 (text page 332),
adjusted for a one-sided test, is
$$n = \frac{(\sigma_1^2 + \sigma_2^2)(z_{1-\alpha} + z_{1-\beta})^2}{\Delta^2},$$
which in MATLAB gives
ssize = (s1^2 + s2^2)*(norminv(0.95)+norminv(0.9))^2/(0.005^2)
%% ssize = 17.8128 approx 18 each
The number of children is 18 per group if equal sample sizes are desired, $n_1 = n_2$. Consult the book
for the case when $n_2 = k \times n_1$, if such sampling is desired.
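The rounding up to 18 can be sanity-checked by plugging candidate sample sizes back into the power formula from (iv) with $n_1 = n_2 = n$. This is a small sketch under the same normal approximation; the function handle name power and the variables power17, power18 are illustrative, and normcdf and norminv are assumed available from the Statistics Toolbox.

```matlab
% Check of the sample size n = 18 per group using the power formula from (iv).
s1 = 0.004; s2 = 0.006;        % pilot standard deviations used in place of sigma1, sigma2
alpha = 0.05; Delta = 0.005;

% Normal-approximation power of the one-sided test with n children per group
power = @(n) normcdf(norminv(alpha) + Delta./sqrt((s1^2 + s2^2)./n));

power17 = power(17)            % one fewer than the rounded-up sample size
power18 = power(18)            % rounded-up sample size
```

Since the exact solution is about 17.8, the power with 17 children per group falls just short of 90%, while 18 per group reaches the target.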
Stress, Diet and Acids. In the study “Interrelationships Between Stress, Dietary Intake, and Plasma Ascorbic Acid During Pregnancy” conducted at the Virginia Polytechnic Institute and State University, the plasma
ascorbic acid levels of pregnant women were compared for smokers versus non-smokers. Thirty-two women
in the last three months of pregnancy, free of major health disorders, and ranging in age from 15 to 32 years
were selected for the study. Prior to the collection of 20 ml of blood, the participants were told to avoid
breakfast, forego their vitamin supplements, and avoid foods high in ascorbic acid content. From the blood
samples, the following plasma ascorbic acid values of each subject were determined in milligrams per 100
milliliters:
Plasma Ascorbic Acid Values
Non-smokers        Smokers
0.97   1.06        0.48
0.72   0.86        0.81
1.00   0.85        0.98
0.81   0.58        0.68
0.62   0.57        1.18
1.22   0.64        1.36
1.24   0.98        0.88
0.89   1.09        1.64
0.90   0.92
0.74   0.78
0.88   1.14
0.94   1.18
Propose statistical inference.
nonsmo = [0.97 0.72 1.00 0.81 0.62 1.32 1.24 0.99 ...
          0.90 0.74 0.88 0.94 1.06 0.86 0.85 0.58 0.57 ...
          0.64 0.98 1.09 0.92 0.78 1.14 1.18];
smo = [0.48 0.81 0.98 0.68 1.18 1.36 0.78 1.64];
% test the hypothesis that the plasma ascorbic acid levels are different for
% the two groups. Use alpha = 0.05.
X1bar = mean(nonsmo); s1 = std(nonsmo); n1 = length(nonsmo);
X2bar = mean(smo); s2 = std(smo); n2 = length(smo);
% s1 = 0.2045 s2 = 0.3833; we will check for equality of variances
F = s1^2/s2^2 % is smaller than 1
pval1 = 2*fcdf(F, n1-1, n2-1)
% pval1 = 0.0208 < 5% and we will not assume equality of variances in
% comparing the two means.
% The "nasty" df for the t test is [text p. 318]
ndf = (s1^2/n1 + s2^2/n2)^2 /( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
t = (X1bar - X2bar)/sqrt( s1^2/n1 + s2^2/n2 )
pval = 2*tcdf(-abs(t), ndf)
% ndf = 8.3689; t = -0.5730; pval = 0.5817
% the two means are not significantly different at the 5% level
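The unequal-variance test can also be cross-checked against the built-in two-sample t-test. This sketch assumes the Statistics Toolbox function ttest2 with the 'Vartype','unequal' option, and it uses the data vectors exactly as typed in the code above; the names v1 and v2 are introduced here for illustration.

```matlab
% Cross-check of the Welch test with the built-in ttest2 (data as in the code above).
nonsmo = [0.97 0.72 1.00 0.81 0.62 1.32 1.24 0.99 ...
          0.90 0.74 0.88 0.94 1.06 0.86 0.85 0.58 0.57 ...
          0.64 0.98 1.09 0.92 0.78 1.14 1.18];
smo = [0.48 0.81 0.98 0.68 1.18 1.36 0.78 1.64];

% Two-sided test without assuming equal variances
[h, p, ci, stats] = ttest2(nonsmo, smo, 'Alpha', 0.05, 'Vartype', 'unequal');

% Manual Welch-Satterthwaite quantities for comparison with stats.df and stats.tstat
v1  = var(nonsmo)/length(nonsmo);
v2  = var(smo)/length(smo);
ndf = (v1 + v2)^2/(v1^2/(length(nonsmo)-1) + v2^2/(length(smo)-1));
t   = (mean(nonsmo) - mean(smo))/sqrt(v1 + v2);
fprintf('ttest2: t = %.4f, df = %.4f, p = %.4f\n', stats.tstat, stats.df, p)
fprintf('manual: t = %.4f, df = %.4f\n', t, ndf)
```

With 'Vartype','unequal', the degrees of freedom reported in stats.df come from the Satterthwaite approximation, so the two printed lines should agree.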
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110-114.
Welch, B. L. (1947). The generalization of "Student's" problem when several different population variances are involved. Biometrika, 34, 28-35.