Statistics 100 – Two-sample Inference

Inference for difference in means:
We have two random variables, X and Y , and we assume
• X ∼ N(µX , σX )
• Y ∼ N(µY , σY )
Want to make inferences about µY − µX .
Example: Remember her?
You were asked on the first day of class how old she is.
Question: To the extent that the students in the class were a representative sample from the Harvard student population, what inferences can we make about the difference in average guessed
age of the girl between male and female students?
• Confidence interval for the difference in mean guessed age between male and female students
• Hypothesis test for whether a difference in mean guessed age exists between male and female students
[Figure: Girl’s guessed age by gender. Guessed age (yrs) by student gender (Female, Male).]
Typical application – determining the effect of a treatment from experimental data (though applications to survey data are also common)
• Two treatments (“treatment” and “control”)
• Want to infer whether there is a difference between the mean response when exposed to
treatment versus when exposed to control.
Two situations we’ll consider:
1. Independent samples (e.g., samples from a completely randomized design)
(a) Assume no restrictions on (population) standard deviations
(b) Assume (population) standard deviations are identical
2. Matched pairs design
Analysis for independent samples:
Let x̄, sX and nX be the sample mean, sample standard deviation, and sample size for the X
variable, and let ȳ, sY and nY be the same quantities for the Y variable.
100C% CI for µY − µX :
$$(\bar{y} - \bar{x}) \pm t^* \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$$
where t∗ is the 100C% critical value from a t-distribution, where the degrees of freedom is the
smaller of nY − 1 and nX − 1.
Example: Effect of MSG (mono-sodium glutamate) on weight of ovaries in rats:
A completely randomized experiment was performed to compare average weights (in milligrams)
of ovaries in female rats when fed diets with and without MSG (mono-sodium glutamate). Eleven
female rats were fed a diet containing MSG, and ten “control” rats were fed a normal diet. For the
treated rats, the mean and standard deviation were 29.35 and 4.55, respectively. For the control
rats, the mean and standard deviation were 21.86 and 10.09, respectively. Find a 90% CI for the
difference in average weights of rat ovaries when fed MSG versus not.
Solution:
Let X be the ovary weight of a control rat, and Y be the ovary weight of a treated rat.
Assume X ∼ N(µX , σX ), and Y ∼ N(µY , σY ). Computed from the data, we have
x̄ = 21.86    sX = 10.09    nX = 10
ȳ = 29.35    sY = 4.55     nY = 11
Want a 90% CI for µY − µX .
Step 1: Degrees of freedom for t-distribution in non-pooled procedure is the smaller of 11 − 1 and
10 − 1 which is 9.
Step 2: Critical value for 90% confidence on t(9) distribution is 1.833 (from Table D).
$$(29.35 - 21.86) \pm 1.833 \sqrt{\frac{(10.09)^2}{10} + \frac{(4.55)^2}{11}} = 7.49 \pm 6.37 = (1.12, 13.86)$$
We are 90% confident that the difference in average ovary weights between MSG-fed rats and
control rats is between 1.12 and 13.86 milligrams.
As an aside, because 0 is not in the interval, µY − µX = 0 is not “plausible.”
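The non-pooled interval can be reproduced in a few lines. This is a minimal sketch, not part of the original notes: the function name `unpooled_ci` is hypothetical, and the critical value t∗ = 1.833 is simply the t(9) value quoted from Table D above rather than something the code computes.

```python
import math

def unpooled_ci(xbar, s_x, n_x, ybar, s_y, n_y, t_star):
    """CI for mu_Y - mu_X without assuming equal standard deviations.

    t_star must be looked up by the caller (df = smaller of n_x - 1, n_y - 1).
    """
    se = math.sqrt(s_x**2 / n_x + s_y**2 / n_y)  # standard error of ybar - xbar
    margin = t_star * se                          # margin of error
    diff = ybar - xbar                            # point estimate
    return diff - margin, diff + margin

# MSG example: X = control rats, Y = treated rats; t* = 1.833 for t(9), 90% confidence
low, high = unpooled_ci(21.86, 10.09, 10, 29.35, 4.55, 11, t_star=1.833)
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; with a different confidence level only `t_star` changes.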
Hypothesis testing:
In most cases, Ho : µY = µX (can also be written as Ho : µY − µX = 0).
The alternative hypothesis is either
• Ha : µY > µX ,
• Ha : µY < µX , or
• Ha : µY ≠ µX .
The first two correspond to one-sided tests, and the third corresponds to a two-sided test.
Test statistic:
$$t = \frac{(\bar{y} - \bar{x}) - (\mu_Y - \mu_X)}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}} = \frac{\bar{y} - \bar{x}}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}}$$
Under Ho , the test statistic has a t-distribution where the degrees of freedom is the smaller of
nY − 1 and nX − 1.
Another example:
Do telephone calls made by the sales division of a company last longer, on average, than calls
made by the customer service department? A random sample of 40 calls from the sales division
revealed an average of 10.26 minutes per call with a standard deviation of 8.65 minutes. A random
sample of 20 calls from the customer service department provided an average of 6.93 minutes per
call with a standard deviation of 4.93 minutes. Test at the α = 0.01 level.
Solution:
Let X be the length of a phone call (minutes) placed by a customer service representative, and let
Y be the length of a call placed by a salesman.
Assume X ∼ N(µX , σX ), and Y ∼ N(µY , σY ).
x̄ = 6.93     sX = 4.93    nX = 20
ȳ = 10.26    sY = 8.65    nY = 40
Step 1:
Ho : µY = µX
Ha : µY > µX
Step 2:
The significance level is 0.01.
Step 3: Degrees of freedom for t-distribution in the non-pooled procedure is the smaller of 40 − 1
and 20 − 1 which is 19.
Step 4: Calculate the test statistic:
$$t = \frac{10.26 - 6.93}{\sqrt{\frac{4.93^2}{20} + \frac{8.65^2}{40}}} = \frac{3.33}{1.75} = 1.90$$
Step 5: Calculate the p-value:
On 19 degrees of freedom, we have 1.729 < 1.90 < 2.093, so that 0.025 < P(t > 1.90) < 0.05. For
this one-sided test, therefore,
0.025 < p-value < 0.05
Thus, at the α = 0.01 level, we cannot reject Ho . We do not have enough evidence to conclude that
sales division calls last longer, on average, than customer service calls.
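The calculation in Steps 3 to 5 can be checked numerically. The sketch below is illustrative only: `unpooled_t` is a hypothetical helper, and the bracketing values 1.729 and 2.093 are the t(19) table entries quoted in Step 5, not values computed by the code.

```python
import math

def unpooled_t(xbar, s_x, n_x, ybar, s_y, n_y):
    """Non-pooled t statistic for Ho: mu_Y = mu_X, with the conservative df."""
    se = math.sqrt(s_x**2 / n_x + s_y**2 / n_y)
    t_stat = (ybar - xbar) / se
    df = min(n_x, n_y) - 1  # smaller of n_x - 1 and n_y - 1
    return t_stat, df

# Phone call example: X = customer service, Y = sales
t_stat, df = unpooled_t(6.93, 4.93, 20, 10.26, 8.65, 40)

# Table entries for t(19) quoted in the notes: upper-tail areas 0.05 and 0.025
falls_between = 1.729 < t_stat < 2.093
print(f"t = {t_stat:.2f} on {df} df; 0.025 < p-value < 0.05: {falls_between}")
```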
Comments:
• ȳ − x̄ is a point estimate of µY − µX .
• $t^* \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$ is the margin of error for the confidence interval.
• $\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$ is the standard error of ȳ − x̄.
• These techniques assume σY may be different from σX .
Another major possibility:
What if it is believable that σX = σY = σ?
In this case, we are assuming that
• X ∼ N(µX , σ)
• Y ∼ N(µY , σ)
We can form a single estimate of the (common) standard deviation.
Define
$$s_p = \sqrt{\frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}}$$
to be the “pooled” estimate of σ.
Assuming σY = σX ,
100C% CI for µY − µX :
$$(\bar{y} - \bar{x}) \pm t^* s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}$$
where t∗ is the critical value from a t-distribution on nY + nX − 2 degrees of freedom.
Hypothesis testing:
Again, in most cases, Ho : µY = µX .
Test statistic:
$$t = \frac{(\bar{y} - \bar{x}) - (\mu_Y - \mu_X)}{s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}} = \frac{\bar{y} - \bar{x}}{s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}}$$
Under Ho , the test statistic has a t-distribution on nX + nY − 2 degrees of freedom.
Using pooled vs non-pooled procedure:
• If sX and sY are not within a factor of 1.5, use non-pooled procedures.
• If sX and sY are within a factor of 1.5, use pooled procedures.
Examples:
• sX = 2.84 and sY = 3.92.
Because 3.92/2.84 = 1.38 < 1.5, use the pooled procedure.
• sX = 1.43 and sY = 5.91.
Because 5.91/1.43 = 4.13 > 1.5, use the non-pooled procedure.
In the previous two data examples, the sample standard deviations were too far apart to use the
pooled procedure:
• In the study examining the effect of MSG on rat ovaries, we had sX = 10.09 and sY = 4.55,
so because 10.09/4.55 = 2.22 > 1.5, we should not use the pooled procedure.
• In the study on the comparison of phone call lengths, we had sX = 4.93 and sY = 8.65, so
because 8.65/4.93 = 1.75 > 1.5, we should not use the pooled procedure.
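The factor-of-1.5 rule is easy to express as a small check. This is a sketch under one stated assumption: the notes do not say what to do when the ratio is exactly 1.5, so the hypothetical helper `prefer_pooled` treats that boundary case as non-pooled.

```python
def prefer_pooled(s_x, s_y, max_ratio=1.5):
    """Return True if the larger sample sd is within max_ratio of the smaller one."""
    ratio = max(s_x, s_y) / min(s_x, s_y)
    # Assumption: a ratio of exactly max_ratio falls back to the non-pooled procedure.
    return ratio < max_ratio

print(prefer_pooled(2.84, 3.92))   # ratio 1.38
print(prefer_pooled(1.43, 5.91))   # ratio 4.13
print(prefer_pooled(10.09, 4.55))  # MSG study, ratio 2.22
print(prefer_pooled(4.93, 8.65))   # phone call study, ratio 1.75
```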
Example:
The Wide Range Achievement Test is given to a stratified random sample of 12 six-year olds and
16 seven-year olds. The average score and standard deviation for the six-year olds were 27.5 and
10.2, respectively. The average score and standard deviation for the seven-year olds were 44.0 and
13.2, respectively. Find a 95% confidence interval for the difference in average scores for six and
seven year olds.
Solution:
Let X be the score of a six-year old and let Y be the score of a seven-year old.
It is reasonable to assume approximately X ∼ N(µX , σX ) and Y ∼ N(µY , σY ). We want to find a
95% CI for µY − µX .
We’re given:
x̄ = 27.5    sX = 10.2    nX = 12
ȳ = 44.0    sY = 13.2    nY = 16
Step 1: Because the standard deviations are less than a factor of 1.5 from each other (13.2/10.2 =
1.29 < 1.5), use the pooled procedure.
Step 2: Degrees of freedom for t-distribution in the pooled procedure is 12 + 16 − 2 = 26.
Step 3: Critical value for 95% confidence on t(26) distribution is 2.056 (from Table D).
Step 4: Compute sp .
$$s_p = \sqrt{\frac{(12 - 1)(10.2)^2 + (16 - 1)(13.2)^2}{12 + 16 - 2}} = 12.02$$
Step 5: The confidence interval is
$$(44.0 - 27.5) \pm 2.056\,(12.02) \sqrt{\frac{1}{12} + \frac{1}{16}} = 16.5 \pm 9.44 = (7.06, 25.94)$$
We are 95% confident that the difference in average score between six-year olds and seven-year
olds is between 7.06 and 25.94 points.
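The pooled interval for the achievement-test example can be verified with a short script. This is only a sketch: the helper names `pooled_sd` and `pooled_ci` are hypothetical, and t∗ = 2.056 is the t(26) value quoted from Table D in Step 3.

```python
import math

def pooled_sd(s_x, n_x, s_y, n_y):
    """Pooled estimate of the common standard deviation sigma."""
    num = (n_x - 1) * s_x**2 + (n_y - 1) * s_y**2
    return math.sqrt(num / (n_x + n_y - 2))

def pooled_ci(xbar, s_x, n_x, ybar, s_y, n_y, t_star):
    """CI for mu_Y - mu_X assuming sigma_X = sigma_Y (df = n_x + n_y - 2)."""
    sp = pooled_sd(s_x, n_x, s_y, n_y)
    margin = t_star * sp * math.sqrt(1 / n_x + 1 / n_y)
    diff = ybar - xbar
    return diff - margin, diff + margin

# X = six-year olds, Y = seven-year olds; t* = 2.056 for t(26), 95% confidence
sp = pooled_sd(10.2, 12, 13.2, 16)
low, high = pooled_ci(27.5, 10.2, 12, 44.0, 13.2, 16, t_star=2.056)
print(f"sp = {sp:.2f}, 95% CI: ({low:.2f}, {high:.2f})")
```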
Another example:
An economist was curious whether the news media gives equal coverage to good news and bad
news that is of equal importance. He looked at television reportings of changes in the unemployment rate over the time period from 1973 to 1985. Out of the 171 times that the unemployment rate
was reported to have increased, the average news time and standard deviation were 161.8 seconds
and 110.8 seconds, respectively. Out of the 170 times that the unemployment rate was reported
to have decreased, the average news time and standard deviation were 123.6 seconds and 103.9
seconds, respectively. At the α = 0.05 significance level, is there a difference in the amount of time
the media devotes to good news versus bad news?
Solution:
Let X be the time for a reporting that the unemployment rate decreases, and let Y be the time for
a reporting that the unemployment rate increases.
Reasonable to assume X ∼ N(µX , σX ) and Y ∼ N(µY , σY ).
x̄ = 123.6    sX = 103.9    nX = 170
ȳ = 161.8    sY = 110.8    nY = 171
Step 1: Because the standard deviations are less than a factor of 1.5 from each other (110.8/103.9 =
1.066 < 1.5), use the pooled procedure.
Step 2:
Ho : µY = µX
Ha : µY ≠ µX
Step 3:
The significance level is 0.05.
Step 4: Degrees of freedom for t-distribution in the pooled procedure is 170 + 171 − 2 = 339. On
Table D, round down to 100.
Step 5: Calculate sp :
$$s_p = \sqrt{\frac{(170 - 1)(103.9)^2 + (171 - 1)(110.8)^2}{170 + 171 - 2}} = 107.42$$
Step 6: Calculate the test statistic:
$$t = \frac{161.8 - 123.6}{107.42 \sqrt{\frac{1}{170} + \frac{1}{171}}} = \frac{38.2}{11.63} = 3.28$$
Step 7: Calculate the p-value:
On 100 degrees of freedom, we have 3.174 < 3.28 < 3.390, so that 0.0005 < P(t > 3.28) < 0.001.
The two-sided p-value is P(t > 3.28) + P(t < −3.28) which is twice P(t > 3.28), so that
2(0.0005) < p-value < 2(0.001)
or
0.001 < p-value < 0.002
Thus, at the α = 0.05 level, we can reject Ho . The news media devotes a different amount of time,
on average, to unemployment rates increasing versus decreasing.
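As a numerical check of Steps 4 to 7, the pooled test statistic can be computed directly. This is a hedged sketch: `pooled_t` is a hypothetical helper, and the values 3.174 and 3.390 are the Table D entries at 100 degrees of freedom quoted in Step 7.

```python
import math

def pooled_t(xbar, s_x, n_x, ybar, s_y, n_y):
    """Pooled t statistic for Ho: mu_Y = mu_X, assuming sigma_X = sigma_Y."""
    df = n_x + n_y - 2
    sp = math.sqrt(((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / df)
    se = sp * math.sqrt(1 / n_x + 1 / n_y)
    return (ybar - xbar) / se, df, sp

# X = reports of a decrease in unemployment, Y = reports of an increase
t_stat, df, sp = pooled_t(123.6, 103.9, 170, 161.8, 110.8, 171)

# One-sided tail area lies between 0.0005 and 0.001, so double it for two sides
in_bracket = 3.174 < abs(t_stat) < 3.390
print(f"sp = {sp:.2f}, t = {t_stat:.2f} on {df} df")
print(f"0.001 < two-sided p-value < 0.002: {in_bracket}")
```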
To pool or not to pool. . . what is the question?!
• Use of the pooled procedure makes a strong assumption (σX = σY ) about the population.
• If this assumption is correct, then inferences about µY − µX will be more precise. If the
assumption is wrong, then inferences about µY − µX will be mostly meaningless.
• Using the non-pooled procedure is “safer,” though the trade-off is that inferences will not be
as precise as the pooled procedure if σX = σY .
Recap: Steps for inference for a difference in means:
1. Determine whether to use the non-pooled procedure, or the pooled procedure (based on
examining the sample standard deviations).
2. Carry out the appropriate confidence interval or hypothesis test.
Let’s try with the age-guessing example.
Remarks:
• Data were assumed to come from a CRD (experiment), or from a SRS or stratified sample
(survey, obs study).
• Again, both X and Y are assumed to be approximately normally distributed.
• The t-procedures are fairly robust to non-normality of the data, but it is usually a good
idea to check the data for strong skewness or outliers.
Matched pairs design:
When the data from two samples come from a matched pairs experiment, using the previous
methods will not be precise enough. Can do better by incorporating the knowledge that the observations are paired.
Motivating example:
Suppose a gasoline distributor wants to know whether an additive improves cars’ mileage. He
designs a matched pairs experiment where 10 cars are given, in random order, ordinary gasoline
and gasoline with the additive.
Car    Mileage With Additive    Mileage Ordinary Gasoline
1      25.7                     24.9
2      20.0                     18.8
3      28.4                     27.7
4      13.7                     13.0
5      18.8                     17.8
6      12.5                     11.3
7      28.4                     27.8
8      8.1                      8.2
9      23.1                     23.1
10     10.4                     9.9
avg    18.91                    18.25
sd     7.47                     7.42
[Figure: Car mileage with and without gasoline additive. Car mileage shown for “Without additive” and “With additive”.]
Let X be the mileage of a car without the additive, and let Y be the mileage of a car with the
additive. Want to make an inference about µY − µX .
An Idea:
Let D = Y − X be the difference between the Y and X measurements within a pair. Then D ∼
N(µD , σD ).
Because µD = µY − µX , can perform inference on µD .
Data for a matched pairs analysis:
Observe X values x1 , x2 , . . . , xn and Y values y1 , y2 , . . . , yn .
Compute the differences, d1 = y1 − x1 , d2 = y2 − x2 , . . ., dn = yn − xn .
Let d̄ and sD be the sample mean and standard deviation of the differences.
Inference for this two-sample problem has now been reduced to a 1-sample problem by analyzing
the within-pair differences.
Car    Mileage With Additive    Mileage Ordinary Gasoline    Difference
1      25.7                     24.9                         0.8
2      20.0                     18.8                         1.2
3      28.4                     27.7                         0.7
4      13.7                     13.0                         0.7
5      18.8                     17.8                         1.0
6      12.5                     11.3                         1.2
7      28.4                     27.8                         0.6
8      8.1                      8.2                          –0.1
9      23.1                     23.1                         0.0
10     10.4                     9.9                          0.5
avg    18.91                    18.25                        0.66
sd     7.47                     7.42                         0.443
[Figure: Car mileage differences with and without gasoline additive. Axis: difference in car mileage.]
100C% CI for µY − µX :
$$\bar{d} \pm t^* \frac{s_D}{\sqrt{n}}$$
where t∗ is the critical value from a t-distribution on n − 1 degrees of freedom.
Hypothesis testing:
In most cases, Ho : µY = µX , which is identical to Ho : µD = 0.
Test statistic:
$$t = \frac{\bar{d} - \mu_D}{s_D / \sqrt{n}} = \frac{\bar{d}}{s_D / \sqrt{n}}$$
Under Ho , the test statistic has a t-distribution on n − 1 degrees of freedom.
To construct a 95% CI for the average increase due to the gasoline additive, we have n = 10,
d̄ = 0.66 and sD = 0.443.
For 95% confidence and df=9, we have t∗ = 2.262.
So the 95% confidence interval for µY − µX is given by
$$\left(0.66 - 2.262\,\frac{0.443}{\sqrt{10}},\; 0.66 + 2.262\,\frac{0.443}{\sqrt{10}}\right) = (0.66 - 0.32,\; 0.66 + 0.32) = (0.34, 0.98)$$
We are 95% confident that the average increase in mileage due to the additive is between 0.34 and
0.98 miles per gallon.
As an aside, if we used the independent sample method (pooled procedure), the 95% CI would be
(−6.34, 7.66).
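The matched pairs interval can be rebuilt from the raw mileage data. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and t∗ = 2.262 is the t(9) critical value quoted above rather than something the code derives.

```python
import math
import statistics

# Mileage for the 10 cars, in the order listed in the table
with_additive = [25.7, 20.0, 28.4, 13.7, 18.8, 12.5, 28.4, 8.1, 23.1, 10.4]
ordinary = [24.9, 18.8, 27.7, 13.0, 17.8, 11.3, 27.8, 8.2, 23.1, 9.9]

# Reduce the two-sample problem to one sample of within-pair differences
diffs = [y - x for y, x in zip(with_additive, ordinary)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)

t_star = 2.262                 # t(9) critical value for 95% confidence
margin = t_star * s_d / math.sqrt(n)
low, high = d_bar - margin, d_bar + margin
print(f"d_bar = {d_bar:.2f}, s_D = {s_d:.3f}, 95% CI: ({low:.2f}, {high:.2f})")
```

Working from the differences is what makes the interval so much narrower than the independent-sample interval: the large car-to-car variation (standard deviations near 7.4) cancels within each pair.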
Example: Effects of alcohol on hypoxia.
Ten male subjects were taken to a simulated altitude of 25,000 ft and given tasks to perform. The
time (in seconds) at which useful consciousness ended was measured for each subject. Three
days later, the experiment was repeated with the same ten subjects 1 hour after subjects took
0.5 cc of 100-proof whiskey per pound of body weight. The time (in seconds) at which useful
consciousness ended when whiskey was ingested was then recorded. Does whiskey significantly
reduce the average time of useful consciousness (at the 0.05 level)?
The differences within each subject were: 76, 190, 590, 390, 65, –55, –5, 530, 175, and 0.
Let X be the “survival” time with whiskey, and let Y be the survival time without whiskey.
Let D = Y − X be the time difference for a subject (positive if whiskey reduces the time of consciousness). Assume D ∼ N(µD , σD ), where µD = µY − µX .
From these data, we can compute d̄ = 195.6 and sD = 230.53.
Solution:
Ho : µY = µX ↔ µD = 0
Ha : µY > µX ↔ µD > 0
Significance level: α = 0.05
Test statistic:
$$t = \frac{\bar{d} - \mu_D}{s_D / \sqrt{n}} = \frac{195.6 - 0}{230.53 / \sqrt{10}} = 2.68$$
Computing a p-value:
On 9 degrees of freedom, the one-sided p-value is between 0.01 and 0.02 (because 2.398 < 2.68 <
2.821 from Table D).
Because the p-value is less than 0.05, we reject Ho : there is sufficient evidence that whiskey is
associated with a shorter time of useful consciousness.
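The summary statistics and test statistic for the hypoxia example can be recomputed from the ten listed differences. This is a sketch using the standard library only; the bracketing values 2.398 and 2.821 are the t(9) table entries quoted above.

```python
import math
import statistics

# Within-subject differences (time without whiskey minus time with whiskey), in seconds
diffs = [76, 190, 590, 390, 65, -55, -5, 530, 175, 0]

n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)

# One-sample t statistic on the differences, testing mu_D = 0
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

# Table entries for t(9): upper-tail areas 0.02 and 0.01
in_bracket = 2.398 < t_stat < 2.821
print(f"d_bar = {d_bar:.1f}, s_D = {s_d:.2f}, t = {t_stat:.2f} on {n - 1} df")
print(f"0.01 < one-sided p-value < 0.02: {in_bracket}")
```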
Why not act as if data came from CRD?
• Typically responses within pairs are much more similar than responses between pairs.
• For matched pairs design, it is usually true that the variability of ȳ − x̄ is much smaller than
estimated by the “independent sample” method.
• Analysis of within-pair differences takes advantage of similarity within pairs.
Inference for difference in proportions:
Usual situation:
• Experimental data with 2 treatments (treatment group and control group), or
• Survey data from stratified sample with 2 strata
Response variable for each unit is a binary categorical variable (“success” or “failure”). The information is summarized as sample proportions of success for each group.
Example: A study in 2001 sponsored by the National Sleep Foundation asked a random sample of
995 U.S. adults whether they snore. The data can be summarized by whether subjects were under
30 years old and 30 or over, and appear in the following table:
Age group       Snores    Does not snore
Under 30 y/o    48        136
30+ y/o         318       493
Among younger participants, the fraction of sample that snored was
p̂Y = 48/(48 + 136) = 0.261
Among older participants, the fraction was
p̂O = 318/(318 + 493) = 0.392
What inferences can we make about the difference in the proportion of snoring in the U.S. population between younger and older people?
Notation:
Let
nX = sample size of control group
nY = sample size of treatment group
X = # of “successes” in control group
Y = # of “successes” in treatment group
Also let
p̂X = X/nX = sample proportion of “successes” in X group
p̂Y = Y/nY = sample proportion of “successes” in Y group
Assume
X ∼ B(nX , pX )
Y ∼ B(nY , pY )
Want to make inferences about pY − pX .
100C% CI for pY − pX :
$$(\hat{p}_Y - \hat{p}_X) \pm z^* \sqrt{\frac{\hat{p}_X(1 - \hat{p}_X)}{n_X} + \frac{\hat{p}_Y(1 - \hat{p}_Y)}{n_Y}}$$
where z∗ is the standard normal critical value (the row labeled z∗ in Table D).
Comments:
• This confidence interval is approximate because we used normal probabilities to approximate binomial probabilities.
• The term $z^* \sqrt{\frac{\hat{p}_X(1 - \hat{p}_X)}{n_X} + \frac{\hat{p}_Y(1 - \hat{p}_Y)}{n_Y}}$ is the margin of error of the confidence interval.
• The term $\sqrt{\frac{\hat{p}_X(1 - \hat{p}_X)}{n_X} + \frac{\hat{p}_Y(1 - \hat{p}_Y)}{n_Y}}$ is the standard error of p̂Y − p̂X .
Worthwhile comment:
Making inferences about parameters connected with binomial distributions never involves use of
the t-distribution even though we’re estimating a population standard deviation from data.
Example: To determine if absenteeism is more a problem among male or female workers at your
company, you obtain a stratified random sample of 130 female employees and 140 male employees. Looking at their records, you note that 12 of the females were absent from work for more
than five days last year, and 14 of the males were absent for more than five days. Construct a 90%
confidence interval for the difference in rates of absenteeism between male and female employees.
Solution: Let X be the number of male employees out of a sample of 140 that were absent for more
than five days last year, and let Y be the analogous number for female employees out of 130.
Then
X ∼ B(140, pX )
Y ∼ B(130, pY )
Want a 90% CI for pY − pX .
We have p̂X = 14/140 = 0.1, and p̂Y = 12/130 = 0.0923. For 90% confidence, z∗ = 1.645 (from
the z∗ row of Table D).
The confidence interval for pY − pX is computed as
$$(0.0923 - 0.1) \pm 1.645 \times \sqrt{\frac{0.1(1 - 0.1)}{140} + \frac{0.0923(1 - 0.0923)}{130}} = -0.0077 \pm 0.059 = (-0.0667, 0.0513)$$
We are 90% confident that the difference in absenteeism rates between female and male employees (female minus male) is between –0.0667 and 0.0513.
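The absenteeism interval can be reproduced as follows. This is a sketch rather than a prescribed method: the function name `two_proportion_ci` is hypothetical, and instead of reading z∗ from Table D it obtains the critical value from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x, n_x, y, n_y, conf):
    """Approximate CI for p_Y - p_X from two independent binomial samples."""
    p_x, p_y = x / n_x, y / n_y
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.645 for 90%
    se = math.sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    diff = p_y - p_x
    return diff - z_star * se, diff + z_star * se

# X = male employees (14 of 140 absent), Y = female employees (12 of 130 absent)
low, high = two_proportion_ci(14, 140, 12, 130, conf=0.90)
print(f"90% CI: ({low:.4f}, {high:.4f})")
```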
Hypothesis testing:
In virtually all cases, Ho : pY = pX .
Possible alternative hypotheses:
• Ha : pY > pX ,
• Ha : pY < pX , or
• Ha : pY ≠ pX .
Acting as though Ho is true, let pY = pX = p.
Pooled estimate of p:
$$\hat{p} = \frac{x + y}{n_X + n_Y}$$
where x and y are the observed number of “successes” in each group.
Test statistic:
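$$z = \frac{(\hat{p}_Y - \hat{p}_X) - (p_Y - p_X)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}} = \frac{\hat{p}_Y - \hat{p}_X}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}}$$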
Under Ho , the test statistic has a standard normal distribution, so the p-value can be computed
from Table A.
Example: A survey was performed to estimate the proportion of registered voters that vote. A
stratified random sample of 400 employed people and 450 unemployed people was obtained, all
of whom were registered to vote. Among the employed people, 262 voted in the last election, and
among the unemployed people, 244 voted. Test at the α = 0.05 level whether the proportions of
employed and unemployed registered voters in the population who voted in the last election are
the same.
Solution:
Let X be the number out of a sample of 400 employed people that voted in the last election, and let Y be the
number out of a sample of 450 unemployed people that voted in the last election.
Then
X ∼ B(400, pX )
Y ∼ B(450, pY )
Want to test
Ho : pY = pX
Ha : pY ≠ pX
at the α = 0.05 significance level.
Calculate p̂X = 262/400 = 0.655 and p̂Y = 244/450 = 0.542.
Pooled p̂:
$$\hat{p} = \frac{262 + 244}{400 + 450} = 0.595.$$
Test statistic:
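$$z = \frac{\hat{p}_Y - \hat{p}_X}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}} = \frac{0.542 - 0.655}{\sqrt{0.595(1 - 0.595)\left(\frac{1}{400} + \frac{1}{450}\right)}} = -\frac{0.113}{0.034} = -3.32$$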
Calculate the p-value:
For this two-sided test,
p-value = P(Z < −3.32) + P(Z > 3.32)
= 0.0005 + 0.0005 = 0.001
Because 0.001 < 0.05, should reject Ho . We have significant evidence that the probability people
vote depends on whether they are employed.
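The voting example can be checked in code. This is a minimal sketch with a hypothetical helper `two_proportion_z`; it uses `statistics.NormalDist` in place of Table A. Because the code carries full precision, it gives z of about −3.34 and a p-value a little under 0.001, whereas the −3.32 above comes from dividing the rounded values 0.113 and 0.034. The conclusion is the same either way.

```python
import math
from statistics import NormalDist

def two_proportion_z(x, n_x, y, n_y):
    """Pooled z statistic and two-sided p-value for Ho: p_Y = p_X."""
    p_x, p_y = x / n_x, y / n_y
    p_pool = (x + y) / (n_x + n_y)  # pooled estimate of the common p under Ho
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_x + 1 / n_y))
    z = (p_y - p_x) / se
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided

# X = employed (262 of 400 voted), Y = unemployed (244 of 450 voted)
z, p_value = two_proportion_z(262, 400, 244, 450)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
print("reject Ho at alpha = 0.05:", p_value < 0.05)
```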
Comments:
• Need to assume X and Y are independent binomially distributed random variables. In other
words, the data came from two independent binomial samples.
• Must have nX and nY large enough to use normal approximation. This means:
– nX p̂X and nX (1 − p̂X ) must both be greater than 10, and
– nY p̂Y and nY (1 − p̂Y ) must both be greater than 10.
For this course, the samples will be large enough so you don’t need to check.
Inference for pY − pX with matched pairs: McNemar’s procedure
Example: Data were collected on movie evaluations by Chicago film critics Roger Ebert and Gene
Siskel. The data consisted of 111 movies from April 1995 through September 1996.
                      Siskel
Ebert          Thumbs Down    Thumbs Up
Thumbs Down    24             10
Thumbs Up      13             64
Did Ebert and Siskel give “Thumbs up” equally often?
Notation:
               X
Y        0        1        Total
0        w00      w10      n − y
1        w01      w11      y
Total    n − x    x        n
Notice p̂X = x/n and p̂Y = y/n.
100C% CI for pY − pX :
$$(\hat{p}_Y - \hat{p}_X) \pm z^* \left(\frac{1}{n}\sqrt{w_{01} + w_{10}}\right)$$
Siskel and Ebert example: Suppose we want a 90% confidence interval for the difference in “thumbs
up” rate between Siskel and Ebert.
Letting X represent Siskel and Y represent Ebert, p̂X = (10 + 64)/111 = 0.667, and p̂Y = (13 +
64)/111 = 0.694.
Also, we have n = 111, and for 90% confidence, z ∗ = 1.645. Finally, w01 = 13 and w10 = 10.
So,
$$(\hat{p}_Y - \hat{p}_X) \pm z^* \left(\frac{1}{n}\sqrt{w_{01} + w_{10}}\right) = 0.694 - 0.667 \pm 1.645 \left(\frac{1}{111}\sqrt{13 + 10}\right) = 0.027 \pm 0.071 = (-0.044, 0.098)$$
Thus we are 90% confident that the difference in “thumbs up” rates between Siskel and Ebert is
between −0.044 and 0.098.
Worth noting that assuming independent samples, the confidence interval would be (−0.076, 0.130).
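The matched pairs interval for the Siskel and Ebert data can be sketched as below. The function name `paired_proportion_ci` is hypothetical, the counts are indexed as in the notation table (first index X = Siskel, second index Y = Ebert, 1 = thumbs up), and z∗ is taken from `statistics.NormalDist` rather than the table.

```python
import math
from statistics import NormalDist

def paired_proportion_ci(w00, w01, w10, w11, conf):
    """CI for p_Y - p_X with matched pairs, using the (1/n) * sqrt(w01 + w10) standard error."""
    n = w00 + w01 + w10 + w11
    p_x = (w10 + w11) / n  # pairs with X = 1
    p_y = (w01 + w11) / n  # pairs with Y = 1
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * math.sqrt(w01 + w10) / n  # only the discordant pairs enter
    diff = p_y - p_x
    return diff - margin, diff + margin

# Siskel/Ebert counts: both down 24, Ebert up only 13, Siskel up only 10, both up 64
low, high = paired_proportion_ci(w00=24, w01=13, w10=10, w11=64, conf=0.90)
print(f"90% CI: ({low:.3f}, {high:.3f})")
```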
Hypothesis testing with paired samples:
Use same null and alternative hypotheses as with independent samples (always have Ho : pY =
pX , while Ha depends on the context of the problem).
Test statistic:
$$z = \frac{w_{01} - w_{10}}{\sqrt{w_{10} + w_{01}}}$$
Under Ho , the test statistic has a standard normal distribution, so compute the p-value from Table A.
Note: Positive values of w01 − w10 (and therefore positive values of z) correspond to evidence that
pY − pX > 0; negative values correspond to evidence that pY − pX < 0.
Example: On the 1994 General Social Survey, respondents were asked whether a person has the
right to end his/her life if the person has an incurable disease, and whether a doctor can assist
in ending a person’s life if the person has an incurable disease. The data on 1825 respondents are
below:
                        Self-suicide
Assisted suicide    No      Yes
No                  435     90
Yes                 203     1097
At the α = 0.01 significance level, test whether there is a difference in self-suicide and assisted-suicide acceptability rates.
Solution: Let X be the number of respondents accepting self-suicide out of 1825, and Y be the
number of respondents accepting assisted-suicide out of 1825.
Ho : pY = pX
Ha : pY ≠ pX
From the table, we have w01 = 203 and w10 = 90.
Test statistic:
$$z = \frac{w_{01} - w_{10}}{\sqrt{w_{10} + w_{01}}} = \frac{203 - 90}{\sqrt{90 + 203}} = \frac{113}{\sqrt{293}} = 6.60$$
For this two-sided test, the p-value is
p-value = P(Z < −6.60) + P(Z > 6.60) = 2P(Z < −6.60)
From Table A, P(Z < −6.60) ≈ 0, so that the p-value is less than 0.01. We can reject the hypothesis
that the acceptability rates are the same for both types of suicide at the 0.01 level.
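The paired test statistic and its p-value can be computed directly from the two discordant counts. This is an illustrative sketch: `mcnemar_z` is a hypothetical name, and the normal tail area comes from `statistics.NormalDist` instead of Table A.

```python
import math
from statistics import NormalDist

def mcnemar_z(w01, w10):
    """z statistic and two-sided p-value for Ho: p_Y = p_X with matched pairs."""
    z = (w01 - w10) / math.sqrt(w10 + w01)
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided

# w01 = 203: rejects self-suicide but accepts assisted suicide
# w10 = 90:  accepts self-suicide but rejects assisted suicide
z, p_value = mcnemar_z(w01=203, w10=90)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2e}")
print("reject Ho at alpha = 0.01:", p_value < 0.01)
```

Note that the 435 + 1097 respondents who gave the same answer to both questions do not enter the calculation at all.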
Comments:
• Both X and Y must come from binomial sampling, but need not be independent.
• Interestingly, for matched pairs, the standard error of p̂Y − p̂X does not depend on the data
in which there is agreement.
• Standard errors based on matched pairs are generally smaller than based on independent
samples – this reflects the extra information in the study design.
• The sample sizes must be large enough to use the normal approximation. One rule of thumb
is to require that w01 + w10 be at least 25.