Power and Sample Size

Sample Size Determination
Sample size determination is the act of choosing the number of
observations or replicates to include in a statistical sample.
Larger sample sizes generally lead to increased precision when
estimating unknown parameters. In some situations, the gain in
precision from a larger sample is minimal or even nonexistent.
This can result from systematic errors or strong dependence in
the data, or from data that follow a heavy-tailed distribution.
A study that collects more data than it needs is also wasteful.
Therefore, before collecting data, it is essential to determine
the sample size requirements of a study.
Factors That Influence Sample Size
The "right" sample size for a particular application depends on
many factors, including the following:
· Cost considerations (e.g., maximum budget, desire to minimize cost).
· Administrative concerns (e.g., complexity of the design, research deadlines).
· Minimum acceptable level of precision.
· Confidence level.
· Variability within the population or subpopulation (e.g., stratum, cluster) of interest.
· Sampling method.
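Large-Sample Confidence Interval for a Population Mean
The confidence interval is expressed more generally as
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
For samples of size n > 30, the confidence interval is expressed as
$$\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
This requires that the sample used be random.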
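The large-sample interval for a mean can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `large_sample_mean_ci` and the summary statistics are hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` rather than a printed Z table.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_mean_ci(xbar, s, n, confidence=0.95):
    """Return (lower, upper) for xbar +/- z_{alpha/2} * s / sqrt(n)."""
    alpha = 1.0 - confidence
    # z_{alpha/2}: the value with area alpha/2 in the upper tail
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical summary statistics, chosen only for illustration.
lower, upper = large_sample_mean_ci(xbar=50.0, s=8.0, n=64, confidence=0.95)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The half-width computed here is the quantity that the sample size formulas hold fixed when solving for n.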
Sample size determination for a 100(1-α)% confidence interval for μ
In order to estimate μ with a sampling error SE (this is equal to
half the confidence interval width) and with 100(1-α)% confidence,
the sample size is found as follows:
$$z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = SE \quad\Longrightarrow\quad n = \frac{(z_{\alpha/2})^2\,\sigma^2}{(SE)^2}$$
where σ is estimated by s or R/4 (one quarter of the range R).
Ex. The mean inflation pressure of footballs is 13.5 pounds, but
uncontrollable factors cause the pressures of individual footballs
to vary from 13.3 to 13.7 pounds. We wish to estimate the mean
inflation pressure to within 0.025 pound of the true value with a
99% confidence interval. What sample size should be used?
$$\alpha = 0.01,\quad z_{\alpha/2} = 2.575,\quad \sigma \approx \frac{\text{Range}}{4} = \frac{13.7 - 13.3}{4} = 0.1,\quad SE = 0.025$$
$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{(SE)^2} = \frac{(2.575)^2 (0.1)^2}{(0.025)^2} = 106.09 \rightarrow 107$$
The result is rounded up because n must be a whole number that meets the precision target.
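The football calculation can be checked with a short Python sketch. The helper name `sample_size_for_mean` is an illustrative assumption, and the critical value is computed with `statistics.NormalDist` instead of being read from a table, so the unrounded n differs slightly from 106.09 while the rounded answer is the same.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, sampling_error, confidence):
    """n = (z_{alpha/2} * sigma / SE)^2, rounded up to a whole observation."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return ceil((z * sigma / sampling_error) ** 2)


# Range / 4 rule of thumb for sigma, as in the football example.
sigma_guess = (13.7 - 13.3) / 4
n_footballs = sample_size_for_mean(sigma_guess, sampling_error=0.025, confidence=0.99)
print(n_footballs)
```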
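Large-Sample Confidence Interval for a Population Proportion
Sample size n is large if $\hat{p} \pm 3\sigma_{\hat{p}}$ falls between 0 and 1.
Confidence interval is calculated as
$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where $\hat{p} = x/n$ and $\hat{q} = 1 - \hat{p}$.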
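As a rough sketch of the proportion interval, the Python below applies the large-sample check (p-hat plus or minus three standard errors must stay inside 0 and 1) before computing the interval. The function name `proportion_ci` and the counts are hypothetical values chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Large-sample CI for a binomial proportion from x successes in n trials."""
    p_hat = x / n
    q_hat = 1.0 - p_hat
    se = sqrt(p_hat * q_hat / n)  # estimated sigma of p-hat
    # Large-sample condition: p-hat +/- 3 sigma must fall between 0 and 1.
    if not (0.0 < p_hat - 3 * se and p_hat + 3 * se < 1.0):
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return p_hat - z * se, p_hat + z * se


# Hypothetical data: 40 successes in 100 trials, 90% confidence.
lo_p, hi_p = proportion_ci(x=40, n=100, confidence=0.90)
print(f"90% CI for p: ({lo_p:.4f}, {hi_p:.4f})")
```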
Sample size determination for a 100(1-α)% confidence interval for p
In order to estimate a binomial probability p with a sampling error
SE (this is equal to half the confidence interval width) and with
100(1-α)% confidence, the sample size is found as follows:
$$z_{\alpha/2}\sqrt{\frac{pq}{n}} = SE \quad\Longrightarrow\quad n = \frac{(z_{\alpha/2})^2\,(pq)}{(SE)^2}$$
Ex. The following is a 90% confidence interval for p: (0.26, 0.54).
How large was the sample used to construct this interval?
$$\alpha = 0.1,\quad z_{\alpha/2} = 1.645,\quad \hat{p} = \frac{0.54 + 0.26}{2} = 0.4,\quad SE = 0.54 - 0.4 = 0.14$$
$$n = \frac{(z_{\alpha/2})^2\,(\hat{p}\hat{q})}{(SE)^2} = \frac{(1.645)^2 (0.4)(0.6)}{(0.14)^2} = 33.135 \rightarrow 34$$
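The same back-calculation can be sketched in Python: the interval midpoint gives p-hat and half the width gives SE. The helper name `sample_size_for_proportion` is an assumption made for this illustration, and the critical value is computed rather than taken from a table.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, sampling_error, confidence):
    """n = z_{alpha/2}^2 * p * q / SE^2, rounded up."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil(z ** 2 * p * (1.0 - p) / sampling_error ** 2)


ci_lower, ci_upper = 0.26, 0.54
p_hat = (ci_lower + ci_upper) / 2          # midpoint of the interval
half_width = (ci_upper - ci_lower) / 2     # SE in the slides' notation
n_used = sample_size_for_proportion(p_hat, half_width, confidence=0.90)
print(n_used)
```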
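Comparing Two Population Means: Independent Sampling
Large-sample confidence interval for µ1 - µ2:
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)} = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
assuming independent sampling, which provides the following substitution
$$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$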
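A minimal Python sketch of the two-sample interval follows. The function name `diff_means_ci` and all summary statistics are hypothetical; the sample variances stand in for the unknown population variances, as in the substitution for the standard error of the difference.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(xbar1, s1, n1, xbar2, s2, n2, confidence=0.95):
    """Large-sample CI for mu1 - mu2 with independent samples."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # estimated sigma of (xbar1 - xbar2)
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se


# Hypothetical summary statistics for two independent samples.
lo_d, hi_d = diff_means_ci(xbar1=20.0, s1=3.0, n1=36,
                           xbar2=18.0, s2=4.0, n2=64, confidence=0.95)
print(f"95% CI for mu1 - mu2: ({lo_d:.3f}, {hi_d:.3f})")
```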
Sample size determination for a 100(1-α)% confidence interval for μ1 - μ2
In order to estimate μ1 - μ2 with a margin of error ME (this is
equal to half the confidence interval width) and with 100(1-α)%
confidence, the sample size is found as follows:
$$z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}} = ME \quad\Longrightarrow\quad n_1 = n_2 = n = \frac{(z_{\alpha/2})^2\,(\sigma_1^2 + \sigma_2^2)}{(ME)^2}$$
Ex. Assuming that n1 = n2, find the sample size needed to estimate
μ1 - μ2 for a 90% confidence interval of width 1.0. Assume that
$\sigma_1^2 = 5.8$ and $\sigma_2^2 = 7.5$.
$$\alpha = 0.1,\quad z_{\alpha/2} = 1.645,\quad ME = 0.5$$
$$n_1 = n_2 = \frac{(z_{\alpha/2})^2\,(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.645)^2 (5.8 + 7.5)}{(0.5)^2} = 143.96 \rightarrow 144$$
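The equal-allocation formula for two means can be checked with the sketch below. The helper name `sample_size_two_means` is illustrative, and the inputs are the variances and interval width from the example; the margin of error is half the stated width.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_means(var1, var2, margin, confidence):
    """Per-group n = z_{alpha/2}^2 * (var1 + var2) / ME^2, rounded up."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil(z ** 2 * (var1 + var2) / margin ** 2)


interval_width = 1.0
n_per_group = sample_size_two_means(5.8, 7.5, margin=interval_width / 2,
                                    confidence=0.90)
print(n_per_group)
```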
Sample size determination for a 100(1-α)% confidence interval for p1 - p2
In order to estimate p1 - p2 with a margin of error ME (this is
equal to half the confidence interval width) and with 100(1-α)%
confidence, the sample size is found as follows:
$$z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n} + \frac{p_2 q_2}{n}} = ME \quad\Longrightarrow\quad n_1 = n_2 = n = \frac{(z_{\alpha/2})^2\,(p_1 q_1 + p_2 q_2)}{(ME)^2}$$
Ex. Assuming that n1 = n2 = n, find the sample size needed to
estimate p1 - p2 for a 90% confidence interval of width 0.05.
Assume that there is no prior information available to obtain
approximate values of p1 and p2.
With no prior information, use the conservative choice p1 = p2 = 0.5, which maximizes pq.
$$\alpha = 0.1,\quad z_{\alpha/2} = 1.645,\quad p_1 = p_2 = 0.5,\quad ME = 0.025$$
$$n = \frac{(z_{\alpha/2})^2\,(p_1 q_1 + p_2 q_2)}{(ME)^2} = \frac{(1.645)^2 (0.25 + 0.25)}{(0.025)^2} = 2164.82 \rightarrow 2165$$
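A short Python sketch of the two-proportion calculation is given here. The helper name `sample_size_two_proportions` is an assumption for illustration; with no prior information, p1 = p2 = 0.5 is used because it gives the largest value of pq and therefore the most conservative n.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, margin, confidence):
    """Per-group n = z_{alpha/2}^2 * (p1*q1 + p2*q2) / ME^2, rounded up."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    spread = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil(z ** 2 * spread / margin ** 2)


# Interval width 0.05 means a margin of error of 0.025.
n_conservative = sample_size_two_proportions(0.5, 0.5, margin=0.05 / 2,
                                             confidence=0.90)
print(n_conservative)
```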
Power and Sample Size of Tests
Statistical power is the probability of correctly rejecting
a false null hypothesis when a specific alternative
hypothesis is true. Power analysis allows us to
determine how likely it is that a test of statistical
significance, such as a z-test, will detect an effect that
is really there. It can also determine how many cases we
need in our sample to attain a specific level of statistical
power, that is, the minimum sample size required. In
addition, the concept of power is used to make comparisons
between different statistical testing procedures: for
example, between a parametric and a nonparametric test of
the same hypothesis.
Summary of Possible Results

             accept H0    reject H0
H0 true      1 - α        α
H0 false     β            1 - β

α = type 1 error rate
β = type 2 error rate
1 - β = statistical power
Standard Case
[Figure: P(T) plotted against the test statistic T, showing the sampling distribution if H0 were true (alpha = 0.05) and the sampling distribution if HA were true, with the critical value c marked. POWER = 1 - β.]
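Upper-tailed Z test
We first find the value $z_\alpha$, because it is defined on the null distribution by alpha:
$$z_\alpha = \frac{c - \mu_0}{\sigma/\sqrt{n}}$$
Rewrite this equation to solve for c:
$$c = \mu_0 + z_\alpha\,\frac{\sigma}{\sqrt{n}}$$
Then we find the value $z_\beta$ as the label for the standardized score corresponding to c under the alternative distribution:
$$z_\beta = \frac{c - \mu_a}{\sigma/\sqrt{n}} = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha \quad\text{and}\quad n = \left(\frac{z_\beta - z_\alpha}{(\mu_0 - \mu_a)/\sigma}\right)^2$$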
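Power and Sample Size Calculation for z-test: Use for a hypothesis test of the mean when the population standard deviation (σ) is known.
Upper-tailed test:
$$z_\beta = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha \quad\text{and}\quad n = \left(\frac{z_\beta - z_\alpha}{(\mu_0 - \mu_a)/\sigma}\right)^2$$
Lower-tailed test:
$$z_\beta = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} - z_\alpha \quad\text{and}\quad n = \left(\frac{z_\beta + z_\alpha}{(\mu_0 - \mu_a)/\sigma}\right)^2$$
Two-tailed test:
$$z_L = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} - z_{\alpha/2},\quad z_U = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_{\alpha/2} \quad\text{and}\quad n = \left(\frac{z_{1-\beta} + z_{\alpha/2}}{(\mu_0 - \mu_a)/\sigma}\right)^2$$
Using the Z table to find the β value, power = 1 - β.
Here β is the area to the left of $z_\beta$ for the upper-tailed test, the area to the right of $z_\beta$ for the lower-tailed test, and the area between $z_L$ and $z_U$ for the two-tailed test. $z_{1-\beta}$ is the z-value with area 1 - β to its left.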
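Suppose we wish to have a sample large enough to have power of
90% to detect μa - μ0 = 4.0 where the standard deviation is 10.0,
using a one-tailed Z test with an alpha error rate of 0.01.
If n = 82, what is the power?
This is an upper-tailed Z test.
$$\alpha = 0.01,\quad z_\alpha = 2.326,\quad \beta = 1 - 0.9 = 0.1,\quad z_\beta = -1.282,\quad \sigma = 10$$
$$n = \left(\frac{z_\beta - z_\alpha}{(\mu_0 - \mu_a)/\sigma}\right)^2 = \left(\frac{-1.282 - 2.326}{-4/10}\right)^2 = 81.36 \rightarrow 82$$
If given n = 82,
$$z_\beta = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha = \frac{-4}{10/\sqrt{82}} + 2.326 = -1.30$$
$$\beta = 0.5 - 0.4032 = 0.0968$$
$$\text{power} = 1 - 0.0968 = 0.9032$$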
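The upper-tailed example can be reproduced with the following Python sketch. The function names are illustrative assumptions, and `mu0 = 0`, `mua = 4` are placeholders that encode only the stated difference of 4.0. Because the quantiles are computed rather than rounded from a table, the power for n = 82 comes out near 0.902 instead of the table-based 0.9032.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def upper_tailed_n(mu0, mua, sigma, alpha, power):
    """Sample size for an upper-tailed z-test with known sigma."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)  # upper-tail critical value
    z_beta = STD_NORMAL.inv_cdf(1.0 - power)   # value with area beta to its left
    return ceil(((z_beta - z_alpha) * sigma / (mu0 - mua)) ** 2)


def upper_tailed_power(mu0, mua, sigma, n, alpha):
    """Power = 1 - beta, where beta is the area to the left of z_beta."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = (mu0 - mua) / (sigma / sqrt(n)) + z_alpha
    return 1.0 - STD_NORMAL.cdf(z_beta)


# Placeholder means: only the difference mua - mu0 = 4.0 matters.
n_required = upper_tailed_n(0.0, 4.0, sigma=10.0, alpha=0.01, power=0.90)
power_at_82 = upper_tailed_power(0.0, 4.0, sigma=10.0, n=82, alpha=0.01)
print(n_required, round(power_at_82, 4))
```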
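We determine whether a cereal box filling machine is deviating
from μ = 12 ounces: H0: μ = 12 vs Ha: μ ≠ 12. Calculate the power
of the test when μa = 11.9.
Assume n = 100, σ = 0.5, α = 0.05.
$$\alpha = 0.05,\quad z_{\alpha/2} = 1.96,\quad \mu_0 = 12,\quad \mu_a = 11.9,\quad \sigma = 0.5$$
$$z_L = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} - z_{\alpha/2} = \frac{12 - 11.9}{0.5/\sqrt{100}} - 1.96 = 0.04,\quad \beta_L = 0.5160$$
$$z_U = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_{\alpha/2} = \frac{12 - 11.9}{0.5/\sqrt{100}} + 1.96 = 3.96,\quad \beta_U = 0.99996$$
where $\beta_L$ and $\beta_U$ are the areas to the left of $z_L$ and $z_U$.
$$\beta = \beta_U - \beta_L = 0.99996 - 0.5160 = 0.48396$$
$$\text{power} = 1 - \beta = 1 - 0.48396 = 0.51604$$
If given power = 0.6, n = ?
$$n = \frac{(z_{\alpha/2} + z_{1-\beta})^2\,\sigma^2}{(\mu_0 - \mu_a)^2} = \frac{(1.96 + 0.253)^2 (0.5)^2}{(12 - 11.9)^2} = 122.43 \rightarrow 123$$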
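The cereal-box calculation can be sketched in Python as well. The function names are hypothetical, and the sample size helper uses the same approximation as the formula above, with z_{1-β} taken as the z-value that has area equal to the power on its left.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_tailed_power(mu0, mua, sigma, n, alpha):
    """Power of a two-tailed z-test: 1 - (area between z_L and z_U)."""
    z_half = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = (mu0 - mua) / (sigma / sqrt(n))
    z_lower, z_upper = shift - z_half, shift + z_half
    beta = STD_NORMAL.cdf(z_upper) - STD_NORMAL.cdf(z_lower)
    return 1.0 - beta


def two_tailed_n(mu0, mua, sigma, alpha, power):
    """n = (z_{alpha/2} + z_{1-beta})^2 * sigma^2 / (mu0 - mua)^2, rounded up."""
    z_half = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = STD_NORMAL.inv_cdf(power)  # area (1 - beta) to the left
    return ceil(((z_half + z_power) * sigma / (mu0 - mua)) ** 2)


power_100 = two_tailed_power(12.0, 11.9, sigma=0.5, n=100, alpha=0.05)
n_for_60 = two_tailed_n(12.0, 11.9, sigma=0.5, alpha=0.05, power=0.60)
print(round(power_100, 4), n_for_60)
```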
Power and Sample Size Calculation for t-test
These calculations need an estimate of the population standard
deviation (σ). The estimate of σ (the population standard
deviation or experimental variability) depends on whether you
have already collected data.
· Prospective studies are done before collecting data, so
σ has to be estimated. You can use related research, pilot
studies, or subject-matter knowledge to estimate σ.
· Retrospective studies are done after data have been
collected, so you can use the data to estimate σ.
Sample size calculations are prospective: the data have not
been collected yet, so σ has to be estimated from one of the
outside sources listed above.
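Power and Sample Size Calculation for t-test: Use to calculate power or sample size for a one-sample t-test or paired t-test.
Upper-tailed test:
$$t_\beta = \frac{\mu_0 - \mu_a}{s/\sqrt{n}} + t_\alpha \quad\text{and}\quad n = \left(\frac{t_\beta - t_\alpha}{(\mu_0 - \mu_a)/s}\right)^2$$
Lower-tailed test:
$$t_\beta = \frac{\mu_0 - \mu_a}{s/\sqrt{n}} - t_\alpha \quad\text{and}\quad n = \left(\frac{t_\beta + t_\alpha}{(\mu_0 - \mu_a)/s}\right)^2$$
Two-tailed test:
$$t_L = \frac{\mu_0 - \mu_a}{s/\sqrt{n}} - t_{\alpha/2},\quad t_U = \frac{\mu_0 - \mu_a}{s/\sqrt{n}} + t_{\alpha/2} \quad\text{and}\quad n = \left(\frac{t_{1-\beta} + t_{\alpha/2}}{(\mu_0 - \mu_a)/s}\right)^2$$
Using the T table to find the β value, power = 1 - β.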
Properties of β
For fixed n, α, and s, β decreases as μa - μ0 increases.
For fixed n, s, μa and μ0, β increases as α decreases.
For fixed α, s, μa and μ0, β decreases as n increases.
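These three properties can be illustrated numerically. The sketch below is an assumption-laden simplification: it uses the upper-tailed z-test formula with a known σ in place of s (the standard library has no t distribution), and the function name and all parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_upper_tailed(mu0, mua, sigma, n, alpha):
    """Type 2 error rate of an upper-tailed z-test (normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = (mu0 - mua) / (sigma / sqrt(n)) + z_alpha
    return STD_NORMAL.cdf(z_beta)


# Vary one quantity at a time, holding the others fixed (hypothetical values).
beta_by_effect = [beta_upper_tailed(0.0, d, 10.0, 82, 0.01) for d in (2.0, 4.0, 6.0)]
beta_by_alpha = [beta_upper_tailed(0.0, 4.0, 10.0, 82, a) for a in (0.10, 0.05, 0.01)]
beta_by_n = [beta_upper_tailed(0.0, 4.0, 10.0, n, 0.01) for n in (20, 40, 82)]

print("beta as mua - mu0 grows:", [round(b, 4) for b in beta_by_effect])
print("beta as alpha shrinks:  ", [round(b, 4) for b in beta_by_alpha])
print("beta as n grows:        ", [round(b, 4) for b in beta_by_n])
```

Each printed list moves in the direction stated above: down with a larger difference, up with a smaller α, and down with a larger n.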