
Statistical Power And Sample Size Calculations
  When Do You Need Statistical Power Calculations, And Why?
  Preparation For The Question “What Is Statistical Power?”
  Statistical Hypothesis Testing
  When Ho Is True And You Reject It, You Make A Type I Error.
  When Ho Is False And You Fail To Reject It, You Make A Type II Error.
  The Definition Of Statistical Power
  Calculating Statistical Power
  How Do We Measure Effect Size?
  Cohen's Rules Of Thumb For Effect Size
    Calculating Cohen’s d
    Calculating Cohen’s d from a t test
  Conventions And Decisions About Statistical Power
  Considering Statistical Power When Reviewing Scientific Research
  Statistical Power Analysis In Minitab
  Summary: Factors That Influence Power
  Footnote
Using Minitab To Calculate Power And Minimum Sample Size
  Example 1: Statistical power of a t-test on scores in 2 groups
  Example 2: Required sample size for a given power: 2 group comparison
  Example 3: Example of calculating power for a one-way ANOVA
  Power and Sample Size Calculations for Other Designs and Tests
Sample Size Equations – Background Theory
Determining The Necessary Sample Size For Estimating A Single Population Mean Or A Single Population Total With A Specified Level Of Precision
  Table Of Standard Normal Deviates (Zα) for Various Confidence Levels
  Example
  Example
Determining The Necessary Sample Size For Detecting Differences Between Two Means With Temporary Sampling Units
  Table of standard normal deviates for Zα
  Table of standard normal deviates for Zβ
  Example
  Example
Determining The Necessary Sample Size For Detecting Differences Between Two Means When Using Paired Or Permanent Sampling Units
  Table of standard normal deviates for Zα
  Table of standard normal deviates for Zβ
  Example
  Example
  Example
Determining The Necessary Sample Size For Estimating A Single Population Proportion With A Specified Level Of Precision
  Table of standard normal deviates (Zα) for various confidence levels
  Example
  Example
Determining The Necessary Sample Size For Detecting Differences Between Two Proportions With Temporary Sampling Units
  Table of standard normal deviates for Zα
  Table of standard normal deviates for Zβ
  Example
  Example
Caveat
Bibliography
  Sample Size Calculations in Clinical Research by Shein-Chung Chow, Jun Shao and Hansheng Wang
  Sample size estimation: How many individuals should be studied? By Eng John
  Power and Sample Size Estimation in Research by Ajeneye Francis
  Sample size determination by Dell RB, Holleran S, Ramakrishnan R
  Statistical power and estimation of the number of required subjects for a study based on the t-test: A surgeon's primer by Livingston EH, Cassidy L
Sample Size Correction Table for Single Parameter Estimates
Statistical Power And Sample Size Calculations
When Do You Need Statistical Power Calculations, And Why?
People embarking on scientific projects in new fields need statistical power analysis in order to design their
studies - particularly to decide on how many cases are needed. A prospective power analysis is used before
collecting data, to consider design sensitivity - that is, the ability to detect what you are looking for at a time
when you can do something about it. For example, you can increase the design sensitivity by increasing the
sample size, or by taking measures to decrease the error variance, e.g., by controlling extraneous variables.
Thus, prospective power analysis is what you use when you are planning your own project.
Readers and reviewers of the scientific literature may need to draw on retrospective power analysis, in
order to know whether the studies they are interpreting were well enough designed - especially if these
report failures to reach statistical significance. For example, suppose you read a paper about a study in
which the authors had conducted an experiment and the data analysis did not reveal any statistically
significant results. You would need to know whether that study had ever had a chance of coming up with a
significant result - e.g., whether the investigators gathered a big enough sample. To do this you would need
to estimate whether the study had sufficient statistical power.
You can use Minitab to perform both prospective and retrospective power studies.
Preparation For The Question “What Is Statistical Power?”
The answer to this question depends on your having a clear understanding of the following technical terms:
• the null hypothesis (“Ho”),
• significance level, α,
• Type I error,
• Type II error.
If you are very unsure about these, please refer to your own statistics notes and to your usual statistics
textbook - but let's briefly review these concepts as a preliminary to understanding statistical power:
Statistical Hypothesis Testing
When you perform a statistical hypothesis test, there are four possible outcomes. These outcomes depend
on:
• whether the null hypothesis (Ho) is true or false, and
• whether you decide either to reject, or else to retain, provisional belief in Ho.
These outcomes are summarised in the following table:
Decision     Ho is really true, i.e., there     Ho is really false, i.e., there
             is really no effect to find        really is an effect to be found
Retain Ho    correct decision: prob = 1 - α     Type II error: prob = β
Reject Ho    Type I error: prob = α             correct decision: prob = 1 - β
When Ho Is True And You Reject It, You Make A Type I Error.
(Translation: when there really is no effect, but the statistical test comes out significant by chance, you
make a Type I error.) When Ho is true, the probability of making a Type I error is called alpha (α). This
alpha probability is the significance level associated with your statistical test.
When Ho is False And You Fail To Reject It, You Make A Type II Error.
(Translation: When, in the population, there really is an effect, but your statistical test comes out
nonsignificant, due to inadequate power and/or bad luck with sampling error, you make a Type II error.)
When Ho is false (so that there really is an effect there waiting to be found), the probability of making a
Type II error is called beta (β).
The Definition Of Statistical Power
Statistical power is the probability of not missing an effect, due to sampling error, when there really is an
effect there to be found.
In technical terms:
power is the probability (prob = 1 - β) of correctly rejecting Ho when it really is false.
Calculating Statistical Power
Power depends on:
1. the sample size(s),
2. the level of statistical significance required, and (here's the tricky bit!) on
3. the minimum size of effect that it is reasonable to expect.
How Do We Measure Effect Size?
For a comparison of two groups, e.g., an experimental group with a control group, the measure of effect
size will probably be Cohen's d. This handy measure is defined as the difference between the means for the
two groups, divided by an estimate of the standard deviation in the population - often we use the average of
the standard deviations of the samples as a rough guide for the latter.
The reason why the issue of effect size is tricky is that, all too often in Psychology, we don't know how big
an effect we should expect. One of the good things about the recent development of power analysis is that it
throws us back to thinking about the appropriate psychological theory, which ought to tell us how big an
effect to expect. In Physics, people don't just predict that one variable will have a statistically significant
effect on another variable: they develop theories that predict how big the effect should be. It's good for us
to be forced to think how big the effect should be. But, as Psychology is so much more complex than
Physics, we often cannot do much more than guess at expected effect sizes. Often we end up saying “Well,
we can only afford to test so many subjects, so we will probably only be able to pick up an effect if it is
big.” - So, we make some guess at what would be a “big” effect size. Even this is still likely to be useful,
however: if the study is not powerful enough to pick up even a big effect, it is very unlikely to pick up a
small one.
Cohen (1992) gives useful rules of thumb about what to regard as a “big”, “medium” or “small” effect.
Cohen's Rules Of Thumb For Effect Size
Effect size        Correlation coefficient    Difference between means
“Small effect”     r = 0.1                    d = 0.2 standard deviations
“Medium effect”    r = 0.3                    d = 0.5 standard deviations
“Large effect”     r = 0.5                    d = 0.8 standard deviations
The two commonest effect-size measures are Pearson's r and Cohen's d. Cohen's tutorial paper gives similar
rules of thumb for differences in proportions, partial correlations and ANOVA designs.
Cohen, J. (1992). A Power Primer. Psychological Bulletin 112: 155-159.
Calculating Cohen’s d
$$ d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{Pooled}}} $$

Notation
d          Cohen’s d effect size
\bar{x}    Mean
s          Standard deviation
Subscript refers to the two conditions being compared

$$ s_{\text{Pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$

Notation
s          Standard deviation
n          Sample size
Subscript refers to the two conditions being compared
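The two formulas above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the example scores are hypothetical and are not taken from the original text.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(sample1, sample2):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)  # sample SDs (n - 1 denominator)
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def cohens_d(sample1, sample2):
    """Difference between the two means in units of the pooled SD."""
    return (mean(sample1) - mean(sample2)) / pooled_sd(sample1, sample2)


# Hypothetical scores, for illustration only.
experimental = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
print(round(cohens_d(experimental, control), 3))
```

With equal group sizes the pooled standard deviation is the square root of the average of the two variances, which is close to the "average of the standard deviations" rough guide mentioned earlier.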
Calculating Cohen’s d from a t test
$$ d = t \sqrt{\left(\frac{n_1 + n_2}{n_1 n_2}\right)\left(\frac{n_1 + n_2}{n_1 + n_2 - 2}\right)} $$

Notation
d          Cohen’s d effect size
t          t statistic
n          Sample size
Subscript refers to the two conditions being compared

If the two sample sizes are approximately equal this becomes

$$ d \approx \frac{2t}{\sqrt{n - 2}} $$

where n = n1 + n2.

If standard errors rather than standard deviations are available then

$$ s = SE \sqrt{n} $$

Notation
s          Standard deviation
SE         Standard error
n          Number of subjects
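As a rough check on the conversions above, the following Python sketch implements the general formula, the equal-sample-size shortcut and the standard-error conversion. The function names and the example values (t = 2.0, groups of 13) are hypothetical, chosen only for illustration.

```python
from math import sqrt


def d_from_t(t, n1, n2):
    """Cohen's d recovered from a two-sample t statistic (general formula)."""
    return t * sqrt(((n1 + n2) / (n1 * n2)) * ((n1 + n2) / (n1 + n2 - 2)))


def d_from_t_equal_n(t, n_total):
    """Shortcut for (approximately) equal group sizes; n_total = n1 + n2."""
    return 2 * t / sqrt(n_total - 2)


def sd_from_se(se, n):
    """Standard deviation recovered from a standard error of the mean."""
    return se * sqrt(n)


# Hypothetical values: with exactly equal groups the two routes agree.
print(d_from_t(2.0, 13, 13), d_from_t_equal_n(2.0, 26))
```

When the groups are exactly equal in size the shortcut is not an approximation: both functions return the same value.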
Conventions And Decisions About Statistical Power
For statistical significance, there is a convention that we will usually accept a 1 in 20 risk of making a
Type I error. Thus, we usually start thinking of findings as being statistically significant if they attain a
significance level of 0.05 or lower (i.e., risk of 1 in 20).
There is a similar rule of thumb for statistical power, but the acceptable risk of making a Type II error is
usually set rather higher. There is an overall implicit consensus that Type I errors, that lead to false
confirmations of incorrect predictions, are about four times as dangerous as Type II errors. Type II errors
lead to a false disbelief in effects that are actually real, but there seems more chance that these mistakes
will be corrected in due course, at less cost than the results of Type I errors.
The outcome of such considerations is that the conventional acceptable risk of a Type II error is often set at
1 in 5, i.e., a probability of 0.2. The conventionally uncontroversial value for “adequate” statistical power is
therefore set at 1 - 0.2 = 0.8. Another way of expressing this is to say that people often regard the minimum
acceptable statistical power for a proposed study as being an 80% chance of an effect that really exists
showing up as a significant finding.
If anyone (e.g., an ethical committee) asks you “What is the proposed power of your study?”, you are on
fairly safe ground if you can reply “0.8”. But this is just a convention and it does depend, just like the
setting of significance levels, on weighing the relative costs of doing the study, plus the costs of the various
forms of getting the wrong answers, against the benefits of getting the right answer.
When you are deciding on acceptable values for α and β for a given study, you need to consider the
seriousness of each type of error. The more serious the error, the less often you will be willing to allow it to
occur. Therefore, you should demand smaller probability values for risks of more serious errors.
Ideally, you want to have high power to detect effects that you care about, and low power for any small
effect that would be meaningless. This will affect decisions about, e.g., the numbers of subjects to test in a
psychological experiment.
Example proposed in Minitab's Help pages: suppose you want to claim that children in your school scored
higher than the general population on a standardised achievement test. You need to decide how much
higher than the general population your test scores need to be so you are not making claims that are
misleading. If your mean test score is a mere 0.7 points higher than the general population, on a 100 point
test, do you really want to detect this as a significant difference? Probably not. (But for a note of caution,
see Rosenthal (1991), p. 263 - outlined below).
Considering Statistical Power When Reviewing Scientific Research
In this situation, we are usually considering the implications of a null finding. The researchers report,
perhaps, that A did not correlate with B significantly, or that there was no significant difference between
the experimental group and the control group.
They are unlikely to say this about their main findings, or they would probably never have got their paper
published. But if any substantive argument is ever being made on the basis of not finding any significant
effect (e.g., “no evidence that X is dangerous to health”), we should certainly be alert to whether or not the
power of the study was ever adequate for Ho to be rejected.
More often, researchers make positive claims on the basis of null results when discussing any checks they
may have made concerning extraneous, or perhaps even potentially confounding, variables - e.g., whether
the experimental group and the control group showed statistically significant differences in intelligence
scores, social class, etc. In deciding whether to accept the fact that there was no significant difference
between groups as any kind of evidence that they were similar, we need to think about whether the
comparison had adequate statistical power.
Statistical Power Analysis In Minitab
Minitab provides power and sample size calculations under its main STAT menu.
It caters for the following procedures:
Stat > Power and Sample Size >
1-Sample Z
1-Sample t
2-Sample t
1 Proportion
2 Proportions
One-Way ANOVA
2-Level Factorial Design
Plackett-Burman Design
These facilities in Minitab are very easy to use, as should be evident from the accompanying Minitab
examples.
Summary: Factors That Influence Power
The following factors influence power:
• Sample Size. As sample size increases, power increases.
• Alpha, the probability that you are prepared to accept for making a Type I error (i.e., the level of
significance that you are prepared to use, e.g., 0.05). As α, the probability of a Type I error, increases,
β, the probability of a Type II error, decreases. Since power is 1 - β, as α increases and the significance
level gets less stringent, statistical power to find a real effect also increases.
Note that if you demand a very stringent level of significance, you are less likely to get a significant result
and your statistical power decreases. In Psychology, unduly high levels of stringency in the significance
testing of post-ANOVA comparisons have probably been a major cause of failure in finding predicted effects
that have really been there in the population (Rosenthal, 1991).
Rosenthal (1991) Psychosomatic Medicine 53: 247-271
• s, the standard deviation, which gives an estimate of variability in the population. As s increases,
power decreases, because effects get lost in the “noise”.
• The real size of the effect that we are looking for in the population.
As the size of the effect in the population decreases, power decreases.
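The direction of each of these influences can be illustrated with a small Python sketch. It is an approximation under stated assumptions: a two-group comparison with equal group sizes, and a normal approximation to the t-test, so the values will be slightly higher than those from an exact t-based calculation such as Minitab's. The function name and parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(n_per_group, difference, sigma=1.0,
                            alpha=0.05, two_tailed=True):
    """Approximate power of a two-group comparison (normal approximation)."""
    z = NormalDist()
    # Expected value of the test statistic when the effect is real.
    delta = (difference / sigma) * sqrt(n_per_group / 2)
    if two_tailed:
        z_crit = z.inv_cdf(1 - alpha / 2)
        return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)
    z_crit = z.inv_cdf(1 - alpha)
    return z.cdf(delta - z_crit)


# Vary one factor at a time and watch the power move.
for n in (10, 20, 40, 80):
    print("n =", n, "power =", round(approx_power_two_sample(n, 0.5), 3))
for alpha in (0.01, 0.05, 0.10):
    print("alpha =", alpha, "power =", round(approx_power_two_sample(20, 0.5, alpha=alpha), 3))
for sigma in (0.5, 1.0, 2.0):
    print("sigma =", sigma, "power =", round(approx_power_two_sample(20, 0.5, sigma=sigma), 3))
```

Only the ratio of the difference to sigma enters the calculation, which is why the effect size is usually expressed in standard deviation units.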
Footnote
Note of caution: Be aware that not all small effects are meaningless - e.g. in the study of changes in serious
risks, most effects are small, but they can still be very important. Rosenthal (1991) gives a telling example
in a study with very high statistical power (due to its sample comprising 22,071 cases) on the effect of
aspirin on the incidence of coronary heart disease in American physicians. The correlation between taking
aspirin and not having a heart attack came out as 0.034 (and that is the value of Pearson's r, not the
significance level!) but that was equivalent to a reduction of 4% in the incidence of heart attacks - a “weak”
effect that was far too strong to ignore. In fact, the American Medical Association closed the trial down
prematurely on the strength of these interim results, because it had shown that the risk associated with
being in the control group and not taking prophylactic aspirin was unacceptably large. The participants in
the study were members of the AMA.
Using Minitab To Calculate Power And Minimum Sample Size
Example 1: Statistical power of a t-test on scores in 2 groups
Suppose we have two samples, each with n = 13, and that we propose to use the 0.05 significance level.
The difference between means is 0.8 standard deviations (i.e., Cohen's d = 0.8).
1. Click on the Stat menu, select “Power and Sample size”, and from that select “2-sample t”. A
dialogue box appears. If you now click on “Help”, you won't really need to read this document, but
carry on doing so if you want to find out now how easy it is.
2. Go to the top section of the dialogue box, “Calculate power from sample size”. Against “Sample
size:” enter 13. Against “Difference:” enter 0.8.
3. Go to the bottom left section of the dialogue box, “Sigma”, which will hold your estimate of the
standard deviation in the population. Check that it contains the value 1.0.
4. Click on the “Options” button and check that the default significance level of 0.05 is shown. Click
the Options Box “OK” button.
5. Now that you have specified sample sizes of 13, an effect size of 0.8 standard deviations, and
alpha = 0.05, click on “OK”. You get the following output in the Session Window:
MTB > Power;
SUBC> TTwo;
SUBC> Sample 13;
SUBC> Difference 0.8;
SUBC> Sigma 1.0.

Power and Sample Size
2-Sample t Test
Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 1

            Sample
Difference    Size     Power
       0.8      13  0.499157

The sample size is for each group.
Minitab responds that the power will be 0.4992. If, in the population, there really is a difference of 0.8
between the members of the two categories that would be sampled in the two groups, then using sample
sizes of 13 each will have a 49.92% chance of getting a result that will be significant at the 0.05 level.
The power value of approximately 0.5 is probably unacceptably low. We see from the Minitab output that
this is based on a 2-tailed test. If we are using power analysis, we probably know enough about what we are
doing to have a theory that predicts which group should have the higher scores, so perhaps a one-tailed test
is called for. To repeat the analysis, basing it on a 1-tailed test, we repeat the procedure but after clicking to
obtain the “Options” dialogue box, we change the “Alternative Hypothesis”.
Example 2: Required sample size for a given power: 2 group comparison
Suppose we had reason to expect that, as above, in the population, there is an effect waiting to be found,
with a magnitude of 0.8 standard deviations between groups. Suppose we intend doing a one-tailed test,
with a significance level of 0.05.
1. As above, pull down the Minitab Stat Menu, select “Power and Sample size”, and from that
select “2-sample t”.
2. Click the radio button: “Calculate sample size for each power”. Against “Power values”, enter
0.8. Against “Difference”, enter 0.8. (It is a coincidence that in this example, both values are the
same.)
3. Go to the bottom left section of the dialogue box, “Sigma”, which will hold your estimate of the
standard deviation in the population. Check that it contains the value 1.0.
4. As before, click the dialogue box “Options” button and in the Options dialogue box select the
“Greater than” Alternative hypothesis radio button. Click the OK buttons.
MTB > Power;
SUBC> TTwo;
SUBC> Difference 0.8;
SUBC> Power 0.8;
SUBC> Sigma 1.;
SUBC> Alternative 1.

Power and Sample Size
2-Sample t Test
Testing mean 1 = mean 2 (versus >)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 1

            Sample  Target
Difference    Size   Power  Actual Power
       0.8      21     0.8      0.816788

The sample size is for each group.
Minitab’s output shows that for a target power of at least 0.8, there must be at least 21 cases in each of the
two groups tested. This will give an actual power of 0.8168.
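A quick hand check of this kind of result is possible with the normal-approximation formula n = 2((Zα + Zβ)/d)² per group. The Python sketch below is only an approximation, not Minitab's method: Minitab bases its calculation on the t distribution, so the normal approximation tends to come out slightly lower (here about 20 per group against Minitab's 21). The function name and its parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_n_per_group(d, power=0.8, alpha=0.05, two_tailed=False):
    """Approximate cases needed per group to detect an effect of size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    # Round up: a fraction of a case is not possible.
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Example 2: d = 0.8, power = 0.8, one-tailed test at alpha = 0.05.
print(approx_n_per_group(0.8, power=0.8, alpha=0.05, two_tailed=False))
```

Treat the result as a lower bound and confirm it with a t-based calculation before committing to a design.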
Interestingly, this result does not tally precisely with the power estimate given in Table 2 of Cohen’s 1992
paper in Psychological Bulletin. Maybe Cohen was proposing a 2-tailed test. Calculating the number of
cases required for a power of 0.8, when the difference between group means is 0.8 standard deviations, if
the t-test is to be 2-tailed, is left as an exercise for the reader.
Example 3: Example of calculating power for a one-way ANOVA
Suppose you are about to undertake an investigation to determine whether or not 4 treatments affect the
yield of a product using 5 observations per treatment. You know that the mean of the control group should
be around 8, and you would like to find significant differences of +4. Thus, the maximum difference you
are considering is 4 units. Previous research suggests the population σ is 1.64.
1. As above, pull down the Minitab Stat Menu, select “Power and Sample size”, and from that
select “One-way ANOVA”.
2. In Number of levels, enter 4.
3. In Sample sizes, enter 5.
4. In Values of the maximum difference between means, enter 4.
5. In Standard deviation, enter 1.64. Click the OK button.
MTB > Power;
SUBC> OneWay 4;
SUBC> Sample 5;
SUBC> MaxDifference 4;
SUBC> Sigma 1.64.

Power and Sample Size
One-way ANOVA
Alpha = 0.05  Assumed standard deviation = 1.64  Number of Levels = 4

   SS  Sample            Maximum
Means    Size     Power  Difference
    8       5  0.826860           4

The sample size is for each level.
To interpret the results: if you assign five observations to each treatment level, you have a power of 0.83
to detect a difference of 4 units or more between the treatment means. Minitab can also display the power
curve of all possible combinations of maximum difference in mean detected and the power values for
one-way ANOVA with 5 samples per treatment. The symbol on the curve represents the difference value you
specified.
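The “SS Means” value of 8 in the output can be reproduced by hand. The Python sketch below assumes that the maximum difference is split between two treatment means placed at plus and minus half the difference around the grand mean, with the remaining means sitting at the grand mean. That configuration is an assumption about how the figure arises, although it does reproduce the value shown. The noncentrality parameter n × SS / σ² is the quantity that feeds a noncentral F calculation of power; the power itself is not computed here, and the function names are hypothetical.

```python
def ss_means_for_max_difference(max_difference, levels):
    """Sum of squared deviations of the level means from the grand mean,
    assuming two means at +/- max_difference / 2 and the rest at the grand mean."""
    means = [-max_difference / 2, max_difference / 2] + [0.0] * (levels - 2)
    grand_mean = sum(means) / levels
    return sum((m - grand_mean) ** 2 for m in means)


def noncentrality(n_per_level, ss_means, sigma):
    """Noncentrality parameter of the F test for a one-way ANOVA."""
    return n_per_level * ss_means / sigma ** 2


ss = ss_means_for_max_difference(4, levels=4)
print(ss, round(noncentrality(5, ss, 1.64), 2))
```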
Power and Sample Size Calculations for Other Designs and Tests
The methods are very similar for all the options that Minitab offers, such as differences between
proportions and ANOVA designs. It is pretty self-evident what to do once you know how to do it for the
t-test. The on-line help that you get by clicking the “Help” button in the power calculation dialogue box is
very good.
Cohen (1992) remains a very useful introductory guide to power and effect size, in fewer than 5 pages. It
includes a table with many useful power and effect size calculations already done for the reader.
Sample Size Equations – Background Theory
Five different sample size equations are presented in this section.
Each separate description is designed to stand alone from the others. Each discussion includes the sample
size equation, a description of each term in the equation, a table of appropriate coefficients, and a worked
example.
The examples included all refer to monitoring with a quadrat-based sampling procedure. The equations and
calculations also work with other kinds of monitoring data such as measurements of plant height, number
of flowers, or measures of cover.
For the equations that deal with comparing different sample means, all comparisons shown are for two-tail
tests. If a one-tail test is desired, double the false-change (Type I) error rate (α) and look up the new
doubled α value in the table of coefficients (e.g., use α = 0.20 instead of α = 0.10 for a one-tailed test with a
false-change (Type I) error rate of α = 0.10).
The coefficients used in all of the equations are from a standard normal distribution (Zα and Zβ) instead of
the t-distribution (tα and tβ). These two distributions are nearly identical at large sample sizes but at small
sample sizes (n < 30) the Z coefficients will slightly underestimate the number of samples needed. The
correction procedure described for the first example (using the sample size correction table, below) already
adjusts the sample size using the appropriate t-value. For the other equations, tα and tβ values can be
obtained from a t-table and used in place of the Zα and Zβ coefficients that are included with the sample size
equations. The appropriate tα coefficient for the false-change (Type I) error rate can be taken directly from
the α column of a t-table at the appropriate degrees of freedom (v). For example, for a false-change error
rate of 0.10 use the α = 0.10 column. The appropriate tβ coefficient for a specified missed-change error
level can be looked up by calculating 2(1 - power) and looking up that value in the appropriate α column.
For example, for a power of 0.90, the calculation for tβ would be 2(1 - 0.90) = 0.20. Use the α = 0.20
column at the appropriate degrees of freedom (v) to obtain the appropriate t-value.
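The two look-up rules in this section can be checked numerically. The Python sketch below uses standard normal quantiles (Z values, not t values) and shows that a two-tailed look-up at the doubled α gives the one-tailed coefficient, and that looking up 2(1 - power) in a two-tailed table gives the coefficient for the missed-change error rate. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def z_two_tailed(alpha):
    """Coefficient read from a two-tailed table at the given alpha."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


def z_one_tailed(alpha):
    """Coefficient for a one-tailed test at the given alpha."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha)


def z_beta(power):
    """Coefficient for the missed-change error rate, via the 2(1 - power) rule."""
    return z_two_tailed(2 * (1 - power))


# One-tailed alpha = 0.10 matches the two-tailed entry for alpha = 0.20.
print(round(z_one_tailed(0.10), 2), round(z_two_tailed(0.20), 2))
# Power = 0.90 is looked up in the alpha = 0.20 column.
print(round(z_beta(0.90), 2))
```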
Determining The Necessary Sample Size For Estimating A Single
Population Mean Or A Single Population Total With A Specified Level Of
Precision.
Estimating a sample mean vs. total population size. The sample size needed to estimate confidence
intervals that are within a given percentage of the estimated total population size is the same as the sample
size needed to estimate confidence intervals that are within that percentage of the estimated mean value.
The instructions below assume you are working with a sample mean.
Determining sample size for a single population mean or a single population total is a two or three-step
process.
(1) The first step is to use the equation provided below to calculate an uncorrected sample size
estimate.
(2) The second step is to consult the Sample Size Correction Table appearing below these instructions
to come up with the corrected sample size estimate. The use of the correction table is necessary
because the equation below under-estimates the number of samples that will be needed to meet the
specified level of precision. The use of the table to correct the underestimated sample size is
simpler than using a more complex equation that does not require correction.
(3) The third step is to multiply the corrected sample size estimate by the finite population correction
factor if more than 5% of the population area is being sampled.
(1) Calculate an initial sample size using the following equation:
$$n = \frac{Z_\alpha^2 \, s^2}{B^2}$$
Where:
n     The uncorrected sample size estimate.
Zα    The standard normal coefficient from the table below.
s     The standard deviation.
B     The desired precision level expressed as half of the maximum acceptable confidence
      interval width. This needs to be specified in absolute terms rather than as a percentage.
      For example, if you wanted your confidence interval width to be within 30% of your
      sample mean and your sample mean = 10 plants/quadrat then B = 0.30 x 10 = 3.0.
Table Of Standard Normal Deviates (Zα) for Various Confidence Levels
Confidence level    Alpha (α) level    Zα
80%                 0.20               1.28
90%                 0.10               1.64
95%                 0.05               1.96
99%                 0.01               2.58
(2) To obtain the adjusted sample size estimate, consult the correction table of these instructions. n is
the uncorrected sample size value from the sample size equation. n* is the corrected sample size
value.
(3) Additional correction for sampling finite populations. The above formula assumes that the
population is very large compared to the proportion of the population that is sampled. If you are
sampling more than 5% of the whole population then you should apply a correction to the sample
size estimate that incorporates the finite population correction factor (FPC). This will reduce the
sample size.
The formula for correcting the sample size estimate with the FPC for confidence intervals is:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}}$$
Where:
n'    The new FPC-corrected sample size.
n*    The corrected sample size from the sample size correction table.
N     The total number of possible quadrat locations in the population. To calculate N,
      determine the total area of the population and divide by the size of one quadrat.
Example:
Management objective: Restore the population of species Y in population Z to a density of at least 30
plants/quadrat by the year 2001
Sampling objective: Obtain estimates of the mean density and population size with 95% confidence intervals
within 20% of the estimated true value.
Results of pilot sampling:
Mean ( x ) = 25 plants/quadrat.
Standard deviation (s) = 7 plants.
Given: The desired confidence level is 95% so the appropriate Zα from the table above is 1.96. The desired
confidence interval half-width is 20% (0.20) of the estimated true value. Since the estimated true value is 25
plants/quadrat, the desired half-width (B) is 25 x 0.20 = 5 plants/quadrat.
Calculate an unadjusted estimate of the sample size needed by using the sample size formula:
$$n = \frac{Z_\alpha^2 \, s^2}{B^2} = \frac{1.96^2 \times 7^2}{5^2} = 7.53$$
Round 7.53 plots up to 8 plots for the unadjusted sample size.
To adjust this preliminary estimate, go to the sample size correction table and find n = 8 and the
corresponding n* value in the 95% confidence level portion of the table. For n = 8, the corresponding value
is n* = 15.
The corrected estimated sample size needed to be 95% confident that the estimate of the population mean is
within 20% (±5 plants) of the true mean is 15 quadrats.
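The arithmetic of step (1) in this example can be checked with a short script. The following is a minimal Python sketch, not part of the original procedure; the function name `uncorrected_sample_size_mean` and the variable names are illustrative assumptions. It reproduces only the uncorrected estimate; the lookup from n to n* still has to be done in the Sample Size Correction Table.

```python
import math


def uncorrected_sample_size_mean(z_alpha: float, s: float, b: float) -> float:
    """Uncorrected sample size n = Z_alpha^2 * s^2 / B^2 for estimating a mean."""
    return (z_alpha ** 2) * (s ** 2) / (b ** 2)


# Values taken from the worked example above.
pilot_mean = 25.0               # plants/quadrat
pilot_sd = 7.0                  # plants
half_width = 0.20 * pilot_mean  # B = 5 plants/quadrat
z_95 = 1.96                     # Z_alpha for 95% confidence

n_raw = uncorrected_sample_size_mean(z_95, pilot_sd, half_width)
n_uncorrected = math.ceil(n_raw)  # always round up to whole plots

print(round(n_raw, 2), n_uncorrected)
```

Rounding up with `math.ceil` mirrors the instruction in the text to round 7.53 up to 8 before consulting the correction table.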
Additional correction for sampling finite populations: The above formula assumes that the population is
very large compared to the proportion of the population that is sampled. If you are sampling more than 5%
of the whole population area then you should apply a correction to the sample size estimate that
incorporates the finite population correction factor (FPC). This will reduce the sample size. The formula for
correcting the sample size is as follows:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}}$$
Where:
n'    The new FPC-corrected sample size.
n*    The corrected sample size from the sample size correction table.
N     The total number of possible quadrat locations in the population. To calculate N,
      determine the total area of the population and divide by the size of one quadrat.
Example:
If the pilot data described above was gathered using a 1m x 10m (10 m2) quadrat and the total population
being sampled was located within a 20m x 50m macroplot (1000 m2) then N = 1000m2/10m2 = 100. The
corrected sample size would then be:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}} = \frac{15}{1 + \frac{15}{100}} = 13.04$$
The new, FPC-corrected, estimated sample size to be 95% confident that the estimate of the population
mean is within 20% (±5 plants) of the true mean is 13 quadrats.
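The finite population correction is the same one-line calculation in every section of this document, so it can be wrapped in a small helper. This is a sketch under stated assumptions: the function name `fpc_sample_size` and the 5% check are illustrative, and the numbers are those of the example above.

```python
def fpc_sample_size(n_star: float, population_units: float) -> float:
    """FPC-corrected sample size n' = n* / (1 + n*/N)."""
    return n_star / (1 + n_star / population_units)


# Values from the example above.
quadrat_area = 1 * 10     # m^2, a 1 m x 10 m quadrat
macroplot_area = 20 * 50  # m^2, a 20 m x 50 m macroplot
N = macroplot_area / quadrat_area  # 100 possible quadrat locations
n_star = 15               # corrected sample size from the correction table

# The text recommends the correction only when more than 5% is sampled.
sampling_fraction = n_star / N
n_prime = fpc_sample_size(n_star, N)

print(sampling_fraction > 0.05, round(n_prime, 2))
```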
Determining The Necessary Sample Size For Detecting Differences
Between Two Means With Temporary Sampling Units.
$$n = \frac{2 s^2 \left(Z_\alpha + Z_\beta\right)^2}{MDC^2}$$
Where:
n      The uncorrected sample size estimate.
s      Sample standard deviation.
Zα     Z-coefficient for the false-change (Type I) error rate from the table below.
Zβ     Z-coefficient for the missed-change (Type II) error rate from the table below.
MDC    Minimum detectable change size. This needs to be specified in absolute terms rather
       than as a percentage. For example, if you wanted to detect a 20% change in the sample
       mean from one year to the next and your first year sample mean is 10 plants/quadrat
       then MDC is 0.20 x 10 = 2 plants/quadrat.
Table of standard normal deviates for Zα
False-change (Type I) error rate (α)    Zα
0.40                                    0.84
0.20                                    1.28
0.10                                    1.64
0.05                                    1.96
0.01                                    2.58
Table of standard normal deviates for Zβ
Missed-change (Type II) error rate (β)    Power    Zβ
0.40                                      0.60     0.25
0.20                                      0.80     0.84
0.10                                      0.90     1.28
0.05                                      0.95     1.64
0.01                                      0.99     2.33
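For error rates that are not listed in the two tables, the coefficients can be computed directly. The sketch below uses Python's standard-library `statistics.NormalDist`; the function names are illustrative assumptions. The tabulated values appear consistent with a two-tailed quantile for Zα and a one-tailed quantile for Zβ. That reading is inferred from the numbers in the tables rather than stated in the text.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_alpha(alpha: float) -> float:
    """Two-tailed standard normal coefficient for a false-change rate alpha."""
    return _STD_NORMAL.inv_cdf(1 - alpha / 2)


def z_beta(beta: float) -> float:
    """One-tailed standard normal coefficient for a missed-change rate beta."""
    return _STD_NORMAL.inv_cdf(1 - beta)


for a in (0.40, 0.20, 0.10, 0.05, 0.01):
    print(f"alpha={a:.2f}  Z_alpha={z_alpha(a):.2f}")
for b in (0.40, 0.20, 0.10, 0.05, 0.01):
    print(f"beta={b:.2f}  power={1 - b:.2f}  Z_beta={z_beta(b):.2f}")
```

The computed values agree with the tables to within rounding (for example, the table lists 1.64 where the unrounded quantile is closer to 1.645).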
Example:
Management objective: Increase the density of species F at Site Y by 20% between 1999 and 2004.
Sampling objective: I want to be 90% certain of detecting a 20% change in mean plant density and I am willing to
accept a 10% chance that I will make a false-change error (conclude that a change took place when it really
did not).
Results from pilot sampling:
Mean (x) = 25 plants/quadrat
Standard deviation (s) = 7 plants.
Given: The acceptable False-change error rate (α) is 0.10 so the appropriate Zα from the table is 1.64.
The desired Power is 90% (0.90) so the Missed-change error rate (β) is 0.10 and the appropriate Zβ
coefficient from the table is 1.28.
The Minimum Detectable Change (MDC) is 20% of the 1999 value or 0.20 x 25 = 5 plants/quadrat.
Calculate the estimated necessary sample size using the equation provided above:
$$n = \frac{2 s^2 \left(Z_\alpha + Z_\beta\right)^2}{MDC^2} = \frac{2 \times 7^2 \times (1.64 + 1.28)^2}{5^2} = 33.42$$
Round up 33.42 to 34 plots.
Final estimated sample size needed to be 90% confident of detecting a change of 5 plants/quadrat between 1999
and 2004 with a false-change error rate of 0.10 is 34 quadrats. The sample size correction table is not
needed for estimating sample sizes for detecting differences between two population means.
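The two-sample calculation above can be reproduced with a few lines of Python. This is an illustrative sketch only; the function name `sample_size_two_means` is an assumption, and the inputs are the pilot values and table coefficients from the example.

```python
import math


def sample_size_two_means(s: float, z_alpha: float, z_beta: float, mdc: float) -> float:
    """n = 2 * s^2 * (Z_alpha + Z_beta)^2 / MDC^2 for temporary sampling units."""
    return 2 * s ** 2 * (z_alpha + z_beta) ** 2 / mdc ** 2


pilot_mean = 25.0        # plants/quadrat
pilot_sd = 7.0           # plants
mdc = 0.20 * pilot_mean  # 20% change = 5 plants/quadrat

n_raw = sample_size_two_means(pilot_sd, z_alpha=1.64, z_beta=1.28, mdc=mdc)
n_required = math.ceil(n_raw)

print(round(n_raw, 2), n_required)
```

Because MDC is squared in the denominator, halving the detectable change roughly quadruples the required sample size, which is worth checking before fixing a sampling objective.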
Correction for sampling finite populations: The above formula assumes that the population is very large
compared to the proportion of the population that is sampled. If you are sampling more than 5% of the
whole population area then you should apply a correction to the sample size estimate that incorporates the
finite population correction factor (FPC). This will reduce the sample size. The formula for correcting the
sample size estimate is as follows:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}}$$
Where:
n'    The new sample size based upon inclusion of the finite population correction factor.
n*    The sample size estimate from the equation above (the sample size correction table is not
      used when detecting differences between two means).
N     The total number of possible quadrat locations in the population. To calculate N,
      determine the total area of the population and divide by the size of the sampling unit.
Example:
If the pilot data described above was gathered using a 1m x 10m (10 m2) quadrat and the total population
being sampled was located within a 20m x 50m macroplot (1000 m2) then N = 1000m2/10m2 = 100. The
corrected sample size would then be:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}} = \frac{34}{1 + \frac{34}{100}} = 25.37$$
Round up 25.37 to 26.
The new, FPC-corrected estimated sample size needed to be 90% certain of detecting a change of 5
plants/quadrat between 1999 and 2004 with a false-change error rate of 0.10 is 26 quadrats.
Note on the statistical analysis for two sample tests from finite populations. If you have sampled more
than 5% of an entire population then you should also apply the finite population correction factor to the
results of the statistical test. This procedure involves dividing the test statistic by the square root of the
finite population factor (1-n/N). For example, if your t-statistic from a particular test turned out to be 1.645
and you sampled n = 26 quadrats out of a total N = 100 possible quadrats, then your correction procedure
would look like the following:
$$t' = \frac{t}{\sqrt{1 - \frac{n}{N}}} = \frac{1.645}{\sqrt{1 - \frac{26}{100}}} = 1.912$$
Where:
t     The t-statistic from a t-test.
t'    The corrected t-statistic using the FPC.
n     The sample size from the equation above.
N     The total number of possible quadrat locations in the population. To calculate N, determine
      the total area of the population and divide by the size of each individual sampling unit.
You would need to look up the p-value of t' = 1.912 in a t-table at the appropriate degrees of freedom to
obtain the correct p-value for this statistical test.
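The correction of the test statistic can be sketched as follows. The function name `fpc_corrected_t` is an illustrative assumption; the inputs are the values from the example above. The script stops at the corrected statistic, since the p-value lookup depends on the degrees of freedom of the particular test.

```python
import math


def fpc_corrected_t(t: float, n: int, population_units: int) -> float:
    """Corrected statistic t' = t / sqrt(1 - n/N)."""
    if not 0 < n < population_units:
        raise ValueError("n must be between 0 and N (exclusive)")
    return t / math.sqrt(1 - n / population_units)


t_prime = fpc_corrected_t(1.645, n=26, population_units=100)
print(round(t_prime, 3))
```

The corrected statistic is always larger in magnitude than the original, because sampling a sizeable fraction of a finite population reduces the standard error.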
Determining The Necessary Sample Size For Detecting Differences
Between Two Means When Using Paired Or Permanent Sampling Units.
When paired sampling units are being compared or when data from permanent quadrats are being
compared between two time periods, then sample size determination requires a different procedure than if
samples are independent of one another. The equation for determining the number of samples necessary to
detect some “true” difference between two sample means is:
$$n = \frac{s^2 \left(Z_\alpha + Z_\beta\right)^2}{MDC^2}$$
Where:
n      The uncorrected sample size estimate.
s      Standard deviation of the differences between the paired samples.
Zα     Z-coefficient for the false-change (Type I) error rate from the table below.
Zβ     Z-coefficient for the missed-change (Type II) error rate from the table below.
MDC    Minimum detectable change size. This needs to be specified in absolute terms rather
       than as a percentage. For example, if you wanted to detect a 20% change in the sample
       mean from one year to the next and your first year sample mean is 10 plants/quadrat
       then MDC is 0.20 x 10 = 2 plants/quadrat.
Table of standard normal deviates for Zα
False-change (Type I) error rate (α)    Zα
0.40                                    0.84
0.20                                    1.28
0.10                                    1.64
0.05                                    1.96
0.01                                    2.58
Table of standard normal deviates for Zβ
Missed-change (Type II) error rate (β)    Power    Zβ
0.40                                      0.60     0.25
0.20                                      0.80     0.84
0.10                                      0.90     1.28
0.05                                      0.95     1.64
0.01                                      0.99     2.33
If the objective is to track changes over time with permanent sampling units and only a single year of data
is available, then you will not have a standard deviation of differences between the paired samples. If you
have an estimate of the likely degree of correlation between the two years of data, and you assume that the
standard deviation among sampling units is going to be the same in the second time period, then you can
use the equation below to estimate the standard deviation of differences.
$$s_{diff} = s_1 \sqrt{2\left(1 - corr_{diff}\right)}$$
Where:
sdiff       Estimated standard deviation of the differences between paired samples.
s1          Sample standard deviation among sampling units at the first time period.
corrdiff    Correlation coefficient between sampling unit values in the first time period and
            sampling unit values in the second time period.
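As a rough illustration of the sdiff equation, the sketch below evaluates it for a few correlation values. The function name `estimate_sd_of_differences` and the input numbers (s1 = 7, correlations of 0.5, 0.8 and 0.95) are hypothetical and chosen only for illustration; they do not come from the pilot data in this document.

```python
import math


def estimate_sd_of_differences(s1: float, corr: float) -> float:
    """s_diff = s1 * sqrt(2 * (1 - corr)), assuming equal SD in both periods."""
    if not -1.0 <= corr <= 1.0:
        raise ValueError("corr must be a correlation coefficient in [-1, 1]")
    return s1 * math.sqrt(2 * (1 - corr))


# Hypothetical inputs for illustration only.
s1 = 7.0
for corr in (0.5, 0.8, 0.95):
    print(corr, round(estimate_sd_of_differences(s1, corr), 3))
```

The higher the correlation between time periods, the smaller the estimated standard deviation of differences, and therefore the smaller the sample size that the paired equation will return. This is the main payoff of permanent sampling units.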
Example
Management Objective: Achieve at least a 20% higher density of species F at Site Y in unburned areas
compared to burned areas in 1999.
Sampling objective: I want to be able to detect a 20% difference in mean plant density in unburned areas
and adjacent burned areas. I want to be 90% certain of detecting that difference, if it occurs, and I am
willing to accept a 10% chance that I will make a false-change error (conclude that a difference exists
when it really did not).
Results from pilot sampling: Five paired quadrats were sampled where one member of the pair was
excluded from burning and the other member of the pair was burned.
Number of plants/quadrat
Quadrat number            Burned    Unburned    Difference between burned and unburned
1                         2         3           1
2                         5         8           3
3                         4         9           5
4                         7         12          5
5                         3         7           4
Mean (x)                  4.20      7.80        3.60
Standard deviation (s)    1.92      3.27        1.67

MTB > set c1
DATA> 2 5 4 7 3
DATA> set c2
DATA> 3 8 9 12 7
DATA> name c1 'burned'
MTB > name c2 'unburned'
MTB > let c3 = 'unburned' - 'burned'
MTB > name c3 'difference'
MTB > Describe 'burned' 'unburned' 'difference';
SUBC> Mean;
SUBC> StDeviation.

Descriptive Statistics: burned, unburned, difference
Variable      Mean     StDev
burned        4.200    1.924
unburned      7.80     3.27
difference    3.600    1.673
Given: The sampling objective specified a desired minimum detectable difference (i.e., equivalent to the
MDC) of 20%. Taking the larger of the two mean values and multiplying by 20% leads to:
7.80 x 0.20 = MDC = 1.56 plants/quadrat.
The appropriate standard deviation to use is 1.67, the standard deviation of the differences between the
pairs.
The acceptable False-change error rate (α) is 0.10 so the appropriate Zα from the table is 1.64.
The desired Power is 90% (0.90) so the Missed-change error rate (β) is 0.10 and the appropriate Zβ
coefficient from the table is 1.28.
Calculate the estimated necessary sample size using the equation provided above:
$$n = \frac{s^2 \left(Z_\alpha + Z_\beta\right)^2}{MDC^2} = \frac{1.67^2 \times (1.64 + 1.28)^2}{1.56^2} = 9.77$$
Round up 9.77 to 10 plots.
Final estimated sample size needed to be 90% certain of detecting a true difference of 1.56 plants/quadrat
between the burned and unburned quadrats with a false-change error rate of 0.10 is 10 quadrats.
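The whole paired example, from the raw pilot counts to the final sample size, can be sketched in Python using only the standard library. The variable and function names are illustrative assumptions. Note that the script carries the unrounded standard deviation of the differences (about 1.673) rather than the rounded 1.67 used in the text, so the intermediate value comes out slightly above 9.77; it still rounds up to the same 10 quadrats.

```python
import math
from statistics import mean, stdev

# Pilot data from the table above (plants/quadrat).
burned = [2, 5, 4, 7, 3]
unburned = [3, 8, 9, 12, 7]

# Paired differences, unburned minus burned, as in the Minitab session.
differences = [u - b for u, b in zip(unburned, burned)]
s_diff = stdev(differences)  # sample SD of the differences

# MDC: 20% of the larger of the two pilot means.
mdc = 0.20 * max(mean(burned), mean(unburned))


def sample_size_paired(s: float, z_alpha: float, z_beta: float, mdc: float) -> float:
    """n = s^2 * (Z_alpha + Z_beta)^2 / MDC^2 for paired or permanent units."""
    return s ** 2 * (z_alpha + z_beta) ** 2 / mdc ** 2


n_raw = sample_size_paired(s_diff, z_alpha=1.64, z_beta=1.28, mdc=mdc)
n_required = math.ceil(n_raw)

print(differences, round(s_diff, 3), round(mdc, 2), n_required)
```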
Example
Management objective: Increase the density of species F at Site Q by 20% between 1999 and 2002.
Sampling objective: I want to be able to detect a 20% difference in mean plant density of species F at Site
Q between 1999 and 2002. I want to be 90% certain of detecting that change, if it occurs, and I am willing
to accept a 10% chance that I will make a false-change error (conclude that a difference exists when it
really did not).
The procedure for determining the necessary sample size for this example would be very similar to the
previous example. Just replace “burned” and “unburned” in the data table with “1999” and “2002” and the
rest of the calculations would be the same. Because the sample size determination procedure needs the
standard deviation of the difference between two samples, you will not have the necessary standard
deviation term to plug into the equation until you have two years of data. The standard deviation of the
difference can be estimated in the first year if some estimate of the correlation coefficient between
sampling unit values in the first time period and the sampling unit values in the second time period is
available (see the sdiff equation above).
Correction for sampling finite populations: The above formula assumes that the population is very large
compared to the proportion of the population that is sampled. If you are sampling more than 5% of the
whole population area then you should apply a correction to the sample size estimate that incorporates the
finite population correction factor (FPC). This will reduce the sample size. The formula for correcting the
sample size estimate is as follows:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}}$$
Where:
n'    The new sample size based upon inclusion of the finite population correction factor.
n*    The sample size estimate from the equation above (the sample size correction table is not
      used when detecting differences between two means).
N     The total number of possible quadrat locations in the population. To calculate N,
      determine the total area of the population and divide by the size of the sampling unit.
Example
If the pilot data described above was gathered using a 1m x 10m (10 m2) quadrat and the total population
being sampled was located within a 10m x 50m macroplot (500 m2) then N = 500m2/10m2 = 50. The
corrected sample size would then be:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}} = \frac{10}{1 + \frac{10}{50}} = 8.33$$
Round up 8.33 to 9.
The new, FPC-corrected estimated sample size needed to be 90% confident of detecting a true difference of
1.56 plants/quadrat between the burned and unburned quadrats with a false-change error rate of 0.10 is 9
quadrats.
Note on the statistical analysis for two sample tests from finite populations. If you have sampled more than
5% of an entire population then you should also apply the finite population correction factor to the results
of the statistical test. This procedure involves dividing the test statistic by the square root of (1-n/N). For
example, if your t-statistic from a particular test turned out to be 1.782 and you sampled n = 9 quadrats out
of a total N = 50 possible quadrats, then your correction procedure would look like the following:
$$t' = \frac{t}{\sqrt{1 - \frac{n}{N}}} = \frac{1.782}{\sqrt{1 - \frac{9}{50}}} = 1.968$$
Where:
t     The t-statistic from a t-test.
t'    The corrected t-statistic using the FPC.
n     The sample size from the equation above.
N     The total number of possible quadrat locations in the population. To calculate N, determine
      the total area of the population and divide by the size of each individual sampling unit.
You would need to look up the p-value of t' = 1.968 in a t-table at the appropriate degrees of freedom to
obtain the correct p-value for this statistical test.
Determining The Necessary Sample Size For Estimating A Single
Population Proportion With A Specified Level Of Precision.
The equation for determining the sample size for estimating a single proportion is:
$$n = \frac{p \, q \, Z_\alpha^2}{d^2}$$
Where:
n     Estimated necessary sample size.
Zα    The coefficient from the table of standard normal deviates below.
p     The value of the proportion expressed as a decimal (e.g., 0.45).
q     1-p
d     The desired precision level expressed as half of the maximum acceptable confidence
      interval width. This is also expressed as a decimal (e.g., 0.15) and this represents
      an absolute rather than a relative value. For example, if your proportion value is 30% and
      you want a precision level of ±10% this means you are targeting an interval from
      20% to 40%. Use 0.10 for the d-value and not 0.30 x 0.10 = 0.03.
Table of standard normal deviates (Zα) for various confidence levels
Confidence level    Alpha (α) level    Zα
80%                 0.20               1.28
90%                 0.10               1.64
95%                 0.05               1.96
99%                 0.01               2.58
Example:
Management objective: Maintain at least a 40% frequency (in 1m2 quadrats) of species Y in population Z
over the next 5 years.
Sampling objective: Estimate percent frequency with 95% confidence intervals no wider than ±10% (in
absolute terms) around the estimated true value.
Results of pilot sampling: The proportion of quadrats with species Y is estimated to be p = 65% (0.65).
Because q = (1-p), q = 1-0.65 = 0.35.
Given: The desired confidence level is 95% so the appropriate Zα from the table above is 1.96. The desired
confidence interval half-width (d) is specified as 10% (0.10).
Using the equation provided above:
$$n = \frac{p \, q \, Z_\alpha^2}{d^2} = \frac{0.65 \times 0.35 \times 1.96^2}{0.10^2} = 87.40$$
Round up 87.40 to 88.
The estimated sample size needed to be 95% confident that the estimate of the population percent
frequency is within 10% (±0.10) of the true percent frequency is 88 quadrats.
This sample size formula works well as long as the proportion is more than 0.20 and less than 0.80. If you
suspect the population proportion is less than 0.20 or greater than 0.80, use 0.20 or 0.80, respectively, as a
conservative estimate of the proportion.
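The single-proportion calculation, together with the 0.20 to 0.80 guidance, can be sketched as below. The function names `conservative_proportion` and `sample_size_proportion` are illustrative assumptions, and the clamping helper is only one possible way of applying the rule of thumb stated above.

```python
import math


def conservative_proportion(p: float) -> float:
    """Clamp p into [0.20, 0.80], following the rule of thumb in the text."""
    return min(max(p, 0.20), 0.80)


def sample_size_proportion(p: float, z_alpha: float, d: float) -> float:
    """n = p * q * Z_alpha^2 / d^2, with q = 1 - p and d an absolute half-width."""
    q = 1 - p
    return p * q * z_alpha ** 2 / d ** 2


# Values from the worked example above.
p_pilot = conservative_proportion(0.65)  # 0.65 is already inside the range
n_raw = sample_size_proportion(p_pilot, z_alpha=1.96, d=0.10)
n_required = math.ceil(n_raw)

print(round(n_raw, 2), n_required)
```

Since p x q peaks at p = 0.50, moving an extreme proportion toward the middle of the range can only increase the estimated sample size, which is why the substitution is described as conservative.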
Correction for sampling finite populations: The above formula assumes that the population is very large
compared to the proportion of the population that is sampled. If you are sampling more than 5% of the
whole population area then you should apply a correction to the sample size estimate that incorporates the
finite population correction factor (FPC). This will reduce the sample size. The formula for correcting the
sample size estimate is as follows:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}}$$
Where:
n'    The new sample size based upon inclusion of the finite population correction factor.
n*    The sample size estimate from the equation above.
N     The total number of possible quadrat locations in the population. To calculate N,
      determine the total area of the population and divide by the size of the sampling unit.
Example:
If the pilot data described above was gathered using a 1m x 1m (1 m2) quadrat and the total population
being sampled was located within a 25m x 25m macroplot (625 m2) then N = 625m2/1m2 = 625. The
corrected sample size would then be:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}} = \frac{88}{1 + \frac{88}{625}} = 77.14$$
Round up 77.14 to 78.
The new, FPC-corrected, estimated sample size needed to be 95% confident that the estimate of the
population percent frequency is within 10% (± 0.10) of the true percent frequency is 78 quadrats.
Determining The Necessary Sample Size For Detecting Differences
Between Two Proportions With Temporary Sampling Units.
$$n = \frac{\left(Z_\alpha + Z_\beta\right)^2 \left(p_1 q_1 + p_2 q_2\right)}{\left(p_2 - p_1\right)^2}$$
Where:
n     Estimated necessary sample size.
Zα    Z-coefficient for the false-change (Type I) error rate from the table below.
Zβ    Z-coefficient for the missed-change (Type II) error rate from the table below.
p1    The value of the proportion for the first sample as a decimal (e.g., 0.65).
q1    1 - p1.
p2    The value of the proportion for the second sample as a decimal (e.g., 0.45).
q2    1 - p2.
Table of standard normal deviates for Zα
False-change (Type I) error rate (α)    Zα
0.40                                    0.84
0.20                                    1.28
0.10                                    1.64
0.05                                    1.96
0.01                                    2.58
Table of standard normal deviates for Zβ
Missed-change (Type II) error rate (β)    Power    Zβ
0.40                                      0.60     0.25
0.20                                      0.80     0.84
0.10                                      0.90     1.28
0.05                                      0.95     1.64
0.01                                      0.99     2.33
Example:
Management objective: Decrease the frequency of invasive weed F at Site G by 20% between 1999 and
2001.
Sampling objective: I want to be 90% certain of detecting an absolute change of 20% frequency and I am
willing to accept a 10% chance that I will make a false-change error (conclude that a change took place
when it really did not).
Note that the magnitude of change for detecting change over time for proportion data is expressed in
absolute terms rather than in relative terms (relative terms were used in earlier examples that dealt with
sample mean values). The reason absolute terms are used instead of relative terms relates to the type of
data being gathered (percent frequency is already expressed as a relative measure). Think of taking your
population area and dividing it into a grid where the size of each grid cell equals your quadrat size. When
you estimate a percent frequency, you are estimating the proportion of these grid cells occupied by a
particular species. If 45% of all the grid cells in the population are occupied by a particular species then
you hope that your sample values will be close to 45%. If over time the population changes so that now
65% of all the grid cells are occupied, then the true percent frequency has changed from 45% to 65%,
representing a 20% absolute change.
Results from pilot sampling: The proportion of quadrats with weed F in 1999 is estimated to be p1 = 65%
(0.65).
Because q1 = 1-p1, q1 = 1-0.65 = 0.35.
Because we are interested in detecting a 20% shift in percent frequency, we will assign p2 = 0.45. This
represents a shift of 20% frequency from 1999 to 2001. A decline was selected instead of an increase (e.g.,
from 65% frequency to 85% frequency) because sample size requirements are higher at the mid-range of
frequency values (i.e., closer to 50%) than they are closer to 0 or 100. Sticking closer to the mid-range
gives us a more conservative sample size estimate.
Because q2 = 1-p2, q2 = 1-0.45 = 0.55.
Given: The acceptable False-change error rate (α) is 0.10 so the appropriate Zα from the table is 1.64.
The desired Power is 90% (0.90) so the Missed-change error rate (β) is 0.10 and the appropriate Zβ
coefficient from the table is 1.28.
Using the equation provided above:
$$n = \frac{\left(Z_\alpha + Z_\beta\right)^2 \left(p_1 q_1 + p_2 q_2\right)}{\left(p_2 - p_1\right)^2} = \frac{(1.64 + 1.28)^2 \, (0.65 \times 0.35 + 0.45 \times 0.55)}{(0.45 - 0.65)^2} = 101.25$$
Round up 101.25 to 102.
The estimated sample size needed to be 90% sure of detecting a shift of 20% frequency with a starting
frequency of 65% and a false-change error rate of 0.10 is 102 quadrats.
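The two-proportion calculation can be checked with the following sketch. The function name `sample_size_two_proportions` is an illustrative assumption; the inputs are the values from the example above.

```python
import math


def sample_size_two_proportions(p1: float, p2: float,
                                z_alpha: float, z_beta: float) -> float:
    """n = (Z_alpha + Z_beta)^2 * (p1*q1 + p2*q2) / (p2 - p1)^2."""
    q1, q2 = 1 - p1, 1 - p2
    return (z_alpha + z_beta) ** 2 * (p1 * q1 + p2 * q2) / (p2 - p1) ** 2


# 65% starting frequency, detecting an absolute decline of 20% frequency.
n_decline = sample_size_two_proportions(0.65, 0.45, z_alpha=1.64, z_beta=1.28)
# The same absolute shift taken as an increase, for comparison.
n_increase = sample_size_two_proportions(0.65, 0.85, z_alpha=1.64, z_beta=1.28)

print(round(n_decline, 2), math.ceil(n_decline))
print(n_increase < n_decline)
```

The comparison at the end illustrates the point made in the text: a shift toward the mid-range (65% to 45%) needs more quadrats than the same absolute shift away from it (65% to 85%), so planning for the decline gives the more conservative estimate.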
Correction for sampling finite populations: The above formula assumes that the population is very large
compared to the proportion of the population that is sampled. If you are sampling more than 5% of the
whole population area then you should apply a correction to the sample size estimate that incorporates the
finite population correction factor (FPC). This will reduce the sample size. The formula for correcting the
sample size estimate is as follows:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}}$$
Where:
n'    The new sample size based upon inclusion of the finite population correction factor.
n*    The sample size estimate from the equation above.
N     The total number of possible quadrat locations in the population. To calculate N, determine
      the total area of the population and divide by the size of the sampling unit.
Example:
If the pilot data described above was gathered using a 1m x 1m (1m2) quadrat and the total population
being sampled was located within a 10m x 30m macroplot (300 m2) then N = 300m2/1m2 = 300. The
corrected sample size would then be:
$$n' = \frac{n^*}{1 + \frac{n^*}{N}} = \frac{102}{1 + \frac{102}{300}} = 76.12$$
Round up 76.12 to 77.
The new, FPC-corrected estimated sample size needed to be 90% sure of detecting an absolute shift of 20%
frequency with a starting frequency of 65% and a false-change error rate of 0.10 is 77 quadrats.
Note on the statistical analysis for two sample tests from finite populations. If you have sampled more than
5% of an entire population then you should also apply the finite population correction factor to the results
of the statistical test. For proportion data this procedure involves dividing the test statistic by (1 - n/N).
For example, if your χ² statistic from a particular test turned out to be 2.706 and you sampled n = 77
quadrats out of a total N = 300 possible quadrats, then your correction procedure would look like the
following:
$$\chi^{2\prime} = \frac{\chi^2}{1 - \frac{n}{N}} = \frac{2.706}{1 - \frac{77}{300}} = 3.640$$
Where:
χ²     The χ² statistic from a χ² test.
χ²′    The corrected χ² statistic using the FPC.
n      The sample size from the equation above.
N      The total number of possible quadrat locations in the population. To calculate N,
       determine the total area of the population and divide by the size of each individual
       sampling unit.
You would need to look up the p-value of χ²′ = 3.640 in a χ² table for the appropriate degrees of freedom to
obtain the corrected p-value for this statistical test.
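The correction for proportion data differs from the t-statistic correction in that the divisor is (1 - n/N) itself rather than its square root. A minimal sketch, with the hypothetical function name `fpc_corrected_chi_square` and the values from the example above:

```python
def fpc_corrected_chi_square(chi_square: float, n: int, population_units: int) -> float:
    """Corrected statistic chi2' = chi2 / (1 - n/N)."""
    if not 0 < n < population_units:
        raise ValueError("n must be between 0 and N (exclusive)")
    return chi_square / (1 - n / population_units)


chi_square_prime = fpc_corrected_chi_square(2.706, n=77, population_units=300)
print(round(chi_square_prime, 3))
```

This is consistent with the t-statistic version: for a comparison with one degree of freedom, a χ² statistic is the square of the corresponding standard normal statistic, so squaring the square-root divisor gives (1 - n/N).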
Caveat
It is well known that statistical power calculations can be valuable in planning an experiment. There is also
a large literature advocating that power calculations be made whenever one performs a statistical test of a
hypothesis and one obtains a statistically non-significant result. Advocates of such post-experiment power
calculations claim the calculations should be used to aid in the interpretation of the experimental results.
This approach, which appears in various forms, is fundamentally flawed. We document that the problem is
extensive and present arguments to demonstrate the flaw in the logic.
The abuse of power: The pervasive fallacy of power calculations for data analysis, Hoenig JM, Heisey DM
American Statistician, 55:1, 19-24, 2001
See also
Statistical methods in psychology journals - Guidelines and explanations, Wilkinson, L; Task Force Stat
Inference, American Psychologist 54:8 594-604, 1999
The Incompleteness of Probability Models and the Resultant Implications for Theories of Statistical
Inference, Macdonald, Ranald R. Understanding Statistics, 1:3, 167-189, 2002
Some Myths and Legends in Quantitative Psychology, Grayson, Dave. Understanding Statistics,
3:2 101-134, 2004
Some practical guidelines for effective sample size determination, Lenth RV American Statistician, 55:3,
187-193, 2001
Bibliography
Sample Size Calculations in Clinical Research by Shein-Chung Chow, Jun Shao and
Hansheng Wang
Sample size calculation is usually conducted through a pre-study power analysis. The purpose is to
select a sample size such that the selected sample size will achieve a desired power for correctly
detection of a pre-specified clinically meaningful difference at a given level of significance. In
clinical research, however, it is not uncommon to perform sample size calculation with
inappropriate test statistics for wrong hypotheses regardless what study design is employed. This
book provides formulas and/or procedures for determination of sample size required not only for
testing equality, but also for testing non-inferiority/superiority, and equivalence (similarity) based
on both untransformed (raw) data and log-transformed data under a parallel-group design or a
crossover design with equal or unequal ratio of treatment allocations. It provides not only a
comprehensive and unified presentation of various statistical procedures for sample size
calculation that are commonly employed at various phases of clinical development, but also a
well-balanced summary of current regulatory requirements, methodology for design and analysis
in clinical research, and recent developments in the area of clinical development.
Chapman & Hall/CRC Biostatistics Series Volume: 20 ISBN: 9781584889823 ISBN 10:
1584889829
Sample size estimation: How many individuals should be studied? By Eng John
Radiology 227 (2): 309-313 May 2003
The number of individuals to include in a research study, the sample size of the study, is an
important consideration in the design of many clinical studies. This article reviews the basic
factors that determine an appropriate sample size and provides methods for its calculation in
some simple, yet common, cases. Sample size is closely tied to statistical power, which is the
ability of a study to enable detection of a statistically significant difference when there truly is one.
A trade-off exists between a feasible sample size and adequate statistical power. Strategies for
reducing the necessary sample size while maintaining a reasonable power will also be discussed.
Power and Sample Size Estimation in Research by Ajeneye Francis
The Biomedical Scientist November 2006 988-990
Sample size determination by Dell RB, Holleran S, Ramakrishnan R
ILAR JOURNAL 43(4) 207-213 2002
Note the correction to Sample size determination (vol 43, pg 207, 2002) by Dell RB, Holleran S,
Ramakrishnan R
ILAR JOURNAL 44 (3): 239-239 2003
Statistical power and estimation of the number of required subjects for a study based on
the t-test: A surgeon's primer by Livingston EH, Cassidy L
Journal Of Surgical Research 126 (2): 149-159 Jun 15 2005
The underlying concepts for calculating the power of a statistical test elude most investigators.
Understanding them helps to know how the various factors contributing to statistical power factor
into study design when calculating the required number of subjects to enter into a study. Most
journals and funding agencies now require a justification for the number of subjects enrolled into a
study and investigators must present the principals of powers calculations used to justify these
numbers. For these reasons, knowing how statistical power is determined is essential for
researchers in the modern era. The number of subjects required for study entry, depends on the
following four concepts: 1) The magnitude of the hypothesized effect (i.e., how far apart the two
sample means are expected to differ by); 2) the underlying variability of the outcomes measured
(standard deviation); 3) the level of significance desired (e.g., α = 0.05); 4) the amount of
power desired (typically 0.8). If the sample standard deviations are small or the means are
expected to be very different then smaller numbers of subjects are required to ensure avoidance of
type 1 and 2 errors. This review provides the derivation of the sample size equation for continuous
variables when the statistical analysis will be the Student's t-test. We also provide graphical
illustrations of how and why these equations are derived.
Sample Size Correction Table for Single Parameter Estimates
Sample size correction table for adjusting “point-in-time” parameter estimates. n is the uncorrected sample
size value from the sample size equation. n* is the corrected sample size value. This table was created
using the algorithm reported by Kupper and Hafner (1989) for a one-sample tolerance probability of 0.90.
For more information consult Kupper, L.L. and K.B. Hafner. 1989. How appropriate are popular sample
size formulas? The American Statistician (43) 101-105.
80% Confidence Level, 90% Confidence Level, 95% Confidence Level, 99% Confidence Level (six columns each, left to right)
n n* n n* n n* n n* n n* n n* n n* n n* n n* n n* n n* n n*
1 5 51 65 101 120 1 5 51 65 101 120 1 5 51 66 101 121 1 6 51 67 101 122
2 6 52 66 102 121 2 6 52 66 102 122 2 7 52 67 102 122 2 8 52 68 102 123
3 7 53 67 103 122 3 8 53 67 103 123 3 8 53 68 103 123 3 9 53 69 103 124
4 9 54 68 104 123 4 9 54 69 104 124 4 10 54 69 104 124 4 11 54 70 104 125
5 10 55 69 105 124 5 11 55 70 105 125 5 11 55 70 105 125 5 12 55 72 105 126
6 11 56 70 106 125 6 12 56 71 106 126 6 12 56 71 106 126 6 14 56 73 106 128
7 13 57 71 107 126 7 13 57 72 107 127 7 14 57 72 107 128 7 15 57 74 107 129
8 14 58 73 108 128 8 15 58 73 108 128 8 15 58 74 108 129 8 16 58 75 108 130
9 15 59 74 109 129 9 16 59 74 109 129 9 16 59 75 109 130 9 18 59 76 109 131
10 17 60 75 110 130 10 17 60 75 110 130 10 18 60 76 110 131 10 19 60 77 110 132
11 18 61 76 111 131 11 18 61 76 111 131 11 19 61 77 111 132 11 20 61 78 111 133
12 19 62 77 112 132 12 20 62 78 112 132 12 20 62 78 112 133 12 22 62 79 112 134
13 20 63 78 113 133 13 21 63 79 113 133 13 21 63 79 113 134 13 23 63 80 113 135
14 22 64 79 114 134 14 22 64 80 114 134 14 23 64 80 114 135 14 24 64 82 114 136
15 23 65 80 115 135 15 23 65 81 115 135 15 24 65 81 115 136 15 25 65 83 115 138
16 24 66 82 116 136 16 25 66 82 116 136 16 25 66 83 116 137 16 26 66 84 116 139
17 25 67 83 117 137 17 26 67 83 117 137 17 26 67 84 117 138 17 28 67 85 117 140
18 27 68 84 118 138 18 27 68 84 118 138 18 28 68 85 118 139 18 29 68 86 118 141
19 28 69 85 119 140 19 28 69 85 119 140 19 29 69 86 119 141 19 30 69 87 119 142
20 29 70 86 120 141 20 29 70 86 120 141 20 30 70 87 120 142 20 31 70 88 120 143
21 30 71 87 121 142 21 31 71 88 121 142 21 31 71 88 121 143 21 32 71 89 121 144
22 31 72 88 122 143 22 32 72 89 122 143 22 32 72 89 122 144 22 34 72 90 122 145
23 33 73 89 123 144 23 33 73 90 123 144 23 34 73 90 123 145 23 35 73 92 123 146
24 34 74 90 124 145 24 34 74 91 124 145 24 35 74 91 124 146 24 36 74 93 124 147
25 35 75 91 125 146 25 35 75 92 125 147 25 36 75 92 125 147 25 37 75 94 125 148
26 36 76 93 126 147 26 37 76 93 126 148 26 37 76 94 126 148 26 38 76 95 126 149
27 37 77 94 127 148 27 38 77 94 127 149 27 38 77 95 127 149 27 39 77 96 127 150
28 38 78 95 128 149 28 39 78 95 128 150 28 39 78 96 128 150 28 41 78 97 128 151
29 40 79 96 129 150 29 40 79 96 129 151 29 41 79 97 129 151 29 42 79 98 129 153
30 41 80 97 130 151 30 41 80 97 130 152 30 42 80 98 130 152 30 43 80 99 130 154
31 42 81 98 131 152 31 42 81 99 131 153 31 43 81 99 131 154 31 44 81 100 131 155
32 43 82 99 132 154 32 44 82 100 132 154 32 44 82 100 132 155 32 45 82 101 132 156
33 44 83 100 133 155 33 45 83 101 133 155 33 45 83 101 133 156 33 46 83 103 133 157
34 45 84 101 134 156 34 46 84 102 134 156 34 46 84 102 134 157 34 48 84 104 134 158
35 47 85 102 135 157 35 47 85 103 135 157 35 48 85 103 135 158 35 49 85 105 135 159
36 48 86 104 136 158 36 48 86 104 136 158 36 49 86 104 136 159 36 50 86 106 136 160
37 49 87 105 137 159 37 49 87 105 137 159 37 50 87 105 137 160 37 51 87 107 137 161
38 50 88 106 138 160 38 50 88 106 138 161 38 51 88 106 138 161 38 52 88 108 138 163
39 51 89 107 139 161 39 52 89 107 139 162 39 52 89 107 139 162 39 53 89 109 139 164
40 52 90 108 140 162 40 53 90 108 140 163 40 53 90 108 140 163 40 55 90 110 140 165
41 53 91 109 141 163 41 54 91 110 141 164 41 54 91 110 141 164 41 56 91 111 141 166
42 55 92 110 142 164 42 55 92 111 142 165 42 56 92 111 142 165 42 57 92 112 142 167
43 56 93 111 143 165 43 56 93 112 143 166 43 57 93 112 143 166 43 58 93 114 143 168
44 57 94 112 144 166 44 57 94 113 144 167 44 58 94 113 144 168 44 59 94 115 144 169
45 58 95 113 145 168 45 58 95 114 145 168 45 59 95 114 145 169 45 60 95 116 145 170
46 59 96 115 146 169 46 60 96 115 146 169 46 60 96 116 146 170 46 61 96 117 146 171
47 60 97 116 147 170 47 61 97 116 147 170 47 61 97 117 147 171 47 62 97 118 147 172
48 61 98 117 148 171 48 62 98 117 148 171 48 62 98 118 148 172 48 64 98 119 148 173
49 62 99 118 149 172 49 63 99 118 149 172 49 63 99 119 149 173 49 65 99 120 149 174
50 64 100 119 150 173 50 64 100 119 150 173 50 65 100 120 150 174 50 66 100 121 150 175
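Each full row of the table carries 24 numbers: four confidence levels (80%, 90%, 95%, 99%, left to right), each with three (n, n*) pairs. As a reading aid, the sketch below splits one row into a lookup keyed by confidence level. The helper name parse_row is hypothetical, and the sample row is the n = 10 row copied from the table above.

```python
CONFIDENCE_LEVELS = (80, 90, 95, 99)  # column-group order, left to right


def parse_row(line):
    """Split one 24-number table row into {confidence level: {n: n*}}."""
    values = [int(token) for token in line.split()]
    if len(values) != 24:
        raise ValueError(f"expected 24 numbers, got {len(values)}")
    table = {}
    for i, level in enumerate(CONFIDENCE_LEVELS):
        group = values[6 * i: 6 * i + 6]  # three (n, n*) pairs for this level
        table[level] = dict(zip(group[0::2], group[1::2]))
    return table


row_10 = ("10 17 60 75 110 130 10 17 60 75 110 130 "
          "10 18 60 76 110 131 10 19 60 77 110 132")
corrected = parse_row(row_10)
print(corrected[95][10])   # 18: n = 10 at the 95% level becomes n* = 18
print(corrected[99][110])  # 132
```

In practice the uncorrected n from the sample size equation would be rounded up to a whole number first, then looked up in the column group for the chosen confidence level.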