
Power and Sample Size Calculations for Studies
Involving Linear Regression
William D. Dupont, PhD and Walton D. Plummer, Jr., BS
Department of Preventive Medicine, Vanderbilt University School of Medicine,
Nashville, Tennessee
ABSTRACT: This article presents methods for sample size and power calculations for studies
involving linear regression. These approaches are applicable to clinical trials designed
to detect a regression slope of a given magnitude or to studies that test whether the
slopes or intercepts of two independent regression lines differ by a given amount. The
investigator may either specify the values of the independent (x) variable(s) of the regression
line(s) or determine them observationally when the study is performed. In the latter
case, the investigator must estimate the standard deviation(s) of the independent variable(s). This study gives examples using this method for both experimental and observational study designs. Cohen’s method of power calculations for multiple linear regression
models is also discussed and contrasted with the methods of this study. We have posted
a computer program to perform these and other sample size calculations on the Internet
(see http://www.mc.vanderbilt.edu/prevmed/psintro.htm). This program can determine the sample size needed to detect a specified alternative hypothesis with the required
power, the power with which a specific alternative hypothesis can be detected with a
given sample size, or the specific alternative hypotheses that can be detected with a
given power and sample size. Context-specific help messages available on request make
the use of this software largely self-explanatory. Controlled Clin Trials 1998;19:589–601 © Elsevier Science Inc. 1998
KEY WORDS: Statistics, regression analysis, linear models, power calculations, sample size calculations,
linear regression
INTRODUCTION
Clinical investigators sometimes wish to evaluate a continuous response
measure in a cohort of patients randomized to one of several groups defined
by increasing levels of some treatment. In performing sample size and power
calculations for such studies, one reasonable approach models patient response
as a linear function of dose, and poses power calculations in terms of detecting
dose-response slopes of a given magnitude. Alternatively, we may wish to evaluate the dose-response curves of two different treatments and test whether the slopes of these curves differ. This article provides an easily used, accurate
method for power and sample size calculations for such studies. We have
Address reprint requests to: William D. Dupont, PhD, Department of Preventive Medicine, Vanderbilt
University School of Medicine, A-1124 Medical Center North, Nashville, TN 37232-2637.
Received 20 June 1996; accepted 2 June 1998.
Controlled Clinical Trials 19:589–601 (1998)
© Elsevier Science Inc. 1998
655 Avenue of the Americas, New York, NY 10010
0197-2456/98/$19.00
PII S0197-2456(98)00037-3
posted an interactive self-documented program to perform these calculations
on the Internet.
Other investigators have reviewed general methods for sample size and
power calculations [1–3]. Hintze [4] provided a method for designing studies to
detect correlation coefficients of specified magnitudes that uses a computational
algorithm of Guenther [5]. This method provides results that are perhaps less
easily understood than those based on regression slope parameters, because
many investigators can more readily interpret slopes than correlation coefficients. Kraemer and Thiemann [3] provide tables that permit exact sample size
calculations for studies designed to detect correlation coefficients of a given
magnitude. They also give formulas that permit using these tables for designs
involving linear regression. Although accurate, these methods are less convenient than those that we have incorporated into an interactive computer program. Cohen [2] provided more complex methods for designs involving multiple linear regression and correlation analysis. Later in this study we describe
these methods, which require expressing the alternative hypothesis in terms
of their effect on the multiple correlation coefficient [6]. Hintze [4] has written
software for deriving these calculations, but clinical investigators may find his
methods somewhat difficult to use and interpret. Goldstein [7] and Iwane et
al. [8] have reviewed other power and sample size software packages.
Simple Linear Regression
We study the effect of one variable on another by estimating the slope of
the regression line between these variables. For example, we might compare
the effects of a treatment at several dose levels. Suppose that we treat n patients, that the jth patient has response yj after receiving dose level xj, and that the expected value of yj given xj is γ + λxj. To test the null hypothesis that λ = 0 against a two-sided alternative hypothesis with type I error probability α, we must be able to answer the following three questions:
1. How many patients must we study to detect a specific alternative hypothesis λ = λa with power 1 − β?
2. With what power can we detect a specific alternative hypothesis λ = λa given observations on n study subjects?
3. What alternative values of λa can we detect with power 1 − β if we study n patients?
Either observational or experimental studies may use this design. In the former, both {xj} and {yj} are attributes of the study subjects, and we intend to determine whether these two variables are correlated. In these studies, the investigator must also estimate σx, the predicted standard deviation of xj in the patients under study. In experiments, the investigator determines the values of {xj}. Typically, xj denotes a drug dose given at one of K distinct values w1, . . . , wK, with a proportion ck of the study subjects being assigned dose level wk.
The degree of dispersion of the response values about the regression line affects power and sample size calculations. A parameter that quantifies this dispersion is σ, the standard deviation of the regression errors. The regression error for the jth observation is the difference between the observed and expected response value for the jth subject. In other words, the regression error is the vertical distance between the observed response yj and the true regression line (see Fig. 1); σ is the standard deviation of these vertical distances.

Figure 1  In simple linear regression we obtain n pairs of observations {xj, yj}. We assume that the expected value of the response yj is given by the linear equation E(yj) = γ + λxj. The jth regression error is the vertical distance between the observed response yj and its expected value γ + λxj.

The values of σ, σx, σy, λ, and the correlation coefficient ρ are all interrelated. It is well known [6] that:

λ = ρσy/σx   (1)
and it is easily shown that:

σ = σy√(1 − ρ²) = λσx√(1/ρ² − 1) = √(σy² − λ²σx²)   (2)
Thus, when ρ = 1, the observations {xj} and {yj} are perfectly correlated and lie on a straight line with slope σy/σx; the regression errors are all zero (because the observed and expected responses are always equal) and hence σ = 0. When ρ = 0, xj and yj are uncorrelated, the expected regression line is flat (λ = 0), and the standard deviation of the regression errors equals the standard deviation of yj (i.e., σ = σy). Figure 2 illustrates the relationship between these parameters when 0 < ρ < 1. This figure shows simulated data for patients given treatments A and B under the assumption that the two treatments have identical means and standard deviations of the independent and response variables. They differ in that the correlation coefficient between response and independent variables is 0.9 for treatment A (black dots) and 0.6 for treatment B (open circles). Consequently, the responses to treatment A are more closely clustered around their (black) regression line than the responses to treatment B (gray). Thus, the average regression error is less for treatment A than for treatment B and, hence, σ, the standard deviation of these errors, is less for treatment A than for treatment B.

Figure 2  This figure illustrates the relationship in simple linear regression between ρ, the correlation coefficient between the independent and response variables; the regression errors (see Fig. 1); and σ, the standard deviation of the regression errors. Higher values of ρ imply smaller regression errors, which, in turn, imply smaller values of σ (see text).

Power or sample size calculations require estimates of σx, λ, and σ. It is often difficult to estimate σ directly; however, we can obtain indirect estimates of σ using equation (2) whenever we are able to estimate ρ or σy. We derive power and sample size formulas for simple linear regression in the Appendix.
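The interrelationships in equations (1) and (2) can be checked numerically. The following is a minimal sketch, assuming Python; the values of σx, σy, and ρ are illustrative only, not data from any example in this article.

```python
import math

# Illustrative values (not taken from the article's examples).
sigma_x, sigma_y, rho = 7.5, 4.0, 0.5

# Equation (1): slope implied by the correlation coefficient.
lam = rho * sigma_y / sigma_x

# Equation (2): three equivalent expressions for sigma, the standard
# deviation of the regression errors.
s1 = sigma_y * math.sqrt(1 - rho**2)
s2 = lam * sigma_x * math.sqrt(1 / rho**2 - 1)
s3 = math.sqrt(sigma_y**2 - lam**2 * sigma_x**2)

print(lam, s1, s2, s3)  # the three expressions for sigma agree
```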
Contrasting Two Linear Regression Lines
Suppose that we want to compare the slopes and intercepts of two independent regression lines. For example, we might wish to compare the effects of two different treatments at several dose levels. Suppose that treatments 1 and 2 are given to n1 and n2 patients, respectively, and that the jth subject who receives treatment i (i = 1 or 2) has response yij to treatment at dose level xij, where the expected value of yij is γi + λixij. We want to determine whether the responses to the treatments differ. Specifically, we intend to test the null hypotheses that γ1 = γ2 and λ1 = λ2. In this case, we must answer the three questions given earlier for alternative hypotheses concerning the magnitude of the differences in the y intercept and slope parameters for these two treatments. We derive power and sample size formulas for two-treatment linear regression problems in the Appendix.
COMPUTER SOFTWARE
We have written a computer program to implement these and other sample
size and power calculations [1] and have posted it, together with program
documentation, on the Internet. The program runs under either Windows 95
or Windows NT operating systems. To obtain free copies open the http://
www.mc.vanderbilt.edu/prevmed/psintro.htm page on the World Wide Web
and follow instructions. The program, named PS, has a graphical user interface
with hypertext help messages that make the use of the program largely self-explanatory. It can answer the three questions given in the Introduction for
each study design considered by this software. It can also generate graphs of
sample size versus power, sample size versus detectable alternative hypotheses,
or power versus detectable alternative hypotheses. It is written in Visual Basic
[9] and Fortran 90 [10] and uses the First Impression graphics control [11].
EXAMPLES
Linear Regression in an Observational Study
A dieting program encourages patients to follow a specific diet and to exercise regularly. We want to determine whether the actual average time per day spent exercising is related to body mass index (BMI, in kilograms per square meter) after 6 months on this program. Previous experience suggests that the exercise time of participants has a standard deviation of σx = 7.5 minutes. Kuskowska-Wolk et al. [12] reported that the standard deviation of the BMI for their female study subjects was σy = 4.0 kg/m². We have n = 100 women willing to follow this program for 6 months. We want to determine the power with which we can detect a true drop of BMI of λa = −0.0667 kg/m² per minute of exercise. (This would imply that the average BMI of participants who exercised half an hour a day would be 2 kg/m² less than that of those who did not exercise at all.) We use the PS program to determine the power with which the alternative hypothesis λa = −0.0667 can be detected with type I error probability α = 0.05 as follows: choose linear regression with one treatment; specify that the investigator does not choose the treatment levels; enter σx = 7.5 for the standard deviation of the independent variable; indicate that we want to determine the power of the proposed study and that we will provide an estimate of σy; and enter α = 0.05, λa = −0.0667, σy = 4.0, and n = 100. The PS program then calculates that 100 women yield a power of only 0.24 for detecting this alternative hypothesis. Thus, the planned study would be insufficient to detect reliably a true slope of this magnitude. The user may experiment with different values of λa, σy, and n to determine the sensitivity of the derived power to changes in these parameter values.

The units of measurement of the response variable affect the magnitude of both σy and λa. Thus, if BMI is measured in grams per square meter, then σy becomes 4000 and λa becomes −66.7. Substituting these two values into the preceding power calculation, of course, leaves the power unchanged.
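This power figure can be reproduced from equation (2) and Appendix equation (A1). The following is a sketch, assuming Python with SciPy; σ is obtained indirectly from equation (2), and δ = λaσx/σ with v = n − 2, as derived in the Appendix.

```python
import math
from scipy.stats import t

alpha, n = 0.05, 100
lam_a, sigma_x, sigma_y = -0.0667, 7.5, 4.0

# Equation (2): indirect estimate of sigma from sigma_y and lambda_a.
sigma = math.sqrt(sigma_y**2 - lam_a**2 * sigma_x**2)

# Appendix: delta = lambda_a * sigma_x / sigma, with v = n - 2.
delta = lam_a * sigma_x / sigma
v = n - 2
t_crit = t.ppf(1 - alpha / 2, v)

# Equation (A1): two-sided power.
power = (t.cdf(delta * math.sqrt(n) - t_crit, v)
         + t.cdf(-delta * math.sqrt(n) - t_crit, v))
print(round(power, 2))  # 0.24, matching the PS program
```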
Linear Regression in an Experimental Study
Siber et al. [13] studied impaired antibody response to pneumococcal vaccine after treatment for Hodgkin's disease. Seventeen patients treated with subtotal radiation received pneumococcal vaccine from 8 to 51 months later. A linear regression of natural log antibody concentration on the time interval between treatment and vaccination suggested that log antibody concentration increased with increasing time interval between treatment and vaccination. Siber's group estimated the slope parameter for this regression to be λ̂ = 0.01 (p = 0.11) and the correlation coefficient to be r = 0.40.

Suppose that we want to use these results as pilot data for a new study designed to detect the true alternative hypothesis that λa = 0.01 with power 1 − β = 0.90 and type I error probability α = 0.05. We might decide to assign patients at random to receive vaccine at either w1 = 10, w2 = 30, or w3 = 50 months after radiation therapy. That is, we consider a study of K = 3 treatment levels (vaccination delay times), with equal proportions of patients vaccinated after each delay interval (c1 = c2 = c3 = 1/3). To use the PS program, we choose linear regression with one treatment; specify equal allocation of the treatment levels to the three times 10, 30, and 50 months; indicate that we intend to determine the sample size and that we will provide an estimate of the correlation coefficient ρ; and enter α = 0.05, 1 − β = 0.90, λa = 0.01, and ρ = r = 0.40. Using these values in the PS program gives a sample size of n = 57 patients needed to detect a true value of λ = 0.01 with 90% power, α = 0.05, and patients equally allocated to receive vaccinations at 10, 30, and 50 months after radiation therapy.
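The sketch below reproduces this calculation, assuming Python with SciPy. It computes σx from the design weights (see the Appendix), estimates σ indirectly from equation (2), and then finds the smallest n whose power under equation (A1) reaches 1 − β; depending on rounding conventions it reports 57 or 58.

```python
import math
from scipy.stats import t

alpha, target_power, lam_a, rho = 0.05, 0.90, 0.01, 0.40
w = [10, 30, 50]          # vaccination delay times (months)
c = [1/3, 1/3, 1/3]       # allocation proportions

# Appendix: design mean and standard deviation of the dose levels.
x_bar = sum(ci * wi for ci, wi in zip(c, w))
sigma_x = math.sqrt(sum(ci * (wi - x_bar)**2 for ci, wi in zip(c, w)))

# Equation (2): sigma from rho; then delta = lam_a * sigma_x / sigma.
sigma = lam_a * sigma_x * math.sqrt(1 / rho**2 - 1)
delta = lam_a * sigma_x / sigma

def power(n):
    """Equation (A1) with v = n - 2."""
    v = n - 2
    t_crit = t.ppf(1 - alpha / 2, v)
    return (t.cdf(delta * math.sqrt(n) - t_crit, v)
            + t.cdf(-delta * math.sqrt(n) - t_crit, v))

n = 5
while power(n) < target_power:
    n += 1
print(n)  # 57 or 58, in line with the published n = 57
```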
Comparing Slopes of Two Linear Regression Lines
Armitage and Berry ([6], Table 9.4) gave the age and pulmonary vital capacity for 28 cadmium industry workers with less than 10 years of cadmium exposure and for 44 workers never exposed to cadmium. The standard deviations of the ages of those unexposed and exposed were σx1 = 12.0 and σx2 = 9.19, respectively. Regressing vital capacity on age in these two groups gives slope estimates of λ̂1 = −0.0306 and λ̂2 = −0.0465 liters per year of life in unexposed and exposed workers, respectively (i.e., a typical exposed worker loses 46.5 mL of vital capacity per year). The standard errors of λ̂1 and λ̂2 are 0.00754 and 0.0113, respectively; the residual mean squares from the unexposed and exposed regressions are 0.352 and 0.293, respectively. From equation (9.17) of Armitage and Berry [6], the pooled estimate of the error variance from both groups is s² = 0.329, and hence s = 0.574. The estimated difference in slope estimates, λ̂2 − λ̂1 = −0.0159, is not significantly different from zero (p = 0.26) {[6], equation (9.19)}. Suppose that we want to recruit enough workers to detect a true difference of λ2 − λ1 = −0.0159 in these two groups, with 80% power, type I error probability α = 0.05, and a ratio of unexposed to exposed workers m = 44/28 = 1.57. Applying the PS program, we choose linear regression with two treatments; specify that the investigator does not choose the treatment levels; enter σx1 = 12.0 and σx2 = 9.19 for the standard deviation of the independent variable (age) in the control (unexposed) and experimental (exposed) groups, respectively; indicate that we will provide an estimate of the standard deviation of the regression errors, that we wish to calculate sample size, and that we want to compare slopes; and enter α = 0.05, 1 − β = 0.80, λ2 − λ1 = −0.0159, σ = 0.574, and m = 1.57. The program responds that the required experimental treatment sample size is 166. Hence, if we recruit 427 workers, 166 workers with less than 10 years of cadmium exposure and 1.57 × 166 = 261 unexposed workers, we will have 80% power to detect a difference in the rate of loss of vital capacity with age of −0.0159 L/yr of life.
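A sketch of this calculation, assuming Python with SciPy, iterates Appendix equation (A2) with v = n(1 + m) − 4 and δ = (λ2 − λ1)/σR. Because of rounding it may report 166 or 167 for the exposed-group size.

```python
import math
from scipy.stats import t

alpha, beta = 0.05, 0.20
slope_diff = -0.0159             # lambda_2 - lambda_1
sigma = 0.574                    # pooled SD of the regression errors
sigma_x1, sigma_x2 = 12.0, 9.19  # SD of age: unexposed, exposed
m = 1.57                         # ratio n1/n2 of unexposed to exposed

# Appendix: sigma_R for the difference of two slopes.
sigma_R = sigma * math.sqrt(1 / (m * sigma_x1**2) + 1 / sigma_x2**2)
delta = abs(slope_diff) / sigma_R

# Equation (A2), iterated because v = n(1 + m) - 4 depends on n.
n = 10.0
for _ in range(50):
    v = n * (1 + m) - 4
    n = (t.ppf(1 - beta, v) + t.ppf(1 - alpha / 2, v))**2 / delta**2

print(round(n))  # about 166-167; the PS program reports 166
```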
LINEAR REGRESSION USING THE PASS PROGRAM
One of the most popular commercially available power and sample size
programs, PASS 6.0 [4, 8], provides a general approach to power calculations
for multiple linear regression using the method of Cohen [2]. Let:

yj = γ + λ1x1j + λ2x2j + · · · + λkxkj + ej : j = 1, . . . , J   (3)

denote a conventional multiple linear regression model in which the jth patient has a response variable yj and k covariates {x1j, x2j, · · ·, xkj}. We intend to test the null hypothesis that λ1 = λ2 = · · · = λp = 0 for some p < k. Under this null hypothesis the regression model, equation (3), reduces to:

yj = γ + λp+1xp+1,j + λp+2xp+2,j + · · · + λkxkj + ej : j = 1, . . . , J   (4)

Cohen provides an F statistic to test this null hypothesis that is based on the multiple correlation coefficients RT and R0 from equations (3) and (4), respectively. PASS [4] uses this test to determine the power with which we are likely to reject the null hypothesis given a true alternative hypothesis that is expressed in terms of D = R²T − R²0. The next sections describe the applicability of PASS to the examples of the present work. Table 1 gives the required input and output for these examples.
Using PASS for the BMI and Vaccination Examples
The correlation coefficient module of the PASS program is also applicable
for simple linear regression when the data have a bivariate normal distribution
[4, 5]. In the body mass index example given earlier, we calculated the power to detect λa = −0.0667, given σx = 7.5 and σy = 4.0. From equation (1) we see that this is equivalent to testing the alternative hypothesis that ρ = λσx/σy = 0.125 against the null hypothesis that ρ = 0. Entering this value into the correlation coefficient module of the PASS program with a two-tailed type I error probability α = 0.05, a null hypothesis ρ0 = 0, and a sample size of n = 100 gives a power of 0.24 to detect ρa = 0.125. This is the same power obtained with the PS program.
The PASS correlation coefficient module is not applicable to the vaccination delay time example because the delay times are not normally distributed. This requirement arises from the fact that the Hintze–Guenther method [4, 5] used by this program assumes that the independent and dependent variables have a bivariate normal distribution. For experimental data, however, the independent variable is rarely normally distributed. Instead, trials usually assign a fixed, often equal number of patients to, say, low, medium, and high treatment levels, as in the vaccine example. In this case, these treatment levels are clearly not normally distributed.
Table 1  Input and Output Needed by PS and PASS Programs for Examples Considered in the Present Study

Example                  Program   Input(a)                                              Output
Body mass index          PS        σx = 7.5, α = 0.05, λa = −0.0667, σy = 4, n = 100     Power 1 − β = 0.24
                         PASS(b)   α = 0.05, ρ0 = 0, ρa = 0.125, n = 100                 Power 1 − β = 0.24
Vaccination delay time   PS        w1 = 10, w2 = 30, w3 = 50, c1 = c2 = c3 = 1/3,        Sample size n = 57
                                   α = 0.05, 1 − β = 0.9, λa = 0.01, r = 0.4
                         PASS      Not applicable                                        Not applicable
Cadmium exposure         PS        σx1 = 12.0, σx2 = 9.19, α = 0.05, 1 − β = 0.80,       Exposed sample size n = 166,
                                   λ2 − λ1 = −0.0159, σ = 0.574, m = 1.57                total sample size J = 427
                         PASS(c)   p = 1, D = 0.0128, k − p = 2, R²0 = 0.3115,           Power 1 − β = 0.807
                                   J = 427, α = 0.05

(a) See text for definitions.
(b) PASS correlation coefficient module.
(c) PASS multiple regression module.
Using PASS for the Cadmium Exposure Example
To use the PASS program for the cadmium exposure example just discussed, we first combine the data from the unexposed and exposed workers into a single multiple linear regression model. Let x2j = 1 if the jth worker was exposed and x2j = 0 if the jth worker was not exposed; also let x3j be the age when the jth patient's vital capacity yj is measured, and x1j = x2j × x3j. The model:

yj = γ + λ1x1j + λ2x2j + λ3x3j + ej   (5)

reduces to yj = γ + λ3x3j + ej and yj = (γ + λ2) + (λ1 + λ3)x3j + ej for unexposed and exposed workers, respectively. Hence, in this model, λ1 represents the difference in the rate of decline in vital capacity between exposed and unexposed workers, and testing whether this rate is the same in both groups is equivalent to testing the null hypothesis that λ1 = 0. Analyzing equation (5) with the vital capacity data set gives an estimate of λ1 = −0.0159 with R²T = 0.3243. (Note that this estimate of λ1 equals the difference of the slope estimates from the simple linear regressions given earlier.) Under the null hypothesis, equation (5) reduces to:

yj = γ + λ2x2j + λ3x3j + ej   (6)

with a single slope parameter λ3 for both exposure groups. This model gives R²0 = 0.3115. Hence the increase in R² from equation (6) to equation (5) is D = 0.3243 − 0.3115 = 0.0128. Suppose we have access to an additional 427 workers, 166 of whom were exposed for less than 10 years, with the remainder unexposed. Entering p = 1, D = 0.0128, k − p = 2, R²0 = 0.3115, J = 427, and α = 0.05 in the multiple regression module of the PASS program gives a power estimate of 0.808, which is comparable to the results of the PS program given earlier. (When running the PASS program, enter p and D in the Variables to be tested frame, k − p and R²0 in the Variables controlled for frame, and zeros in both fields of the C: Variables removed frame. Positive values of these last two fields are used in power calculations for certain complex study designs and hypotheses that Cohen refers to as case 2 analyses [2].) Thus, in this example, the PS and PASS programs produce very similar results; they differ in their requirements for input of parameters. Also, this module of the PASS program does not facilitate direct sample size calculations.
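This power figure can be approximated with a noncentral F calculation in the style of Cohen [2]. The sketch below assumes Python with SciPy; the effect size f² = D/(1 − R²T) and the noncentrality L = f²(u + v + 1) follow Cohen's conventions, so treat it as an illustration rather than a re-implementation of PASS.

```python
from scipy.stats import f, ncf

alpha = 0.05
J, k, p = 427, 3, 1              # subjects, total covariates, covariates tested
R2_T, R2_0 = 0.3243, 0.3115      # full- and reduced-model R^2

f2 = (R2_T - R2_0) / (1 - R2_T)  # Cohen's effect size f^2
u = p                            # numerator degrees of freedom
v = J - k - 1                    # denominator degrees of freedom
ncp = f2 * (u + v + 1)           # Cohen's noncentrality parameter L

f_crit = f.ppf(1 - alpha, u, v)
power = ncf.sf(f_crit, u, v, ncp)
print(round(power, 3))  # close to the 0.808 reported by PASS
```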
DISCUSSION
The chief advantage of Cohen’s method of power calculations for multiple
linear regression is its flexibility. It may be used to perform power calculations
for a very wide range of linear regression problems and null hypotheses. This
method, however, has three disadvantages that restrict its use:
1. The pilot data needed for Cohen’s method is often unavailable. Suppose that
the literature provides an estimate of the slope of a linear regression of
the literature provides an estimate of the slope of a linear regression of weight loss against hours of exercise per week for normal (control) subjects. Estimates of the standard deviations of these subjects' weight loss and exercise time are also available. You believe that the rate of weight loss per hour of exercise for patients on an experimental treatment will differ from that for control subjects and intend to determine an appropriate sample size for an experiment comparing experimental and control treatments. Our method allows the alternative hypothesis to be naturally specified in terms of a difference in loss-per-hour slope estimates. The user can enter estimates of the control slope and control standard deviations directly into the PS program to obtain the sample size needed for the desired power. Cohen's method, however, cannot be used for this example. His method requires estimates of the multiple correlation coefficients RT and R0 under equations (5) and (6) of the present study. It is unlikely that these statistics will be published in the literature. For this reason, Cohen's method almost always requires complete pilot data on both experimental and control subjects to calculate RT and R0. This was the case in the cadmium example presented in this work. Frequently, however, we want to perform power calculations on the basis of data from the literature or on pilot data that consist of the control data only. In such situations, our method works well but Cohen's is unusable.
2. It is difficult to interpret the results of Cohen's method. In Cohen's method the alternative hypothesis is stated in terms of D = R²T − R²0. This statistic has little intuitive meaning to either clinicians or grant reviewers. In contrast, specifying the alternative hypothesis in terms of a slope difference (say, 0.5 kg per hour of exercise) is easier to comprehend.
3. Some investigators may find Cohen's method difficult to use. To use Cohen's method, the investigator must first know how to set up equations (3) and (4) and then how to run the linear regressions needed to derive R²T and R²0. In contrast, to use our method the investigator need only understand the basic concepts of statistical power and significance, and the simple linear models described in the Introduction.
SOFTWARE ACCURACY
We have written Excel spreadsheets that evaluate Appendix equations (A1)
and (A2) for the different cases considered in this study. These spreadsheets
provide independent confirmation that the PS program has correctly implemented our formulas. The fact that the PS and PASS programs give very similar
answers to the cadmium and body mass index examples using very different
methods is evidence that both programs have been coded correctly.
This work was supported by NIH RO1 Grants CA50468, HL19153, and LM06226 and NCI Center
Grant CA68485. We thank Drs. W.A. Ray, O.B. Crofford, G.W. Reed, M.D. Decker, G.R. Bernard,
M.R. Griffin, and R.I. Shorr for their helpful suggestions.
REFERENCES
1. Dupont WD, Plummer WD. Power and sample size calculations: a review and
computer program. Controlled Clin Trials 1990;11:116–128.
2. Cohen J. Statistical Power Analysis for the Behavioral Sciences 2nd ed. Hillsdale, NJ:
Lawrence Erlbaum; 1988.
3. Kraemer HC, Thiemann S. How Many Subjects? Statistical Power Analysis in Research. Newbury Park, CA: Sage; 1987.
4. Hintze JL. PASS 6.0 User’s Guide. Kaysville, UT: NCSS Dr. Jerry L. Hintze; 1996.
5. Guenther W. Desk calculation of probabilities for the distribution of the sample
correlation coefficient. Am Statistician 1977;31:45–48.
6. Armitage P, Berry G. Statistical Methods in Medical Research 3rd ed. Oxford, UK:
Blackwell Scientific; 1994.
7. Goldstein R. Power and sample size via MS/PC-DOS computers. Am Statistician
1989;43:253–260.
8. Iwane M, Palensky J, Plante K. A user’s review of commercial sample size software
for design of biomedical studies using survival data. Controlled Clin Trials
1997;18:65–83.
9. Microsoft Corporation. Microsoft Visual Basic Programmer’s Guide. Redmond, WA:
Microsoft Corporation; 1995.
10. Microsoft Corporation. Microsoft Fortran PowerStation Programmer’s Guide. Redmond,
WA: Microsoft Corporation; 1995.
11. Visual Components Sybase Inc. First Impression Active X User's Guide: High Performance Software for Charting Data for Microsoft Visual Basic, Visual C++, and Other Languages (Version 5.0). Overland Park, KS: Visual Components Sybase, Inc.; 1997.
12. Kuskowska-Wolk A, Bergstrom R, Bostrom G. Relationship between questionnaire
data and medical records of height, weight and body mass index. Int J Obes
1992;16:1–9.
13. Siber GR, Weitzman SA, Aisenberg AC, Weinstein HJ, Schiffman G. Impaired antibody response to pneumococcal vaccine after treatment for Hodgkin’s Disease. N
Engl J Med 1978;299:442–448.
APPENDIX
Generic Power and Sample Size Formulas
Suppose for n patients (or groups of patients) we observe responses that depend on some parameter θ. Let R, a statistic derived from the n responses, have a normal distribution with mean √n θ and standard deviation σR. Let SR be another statistic independent of R such that vS²R/σ²R has a χ² distribution with v degrees of freedom. Let Tv[t] be the cumulative probability distribution for a random variable having a t distribution with v degrees of freedom; tv,α = T⁻¹v[1 − α] denote the critical value that is exceeded by such a t statistic with probability α; θ0 and θa denote the values of θ under the null and a specific alternative hypothesis, respectively; δ = (θa − θ0)/σR; and α and β denote the type I and II error probabilities associated with a two-sided test of the null hypothesis and the alternative hypothesis θa, respectively. Then (R − √n θ)/SR has a t distribution with v degrees of freedom [6] that can be used to test the null hypothesis that θ = θ0. The same argument used to derive equations (2) and (3) of Dupont and Plummer [1] proves that the power to detect alternative hypothesis θ = θa is:

1 − β = Tv[δ√n − tv,α/2] + Tv[−δ√n − tv,α/2]   (A1)

and that, for the relevant values of α and β:

n = (tv,β + tv,α/2)²/δ²   (A2)

Equation (A2) must be solved iteratively because both v and δ are themselves functions of n.
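Equations (A1) and (A2) translate directly into code. The sketch below assumes Python with SciPy; the helper `v_of_n` is an illustrative device (not part of the article) encoding how the degrees of freedom depend on n, e.g., v = n − 2 for simple linear regression.

```python
import math
from scipy.stats import t

def power(n, delta, alpha, v):
    """Equation (A1): two-sided power at standardized effect size delta."""
    t_crit = t.ppf(1 - alpha / 2, v)
    return (t.cdf(delta * math.sqrt(n) - t_crit, v)
            + t.cdf(-delta * math.sqrt(n) - t_crit, v))

def sample_size(delta, alpha, beta, v_of_n, n0=20.0, iters=50):
    """Equation (A2), iterated because v is itself a function of n."""
    n = n0
    for _ in range(iters):
        v = v_of_n(n)
        n = (t.ppf(1 - beta, v) + t.ppf(1 - alpha / 2, v))**2 / delta**2
    return n

# Example: simple linear regression (v = n - 2) at delta = 0.3,
# alpha = 0.05, power 0.90.
n = sample_size(delta=0.3, alpha=0.05, beta=0.10, v_of_n=lambda n: n - 2)
```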
Studies Using Simple Linear Regression
Suppose that the error terms yj − (γ + λxj) are independently and normally distributed with mean 0 and standard deviation σ. Let x̄ and ȳ denote the means of {xj} and {yj}, respectively. Then it is well known [6] that:

λ̂ = Σ(xj − x̄)(yj − ȳ)/Σ(xj − x̄)² is an unbiased estimate of λ;
γ̂ = ȳ − λ̂x̄ is an unbiased estimate of γ; and
s² = Σ[yj − (γ̂ + λ̂xj)]²/(n − 2) is an unbiased estimate of σ² independent of λ̂.

Also, (n − 2)s²/σ² has a χ² distribution with n − 2 degrees of freedom, and λ̂ has variance σ²λ̂ = σ²/Σ(xj − x̄)². Let R = √n λ̂, σ²x = Σ(xj − x̄)²/n, and S²R = s²/σ²x. Then R has variance σ²R = nσ²λ̂ = σ²/σ²x and (n − 2)S²R/σ²R = (n − 2)s²/σ² ~ χ²n−2. Hence, substituting v = n − 2 and δ = (λa − 0)/σR = λaσx/σ into equations (A1) and (A2) gives power and sample size formulas for simple linear regression. In observational studies the investigator estimates σ²x; in experiments, x̄ = Σk ckwk and σ²x = Σk ck(wk − x̄)² in the definition of δ given above. We can estimate σ indirectly using equation (2) if a direct estimate is unavailable. Thus, if estimates of either r, the sample correlation coefficient, or sy, the sample standard deviation of yj, are available, then σ may be estimated by λaσx√(1/r² − 1) or √(s²y − λ²aσ²x), respectively.
Studies With Two Linear Regression Lines
Suppose that the errors yij − (γi + λixij), i = 1, 2, are independently and normally distributed with mean 0 and standard deviation σ. Let x̄i, ȳi, λ̂i, and γ̂i be the corresponding mean values and regression parameter estimates. Let s² = Σij[yij − (γ̂i + λ̂ixij)]²/(n1 + n2 − 4). Then s² is an unbiased estimate of σ² and (n1 + n2 − 4)s²/σ² ~ χ²n1+n2−4. To test the null hypothesis that λ2 − λ1 = 0, we use equation (9.18) of Armitage and Berry [6], which may be rewritten var(λ̂2 − λ̂1) = σ²λ̂2−λ̂1 = σ²[1/(σ²x1 n1) + 1/(σ²x2 n2)], where σ²xi = Σj(xij − x̄i)²/ni for i = 1, 2. Let n = n2 and let m = n1/n2 be the ratio of the two group sizes. Let R = √n (λ̂2 − λ̂1) and σ²R be the variance of R. Then σ²R = nσ²λ̂2−λ̂1 = σ²[1/(mσ²x1) + 1/σ²x2]. Let S²R = s²[1/(mσ²x1) + 1/σ²x2]. Then [n(1 + m) − 4]S²R/σ²R = (n2 + n1 − 4)s²/σ² ~ χ²n2+n1−4. Therefore, substituting v = n(1 + m) − 4 and δ = (λ2 − λ1)/σR into equations (A1) and (A2) gives power and sample size formulas for testing the equality of the dose-response slopes of two treatments.

To test the equality of the y intercepts of the two treatments, we use equation (5.16) of Armitage and Berry [6], which gives var(γ̂i) = σ²γ̂i = σ²[1 + x̄²i/σ²xi]/ni for i = 1, 2. Therefore, R = √n (γ̂2 − γ̂1) has variance

σ²R = σ²{[1 + x̄²1/σ²x1]/m + 1 + x̄²2/σ²x2}

Let

S²R = s²{[1 + x̄²1/σ²x1]/m + 1 + x̄²2/σ²x2}

Then [n(1 + m) − 4]S²R/σ²R = (n2 + n1 − 4)s²/σ² ~ χ²n2+n1−4. Substituting v = n(1 + m) − 4 and δ = (γ2 − γ1)/σR into equations (A1) and (A2) gives the desired sample size and power formulas.

As in the case for a single regression line, the values of xij may be either observed attributes of patients or controlled treatment values; σ may be estimated from the correlation coefficient or standard deviation of the response variable among control subjects if a direct estimate is unavailable. These terms may be handled in equations (A1) and (A2) in the same way as in the previous section.