Sample Size Determination for Analysis of Covariance

Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001
Negasi Beyene and Kung-Jong Lui, San Diego State University, San Diego, CA 92182
Negasi Beyene, CDC/NCHS/Research Data Center, 6525 Belcrest Rd., Hyattsville, MD 20782
Key Words: Confounder; Sample Size; Covariate
1. Introduction
Sample size determination is one of the important problems
in designing an experiment or survey. The general results of
how to solve this problem are found in various textbooks
and journal articles. In clinical trials as well as in other
biomedical experiments, the sample size determination is
usually posed in relation to testing of the hypothesis.
However, in these experiments data collection is associated
with special problems. Patients may drop out of the study;
animals may die due to unknown causes. These and other
factors can lead to incompleteness in observations of some
experimental units. In most cases, moreover, clinicians ignore the
question of how the sample size requirement is affected by
the presence of a covariate. If covariates are excluded from
the analysis of the response variable, estimates of the factor
effect may be biased, and so may the sample size estimate. Note
that after data are collected, conditional data analysis is
usually done by treating the values of potential confounders
as fixed constants in the application of multiple regression
models. By contrast, in the planning stage of a trial, it is not
uncommon that the values of potential confounders cannot
be controlled by the experimenters and are even unknown in
advance.
1.1 Purpose of the Study
When data are on a continuous scale, the multiple linear
regression model is one of the most commonly used statistical
methods to control confounders in clinical trials. If we do
not incorporate confounding effects when calculating the
required sample size, the result may be quite misleading.
In this paper, readers are provided with a systematic
discussion of the calculation of the required sample size
for detecting a specified treatment effect in the presence
of a single potential confounder. On the basis of
covariance-adjusted mean estimates, asymptotic sample
size formulae in closed form are derived for both
randomized and nonrandomized trials. The study also
provides insight into how sample size calculation is
influenced by an unbalanced confounding covariate (in
nonrandomized trials), the correlation between the
confounding covariate and the subject response, the
distance between the mean responses in the two treatment
groups in units of standard deviation, and the magnitude
of the specified treatment effect that we are interested
in detecting. Therefore, this study should be useful for
clinicians when designing their trials.
1.3 Analysis of Covariance
In order to discuss how to determine the sample size
for analysis of covariance, we first need to familiarize
ourselves with the technique. A combination of
analysis of variance and regression techniques used to
study the relationship of $Y$ (the dependent variable) to one
or more independent variables $X_1, \ldots, X_k$ is known as
covariance analysis. Because analysis of covariance
techniques are complicated, we confine ourselves
mainly to the simplest type of covariance problem,
namely, a combination of one-way analysis of variance
with linear regression on a single $X$ variable. The
purpose of an analysis of covariance is to remove the
effect of one or more unwanted and uncontrollable
factors from the analysis. When experimental
units are randomly assigned to the treatments, analysis of
covariance (ANCOVA) provides a method for reducing the size of
the residual error term used in making inferences.
Thus, covariance analysis can result in shorter confidence
intervals and more powerful tests, and so in an experiment
it can sometimes result in a substantial reduction in the
sample size needed to establish differences among the
population means.
Example 1.
In studying three diets, rats might be randomly assigned
to diets. If the dependent variable is weight after two
months on the diet, a one-way analysis of variance can be
performed on the weight. It is reasonable to suppose,
however, that weight after two months on a diet is highly
correlated with the initial weight measured on each rat. The
elimination of the linear effect of initial weight on final
weight should result in a smaller mean square error.
Initial weight is then called a covariate (or concomitant)
variable; its linear effect will be removed by the use of
regression analysis techniques.
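The reduction in mean square error described in Example 1 can be illustrated with a small simulation. The following is a minimal sketch, not the authors' computation: the function names `one_way_ancova` and `simulate_rats`, the diet effects, and all numerical settings are hypothetical choices made only for illustration. It compares the residual mean square from a one-way analysis of variance with the one obtained after removing the pooled within-group linear effect of the covariate.

```python
import random


def one_way_ancova(groups):
    """Compare one-way ANOVA and ANCOVA residual mean squares.

    groups: list of treatment groups, each a list of (x, y) pairs, where
    x is the covariate (initial weight) and y the response (final weight).
    Returns (mse_anova, mse_ancova, pooled_slope).
    """
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    sxx = sxy = syy = 0.0  # pooled within-group sums of squares and products
    for g in groups:
        x_bar = sum(x for x, _ in g) / len(g)
        y_bar = sum(y for _, y in g) / len(g)
        for x, y in g:
            sxx += (x - x_bar) ** 2
            sxy += (x - x_bar) * (y - y_bar)
            syy += (y - y_bar) ** 2
    slope = sxy / sxx                                  # pooled within-group slope
    mse_anova = syy / (n_total - k)                    # covariate ignored
    mse_ancova = (syy - sxy ** 2 / sxx) / (n_total - k - 1)  # covariate removed
    return mse_anova, mse_ancova, slope


def simulate_rats(seed=1, n_per_diet=10):
    """Simulate three diets; every number here is a hypothetical assumption."""
    rng = random.Random(seed)
    diet_effects = [0.0, 10.0, 20.0]   # assumed weight gain added by each diet
    groups = []
    for effect in diet_effects:
        group = []
        for _ in range(n_per_diet):
            initial = rng.gauss(200.0, 20.0)                        # initial weight
            final = 50.0 + effect + initial + rng.gauss(0.0, 5.0)   # final weight
            group.append((initial, final))
        groups.append(group)
    return groups


if __name__ == "__main__":
    mse_anova, mse_ancova, slope = one_way_ancova(simulate_rats())
    print("ANOVA mean square error: ", round(mse_anova, 1))
    print("ANCOVA mean square error:", round(mse_ancova, 1))
    print("pooled slope:            ", round(slope, 3))
```

Because final weight is simulated as initial weight plus a diet effect plus a small noise term, most of the within-diet variation in final weight is explained by the covariate, so the ANCOVA mean square error should come out much smaller than the ANOVA one.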
Example 2.
Let $Y$ be the language score for students taught by three
different methods. Measurements of IQ ($X$) taken before
the language instruction are also available. The students are
assigned randomly to the three teaching methods, so that
the mean IQ values for the three populations being
considered are all equal. The sample mean IQs probably differ,
but large differences are unlikely because of the random
assignment to teaching methods. Either a one-way analysis
of variance or a covariance analysis can be performed with
the data.
The advantage of analysis of covariance is twofold.
1. We may reduce the size of the mean square error to be
used in testing for treatment differences and in constructing
confidence intervals.
2. Covariance analysis adjusts for the possibility that
observed differences among the three sample mean
language scores may be partly due to differences among the
three sample mean IQs. The group with the highest mean IQ may
also have the highest mean language score, not because of
the teaching method, but simply because students with higher
IQs learn a new language more quickly. One-way analysis of
variance takes no account of differences among the mean
IQ scores, whereas covariance analysis accounts for them
by reporting adjusted treatment means.
Frequently the researcher calculates the heights of the
sample regression lines at the overall mean $\bar{X}_{..}$; these are
called adjusted means. The purpose is simply to have
means that can be compared. Calling the adjusted mean for the
$i$th population adj $\bar{Y}_{i.}$, we use the foregoing definition to
obtain

\[ \text{adj } \bar{Y}_{i.} = \bar{Y}_{i.} + b(\bar{X}_{..} - \bar{X}_{i.}). \]

Since the adjusted mean is obtained by setting $X = \bar{X}_{..}$ in
the sample regression line, the variance of the adjusted mean is

\[ \operatorname{Var}(\text{adj } \bar{Y}_{i.}) = \sigma_e^2 \left[ \frac{1}{n_i} + \frac{(\bar{X}_{..} - \bar{X}_{i.})^2}{\sum x_r^2} \right]. \]

1.4 Importance of Sample Size
Sample size is a primary criterion for the validity of
experiments. Suppose that, in a given experiment, the values for
$\alpha$, $\beta$, and $\delta$ are carefully chosen, where $\alpha$ is the risk of
making an error of the first kind (that is, rejecting $H_0$
when $H_0$ is true), $\beta$ is the risk of making an error of the
second kind (that is, failing to reject $H_0$ when $H_0$ is false),
and $\delta$ is the difference between two means. If, in addition,
$\sigma^2$ (the population variance) is accurately known, then the
computed value for $N$ is the desired sample size.
An experiment would be constrained if an experimenter
properly calculated an $N$ of, for example, 20 samples but
then decided that this sample size was excessive for the
time or money available and proceeded to test a sample of
10 instead of the 20 required. Nevertheless, it is always
necessary to be realistic, and sometimes it will be
necessary to choose an $N$ based on the time available, the
money available, or the kind of sample (for instance, patients)
available, rather than on a calculation from the proper
formula. In this type of situation, it is recommended that
the adequacy of the experiment be evaluated before it is
started, if enough information is available. If the $\alpha$-error
is critical, the way to determine the power of a proposed
experiment is to compute the value of the $\beta$-error that would
be obtained by using the pre-specified $\alpha$, $\delta$, and
proposed sample size; that is, the experimenter would
predetermine the risk of missing an improvement of size $\delta$
if a substandard sample size should be used. On the other
hand, if the $\beta$-error is critical, the efficiency of a proposed
experiment is determined by computing the value of
$\alpha$ from the proposed sample size, $\beta$, and $\delta$. The size of
$N$ is a function of $Z_\alpha$, $Z_\beta$, $\delta$, and $\sigma^2$. If we are
testing $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$, then

\[ N = \frac{2 (Z_\alpha + Z_\beta)^2 \sigma^2}{\delta^2}. \]

Since $Z_\alpha$ and $Z_\beta$ are in the numerator of the formula, the
size of the experiment must increase as the desired risk of error
becomes smaller. Likewise, any decrease in variance
causes a decrease in the sample size of the experiment.
Finally, the smaller the size of an effect that is to be
detected, the greater the sample size required. The
specified sample size assures the experimenter that the risk
of error will be equal to or less than $\alpha$ and $\beta$ when the
experiment is complete.
2. Methods
Theory
Let
$(Y_{ij}, X_{ij})'$ denote the observation for the $j$th
($j = 1, 2, 3, \ldots, n_i$) subject in the $i$th ($i = 1, 2$) treatment
group, where $Y_{ij}$ represents the subject response and $X_{ij}$
represents the potential confounding covariate. We assume that
$(Y_{ij}, X_{ij})'$ independently $\sim N(\mu_i, \Sigma)$, where
$\mu_i' = (\mu_y^{(i)}, \mu_x^{(i)})$ and

\[ \Sigma = \begin{pmatrix} \sigma_y^2 & \sigma_{yx} \\ \sigma_{yx} & \sigma_x^2 \end{pmatrix}. \]

Therefore, for a given $X_{ij} = x_{ij}$, the conditional distribution
of $Y_{ij}$ can be expressed as

\[ Y_{ij} = \beta_0 + \beta_1 I_i + \beta_2 (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}, \tag{2.1} \]

where

\[ \beta_0 = \mu_y^{(1)} + \beta_2 (\bar{X}_{..} - \mu_x^{(1)}), \qquad \bar{X}_{..} = \frac{\sum_i \sum_j X_{ij}}{\sum_i n_i}, \]

\[ \beta_1 = (\mu_y^{(2)} - \mu_y^{(1)}) - \beta_2 (\mu_x^{(2)} - \mu_x^{(1)}), \qquad \beta_2 = \frac{\sigma_{yx}}{\sigma_x^2} = \rho \frac{\sigma_y}{\sigma_x}, \]

the indicator variable $I_i = 0$ if $i = 1$ and $I_i = 1$ otherwise, and
$\varepsilon_{ij}$ independently $\sim N(0, \sigma_y^2 (1 - \rho^2))$. Note that
$\beta_1$ is, in fact, the difference of the covariance-adjusted mean
responses [5,25,4] between the two treatments, and in nonrandomized
trials it is generally different from $\mu_y^{(2)} - \mu_y^{(1)}$, the
difference of the two simple means. To avoid the possibly confounding
effect resulting from the covariate $X$ in comparing the two
treatment effects, we may wish to test the hypothesis
$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$ rather than the hypothesis
$H_0: \mu_y^{(2)} - \mu_y^{(1)} = 0$ versus $H_1: \mu_y^{(2)} - \mu_y^{(1)} \neq 0$.
Under model assumption (2.1), we can show that for a fixed
column vector $X$, of which the transpose vector is
$X' = (X_{11}, \ldots, X_{1 n_1}, X_{21}, \ldots, X_{2 n_2})$, the conditional
variance $\operatorname{Var}(\hat{\beta}_1 \mid X)$ equals

\[ \sigma_y^2 (1 - \rho^2) \left[ \frac{1}{n_1} + \frac{1}{n_2} + \frac{(\bar{X}_{2.} - \bar{X}_{1.})^2}{\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2} \right], \]

where $\hat{\beta}_1 = (\bar{Y}_{2.} - \bar{Y}_{1.}) - \hat{\beta}_2 (\bar{X}_{2.} - \bar{X}_{1.})$,
$\hat{\beta}_2 = S_{xy} / S_x^2$, $\bar{Y}_{i.} = \sum_j Y_{ij} / n_i$, and
$\bar{X}_{i.} = \sum_j X_{ij} / n_i$ for $i = 1, 2$. Here $\hat{\beta}_1$ is the
uniformly minimum variance unbiased estimator (UMVUE) of $\beta_1$ [4].
Note that the conditional variance $\operatorname{Var}(\hat{\beta}_1 \mid X)$
depends on the values $X_{ij}$ of the confounder, which may be unknown in
advance. Because the UMVUE of $\beta_1$ is always an unbiased estimator of
$\beta_1$ regardless of the value of $X$, the unconditional variance is

\[ \operatorname{Var}(\hat{\beta}_1) = E\left[\operatorname{Var}(\hat{\beta}_1 \mid X)\right] = \sigma_y^2 (1 - \rho^2)\, E\left[ \frac{1}{n_1} + \frac{1}{n_2} + \frac{(\bar{X}_{2.} - \bar{X}_{1.})^2}{\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2} \right] \]

[4]. Furthermore, because $(\bar{X}_{2.} - \bar{X}_{1.})^2$ and
$\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2$ are independent, we obtain that

\[ \operatorname{Var}(\hat{\beta}_1) = \sigma_y^2 (1 - \rho^2) \left[ \frac{1}{n_1} + \frac{1}{n_2} + \frac{1}{n_1 + n_2 - 4} \left( \frac{1}{n_1} + \frac{1}{n_2} + d_x^2 \right) \right] \tag{2.2} \]

for $n_i > 2$, where $d_x = (\mu_x^{(2)} - \mu_x^{(1)}) / \sigma_x$ is the distance
between $\mu_x^{(2)}$ and $\mu_x^{(1)}$ in units of the standard deviation $\sigma_x$.
Note that for a fixed total sample size $n_o = n_1 + n_2$, the
variance $\operatorname{Var}(\hat{\beta}_1)$ in formula (2.2) reaches its minimum,
and thereby the power for detecting $\beta_1 \neq 0$ is maximized,
if $n_1 = n_2$. Therefore, we will assume equal sample
allocation $n_1 = n_2 = n$ in the following discussion. In this
case, if the correlation $\rho$ were further equal to 0, then the
variance $\operatorname{Var}(\hat{\beta}_1)$ would reduce to

\[ \sigma_y^2 \left[ \frac{2}{n} + \frac{1}{2(n - 2)} \left( \frac{2}{n} + d_x^2 \right) \right], \]

which is always greater than $\operatorname{Var}(\bar{Y}_{2.} - \bar{Y}_{1.}) = 2 \sigma_y^2 / n$.
Furthermore, when $\rho = 0$, testing the hypothesis
$H_0: \beta_1 = 0$ is equivalent to testing $\mu_y^{(2)} - \mu_y^{(1)} = 0$. These
results suggest that in the case of $\rho = 0$, using the test
statistic $\hat{\beta}_1$, which has unnecessarily adjusted for the
confounding effect due to $X$, always causes a loss of
efficiency. On the other hand, if the correlation $\rho \neq 0$, by
recalling the definition of $\beta_1$, we can easily show that

\[ d_x = \frac{d_y - \beta_1 / \sigma_y}{\rho}, \]

where $d_y = (\mu_y^{(2)} - \mu_y^{(1)}) / \sigma_y$ is the distance between
$\mu_y^{(2)}$ and $\mu_y^{(1)}$ in units of the standard deviation
$\sigma_y$. Substituting this result for $d_x$ in formula (2.2), the
variance of $\hat{\beta}_1$ in the case of $n_1 = n_2 = n$ becomes

\[ \operatorname{Var}(\hat{\beta}_1) = \sigma_y^2 (1 - \rho^2) \left[ \frac{2}{n} + \frac{1}{2(n - 2)} \left( \frac{2}{n} + \frac{(d_y - \beta_1 / \sigma_y)^2}{\rho^2} \right) \right], \tag{2.3} \]

which is a function of $\sigma_y$, $\rho$, $\beta_1$, $d_y$, and the sample size $n$.
Note that for all fixed $n$ and $\rho$, the larger the value of
$|d_y - \beta_1 / \sigma_y|$ (or equivalently, the larger the absolute value
of $(\mu_x^{(2)} - \mu_x^{(1)}) / \sigma_x$), the larger is the variance
$\operatorname{Var}(\hat{\beta}_1)$. Therefore, for all other parameters fixed, in
nonrandomized trials where $\mu_x^{(2)} \neq \mu_x^{(1)}$, the variance of
$\hat{\beta}_1$ is always larger than that of $\hat{\beta}_1$ in a randomized
trial where $\mu_x^{(2)} = \mu_x^{(1)}$. On the basis of formula (2.3), the required
sample size $N$ from each of the two treatment groups in
testing the hypothesis $H_0: \beta_1 = 0$
for a power $1 - \beta$ to detect the alternative hypothesis
$H_a: \beta_1 \neq 0$ (say, $\beta_1 > 0$) at an $\alpha$-level (two-sided test) is
the smallest integer $N$ such that

\[ \frac{ Z_{\alpha/2} \sqrt{ (1 - \rho^2) \left[ \frac{2}{N} + \frac{1}{2(N - 2)} \left( \frac{2}{N} + \frac{d_y^2}{\rho^2} \right) \right] } - \frac{\beta_1}{\sigma_y} }{ \sqrt{ (1 - \rho^2) \left[ \frac{2}{N} + \frac{1}{2(N - 2)} \left( \frac{2}{N} + \frac{(d_y - \beta_1 / \sigma_y)^2}{\rho^2} \right) \right] } } \le -Z_\beta, \tag{2.4} \]

where $Z_\alpha$ and $Z_\beta$ are the upper $100\alpha$th and $100\beta$th percentiles
of the standard normal distribution, respectively. Note that
the sample size determined by (2.4) depends on $\sigma_y$ only through
the parameter $d_y$ and the ratio $\beta_1 / \sigma_y$.
For given values of $d_y$, $\beta_1 / \sigma_y$, and the correlation $\rho$, we can
easily apply a trial-and-error procedure to solve for $N$ in
equation (2.4). Furthermore, if $N$ were large, then a good
approximation for $N$ in sample size calculation
procedure (2.4) would equal $[N_x] + 1$, where $[N_x]$ denotes
the largest integer less than or equal to $N_x$, which is given by

\[ N_x = \frac{ \left\{ Z_{\alpha/2} \sqrt{ (1 - \rho^2) \left[ 2 + \frac{d_y^2}{2 \rho^2} \right] } + Z_\beta \sqrt{ (1 - \rho^2) \left[ 2 + \frac{(d_y - \beta_1 / \sigma_y)^2}{2 \rho^2} \right] } \right\}^2 }{ (\beta_1 / \sigma_y)^2 }. \tag{2.5} \]

Note also that in a randomized trial where
$\mu_x^{(2)} = \mu_x^{(1)}$, $\beta_1 = 0$ implies that $d_y = 0$. Thus, in this situation,
sample size calculation procedure (2.4) simplifies to finding the
smallest integer $N$ such that

\[ Z_{\alpha/2} - \frac{\beta_1 / \sigma_y}{ \sqrt{ (1 - \rho^2) \left[ \frac{2}{N} + \frac{1}{N (N - 2)} \right] } } \le -Z_\beta. \tag{2.6} \]

Similarly, if $\mu_x^{(2)} = \mu_x^{(1)}$, the approximate sample size
for $N$ in equation (2.6) would become $[N_x] + 1$, where $N_x$ is

\[ N_x = \frac{2 (Z_{\alpha/2} + Z_\beta)^2 (1 - \rho^2)}{d_y^2}. \tag{2.7} \]

Note that sample size (2.7) is, in fact, the product of the
corresponding required sample size $2 (Z_{\alpha/2} + Z_\beta)^2 / d_y^2$ for
power of $1 - \beta$ at an $\alpha$-level on the basis of the test
statistic $\bar{Y}_{2.} - \bar{Y}_{1.}$ and the multiplicative factor $(1 - \rho^2)$. Note
also that sample size calculation procedures (2.4) and (2.6)
and approximate sample size formulae (2.5) and (2.7) are all
symmetric with respect to the correlation $\rho$. Therefore,
under otherwise identical conditions, the required sample size $N$ for
$\rho = \rho_o$ equals that for $\rho = -\rho_o$.
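The trial-and-error search in (2.4) and (2.6) and the closed-form approximations (2.5) and (2.7) can be sketched in a few lines. This is only an illustrative sketch under the equal-allocation setting of this section: the function names are hypothetical, `effect` stands for the ratio $\beta_1 / \sigma_y$ (assumed positive), $\rho \neq 0$ is assumed for the nonrandomized formulas, and normal percentiles are taken from Python's standard `statistics.NormalDist`.

```python
from math import floor, sqrt
from statistics import NormalDist


def upper_z(p):
    """Upper 100p-th percentile of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - p)


def scaled_var(n, rho, gap):
    """Var(beta1_hat) / sigma_y^2 from (2.3); gap plays the role of d_y - beta1/sigma_y."""
    return (1 - rho ** 2) * (2 / n + (2 / n + gap ** 2 / rho ** 2) / (2 * (n - 2)))


def exact_n(d_y, effect, rho, alpha=0.05, power=0.80, n_max=10 ** 6):
    """Smallest N per group satisfying inequality (2.4)."""
    z_a, z_b = upper_z(alpha / 2), upper_z(1 - power)
    for n in range(3, n_max):
        sd0 = sqrt(scaled_var(n, rho, d_y))           # under H0: beta1 = 0
        sd1 = sqrt(scaled_var(n, rho, d_y - effect))  # under the alternative
        if (z_a * sd0 - effect) / sd1 <= -z_b:
            return n
    raise ValueError("no N below n_max satisfies (2.4)")


def approx_n(d_y, effect, rho, alpha=0.05, power=0.80):
    """[N_x] + 1 with N_x from the large-sample formula (2.5)."""
    z_a, z_b = upper_z(alpha / 2), upper_z(1 - power)
    a = sqrt((1 - rho ** 2) * (2 + d_y ** 2 / (2 * rho ** 2)))
    b = sqrt((1 - rho ** 2) * (2 + (d_y - effect) ** 2 / (2 * rho ** 2)))
    return floor(((z_a * a + z_b * b) / effect) ** 2) + 1


def exact_n_randomized(effect, rho, alpha=0.05, power=0.80, n_max=10 ** 6):
    """Smallest N per group satisfying (2.6), where d_y equals beta1/sigma_y."""
    z_a, z_b = upper_z(alpha / 2), upper_z(1 - power)
    for n in range(3, n_max):
        sd = sqrt((1 - rho ** 2) * (2 / n + 1 / (n * (n - 2))))
        if z_a - effect / sd <= -z_b:
            return n
    raise ValueError("no N below n_max satisfies (2.6)")


def approx_n_randomized(d_y, rho, alpha=0.05, power=0.80):
    """[N_x] + 1 with N_x from formula (2.7)."""
    z_a, z_b = upper_z(alpha / 2), upper_z(1 - power)
    return floor(2 * (z_a + z_b) ** 2 * (1 - rho ** 2) / d_y ** 2) + 1


def unadjusted_n(d_y, alpha=0.05, power=0.80):
    """Sample size based on the unadjusted difference of sample means."""
    z_a, z_b = upper_z(alpha / 2), upper_z(1 - power)
    return floor(2 * (z_a + z_b) ** 2 / d_y ** 2) + 1


if __name__ == "__main__":
    print(exact_n(0.5, 0.25, 0.3), approx_n(0.5, 0.25, 0.3))
    print(exact_n_randomized(0.5, 0.7, power=0.90),
          approx_n_randomized(0.5, 0.7, power=0.90),
          unadjusted_n(0.5, power=0.90))
```

The linear search from $N = 3$ mirrors the trial-and-error procedure described above; it is slow for very large $N$ but keeps the sketch close to the definition of "the smallest integer $N$".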
3. Results.
To study the influence of the correlation $\rho$, the distance in
units of the standard deviation between the two treatment
mean responses $d_y$, and the absolute difference
$|d_y - \beta_1 / \sigma_y| = |\rho (\mu_x^{(2)} - \mu_x^{(1)}) / \sigma_x|$ on sample size
determination, Table 3 summarizes the corresponding
sample size from each of the two treatment groups for
power of 0.80 and 0.90 at the 0.05 (two-sided) level based on
the test statistic $\hat{\beta}_1$ in a variety of situations. For example,
in testing the hypothesis $H_0: \beta_1 = 0$, for power of 0.80 at the
0.05 level (two-sided), when $d_y = 0.5$, $\beta_1 / \sigma_y = 0.25$, and the
correlation $\rho = 0.3$, the required sample size $N$ from each
of the two treatment groups is 351. In this situation,
according to the approximate sample size $N_x$ in
formula (2.5), we need to take 350 subjects. To check this
result, we fixed the sample size and estimated the
power. With a sample size of 350,
$d_y = 0.5$, $\beta_1 / \sigma_y = 0.25$, and the correlation $\rho = 0.3$, the
estimated power is equal to 0.806, which is approximately
equal to the nominal 0.80 and thus supports the formula. In general, with
all other parameters fixed, the higher the absolute value $|\rho|$
of the correlation between subject response $Y$ and confounder
$X$, the smaller is the required sample size. Note that the
approximate sample size $N_x$ calculated by use of equations
(2.5) and (2.7) agrees quite well with the $N$ calculated by
use of equations (2.4) and (2.6), respectively, even when $N$
is small, although the former tends to slightly underestimate
the latter. Also, we calculated $N_1$ without adjustment
for comparison (see Table 3). The formula used is

\[ N_1 = \frac{2 (Z_\alpha + Z_\beta)^2}{d_y^2}. \]

4. Discussion.
In a randomized clinical trial where $\mu_x^{(2)} = \mu_x^{(1)}$, testing the
hypothesis $H_0: \mu_y^{(2)} = \mu_y^{(1)}$ is equivalent to testing
$H_0: \beta_1 = 0$. Comparing the variance
$\operatorname{Var}(\bar{Y}_{2.} - \bar{Y}_{1.}) = 2 \sigma_y^2 / n$ with
$\operatorname{Var}(\hat{\beta}_1)$ in formula (2.3),
however, after a few algebraic manipulations, we can show
that $\operatorname{Var}(\hat{\beta}_1)$ is smaller than
$\operatorname{Var}(\bar{Y}_{2.} - \bar{Y}_{1.})$ if and only if the
absolute value $|\rho| > 1 / \sqrt{2n - 3}$. Therefore, even in
randomized trials where $\mu_x^{(2)} = \mu_x^{(1)}$, use of the test statistic
$\hat{\beta}_1$ instead of $\bar{Y}_{2.} - \bar{Y}_{1.}$ for hypothesis testing may still gain
efficiency when there is a high correlation $\rho$ between $X$
and $Y$. In fact, this gain of efficiency can be substantial.
For example, from Table 3, when $d_y = \beta_1 / \sigma_y = 0.5$ and
$\rho = 0.7$, the required sample size $N$ for a power of 0.90 is
44 at the 0.05 level (two-sided). In contrast to this, the required
sample size on the basis of the test statistic $\bar{Y}_{2.} - \bar{Y}_{1.}$ is 85,
which exceeds the required sample size of 44 for the
test statistic $\hat{\beta}_1$ by over 90%. On the other hand, if $\rho = 0$,
as noted before, use of the test statistic $\hat{\beta}_1$, which adjusts for the
nonconfounder $X$ in hypothesis testing, always causes a
loss of efficiency. In nonrandomized trials, especially when
the absolute value of $d_x$ for the confounder $X$ is large and
we do not incorporate the confounding effect into the
sample size calculation, the final power could be quite
different from the desired power if we were required to
control this confounding effect to avoid possibly
misleading inference [22] in comparing the two treatment
effects. For example, say we would like to have a power of
0.90 to detect $d_y = 0.5$ at the 0.05 level (two-sided). The required sample
size is 85 by use of the formula $2 (Z_{\alpha/2} + Z_\beta)^2 / d_y^2$. If
$\rho = 0.1$ and $\beta_1 / \sigma_y = 0.25$ (these conditions, together with
$d_y = 0.5$, lead $d_x$ to equal 2.5), it is easy to show that on
the basis of the test statistic $\hat{\beta}_1$ with such a sample size of 85
we have only a power of approximately 0.01, which is much
less than the desired power of 0.90. This example
demonstrates that when a confounder has not been
incorporated into the sample size calculation and is seriously
unbalanced between the two treatment groups, the
adjustment for this confounder can cause a tremendous
loss of power. Note that in practice the parameters
$\sigma_y$, $\rho$, $\beta_1$, and $d_y$ are usually unknown, and therefore we
need to substitute the corresponding sample estimates for
these parameters in the test statistic $\hat{\beta}_1$. Because all these
sample estimates converge to their corresponding
parameters in probability, we can apply Slutsky's theorem
and other large sample properties [12, 21] to justify that the
sample size formulae derived here are still
asymptotically valid. Note that all the traditional
common assumptions for classical linear regression
analysis [4, 5, 25], such as a joint normal distribution for all
the covariates and a constant covariance matrix across
treatments, are needed to derive our sample size calculation
procedures. Although a transformation may often be applied to
normalize variables that have extremely skewed
distributions or to stabilize the variances
of covariates, the sample size formulae presented here
should be used cautiously in this situation.

Table 3. Required sample sizes $N$ and $N_x$ from each of the
two treatment groups on the basis of covariance-adjusted
estimates for power of 0.80 and 0.90 at $\alpha$-level = 0.05
(two-sided) and for the distance in units of standard
deviation between the mean responses in the two treatment
groups, $d_y$, ranging from 0.25 to 1.0, the ratio of treatment
effect to the standard deviation of $Y$, $\beta_1 / \sigma_y$, ranging from
0.25 to 1.0, and the correlation between the response $Y$ and
the covariate $X$, $\rho$, ranging from 0.1 to 0.9.
$N_1$ without adjustment is shown for comparison.
                      Power 0.80, correlation ρ          Power 0.90, correlation ρ
d_y   β1/σ_y         0.1   0.3   0.5   0.7   0.9        0.1   0.3   0.5   0.7   0.9
0.25  0.25   N       250   229   189   129    49        334   307   253   173    65
             N_x     249   229   189   129    48        333   307   253   172    64
             N_1     252   252   252   252   252        337   337   337   337   337
0.25  0.50   N       161    68    51    34    13        215    91    68    45    17
             N_x     160    68    51    34    13        214    90    68    45    17
             N_1     252   252   252   252   252        252   252   252   252   252
0.25  1.0    N        83    23    15    10     5        133    33    21    13     6
             N_x      82    22    15     9     4        131    32    20    13     5
             N_1     252   252   252   252   252        337   337   337   337   337
0.50  0.25   N      1392   351   226   142    52       1704   454   297   188    68
             N_x    1390   350   225   141    51       1702   453   297   187    68
             N_1      63    63    63    63    63         85    85    85    85    85
0.50  0.50   N        63    58    48    33    13         84    78    64    44    17
             N_x      63    58    48    33    12         84    77    64    43    16
             N_1      63    63    63    63    63         85    85    85    85    85
0.50  1.0    N       115    26    16    10     5        153    34    21    13     6
             N_x     113    25    15    10     4        151    33    20    13     5
             N_1      63    63    63    63    63         85    85    85    85    85
1.0   0.25   N      5571   776   352   186    62       7102  1003   461   245    81
             N_x    5569   775   351   185    61       7100  1001   460   244    81
             N_1      16    16    16    16    16         22    22    22    22    22
1.0   0.50   N      1193   117    84    46    16       1434   220   108    59    21
             N_x    1191   176    83    45    15       1432   219   107    58    20
             N_1      16    16    16    16    16         22    22    22    22    22
1.0   1.0    N        17    15    13     9     4         22    20    17    12     5
             N_x      16    15    12     9     3         21    20    16    11     4
             N_1      16    16    16    16    16         22    22    22    22    22
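The loss of power discussed in Section 4 can be approximated directly from the variance formula (2.3). The sketch below is an assumption-laden illustration rather than the authors' procedure: it uses the same normal approximation that underlies (2.4), considers only the upper rejection region (the direction $\beta_1 > 0$), requires $\rho \neq 0$, and the function names `scaled_var` and `approx_power` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def scaled_var(n, rho, d_x):
    """Var(beta1_hat) / sigma_y^2 from (2.2) with n1 = n2 = n."""
    return (1 - rho ** 2) * (2 / n + (2 / n + d_x ** 2) / (2 * (n - 2)))


def approx_power(n, d_y, effect, rho, alpha=0.05):
    """Approximate power of the adjusted test for n subjects per group.

    effect is beta1 / sigma_y (> 0). Under H0 the imbalance is d_x = d_y / rho;
    under the alternative it is d_x = (d_y - effect) / rho.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    sd0 = sqrt(scaled_var(n, rho, d_y / rho))             # null standard error
    sd1 = sqrt(scaled_var(n, rho, (d_y - effect) / rho))  # alternative standard error
    return NormalDist().cdf((effect - z_a * sd0) / sd1)


if __name__ == "__main__":
    # Discussion example: N = 85 was planned without the confounder,
    # but rho = 0.1 and beta1/sigma_y = 0.25 with d_y = 0.5 (so d_x = 2.5).
    print(round(approx_power(85, 0.5, 0.25, 0.1), 3))
    # Results example: N = 351 versus N = 350 for rho = 0.3.
    print(round(approx_power(351, 0.5, 0.25, 0.3), 3),
          round(approx_power(350, 0.5, 0.25, 0.3), 3))
```

Under these assumptions the first call should give a power of roughly 0.01, in line with the example in the Discussion, and the second line should show the power crossing 0.80 between 350 and 351 subjects per group.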
Note: Due to page limitations, parts of this paper, such as the application,
simulation, and references, are omitted.