
Biostatistics (2002), 3, 2, pp. 277–287
Printed in Great Britain
A sample size adjustment procedure for clinical
trials based on conditional power
GANG LI†
Organon Inc., 375 Mt Pleasant Ave., West Orange, NJ 07052, USA
[email protected]
WEICHUNG J. SHIH
University of Medicine and Dentistry of New Jersey, New Brunswick, NJ 08901, USA
[email protected]
TAILIANG XIE
Whitehall-Robins Healthcare, Madison, NJ 07940, USA
JIANG LU
University of the Witwatersrand, PO WITS, 2050, Johannesburg, South Africa
SUMMARY
When designing clinical trials, researchers often face uncertainty in the assumed treatment effect or variability. Hence the sample size calculated at the planning stage of a clinical trial may also be questionable. Adjusting the sample size during the mid-course of a clinical trial has lately become a popular strategy. In this paper we propose a procedure for calculating the additional sample size needed based on conditional power, and for adjusting the final-stage critical value to protect the overall type-I error rate. Compared with previous procedures, the proposed procedure uses the definition of the conditional type-I error directly, without appealing to an extra special function for it. It offers more flexibility in setting up interim decision rules, and the final-stage test is a likelihood ratio test.
Keywords: Conditional power; Interim analysis; Sample size re-estimation; Two-stage design.
1. INTRODUCTION
Sample size calculation plays a crucial role in the design of clinical trials. Undersized trials may not provide statistically significant evidence of efficacy. On the other hand, an unnecessarily large sample size wastes resources and may make a trial administratively difficult. Poor sample size calculation often stems from poor parameter estimates due to limited information prior to the start of the trial. One hedge against this is a mid-trial correction of the sample size, using the interim information collected from the trial (see, for example, Shih, 1992; Whitehead et al., 2001). The sample size adjustment strategy has become increasingly popular among pharmaceutical companies as they attempt to reduce the cost, yet accelerate the completion, of trials that will form the basis for regulatory approval of a medical product. Other
† To whom correspondence should be addressed
© Oxford University Press (2002)
examples can also be found in situations where some expected or unexpected event during the conduct of a study necessitates a sample size adjustment to maintain the study power (for example, Bigger et al., 1998).
As an illustration, we consider the type II coronary intervention study conducted by the National
Heart, Lung and Blood Institute (NHLBI). This study was to assess the effects of cholestyramine on
the progression of coronary arteriosclerosis. However, the recruitment was slow and the sample size was
smaller than planned (Brensike et al., 1982, 1984). If we were to extend the study based on the data, what
would be our adjustment strategy? We consider the data from this trial in Section 3 below.
Several approaches to sample size adjustment have been developed in the literature within the framework of different two-stage sampling schemes. Some have proposed adjustment upward only, by updating the variance estimate either in terms of an 'internal pilot study' (Wittes and Brittain, 1990; Wittes et al., 1999; Zucker et al., 1999), which opens the treatment codes, or in terms of 'blind re-estimation', which does not open the treatment codes (Gould and Shih, 1992, 1998; Shih and Gould, 1995; Shih and Long, 1998). Others have proposed adjustment upward or downward by updating the treatment difference estimate and assuming a stable variance (Bauer and Köhne, 1994; Proschan and Hunsberger, 1995; Davis et al., 1993; Green and Dahlberg, 1992; Wassmer, 1998; Cui et al., 1999). The latter approach always opens the treatment codes and has the flavor of mixing a group sequential design for early stopping with a sample size re-estimation strategy that aims at keeping the power of the study. Shih (2001) gave a recent review.
In this paper, we modify the procedure of Proschan and Hunsberger (1995) (PH), which uses the conditional power calculated at the first stage to determine the final sample size. The modified procedure relaxes the specific form of the conditional type-I error function, such as the circular or linear function given by PH, and consequently has the advantage that the critical value for the final stage depends only on the design parameters, not on the random outcome of the first stage. In Section 2 we first formulate the problem following PH, then describe our procedure and contrast the differences between the two. In Section 3 we demonstrate our procedure, in comparison with PH, with data from the NHLBI type II coronary intervention study. More discussion is given in Section 4.
2. THE METHOD
We first briefly review PH’s method and make remarks to introduce the proposed modifications.
2.1 Review of Proschan and Hunsberger's procedure
Assume that the observations are normally distributed with means $\mu_x$ and $\mu_y$ respectively for the two groups and a common variance $\sigma^2$. At the first stage, let $\bar{x}_1$ and $\bar{y}_1$ be the corresponding sample means of the two groups from $n_1$ patients in each group, and let $\hat{\sigma}_1^2$ be the pooled estimate of $\sigma^2$. Denote the standardized difference by $\delta = (\mu_x - \mu_y)/\sigma$ and $\hat{\delta} = (\bar{x}_1 - \bar{y}_1)/\hat{\sigma}_1$. The goal is to determine an additional number $n_2$ of patients per group and a critical value $c$ such that the test has an overall type-I error rate of $\alpha$ and the conditional power for detecting $\delta = \hat{\delta}$ at the final stage, given the interim result, is $1 - \beta_1$, where $\alpha$ and $\beta_1$ are pre-specified constants.

At the end of the trial, the final test statistic is
$$z = \frac{n_1(\bar{x}_1 - \bar{y}_1) + n_2(\bar{x}_2 - \bar{y}_2)}{\hat{\sigma}\sqrt{2(n_1 + n_2)}}$$
based on $n = n_1 + n_2$ patients per group, where $\hat{\sigma}^2$ is the pooled estimate of $\sigma^2$. Following PH, we assume that the variance estimate is stable at the interim and final stages, i.e. $\hat{\sigma}^2 \approx \hat{\sigma}_1^2 \approx \sigma^2$, so that $z_1 = \hat{\delta}\sqrt{n_1/2} \sim N(\delta\sqrt{n_1/2},\, 1)$ and that the
conditional probability
$$\mathrm{CP}_\delta(n_2, c \mid z_1) \equiv P(z > c \mid z_1, \delta)
= P\left( \frac{n_2(\bar{x}_2 - \bar{y}_2) - n_2\delta\sigma}{\hat{\sigma}\sqrt{2n_2}} > \frac{c\sqrt{2(n_1+n_2)} - z_1\sqrt{2n_1} - n_2\delta}{\sqrt{2n_2}} \right)
= 1 - \Phi\left( \frac{c\sqrt{n_1+n_2} - z_1\sqrt{n_1} - n_2\delta/\sqrt{2}}{\sqrt{n_2}} \right) \qquad (1)$$
where $c = c(n_2, z_1)$ and $\Phi$ is the cumulative distribution function of the standard normal.
Under the null hypothesis that $\mu_x = \mu_y$, $\mathrm{CP}_0(n_2, c \mid z_1) = P(z > c \mid z_1, \delta = 0) \equiv A(z_1)$. It follows that
$$A(z_1) = 1 - \Phi\left( \frac{c\sqrt{n_1+n_2} - z_1\sqrt{n_1}}{\sqrt{n_2}} \right). \qquad (2)$$
$A(z_1)$ is the so-called 'conditional type-I error function' in PH, which dictates how much conditional type-I error rate is allowed at the end of the study, given $z_1$. The major difference between our method and that of PH lies in the role that $A(z_1)$ plays in the determination of $n_2$ and $c$, and in whether $c$ depends on $z_1$ or not. This will be explained in detail in the sequel.
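Equations (1) and (2) are straightforward to evaluate numerically. The following is a minimal Python sketch (the function names and the Monte Carlo check are ours, not the paper's), using the first row of the Section 3 example ($n_1 = 58$, $n_2 = 226$, $c = 1.78$, $z_1 = 1.136$, $\hat{\delta} = 0.21$) purely as test values:

```python
import numpy as np
from scipy.stats import norm

def conditional_power(n1, n2, c, z1, delta):
    """Conditional power CP_delta(n2, c | z1) from equation (1)."""
    arg = (c * np.sqrt(n1 + n2) - z1 * np.sqrt(n1)
           - n2 * delta / np.sqrt(2.0)) / np.sqrt(n2)
    return 1.0 - norm.cdf(arg)

def conditional_error(n1, n2, c, z1):
    """Conditional type-I error A(z1) from equation (2): CP at delta = 0."""
    return conditional_power(n1, n2, c, z1, delta=0.0)

# Monte Carlo check of (1): given z1, the standardized second-stage statistic
# w2 = sqrt(n2/2)(xbar2 - ybar2)/sigma ~ N(delta*sqrt(n2/2), 1), and the final
# statistic is z = (sqrt(n1) z1 + sqrt(n2) w2) / sqrt(n1 + n2).
rng = np.random.default_rng(0)
n1, n2, c, z1, delta = 58, 226, 1.78, 1.136, 0.21
w2 = rng.normal(delta * np.sqrt(n2 / 2.0), 1.0, size=200_000)
z = (np.sqrt(n1) * z1 + np.sqrt(n2) * w2) / np.sqrt(n1 + n2)
print(conditional_power(n1, n2, c, z1, delta))   # analytic CP, ≈ 0.79 here
print(np.mean(z > c))                            # simulated P(z > c | z1)
```

The simulated exceedance probability agrees with the closed form in (1), which is the identity the derivation above establishes.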
The two unknowns, $n_2$ and $c$, are solved by satisfying the two conditions indicated previously. In PH, $c$ depends on $z_1$ and will be denoted as $c(z_1)$. First, the test needs to maintain
$$\alpha = \int_{-\infty}^{\infty} A(z_1)\,\phi(z_1)\,\mathrm{d}z_1 = \int_{-\infty}^{\infty} P(z > c(z_1) \mid z_1, \delta = 0)\,\phi(z_1)\,\mathrm{d}z_1 \qquad (3)$$
where $\phi$ is the density function of the standard normal. The specific form of $A(z_1)$, which we will discuss later, justifies that $\alpha$ in (3), the conditional type-I error averaged over all possible values of $z_1$, is the overall type-I error for this two-stage procedure with the special form of $A(z_1)$; see (10) and (11) in the next section. Here we use a one-sided test, as in PH. For a two-sided test, the conditional error function should be symmetric with respect to $z_1 = 0$, and we simply use
$$\alpha/2 = \int_{0}^{\infty} A(z_1)\,\phi(z_1)\,\mathrm{d}z_1. \qquad (4)$$
Second, the conditional power for detecting $\delta = \hat{\delta}$ needs to be $1 - \beta_1$. Following again the same notation as in PH, let $Z_{A(z_1)} = \Phi^{-1}(1 - A(z_1))$. Then
$$1 - \beta_1 = 1 - \Phi\left( Z_{A(z_1)} - \hat{\delta}\sqrt{n_2/2} \right) \qquad (5)$$
which leads to
$$n_2 = \frac{2\left(Z_{A(z_1)} + z_{\beta_1}\right)^2}{\hat{\delta}^2} = n_1\,\frac{\left(Z_{A(z_1)} + z_{\beta_1}\right)^2}{z_1^2} \qquad (6)$$
where $z_{\beta_1} = \Phi^{-1}(1 - \beta_1)$. Plugging $n_2$ into (2) yields
$$c(z_1) = \frac{z_1^2 + \left(Z_{A(z_1)} + z_{\beta_1}\right) Z_{A(z_1)}}{\sqrt{z_1^2 + \left(Z_{A(z_1)} + z_{\beta_1}\right)^2}}. \qquad (7)$$
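PH's closed forms (6) and (7) can be coded directly. The sketch below (illustrative function name; the input $A(z_1)$ is assumed to have already been computed from the chosen conditional error function) also verifies two identities implied by the derivation: substituting $n_2$ and $c(z_1)$ back into (2) recovers $A(z_1)$, and the conditional power (5) at $\delta = \hat{\delta}$ equals $1 - \beta_1$:

```python
import numpy as np
from scipy.stats import norm

def ph_n2_and_c(n1, z1, A_val, beta1):
    """PH's additional sample size (6) and critical value (7), given the
    conditional error A(z1) and the target conditional power 1 - beta1."""
    Z_A = norm.ppf(1.0 - A_val)   # Z_{A(z1)}
    z_b = norm.ppf(1.0 - beta1)   # z_{beta1}
    n2 = n1 * (Z_A + z_b) ** 2 / z1 ** 2                                      # (6)
    c = (z1 ** 2 + (Z_A + z_b) * Z_A) / np.sqrt(z1 ** 2 + (Z_A + z_b) ** 2)   # (7)
    return n2, c

# Self-consistency: (n2, c) plugged back into (2) recovers A(z1), and the
# conditional power (5) at delta = delta_hat equals 1 - beta1.
n1, z1, A_val, beta1 = 58, 1.136, 0.10, 0.20   # illustrative test values
n2, c = ph_n2_and_c(n1, z1, A_val, beta1)
A_check = 1 - norm.cdf((c * np.sqrt(n1 + n2) - z1 * np.sqrt(n1)) / np.sqrt(n2))
delta_hat = z1 * np.sqrt(2.0 / n1)
cp_check = 1 - norm.cdf(norm.ppf(1 - A_val) - delta_hat * np.sqrt(n2 / 2.0))
print(A_check, cp_check)   # ≈ 0.10 and 0.80
```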
Since both (3) and (5) involve $A(z_1)$, the conditional type-I error function, it is a key step in PH to decide what kind of function to use. PH mentioned that there are many choices. They suggested two classes of functions: the so-called 'circular conditional error function' ($A_{\mathrm{cir}}(z) = 0$ if $z < z_{p^*}$; $= 1 - \Phi(\sqrt{k^2 - z^2})$ if $z_{p^*} < z < k$; and $= 1$ if $z > k$) and the 'linear conditional error function' ($A_l(z) = 0$ if $z < z_{p^*}$; $= a + bz$ if $z_{p^*} < z < k$; and $= 1$ if $z > k$), where $p^* = 1 - \Phi(z_{p^*}) < 0.5$, and $k$ (for $A_{\mathrm{cir}}$) and $a$, $b$, and $k$ (for $A_l$) are design constants determined by (3) for a given $\alpha$. These functions are obviously named for their special mathematical form, but do not seem to offer motivation beyond that. In this paper, we propose to relax the need for such a special form of $A(z_1)$, so that we may have more flexibility in specifying the design parameters, as described in the next section. In the meantime, we also would like the critical value $c$ to be independent of $z_1$, so that the test $z > c$ is a likelihood ratio test (LRT). Other advantages are illustrated by the example in Section 3.
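To make the circular conditional error function concrete, the following sketch (assuming scipy; the function names are ours) solves the level constraint (3), with $\alpha_1 = 1 - \Phi(k)$ spent at the first stage, for $k$ given $\alpha$ and $z_{p^*}$. It reproduces two values quoted in Section 3: $k \approx 1.82$ for $z_{p^*} = 1.036$ at $\alpha = 0.05$, and the maximum $k \approx 1.95$ at $z_{p^*} = 0$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

def circular_alpha(k, z_pstar):
    """Overall type-I error (3) under the circular error function:
    1 - Phi(k) spent early, plus the averaged conditional error."""
    tail = lambda z: (1 - norm.cdf(np.sqrt(k ** 2 - z ** 2))) * norm.pdf(z)
    integral, _ = quad(tail, z_pstar, k)
    return (1 - norm.cdf(k)) + integral

def solve_k(alpha, z_pstar):
    """Find k so the circular error function attains overall level alpha."""
    return brentq(lambda k: circular_alpha(k, z_pstar) - alpha,
                  z_pstar + 1e-6, 6.0)

print(solve_k(0.05, 1.036))   # ≈ 1.82, the value used in Section 3
print(solve_k(0.05, 0.0))     # ≈ 1.95, the maximum k (h = 0)
```

This makes explicit the dependence discussed in the text: once $z_{p^*}$ is fixed, $k$ is determined by the level constraint, and vice versa.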
2.2 Modified procedure
Notice that in both the circular and linear conditional error functions, there is a $k$ such that $A(z_1) = P(z > c(z_1) \mid z_1, \delta = 0) = 1$ when $z_1 > k$, and a $p^*$ such that $P(z > c(z_1) \mid z_1, \delta = 0) = 0$ if $z_1 < z_{p^*}$. This implies that $\alpha$ in (3), the conditional type-I error averaged over all possible values of $z_1$, is actually the overall type-I error for this two-stage procedure with these error functions; see (10) and (11) below. Therefore, we may express (3) in terms of the 'α-spending' concept as used in a group sequential trial setting. For the 'circular conditional error function', (3) becomes
$$\alpha = \alpha_1 + \int_{z_{p^*}}^{k} \left( 1 - \Phi\left(\sqrt{k^2 - z_1^2}\right) \right) \phi(z_1)\,\mathrm{d}z_1. \qquad (8)$$
For the 'linear conditional error function', (3) may be written as
$$\alpha = \alpha_1 + \int_{z_{p^*}}^{k} (a + bz_1)\,\phi(z_1)\,\mathrm{d}z_1. \qquad (9)$$
Therefore, for both cases, $\alpha_1 = 1 - \Phi(k)$ is the type-I error spent at the first stage. Notice also that, with these conditional error functions, the early acceptance and rejection regions, indexed respectively by $p^*$ and $k$, are inter-dependent. Once $k$, or equivalently $\alpha_1$, is chosen, the other parameter $p^*$ is determined through the specific functional forms, or vice versa. In PH, they selected $p^*$. Here we prefer specifying $\alpha_1$, as one would do when considering the 'spending type-I error' approach. In the procedure of Bauer and Köhne (1994), $p^*$ and $k$ also determine each other, through the equation $[1 - \Phi(k)] + \exp\{-\tfrac{1}{2}\chi_4^2(1-\alpha)\} \ln \frac{p^*}{1 - \Phi(k)} = \alpha$.

However, for different situations where different design features for early decision making may be required, we often want to free the conditional type-I error function $A(z_1)$ from the specific circular or linear functional form, since we really do not have a sense about these forms beyond their mathematics, and we would like to select the early rejection and acceptance regions flexibly, as separate design parameters. We therefore propose a modification of the PH procedure. We begin by letting $h = z_{p^*}$ and following the interim decision rule that we (i) reject $H_0$ and terminate the trial if $z_1 > k$, (ii) accept $H_0$ and terminate the trial if $z_1 < h$, or (iii) extend the trial to the second stage if $h \le z_1 \le k$. At the first stage, the type-I error spent is $\alpha_1$. Thus, the type-I error at the second stage should be controlled at the $\alpha - \alpha_1$ level: i.e. we need to determine a critical region of size $\alpha - \alpha_1$ on $h \le z_1 \le k$. Conditional on $z_1$, $z$ is normally distributed. It is easy to see that the LRT is of the form $z > c$, where $c$ is a constant not dependent on $z_1$. (The proof is given in the appendix.) Thus, the overall type-I error rate is
$$\alpha = P(z_1 > k \mid \delta = 0) + P(z > c,\; h < z_1 < k \mid \delta = 0) = \alpha_1 + \int_{h}^{k} A(z_1)\,\phi(z_1)\,\mathrm{d}z_1 \qquad (10)$$
where $A(z_1)$ for $h \le z_1 \le k$ is given directly by the definition in (2), without adding any extra, special functional form to it, as was done in PH or Bauer and Köhne, and $c$ is a constant not dependent on $z_1$. Note that (8) and (9) are special cases of (10); hence the 'conditional type-I error averaged over all possible $z_1$' in (8) and (9) is actually the overall type-I error for the two-stage procedure.
Clearly, $h$ needs to be chosen such that $p^* = 1 - \Phi(h) > \alpha$. Since $p^* < 0.5$, $h > 0$. Then from (2), (3) and (10),
$$p^* - \alpha = \int_{h}^{k} \Phi\left( \frac{c\sqrt{n_1+n_2} - z_1\sqrt{n_1}}{\sqrt{n_2}} \right) \phi(z_1)\,\mathrm{d}z_1. \qquad (11)$$
For a given $\alpha$ and chosen $h$ and $k$ such that $\alpha_1 < \alpha < p^*$, we would like to solve for $c$ in (11). First, we need to express $n_2$ in terms of $z_1$ and $c$. Recall that the objective is to obtain an additional $n_2$ samples to ensure an appropriate chance of success for the trial. Toward this end, we turn to (5) for the conditional power and have
$$\frac{c\sqrt{n_1+n_2} - z_1\sqrt{n_1} - n_2\hat{\delta}/\sqrt{2}}{\sqrt{n_2}} = -z_{\beta_1}. \qquad (12)$$
In general, (12) is a messy quadratic equation in $\sqrt{n_2}$. A simpler solution which provides conditional power of at least $1 - \beta_1$ is to choose $n_2$ such that
$$\frac{c\sqrt{n_1+n_2} - z_1\sqrt{n_1} - n_2\hat{\delta}/\sqrt{2}}{\sqrt{n_2}} \le -z_{\beta_1}. \qquad (13)$$
Since $\sqrt{n_1+n_2} \ge \sqrt{n_2}$, (13) will be accomplished if $n_2$ satisfies
$$\frac{c\sqrt{n_1+n_2} - z_1\sqrt{n_1} - n_2\hat{\delta}/\sqrt{2}}{\sqrt{n_1+n_2}} = -z_{\beta_1}. \qquad (14)$$
Plugging in $\hat{\delta} = z_1\sqrt{2/n_1}$ we have
$$\sqrt{n_1+n_2} = (c + z_{\beta_1})\sqrt{n_1}/z_1 \qquad (15)$$
which leads to
$$n_2 = \left[ \left( \frac{c + z_{\beta_1}}{z_1} \right)^2 - 1 \right] n_1. \qquad (16)$$
Substituting (15) and (16) into (11) we have
$$p^* - \alpha = \int_{h}^{k} \Phi\left( \frac{c(c + z_{\beta_1}) - z_1^2}{\sqrt{(c + z_{\beta_1})^2 - z_1^2}} \right) \phi(z_1)\,\mathrm{d}z_1. \qquad (17)$$
The critical value $c$ is solved from (17) by numerical integration. Then we obtain $n_2$ from (16), for the given set of design parameters $\alpha$, $\beta_1$, $h$, and $k$. Table 1 provides examples of the critical value $c$ for corresponding $k$ (up to the maximum $k_1$; see the discussion below) and some scenarios of chosen $\alpha$, conditional power, and $h$.
Table 1. Critical value c for corresponding k (up to the maximum k_1) and chosen α, conditional power, and h

α = 0.025, conditional power > 90%:
  h = 0.675:  (k, c) = (2.000, 2.626), (2.500, 1.934), (3.000, 1.906), (3.187*, 1.906)
  h = 0.840:  (k, c) = (2.000, 2.580), (2.500, 1.890), (3.000, 1.866), (3.148*, 1.866)
  h = 1.030:  (k, c) = (2.000, 2.517), (2.500, 1.834), (3.000, 1.814), (3.095*, 1.814)
α = 0.025, conditional power > 80%:
  h = 0.675:  (k, c) = (2.000, 2.646), (2.500, 2.000), (2.831*, 1.990)
  h = 0.840:  (k, c) = (2.000, 2.604), (2.500, 1.966), (2.799*, 1.958)
  h = 1.030:  (k, c) = (2.000, 2.546), (2.500, 1.921), (2.758*, 1.916)
α = 0.025, conditional power > 60%:
  h = 0.675:  (k, c) = (2.000, 2.670), (2.385*, 2.132)
  h = 0.840:  (k, c) = (2.000, 2.631), (2.364*, 2.111)
  h = 1.030:  (k, c) = (2.000, 2.578), (2.337*, 2.083)
α = 0.05, conditional power > 90%:
  h = 0.675:  (k, c) = (2.000, 1.627), (2.500, 1.528), (2.808*, 1.527)
  h = 0.840:  (k, c) = (2.000, 1.561), (2.500, 1.473), (2.754*, 1.472)
  h = 1.030:  (k, c) = (2.000, 1.468), (2.500, 1.396), (2.677*, 1.396)
α = 0.05, conditional power > 80%:
  h = 0.675:  (k, c) = (2.000, 1.686), (2.466*, 1.624)
  h = 0.840:  (k, c) = (2.000, 1.631), (2.422*, 1.580)
  h = 1.030:  (k, c) = (2.000, 1.555), (2.361*, 1.520)
α = 0.05, conditional power > 60%:
  h = 0.675:  (k, c) = (2.000, 1.791), (2.042*, 1.789)
  h = 0.840:  (k, c) = (2.000, 1.760), (2.013*, 1.760)
  h = 1.030:  (k, c) = (1.974*, 1.721)
*k_1 = the maximum k.

The choice of the overall type-I error rate $\alpha$ is a standard one, and the conditional power $1 - \beta_1$ depends on the researcher's desire to detect $\delta = \hat{\delta}$ at the later stage, whatever the random interim result $z_1$ may turn out to be. The choice of $h$ determines the 'acceptance region', which is
equivalent to the $p$-value specification $p^*$ in the PH procedure. The choice of $k$ determines $\alpha_1$, the type-I error spent at the interim stage. For fixed $\alpha$ and $h$, if we spend less $\alpha_1$ by specifying a larger $k$, i.e. a larger 'continuation region', then we may relax the critical value $c$ at the final stage to have a larger 'rejection region' (see also Table 1). Note, however, that (17) indicates that there is a supremum for $k$ for given $\alpha$ and $h$. Since the quantity under the square root has to be positive, the upper bound of the integral, $k$, has to satisfy $(c + z_{\beta_1})^2 - k^2 > 0$; that is, $k < c + z_{\beta_1}$. Let $k_1 = c + z_{\beta_1}$ denote this supremum. In solving for $c$ in (17), the upper bound of the integral is therefore taken as $\min(k, k_1)$. Note also that (16) indicates that $n_2$ increases with an increasingly stringent $c$, and that additional samples $n_2 > 0$ are needed only when $z_1 < k_1$. If $z_1$ exceeds $k_1$, the null hypothesis is already rejected at the interim analysis, and there is no need for additional samples.
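The computation of $c$ from (17) and $n_2$ from (16) can be sketched as follows (assuming scipy; the self-consistent upper bound $\min(k, k_1)$ with $k_1 = c + z_{\beta_1}$ reflects the discussion above, and the function names are ours). The test values reproduce the Section 3 example: $c \approx 1.70$ for $(h, k) = (1.036, 1.82)$ and $c \approx 1.52$ for $(1.036, 2.36)$, with $\alpha = 0.05$ and conditional power 80%:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

def solve_c(alpha, beta1, h, k):
    """Solve (17) for the final-stage critical value c.  The integrand is
    defined only for z1 < k1 = c + z_{beta1}, so the upper bound of the
    integral is taken as min(k, k1), self-consistently in c."""
    z_b = norm.ppf(1 - beta1)
    p_star = 1 - norm.cdf(h)

    def gap(c):
        ub = min(k, c + z_b)
        f = lambda z: norm.cdf((c * (c + z_b) - z ** 2)
                               / np.sqrt((c + z_b) ** 2 - z ** 2)) * norm.pdf(z)
        return quad(f, h, ub)[0] - (p_star - alpha)

    # lower bracket keeps k1 = c + z_b just above h so the integral is defined
    return brentq(gap, h - z_b + 0.01, 6.0)

def n2_required(c, beta1, z1, n1):
    """Additional per-group sample size from (16), rounded up."""
    z_b = norm.ppf(1 - beta1)
    return int(np.ceil((((c + z_b) / z1) ** 2 - 1) * n1))

c1 = solve_c(alpha=0.05, beta1=0.20, h=1.036, k=1.82)   # Table 2, top row
c2 = solve_c(alpha=0.05, beta1=0.20, h=1.036, k=2.36)   # Table 2, bottom row
print(c1, n2_required(c1, 0.20, z1=1.136, n1=58))        # ≈ 1.70, ≈ 233
print(c2, n2_required(c2, 0.20, z1=1.136, n1=58))        # ≈ 1.52, ≈ 193
```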
It is important to note that, in contrast to the PH procedure (see (7)), the solution of $c$ from (17) depends only on the design parameters $\alpha$, $h$, $k$ and $\beta_1$, not on the random outcome $z_1$, since $z_1$ has been integrated out in (17). (Thus we can compare $z_1$ to $k_1 = c + z_{\beta_1}$ as discussed above.) Theoretically speaking, our test '$z > c$' at the final stage, where $z = [n_1(\bar{x}_1 - \bar{y}_1) + n_2(\bar{x}_2 - \bar{y}_2)]/[\hat{\sigma}\sqrt{2(n_1+n_2)}]$ and $c$ is a constant, is an LRT (see the appendix), but the PH test '$z > c(z_1)$' is not. For the PH procedure, $c$ depends on $z_1$ unless the conditional power is 0.5 and the circular conditional error function is used. In practice, the non-dependence of $c$ on $z_1$ also has strong appeal, especially for trials conducted by pharmaceutical companies, since regulatory agencies usually prefer rules pre-specified in the protocol, such as the critical values.
Another practical consideration is to set an upper bound to limit the sample size of the second stage. As PH have pointed out, if $n_2$ is too large, it may be more economical to start a new study based on the interim findings than to extend the current study. Therefore, in practice the choice of $h$ should not be too close to zero (or, equivalently, $p^*$ not too close to 0.5). Bounding $h$ away from zero avoids the situation where $z_1$ (or $\hat{\delta}$) is close to 0 but still falls in the 'continuation region', in which case $n_2$ will blow up; see (6) and (16) for the PH and the proposed procedures, respectively. More on sample size comparison is given in the next section by an example.
3. EXAMPLE
The NHLBI type II coronary intervention study was designed to examine the hypothesis that lowering the plasma concentration of cholesterol by diet and drug therapy reduces the rate of progression of coronary artery disease (CAD). Patients with type-II hyperlipoproteinemia and CAD were randomized to cholestyramine treatment or placebo. For details of the study see Brensike et al. (1982, 1984). For various reasons, the sample size was smaller than planned when the data were analysed. The question of how to extend the study in terms of adjusting the sample size was also previously considered by PH.

The observed definite progression rates in the two groups were 20/57 (placebo) and 15/59 (cholestyramine). For simplicity, let $n_1 = 58$. As in PH, the variance-stabilizing arcsine transformation was applied to put the example in the context of a test of normal means with known variance. Table 2 contrasts the results of the PH procedure with those of the proposed procedure for $\alpha = 0.05$ and $1 - \beta_1 = 0.80$. The following points highlight the essential comparisons.
(i) The top row illustrates that the proposed procedure yields a similar $n_2$ to the PH procedure, under the same $\alpha$, $n_1$, $h$ and $k$, without using the special circular error function. (The slightly larger sample size is due to the slightly higher conditional power achieved by the proposed procedure via (13).) The critical value $c$ is also similar, but the $c$ from PH depends on the interim result $z_1$ whereas the $c$ from the proposed procedure does not.
Table 2. NHLBI type II coronary intervention study example; Brensike et al. (1982, 1984). α = 0.05, n_1 = 58

Row 1: h = 1.036 (p* = 0.15), k = 1.82 (α_1 = 0.0344)
  PH procedure, circular error function:  CP = 80%,  c(z_1)† = 1.78,  n_2(z_1)† = 226 (n_2/n_1 = 3.90)
  Proposed procedure:                     CP = 83%,  c = 1.70,        n_2(z_1)† = 233 (n_2/n_1 = 4.02)
Row 2: h = 1.036 (p* = 0.15), k = 2.36 (α_1 = 0.0091)
  PH procedure, circular error function:  CP = 80%,  c(z_1)† = —,     n_2(z_1)† = —
  Proposed procedure:                     CP = 83%,  c = 1.52,        n_2(z_1)† = 193 (n_2/n_1 = 3.33)

— Not available, since this (h, k) is not an applicable configuration for the PH procedure.
† z_1 = 1.136
(ii) In the PH procedure, $h$ and $k$ are linked: once $h$ is chosen, the value of $k$ is determined by the special conditional type-I error function. In this example, using the 'circular error function' with $\alpha = 0.05$, $k$ equals 1.82 for the corresponding $h = 1.036$ (i.e. $p^* = 0.15$). The limitation on design flexibility imposed by this dependence in the PH procedure, compared with the proposed procedure, is illustrated in the bottom row of Table 2.

Since $k$ determines $\alpha_1$, the type-I error spent at the interim stage, having $k = 1.82$ is equivalent to having $\alpha_1 = 0.0344$. Suppose that, instead of 0.0344, we would like to spend a much smaller $\alpha_1$ at the interim stage to make early stopping more difficult (as would be appropriate for an O'Brien–Fleming type of group sequential boundary), for example $\alpha_1 = 0.0091$; then we would choose $k = 2.36$. This design may be desirable especially in this situation, where the sample size per group at the interim stage is not very large. Using our proposed procedure, this much smaller $\alpha_1$ at the interim stage would leave more type-I error for the final stage, and consequently would lower the critical value $c$ (to 1.52) and reduce the additional sample size required (to 193 per group). Note that, for the proposed procedure, there is no need to alter the specification of $h = 1.036$ ($p^* = 0.15$) to achieve this, since, as indicated previously, $h$ and $k$ are chosen independently. For the PH procedure, however, the special form of the conditional error function determines $k$ for a given $h$, so one would have to change $h$. But using the circular error function, the maximum that $k$ can reach is 1.95, attained when choosing $h = 0$ ($p^* = 0.5$). Since $k = 1.95$ gives $\alpha_1 = 0.026$, the smallest type-I error that the PH procedure with the circular error function can spend at the interim stage is 0.026, far more than the 0.0091 level we wanted. This exclusion of O'Brien–Fleming type boundaries is a serious limitation of the PH procedure with the circular error function.
(iii) Another interesting observation about the circular conditional error function can be made from the preceding discussion. A larger $k$ (1.95 versus 1.82) implies less $\alpha_1$ (0.026 versus 0.034) spent at the interim stage. In a usual group sequential setting, one would expect the final critical region to be larger (i.e. $c$ smaller), because more $\alpha$ is saved for the final stage. However, for the circular conditional error function approach, $c$ increases with $k$: for example, $c = k$ when CP = 0.5, and $c = 1.92$ for $k = 1.95$ versus $c = 1.78$ for $k = 1.82$ when CP = 0.8 and $z_1 = 1.136$. With the proposed procedure, however, there is no such confusion (as seen in Table 1), since the design parameters $h$ and $k$ are independently specified.
It should be noted that the conditional power (for example, 80% in the above example) was set in PH after observing $z_1 = 1.136$ and $\hat{\delta} = 0.21$, while for the proposed approach CP = 80% was decided in advance at the design stage. For the PH procedure, the conditional power is data-adaptive. For the proposed procedure so far, the conditional power is a constant, representing the desire of the researcher to have a chance to detect the difference to be observed at the interim stage, whatever it may turn out to be, by re-sizing the trial and giving the proper critical value at the end. We can also let CP be data-adaptive (see the next section).

Fig. 1. A comparison of the additional sample size in the example where $z_1 = 1.136$, $\alpha = 0.05$, $n_1 = 58$, $k = 1.82$ and $h = 1.036$: sample size required for a fixed sample size design, the proposed procedure, and the PH procedure with CP = 80% and with the CP of the proposed procedure.
From a practitioner's point of view, some might criticize both procedures in this example for requiring approximately four times the first-stage sample size to continue, under the CP requirement of at least 80% and given $n_1 = 58$ and $z_1 = 1.136$ ($\hat{\delta} = 0.21$). On the other hand, one might appreciate the warning that such a large sample size is actually needed for the study to be conclusive. Furthermore, as PH pointed out, an even larger sample size would be required to start a new trial with a fixed sample size design than to continue the two-stage design. Figure 1 gives a general comparison of the additional sample size ($n_2$) needed for detecting various $\delta$ by the fixed sample size design, the proposed procedure, and the PH procedure with 80% CP or the CP reached by the proposed procedure.
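As an independent sanity check on the design in the bottom row of Table 2 ($h = 1.036$, $k = 2.36$, $c = 1.52$, conditional power 80%, $n_1 = 58$), the overall type-I error can be simulated under the null hypothesis. This Monte Carlo scheme is our illustration, not part of the paper:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n1, h, k, c, beta1 = 58, 1.036, 2.36, 1.52, 0.20
z_b = norm.ppf(1 - beta1)

n_sim = 400_000
z1 = rng.normal(size=n_sim)                  # first-stage statistic under H0
reject_early = z1 > k                        # early rejection, spends alpha_1
cont = (z1 >= h) & (z1 <= k)                 # continuation region
zc = z1[cont]
n2 = (((c + z_b) / zc) ** 2 - 1) * n1        # adaptive n2 from (16)
z2 = rng.normal(size=zc.size)                # second-stage statistic under H0
z = (np.sqrt(n1) * zc + np.sqrt(n2) * z2) / np.sqrt(n1 + n2)  # final statistic
type1 = (np.sum(reject_early) + np.sum(z > c)) / n_sim
print(type1)   # ≈ 0.05
```

Even though $n_2$ depends on the interim outcome, the simulated overall rejection rate stays at the nominal $\alpha = 0.05$, as (10) guarantees.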
4. DISCUSSION
We have proposed a modified procedure, based on the PH conditional power approach, for adjusting the sample size using interim data of the current study. The procedure also requires breaking the treatment codes at the interim stage, and hence may need an independent data monitoring board to conduct it. The basic modification is to free the conditional type-I error function from the special mathematical forms used in PH, so as to untie the dependence of $k$ and $h$ (corresponding to the early rejection and acceptance regions, respectively) and to make the critical value of the final test independent of $z_1$. This modification leads to a few advantages, such as closely imitating the familiar α-spending approach,
the final-stage test being an LRT, and a more natural and simpler way to set up the design without appealing to a rather artificial function for the conditional type-I error, as illustrated in the example.
Note that (16) defines a relation between $n_2$ and $z_1$. In the simple form of the proposed procedure, $z_{\beta_1}$ is a constant, independent of the interim data. This may be extended to be data-adaptive by specifying a conditional power function $\beta_1(z_1)$ of $z_1$ at the design stage, hence $z_{\beta_1} = \Phi^{-1}(1 - \beta_1(z_1))$. Then (17) for $c$ and (16) for $n_2$ may be modified accordingly. This conditional power function $\beta_1(z_1)$ may well be a simple step function, so that solving for $c$ in (17) by integrating over $z_1$ remains an easy task.
The critical value for the final-stage $z$ statistic, $c$ in our procedure as well as $c(z_1)$ in the PH procedure, is defined in correspondence with the additional sample size $n_2(z_1)$ in such a way that the overall type-I error is controlled. In practice, however, for administrative reasons the observed sample size at the second stage may turn out to differ from $n_2(z_1)$. We caution that in the calculation of $z = [n_1(\bar{x}_1 - \bar{y}_1) + n_2(\bar{x}_2 - \bar{y}_2)]/[\hat{\sigma}\sqrt{2(n_1+n_2)}]$, the planned value of $n_2(z_1)$ should be used instead of the observed one.
In contrast with the 'blinded re-estimation' of Gould and Shih (1992), all sample size adjustment procedures that mix with group sequential designs, including Cui et al. (1999), PH, and the proposed procedure, assume a stable variance. This assumption is inherited from the group sequential method and may not be realistic in certain situations. Future research is needed to re-estimate both the variance and the treatment difference and to adjust the sample size using the standardized effect size.

Finally, like the PH method, the proposed procedure applies to many testing situations, including comparing means, proportions, or survival distributions by the logrank test under the proportional hazards assumption. The various sample size re-estimation strategies have provided much flexibility for designing clinical trials. Nevertheless, a prudent researcher should always plan ahead, carefully choose appropriate design parameters, and specify stopping rules, since altering the sample size in the mid-course of a study is not a simple matter administratively (see, for example, Shih, 1992).
ACKNOWLEDGEMENT
The work of Dr Shih was supported by NCI grant 2P30 CA 72720-04.
APPENDIX. The likelihood ratio test
By our definition in Section 2, the statistic $Z_1$ is normally distributed with variance 1 and mean $\delta\sqrt{n_1/2}$. Let $Z_2 = \sqrt{n_2(Z_1)/2}\,(\bar{X}_2 - \bar{Y}_2)/\sigma$. Conditional on $Z_1$, the statistic $Z_2$ is normally distributed with variance 1 and mean $\delta\sqrt{n_2(z_1)/2}$. The joint probability density function of $(Z_1, Z_2)$ is
$$\frac{1}{2\pi} \exp\left[ -\frac{(z_1^2 + z_2^2) - 2\delta\left(\sqrt{n_1/2}\,z_1 + \sqrt{n_2(z_1)/2}\,z_2\right) + \delta^2\left(n_1 + n_2(z_1)\right)/2}{2} \right].$$
The maximum likelihood estimate of $\delta$ is
$$\frac{\sqrt{2n_1}\,Z_1 + \sqrt{2n_2(Z_1)}\,Z_2}{n_1 + n_2(Z_1)}.$$
Under the null hypothesis $\delta = 0$, $Z_1$ and $Z_2$ are independent and identically distributed. The LRT for the hypothesis is easily derived as
$$\frac{\sqrt{n_1}\,Z_1 + \sqrt{n_2(Z_1)}\,Z_2}{\sqrt{n_1 + n_2(Z_1)}} > c$$
for some constant $c$.
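The closed-form maximum likelihood estimate above can be checked numerically; a minimal sketch (with illustrative test values) compares it against a direct scalar minimization of the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Negative log-likelihood of (Z1, Z2) in delta (up to additive constants),
# with n2 fixed at the value determined by the observed z1.
n1, n2 = 58.0, 193.0     # illustrative stage sizes
z1, z2 = 1.2, 0.7        # illustrative observed statistics

def negloglik(delta):
    m1 = delta * np.sqrt(n1 / 2.0)   # mean of Z1
    m2 = delta * np.sqrt(n2 / 2.0)   # mean of Z2 given Z1
    return 0.5 * ((z1 - m1) ** 2 + (z2 - m2) ** 2)

closed_form = (np.sqrt(2 * n1) * z1 + np.sqrt(2 * n2) * z2) / (n1 + n2)
numeric = minimize_scalar(negloglik).x
print(closed_form, numeric)   # the two estimates agree
```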
REFERENCES

BAUER, P. AND KÖHNE, K. (1994). Evaluations of experiments with adaptive interim analyses. Biometrics 50, 1029–1041.

BIGGER JR, J. T., PARIDES, M. K., ROLNITZKY, L. M., MEIER, P., LEVIN, B. AND EGAN, D. A. (1998). Changes in sample size and length of follow-up to maintain power in the coronary artery bypass graft (CABG) patch trial. Controlled Clinical Trials 19, 1–14.

BRENSIKE, J. F. et al. (1982). NHLBI type II coronary intervention study: design, methods and baseline characteristics. Controlled Clinical Trials 3, 91.

BRENSIKE, J. F. et al. (1984). Effect of therapy with cholestyramine on progression of coronary arteriosclerosis: results of the NHLBI type II coronary intervention study. Circulation 69, 313–324.

CUI, L., HUNG, H. M. AND WANG, S. J. (1999). Modification of sample size in group sequential clinical trials. Biometrics 55, 853–857.

DAVIS, B. R., WITTES, J., BERGE, K. G., HAWKINS, C. M., LAKATOS, E., MOYE, L. A. AND PROBSTFIELD, J. L. (1993). Statistical considerations in monitoring the systolic hypertension in the elderly program (SHEP). Controlled Clinical Trials 14, 350–361.

GOULD, A. L. AND SHIH, W. J. (1992). Sample size re-estimation without unblinding for normally distributed outcomes with unknown variance. Communications in Statistics, Theory and Methods 21, 2833–2853.

GOULD, A. L. AND SHIH, W. J. (1998). Modifying the design of ongoing trials without unblinding. Statistics in Medicine 17, 89–100.

GREEN, S. J. AND DAHLBERG, S. (1992). Planned versus attained design in phase II clinical trials. Statistics in Medicine 11, 853–862.

PROSCHAN, M. A. AND HUNSBERGER, S. A. (1995). Designed extension of studies based on conditional power. Biometrics 51, 1315–1324.

SHIH, W. J. (1992). Sample size reestimation in clinical trials. In Peace, K. (ed.), Biopharmaceutical Sequential Statistical Applications. New York: Dekker, pp. 285–301.

SHIH, W. J. AND GOULD, A. L. (1995). Re-evaluating design specifications of longitudinal clinical trials without unblinding when the key response is rate of change. Statistics in Medicine 14, 2239–2248.

SHIH, W. J. AND LONG, J. (1998). Blinded sample size re-estimation with unequal variances and center effects in clinical trials. Communications in Statistics, Theory and Methods 27, 395–408.

SHIH, W. J. (2001). Sample size re-estimation—journey for a decade. Statistics in Medicine 20, 515–518.

WASSMER, G. (1998). A comparison of two methods for adaptive interim analyses in clinical trials. Biometrics 54, 696–705.

WHITEHEAD, J., WHITEHEAD, A., TODD, S., BOLLAND, K. AND SOORIYARACHCHI, M. R. (2001). Mid-trial design reviews for sequential clinical trials. Statistics in Medicine 20, 165–176.

WITTES, J. AND BRITTAIN, E. (1990). The role of internal pilot studies in increasing the efficiency of clinical trials. Statistics in Medicine 9, 65–72.

WITTES, J., SCHABENBERGER, O., ZUCKER, D., BRITTAIN, E. AND PROSCHAN, M. (1999). Internal pilot studies I: type I error rate of the naïve t-test. Statistics in Medicine 18, 3481–3491.

ZUCKER, D., WITTES, J., SCHABENBERGER, O. AND BRITTAIN, E. (1999). Internal pilot studies II: comparison of various procedures. Statistics in Medicine 18, 3493–3509.
[Received 18 January, 2001; first revision 14 May, 2001; second revision 28 June, 2001;
accepted for publication 28 June, 2001]