Biostatistics (2002), 3, 2, pp. 277–287 Printed in Great Britain A sample size adjustment procedure for clinical trials based on conditional power GANG LI† Organon Inc., 375 Mt Pleasant Ave., West Orange, NJ 07052, USA [email protected] WEICHUNG J. SHIH University of Medicine and Dentistry of New Jersey, New Brunswick, NJ 08901, USA [email protected] TAILIANG XIE Whitehall-Robins Healthcare, Madison, NJ 07940, USA JIANG LU University of the Witwatersrand, PO WITS, 2050, Johannesburg, South Africa S UMMARY When designing clinical trials, researchers often encounter the uncertainty in the treatment effect or variability assumptions. Hence the sample size calculation at the planning stage of a clinical trial may also be questionable. Adjustment of the sample size during the mid-course of a clinical trial has become a popular strategy lately. In this paper we propose a procedure for calculating additional sample size needed based on conditional power, and adjusting the final-stage critical value to protect the overall type-I error rate. Compared to other previous procedures, the proposed procedure uses the definition of the conditional type-I error directly without appealing to an extra special function for it. It has better flexibility in setting up interim decision rules and the final-stage test is a likelihood ratio test. Keywords: Conditional power; Interim analysis; Sample size re-estimation; Two-stage design. 1. I NTRODUCTION Sample size calculation plays a crucial role in the design of clinical trials. Undersized trials may not provide statistically significant evidence of the efficacy. On the other hand, an unnecessarily large sample size wastes resources and may make a trial administratively difficult. Poor sample size calculation often stems from poor parameter estimates due to limited information prior to the start of the trial. One hedge against this is a mid-trial correction of sample size, using the interim information collected from the trial (see, for example, Shih, 1992; Whitehead et al., 2001). The sample size adjustment strategy has become increasingly popular in pharmaceutical companies as they attempt to reduce cost, yet accelerate the completion, of trials that will form the basis for regulatory approval of the medical product. Other † To whom correspondence should be addressed c Oxford University Press (2002) 278 G. L I ET AL. examples can also be found in a general situation where some expected or unexpected event happened during the conduct of a study and an adjustment of sample size is needed to maintain the study power (for example, Bigger et al., 1998). As an illustration, we consider the type II coronary intervention study conducted by the National Heart, Lung and Blood Institute (NHLBI). This study was to assess the effects of cholestyramine on the progression of coronary arteriosclerosis. However, the recruitment was slow and the sample size was smaller than planned (Brensike et al., 1982, 1984). If we were to extend the study based on the data, what would be our adjustment strategy? We consider the data from this trial in Section 3 below. Several approaches of the sample size adjustment strategy have been developed in the literature in the framework of different two-stage sampling schemes. Some have proposed adjustment upward only by updating the variance estimate either in terms of ‘internal pilot study’ (Wittes and Brittain, 1990; Wittes et al., 1999; Zucker et al., 1999), which opens the treatment codes, or in terms of ‘blind re-estimation’, which does not open the treatment codes (Gould and Shih, 1992, 1998; Shih and Gould, 1995; Shih and Long, 1998). Others have proposed adjustment upward or downward by updating the treatment difference estimate and assuming a stable variance (Bauer and Kohne, 1994; Proschan and Hunsbarger, 1995; Davis et al., 1993; Green and Dahlberg, 1992; Wassmer, 1998; Cui et al., 1999). The latter always opens the treatment codes and has the flavor of mixing the group sequential design for early stopping with the sample size re-estimation strategy that aims at keeping the power of the study. Shih (2001) gave a recent review. In this paper, we modify the procedure of Proschan and Hunsbarger (1995) (PH), which uses the conditional power calculated at the first stage to determine the final sample size. The modified procedure relaxes the specific form of the conditional type-I error function such as the circular or linear function given by PH, and consequently has the advantage that the critical value for the final stage depends only on the design parameters, not on the random outcome of the first stage. In Section 2 we first formulate the problem following PH, then describe our procedure and contrast the differences between the two. In Section 3 we demonstrate our procedure in comparison with PH with the data from the NHLBI type II coronary intervention study. More discussion is given in Section 4. 2. T HE METHOD We first briefly review PH’s method and make remarks to introduce the proposed modifications. 2.1 Review of Proschan and Hunsbarger’s procedure Assume that the observations are normally distributed with means µx and µ y respectively for the two groups and a common variance σ 2 . At the first stage, let x¯1 and y¯1 be the corresponding sample means of the two groups from n 1 patients in each group and σˆ 12 be the pooled estimate of σ 2 . Denote the standardized difference by δ = (µx − µ y )/σ and δˆ = (x¯1 − y¯1 )/σˆ 1 . The goal is to determine an additional n 2 number of patients per group and a critical value c such that the test has an overall type-I error rate of α and that the conditional power for detecting δ = δˆ at the final stage given the interim result is 1 − β1 , where α and β1 are pre-specified constants. 2 ( x¯2 − y¯2 ) At the end of trial, the final test statistic is z = n 1 (x¯1 −σˆy¯√1 )+n based on n = n 1 + n 2 patients n +n 1 2 per group, where σˆ 2 is the pooled estimate of σ 2 . Following PH, we assume √ that the variance estimate is ˆ 2/n 1 ∼ N (δ, 1) and that the stable at the interim and final stages, i.e. σˆ 2 ≈ σˆ 12 ≈ σ 2 , so that z 1 = δ/ Conditional power for sample size adjustment 279 conditional probability CPδ (n 2 , c|z 1 ) ≡ P(z > c|z 1 , δ) √ √ n 2 (x¯2 − y¯2 ) − n 2 δσ c 2(n 1 + n 2 ) − z 1 2n 1 − n 2 δ =P > √ √ σˆ 2n 2 2n 2 √ √ √ c n 1 + n 2 − z 1 n 1 − n 2 δ/ 2 =1− √ n2 (1) where c = c(n 2 , z 1 ) and is the cumulative distribution function of the standard normal. Under the null hypothesis that µx = µ y , CP0 (n 2 , c|z 1 ) = P(z > c|z 1 , δ = 0) ≡ A(z 1 ). It follows that √ √ c n1 + n2 − z1 n1 A(z 1 ) = 1 − . (2) √ n2 A(z 1 ) is the so-called ‘conditional type-I error function’ in PH, which dictates how much conditional type-I error rate is allowed at the end of the study, given z 1 . The major difference between our method and that of PH lies on the role that A(z 1 ) plays in the determination of n 2 and c, and on whether c depends on z 1 or not. This will be explained in detail in the sequel. The two unknowns, n 2 and c, are solved by satisfying the two conditions indicated previously. In PH, c depends on z 1 and will be denoted as c(z 1 ). First, the test needs to maintain ∞ ∞ α= A(z 1 )φ(z 1 ) dz 1 = P(z > c(z 1 )|z 1 , δ = 0)φ(z 1 ) dz 1 (3) −∞ −∞ where φ is the density function of the standard normal. The specific form of A(z 1 ), which we will discuss later, will justify that α in (3), the conditional type-I error averaged overall possible values z 1 , is the overall type-I error for this two-stage procedure with the special form of A(z 1 ); see (10) and (11) in the next section. Here we use a one-sided test, as in PH. For a two-sided test, the conditional error function should be symmetric with respect to z 1 = 0, and we simply use ∞ α/2 = A(z 1 )φ(z 1 ) dz 1 . (4) 0 Second, the conditional power for detecting δ = δˆ needs to be 1 − β1 . Following again the same notation as in PH, let Z A(z 1 ) = −1 (1 − A(z 1 )). Then 1 − β1 = 1 − Z A(z 1 ) − δˆ n 2 /2 (5) which leads to n2 = (Z A(z 1 ) + z β1 )2 2(Z A(z 1 ) + z β1 )2 = n1 z 12 δˆ (6) where z β1 = −1 (1 − β1 ). Plugging n 2 into (2) yields c(z 1 ) = z 12 + (Z A(z 1 ) + z β1 )Z A(z 1 ) . z 12 + (Z A(z 1 ) + z β1 )2 (7) Since both (3) and (5) involve A(z 1 ), the conditional type-I error function, it is a key step in PH to decide what kind of function to use. PH mentioned that there are many choices. They suggested two 280 G. L I ET AL. ∗ classes √ of functions: the so-called ‘circular conditional error function’ (Acir (z) = 0 if z < z p ; = 1 − 2 2 ( k − z ) if z p∗ < z < k; and = 1 if z > k) and the ‘linear conditional error function’ (Al (z) = 0 if z < z p∗ ; =(a+bz) if z p∗ < z < k; and = 1 if z > k), where p ∗ = 1−(z p∗ ) 0.5 and k (for Acir ) and a, b, and k (for Al ) are design constants to be determined by (3) for a given α. These functions are named obviously by their special mathematical form, but do not seem to provide more motivation beyond that. In this paper, we propose to relax the need for such a special function of A(z 1 ) so that we may have more flexibility in specifying the design parameters, as will be described in the next section. In the meantime, we also would like the critical value c to be independent of z 1 so that the test z > c is a likelihood ratio test (LRT). Other advantages are also illustrated by the example in Section 3. See the sequel for details. 2.2 Modified procedure Notice that in both the circular and linear conditional error functions, there is a k such that A(z 1 ) = P(z > c(z 1 )|z 1 , δ = 0) = 1 when z 1 > k, and p ∗ such that P(z > c(z 1 )|z 1 , δ = 0) = 0 if z 1 < z p∗ . This implies that α in (3), the conditional type-I error averaged over all possible values z 1 , is actually the overall type-I error for this two-stage procedure with these error functions; see (10) and (11) below. Therefore, we may express (3) in terms of the ‘α-spending’ concept as used in a group sequential trial setting. For the ‘circular conditional error function’, (3) becomes k α = α1 + 1− k 2 − z 12 φ(z 1 ) dz 1 . (8) z p∗ For the ‘linear conditional error function’, (3) may be written as k α = α1 + (a + bz 1 )φ(z 1 ) dz 1 . (9) z p∗ Therefore, for both cases, α1 = 1 − (k) is the type-I error spent at the first stage. Notice also that, with these conditional error functions, the early acceptance and rejection regions indexed respectively by p ∗ and k are inter-dependent. Once k, or equivalently α1 , is chosen, the other parameter p ∗ is determined through the specific functional forms, or vice versa. In PH, they selected p ∗ . Here we prefer specifying α1 , as one would do when considering ‘spending type-I error’ approach. In the procedure of Bauer and Kohne (1994), p ∗ and k also determine each other through the following equation: [1 − (k)] exp[−1/2χ42 (1 − p∗ α)] ln [1−(k)] = α. However, for different situations where different design features for early decision making may be required, we often want to free the conditional type-I error function A(z 1 ) from the specific circular or linear functional form, since we really do not have a sense about them except the mathematics, and we would like to flexibly select the early rejection and acceptance regions as separate design parameters. We therefore propose to consider a modification of the PH procedure. We shall begin by letting h = z p∗ and following the interim decision rule that we (i) reject H0 and terminate the trial if z 1 > k, (ii) accept H0 and terminate the trial if z 1 < h, or (iii) extend the trial to the second stage if h z 1 k. At the first stage, the type-I error spent is α1 . Thus, the type-I error at the second stage should be controlled at α − α1 level: i.e. we need to determine a critical region of size α − α1 on h z 1 k. Conditional on z 1 , z is normally distributed. It is easy to see that the LRT is of the form z > c, where c is a constant not dependent on z 1 . (The proof of LRT is given in the appendix.) Thus, the overall type-I error rate is α = P(z 1 > k|δ = 0) + P(z > c, h < z 1 < k|δ = 0) k = α1 + A(z 1 )φ(z 1 ) dz 1 h (10) Conditional power for sample size adjustment 281 where A(z 1 ) for h z 1 k is given directly by definition in (2) without adding any extra, special functional form to it, as was in PH or Bauer and Kohne, and c is a constant not dependent on z 1 . Note that (8) and (9) are special case of (10), hence the ‘conditional type-I error averaged over all possible z 1 ’ in (8) and (9) is actually the overall type-I error for the two-stage procedure. Clearly, h needs to be chosen such that p ∗ = 1 − (h) > α. Since p ∗ < 0.5, h > 0. Then from (2), (3) and (10) √ k √ c n1 + n2 − z1 n1 p∗ − α = φ(z 1 ) dz 1 . (11) √ n2 h For a given α and chosen h and k such that α1 < α < p ∗ , we would like to solve for c in (11). First, we need to express n 2 in terms of z 1 and c. Recall that the objective is to obtain additional n 2 samples to ensure an appropriate chance of success for the trial. Toward this end, we turn to (5) for the conditional power and have √ √ √ ˆ 2 c n 1 + n 2 − z 1 n 1 − n 2 δ/ = −z β1 . (12) √ n2 √ In general, (12) is a messy quadratic equation of n 2 . A simpler solution which provides conditional power at least 1 − β1 is to choose n 2 such that √ √ √ ˆ 2 c n 1 + n 2 − z 1 n 1 − n 2 δ/ −z β1 . (13) √ n2 √ √ Since n 1 + n 2 n 2 , (13) will be accomplished if n 2 satisfies √ √ √ ˆ 2 c n 1 + n 2 − z 1 n 1 − n 2 δ/ = −z β1 . (14) √ n1 + n2 √ Plugging in δˆ = z 1 2/n 1 we have √ √ n 1 + n 2 = (c + z β1 ) n 1 /z 1 (15) which leads to n2 = c + z β1 √ z1 2 − 1 n1. (16) Substituting (15) and (16) into (11) we have p∗ − α = c(c + z β1 ) − z 12 φ(z 1 ) dz 1 . (c + z β1 )2 − z 12 k h (17) The critical value c is solved in (17) by numerical integration methods. Then we obtain n 2 by (16), for the given set of design parameters α, β1 , h, and k. Table 1 provides examples of the critical value for corresponding k (up to the maximum k1 ; see immediate discussion below) and some scenarios of chosen α, conditional power, and h. The choice of the overall type-I error rate α is a standard one, and the conditional power 1 − β1 depends on the researcher’s desire of detecting δ = δˆ at the later stage given whatever the random interim 282 G. L I ET AL. Table 1. Critical value c for corresponding k (up to the maximum k1 ) and chosen α, conditional power, and h α-level Conditional power (%) h 0.025 >90 0.675 0.840 1.030 >80 0.675 0.840 1.030 >60 0.675 0.840 1.030 0.05 >90 0.675 0.840 1.030 >80 0.675 0.840 1.030 >60 0.675 0.840 1.030 *k1 = the maximum k. k c k c k c 2.000 2.626 2.000 2.580 2.000 2.517 2.500 1.934 2.500 1.890 2.500 1.834 3.000 1.906 3.000 1.866 3.000 1.814 k c k c k c 2.000 2.646 2.000 2.604 2.000 2.546 2.500 2.000 2.500 1.966 2.500 1.921 2.831* 1.990 2.799* 1.958 2.758* 1.916 k c k c k c 2.000 2.670 2.000 2.631 2.000 2.578 2.385* 2.132 2.364* 2.111 2.337* 2.083 k c k c k c 2.000 1.627 2.000 1.561 2.000 1.468 2.500 1.528 2.500 1.473 2.500 1.396 k c k c k c 2.000 1.686 2.000 1.631 2.000 1.555 2.466* 1.624 2.422* 1.580 2.361* 1.520 k c k c k c 2.000 1.791 2.000 1.760 1.974* 1.721 2.042* 1.789 2.013* 1.760 2.808* 1.527 2.754* 1.472 2.677* 1.396 3.187* 1.906 3.148* 1.866 3.095* 1.814 Conditional power for sample size adjustment 283 result z 1 may realize at the interim stage. The choice of h determines the ‘acceptance region’, which is equivalent to the p-value specification p ∗ in the PH procedure. The choice of k determines α1 , the type-I error spent at the interim stage. For a fixed α and h, if we spend less α1 by specifying a larger k, i.e. a larger ‘continuation region’, then we may relax the critical value c at the final stage to have a larger ‘rejection region’ (also see Table 1). But note that (17) indicates that there is a supremum for k for given α and h. Since the quantity in the square root has to be positive, the upper bound of the integral, k, has to satisfy (c + z β1 )2 − k 2 > 0. That is, k < c + z β1 . Let k1 = c + z β1 to denote the supremum for k. We actually solve c in (17) by replacing k with k1 for the upper bound of the integral. Note also that (16) indicates that n 2 increases with increasingly stringent c, and that the additional samples of n 2 > 0 are needed only when z 1 < k1 . If z 1 exceeds k1 , which means the null hypothesis is already rejected at the interim analysis, there is no need for the additional samples. It is important to note that, contrasting to that given in the PH procedure (see (7)), the solution of c from (17) (with k replaced by k1 ) depends only on the design parameters α, h, k and β1 , not on the random outcome z 1 , since z 1 has been integrated out in (17). (Thus we can compare z 1 to k1 = c + z β1 as 2 ( x¯2 − y¯2 ) discussed above.) Theoretically speaking, our test ‘z > c’ at the final stage, where z = n 1 (x¯1 −σˆy¯√1 )+n n 1 +n 2 and c is a constant, is a LRT (see the appendix), but the PH test ‘z > c(z 1 )’ is not. For the PH procedure, c depends on z 1 unless the conditional power is 0.5 and the circular conditional error function is used. In practice, the non-dependence of c on z 1 also has a strong appeal especially to trials conducted by the pharmaceutical companies, since regulatory agencies usually prefer pre-specified rules such as the critical values in the protocol. Another practical consideration is to set an upper bound to limit the sample size of the second stage. As PH have pointed out, if n 2 is too large, it may be more economical to start a new study based on the interim findings than to extend the current study. Therefore, in practice the choice of h should not be too close to zero (or, equivalently, p ∗ not too close to 0.5). To bound h away from zero is to avoid the ˆ is close to 0, but still falls in the ‘continuation region’—then n 2 will blow up; situation where z 1 (or δ) see (6) and (16) for the PH and the proposed procedures, respectively. More on sample size comparison is given in the next section by an example. 3. E XAMPLE The NHLBI type II coronary intervention study was designed to examine the hypothesis that lowering the plasma concentration of cholesterol by diet and drug therapy reduces the rate of progression of coronary artery disease (CAD). Patients with type-II hyperlipoproteinemia and CAD were randomized to cholestyrmine treatment or placebo. For details of the study see Brensike et al. (1982, 1984). Due to various reasons, the sample size was smaller than what was planned when the data was analysed. The question of how to extend the study in terms of adjusting the sample size was also previously considered by PH. The observed definite progression rates in the two groups were 20/57 (placebo) and 15/59 (cholestyramine). For simplicity, let n 1 = 58. As in PH, the variance-stabilizing arcsine transformation was applied to put the example in the context of a test of normal means with known variance. Table 2 contrasts the results using PH procedure with that of the proposed for α = 0.05 and 1 − β1 = 0.80. The following points highlight the essential comparisons. (i) The top row illustrates that the proposed procedure yields a similar n 2 to the PH procedure, under the same α, n 1 , h and k, without using the special circular error function. (The slightly larger sample size was due to the slightly higher conditional power achieved by the proposed procedure via (13).) The critical value c is also similar, but the c from PH depends on the interim result z 1 and the c from the proposed procedure does not. 284 G. L I ET AL. Table 2. NHLBI type II coronary intervention study example; Brensike et al. (1982; 1984) α = 0.05, n 1 = 58 PH procedure, circular error function CP (%) c(z 1 )† n 2 (z 1 )† Proposed procedure CP (%) c n 2 (z 1 )† h ( p∗ ) k (α1 ) 1.036 (0.15) 1.82 (0.0344) 80 1.78 226 (n 2 /n 1 = 3.90) 83 1.70 233 (n 2 /n 1 = 4.02) 1.036 (0.15) 2.36 (0.0091) 80 — — 83 1.52 193 (n 2 /n 1 = 3.33) —Not available since this (h, k) is not an applicable configuration for the PH procedure. † z = 1.136 1 (ii) In the PH procedure, h and k are linked: i.e. once h is chosen, the value of k is determined by the special conditional type-I error function. In this example, using the ‘circular error function’ with α = 0.05, k equals 1.82 for corresponding h = 1.036 (i.e. p ∗ = 0.15). The limitation on the design flexibility with this dependence in the PH procedure, compared to the proposed procedure, is illustrated in the bottom row of Table 2. Since k determines α1 , the type-I error spent at the interim stage, having k = 1.82 is equivalent to having α1 = 0.0344. Suppose that, instead of 0.0344, we would like to spend much less α1 at the interim stage to make an early stopping more difficult (as would be appropriate for an O’Brien–Fleming type of group sequential boundary), for example, α1 = 0.0091, then we would choose k = 2.36. This design may be desirable especially for this situation where the sample size per group at the interim stage is not very large. Using our proposed procedure this much smaller α1 at the interim stage would leave more type-I error for the final stage, and consequently would lower the critical value c (to 1.52) and reduce the additional sample size required (to 193 per group). Note that, for the proposed procedure, there is no need to alter the specification of h = 1.036 ( p ∗ = 0.15) to achieve these, since, as indicated previously, h and k are chosen independently. However, for the PH procedure the special form of the conditional error function determines k for a given h. Thus one has to change h. However, using the circular error function, the maximum that k can reach is 1.95: i.e. when choosing h = 0 ( p ∗ = 0.5). Since k = 1.95 gives α1 = 0.026, this implies that the lowest type-I error allowed to spend for the PH procedure using the circular error function is 0.026, far more than the 0.0091 level we wanted. This exclusion of the use of O’Brien–Fleming type of boundaries is a serious limitation for the PH procedure with the circular error function. (iii) Another interesting observation with the circular conditional error function approach can also be made from the preceding discussion. A larger k (1.95 versus 1.82) implies less α1 (0.026 versus 0.034) spent at the interim stage. If in a usual group sequential setting, one would think that the final critical region would be larger (i.e. c smaller) due to more α saved for the final stage. However, for the circular conditional error function approach c increases with k. For example, c = k for CP = 0.5; c = 1.92 for k = 1.95 versus c = 1.78 for k = 1.82 when CP = 0.8 for z 1 = 1.136. With the proposed procedure, however, there is no such confusion (as seen in Table 1), since the design parameters h and k are independently specified. It should be noted that the conditional power (for example, 80% in the above example) was set in PH after observing z 1 = 1.136 and δˆ = 0.21, while for the proposed approach CP = 80% was decided in advance at the design stage. For the PH procedure, the conditional power is data-adaptive. For the proposed procedure so far, the conditional power is a constant, representing the desire of the researcher to Conditional power for sample size adjustment 285 Fig. 1. A comparison of the additional sample size in the example where z 1 = 1.136, α = 0.05, n 1 = 58, k = 1.82 and h = 1.036. Sample size required for a fixed sample size design, the proposed procedure, and the PH procedure with CP = 80% and with CP of the proposed procedure. have a chance to detect the difference to be observed at the interim stage, whatever it may turn out to be, by re-sizing the trial and giving the proper critical value at the end. We can also let CP be data-adaptive (see next section). From a practitioner’s point of view, some might criticize both procedures in the example for requiring an approximately four fold sample size of the first stage to continue under the CP requirement of at least 80% and given n 1 = 58 and z 1 = 1.136 (δˆ = 0.21). On the other hand, one might appreciate the warning that such a large sample size is actually needed for the study to be conclusive. Furthermore, as PH pointed out, one would require an even larger sample size to start a new trial using the fixed sample size design compared to continuing the two-stage design. Figure 1 gives a general comparison on the additional sample size (n 2 ) needed for detecting various δ by the fixed sample size design, the proposed procedure, and the PH procedure with 80% CP or the CP reached by the proposed procedure. 4. D ISCUSSION We have proposed a modified procedure based on the PH conditional power approach to adjusting sample size using interim data of the current study. The procedure also requires breaking the treatment codes at the interim stage, hence may need an independent data monitoring board to conduct the procedure. The basic modification is to free the conditional type-I error function from the special mathematical forms used in PH so as to untie the dependence of k and h (corresponding to early rejection and acceptance regions, respectively) and to have the critical value of the final test independent of z 1 . This modification leads to a few advantages such as closely imitating the familiar α-spending approach, 286 G. L I ET AL. final-stage test being LRT, and a more natural and simpler way to set up the design without appealing to a rather artificial function for the conditional type-I error, as illustrated in the example. Note that (16) defines a relation between n 2 and z 1 . In its simple form of the proposed procedure, z β1 is a constant, independent of the interim data. This may also be extended to be data-adaptive by specifying a conditional power function β1 (z 1 ) of z 1 at the design stage, hence z β1 = −1 (1 − β1 (z 1 )). Thus, (17) for c and (16) for n 2 may also be modified accordingly. This conditional power function β1 (z 1 ) may well be a simple step function so that solving c in (17) by integrating z 1 remains an easy task. The critical value, c in our procedure as well as c(z 1 ) in the PH procedure, for the final stage z statistic is defined in correspondence to the additional sample size n 2 (z 1 ) in such a way that the overall type-I error is controlled. In practice, however, for administrative reasons the observed sample size at the second 2 ( x¯2 − y¯2 ) stage may turn out to be different from n 2 (z 1 ). We caution that in calculation of z = n 1 (x¯1 −σˆy¯√1 )+n n 1 +n 2 the planned value of n 2 (z 1 ) should be used instead of the observed one. Contrasting with the ‘blinded re-estimation’ by Gould and Shih (1992), all sample size adjustment procedures that mix with group sequential designs, including Cui et al. (1999), the PH and the proposed procedure, assume a stable variance. This assumption is inherited from the group sequential method and may not be realistic in practice for certain situations. Future research is needed to re-estimate both the variance and the treatment difference and to adjust the sample size using the standardized effect size. Finally, as does the PH method, the proposed procedure also applies to many testing situations including comparing means, proportions or survival distributions by the logrank test under the proportional hazard assumption. The various sample size re-estimation strategies have provided much flexibility for designing clinical trials. Nevertheless, a prudent researcher should always plan ahead and carefully choose appropriate design parameters and specify stopping rules, since altering sample size in the mid-course of a study is not a simple matter administratively (see, for example, Shih, 1992). ACKNOWLEDGEMENT The work of Dr Shih was supported by NCI grant 2P30 CA 72720-04. A PPENDIX . The likelihood ratio test √ By our definition in Section 2, the statistic Z 1 is normally distributed with variance 1 and mean δ n 1 /2. √ Let Z 2 = √ n 2 (Z 1 )( X¯ 2 − Y¯2 )/σ . Conditional on Z 1 , the statistic Z 2 is normally distributed with variance 1 and mean δ n 2 (z 1 )/2. The joint probability density function of (Z 1 , Z 2 ) is 1 exp 4π √ √ (z 12 + z 22 ) − 2δ( n 1 /2z 1 + n 2 (z 1 )/2z 2 ) + δ 2 (n 1 + n 2 (z 1 ))/2 − . 2 The maximum likelihood estimate of δ is √ √ 2n 1 Z 1 + 2n 2 (Z 1 ) Z 2 . n 1 + n 2 (Z 1 ) Under the null hypothesis, δ = 0, Z 1 and Z 2 are independent and identically distributed. The LRT for the hypothesis is easily derived as √ √ n 1 Z 1 + n 2 (Z 1 )Z 2 >c √ n 1 + n 2 (Z 1 ) for some constant c. Conditional power for sample size adjustment 287 R EFERENCES BAUER , P. AND KOHNE , K. (1994). Evaluations of experiments with adaptive interim analyses. Biometrics 50, 1029–1104. B IGGER J R ., J. T., PARIDES , M. K., ROLNITZKY , L. M., M EIER , P., L EVIN , B. AND E GAN , D. A. (1998). Changes in sample size and length of follow-up to maintain power in the coronary artery bypass graft (CABG) patch trial. Controlled Clinical Trials 19, 1–14. B RENSIKE , J. F. et al. (1982). NHLBI type II coronary intervention study: design, methods and baseline characteristics. Controlled Clinical Trials 3, 91. B RENSIKE , J. F. et al. (1984). Effect of therapy with cholestyramine on progression of coronary arteriosclerosis: results of the NHLBI type II coronary intervention study. Circulation 69, 313–324. C UI , L., H UNG , H. M. AND WANG , S. J. (1999). Modification of sample size in group sequential clinical trials. Biometrics 55, 853–857. DAVIS , B. R., W ITTES , J., B ERGE , K. G., H AWKINS , C. M., L AKATOS , E., M OYE , L. A. AND P ROBSTFIELD , J. L. (1993). Statistical considerations in monitoring the systolic hypertension in the elderly program (SHEP). Controlled Clinical Trials 14, 350–361. G OULD , A. L. AND S HIH , W. J. (1992). Sample size re-estimation without unblinding for normally distributed outcomes with unknown variance. Communication in Statistics, Theory and Methods 21, 2833–2853. G OULD , A. L. AND S HIH , W. J. (1998). Modifying the design of ongoing trials without unblinding. Statistics in Medicine 17, 89–100. G REEN , S. J. AND DAHLBERG , S. (1992). Planned versus attained design in phase II clinical trials. Statistics in Medicine 11, 853–862. P ROSCHAN , M. A. AND H UNSBARGER , S. A. (1995). Designed extension of studies based on conditional power. Biometrics 51, 1315–1324. S HIH , W. J. (1992). Sample size reestimation in clinical trials. In Peace, K. (ed.), Biopharmaceutical Sequential Statistical Applications, New York: Dekker, pp. 285–301. S HIH , W. J. AND G OULD , A. L. (1995). Re-evaluating design specifications of longitudinal clinical trials without unblinding when the key response is rate of change. Statistics in Medicine 14, 2239–2248. S HIH , W. J. AND L ONG , J. (1998). Blinded sample size re-estimation with unequal variances and center effects in clinical trials. Communications in Statistics, Theory and Methods 27, 395–408. S HIH , W. J. (2001). Sample size re-estimation—journey for a decade. Statistics in Medicine 20, 515–518. WASSMER , G. (1998). A comparison of two methods for adaptive interim analyses in clinical trials. Biometrics 54, 696–705. W HITEHEAD , J., W HITEHEAD , A., T ODD , S., B OLLAND , K. AND S OORIYARACHCHI , M. R. (2001). Mid-trial design reviews for sequential clinical trials. Statistics in Medicine 20, 165–176. W ITTES , J. AND B RITTAIN , E. (1990). The role of internal pilot studies in increasing the efficiency of clinical trials. Statistics in Medicine 9, 65–72. W ITTES , J., S CHABENBERGER , O., Z UCKER , D., B RITTAIN , E. AND P ROSCHAN , M. (1999). Internal pilot studies I: type I error rate of the na¨ıve t-test. Statistics in Medicine 18, 3481–3491. Z UCKER , D., W ITTES , J., S CHABENBERGER , O. AND B RITTAIN , E. (1999). Internal pilot studies II: comparison of various procedures. Statistics in Medicine 18, 3493–3509. [Received 18 January, 2001; first revision 14 May, 2001; second revision 28 June, 2001; accepted for publication 28 June, 2001]
© Copyright 2025