Sample Size and Power for the Comparison of Cost and Effect

Henry Glick
Applications of Statistical Considerations in Health
Economic Evaluations
ISPOR 13th International Meeting
May 4, 2008
Goal of Sample Size Calculation
• Prior to the literature that described confidence intervals
for cost-effectiveness ratios, sample size was commonly
based on the larger of the sample sizes needed for
estimating pre-specified cost and effect differences
– i.e., what sample size was required to identify a
$1000 difference in costs, and what was required to
identify a 10% reduction in mortality
• Current sample size calculations are based on the
number of study subjects needed to rule out that the
therapy is unacceptable (equivalent to ruling out that the
net monetary benefits of the intervention are less than 0)
Sample Size Formula, Common SDs
• Sample size for NMB uses the following formula:
$$
n = \frac{(t_{\alpha/2} + t_{\beta})^2 \left\{ 2\,sd_c^2 + 2W^2 sd_q^2 - 2W\rho\,(2\,sd_c^2)^{0.5}\,(2\,sd_q^2)^{0.5} \right\}}{(W\Delta Q - \Delta C)^2}
$$
where n = n/group; tα/2 and tβ = t-statistics for α and β
errors; sd = standard deviation for cost (c) and effect (q);
W = maximum willingness to pay one wishes to rule out;
and ρ = correlation of the difference in cost and effect
http://www.uphs.upenn.edu/dgimhsr/stat%20samps.htm
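The common-SD formula can be sketched in a few lines of Python. This is an illustrative re-implementation, not the Stata code behind the slides: the function name and argument names are hypothetical, and it assumes standard normal quantiles in place of the t-statistics and a `power` argument equal to the 0.8 that the slides pass as the one-tailed beta level.

```python
from statistics import NormalDist


def nmb_sample_size_common_sd(diff_c, diff_q, sd_c, sd_q, rho, wtp, alpha, power):
    """Unrounded per-group sample size from the common-SD NMB formula."""
    z = NormalDist()
    # Assumption: normal quantiles stand in for t_{alpha/2} and t_{beta}.
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Term in braces: 2 sd_c^2 + 2 W^2 sd_q^2 - 2 W rho (2 sd_c^2)^0.5 (2 sd_q^2)^0.5
    variance = (2 * sd_c**2 + 2 * wtp**2 * sd_q**2
                - 2 * wtp * rho * (2 * sd_c**2) ** 0.5 * (2 * sd_q**2) ** 0.5)
    # Denominator: expected net monetary benefit, W * dQ - dC.
    nmb = wtp * diff_q - diff_c
    return z_sum**2 * variance / nmb**2


# Inputs from the deck's cess1i example.
n = nmb_sample_size_common_sd(25, 0.05, 1000, 0.195, -0.1, 75000, 0.05, 0.8)
print(round(n, 1))
```

Under these assumptions the result is roughly 246.4, in line with the 246 per group reported in the cess1i example.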
Sample Size Formula, SDs Differ
• When the SDs differ, the formula becomes:
$$
n = \frac{(t_{\alpha/2} + t_{\beta})^2 \left\{ (sd_{c0}^2 + sd_{c1}^2) + W^2 (sd_{q0}^2 + sd_{q1}^2) - 2W\rho\,(sd_{c0}^2 + sd_{c1}^2)^{0.5}\,(sd_{q0}^2 + sd_{q1}^2)^{0.5} \right\}}{(W\Delta Q - \Delta C)^2}
$$
where n = n/group; tα/2 and tβ = t-statistics for α and β
errors; sd = standard deviation for cost (c) and effect (q);
W = maximum willingness to pay one wishes to rule out;
and ρ = correlation of the difference in cost and effect
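A matching sketch for group-specific SDs follows. As before, the function and argument names are hypothetical, normal quantiles are assumed in place of the t-statistics, and `power` is the 0.8 passed as the one-tailed beta level in the slides.

```python
from statistics import NormalDist


def nmb_sample_size_separate_sd(diff_c, diff_q, sd_c0, sd_c1, sd_q0, sd_q1,
                                rho, wtp, alpha, power):
    """Unrounded per-group sample size when the SDs differ by treatment group."""
    z = NormalDist()
    # Assumption: normal quantiles stand in for t_{alpha/2} and t_{beta}.
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    var_c = sd_c0**2 + sd_c1**2   # summed cost variances
    var_q = sd_q0**2 + sd_q1**2   # summed effect variances
    variance = var_c + wtp**2 * var_q - 2 * wtp * rho * var_c**0.5 * var_q**0.5
    nmb = wtp * diff_q - diff_c   # expected net monetary benefit
    return z_sum**2 * variance / nmb**2


# Group-specific SDs from the deck's cess2i example.
n_separate = nmb_sample_size_separate_sd(25, 0.05, 800, 1200, 0.19, 0.20,
                                         -0.1, 75000, 0.05, 0.8)
# Equal SDs in both groups collapse to the common-SD formula.
n_equal = nmb_sample_size_separate_sd(25, 0.05, 1000, 1000, 0.195, 0.195,
                                      -0.1, 75000, 0.05, 0.8)
print(round(n_separate, 1), round(n_equal, 1))
```

Setting both groups' SDs equal is a quick consistency check: the two formulas should then give the same answer.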
Correlation Between Costs and Effects
[Figure: confidence regions on the cost-effectiveness plane for correlations of 0.95 and -0.95, with identical means and S.D.s. x-axis: Incremental QALYs (0.00 to 2.00); y-axis: Incremental Costs (0 to 45000). Point estimate CER: $27,500. Other labeled ratios: $93,500; $46,200; $22,670; $11,545.]
• All else equal, the required sample size is less when the
therapies have a Win/Lose (positive) correlation
– As the effectiveness increases, the cost increases
(e.g., stroke care)
• All else equal, the required sample size is greater when
the therapies have a Win/Win (negative) correlation
– As the effectiveness increases, the cost decreases
(e.g., asthma care)
• Extreme values of correlation between costs and effects
can have dramatic effects on the confidence interval for
the cost effectiveness ratio/NMB and thus on the sample
size required to demonstrate value for the cost
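The direction of these effects can be read off the variance of the estimated net monetary benefit. A short derivation, using the symbols from the sample size formulas and writing SE for the standard error of each difference:

```latex
\begin{aligned}
\widehat{NMB} &= W\,\Delta Q - \Delta C \\
\operatorname{Var}\bigl(\widehat{NMB}\bigr)
  &= W^2\,SE_{\Delta Q}^2 + SE_{\Delta C}^2 - 2W\rho\,SE_{\Delta Q}\,SE_{\Delta C}
\end{aligned}
```

For positive W, a positive ρ makes the last term subtract from the variance, which narrows the confidence interval and lowers the required sample size; a negative ρ adds to the variance and raises it.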
Where to Obtain the Necessary Data?
• When therapies are already in use: Expected differences
in outcomes and standard deviations can be derived
from feasibility studies or from records of patients
– Potential sources
• Medical charts or administrative data sets
• Patient logs of their health care resource use
• Asking patients and experts about the kinds of
care received by those with the condition under
study
– In addition, at least one study has suggested that the
simple correlation between costs and effects
observed in these data may be an adequate proxy for
the measure of correlation used for estimating
sample size
Obtaining Data for Novel Therapies
• For novel therapies, information about the magnitude of
the incremental costs and outcomes may not be
available
– May need to be generated by assumption
– Data on the standard deviations for those who receive
usual care/placebo may be obtained from feasibility
studies or patient records
• One may assume that the standard deviation will
apply equally to both treatment groups, or one may
make alternative assumptions about their relative
magnitudes
– The correlation also would be obtained from such
data
ssizeprg.do
• quietly do ssizeprg
• Contains 4 “immediate form” PROGRAMS related to
sample size and power to detect NMB differences that
are greater than 0
– The command do ssizeprg simply loads these
programs; it does not calculate anything
• Documentation program: ssizeprgdoc
ssizeprg.do (cont.)
• Programs for calculating sample size and power
– cess1i: Calculates sample size under the assumption
that the standard deviations for cost and effect are
common between the 2 treatment groups
– cess2i: Calculates sample size under the assumption
that the standard deviations for cost and effect differ
between the 2 treatment groups
– cepow1i: Calculates power to detect NMB greater
than 0 under the assumption of common standard
deviations
– cepow2i: Calculates power to detect NMB greater
than 0 under the assumption that the standard
deviations differ
ssizeprg.do (cont.)
• All 4 programs presume two arm trials and a common
sample size for both treatment groups
• These programs yield results that are identical to those
derived from the NHB formula in: Willan AR. Analysis,
sample size, and power for estimating incremental net
health benefit from clinical trial data. Control Clin Trials
2001;22:228-237
ssizeprgdoc: cess1i
* PROGRAM: CESS1I
* cess1i is used to estimate sample size when one assumes
* there are common standard deviations for cost and effect
* between the 2 treatment groups (SDs, not SEs for the difference
* in cost and effect).
* COMMAND LINE: cess1i [diffc] [diffe] [sdc] [sde] [corr] [wtp] [alpha] [beta]
* The 8 arguments are all numbers
** `1' Difference in costs
** `2' Difference in effects
** `3' Standard deviation, costs (assumed the same for both groups)
** `4' Standard deviation, effects (assumed the same for both groups)
** `5' Correlation, difference in costs and effects
** `6' Maximum willingness to pay
** `7' Two-tailed alpha level (e.g., 0.05)
** `8' One-tailed beta level (e.g., 0.80)
ssizeprgdoc: cess1i (cont.)
• Saved results (scalars)
* r(diffc)
* r(diffq)
* r(sd_c)
* r(sd_e)
* r(rho)
* r(wtp)
* r(alpha)
* r(beta)
* r(nmb)
* r(sampsize)
Implementing cess1i
• Suppose the expected difference in cost is 25; the
expected difference in QALYs is 0.05; the expected SDs
for cost and QALYs are 1000 and 0.195, respectively;
the expected correlation of the difference is -0.1; your
maximum WTP is 75,000; and you want a 2-tailed alpha
of .05 and a 1-tailed beta of 0.8:
– Point estimate (cost-effectiveness ratio) = 25 / 0.05 = 500
Implementing cess1i (cont.)
. cess1i 25 .05 1000 .195 -.1 75000 .05 .8
SAMPLE SIZE CALCULATION (Common SD Costs and Effects)
Assumptions
  Difference in costs:                            25
  Difference in effects:                          .05
  Standard deviation, costs:                      1000
  Standard deviation, effects:                    .195
  Correlation, difference in costs and effects:   -.1
  Willingness to pay:                             75000
  Two-tailed alpha level:                         .05
  One-tailed beta level:                          .8
Expected NMB:                                     3725
*** SAMPLE SIZE PER GROUP ***                     246
Implementing cess1i (cont.)
. return list
scalars:
  r(diffc)    = 25
  r(diffq)    = .05
  r(sd_c)     = 1000
  r(sd_e)     = .195
  r(rho)      = -.1
  r(wtp)      = 75000
  r(alpha)    = .05
  r(beta)     = .8
  r(nmb)      = 3725
  r(sampsize) = 246
Calculate Sample Sizes
• Compare the sample sizes required for the following
expected results:
– 25 0.05 1000 0.195 -0.1 75000 0.05 0.8
– 25 0.05 2000 0.195 -0.1 75000 0.05 0.8
– 25 0.05 1000 0.390 -0.1 75000 0.05 0.8
• What is happening?
Sample Size Calculations
Sample Size Parameters                      Sample Size
25 0.05 1000 0.195 -0.1 75000 0.05 0.8      246
25 0.05 2000 0.195 -0.1 75000 0.05 0.8      253
25 0.05 1000 0.390 -0.1 75000 0.05 0.8      976
Sample Size Often More Sensitive to SDq than to SDc
• The sample size formula is symmetric for the SDs of cost
and effect except for the following:
$$
(sd_{c0}^2 + sd_{c1}^2) + W^2 (sd_{q0}^2 + sd_{q1}^2)
$$
• Changes in the square of the QALY SD are weighted by
the square of WTP; changes in the square of the cost
SD are unweighted
– When WTP is substantially greater than SD for cost,
percentage changes in the QALY SD will have a
greater effect on sample size than will equivalent
percentage changes in cost SD
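Plugging in the values from the first example (WTP of 75,000, cost SD of 1000, QALY SD of 0.195) shows how lopsided the two squared terms are:

```latex
W^2\, sd_q^2 = 75000^2 \times 0.195^2 = 213{,}890{,}625
\qquad\text{versus}\qquad
sd_c^2 = 1000^2 = 1{,}000{,}000
```

The QALY term is more than 200 times the cost term, so doubling the QALY SD roughly quadruples the total (976 versus 246 per group), while doubling the cost SD barely moves it (253 versus 246).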
Calculate Sample Sizes (II)
• Compare the sample sizes required for the following
expected results:
– 25 0.05 1000 0.195 -0.5 75000 0.05 0.8
– 25 0.05 1000 0.195 0.5 75000 0.05 0.8
• What is happening?
Sample Size Calculations
Sample Size Parameters                      Sample Size
25 0.05 1000 0.195 -0.5 75000 0.05 0.8      260
25 0.05 1000 0.195 +0.5 75000 0.05 0.8      227
• Holding all else equal, when the correlation of the
difference in cost and effect is negative, one needs a larger
sample than when the correlation of the difference is
positive
Calculate Sample Sizes (III)
• Compare the sample sizes required for the following
expected results:
– 25 0.05 1000 0.195 -0.1 900 0.05 0.8
– 25 0.05 1000 0.195 -0.1 100 0.05 0.8
• What is happening?
When WTP ~ PE, NMB ~ 0, Sample Size Increases
Sample Size Parameters                      Sample Size
25 0.05 1000 0.195 -0.1 900 0.05 0.8        41,831
25 0.05 1000 0.195 -0.1 100 0.05 0.8        39,412
• At willingnesses to pay of 900 and 100, the expected value
of NMB approaches 0 (when WTP = 900, NMB = 20; when
WTP = 100, NMB = -20)
• Power to detect a difference is lowest as NMB approaches 0
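The arithmetic behind these NMB values, and the reason the sample size explodes, can be written out as follows (the squared NMB is the denominator of the sample size formula):

```latex
\begin{aligned}
NMB_{W=900} &= 900 \times 0.05 - 25 = 20,  & NMB^2 &= 400 \\
NMB_{W=100} &= 100 \times 0.05 - 25 = -20, & NMB^2 &= 400 \\
NMB_{W=75000} &= 75000 \times 0.05 - 25 = 3725, & NMB^2 &= 13{,}875{,}625
\end{aligned}
```

Because n is inversely proportional to the squared NMB, a WTP close to the point estimate of 500 shrinks the denominator by a factor of roughly 35,000 relative to the original example.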
Checking Your Sample Size Calculation
• Based on your original design criteria, the sample size
formula indicated you need 246 per group
• You decide to use ceapowersimulator to double check
the sample size
– The program draws random samples of size 492 with
the appropriate means, sds, and correlation
quietly do ceapowersimulator
ceapowersimulator 25 .05 1000 .195 -.1 75000 .05 246
• When you look at the results, you find that 99.8% of the
point estimates from your repeated samples -- which
look very much like the cloud of points we plot on the CE
plane -- are acceptable
Distribution of Point Estimates
[Figure: cloud of simulated point estimates on the cost-effectiveness plane, N = 246 / group. x-axis: Difference in QALYs (-0.02 to 0.12); y-axis: Difference in Cost (-300 to 300); WTP: 75,000.]
• Are we using the wrong formulas or are we looking
at the wrong outcome of our simulation?
Point Estimates Address the Wrong Question
• What are we trying to ensure when we calculate sample
size with an alpha of 0.05 and a 1-beta of 0.8?
• While 99.8% of the point estimates satisfy our
willingness to pay of 75,000 per QALY, in only 79.9% of
repeated experiments does the 95% CI allow us to be 95%
confident that the therapy is good value
• Implication: Sample size calculations are about CIs in
repeated experiments; they are not about the distribution
of point estimates from repeated experiments
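The contrast between the two percentages can be reproduced with a small simulation. The sketch below is not ceapowersimulator: it is a hypothetical Python stand-in that draws the observed differences in mean cost and effect directly from a bivariate normal distribution, treats the standard error of NMB as known, and counts a point estimate as acceptable when its NMB is above 0.

```python
import random
from statistics import NormalDist


def simulate_trials(diff_c, diff_q, sd_c, sd_q, rho, wtp, alpha, n,
                    reps=20000, seed=1):
    """Return (share of point estimates with NMB > 0,
               share of trials whose lower CI limit for NMB is > 0)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard errors of the differences in means (common SDs, n per group).
    se_c = sd_c * (2 / n) ** 0.5
    se_q = sd_q * (2 / n) ** 0.5
    # Assumption: SE of NMB is treated as known, not re-estimated per trial.
    se_nmb = (se_c**2 + wtp**2 * se_q**2 - 2 * wtp * rho * se_c * se_q) ** 0.5
    point_ok = 0
    ci_ok = 0
    for _ in range(reps):
        # Correlated draw for the observed cost and effect differences.
        u = rng.gauss(0, 1)
        v = rng.gauss(0, 1)
        obs_c = diff_c + se_c * u
        obs_q = diff_q + se_q * (rho * u + (1 - rho**2) ** 0.5 * v)
        nmb = wtp * obs_q - obs_c
        if nmb > 0:
            point_ok += 1
        if nmb - z_crit * se_nmb > 0:
            ci_ok += 1
    return point_ok / reps, ci_ok / reps


share_point, share_ci = simulate_trials(25, 0.05, 1000, 0.195, -0.1,
                                        75000, 0.05, 246)
print(share_point, share_ci)
```

Nearly all simulated point estimates should fall on the acceptable side of the WTP line, while only about four in five trials should have a confidence interval that excludes an NMB of 0, which is the quantity the sample size calculation targets.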
Experiments that Do and Do Not Yield Confidence
[Figure: simulated experiments on the cost-effectiveness plane, N = 246 / group. x-axis: Difference in QALYs (-0.02 to 0.12); y-axis: Difference in Cost (-300 to 300); WTP: 75,000.]
ssizeprgdoc: cess2i
* PROGRAM: CESS2I
* cess2i is used to assess sample size when one
* assumes there are Rx-specific standard deviations
* for the 2 treatment groups' costs and effects (SDs,
* not SEs for the difference in costs and effects)
* COMMAND LINE: cess2i [diffc] [diffe] [sdc0] [sdc1] [sde0] [sde1] [corr] [wtp] [alpha] [beta]
* The 10 arguments are all numbers
* `1' Difference in costs
* `2' Difference in effects
* `3' Standard deviation, costs, group 0
* `4' Standard deviation, costs, group 1
* `5' Standard deviation, effects, group 0
* `6' Standard deviation, effects, group 1
* `7' Correlation, difference in costs and effects
* `8' Willingness to pay
* `9' Two-tailed alpha level (e.g., 0.05)
* `10' One-tailed beta level (e.g., 0.80)
ssizeprgdoc: cess2i (cont.)
* Saved results (scalars)
* r(diffc)
* r(diffq)
* r(sd_c0)
* r(sd_c1)
* r(sd_e0)
* r(sd_e1)
* r(rho)
* r(wtp)
* r(alpha)
* r(beta)
* r(nmb)
* r(sampsize)
Implementing cess2i
• Suppose the expected difference in cost is 25; the
expected difference in QALYs is 0.05; the expected SDs
for cost are 800 and 1200; the expected SDs for QALYs
are 0.19 and 0.20; the expected correlation of the
difference is -0.1; your maximum WTP is 75,000; and
you want a 2-tailed alpha of .05 and a 1-tailed beta of
0.8:
Implementing cess2i (cont.)
. cess2i 25 .05 800 1200 .19 .20 -.1 75000 .05 .8
SAMPLE SIZE CALCULATION (Different SD, Costs and Effects)
Assumptions
  Difference in costs:                            25
  Difference in effects:                          .05
  Standard deviation, costs, group 0:             800
  Standard deviation, costs, group 1:             1200
  Standard deviation, effects, group 0:           .19
  Standard deviation, effects, group 1:           .2
  Correlation, difference in costs and effects:   -.1
  Ceiling ratio:                                  75000
  Two-tailed alpha level:                         .05
  One-tailed beta level:                          .8
Expected NMB:                                     3725
*** SAMPLE SIZE PER GROUP ***                     247
Implementing cess2i (cont.)
. return list
scalars:
  r(diffc)    = 25
  r(diffq)    = .05
  r(sd_c0)    = 800
  r(sd_c1)    = 1200
  r(sd_e0)    = .19
  r(sd_e1)    = .2
  r(rho)      = -.1
  r(wtp)      = 75000
  r(alpha)    = .05
  r(beta)     = .8
  r(nmb)      = 3725
  r(sampsize) = 247
Calculate Sample Sizes
• Calculate the sample size for the case when the SDs for
cost and QALYs were 500, 1500, 0.145 and 0.245
• Calculate the sample size for common SDs of 1035 and
.201825 (3.5% increases over the original example)
Separate SDs Tend to Increase Sample Size
Sample Size Parameters                           Sample Size
25 .05 500 1500 0.145 0.245 -.1 75000 .05 .8     263
25 .05 1035 .201825 -.1 75000 .05 .8             264
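One way to see why spreading the SDs apart raises the sample size: the formula sums squared SDs, and for a fixed average SD that sum grows as the two values diverge. With the values used here:

```latex
\begin{aligned}
500^2 + 1500^2 &= 2{,}500{,}000 &&> \; 2 \times 1000^2 = 2{,}000{,}000 \\
0.145^2 + 0.245^2 &= 0.08105 &&> \; 2 \times 0.195^2 = 0.07605
\end{aligned}
```

The group-specific SDs average 1000 and 0.195, the same as the original common-SD example, yet the required sample size rises from 246 to 263 per group.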
ssizeprgdoc: cepow1i
* PROGRAM: CEPOW1I
* cepow1i is used to assess power when one assumes
* that the 2 treatment groups have common standard
* deviations for costs and effects (SDs, not SEs for
* the difference in cost and effect)
* COMMAND LINE: cepow1i [diffc] [diffe] [sdc] [sde] [corr] [wtp] [alpha] [sampsize]
* The 8 arguments are all numbers
* `1' Difference in costs
* `2' Difference in effects
* `3' Standard deviation, costs (assumed the same for both groups)
* `4' Standard deviation, effects (assumed the same for both groups)
* `5' Correlation, difference in costs and effects
* `6' Willingness to pay
* `7' Two-tailed alpha level (e.g., 0.05)
* `8' Sample size per group
ssizeprgdoc: cepow1i (cont.)
• Saved results (scalars)
* r(diffc)
* r(diffq)
* r(sd_c)
* r(sd_e)
* r(rho)
* r(wtp)
* r(alpha)
* r(sampsize)
* r(nmb)
* r(power)
Implementing cepow1i
• Suppose the expected difference in cost is 25; the
expected difference in QALYs is 0.05; the expected SDs
for cost and QALYs are 1000 and 0.195, respectively;
the expected correlation of the difference is -0.1; your
maximum WTP is 75,000; you want a 2-tailed alpha of
.05; and the current sample size plans are for 246 per
group
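The power calculation inverts the sample size formula. The sketch below is a hypothetical Python version under the same assumptions as the formula slides (common SDs, n per group), with normal quantiles in place of t-statistics and only the upper rejection region counted; it is not the code of cepow1i itself.

```python
from statistics import NormalDist


def nmb_power_common_sd(diff_c, diff_q, sd_c, sd_q, rho, wtp, alpha, n):
    """Approximate power to show NMB > 0 with n subjects per group."""
    z = NormalDist()
    # Standard error of NMB = W * dQ - dC for two groups of size n.
    variance = sd_c**2 + wtp**2 * sd_q**2 - 2 * wtp * rho * sd_c * sd_q
    se_nmb = (2 * variance / n) ** 0.5
    nmb = wtp * diff_q - diff_c
    # Probability that the lower two-sided confidence limit exceeds 0.
    return z.cdf(nmb / se_nmb - z.inv_cdf(1 - alpha / 2))


# Inputs from the cepow1i example: 246 subjects per group.
power = nmb_power_common_sd(25, 0.05, 1000, 0.195, -0.1, 75000, 0.05, 246)
print(round(power, 3))
```

Power rises with n because the standard error of NMB shrinks with the square root of the sample size while the expected NMB stays fixed.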
Implementing cepow1i (cont.)
. cepow1i 25 .05 1000 .195 -.1 75000 .05 246
POWER CALCULATION (Common SD Costs and Effects)
Assumptions
  Difference in costs:                            25
  Difference in effects:                          .05
  Standard deviation, costs:                      1000
  Standard deviation, effects:                    .195
  Correlation, difference in costs and effects:   -.1
  Willingness to pay:                             75000
  Two-tailed alpha level:                         .05
  Sample size per group:                          246
Expected NMB:                                     3725
*** POWER TO DETECT DIFFERENCE ***                .799
Implementing cepow1i (cont.)
. return list
scalars:
  r(diffc)    = 25
  r(diffq)    = .05
  r(sd_c)     = 1000
  r(sd_e)     = .195
  r(rho)      = -.1
  r(wtp)      = 75000
  r(alpha)    = .05
  r(sampsize) = 246
  r(nmb)      = 3725
  r(power)    = .799
Power Table (Example 1)
Sample Size    Power for WTP = 75,000
150            0.584
200            0.714
246            0.799
300            0.871
350            0.916
Power Graph: 25 .05 1000 .195 -.1 WTP .05 250
[Figure: power (y-axis) plotted against WTP (x-axis). Extracted tick labels show a power scale of 0.00 to 1.00 with a WTP scale of 0 to 150000, and a second pair of scales running 0.00 to 0.10 and 0 to 1500.]
ssizeprgdoc: cepow2i
* PROGRAM: CEPOW2I
* cepow2i is used to assess power when one assumes
* there are Rx-specific standard deviations for the
* 2 treatment groups' costs and effects (SDs, not SEs
* for the difference in costs and effects)
* COMMAND LINE: cepow2i [diffc] [diffe] [sdc0] [sdc1] [sde0] [sde1] [corr] [wtp] [alpha] [sampsize]
* The 10 arguments are all numbers
* `1' Difference in costs
* `2' Difference in effects
* `3' Standard deviation, costs, group 0
* `4' Standard deviation, costs, group 1
* `5' Standard deviation, effects, group 0
* `6' Standard deviation, effects, group 1
* `7' Correlation, difference in costs and effects
* `8' Willingness to pay
* `9' Two-tailed alpha level (e.g., 0.05)
* `10' Sample size
ssizeprgdoc: cepow2i (cont.)
• Saved results (scalars)
* r(diffc)
* r(diffq)
* r(sd_c0)
* r(sd_c1)
* r(sd_e0)
* r(sd_e1)
* r(rho)
* r(wtp)
* r(alpha)
* r(sampsize)
* r(nmb)
* r(power)
Implementing cepow2i
• Suppose the expected difference in cost is 25; the
expected difference in QALYs is 0.05; the expected SDs
for cost are 800 and 1200; the expected SDs for QALYs
are 0.19 and 0.20; the expected correlation of the
difference is -0.1; your maximum WTP is 75,000; you
want a 2-tailed alpha of .05; and the current sample size
plans are for 246 per group
Implementing cepow2i (cont.)
. cepow2i 25 .05 800 1200 .19 .20 -.1 75000 .05 246
POWER CALCULATION (Different SD, Costs and Effects)
Assumptions
  Difference in costs:                            25
  Difference in effects:                          .05
  Standard deviation, costs, group 0:             800
  Standard deviation, costs, group 1:             1200
  Standard deviation, effects, group 0:           .19
  Standard deviation, effects, group 1:           .2
  Correlation, difference in costs and effects:   -.1
  Ceiling ratio:                                  75000
  Two-tailed alpha level:                         .05
  Sample Size:                                    246
Expected NMB:                                     3725
*** POWER TO DETECT DIFFERENCE ***                .799
Implementing cepow2i (cont.)
. return list
scalars:
  r(diffc)    = 25
  r(diffq)    = .05
  r(sd_c0)    = 800
  r(sd_c1)    = 1200
  r(sd_e0)    = .19
  r(sd_e1)    = .2
  r(rho)      = -.1
  r(wtp)      = 75000
  r(alpha)    = .05
  r(sampsize) = 246
  r(nmb)      = 3725
  r(power)    = .799
Sample Size and Power Table (Example 1)
WTP        Sample Size for 80% power    Power for N = 246
50,000     251                          0.792
75,000     246                          0.799
100,000    244                          0.803
150,000    242                          0.806
200,000    241                          0.807
• For this type of experiment, as one increases the WTP,
sample size decreases and power increases
Cost Minimization
• Suppose we are performing what we expect will be a
cost-minimization study (cost savings but no difference
in effect)
– We expect a cost savings of 1250 and difference in
QALYs of 0.0; we expect an SD for cost of 5500 and
for effect of 0.25; we expect the correlation of the
difference to be 0.5; and we want an alpha and beta
of 0.05 and 0.8, respectively
Calculate Sample Size and Power
• Sample size
– Calculate the required sample size for a WTP of
50,000
– Calculate the required sample size for a WTP of
200,000
• Power: Assuming a sample size of 1183
– Calculate power for a WTP of 50,000
– Calculate power for a WTP of 200,000
• What happened?
Sample Size Calculations
Sample Size / Power Parameters              Sample Size
-1250 0 5500 .25 .5 50000 .05 .8            1183
-1250 0 5500 .25 .5 200000 .05 .8           22,658

Sample Size / Power Parameters              Power
-1250 0 5500 .25 .5 50000 .05 1183          0.8
-1250 0 5500 .25 .5 200000 .05 1183         0.093
Sample Size and Power Table
WTP        Sample Size for 80% power    Power for N = 1185
50,000     1183                         0.801
75,000     2800                         0.445
100,000    5202                         0.267
150,000    12,360                       0.137
200,000    22,658                       0.094
• For this kind of experiment, as WTP increases, the
required sample size increases and power decreases
• Why?
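One way to work through the question is to look at the two parts of the sample size formula separately. With an expected QALY difference of 0, the numerator of NMB no longer depends on WTP, while the variance term still does:

```latex
\begin{aligned}
NMB &= W \times 0 - (-1250) = 1250 \quad\text{for every } W \\
sd_c^2 + W^2 sd_q^2 - 2W\rho\, sd_c\, sd_q &= 117{,}750{,}000 \quad (W = 50{,}000) \\
sd_c^2 + W^2 sd_q^2 - 2W\rho\, sd_c\, sd_q &= 2{,}255{,}250{,}000 \quad (W = 200{,}000)
\end{aligned}
```

The variance term grows by a factor of about 19 while the expected NMB stays at 1250, which matches the increase in required sample size from 1183 to 22,658.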
Ca Significantly Different from Cb and Qa Not
Significantly Different from Qb
[Figure: confidence limits on the cost-effectiveness plane. x-axis: Difference in QALYs (-0.080 to 0.080); y-axis: Difference in Costs (-3500 to 1000). ΔC = -1250, SE = 492; ΔQ = 0, SE = .027; S.D. for effectiveness = 0.30; ρ = 0.5; DOF = 498. Labeled limits: UL: -8,388; LL: 26,610.]
Dominance
• Similar kinds of issues can arise if you design your trial
with the idea that you will document dominance
– In other words, you may be in a situation in which as
you increase the WTP, the sample size decreases
and power increases (e.g., if the lower limit remains in
the lower right quadrant, and the upper limit is moving
up into the upper right quadrant)
– Or you may be in a situation where as the WTP
increases, the sample size increases and power
decreases (if the lower limit rotates into the lower left
quadrant)
Summary
• We’ve provided you with programs to calculate sample
size and power for the comparison of cost and effect
• In many cases, as one’s WTP increases, the necessary
sample size decreases and power increases (a
pattern 1 experiment)
• In other cases, as one’s WTP increases, the necessary
sample size increases and power decreases (a
pattern 2 experiment)
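Both patterns can be seen side by side by scanning WTP with the common-SD formula. The sketch below is illustrative only: the helper name is hypothetical, normal quantiles are assumed in place of t-statistics, and the two parameter sets are the deck's Example 1 and its cost-minimization example.

```python
from statistics import NormalDist


def sample_size(diff_c, diff_q, sd_c, sd_q, rho, wtp, alpha=0.05, power=0.8):
    """Unrounded per-group n from the common-SD NMB formula."""
    z = NormalDist()
    # Assumption: normal quantiles stand in for the t-statistics.
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    variance = 2 * (sd_c**2 + wtp**2 * sd_q**2 - 2 * wtp * rho * sd_c * sd_q)
    nmb = wtp * diff_q - diff_c
    return z_sum**2 * variance / nmb**2


wtps = [50000, 75000, 100000, 150000, 200000]
# Pattern 1: more costly and more effective (Example 1).
pattern1 = [sample_size(25, 0.05, 1000, 0.195, -0.1, w) for w in wtps]
# Pattern 2: cost saving with no expected difference in effect.
pattern2 = [sample_size(-1250, 0.0, 5500, 0.25, 0.5, w) for w in wtps]

for w, n1, n2 in zip(wtps, pattern1, pattern2):
    print(w, round(n1), round(n2))
```

In the first list the required sample size falls as WTP rises; in the second it climbs steeply, because the expected NMB is fixed while its variance grows with the square of WTP.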