Small-sample bias and corrections for conditional maximum-likelihood odds-ratio estimators SANDER GREENLAND

Biostatistics (2000), 1, 1, pp. 113–122
Printed in Great Britain
© Oxford University Press (2000)
Department of Epidemiology, UCLA School of Public Health, Los Angeles,
CA 90095-1772, USA
SUMMARY
A number of small-sample corrections have been proposed for the conditional maximum-likelihood
estimator of the odds ratio for matched pairs with a dichotomous exposure. I here contrast the rationale and
performance of several corrections, specifically those that generalize easily to multiple conditional logistic
regression. These corrections or Bayesian analyses with informative priors may serve as diagnostics for
small-sample problems. Points are illustrated with a small exact performance comparison and with an
example from a study of electrical wiring and childhood leukemia. The former comparison suggests that
small-sample bias may be more prevalent than commonly realized.
Keywords: Bias; Case-control studies; Conditional logistic regression; Cox model; Epidemiologic methods; Likelihood analysis; Logistic models; Matching; Odds ratio; Proportional hazards; Relative risk; Risk assessment.
1. INTRODUCTION
The conditional maximum-likelihood (CML) estimator of a common odds ratio for matched pairs was
introduced by Kraus (1960) and has since become a mainstay of epidemiologic analysis (Breslow and
Day, 1980; Clayton and Hills, 1993; Kelsey et al., 1996; Rothman and Greenland, 1998). Jewell (1984),
however, described the severe small-sample bias that can arise in the estimator, and derived and compared
some bias corrections. Since then other corrections and comparisons have appeared. The present note
contrasts several corrections that have an obvious Bayesian rationale or a straightforward extension to
conditional-logistic regression. A new estimator is introduced that is a minor adaptation of formulas
for ordinary logistic regression. Estimators are illustrated in an exact performance comparison, and in
a matched-pair study of power lines and childhood leukemia (Ebi et al., 1999). The former comparison
suggests that bias may be a frequent problem in small or overmatched studies.
The CML odds-ratio estimators have positive probability of being infinite and so have infinite exact
expectations, even though they are unbiased to first order. Following earlier literature (Jewell, 1984,
1986; Liu, 1989), for ease of writing I will use the term ‘bias’ to refer to bias of higher order. One can
also view the bias problem as one in which estimates far above the true parameter value occur with
unacceptably high probability.
2. APPROXIMATE BIAS CORRECTIONS
Several approximate corrections have been proposed and evaluated for matched odds-ratio estimates
with a dichotomous exposure and for 2 × 2-table (unmatched) odds-ratio estimators (Jewell, 1984, 1986;
Becker, 1989; Liu, 1989; Walter and Cook, 1991). These corrections are of two forms: those that correct
for bias on the logarithmic scale, and those that correct for bias on the odds-ratio (arithmetic) scale.
2.1. Logarithmic corrections
It is possible to adapt a well-known bias correction for unconditional ML estimators (Byth and McLachlan, 1978; Anderson and Richardson, 1979; Schaefer, 1983; Cordeiro and McCullagh, 1991) to matched-pair CML estimators. The contribution of a matched pair with case regressor vector $x_1$ and control regressor vector $x_0$ to the conditional logistic likelihood (Breslow and Day, 1980; Clayton and Hills, 1993) simplifies to $\mathrm{expit}(d'\beta)$, where $\beta$ is the vector of logistic coefficients, $d = x_1 - x_0$, and $\mathrm{expit}(\eta) = (1 + e^{-\eta})^{-1}$ is the logistic transform. The full conditional likelihood thus can be written in the form of a no-intercept unconditional logistic likelihood for binomial observations defined by $n(x_1, x_0)$ ‘successes’ out of $n(x_1, x_0) + n(x_0, x_1)$ trials, where $n(x_1, x_0)$ is the number of pairs with case regressor $x_1$ and control regressor $x_0$. The distribution of $n(x_1, x_0)$ is binomial given $n(x_1, x_0) + n(x_0, x_1)$; if $x_1 = x_0$, the distribution does not depend on $\beta$ and hence concordant pairs do not contribute to the likelihood.
Let $i$ index the pairs, let $D$ be the matrix of observed pair differences $d_i$ (one row $d_i'$ per pair), $p$ the vector of conditional probabilities $p_i = \mathrm{expit}(d_i'\beta)$ for the pairs, $W$ the diagonal matrix $\mathrm{diag}[p_i(1 - p_i)]$, and $H = D'WD$. The second-order approximation to the bias in the CML estimator is then $b = H^{-1}D'Wr$, where $r_i = d_i'H^{-1}d_i(p_i - \tfrac{1}{2})$; see Cordeiro and McCullagh (1991). A bias correction is obtained by using the CMLE $\hat\beta$ to compute $b$, then subtracting the result $\hat b$ from $\hat\beta$ (Anderson and Richardson, 1979; Schaefer, 1983); a corresponding variance estimate for $\hat\beta - \hat b$ may be computed by the delta method (Bishop et al., 1975, Chapter 14).
For discrete $x$ there is another logarithmic-scale correction, due to Haldane, which adds $\tfrac{1}{2}$ to each cell (here, pair count) and then applies ML to the augmented counts (Bishop et al., 1975; Good, 1983; Jewell, 1984). For a binary $x$, the resulting ‘augmented likelihood’ for $\beta$ is identical to the posterior distribution for $\beta$ under a Jeffreys prior (Leonard and Hsu, 1994). The augmented counts are sometimes multiplied by a constant to restore the sample total to its original value (Bishop et al., 1975), which affects only the variance estimates.
2.2. Arithmetic corrections
For any nonconstant estimator $\hat\theta$, $E(\hat\theta) = \theta$ implies $E(e^{\hat\theta}) > e^{\theta}$; consequently, the above estimators undercorrect for bias on the arithmetic scale (Jewell, 1984). This raises the issue of whether one should examine the odds ratios or the log odds ratios. A common presumption is that one should focus on the log odds ratios because of the extreme asymmetry of the distribution of the odds ratios. I and others maintain that this presumption is an example of ignoring context to suit the statistics. In a well-designed study, the odds ratios, not their logs, are proportional to disease rates (Rothman and Greenland, 1998). These rates, in turn, are proportional to the overall costs of disease (Morgenstern and Greenland, 1990). The magnitudes of these costs are a primary target of interest for public health and subsequent policy debates, and hence the relevant estimation errors are proportional to arithmetic, not logarithmic, errors in relative-risk estimates.
Several approximate bias corrections for the arithmetic scale have been proposed for discrete $x$ (Bishop et al., 1975; Good, 1983; Jewell, 1984). These turn out to be very close to the Laplace estimator obtained by adding 1 rather than $\tfrac{1}{2}$ to each cell (Bishop et al., 1975; Good, 1983). Unlike the others, this Laplace correction is invariant under exposure recoding and has a simple Bayesian derivation using a uniform prior on $\mathrm{expit}(\beta)$ (Bishop et al., 1975; Good, 1983); this prior is equivalent to the mean-zero logistic prior on $\beta$ with c.d.f. $\mathrm{expit}(\beta)$.
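The second-order bias approximation $b = H^{-1}D'Wr$ of Section 2.1 can be sketched in a few lines for the special case of a single regressor, where $H$ is a scalar. This is a minimal illustration under stated assumptions, not a general implementation: the function name `second_order_bias`, the restriction to one coefficient, and the example counts are all hypothetical. With a coefficient vector, the divisions by `h` would become multiplications by a matrix inverse.

```python
from math import exp, log

def expit(eta):
    """Logistic transform (1 + e^-eta)^-1."""
    return 1.0 / (1.0 + exp(-eta))

def second_order_bias(diffs, beta):
    """Approximate bias b = H^-1 D'W r for a single coefficient.

    diffs: pair differences d_i = x1 - x0, one per discordant pair
           (a scalar regressor is assumed here for brevity).
    beta:  value at which to evaluate the bias, e.g. the CMLE.
    """
    p = [expit(d * beta) for d in diffs]                      # p_i = expit(d_i * beta)
    w = [pi * (1.0 - pi) for pi in p]                         # diagonal of W
    h = sum(d * d * wi for d, wi in zip(diffs, w))            # H = D'WD (scalar here)
    r = [d * d / h * (pi - 0.5) for d, pi in zip(diffs, p)]   # r_i = d_i H^-1 d_i (p_i - 1/2)
    return sum(d * wi * ri for d, wi, ri in zip(diffs, w, r)) / h

# Hypothetical data: 6 pairs with only the case exposed, 2 with only the control exposed.
diffs = [1] * 6 + [-1] * 2
beta_hat = log(6 / 2)                       # CMLE of the log odds ratio
b_hat = second_order_bias(diffs, beta_hat)
beta_corrected = beta_hat - b_hat           # bias-corrected coefficient
```

With $d_i = \pm 1$ as assumed here, the function returns $(\hat p - \tfrac{1}{2})/\{N\hat p(1 - \hat p)\}$ with $\hat p = u/N$. It follows the matrix definition directly and is not a transcription of any simplified closed form.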
2.3. Bayes estimators
If one believes that random error is a major contributor to the results, it would be natural to pursue estimators with even lower expected squared error (ESE) than the corrected estimators, such as a Bayes estimator based on a prior that is (hopefully) more concentrated near the true coefficient vector than the priors implicit in the above procedures.
This leaves the task of specifying the prior. For ease of illustration, consider a matched-pair study of a dichotomous exposure, with no covariates. In this case $\beta$ is the pair-specific log odds ratio. Many epidemiologic controversies about harmful effects revolve around whether the true relative risk (which the odds ratio is supposed to approximate) is 1 versus 1.5 or 1 versus 2, with virtually no prior probability given to values above 3 or 4 by anyone, largely because most estimates are below 2. The electric power-cancer literature is an example (Portier and Wolfe, 1998; Greenland et al., 2000b). Other examples can be found in the nutrition and diet literature, such as in the coffee–heart disease controversy (Greenland, 1993a). In these contexts, the upper prior percentiles derived from normal($\mu$, $\tau^2$) distributions for $\beta$ with $\mu$ close to zero and $\tau^2$ between $\tfrac{1}{2}$ and 1 more closely correspond to meta-analysis results and to the spectrum of expert opinions than do percentiles derived from the priors implicit in the Haldane or Laplace estimators. For example, the upper 90th prior percentiles for the odds ratio under normal(0,1) and normal(0, $\tfrac{1}{2}$) priors for $\beta$ are 3.6 and 1.9, whereas the upper 90th odds-ratio percentiles under the Haldane and Laplace priors are 40 and 9.
3. EXACT RESULTS FOR DICHOTOMOUS MATCHED-PAIR STUDIES
In the case of a matched-pair study of a dichotomous exposure with no covariates, it is easy to compute the bias of odds-ratio estimates directly from the exact conditional distribution of the discordant pairs (Jewell, 1984); there is no need for approximate or simulation studies. Let $u$ and $v$ be the numbers of discordant pairs with the case exposed and with the control exposed. The conditional distribution of $u$ is binomial with probability $\mathrm{expit}(\beta)$ and total $N = u + v$; the CML, Haldane, and Laplace odds-ratio estimates are $u/v$, $(u + \tfrac{1}{2})/(v + \tfrac{1}{2})$, and $(u + 1)/(v + 1)$ (Breslow and Day, 1980; Good, 1983; Jewell, 1984; Clayton and Hills, 1993); the Mantel–Haenszel and CML estimators are identical in this case. One must use an ad hoc redefinition of the CML estimator at $v = 0$ to give it a finite mean; following Jewell (1984), I equated it to the Haldane estimator when a zero occurred. The log-scale CML bias correction simplifies to
$$\hat b = (u - v)^2 / (2uvN) \qquad (1)$$
so the corrected CML estimate is $(u/v)e^{-\hat b}$ (again, with ad hoc replacement by the Haldane estimator when a zero occurred). The posterior mode of the log odds ratio $\beta$ under a normal(0, $\tau^2$) prior is the solution $\tilde\beta$ of
$$\beta = [u - N \cdot \mathrm{expit}(\beta)]\tau^2. \qquad (2)$$
I here evaluate $e^{\tilde\beta}$ with $\tau^2 = 1$ as a ‘Bayes point estimator’ of the odds ratio, because it is a special case of logistic penalized-likelihood estimators studied elsewhere (Breslow and Clayton, 1993; Greenland, 1997; Breslow et al., 1998).
Table 1 presents the exact expectations of the above estimators under various scenarios. When the true odds ratio was small (2 or less), the Bayes-normal(0,1) estimator appeared least biased, though little different from the Laplace estimator, but was severely overcorrected (biased downward) for odds ratios of 4 or more. Excepting the rather extreme case of $N = 8$, $\omega = 8$, the Laplace estimator was nearly unbiased in all cases examined. As expected, the uncorrected CML estimator had considerable bias even when the true odds ratio was 1, and the corrected-CML and Haldane estimators were arithmetically undercorrected.
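The posterior mode under a normal(0, $\tau^2$) prior, defined by equation (2) of Section 3, has no closed form, but it is easy to obtain numerically. The following is a minimal sketch using Newton's method on the derivative of the log posterior; the function name `posterior_mode`, the starting value of zero, the stopping rule, and the example counts are assumptions made for illustration only.

```python
from math import exp

def expit(eta):
    """Logistic transform (1 + e^-eta)^-1."""
    return 1.0 / (1.0 + exp(-eta))

def posterior_mode(u, n, tau2=1.0, tol=1e-12, max_iter=100):
    """Solve beta = [u - n*expit(beta)] * tau2 by Newton's method.

    u:    discordant pairs with the case exposed
    n:    total number of discordant pairs
    tau2: variance of the normal(0, tau2) prior on the log odds ratio
    """
    beta = 0.0
    for _ in range(max_iter):
        p = expit(beta)
        score = u - n * p - beta / tau2                # derivative of the log posterior
        curvature = -n * p * (1.0 - p) - 1.0 / tau2    # second derivative, always negative
        step = score / curvature
        beta -= step
        if abs(step) < tol:
            break
    return beta

# Hypothetical counts: all 8 discordant pairs have the case exposed, so u/v is infinite.
beta_tilde = posterior_mode(u=8, n=8, tau2=1.0)
bayes_or = exp(beta_tilde)   # finite 'Bayes point estimate' of the odds ratio
```

Because the log posterior is strictly concave, the iteration has a unique solution to find, and the estimate stays finite even when one of the discordant counts is zero.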
Table 1. Expected values and percent probabilities of twice truth or more for odds-ratio estimators in a matched-pair study of dichotomous exposure.* OR = odds ratio, N = number of discordant pairs, CMLC = CML with ML bias correction, Bayes = Bayes estimator using normal(0,1) prior for β (see text)

                          Expected value                        % Probability ≥ 2 × true OR
True OR   N     CML    CMLC   Haldane  Laplace  Bayes     CML    CMLC   Haldane  Laplace  Bayes
1         8     1.4    1.3    1.3      1.2      1.1       14     14     14       14       4
1         16    1.2    1.1    1.1      1.1      1.1       11     11     11       11       4
1         24    1.1    1.1    1.1      1.1      1.1       8      3      3        3        3
1.2       8     1.8    1.6    1.6      1.4      1.3       21     21     21       6        6
1.2       16    1.4    1.4    1.4      1.3      1.3       8      8      8        8        3
1.2       24    1.3    1.3    1.3      1.3      1.3       8      3      3        3        3
1.5       8     2.3    2.1    2.0      1.7      1.5       32     11     11       11       2
1.5       16    1.8    1.8    1.7      1.6      1.5       17     7      7        7        2
1.5       24    1.7    1.7    1.6      1.6      1.5       10     4      4        4        1
2         8     3.2    2.9    2.8      2.2      1.8       20     20     20       20       4
2         16    2.6    2.4    2.4      2.2      1.9       17     6      6        6        1
2         24    2.3    2.3    2.2      2.1      2.0       6      6      6        6        2
4         8     6.4    5.7    5.6      3.8      2.6       17     17     17       17       0
4         16    6.2    5.3    5.2      4.2      3.1       14     14     14       14       0
4         24    5.4    4.8    4.7      4.2      3.4       11     11     11       3        0
8         8     9.9    9.2    9.1      5.5      3.4       39     39     39       0        0
8         16    12.5   10.7   10.7     7.2      4.4       15     15     15       15       0
8         24    12.6   10.4   10.5     7.9      5.1       24     6      6        6        0

*CML and CMLC set equal to Haldane when zero cell occurs.

The bias results are in good accord with those in Jewell (1984). As mentioned earlier, however, not everyone is comfortable with bias as a criterion for evaluating ratio estimators. Therefore, the table also presents the probabilities that the estimates will equal or exceed twice the true odds ratio. For the CML estimator, these upper-tail probabilities can remain appreciable even with a substantial number of discordant pairs, and only the Bayes estimator does consistently better by this criterion.
Evaluations were also made using arithmetic and logarithmic expected-squared error as performance criteria; in both, CML was worst and Laplace was best over all the cases shown. I also computed exact coverages of the approximate 95% Wald-type intervals centered on log odds-ratio estimators, as well as exact and score intervals, for the situations in Table 1. These results are not shown because all exhibited over 95% coverage in almost all the situations examined, although the Laplace correction produced by far the narrowest average width and closest to nominal coverage, with score intervals also doing well. Studies of intervals for binomial proportions have found that score intervals exhibit better performance than CML, Wald, likelihood-ratio, and even exact intervals; see Agresti and Coull (1998) for references. Interestingly, the latter authors observed that adding two to each cell count produced Wald intervals for p = expit(β) that performed nearly as well as the score intervals; this corresponds to using an approximate posterior interval for p derived from a beta(2,2) prior.
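The exact quantities reported in Table 1 require only a sum over the binomial distribution of the discordant pairs. The sketch below shows the computation for the CML, Haldane, and Laplace estimators; the function names are hypothetical, and the CML estimator is set equal to the Haldane estimator whenever a zero cell occurs, as in the table footnote. It is an illustration of the method under these assumptions rather than the program used to produce the table.

```python
from math import comb

def haldane(u, v):
    """Odds-ratio estimate after adding 1/2 to each discordant count."""
    return (u + 0.5) / (v + 0.5)

def laplace(u, v):
    """Odds-ratio estimate after adding 1 to each discordant count."""
    return (u + 1.0) / (v + 1.0)

def cml(u, v):
    """CML estimate u/v, replaced by the Haldane estimate when a zero cell occurs."""
    return haldane(u, v) if u == 0 or v == 0 else u / v

def exact_summary(estimator, true_or, n):
    """Exact mean of the estimator and probability that it is at least twice the truth.

    u is binomial with index n and probability true_or / (1 + true_or).
    """
    p = true_or / (1.0 + true_or)
    mean = 0.0
    tail = 0.0
    for u in range(n + 1):
        prob = comb(n, u) * p**u * (1.0 - p) ** (n - u)
        estimate = estimator(u, n - u)
        mean += prob * estimate
        if estimate >= 2.0 * true_or:
            tail += prob
    return mean, tail

# One scenario: true odds ratio 1 with 8 discordant pairs.
for name, estimator in (("CML", cml), ("Haldane", haldane), ("Laplace", laplace)):
    mean, tail = exact_summary(estimator, true_or=1.0, n=8)
    print(name, round(mean, 1), round(100 * tail))
```

Looping `exact_summary` over the odds ratios and values of N of interest gives the expected-value and tail-probability columns for these three estimators.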
Table 2. Case-specular pairs from analysis of back-yard electrical lines and childhood leukemia

                    Specular back-yard lines
Case:          3-phase    Secondary    None
3-phase        15         24           11
Secondary      11         107          9
None           0          1            81
4. AN EXAMPLE
A case-specular study involves case-control pairs in which the ‘case’ is a case house and the ‘control’ is a reflection of the case house across the street (Zaffanella et al., 1998); under certain assumptions, ordinary matched-pair likelihoods can be used to analyze such data (Greenland, 1999). Table 2 gives data from a case-specular study of electrical wiring and childhood leukemia (Ebi et al., 1999). Of the 259 pairs available for this example, only 56 were discordant, and only one of these pairs had a case with no back-yard power line.
Represent line type by two indicators, $t_1$ for 3-phase line (1 = yes, 0 = no) and $t_2$ for secondary line, and let $\omega(t_1, t_2)$ be the ratio of leukemia odds at exposure $(t_1, t_2)$ versus (0,0) within matching strata. The usual conditional-logistic model for the regression of leukemia risk on $(t_1, t_2)$ is equivalent to the conditional (stratum-specific) odds-ratio model
$$\omega(t_1, t_2) = \exp(\beta_1 t_1 + \beta_2 t_2).$$
Row 1 of Table 3 gives the CML odds-ratio estimates (with 95% Wald confidence limits) from fitting this model to the example data. The intervals fall above the range of estimates obtained from other studies of wiring and leukemia, and both point estimates are at least ten times what one would expect based on all the evidence to date (including twenty or so other epidemiologic studies, most of them larger than this one) (Portier and Wolfe, 1998; Greenland et al., 2000b).

Table 3. Odds-ratio estimates for 3-phase and secondary back-yard power-line exposure, from case-specular analysis of childhood leukemia. CML = conditional maximum likelihood

Method                      3-phase           Secondary
1. CML                      32 (4.0,253)      14 (1.8,107)
2. Exact*                   30 (4.5,1328)     14 (2.1,507)
3. CML moving one pair      16 (3.4,72)       6.8 (1.5,30)
4. CML bias corrected†      19 (3.6,105)      8.7 (1.7,45)
5. Haldane‡                 16 (3.5,78)       7.4 (1.6,34)
6. Laplace§                 11 (2.9,43)       5.2 (1.4,19)
7. Bayes β ∼ N(0,1)         12 (3.5,40)       4.9 (1.7,14)
8. Bayes β ∼ N(0,1/2)       8.6 (3.0,25)      3.6 (1.6,8.5)
9. Pairing ignored          2.4 (1.4,4.1)     1.2 (0.81,1.7)

* Modified CML point estimates and exact limits from LogXact
† Using approximate bias correction for ML estimates
‡ Add 1/2 to each cell and renormalize
§ Add 1 to each cell and renormalize

While epidemiologic validity problems may have contributed to the apparent exaggeration of the estimates, the data are uninformative about those problems. We can, however, examine the extent to which this appearance depends on the analysis method. Row 2 of Table 3 provides the results from an exact logistic-regression software program. The point estimates are hardly different from the CML estimates because they are in fact only slightly modified CML estimates (LogXact, 1993). More disturbing is the fact that the exact limits appear even more exaggerated than the CML Wald limits. The exact limits are known to cover at or above the nominal rate if there are no epidemiologic biases (Breslow and Day, 1980), and so suggest no exaggeration in the CML intervals. Nonetheless, the results are extraordinarily unstable. Row 3 of Table 3 shows the impact on the CML results of reclassifying as unexposed just one of the eleven cases in the secondary/3-phase cell. This minor change puts one pair in the empty cell in Table 2, and halves the estimates. Conversely, reclassifying as exposed the single unexposed case in a discordant pair makes the CML estimates infinite.
Rows 4–6 of Table 3 give the ML-bias corrected, Haldane, and Laplace estimates. The two logarithmic corrections reduce the estimates by about half while the Laplace correction reduces the estimates by about two-thirds. Nonetheless, the estimates still appear implausibly large relative to previous studies.
Row 7 of Table 3 gives approximate posterior medians and 95% intervals derived from the second derivative of the log posterior density, based on a bivariate normal prior for $\beta_2$, $\beta_1 - \beta_2$ in the odds-ratio model above, with prior means of zero, prior variances of 1, and a prior correlation 0.5; $\beta_2$ and $\beta_1 - \beta_2$ are the log odds ratios comparing secondary to no line and 3-phase to secondary. The prior variance for $\beta_1$ is 1 + 1 + 2(0.5) = 3, which yields an upper 90th prior percentile for the odds ratio $e^{\beta_1}$ comparing 3-phase to no line of 9.2. The results resemble the Laplace estimates, but with narrower intervals; this narrowing is as expected, given the lighter tails of the normal prior in comparison to the Laplace prior. Row 8 is derived using the same prior means and correlation, but with prior variances of $\tfrac{1}{2}$. This change implies prior variance of 1.5 for $\beta_1$ and an upper 90th prior percentile for $e^{\beta_1}$ of 3; although the results are still implausibly large, their magnitude is easily attributable to random error and (not unlikely) other sources of bias.
A referee suggested examining the estimates obtained by breaking the pairing and using the crude
unmatched data. These are presented in row 9 of Table 3. Because of the strong positive association of the
pair exposures, collapsing across pairs produces estimates that are less than a tenth of the CML estimates; the results
are also much more precise and consistent with the literature. The latter consistency may largely reflect a
fortuitous cancellation of biases, for the crude (collapsed) odds ratio is known to be biased toward the null
when the pair exposures are positively correlated (Siegel and Greenhouse, 1973). Nonetheless, the crude
odds ratio also has lower variance, which has led some authors to suggest averaging the stratified and
crude estimators to minimize expected squared error (Liang and Zeger, 1988; Kalish, 1990; Greenland,
1991). In the present example, the tremendous drop in the odds ratio upon collapsing is just what one
should expect given the extremely high correlation of the exposure (line type) with the main matching
factors (neighborhood and housing type) implicit in the use of specular controls.
For a more detailed discussion of this example and similar bias in a conventional matched case-control
study, see Greenland et al. (2000a).
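The CML fit of the two-indicator odds-ratio model to the discordant cells of Table 2 can be sketched with a few lines of Newton-Raphson, treating each pair as a no-intercept logistic observation with regressor $d = x_1 - x_0$. The names `TABLE2_PAIRS` and `fit_cml`, the starting values, and the stopping rule are assumptions for illustration, and the code is hard-wired to two coefficients. The printed odds ratios and Wald limits should land near row 1 of Table 3.

```python
from math import exp, sqrt

# Discordant cells of Table 2 as (case (t1, t2), specular (t1, t2), number of pairs),
# with t1 = 3-phase indicator and t2 = secondary indicator.
TABLE2_PAIRS = [
    ((1, 0), (0, 1), 24),  # case 3-phase,   specular secondary
    ((1, 0), (0, 0), 11),  # case 3-phase,   specular none
    ((0, 1), (1, 0), 11),  # case secondary, specular 3-phase
    ((0, 1), (0, 0), 9),   # case secondary, specular none
    ((0, 0), (1, 0), 0),   # case none,      specular 3-phase
    ((0, 0), (0, 1), 1),   # case none,      specular secondary
]

def fit_cml(pairs, max_iter=100, tol=1e-10):
    """Newton-Raphson for the two-coefficient matched-pair conditional likelihood."""
    b1 = b2 = 0.0
    for _ in range(max_iter):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for case, spec, n in pairs:
            d1, d2 = case[0] - spec[0], case[1] - spec[1]   # d = x1 - x0
            p = 1.0 / (1.0 + exp(-(d1 * b1 + d2 * b2)))     # expit(d'beta)
            g1 += n * d1 * (1.0 - p)                        # score contributions
            g2 += n * d2 * (1.0 - p)
            w = n * p * (1.0 - p)                           # information weight
            h11 += w * d1 * d1
            h12 += w * d1 * d2
            h22 += w * d2 * d2
        det = h11 * h22 - h12 * h12
        s1 = (h22 * g1 - h12 * g2) / det                    # Newton step H^-1 g
        s2 = (h11 * g2 - h12 * g1) / det
        b1, b2 = b1 + s1, b2 + s2
        if abs(s1) + abs(s2) < tol:
            break
    se1, se2 = sqrt(h22 / det), sqrt(h11 / det)             # Wald standard errors
    return (b1, se1), (b2, se2)

for label, (b, se) in zip(("3-phase", "secondary"), fit_cml(TABLE2_PAIRS)):
    print(label, round(exp(b), 1), round(exp(b - 1.96 * se), 1), round(exp(b + 1.96 * se), 1))
```

Re-running `fit_cml` after moving a single pair between cells gives a direct check on how sensitive the estimates are to one observation.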
5. DISCUSSION
The present paper has focused on situations in which there are too few pairs to support CML estimation of even one parameter. The problems can become more acute in multiple logistic regression. These problems, formalized as sparse-data inconsistency, have long been recognized in unconditional ML estimators, and in fact CML estimators were developed to address these problems (Breslow and Day, 1980; Breslow, 1981). Unfortunately, an analogous problem occurs in CML estimators when pair-counts are sparse.
The formal equivalence of the matched-pair conditional likelihood to an unconditional likelihood allows one to map results for the latter to the former. As an example, consider a matched-pair study of an indicator $x$ in which the investigator wishes to control an unmatched nominal covariate $z$ whose number of levels increases at the same rate as the total number of pairs $M$. Entering this covariate into the conditional logistic model as a series of indicators (dummy variables) will then produce a conditional likelihood with $O(M)$ nuisance parameters (the $z$ indicator coefficients), from which it follows by arguments parallel to those in Breslow (1981) that the CML estimator $\hat\beta$ of the $x$ coefficient will be inconsistent. Because of the formal equivalence of conditional logistic and Cox-model partial likelihoods, the same type of problem can afflict proportional-hazards analyses, although the bias would not be as severe because each failure (case) would be matched to many nonfailures at each failure time.
The example in Table 2 may seem extreme, but studies reporting similarly large odds ratios based on sparse matched or stratified data are not uncommon, especially in analyses in which many covariates are entered in the conditional logistic model or in which the data are divided into small subgroups (for examples, see Daling et al., 1994; Witte et al., 1994; Abenhaim et al., 1996; Feychting et al., 1998; Schwartzbaum et al., 1998). Such large reported estimates should call attention to potential bias problems.
Of perhaps greater concern, however, is the possibility of unnoticed small-sample bias in modest, plausible results. Uncontrolled study biases, like selection bias, misclassification, and residual confounding, can easily make the odds-ratio parameter $e^{\beta}$ equal to 1.2 or even 1.5 when no underlying causal effect is present (Kelsey et al., 1996; Rothman and Greenland, 1998). As apparent from Table 1, small-sample bias can then operate on this biased parameter to generate CMLEs of 2 or more, which seem less plausibly explained by study biases. The contribution of such synergistic bias effects to the generation of controversial results may be considerable when most studies have few exposed cases.
Another potential for harmful synergy can arise from unnecessary matching. If the matching factor is related only to the exposure, such overmatching increases the variance of the CML estimator of the odds ratio by reducing the number of discordant matched sets available for analysis (Miettinen, 1970; Thomas and Greenland, 1983). An additional consequence of this reduction is an increase in the small-sample bias of the odds-ratio estimator. Some older writings on the impact of matching (e.g., Chase, 1968) did not encounter these problems because they focused on tests of the null hypothesis under random matching (which does not increase concordance) or focused on the difference in proportions, whose variance decreases as the pairwise correlation (and hence concordancy) increases, and which is exactly unbiased for the average pairwise difference in response probabilities. In case-control studies, however, the ‘response’ is exposure status, and so the response difference is of no direct interest.
The present paper concerns the poor behavior of CML odds-ratio estimators under conditions common in epidemiology (studies with few discordant matched pairs). This behavior does not have a simple relation to the behavior of tests of the null hypothesis. Consider the behavior of the Wald test for univariate $\beta$, treating $W = \hat\beta/\mathrm{SE}(\hat\beta)$ as a standard normal statistic for testing $\beta = 0$. $W$ exhibits quite different pathologies from $\hat\beta$. It has long been known that $W$ can eventually decline as $|\hat\beta|$ gets larger given a fixed sample size, due to the fact that $\mathrm{SE}(\hat\beta)$ can increase more rapidly than $\hat\beta$ as the latter increases. For example, with matched pairs, the CML odds ratio is $u/v$ and $\mathrm{SE}(\hat\beta) = (1/u + 1/v)^{1/2}$, so $N = 24$ discordant pairs and $u = 22$ yield $e^{\hat\beta} = 22/2 = 11$ and $W = \ln(11)/(1/22 + 1/2)^{1/2} = 3.25$, whereas $N = 24$ and $u = 23$ yield $e^{\hat\beta} = 23$ and $W = \ln(23)/(1/23 + 1/1)^{1/2} = 3.07$. Thus, $W$ declines as $e^{\hat\beta}$ explodes. This type of behavior can result in the power of the Wald test dropping as $|\beta| \to \infty$ given fixed $N$ (Hauck and Donner, 1977; Vaeth, 1985).
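The Wald-statistic arithmetic in the matched-pair example of Section 5 (N = 24 discordant pairs) can be reproduced directly. This is a small sketch; the function name `wald_statistic` is hypothetical, and both discordant counts are assumed to be nonzero.

```python
from math import log, sqrt

def wald_statistic(u, v):
    """Wald statistic for beta = 0 from u and v discordant pairs (both nonzero)."""
    beta_hat = log(u / v)              # CML log odds ratio
    se = sqrt(1.0 / u + 1.0 / v)       # its estimated standard error
    return beta_hat / se

# N = 24 discordant pairs: W rises with u and then falls as u approaches N.
for u in (18, 20, 22, 23):
    print(u, 24 - u, round(wald_statistic(u, 24 - u), 2))
```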
6. RECOMMENDATIONS
The simplest diagnostic for small-sample or sparse-data problems is close tabular examination of basic data. In the above example, the possibility of small-sample artifacts did not occur to the co-investigator who first presented the CML odds ratios to the research team, simply because the total number of pairs (259) seemed quite large. Even a more sophisticated summary, noting there are fifty-six discordant pairs (twenty-eight ‘informative’ pairs per parameter), would not have signaled problems. Only the full pair table (Table 2) shows the pair sparsity.
Full tabulation may seem impractical or unreliable when multiple covariates (some perhaps continuous) are entered in the model. A crude rule of thumb, adapted from an oft-cited rule for unconditional logistic regression (Peduzzi et al., 1996), would require at least ten discordant matched sets per estimated parameter. This rule, however, fails dramatically in the above example. I therefore suggest that a Bayesian or (more generally, when applicable) a hierarchical Bayes analysis can serve as a diagnostic, in the following sense: if the results change dramatically between a CML analysis and a Bayesian analysis with scientifically reasonable priors, one at least has a warning of severe data limitations. This use of a Bayesian analysis need entail no commitment to the Bayesian results over the CML results, but may serve to temper reliance on the CML results in formulating conclusions. For this purpose, simple approximate fitting methods may suffice (Greenland, 1993b; Witte and Greenland, 1996; Greenland, 1997; Breslow et al., 1998), although even these are not invulnerable to sparse-data bias (Neuhaus and Segal, 1997).
A fully Bayesian analysis with scientifically sensible priors is of course the Bayesian solution to the sample-size problem, provided one uses a fitting method appropriate for small samples. Frequentists might argue instead in favor of formal bias corrections or exact analysis. Formal bias corrections for the multiple-regression case are currently only available for the coefficients, which, as argued above, are not the final parameters of interest for public-health purposes. Exact analysis has more serious shortcomings. Exact intervals are constructed to ensure at least nominal coverage of the true parameter value. This assurance extends to all parameter values, no matter how absurdly large. The cost is that exact intervals tend to expand to values beyond the CML intervals, driving them even further from the Bayesian posterior intervals than the CML intervals. They thus can be even more misleading than the CML intervals when (as seems inevitable) they are interpreted as posterior intervals by the consumer. On the practical side, the capacity of exact programs remains limited, despite remarkable computing advances.
Regardless of how one chooses to deal with it, the potential for small-sample bias in results from asymptotic procedures needs to be checked more routinely than is current practice. The development of easily programmed sample-size diagnostics for commercial software would be of particular value; formal bias corrections might serve well in this role.
ACKNOWLEDGEMENTS
The author thanks Kris Ebi, David Savitz, and Luciano Zaffanella for use of the example data, and the referees for helpful comments.
REFERENCES
ABENHAIM, L., MORIDE, Y., BRENOT, F., RICH, S., BENICHOU, J., KURZ, X., HIGENBOTTAM, T., OAKLEY, C., WOUTERS, E., AUBIER, M., et al. (1996). Appetite-suppressant drugs and the risk of primary pulmonary hypertension. New England Journal of Medicine 335, 609–616, Table 3.
AGRESTI, A. AND COULL, B. A. (1998). Approximate is better than ‘exact’ for interval estimation of binomial proportions. American Statistician 52, 119–126.
ANDERSON, J. A. AND RICHARDSON, S. C. (1979). Logistic discrimination and bias correction in maximum likelihood estimation. Technometrics 21, 71–78.
B ECKER, S. (1989). A comparison of maximum likelihood and Jewell’s estimators of the odds ratio and relative risk
in single 2 × 2 tables. Statistics in Medicine 8, 987–996.
B ISHOP, Y. M. M., F IENBERG , S. E. AND H OLLAND, P. W. (1975). Discrete Multivariate Analysis: Theory and
Practice. Cambridge, MA: MIT Press.
B RESLOW, N., L EROUX , B. AND P LATT, R. (1998). Approximate hierarchical modelling of discrete data in epidemiology. Statistical Methods in Medical Research 7, 49–62.
B RESLOW, N. E. (1981). Odds ratio estimators when the data are sparse. Biometrika 68, 73–84.
B RESLOW, N. E. AND C LAYTON, D. G. (1993). Approximate inference in generalized linear mixed models. Journal
of the American Statistical Association 88, 9–25.
B RESLOW, N. E. AND DAY, N. E. (1980). Statistical Methods in Cancer Epidemiology. I. The Analysis of CaseControl Studies. Lyon: IARC.
B YTH , K. AND M C L ACHLAN, G. T. (1978). The biases associated with maximum likelihood methods of estimation
of the multivariate logistic risk function. Communications in Statistics A7, 877–890.
C HASE, G. R. (1968). On the efficiency of matched pairs in Bernoulli trials. Biometrika 55, 365–369.
C LAYTON , D. AND H ILLS, M. (1993). Statistical Models in Epidemiology. New York: Oxford University Press.
C ORDEIRO , G. M. AND M C C ULLAGH, P. (1991). Bias correction in generalized linear models. Journal of the Royal
Statistical Society B 53, 629–643.
DALING, J. R., M ALONE, K. E., VOIGT, L. F., W HITE , E. AND W EISS, N. S. (1994). Risk of breast cancer among
young women: relationship to induced abortion. Journal of the National Cancer Institute 86, 1584–1592.
E BI, K. L., Z AFFANELLA , L. E. AND G REENLAND, S. (1999). Application of the case-specular method to two
studies of wire codes and childhood cancers. Epidemiology 10, 398–404.
F EYCHTING, M., F ORSSEN, U., RUTQUIST, L. E. AND A HLBOHM, A. (1998). Magnetic fields and breast cancer in
Swedish adults residing near high-voltage power lines. Epidemiology 9, 392–397.
G OOD, I. J. (1983). Some history of the hierarchical Bayesian methodology. In Good Thinking ed. Good, I.J. Chapter 9, 95–105. Minneapolis, MN: University of Minnesota Press.
G REENLAND, S. (1991). Reducing mean squared error in the analysis of stratified epidemiologic studies. Biometrics
47, 773–775.
GREENLAND, S. (1993a). A meta-analysis of coffee, myocardial infarction, and sudden coronary death. Epidemiology
4, 366–374.
GREENLAND, S. (1993b). Methods for epidemiologic analyses of multiple exposures: A review and a comparative
study of maximum-likelihood, preliminary testing, and empirical-Bayes regression. Statistics in Medicine 12, 717–736.
GREENLAND, S. (1997). Second-stage least squares versus penalized quasi-likelihood for fitting hierarchical models
in epidemiologic analysis. Statistics in Medicine 16, 515–526.
GREENLAND, S. (1999). A unified approach to the analysis of case-distribution (case-only) studies. Statistics in
Medicine 18, 1–15.
GREENLAND, S., SCHWARTZBAUM, J. A. AND FINKLE, W. D. (2000a). Problems due to small samples and sparse
data in conditional logistic regression analysis. American Journal of Epidemiology 151, in press.
GREENLAND, S., SHEPPARD, A. S., KAUNE, W. T., POOLE, C. AND KELSH, M. A. (2000b). A pooled analysis of
magnetic fields, wire codes, and childhood leukemia. Epidemiology 11, in press.
HAUCK, W. W. AND DONNER, A. (1977). Wald’s test as applied to hypotheses in logit analysis. Journal of the
American Statistical Association 72, 851–853.
JEWELL, N. P. (1984). Small-sample bias of point estimators of the odds ratio from matched sets. Biometrics 40,
421–435.
JEWELL, N. P. (1986). On the bias of commonly used measures of association for 2×2 tables. Biometrics 42, 351–358.
KALISH, L. A. (1990). Reducing mean-squared error in the analysis of pair-matched case-control studies. Biometrics
46, 493–499.
KELSEY, J. L., WHITTEMORE, A. S., EVANS, A. S. AND THOMPSON, W. D. (1996). Methods in Observational
Epidemiology. New York: Oxford University Press.
KRAUS, A. S. (1960). Comparison of a group with disease and a control group from the same families, in search of
possible etiologic factors. American Journal of Public Health 50, 303–311.
[Received June 28, 1999. Revised October 25, 1999]
LEONARD, T. AND HSU, J. S. J. (1994). The Bayesian analysis of categorical data: a selective review. In Aspects of
Uncertainty ed. Freeman, P. R. and Smith, A. F. M. Chapter 18, 283–310. New York: Wiley.
LIANG, K.-Y. AND ZEGER, S. L. (1988). On the use of concordant pairs in matched case-control studies. Biometrics
44, 1145–1156.
LIU, K.-J. (1989). A note on the estimate of the relative risk when sample sizes are small (letter). Biometrics 45,
1030–1031.
LOGXACT (1993). Cambridge, MA: Cytel.
MIETTINEN, O. S. (1970). Matching and design efficiency in retrospective studies. American Journal of Epidemiology
91, 111–118.
MORGENSTERN, H. AND GREENLAND, S. (1990). Graphing ratio measures of effect. Journal of Clinical Epidemiology 43, 539–542.
NEUHAUS, J. M. AND SEGAL, M. R. (1997). An assessment of approximate maximum likelihood estimators in
generalized linear models. In Modelling Longitudinal and Spatially Correlated Data: Methods, Applications, and
Future Directions ed. Gregoire, T.G., Brillinger, D. R., Diggle, P. J., Russek-Cohen, E., Warren, W. G. and Wolfinger, R. D. 11–22. New York: Springer.
PEDUZZI, P., CONCATO, J., KEMPER, E., HOLFORD, T. R. AND FEINSTEIN, A. R. (1996). A simulation study of the
number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology 49, 1373–1379.
PORTIER, C. J. AND WOLFE, M. S. (1998). Assessment of Health Effects from Exposure to Power-line Frequency
Electric and Magnetic Fields. Research Triangle Park, NC: National Institute of Environmental Health Sciences.
ROTHMAN, K. J. AND GREENLAND, S. (1998). Modern Epidemiology (2nd edn). Philadelphia: Lippincott-Raven.
SCHAEFER, R. L. (1983). Bias correction in maximum-likelihood logistic regression. Statistics in Medicine 2, 71–78.
SCHWARTZBAUM, J. A., FISHER, J. L. AND CORNWELL, D. G. (1998). Role of dietary energy and cured meat
consumption in adult glioma risk (abstract). American Journal of Epidemiology 147, S7.
SIEGEL, D. G. AND GREENHOUSE, S. W. (1973). Validity in estimating relative risk in case-control studies. Journal
of Chronic Diseases 42, 687–688.
THOMAS, D. C. AND GREENLAND, S. (1983). The relative efficiencies of matched and independent sample designs
for case-control studies. Journal of Chronic Diseases 36, 685–697.
VAETH, M. (1985). On the use of Wald’s test in exponential families. International Statistical Review 53, 199–214.
WALTER, S. D. AND COOK, R. J. (1991). A comparison of several point estimators of the odds ratio in a single 2 × 2
contingency table. Biometrics 47, 795–811.
WITTE, J. S. AND GREENLAND, S. (1996). Simulation study of hierarchical regression. Statistics in Medicine 15,
1161–1170.
WITTE, J. S., GREENLAND, S., HAILE, R. W. AND BIRD, C. L. (1994). Hierarchical regression analysis applied to
a study of multiple dietary exposures and breast cancer. Epidemiology 5, 612–621.
ZAFFANELLA, L. E., SAVITZ, D. A., GREENLAND, S. AND EBI, K. L. (1998). The residential case-specular method
to study wire codes, magnetic fields, and disease. Epidemiology 9, 16–20.