Small-Sample Adjustments for Wald-Type Tests Using Sandwich Estimators Author(s): Michael P. Fay and Barry I. Graubard Source: Biometrics, Vol. 57, No. 4 (Dec., 2001), pp. 1198-1206 Published by: International Biometric Society Stable URL: http://www.jstor.org/stable/3068253 Accessed: 02/07/2009 11:27 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=ibs. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with the scholarly community to preserve their work and the materials they rely upon, and to build a common research platform that promotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected]. International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to Biometrics. http://www.jstor.org BIOMETRICS 57, 1198-1206 December 2001 Small-Sample Adjustments for Wald-Type Tests Using Sandwich Estimators Michael P. Fay National Cancer Institute, 6116 Executive Boulevard, Suite 504, MSC 8317, Bethesda, Maryland 20892-8317, U.S.A. email: [email protected] and Barry I. Graubard National Cancer Institute, 6120 Executive Boulevard, EPS Room 8032, Bethesda, Maryland 20892-7335, U.S.A. SUMMARY. The sandwich estimator of variance may be used to create robust Wald-type tests from estimating equations that are sums of K independent or approximately independent terms. For example, for repeated measures data on K individuals, each term relates to a different individual. These tests applied to a parameter may have greater than nominal size if K is small, or more generally if the parameter to be tested is essentially estimated from a small number of terms in the estimating equation. We offer some practical modifications to these robust Wald-type tests, which asymptotically approach the usual robust Wald-type tests. We show that one of these modifications provides exact coverage for a simple case and examine by simulation the modifications applied to the generalized estimating equations of Liang and Zeger (1986), conditional logistic regression, and the Cox proportional hazard model. KEY WORDS: Conditional logistic regression; Cox proportional hazards model; Generalized estimating equations; Robust Wald statistics; Sandwich estimator; Small sample size. 1. Introduction The sandwich estimator of variance has been used with many different types of data to provide inferences robust to certain model misspecifications. We mention three examples. Liang and Zeger (1986), using their generalized estimating equations (GEEs), applied sandwich estimators of variances to repeated measures data. Lin and Wei (1989) applied sandwich estimators to the Cox proportional hazards model, and Fay et al. (1998) applied them to conditional logistic regression. Each of these applications use estimating equations with K independent orapproximately independent terms. The Wald-type tests used with the sandwich estimator are valid as K goes to infinity. For finite samples simulations have shown that these (unadjusted) Wald sandwich tests tend to be liberal (see Lin and Wei, 1989; Emrich and Piedmonte, 1992; Gunsolley, Getchell, and Chinchilli, 1995; Fay et al., 1998; and Mancl and DeRouen, 2001). In this paper we consider finite sample adjustments for Wald sandwich tests and apply them to quite general estimating equations, which include the three applications mentioned above. Consider the data that motivated this research. Fay et al. (1997) performed a meta-analysis to determine the effect of different types of dietary fat on the development of mammary tumors in rodents. The analysis combined 146 exper- iments ("sets" in the terminology of Fay et al., 1997). Each experiment had two or more groups of animals, each group received the same intervention except for diets, and the response was the number of animals in the group that developed tumors. Although within an experiment there were only diet differences between the groups, between experiments there were many differences. For example, the type and/or dose of carcinogen and the amount of follow-up time varied between experiments. Thus, as detailed in Fay et al. (1998), a conditional logistic regression was used to condition out intercept differences between experiments, and a sandwich estimator of variance was used to account for heterogeneity of the diet parameters. A random effects model for the diet parameters was not tractable because each of the diet effects was not varied within each experiment. For example, Fay et al. (1997) measured effects of four types of fat, but one of the types of fats, n-3 polyunsaturated fatty acid (n-3 PUFA), was rarely present except in very small portions. Only 22 of the experiments had any dietary groups with greater than 1% of the dietary fat equal to n-3 PUFA. Thus it is difficult to estimate a random effect for the n-3 PUFA parameter in all but these 22 experiments. The important aspect of these data for this paper is that although there is a large sample size of independent observations, i.e., 146 experiments that make up 1198 Small-Sample Adjustments for Wald-Type Tests the terms in the estimating equation, there is an effectively small sample size for testing the n-3 PUFA parameter. When a parameter is primarily estimated from a small proportion of the terms in the estimating equations (e.g., the n-3 PUFA parameter in the diet data), we call the estimating equations unbalanced for that parameter. We reanalyze the diet data in section 3.3. We propose two types of adjustments. First, we use Taylor series approximations to adjust for the bias of the sandwich estimator of variance. Mancl and DeRouen (2001) provide such an adjustment for GEEs and compare it to many earlier bias corrections (see that paper for earlier references). In this paper we propose a bias correction similar to that of Mancl and DeRouen (2001), but our correction may be applied to more general estimating equations than the GEE. Our second adjustment is to use an F (or t) distribution instead of a chi-square (or a normal) distribution to calculate significance. Others have used this adjustment with the canonical use of 1 and K - p degrees of freedom, where p is the number of parameters describing the mean (see Lipsitz et al., 1994, and Mancl and DeRouen, 2001, for the GEE case). As seen for the diet example above, these canonical degrees of freedom may not be appropriate for estimating equations that are unbalanced for certain parameters. This problem was noted by Chesher and Austin (1991) for the simple linear model. We propose an estimator of degrees of freedom that produces different degrees of freedom depending on which parameter (or combination of parameters) is being tested. These estimators can more properly adjust for situations such as the diet example. Similar but not equivalent small sample adjustments have been proposed for some special cases (see Fay et al., 1998, for conditional logistic regression; Lipsitz and Ibrahim (1999) for linear regression; and Fai and Cornelius (1996) for unbalanced split-plot experiments). Because of the way we motivate our degrees of freedom estimators, we restrict ourselves to tests of linear combinations of parameters. In other words if p is the parameter vector, then we are restricted to tests with null hypotheses equal to CT/ = CTo0, where 3o is known and C is a vector of constants (i.e., C is not allowed to be a matrix). In section 2.1 we review the use of the sandwich estimator of variance and its associated Wald test. In section 2.2 we propose our bias correction and compare it to that of Mancl and DeRouen (2001), and in section 2.3 we propose some degrees of freedom estimators. In section 3 we examine one special case where one of the modifications produces an exact test, and we perform simulations to test these modifications when used with GEEs, conditional logistic regression, and Cox proportional hazards models. 2. Main Result 2.1 Background Consider estimating equations of the form, SK1 Ui(/3) = 0, where 3 is a p x 1 parameter vector. Let / be the solution to the estimating equations. Let Ui = Ui(/), and let a hat over any function denote evaluation at 3 (e.g., Ui Ui(3)). Assume E{ iK_1 Ui(/3)} = 0 for some 3o, and both cov{Ui (3o), Uj3(3o)} - 0 for i ? j and -30 -- 0 as K -- oo. We use a Taylor series approximation about / but replace the 1199 derivative by an estimator, Ui - Ui i ( (1) -), where Qi is an estimator of -dUi//3 evaluated at 3. Summing over all clusters and rearranging terms gives / -/3 i Vm (2) I where Vm == (K 1 Qi)-l and will be called the "modelbased" variance. If Vm is approximately constant with small changes in /3, then the variance of 3 - 3 may be estimated with the sandwich estimator, Vs = Vm(EK1 UiUT)Vm. We consider hypotheses of the form Null: CT/ = CT3o vs. Alternative: CTP3 CT 30, where C is a p x 1 vector of constants. For example, we test whether 3j = 0 or not by letting C be all zeros except for a 1 in the jth row and 0o = 0. The unadjusted Wald sandwich test rejects when T2 {T( - 3o)}2/CT VsC > (X2)-(1 - a), where (X2)-l(q) is the qth quantile of the chi-square distribution with one degree of freedom. 2.2 A Bias-Corrected Sandwich Covariance Estimator Mancl and DeRouen (2001) motivated a bias correction of Vs for the GEE case by using a first-order Taylor series expansion of the ith residual vector together with approximation (2) to approximate the expected value of the ith squared residuals. Analogously, we use a first-order Taylor series expansion of Ui (approximation [1]) together with approximation (2) to obtain E (UiU^) (Ip- QiVm) 'i (Ip- Qi Vm) ( + QjVm Z"'j \j ; I Vm,(3 (3) where i = cov(Ui) and Ip is a p x p identity matrix. For tractability, Mancl and DeRouen (2001) motivated their bias correction by assuming the the last term in their expression (4) is small [note there is a typographical error in expression (4) of Mancl and DeRouen, 2001; the cov(yi) in the last term should be cov(yj)], which is analogous to assuming here that the last term in expression (3) is small. We take a different approach to tractability. We consider a correction that is reasonable when the working variance model is approximately within a scale factor of the true variance, i.e., when 'i cQi for all i and some constant c and the model-based variance is consistent. For example, in the GEE case this would occur when the correlation model is correctly specified. When ji . c2i, approximation (3) T i(Ip - VmQi). simplifies to E(UiUi) (Ip - QiVm)qi To partially correct for this bias [i.e., E(UiUT) T 'i], we estimate Ii with i = HiUi Ui HiT, where Hi is given below and the form of the correction ensures that Ii is a symmetric nonnegative definite matrix. In general, (Ip - QiVm) is not may symmetric, so that the choice of Hi = (Ip - iVm)-/2 not exist. Instead, we propose the simple bias correction of letting Hi be a p x p diagonal matrix with jjth element equal to {1 - min(b, {fiVm}jj)}-1/2, where b < 1 is a constant 1200 Biometrics, defined by the user. Setting a bound, b, is a practical necessity to prevent extreme adjustments when the jjth element of QiVm is very close to 1. (In fact it is possible for the jjth element of QiVm to be greater than 1; see http://srab. cancer. gov/sandwich.) We arbitrarily use b = 0.75 for our simulations, which ensures that each diagonal element of Hi is less than or equal to 2. We write this adjusted sandwich estimator as Va = Vm (K-=1 HiUiUTHi) Vm. Although the bound 0.75 is arbitrary, the bound of b = 0.75 is rarely reached. For example, the simulations in section 3 gave almost exactly the same results (results not shown) when run without any bound (i.e., the bound is infinity). 2.3 An Approximate F-Distribution Let T2 be the Wald test statistic using Va instead of Vs. The main result motivated in this section is B {CT(f-_o)}2U T =CTVaC UTB1U (4) Fl,d, where ~ denotes approximate distribution under the null Bo and B1 are pK x pK hypothesis, UT = [UT ... U, matrices given in appendix A, F1,d is an F distribution with 1 and d degrees of freedom, and we give estimators of d later. Following a standard derivation of the F distribution, our F distribution approximation may be motivated by the following three approximate conditions: (1) cr-2UTBoU X1; (2) a-2dUTB1U Xd; and (3) UTBoU and UTB1U are approximately independent, where a2 = var{CT(/3 - 3o)}. We discuss each of these approximations in Appendix A. In order to estimate d we need estimators of qi for i = 1,.. , K. We estimate Ti in one of two ways. First, we simply use Ii as in section 2.2; this estimator tends to overestimate the heterogeneity of the Ti. One can see this by noting that even when all Fi are equal, the Ji vary. The associated estimator of d is dH= { ( ) {trace (Bi)}2 trace (B1IB1) (5) ' where I = block diagonal(Tl, * * IK) (see Appendix A). Alternatively, we propose the more complicated iV = Wi(Ke=l we))-1(_i= lj), where wi = CT{(ji Q)-1 - Vm}C and wi(E we)- represents the proportional reduction in the model-based variance of CT(3 - 3) because of the addition of the ith cluster. The idea behind the /i is that it smooths the extremely variable estimates Tj, j = 1,..., K, yet unlike the unweighted sum (i.e., using K-1 Z j to estimate each Ti) it still accounts for some of the differences in cluster variability associated with the working variance model. We estimate dH in a similar manner to dH by replacing 'Ji with 'i in equation (5). To see if the adjustment of section 2.2 is necessary, we estimate the distribution of Ts2 with an F distribution with 1 and d (equal to either d or d) degrees of freedom. Here d is calculated the same as dH except with the replacement of Hi with an identity matrix for all i. Similarly define d. We compare the different methods in section 3. December 2001 The adjustments of this section do not affect the asymptotic properties of the unadjusted Wald sandwich test. Under the assumption that the data are sufficiently regular such that i(EQj)as K -- 1 - 0 for all i and d -- oo, (Ip - iVm)1 - oo as K -* oo, then 0. Ip, and Va-Vs Further, if d -- oo, then the Fl,d distribution approaches a chi-square distribution with 1 degree of freedom. Thus even if the assumptions and approximations that motivate equation (4) are tenuous in some situations, the adjustments of this section may likely be an improvement over the unadjusted Wald sandwich test. 3. Simulations and Special Cases 3.1 Normal Model with the Same Design for Each Cluster Consider the (true) model Yi/ - N(X/, E), where Yi is an n x 1 vector of responses, X is an n x p full-rank design matrix, n > p, and E is a pxp covariance matrix. We use independence estimating equations (see Liang and Zeger, 1986), i.e., Ui = (XT(Yi - X/3), where q-1 is a scalar dispersion estimator. Under the null model for any nondegenerate E, X, and C, then T2 = {(K - 1)/K}T2 F1,K-1. Further, here the degrees of freedom estimators, dH and d, give K - 1, so that the test, say 65, that compares T2 with F1 , produces an exact test for any K (see Appendix B). In contrast, for K = 20, the standard sandwich test, 61 (T2 compared with X2) has size 0.071, 62 (T2 compared with F -) has simulated size 0.034, 63 (T2 compared with F i,) has size 0.055, and 64 (T2 compared with F1- ) has simulated size l,dH 0.031 where the simulations had 100,000 replications. The associated test statistic of Mancl and DeRouen (2001) is T/D = {(K - 1)/K}2 T2. For K = 20, comparing TMD to X1 has size 0.059, whereas comparing it to F1,K-p with p = 1 has size 0.045, and comparing it to F1,K-p with p = 2 has size 0.044. Thus in this case 65 performs best, but 63 and the tests of Mancl and DeRouen (2001) have close to nominal size. 3.2 Generalized Estimating Equations Consider GEEs of the form (see Liang and Zeger, 1986) V-1 Z= 1 Ei-U() DTUi()) (/,){Y - U(/P)}, where Yi is an ni x 1 vector of responses, /i is the model of E(Yi), Di = O,ai(P3)/o/3and Vi(3) estimates the "working variance" of Yi. The function Vi(/3) has a complicated form. [It is denoted Vi(3) in Liang and Zeger, 1986. See that paper for details.] Let our estimate of -OUi/O/ be Qi = DTvi-1Di, and Vs is the sandwich estimator proposed by Liang and Zeger (1986). To test the GEE models we simulated both Poisson and binomial data and tested the GEE model with both an independence and exchangeable working correlation within the cluster. For the Poisson case we simulated four types of data sets, all with K = 20 clusters. Each type of data set is denoted by one level of each of two descriptors, the variance (either Poisson or overdispersed Poisson), and the treatment assignment (either changed within the cluster or fixed within cluster). For the models that changed treatment assignments within the cluster we used the working independence variance, and for those that did not we used the working exchangeable variance. For the model-based variance we included a scalar overdispersion Small-Sample Adjustments for Wald-Type Tests 1201 Table 1 Simulations for poisson GEE: proportion rejected at a = 0.05 level (average length of confidence intervals) for 1000 simulations Test Distribution Test Variance (df) One Treatment Per Cluster Ri = Exchangeable T = Oa T = 1/2 Both Treatments for each Cluster Ri = Independence T= 0 T = 1/2 .091 (.424) .043 (.060) .330 (.115) .090 (.079) Vm 6m X2(1) .089 (.421) .075 (.058) .074 (.227) .103 (.079) Vs 61 X2(1) .052 .040 .028 .029 (.325) Vs 62 (.097) (.555) (.071) F(1,d) .067 .066 .053 .055 (.244) 63 Vs (.451) (.062) F(1,d) (.087) .042 .039 .022 .024 (.335) Va 64 (.570) (.073) F(1,dH) (.101) .059 (.091) .064 (.463) .046 (.064) .048 (.251) F(1, dH) Va 65 .041 (.098) .047 (.501) .043 (.066) .041 (.258) 66 VMD F(1, K- p) a Forthis column, 991 out of 1000 converged(see http: //srab. cancer. gov/sandwich for a non-converging data set); results relate to those 991 simulations. All other columns had 1000 convergedsimulations. Data are simulated by Yij Poisson(/ij), i = 1, ...,20; j = 1,...,nil + ni2 where tij = exp(log(10)+ xijbij), xij = 1 for j < nil, xij = -1 for j > nil, and bij are independent pseudonormal random variates with mean 0 and standard deviation r. All models of means are lij = exp(/30+ q3lxij). We test whether /1 = 0 and the average length of confidenceintervals are for P1 only. "Both treatments for each cluster" is nia = ceiling(Nia), for a = 1, 2, where Nia is distributed pseudo-Gammawith mean 10 and variance 20, and ceiling(X) gives the smallest integer greater than or equal to X. "Onetreatment per cluster" uses the same method for generating the nia, then sets ni2 = 0 for i < 10 and sets nil1 = 0 for i > 10. term as described in Liang and Zeger (1986); it is the default in the yags software that we modified to use for this simulation (the yags function written in Splus by V. Carey is reviewed in Horton and Lipsitz, 1999, and can be found at /index. ssoft. http: // biosunl. harvard. edu /carey html). The details of the data generation are listed with the results in Table 1. We see that 61 is liberal in all four cases, 63 is less liberal, whereas 65 appears to have values closer to the nomial level. The test 66 (T2D compared to F,K-p) appears to perform well, although perhaps slightly conservatively. The tests 62 and 64 can be very conservative, especially in the cases with both treatments in each cluster. We see in the last column of the first row that 6m may perform very poorly when the model is misspecified. All of the modified tests (62, 63, 64, 65, and 66) appear to have sizes that come closer to maintaining the nominal level than the standard sandwich test, and the average length of the confidence intervals give a crude measure of the price paid to achieve those sizes. We modeled the simulation for logistic regression after real data. Preisser and Qaqish (1999) have made available data from the Guidelines for Urinary Incontinence Discussion and Evaluation (GUIDE) study at http://www.phs.wfubmc. Preisser and Qaqish (1999) use edu/data/uipreiss.html. five covariates measured from a baseline survey to predict whether individual patients would respond yes or no as to whether they are bothered by accidental loss of urine. There were 137 patients from 38 practices in the study, and 54 patients responded that they were bothered by their urinary incontinence. We fit a logistic model without each of the five covariates in the model and used each of the five sets of fitted values as the true probability of response. We then simulated binary responses using those probabilities. Thus each of the five sets of simulated data came from a logistic model with one of the parameters equal to zero. We then tested whether that parameter equals zero. For this simulation we let the working correlation be the independence model. The results are presented in Table 2. In this case the model-based variance is correctly specified and 6m performs well and 61 appears consistently liberal, whereas 64 and 65 appear quite conservative, with 62 and 66 slightly less conservative. In this case, 63 appears to be the best of the sandwich tests. In Table 3 we list the sample variance of the parameter estimate and compare it to the mean of the variance estimates for the four variance estimates, Vm, Vs, VMD (Mancl and DeRouen's, 2001, adjustment), and Va. We bold the variance estimator with the mean closest to the sample variance of the parameter estimate. When the model is misspecified (see the fourth row), Vm underestimates the variance, but otherwise Vm does well. The estimator Vs tends to underestimate the variance. Neither of the corrected sandwich variance estimates, VMD and Va, appears to regularly outperform the other. In many situations both VMD and Va appear to overcorrect for bias. 3.3 Conditional Logistic Regression For conditional logistic regression, we have binary responses grouped into clusters, and within each cluster we condition on the total number of positive responses. The estimating equations can be written in terms of the sufficient statistic for 3, ti (i.e., Ui(/3) = ti - E(ti 1 3), where E(ti I 3) is the modeled value for ti given 3). Then fi(/3) = -OUi/O3. For details see, e.g., Fay et al. (1998). We repeated four sets of the simulations from Fay et al. (1998), which were motivated by a meta-analysis. We briefly describe the simulations using the same terminology as Fay et al. (1998); see that paper for more details. Each simulation 1202 December 2001 Biometrics, Table 2 Simulations for logistic independence estimating equations derived from GUIDE data: proportion rejected at a = 0.05 level (average length of confidence intervals) for 1000 simulations Test Var Test Distn (df) Model without FEMALE 6m 61 62 63 64 65 66 Vm Vs Vs Vs Va Va VMD X2(1) x2(1) F(1,d) F(1,d) F(1,dH) F(1,dH) F(1, K - p) .048 .078 .038 .051 .018 .026 .042 (2.58) (2.41) (3.06) (2.70) (3.47) (3.02) (2.87) Model without AGE .052 .067 .039 .057 .018 .029 .039 estimates 3 = [/3132]T from K clusters of 60 binary responses, and here we test whether P1 is equal to zero or not. Each of the four sets of simulations is described by both the "design" (either A or D) and the "case" (either 1 or 4) and consists of 1000 simulations. Design A has K = 20 clusters, and by design one-fourth of the clusters do not contribute the estimation of ,31 because of conditioning, whereas similarly design D has K = 40 clusters but 37/40 of the clusters do not contribute to the estimation of /1. The responses are all generated with a random intercept term for each cluster. For case 1 the effects for / are fixed, whereas for case 4 these effects are random. Thus the conditional logistic model is correctly specified for case 1 but not for case 4. All simulations are under the null hypothesis. The results are presented in Table 4. Note that design D shows that 61 can be very liberal in extreme cases. We return to the meta analysis of Fay et al. (1997) mentioned in the introduction. The six different test results that the effect of the n-3 PUFA is different from zero give two-sided p-values of p = 0.0134 for 6m, p = 0.3377 for 61, p = 0.4136 for 62, p = 0.3590 for 63, p = 0.4320 for 64, and p = 0.3824 (2.38) (2.32) (2.68) (2.46) (3.06) (2.74) (2.63) Model without DAYACC Model without SEVERE .049 .076 .044 .058 .031 .037 .040 .051 .062 .029 .049 .010 .017 .033 (.258) (.242) (.286) (.269) (.329) (.296) (.289) (1.42) (1.36) (1.62) (1.46) (1.87) (1.69) (1.56) Model without TOILET .055 .080 .040 .056 .016 .028 .042 (.350) (.330) (.405) (.375) (.636) (.551) (.408) for 65. If there is a random effect of the n-3 PUFA, the usual sandwich test, 61, gives better coverage than the model-based test, 6m, but judging from our simulations this may be slightly liberal. Based on our simulations, the modified sandwich tests (62,63,64, and 65) appear more likely to have nominal coverage. We recommend the use of either 63 or 65 for having better coverage than 61 while (usually) not being overly conservative. For details of the biological issues and the full model, see Fay et al. (1997). 3.4 Cox Proportional Hazards The Cox model uses the partial likelihood for right-censored failure time data. Let the associated efficient score statistic for /3 be written as ES=1 U (P3),where the summation is over the K individuals whose failure time may or may not be rightcensored (see Appendix C for details of the notation). The terms Ui are not independent, but Lin and Wei (1989) showed ' Ui (/) = that the estimating equation may be written as i= where the are Ui SK=1Ui(/3), asymptotically independent, Ui = Ui + Ui, and where EiN1 Ui'(3) = 0 for all /3. Then Table 3 Comparison of simulated variance estimators Sample Variance of 3i Poisson Ri=Exchangeable, 7 = Oa Ri=Exchangeable, r= 1/2 Ri=Independence, T = 0 Ri=Independence, T = 1/2 Mean of Mean of Mean of Vm Vs VMD Mean of Va GEE (Table 1) 0.00051 0.00043 0.00042 0.00056 0.00045 0.01410 0.00023 0.00405 0.01237 0.00024 0.00087 0.01201 0.00023 0.00376 0.01482 0.00025 0.00424 0.01264 0.00024 0.00398 Logistic Independence Estimating Equations (Table 2) 0.47065 0.43737 0.39234 0.51774 0.47954 Model without FEMALE 0.42700 0.40968 0.37511 0.35659 0.44090 Model without AGE 0.00434 0.00390 0.00521 0.00455 Model without DAYACC 0.00466 0.12274 0.12409 0.13186 0.15042 0.16223 Model without SEVERE 0.00806 0.00732 0.01050 0.01101 Model without TOILET 0.00898 a For this row, 991 out of 1000 converged;statistics calculated from only those 991 simulations. All other rows had 1000 convergedsimulations. The bold value in each row represents the closest mean to the sample variance of /i. Adjustments for Wald- Type Tests Small-Sample 1203 Table 4 Simulations for conditional logistic regression: proportion rejected at a = 0.05 level (average length of confidence intervals) for 1000 simulations Test Var Test Distn (df) Design A Case 1 Design A Case 4 Design D Case 1 6m 61 Vm Vs X2(1) X2(l) .049 (.577) .098 (.534) .430 (.601) .090 (1.36) .027 (2.60) .252 (1.85) .345 .253 (2.80) (4.02) 62 63 64 65 Vs Vs Va Va F(1,d) F(1,d) F(1,dH) F(1,dH) .042 .072 .033 .066 .045 .065 .038 .056 .044 .047 .025 .021 .054 .038 .030 .019 (12.00) (10.19) (16.33) (13.79) (.681) (.590) (.708) (.613) (1.74) (1.51) (1.82) (1.57) Design D Case 4 (5.35) (4.45) (6.92) (5.77) All simulations test whether i1 = 0, and average confidence interval lengths are for /1 only. In design A p1 is estimated from 15 clusters, whereas in design D 41 is estimated from three clusters. For case 1 the effects for p3 are fixed, whereas for case 4 these effects are random. Vm = -{IK aU /ao+UU/ao}o - (Et1 -aUa 4. Discussion We have examined four new modifications to the standard Wald test with a sandwich estimator of variance. These modifications were necessary because the standard sandwich test , /a)- and the sandwich estimator of Lin and Wei (1989) is Vs. Although we do not need to define Qi, our estimator of for the calculation of Vm and Vs, for our adjust-OUi/la/, ments we do need to define Qi for i = 1,..., K. To do this we use numerical derivatives similar to the method that Gail, is liberal when the parameter Lubin, and Rubinstein (1981) used to calculate U'/O3la. We let the abth element of Qi be {Qi(/3)}ab = (4h)-1{Uia(/ + hb) - Uia( - hb) + Uib(/3 + ha) - Uib(3 - ha)}, where to be tested is essentially Uia is the ath element of Ui, ha is a vector of length p with all zeros except the ath row, which has a value of h, and h is some small number. We use h = 0.0001 for our simulations. We performed simulations on three true models, (1) a proportional hazards model, (2) model 10 from Table 1 of Lin and Wei (1989) in which the model-based test (6m) performed worst for K = 50, and (3) model 11 from the same table in which the sandwich test (61) performed worst for K = 50. We give the details of the models and the results in Table 5. Again the modified sandwich tests (62,63,64, and 65) appear closer to the nominal level than the standard sandwich test, 61, although 63 appears quite liberal for K = 20 for models 10 and 11. Table 5 Simulations for Cox regression: proportion rejected at a = 0.05 level (average length of confidence intervals) for 1000 simulations Test 6m 61 62 63 64 65 Proportional Hazards K = 20 K = 50 .051 .103 .055 .051 .038 (1.13) (0.99) (1.33) (1.25) (1.48) .043 (1.37) .042 .063 .037 .043 .032 (.614) (.567) (.672) (.625) (.715) .037 (.652) Model 10 K = 20 .180 .119 .047 .080 .039 esti- mated from a small number of terms in the estimating equation. This liberalness can occur when either K, the number of terms in the estimating equation, is small, or when primarily a small proportion of terms are used to estimate the parameter (or linear combination of parameters). We have referred to the latter type of estimating equations as unbalanced for that parameter. In all of our simulations the four new modifications had an estimated size less than the liberal size of the standard sandwich test. In the simple balanced normal model, 65 is exact, and 65 did reasonably well in the other balanced cases (see Tables 1 and 5 and design A from Table 4). However, in more unbalanced cases (see Table 2 and design D from Table 4), 65 appeared to produce a quite conservative test. For this reason, unless the data are fairly well balanced, we favor the use of 63, which appears to be less liberal than 61 but not overly conservative even for unbalanced equations. The (1.71) (2.26) (3.29) (2.89) (3.69) .068 (3.13) Model 11 K = 50 .195 .079 .049 .063 .044 (1.02) (1.47) (1.67) (1.59) (1.72) .059 (1.63) K = 20 .104 .137 .076 .089 .059 (1.13) (1.01) (1.38) (1.28) (1.55) .073 (1.40) K = 50 .073 .091 .049 .066 .039 (.614) (.593) (.729) (.656) (.780) .054 (.686) All simulations modeled the hazard proportional to exp(/3lzI + 2 z2) and tested whether 31 = 0 or not. True models with t = time to event: (proportional hazards) t = exp(z2)e2; (model 10) t = exp(-.5z2 - z2+ .5ej); (model 11) t = exp(-.5z2) + .5e; where Z1,Z2, and e1 are independent standard normal pseudorandomvariables (p-r.v.), with z1 and z2 truncated at ?5, and e2 is a standard exponential p-r.v. 1204 Biometrics, difference between 65 and 63 is that 65 includes a bias correction for the variance, whereas 63 does not. In our simulations on the GEE case, the standard sandwich estimator regularly underestimated the variance, whereas both our adjusted variance estimator, Va, and that proposed by Mancl and DeRouen (2001), VMD, appeared less biased in the balanced cases (see top section of Table 3). In the more unbalanced cases (see bottom section of Table 3), both Va and VMD appeared to overcorrect for bias in most cases, and this overcorrection in Va may be the source of the conservativeness of 65. Further work on correcting the bias of the sandwich variance estimator in unbalanced data is required. For the balanced cases the advantage of Va over VMD is that it may be applied to other cases besides the GEE case. Although not listed in the tables, we included in our simulations a test comparing T2 to F1,K-p as has been used in the literature (see, e.g., Lipsitz et al., 1994). This test gave estimated sizes that were always less than the standard sandwich test (61), but at least as large as the four new tests (62, 3, 64, and 65). As expected this test does reasonably well with balanced data but not with unbalanced data. For example, for Table 4 the four estimated sizes (to test f1 = 0) were 0.078 (design A, case 1), 0.073 (design A, case 4), 0.245 (design D, case 1), and 0.248 (design D, case 4). For insight into this, note the failure of that method to address the degrees of freedom differences in the parameters in design D, where /1 is estimated from only three clusters, whereas 32 is estimated from 39 clusters (see Fay et al., 1998). The degrees of freedom estimate, K - p, does not properly account for this data structure and compares both the T2 statistic for testing p1 = 0 and the T2 statistic for testing /2 = 0 against the same distribution, F,K-p. The complete simulation results of the tests comparing T2 to Fl,K-p along with Splus functions that perform our adjustments and the Splus programs used to perform all the simulations are given at http://srab.cancer.gov/sandwich. ACKNOWLEDGEMENTS We thank William Davis, Edward Korn, and the reviewers and editors for helpful comments on earlier versions of this paper. RESUME L'estimateur sandwich de la variance peut etre utilise pour creer des tests robustes type Wald a partir d'equations d'estimation qui sont la somme de K termes independants ou presque independants. Par exemple, dans le cas de donnees repetees sur K sujets, chaque terme correspond a un sujet. Ces tests sur un parametre peuvent avoir un risque de premiere espece eleve si K est petit, ou, plus generalement, si le parametre a tester est surtout estime a partir d'un petit nombre de termes dans l'equation d'estimation. Nous proposons quelques modifications pratiques de ces tests robustes type Wald qui tendent asymptotiquement vers les classiques tests robustes type Wald. Nous montrons que l'une de ces modifications donne des resultats equivalents dans les cas simples, et etudions par simulation les modifications appliquees aux equations d'estimation generalisees de Liang et Zeger (1986), a la regression logistique conditionnelle et au modele de Cox a risques proportionnels. December 2001 REFERENCES Chesher, A. and Austin, G. (1991). The finite-sample distributions of heteroscedasticity robust Wald statistics. Journal of Econometrics 47, 153-173. Emrich, L. J. and Piedmonte, M. R. (1992). On some small sample properties of generalized estimating equation estimates for multivariate dichotomous outcomes. Journal of Statistical Computation and Simulation 41, 19-29. Fai, A. H-T. and Cornelius, P. L. (1996). Approximate F-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments. Journal of Statistical Computation and Simulation 54, 363-378. Fay, M. P., Freedman, L. S., Clifford, C. K., and Midthune, D. N. (1997). Effect of different types and amounts of fat on the development of mammary tumors in rodents: A review. Cancer Research 57, 3979-3988. Fay, M. P., Graubard, B. I., Freedman, L. S., and Midthune, D. N. (1998). Conditional logistic regression with sandwich estimators: application to a meta-analysis. Biometrics 54, 195-208. Gail, M. H., Lubin, J. H., and Rubinstein, L. V. (1981). Likelihood calculations for matched case-control studies and survival studies with tied death times. Biometrika 68, 703-707. Gunsolley, J. C., Getchell, C., and Chinchilli, V. M. (1995). Small sample characteristics of generalized estimating equations. Communications in Statistics: Simulation and Computation 24, 869-878. Horton, N. J. and Lipsitz, S. R. (1999). Review of software to fit generalized estimating equation regression models. American Statistician 53, 160-169. Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 1322. D. Y. and Wei, L. J. (1989). The robust inference for the Lin, Cox proportional hazards model. Journal of the American Statistical Association 84, 1074-1078. Lipsitz, S. R., Fitzmaurice, G. M., Orav, E. J., and Laird, N. M. (1994). Performance of generalized estimating equations in practical situations. Biometrics 50, 270-278. Lipsitz, S. R. and Ibrahim, J. G. (1999). A degrees-of-freedom approximation for a t-statistic with heterogeneous variance. The Statistician 48, 495-506. Mancl, L. A. and DeRouen, T. A. (2001). A covariance estimator for GEE with improved small sample properties. Biometrics 57, 126-134. Preisser, J. S. and Qaqish, B. F. (1999). Robust regression for clustered data with application to binary responses. Biometrics 55, 574-579. Searle, S. R. (1982). Matrix Algebra Useful for Statistics. New York: Wiley. Received February 2001. Revised August 2001. Accepted August 2001. APPENDIX A Details of Degrees of Freedom Approximation Using approximation (2), we write {CT (fwhere Bo = FVmCC 30)2} UTBoU, VmFT and FT = [Ip . Ip]. Rewrite Small-Sample Adjustments for Wald-Type Tests CTVaC as K CTVaC = CTVm (HiUiU H,) VmC i=l K = E UT (HiVmCCTVmHi) Ui, i=l where the second equality uses the fact that CTVmHiUi is a scalar; this is why we require C to be a vector. We rewrite this as CTVaC = UT MU, where M is a pK x pK block diagonal matrix with ith block equal to HiVmCCTVmHi. We combine the Taylor series approximations for the Ui to get U m GU, where G = IpK - VmFT and QT = [i... QK] Thus CTVaC UT BU, where B1 = GT MG. Now consider the three conditions: X21:Assume CTVmUi has mean 0 and un(1) a-2UTBoU known variance cr? (where we allow ai = 0 for some i), that the CTVmUi are independent, and that approximation (2) holds. Provided some regularity conditions are true, by the Liapunouv central limit theorem C-1CT(l - 1205 and is distributed tK-1 /KZ{(K-1)-1IK=1 (Zi-Z)2}-1/2 by the standard derivation of the t-test, and T2 is distributed F1,K-1. All proposed estimators of d use equation (5) with different estimators of 1i and different values of M (for d and d the value Hi is set to Ip in the definition of M). Let Ti (I) be an arbitrary estimator of 'i (I) and Mi be an arbitrary block diagonal element of M. Then because the Qi = OX7TX are all equal, trace(QB1) = {(K - )/K}IEK trace('iMi) and trace(iBli B1) = {(K-2)/K}K=1 trace('iMJiiMi)+ R R-1 I 1++ (K - 1)2 where riK -- oo, where a2 = EK1 i2. Thus do) -+ N(0, 1) as K - po)}2 is asymptotically chi-squared with 1 a-2{CT( degree of freedom. (2) c-2dUTB1 U Xd: Assume U is distributed normal with mean 0 and variance I = block diagonal (T1,..., K). Under these assumptions, UTB1U has a distribution described by a weighted sum of chi-square random variables; however, we approximate its distribution with a (much simpler) Gamma distribution with the same mean and variance, i.e., with mean equal to trace('B1) and variance equal to 2trace(BB1iJBi) (see Searle, 1982, p. 355). Since UTB1U is an approximately unbiased estimator of a2, we estimate a2 with trace(TBi), so that u-2UTB1U is approximately Gamma with mean 1 and or equivavariance 2trace(BB1'Bl)/{trace(BjB1)}-2, ~ = where d xd lently c-2dUTB1U {trace(IB1)}2/ trace (B B1B ). (3) UTBoU and UTB1U are independent: Assume that U is normal with mean 0 and variance ', with rank(t) = pK. Under correct model specification, i.e., Ji' cfi for 0 and UTBoU and UTB1U are all i. Then BoIB1 approximately independent (see Searle, 1982, p. 356). R = JZ(ZiK - )4 and the Zi are the same independent standard normal random variables as in the previous expression for Ta. APPENDIX C Notation for Cox Proportional Hazards Model We use standard notation for counting processes, so that the notations Yi, Zi, Xi, and 6i are different in this section than in the rest of the paper. Let X1,.. ,XK be the time until failure or right-censoring, with 6i = 1 when Xi represents a failure and 6i = 0 otherwise. Let Zi(t) be the covariate vector for the ith observation observed at time t. Then Ui = 6i{Zi(Xi)Z(3, Xi)}, where K Z(3, t)= E Yi(t)Zi(t)exp{/f'Zi(t)} -i_ K E Yj(t) exp{3'Zj(t)} j=l Refer to the structure defined in section 3.1. Let Zi = K-1/2 x where 02 = var{CT(P 2)2 i=l Details: Normal Model with Identical Designs i - CTP)}, {CT(XTX)-xTy = X(XT K-1CT(XTX)-lXTX 3o) 2 - E(Zi APPENDIX B a-1 To simplify these expres- EK= _3Ki1 trace(kiMijMj3). K-2 sions we use the theorem that trace(AB) = trace(BA) for any matrices A and B for which AB and BA are defined, so that the traces may be written as scalars. Then we can show that equation (5) gives K- 1 for both d and dH, whereas d and dH may both be written as - X)-lC. Under the null hypothesis assumptions of the model, the Zi are independent standard normal random variables. Then K-1EK Zi = Z = = and EzI1 (Zi-Z)2 = (2KCTVsC K-1/2a-1CT(^-/), The last step comes because Hi = Ip{K/ c-2(K-1)CTVaC. (K - 1)}1/2 for all i. Thus, under the null hypothesis, Ta = and here Yi(t) = 1 if Xi > t and 0 otherwise. Further, K 0U? U,- [6jYi(Xj)exp{ 3'Zi(Xj)}{Zi(Xj) - Z({, Xj)}] 3j=1 - K E -h=l Yh(Xj) exp { 'Zh (Xj)} Biometrics, December 2001 1206 -h=l Y9=i( -j= 9Y (Xi) exp {3'Zj(Xi) where A02 = AAT, for any vector, A. x {zj(Xi)- Z(3, Xi)}?2
© Copyright 2024