Robust Inference in Sample Selection Models Mikhail Zhelonkin∗, Marc G. Genton†, and Elvezio Ronchetti∗ November 29, 2012 Abstract The problem of non-random sample selectivity often occurs in practice in many different fields. The classical estimators introduced by Heckman (1979) are the backbone of the standard statistical analysis of these models. However, these estimators are very sensitive to small deviations from the distributional assumptions which are often not satisfied in practice. In this paper we develop a general framework to study the robustness properties of estimators and tests in sample selection models. We derive the influence function and the change-of-variance function of the Heckman’s two-stage estimator, and demonstrate the non-robustness of this estimator and its estimated variance to small deviations from the assumed model. We propose a procedure for robustifying the estimator, prove its asymptotic normality and give its asymptotic variance. This allows us to construct a simple robust alternative to the sample selection bias test. We illustrate the use of our new methodology in an analysis of ambulatory expenditures and compare the performance of the classical and robust methods in a Monte Carlo simulation study. Keywords: Change-of-variance function; Heckman model; Influence function; M-estimator; Robust estimation; Robust inference; Sample selection; Two-stage estimator. 1 Introduction A sample selectivity problem occurs when an investigator observes a non-random sample of a population, that is, when the observations are present according to some selection rule. Consider, for instance, the analysis of consumer expenditures, where typically the spending amount is related to the decision to spend. If a regression model is estimated using the entire sample (including zero-expenditures) or a sub-sample with non-zero expenditures, the ∗ Research Center for Statistics and Department of Economics, University of Geneva, Blv. Pont-d’Arve 40, CH-1211 Geneva 4, Switzerland. E-mail: {Mikhail.Zhelonkin, Elvezio.Ronchetti}@unige.ch. † CEMSE Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia. E-mail: [email protected] 1 corresponding estimators of the regression coefficients will be biased. This type of problems arise in many research fields, including economics, sociology, political science, finance, and many others. A sample selection model can be represented by the following regression system ∗ y1i = xT1i β1 + e1i , (1) ∗ y2i = xT2i β2 + e2i , (2) ∗ ∗ where the responses y1i and y2i are unobserved latent variables, xji is a vector of explanatory variables, βj is a pj × 1 vector of parameters, j = 1, 2, and the error terms follow a bivariate normal distribution with variances σ12 = 1, σ22 , and correlation ρ. Notice that the variance parameter σ12 is set to be equal to 1 to ensure identifiability. Here (1) is the selection equation, defining the observability rule, and (2) is the equation of interest. The observed variables are defined by ∗ y1i = I(y1i > 0), (3) ∗ ∗ y2i = y2i I(y1i > 0), (4) where I is the indicator function. This model is classified by Amemiya (1984) as a “Tobit type-2 model” or “Heckman model” originally discussed by Heckman (1979) in his seminal paper. If all the data were available, i.e. the regressor matrix was of full rank, or the data were missing at random, i.e. no selection mechanism was involved, we could estimate the model by Ordinary Least Squares (OLS). But in general these conditions are not satisfied and the OLS estimator is biased and inconsistent. Heckman (1979) proposed two estimation procedures for this model. The first one is a Maximum Likelihood Estimator (MLE), based on the assumption of bivariate normality of the error terms. The second one is a two-stage procedure. Consider the conditional expectation of y2i given x2i and the selection rule ∗ E(y2i |x2,i , y1i > 0) = xT2i β2 + E(e2i |e1i > −xT1i β1 ). 2 Then the conditional expectation of the error term is in general different from zero, which leads to the following modified regression y2i = xT2i β2 + βλ λi (xT1i β1 ) + vi , (5) where βλ = ρσ2 , λi = φ(xT1i β1 )/Φ(xT1i β1 ) is the inverse Mills ratio, vi is the error term with zero expectation, and φ(·) denotes the density and Φ(·) the cumulative distribution function of the standard normal distribution, respectively. Heckman (1979) proposed then in the first stage to estimate β1 by probit MLE and to compute estimated values of λ, and in a second stage to use OLS in (5), where the additional variable corrects for the sample selection bias. Both estimators have advantages and drawbacks, studied extensively in the literature; see e.g. Stolzenberg and Relles (1997), Puhani (2000), Toomet and Henningsen (2008), and the general reviews Winship and Mare (1992), Vella (1998) and references therein. An important criticism is their sensitivity to the normality assumption, which is often violated. Several Monte Carlo studies have investigated the behavior of the estimators under completely different distributional assumptions; see Paarsch (1984) and Zuehlke and Zeman (1991). Various methods aiming at relaxing the distributional assumptions have been proposed. They include several semiparametric (Ahn and Powell 1993, Newey 2009) and nonparametric (Das et al. 2003) methods. Recently, Marchenko and Genton (2012) derived estimators for a sample selection model based on the t distribution of the error terms. To illustrate the sensitivity of the classical procedures in this model, consider the behavior of the sample selection bias (SSB) test (Heckman 1979) when we move a single observation x11 from 0 (the mean value under the normal model) to −6 in a sample of size 200. The data generating process is described in Section 4.1. The boxplots of the test statistic, the log10 (p-values) and the empirical power of the test are shown in Figure 1. As x11 moves away from 0, the test statistic decreases and eventually becomes smaller than the threshold of 1.96 reversing the test decision. As a corollary, we see a substantial increase of the p-values and a drastic reduction of the empirical power of the test. Therefore, a single observation out of 200 3 can change completely the decision about the presence of sample selection bias. In this paper we focus on the robustness analysis of Heckman’s two-stage procedure. Although a robustness investigation of the MLE for this model could be carried out applying standard tools of robust statistics (Salazar 2008), we feel that it is more important to perform such an analysis for Heckman’s two-stage procedure. In fact the two-stage estimator is structurally simpler, has a straightforward interpretation, and is much less computationally intensive than the MLE which explains its success in applied economic analysis. Moreover, there are numerous extensions of the classical Heckman’s model, including switching regressions, simultaneous equations with selectivity, and models with self-selectivity, to mention a few (see Section 5), where the construction of the joint likelihood becomes difficult and cumbersome whereas Heckman’s estimator can be easily computed. In the past decades, robust estimators and tests have been developed for large classes of models both in the statistical and econometric literature; see for instance, the books by Huber (1981, 2nd edition by Huber and Ronchetti 2009), Hampel et al. (1986), Maronna et al. (2006) in the statistical literature and Peracchi (1990, 1991), Ronchetti and Trojani (2001), and Koenker (2005) in the econometric literature. However, except for a few specific contributions mentioned above, sample selection models have not received a lot of attention and no general robustness theory is available. In this paper we fill this gap by providing the following contributions to the literature. In Section 2 we investigate the robustness properties of Heckman’s estimator and its estimated variance by deriving the influence function and the change-of-variance function; see Hampel et al. (1986). These functions are used to quantify the bias of the estimator and its estimated variance respectively, when deviations from the assumed bivariate normal model are present and the true data generating process lies in a neighborhood of the assumed bivariate normal distribution. It turns out that both functions are unbounded and this implies that a small deviation can have a large impact on the bias of the estimator and on its variance. The latter 4 in turn has a large effect on the corresponding confidence intervals. Moreover, by means of these functions we provide a von Mises expansion of the test statistic for the test on SSB which confirms the results obtained in the simple illustration mentioned above. Since the classical estimation and testing procedures are not robust with respect to deviations from the assumed underlying stochastic model, we propose in Section 3 new robust estimators and a new robust test for SSB, which are the natural robust counterparts of the classical Heckman’s two-stage estimator and test of SSB. Section 4 reports the finite sample performance of the new estimators and test in a simulation setting and with an example on ambulatory expenditures data. In Section 5 we show the flexibility of our approach by discussing the use of our results for two important extensions of the basic model, namely switching regression models and simultaneous equations models with selectivity. The technical derivations and assumptions are provided in Appendix A, whereas Appendix B presents the output of the classical and robust analysis of our example obtained by using the new function ssmrob in R (R Core Team 2012), which is available from the authors. 2 Robustness Issues with Heckman’s Two-Stage Estimator In this section we present the main results concerning the two-stage estimator. We derive its influence function (IF) and its change-of-variance function (CVF) and discuss the robustness properties of the classical estimator. Moreover, we explain the connection between its IF and the asymptotic variance. Finally, we explore the robustness properties of the SSB test. We consider a parametric sample selection model {Fθ }, where θ = (β1 , β2 ) lies in Θ, a compact subset of Rp1 +p2 . Let FN be the empirical distribution function putting mass 1/N at each observation zi = (z1i , z2i ), where zji = (xji , yji ), j = 1, 2, i = 1, . . . , N , and let F be the distribution function of zi . The Heckman’s estimator is a particular case of general two-stage M-estimators, with probit MLE in the first stage and OLS in the second stage. Define two 5 statistical functionals S and T corresponding to the estimators of the first and second stage, respectively. The domain of S is a class of probability distributions on Rp1 and its range is a vector in Rp1 . The domain of T is a class of probability distributions on Rp1 +p2 and its range is a vector in Rp2 . The two-stage estimator can be expressed as a solution of the system: Z Ψ1 {(x1 , y1 ); S(F )}dF = 0, (6) Ψ2 [(x2 , y2 ); λ{(x1 , y1 ); S(F )}, T (F )]dF = 0, (7) Z where Ψ1 (·; ·) and Ψ2 (·; ·, ·) are the score functions of the first and second stage estimators, respectively. In the classical case Ψ1 (·; ·) is given by (29), and Ψ2 (·; ·, ·) is given by (25). Here λ{(x1 , y1 ); S(F )} denotes the dependence of λ on S(F ) = β1 , while T (F ) depends directly on F and indirectly on F through S(F ). 2.1 Influence Function For a given functional T (F ), the influence function (IF) is defined by Hampel (1974) as IF (z; T, F ) = lim→0 [T {(1 − )F + ∆z } − T (F )]/, where ∆z is the probability measure which puts mass 1 at the point z. In our case (1 − )F + ∆z is a contamination of the joint distribution of zi , but marginal contaminations on the components of zi can also be considered; see the comments below. The IF describes the standardized asymptotic bias on the estimator due to a small amount of contamination at the point z. An estimator is considered to be locally robust if small departures from the assumed distribution have only small effects on the estimator. Assume that we have a contaminated distribution F = (1 − )F + G, where G is some arbitrary distribution function. Using a von Mises (1947) expansion, we can approximate the statistical functional T (F ) at the assumed distribution F as Z T (F ) = T (F ) + IF (z; T, F )dG + o() 6 (8) and the maximum bias over the neighborhood described by F is approximately sup kT (F ) − T (F )k ∼ = sup kIF (z; T, F )k. z G Therefore, a condition for (local) robustness is a bounded IF with respect to z, which means that if the IF (·; ·, ·) is unbounded then the bias of the estimator can become arbitrarily large. Notice also that (8) can be used to approximate numerically the bias of the estimator T at a given underlying distribution F for a given G. The next proposition gives the influence function of Heckman’s two-stage estimator. Proposition 1. For the model (1)-(4), the IF of the Heckman’s two-stage estimator is −1 ( Z x2 x2 xT2 λx2 T (y2 − x2 β2 − λβλ ) y1 y1 dF IF (z; T, F ) = λ λ2 λxT2 ) Z 0 x2 β λ + y1 λ dF · IF (z; S, F ) , λβλ (9) where Z IF (z; S, F ) = −1 φ(xT1 β1 )2 x1 xT1 φ(xT1 β1 )x1 T dF {y1 − Φ(x1 β1 )} . Φ(xT1 β1 ){1 − Φ(xT1 β1 )} Φ(xT1 β1 ){1 − Φ(xT1 β1 )} (10) Proofs and derivations of all results are given in Appendix A. The first term of (9) is the score function of the second stage and it corresponds to the IF of a standard OLS regression. The second term contains the IF of the first stage estimator. Clearly, the first term is unbounded with respect to y2 , x2 and λ. Notice that the function λ is unbounded from the left, and it tends to zero from the right (the solid line in Figure 2). From (10) we can see that the second term is also unbounded, which means that there is a second source of unboundedness arising from the selection stage. Therefore, the estimator fails to be locally robust. A small amount of contamination is enough for the estimator to become arbitrarily biased. In Section 3 we will present two ways to construct a two-stage estimator with a bounded influence function. 7 2.2 Asymptotic Variance and Change-of-Variance Function The expression of the asymptotic variance for the two-stage estimator has been derived by Heckman (1979), and later corrected by Greene (1981). Duncan (1987) suggested another approach to derive the asymptotic variance using the M-estimation framework. Using the result in Hampel et al. (1986), the general expression of the asymptotic variance is given by Z V (T, F ) = IF (z, T, F ) · IF (z, T, F )T dF (z). Specifically, denote the components of the IF as follows: xT2 β2 x2 λ a(z) = (y2 − , − λβλ ) Z 0 x2 β λ b(z) = λ dF · IF (z; S, F ), λβλ Z x2 xT2 λx2 M (Ψ2 ) = dF . λxT2 λ2 (11) Then the expression of the asymptotic variance of Heckman’s two-stage estimator is V (T, F ) = M (Ψ2 ) −1 Z a(z)a(z)T + a(z)b(z)T + b(z)a(z)T + b(z)b(z)T dF (z) M (Ψ2 )−1 . After integration and some simplifications we obtain the asymptotic variance of the classical estimator V β2 βλ ,F T −1 = (X X) σ22 {X T (I βλ2 2 T T − 2 ∆)X} + βλ X ∆X1 · Var(S, F ) · X1 ∆X (X T X)−1 , σ2 where ∆ is a diagonal matrix with elements δii = ∂λ(x1i β1 ) ∂(x1i β1 ) and Var(S, F ) denotes the asymptotic variance of the probit MLE. Robustness issues are not limited to the bias of the estimator, but concern also the stability of the asymptotic variance. Indeed, the latter is used to construct confidence intervals for the parameters and we want the influence of small deviations from the underlying distribution on their coverage probability and length to be bounded. Therefore, we investigate the behavior of the asymptotic variance of the estimator under a contaminated distribution F and derive 8 the CVF, which reflects the influence of a small amount of contamination on the asymptotic variance of the estimator. These results will be used in Section 2.3 to investigate the robustness properties of the SSB test. The CVF of an M-estimator T at a distribution F is defined by the matrix CV F (z; T, F ) = (∂/∂)V {T, (1−)F +∆z } , for all z where this expression exists; see Hampel et al. (1981) =0 and Genton and Rousseeuw (1995). As in (8) a von Mises (1947) expansion of log V (T, F ) at F gives Z CV F (z; T, F ) dG . V (T, F ) ∼ = V (T, F ) exp V (T, F ) (12) If the CV F (z; T, F ) is unbounded then the variance can behave unpredictably (arbitrarily large or small); see Hampel et al. (1986, p. 175). Similarly to the approximation of the bias, using (12) one can obtain the numerical approximation of the variance of the estimator T at a given underlying distribution (1 − )F + G for a given G. Proposition 2. The CVF of the Heckman’s two-stage estimator is given by −1 Z x2 xT2 λx2 λ2 λxT2 y1 V CV F (z; S, T, F ) = V − M (Ψ2 ) DH dF + Z −1 + M (Ψ2 ) AH a(z)T + AH b(z)T + BH b(z)T dF M (Ψ2 )−1 Z −1 T + M (Ψ2 ) a(z)ATH + b(z)ATH + b(z)BH dF M (Ψ2 )−1 + M (Ψ2 )−1 a(z)a(z)T + a(z)b(z)T + b(z)a(z)T + b(z)b(z)T M (Ψ2 )−1 Z x2 xT2 λx2 −V DH dF + y1 M (Ψ2 )−1 , (13) λxT2 λ2 where V denotes the asymptotic variance of the Heckman (1979) two-stage estimator, M (Ψ2 ) is defined by (11), AH = ∂ a(z), ∂ BH = ∂ b(z), ∂ and DH = ∂ ∂ Ψ (z; λ, θ). ∂ ∂θ 2 Explicit expressions for these terms are given in Appendix A. The CVF has several sources of unboundedness. The first line of (13) contains the derivative of the score function Ψ2 (·; ·, ·) with respect to the parameter which is unbounded. The same holds for the fifth line. Finally, in the fourth line there are two terms depending on the score 9 functions of two estimators which are unbounded. Clearly, the CVF is unbounded, which means that the variance can become arbitrarily large. Taking into account that the two-stage estimator by definition is not efficient, we can observe a combined effect of inefficiency with non-robustness of the variance estimator. These problems can lead to misleading p-values and incorrect confidence intervals. Second order effects in the von Mises expansion are discussed in general in La Vecchia et al. (2012). 2.3 Sample Selection Bias Test Heckman (1979) proposed to test for the selection bias using the standard t-test of the coefficient βλ . Melino (1982) showed that this test is equivalent to a Lagrange multiplier test and has desirable asymptotic properties. Several other proposals are available in the literature, see e.g. Vella (1992), but the simple Heckman’s test is the most widely used by applied researchers. Here we investigate the effect of contamination on this test statistic τn = √ βˆλ np . V (βλ , F ) Using the expressions for the IF and CVF of the estimator and its asymptotic variance, we obtain the von Mises expansion of the test statistic: " # T (F ) IF (z; T, F ) 1 CV F (z, T, F ) T (F ) p =p + p + T (F ) + remainder, 2n V (F )/n V (F )/n V (F )/n {V (F )/n}5/2 which provides an approximation of the bias of the test statistic under contamination. It is clear that the IF of the test depends on the IF and CVF of the estimator. Hence, the IF of the test statistic is also unbounded. Since, according to Hampel et al. (1986, p. 199), the IFs of the level and of the power of the test are proportional to the IF of the test statistic, the test is not robust. Moreover, because Heckman’s two-stage estimator suffers from a lack of efficiency, small deviations from the model can enhance this effect and increase the probability of type I and type II errors of the SSB test. Notice however that the term containing the CVF is of higher order, which means that the influence of the contamination on the test statistic is mostly explained by the IF of the 10 corresponding estimator. Hence, for practical purposes we need to have at least a robust estimator with a bounded IF with an additional bonus if the CVF is bounded as well. 3 Robust Estimation and Inference In this section we suggest how to robustify the two-stage estimator and propose a simple robust alternative to the SSB test. 3.1 Robust Two-Stage Estimator From the expression of the IF in (9), it is natural to construct a robust two-stage estimator by robustifying the estimators in both stages. The idea is to obtain an estimator with bounded bias in the first stage, then compute λ, which will transfer potential leverage effects from the first stage to the second, and use the robust estimator in the second stage, which will correct for the remaining outliers. Consider the two-stage M-estimation framework given by (6) and (7). We can obtain a robust estimator by bounding both score functions. In the first stage, we construct a robust probit estimator following the idea of Cantoni and Ronchetti (2001). We use a general class of M-estimators of Mallows’ type, where the influence of deviations on y1 and x1 are bounded separately. The estimator is defined by the following score function: T ΨR 1 {z1 ; S(F )} = ν(z1 ; µ)ω1 (x1 )µ − α(β1 ), where α(β1 ) = 1 n Pn i=1 (14) E{ν(z1i ; µi )}ω1 (x1i )µTi is a term to ensure the unbiasedness of the estimating function with the expectation taken with respect to the conditional distribution of y|x, ν(·|·), ω1 (x1 ) are weight functions defined below, and µi = µi (z1i , β1 ) = Φ(xT1i β1 ). The weight functions are defined by ν(z1i ; µi ) = ψc1 (ri ) where ri = y1i −µi V 1/2 (µi ) 1 V 1/2 (µ i) , are Pearson residuals and ψc1 is the Huber function defined by r, |r| ≤ c1 , ψc1 (r) = c1 sign(r), |r| > c1 . 11 (15) The tuning constant c1 is chosen to ensure a given level of asymptotic efficiency at the model. √ A simple choice of the weight function ω1 (·) is ω1i = 1 − Hii , where Hii is the ith diagonal element of the hat matrix H = X(X T X)−1 X T . More sophisticated choices for ω1 are available, e.g. the inverse of the robust Mahalanobis distance based on high breakdown robust estimators of location and scatter of the x1i . For the probit case we have that µi = Φ(xT1i β1 ), V (µi ) = Φ(xT1i β1 ){1 − Φ(xT1i β1 )} and hence the quasi-likelihood estimating equations are n X ψc1 (ri )ω1 (x1i ) i=1 1 φ(xT1i β1 )x1i − α(β1 ) T [Φ(x1i β1 ){1 − Φ(xT1i β − 1)}]1/2 = 0, and E{ψc1 (ri )} in the α(β1 ) term is equal to E ψc1 y1i − µi V 1/2 (µi ) = ψc1 −µi 1/2 V (µi ) 1 − Φ(xT1i β1 ) + ψc1 1 − µi V 1/2 (µi ) Φ(xT1i β1 ). This estimator has a bounded IF and ensures robustness of the first estimation stage. To obtain a robust estimator for the equation of interest (second stage) we propose to use an M-estimator of Mallows-type with the following Ψ-function: T ΨR 2 (z2 ; λ, T ) = Ψc2 (y2 − x2 β2 − λβλ )ω(x2 , λ)y1 , (16) where Ψc2 (·) is the classical Huber function from (15), but with a different tuning constant c2 , ω(·) is the weight function, which can also be based on the robust Mahalanobis distance d(x2 , λ), e.g. ω(x2 , λ) = x2 , x2 c m , d(x2 ,λ) if d(x2 , λ) < cm , if d(x2 , λ) ≥ cm . We summarize the result in the following proposition. Proposition 3. Under the assumptions stated in Appendix A, Heckman’s two-stage estimator defined by (14) and (16) is robust, consistent, and asymptotically normal, with asymptotic variance given by: V (T, F ) = M −1 ΨR 2 Z {aR (z)aR (z)T + bR (z)bR (z)T }dF M ΨR 2 12 −1 , (17) R where M ΨR 2 = − ∂ ΨR (z; λ, T )dF , ∂β2 2 Z bR (z) = aR (z) = ΨR 2 (z; λ, T ), and ∂ R 0 Ψ2 (z; λ, T )λ dF ∂λ Z ∂ R Ψ (z; S)dF ∂β1 1 −1 ΨR 1 (z; S). The asymptotic variance of the robust estimator has the same structure as that of the classical Heckman’s estimator. Its computation can become complicated, depending on the choice of the score function, but for simple cases, e.g. Huber function, it is relatively simple. 3.2 Robust Inverse Mills Ratio Often the outliers appear only in one or a few components of the observation z = (x1 , y1 , x2 , y2 ). In these cases the use of robust estimators in both stages is not necessary. If outliers are in y2 and/or x2 , then a robust estimator for the equation of interest is needed. If outliers are in x1 and y1 = 0, then a robust estimator of the probit model is needed. The most complicated case is when a leverage point x1 with y1 = 1 in the selection equation is transferred to the equation of interest through the exogenous variable λ, a nonlinear transformation of xT1 β1 . In this case a natural solution is to bound the influence of the outliers that come into the main equation when computing λ. Since λ is unbounded and approximately linear in the predictor xT1 β1 from the left (see Figure 2), we can transform the linear predictor in such a way that it becomes bounded, ensuring the boundedness of λ, and hence the boundedness of the IF of the final estimator. To achieve this we rewrite the linear predictor xT1 β1 = Φ−1 {y1 − rV 1/2 (µ)}, where r = (y1 − µ)/V 1/2 (µ) and µ = Φ(xT1 β1 ). Then, using the bounded function ψc1 from (15), we obtain the bounded linear predictor η(xT1 β1 ) = Φ−1 y1 − ψc1 (r)V 1/2 (µ) , (18) and (18) ensures the boundedness of the inverse Mills ratio. It also has the advantage to avoid introducing an additional weight to bound directly λ, which would increase the complexity of the estimator. Note that, if c1 → ∞, then η = xT1 β1 . The classical and robust versions of the 13 inverse Mills ratio are plotted in Figure 2. Depending on the tuning parameter c1 of the robust probit, the influence of large linear predictors can be reduced, which ensures the robustness of the estimator. 3.3 Robust Sample Selection Bias Test To test SSB, i.e. H0 : βλ = 0 vs HA : βλ 6= 0, we simply propose to use a t-test based on the robust estimator of βλ and the corresponding estimator of its standard error derived in Section 3.1, where the latter is obtained by estimating (17). R −1 −1 aR (z)aR (z)T dF M (ΨR The first term of (17), M (ΨR 2 ) , is similar to the asymptotic 2) variance of standard linear regression, but with heteroscedasticity. Therefore, we use the Eicker (1967) - Huber (1967) - White (1980) heteroscedasticity-consistent variance estimator, i.e. we ˆ (ΨR )−1 1 P a ˆ (ΨR ) and a ˆ (ΨR )−1 , where M estimate this first term by M ˆR (z) are ˆ(zi )ˆ a(zi )T M 2 2 2 n the sample versions of M and aR (z), respectively. −1 The second term of the asymptotic variance, M (ΨR 2) R −1 bR (z)bR (z)T M (ΨR 2 ) , is the asymptotic variance of the probit MLE pre- and post-multiplied by the constant matrix, which depends on the form of the score function of the second stage. Thus, a consistent estimator is ˆ (ΨR )−1 1 M 2 n where 4 4.1 ∂ΨR 2i ∂β1 X = ˆbR (zi )ˆbR (zi )T M ˆ (ΨR )−1 = M ˆ (ΨR )−1 1 2 2 n X ∂ΨR 2i ∂β1 ˆ Var(S, F) X T 1 ∂ΨR 2i ˆ (ΨR )−1 , M 2 n ∂β1 ∂Ψ2 (z2i ;λ,T (F )) ∂λ(z1i ;S(F )) . ∂λ ∂β1 Numerical Examples Simulation Study Consider the model described in Section 1.1. We carry out a Monte Carlo simulation study to illustrate the robustness issues in this model and compare different estimators. In our experiment we generate x1 ∼ N (0, 1), x2 ∼ N (0, 1), and the errors e1 and e2 from a bivariate normal distribution with expectation zero, σ1 = σ2 = 1, and ρ = 0.5, which gives βλ = 0.5. The degree of censoring is controlled by the intercept in the selection equation, denoted by β10 14 and set to 0, which corresponds to 50% of censoring. Results (not shown here) are similar for other censoring proportions such as 75% and 25%. The intercept in the equation of interest is β20 = 0. The slope coefficients are β11 = β21 = 1. We find the estimates of β1 and β2 without contamination and with two types of contamination. In the first scenario we contaminate x1 when the corresponding y1 = 0. We generate observations from the model described above and replace them with probability = 0.01 by a point mass at (6, 0, 1, 1), corresponding to (x1 , y1 , x2 , y2 ). In this case we study the effect of leverage outliers when they are not transferred to the main equation. In the second scenario we contaminate x1 when the corresponding y1 = 1. We use the same type of contamination as in the first scenario, but the point mass is at (−6, 1, 1, 1). This is the most complicated case, because the outliers influence not only the first estimation stage, but also the second stage through λ. The sample size is N = 200 and we repeat the experiment 500 times. We do not consider here the cases when only the second stage is contaminated because this is essentially the situation of standard linear regression which has been studied extensively; see e.g. Hampel et al. (1986) and Maronna et al. (2006). In Table 1 we present the results of the estimation of the first stage using a classical probit MLE and a robust probit M-estimator. Under the model we see that the robust estimator is less efficient, but in the presence of a small amount of contamination it remains stable, both for bias and for variance. The classical estimator is clearly biased and becomes much less efficient. In Table 2 we present the results of the two-stage procedure for four different estimators. We compare the classical estimator, robust two-stage (robust 2S) from Section 3.1, robust inverse Mills ratio (IMR) from Section 3.2, and the estimator using only the robust probit in the first stage and OLS in the second. First of all, notice that all the estimators perform well without contamination. The loss of efficiency for the robust versions is reasonable and the bias is close to zero. Obviously, under contamination the classical estimator breaks down. This effect can be seen in Figure 3. The boxplots correspond to the four estimators described above. The classical estimator is denoted by (a), the robust probit with OLS by (b), the robust 15 IMR by (c), and the robust 2S by (d). In the case when the outlier is not transferred to the equation of interest (Figure 3 top panel) it is enough to use a robust probit, but when the outlier emerges in the equation of interest (Figure 3 bottom panel), a robust estimation of the second stage is necessary. The robust IMR and the robust two-stage estimators are both stable regardless of the presence or absence of outliers in λ or x1 . The robust IMR estimator has smaller variance than the robust two-stage, but in the case of outliers in y2 or x2 it will require a robust estimator in the second stage anyhow. So this estimator cannot be considered as a self-contained estimator, but it is a useful tool in some situations. 4.2 Ambulatory Expenditures Data To further illustrate the behavior of our new robust methodology, we consider the data on ambulatory expenditures from the 2001 Medical Expenditure Panel Survey analyzed by Cameron and Trivedi (2009, p. 545). The data consist of 3,328 observations, where 526 (15.8%) correspond to zero expenditures. The distribution of the expenditures is skewed, so the log scale is used. The selection equation includes such explanatory variables as age, gender (female), education status (educ), ethnicity (blhisp), number of chronic diseases (totchr ), and the insurance status (ins). The outcome equation holds the same variables. The exclusion restriction could be introduced by means of the income variable, but the use of this variable for this purpose is arguable. All these variables are significant for the decision to spend, and all except education status and insurance status are significant for the spending amount. The results of the estimation obtained using the R package selection are reported in the Appendix B. The p-value of the SSB t-test is 0.0986 which is not significant at the 5% level. The possible conclusion of no selection bias seems to be doubtful. The estimation of this model using the joint MLE returns the p-value of the Wald test equal to 0.38. Such behavior of the classical MLE is not surprising due to the fact that it uses a stronger distributional assumption than Heckman’s estimator, and is even more sensitive to the presence of deviations from this assumption. Using the robust two-stage estimator, we obtained results similar to Cameron and Trivedi’s 16 analysis using the classical Heckman’s estimator. The output is reported in the Appendix B. For all the variables the differences are not dramatic, except for the inverse Mills ratio (IMR) parameter. The robust estimator returns βˆRIM R = −0.6768, compared to the classical βˆIM R = −0.4802. The SSB test is highly significant with a p-value p = 0.009, which leads to the intuitive conclusion of the presence of SSB. 5 Discussion and Extensions We introduced a framework for robust estimation and testing for sample selection models. These methods allow to deal with data deviating from the assumed model and to carry out reliable inference even in the presence of deviations from the assumed normality model. Monte Carlo simulations demonstrated the good performance of the robust estimators under the model and with different types of contamination. Although we focused on the basic sample selection model, our methodology can be easily extended to more complex models. Here we briefly explore two such extensions. 5.1 Switching Regression Model A natural extension of Heckman’s model is the switching regression model or “Tobit type-5 model”. In this case we have two regimes, and the regime depends on the selection process. This model consists of three equations: ∗ y1i = xT1i β1 + e1i , ∗ y21i = xT21i β21 + e2i , ∗ y22i = xT22i β22 + e3i , where the error terms follow a multivariate normal distribution. The observed variables are ∗ y1i = I(y1i > 0) and y2i = ∗ y21i , ∗ y22i , 17 if y1i = 1, if y1i = 0. The model can be estimated by a two-stage procedure with the following switching regressions: y2i = xT21i β21 + βλ1 λi + v1i , ˜ i + v2i , xT22i β22 − βλ2 λ if y1i = 1, if y1i = 0, (19) ˜ i = λi (−xT β1 ). This system can be estimated as two independent where λi = λi (xT1i β1 ) and λ 1i linear models. The IF for the first regression in (19) is exactly the same as that in the basic Heckman’s model and is given by (9). For the second regression in (19), a slight modification is required. ˜ λ0 by The IF for the second regression is given by (9), where λ is replaced by λ, 2 T T T T ˜ 0 = −{1 − Φ(x1 β1 )}φ(x1 β1 )x1 β1 + φ(x1 β1 ) xT , λ 1 2 {1 − Φ(xT1 β1 )} and y1 by 1 − y1 . ˜ is unbounded from the right and therefore the conclusion about the roObviously, the λ bustness properties remains the same. A robust estimator can be easily obtained using the procedure described in Section 3.1. 5.2 Simultaneous Equations Models With Selectivity Lee et al. (1980) derived the asymptotic covariance matrices of two-stage probit and two-stage Tobit methods for simultaneous equations models with selectivity: Ii =xTi δ − ei , (20) B1 Y1i + Γ1 Z1i =e1i if Ii > 0, (21) B2 Y2i + Γ2 Z2i =e2i if Ii ≤ 0, (22) where Y1i and Y2i are vectors of p1 and p2 endogenous variables, Z1i and Z2i are vectors of q1 and q2 exogenous variables, B1 and B2 are p1 × p1 and p2 × p2 matrices, Γ1 is a p1 × q1 matrix, Γ2 is a p2 × q2 matrix, e1i is a p1 vector of residuals, e2i is a p2 vector of residuals, and x is an m vector, consisting of some or all exogenous variables Zji and another additional exogenous variables, and δ is an m vector of parameters. 18 The residuals e1 , e2 and e follow a multivariate normal distribution with mean 0 and covariance matrix Σ11 Σ12 Σ1e Σ22 Σ2e . Σ= 1 The estimation procedure is explained in Lee et al. (1980). We only mention that in the first stage the equation (20) is estimated using a probit analysis, and then the reduced forms of equations (21) and (22) are estimated using OLS with additional variables based on the inverse Mills ratio. In the third stage the structural parameters are estimated using the second stage of two-stage least squares (2SLS). In spite of the complexity of this model, it fits our framework. It is straightforward to show that the IF of the two-stage estimator of this model is unbounded. A robust estimator consists of a robust probit on the first stage with a robust 2SLS. The robust probit is given in Section 3.1, and the robustness of 2SLS has been studied in Zhelonkin et al. (2012) and Kim and Muller (2007). A Appendix: Technical Derivations A.1 IF of the Heckman’s two-stage estimator Proof of Proposition 1. The Heckman’s estimator belongs to the class of two-stage M-estimators and satisfies the conditions required in Zhelonkin et al. (2012). The IF of the general Mestimator has the following form: IF (z; T, F ) = M (Ψ2 )−1 Ψ2 [(x2 , y2 ); h{(x1 , y1 ); S(F )}, T (F )] Z + where M (Ψ2 ) = − R ! ∂ ∂ Ψ2 {(x2 , y2 ); θ, T (F )} h{(x1 , y1 ); η}dF · IF (z; S, F ) , ∂θ ∂η ∂ Ψ [(x2 , y2 ); h{(x1 , y1 ); S(F )}, ξ]dF ∂ξ 2 (23) and IF (z; S, F ) denotes the IF of the first stage. Note that in (23) T and S denote general M-estimators. To find the IF of the Heckman’s estimator we need to identify the derivatives in (23). The 19 h(·; ·) function is the inverse Mills ratio and its derivative with respect to β1 is: 2 −Φ(xT1 β1 )φ(xT1 β1 )xT1 β1 − φ(xT1 β1 ) T ∂ x1 . λ{(x1 , y1 ); η} = λ = 2 ∂η Φ(xT1 β1 ) 0 (24) The score function of the second stage is Ψ2 [(x2 , y2 ); λ{(x1 , y1 ); S(F )}, T (F )] = (y2 − xT2 β2 − λβλ ) x2 λ y1 . (25) The M (Ψ2 ) matrix is given by Z Z ∂ x2 xT2 λx2 − Ψ2 [(x2 , y2 ); λ{(x1 , y1 ); S(F )}, ξ] dF = − y1 dF. λ2 λxT2 ∂ξ (26) The derivative with respect to the second parameter of Ψ2 is ∂ Ψ2 {(x2 , y2 ); θ, T (F )} = ∂θ 0 y2 − xT2 β2 − λβλ y1 − x2 λ βλ y1 . (27) After taking the expectation, the first term of the righthand side of (27) becomes zero. The IF (z; S, F ) = M (Ψ1 )−1 Ψ1 {(x1 , y1 ); S(F )} is the IF of the MLE estimator for probit regression, where: Z M (Ψ1 ) = φ(xT1 β1 )2 x1 xT1 dF, T T Φ(x1 β1 ){1 − Φ(x1 β1 )} Ψ1 ((x1 , y1 ); S(F )) ={y1 − Φ(xT1 β1 )} φ(xT1 β1 ) x1 . Φ(xT1 β1 ){1 − Φ(xT1 β1 )} (28) (29) Inserting the expressions in formulas (24)-(29) in (23) we obtain the IF of Heckman’s estimator. 20 A.2 CVF of the Heckman’s two-stage estimator Proof of Proposition 2. The expression for the CVF of the general two-stage M-estimator (Zhelonkin et al. 2012) is given by Z ∂ CV F (z; S, T, F ) = V (T, F ) − M (Ψ2 ) DdF (z) + Ψ2 [z2 ; h{z1 ; S(F )}, θ] V (T, F ) ∂θ Z + M (Ψ2 )−1 Aa(z)T + Ba(z)T + Ab(z)T + Bb(z)T dF (z)M (Ψ2 )−1 Z −1 + M (Ψ2 ) a(z)AT + b(z)AT + a(z)B T + b(z)B T dF (z)M (Ψ2 )−1 −1 + M (Ψ2 )−1 a(z)a(z)T + a(z)b(z)T + b(z)a(z)T + b(z)b(z)T M (Ψ2 )−1 Z ∂ DdF (z) + Ψ2 [z2 ; h{z1 ; S(F )}, θ] M (Ψ2 )−1 , − V (T, F ) (30) ∂θ where D is a matrix with elements Dij = ∂ ∂Ψ2i (z2 ; h, θ) ∂h ∂θj T ∂h(z1 ; s) IF (z; S, F ) + ∂s ∂ ∂Ψ2i (z2 ; h, θ) ∂θ ∂θj T IF (z; T, F ). Here S and T denote general M-estimators. The vector a(z) denotes the score function of the second stage and its derivative is given by A= ∂ ∂ ∂ Ψ2 {z2 ; h, T (F )} h(z1 ; s)IF (z; S, F ) + Ψ2 [z2 ; h{z1 ; S(F )}, θ] · IF (z; T, F ). ∂h ∂s ∂θ The vector b(z) = R ∂ ∂ Ψ (z ; θ, T (F )) ∂η h(z1 ; η)dF (z) ∂θ 2 2 · IF (z; S, F ), and its derivative B has the following form Z Z ∂ ∂ B= R h(z1 ; s)dF IF (z; S, F ) + Ψ2 {z2 ; h, T (F )}R(2) dF IF (z; S, F ) ∂s ∂h Z Z ∂ ∂ ∂ −1 (1) − Ψ2 {z2 ; h, T (F )} h(z1 ; s)dF M (Ψ1 ) D dF + Ψ1 (z1 ; θ) IF (z; S, F ) ∂h ∂s ∂θ Z ∂ ∂ ∂ + Ψ2 {z2 ; h, T (F )} h(z1 ; s)dF M (Ψ1 )−1 Ψ1 (z1 ; θ)IF (z; S, F ) ∂h ∂s ∂θ ∂ ∂ + Ψ2 {z2 ; h, T (F )} h(z1 ; s) · IF (z; S, F ), (31) ∂h ∂s (1) where R(1) is the matrix with elements (1) Rij ∂ ∂Ψ2i {z2 ; h, T (F )} = ∂h ∂hj T ∂ h(z1 ; s)IF (z; S, F ) + ∂s 21 ∂ ∂Ψ2i (z2 ; h, θ) ∂θ ∂hj T IF (z; T, F ), (2) R(2) is the matrix with elements Rij = n ∂ ∂hi (z1 ;s) ∂s ∂sj oT IF (z; S, F ), and D(1) is an analog of the matrix D, but reflects only the first estimation stage, such that the elements of the matrix are T (1) ∂ ∂Ψ1i Dij = ∂θ IF (z; S, F ). ∂θj Now we can proceed with the computation of the CVF for the Heckman’s estimator. Recall that the function h{z1 ; S(F )} is a scalar function λ(xT1 β1 ) and its derivative with respect to β1 is given by (24). The IF’s of T (F ) and S(F ) are given by (9) and (10), respectively. The M (Ψ1 ) and M (Ψ2 ) matrices are given by (28) and (26), respectively and the scores Ψ1 {z; S(F )} and Ψ2 {z; λ, T (F )} are given by (29) and (25). The derivative of the score function with respect to λ is given in the formula (27). The second term of the matrix D is equal to zero, hence the D matrix for Heckman’s estimator takes the form DH = 0 x2 xT2 2λ 0 λ y1 IF (z; S, F ), where 0 is a (p2 − 1) × (p2 − 1) matrix of zeroes. Denote the analog of the A matrix for the Heckman’s estimator as AH . we obtain 0 x2 xT2 x2 λ −x2 βλ y1 IF (z; T, F ). y1 λ IF (z; S, F ) + AH = xT2 λ λ2 y2 − xT2 β2 − 2λβλ (1) (2) Next, we need to get the expression of BH . Define RH and RH corresponding to the R(1) and R(2) matrices : (1) RH = 0 −2βλ 0 y1 λ IF (z; S, F ) + 0 −x2 −x2 −2λ y1 IF (z; T, F ), where 0 in the first term is a (p2 − 1) vector, and in the second term 0 is a (p2 − 1) × (p2 − 1) (2) matrix. Let us compute RH : ∂ ∂ ∂ 0 (2) RH = λ = λ ∂ ∂s ∂ =0 =0 #T " T T −φ(x1 β1 )x1 β1 + Φ(xT1 β1 )(xT1 β1 )2 − Φ(xT1 β1 ) + 2φ(xT1 β1 ) φ(xT1 β1 ) = x1 xT1 IF (z; S, F ) Φ(xT1 β1 )2 " #T 2φ(xT1 β1 )2 −Φ(xT1 β1 )xT1 β1 − φ(xT1 β1 ) − x1 xT1 IF (z; S, F ) . Φ(xT1 β1 )3 22 The expression above is a p1 vector. The D(1) matrix for the probit estimator is: ∂ ∂ Ψ1 (z; θ) Dprobit = ∂ ∂θ =0 T −φ(x1 β1 )2 xT1 β1 T x IF (z; S, F ) x1 xT1 = Φ(xT1 β1 ){1 − Φ(xT1 β1 )} 1 φ(xT1 β1 )3 {1 − 2Φ(xT1 β1 )} T x IF (z; S, F ) x1 xT1 , − Φ(xT1 β1 )2 {1 − Φ(xT1 β1 )}2 1 where we first take the derivative with respect to θ, and when we take the derivative with respect to , we evaluate the second derivative with respect to θ at θ = S(F ). We defined all the ingredients of the BH matrix, and we can use them in (31). In order to obtain the expression of the CVF we need to insert the expressions obtained T are equal to zero, we above into (30). By noting that the integrals of BH a(z)T and a(z)BH finally obtain (13). A.3 Assumptions and Proof of Proposition 3 T T R . Assume the following conditions, which have Denote ΨR (z; θ) = ΨR 1 (z; β1 ) , Ψ2 (z; β1 , β2 ) been adapted from Duncan (1987): (i) z1 , . . . , zN is a sequence of independent identically distributed random vectors with distribution F defined on a space Z; (ii) Θ is a compact subset of Rp1 +p2 ; (iii) R ΨR (z; θ)dF = 0 has a unique solution, θ0 , in the interior of Θ; (iv) ΨR (z; θ) and ∂ ΨR (z; θ) ∂θ are measurable for each θ in Θ, continuous for each z in Z, and there exist F -integrable functions ξ1 and ξ2 such that for all θ ∈ Θ and z ∈ Z ∂ |ΨR (z; θ)ΨR (z; θ)T | ≤ ξ1 and | ∂θ ΨR (z; θ)| ≤ ξ2 ; (v) R ΨR (z; θ)ΨR (z; θ)T dF is non-singular for each θ ∈ Θ; (vi) R ∂ ΨR (z; θ0 )dF ∂θ is finite and non-singular. 23 Proof of Proposition 3. Consistency and asymptotic normality follow directly from Theorems 1-4 in Duncan (1987). The asymptotic variance consists of two terms, because aR (z)bR (z)T and bR (z)aR (z)T vanish after integration, due to the independence of the error terms. B Appendix: R output for the numerical results B.1 Classical two-stage estimator -------------------------------------------------------------2-step Heckman / heckit estimation Probit selection equation: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.71771 0.19247 -3.729 0.000195 *** age 0.09732 0.02702 3.602 0.000320 *** female 0.64421 0.06015 educ 0.07017 0.01134 6.186 6.94e-10 *** blhisp -0.37449 0.06175 -6.064 1.48e-09 *** totchr 0.79352 0.07112 11.158 ins 0.18124 0.06259 10.710 < 2e-16 *** < 2e-16 *** 2.896 0.003809 ** Outcome equation: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.30257 0.29414 18.028 < 2e-16 *** age 0.20212 0.02430 8.319 < 2e-16 *** female 0.28916 0.07369 3.924 8.89e-05 *** educ 0.01199 0.01168 1.026 blhisp -0.18106 0.06585 -2.749 0.006 ** totchr 0.49833 0.04947 10.073 < 2e-16 *** -0.04740 0.05315 -0.892 ins 24 0.305 0.373 Error terms: Estimate Std. Error t value Pr(>|t|) invMillsRatio sigma rho -0.4802 0.2907 -1.652 0.0986 . 1.2932 NA NA NA -0.3713 NA NA NA --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 -------------------------------------------------------------- B.2 Robust two-stage estimator -------------------------------------------------------------Robust 2-step Heckman / heckit M-estimation Probit selection equation: Coefficients: Estimate Std. Error z-value Pr(>|z|) (Intercept) -0.74914 0.19507 -3.840 0.000123 *** age 0.10541 0.02770 3.806 0.000141 *** female 0.68741 0.06226 educ 0.07012 0.01147 6.116 9.62e-10 *** blhisp -0.39775 0.06265 -6.349 2.17e-10 *** totchr 0.83284 0.08028 10.374 ins 0.18256 0.06371 11.041 < 2e-16 *** < 2e-16 *** 2.865 0.004166 ** Outcome equation: Value Std. Error t value (Intercept) 5.40154 0.27673 19.5192 *** age 0.20062 0.02451 8.1859 *** female 0.25501 0.06992 3.6467 *** 25 educ 0.01325 0.01162 1.1405 blhisp -0.15508 0.06507 -2.3835 * totchr 0.48116 0.03822 12.5861 *** ins -0.06707 0.05159 -1.2999 IMR -0.67676 0.25928 -2.6102 ** --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 -------------------------------------------------------------- References Ahn, H. and Powell, J. L. (1993), “Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism,” Journal of Econometrics, 58, 3–29. Amemiya, T. (1984), “Tobit Models: a Survey,” Journal of Econometrics, 24, 3–61. Cameron, C. A. and Trivedi, P. K. (2009), Microeconometrics Using Stata, College Station, TX: Stata Press. Cantoni, E. and Ronchetti, E. (2001), “Robust Inference for Generalized Linear Models,” Journal of the American Statistical Association, 96, 1022–1030. Das, M., Newey, W. K., and Vella, F. (2003), “Nonparametric Estimation of Sample Selection Models,” Review of Economic Studies, 70, 33–58. Duncan, G. M. (1987), “A Simplified Approach to M-Estimation with Application to TwoStage Estimators,” Journal of Econometrics, 34, 373–389. Eicker, F. (1967), “Limit Theorems for Regression with Unequal and Dependent Errors,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, L.M. LeCam, J. Neyman (Eds.), Berkeley: University of California Press, 59–82. Genton, M. G. and Rousseeuw, P. J. (1995), “The Change-of-Variance Function of MEstimators of Scale under General Contamination,” Journal of Computational and Applied Mathematics, 64, 69–80. Greene, W. H. (1981), “Sample Selection Bias as a Specification Error: Comment,” Econometrica, 49, 795–798. Hampel, F. (1974), “The Influence Curve and Its Role in Robust Estimation,” Journal of the American Statistical Association, 69, 383–393. Hampel, F., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986), Robust Statistics: The Approach Based on Influence Functions, New York: John Wiley and Sons. 26 Hampel, F., Rousseeuw, P. J., and Ronchetti, E. (1981), “The Change-of-Variance Curve and Optimal Redescending M-Estimators,” Journal of the American Statistical Association, 76, 643–648. Heckman, J. (1979), “Sample Selection Bias as a Specification Error,” Econometrica, 47, 153– 161. Huber, P. J. (1967), “The Behavior of Maximum Likelihood Estimates under Nonstandard Conditions,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, L.M. LeCam, J. Neyman (Eds.), Berkeley: University of California Press, 221–233. — (1981), Robust Statistics, New York: John Wiley and Sons. Huber, P. J. and Ronchetti, E. (2009), Robust Statistics, New York: John Wiley and Sons, 2nd ed. Kim, T.-K. and Muller, C. (2007), “Two-Stage Huber Estimation,” Journal of Statistical Planning and Inference, 137, 405–418. Koenker, R. (2005), Quantile Regression, New York: Cambridge University Press. La Vecchia, D., Ronchetti, E., and Trojani, F. (2012), “Higher-Order Infinitesimal Robustness,” Journal of the American Statistical Association, to appear. Lee, L.-F., Maddala, G., and Trost, R. (1980), “Asymptotic Covariance Matrices of Two-Stage Probit and Two-Stage Tobit Methods for Simultaneous Equations Models With Selectivity,” Econometrica, 48, 491–503. Marchenko, Y. V. and Genton, M. G. (2012), “A Heckman Selection-t Model,” Journal of the American Statistical Association, 107, 304–317. Maronna, R. A., Martin, G. R., and Yohai, V. J. (2006), Robust Statistics: Theory and Methods, Chichester: John Wiley and Sons. Melino, A. (1982), “Testing for Sample Selection Bias,” Review of Economic Studies, XLIX, 151–153. Newey, W. K. (2009), “Two-Step Series Estimation of Sample Selection Models,” Econometrics Journal, 12, S217–S229. Paarsch, H. J. (1984), “A Monte Carlo Comparison of Estimators for Censored Regression Models,” Journal of Econometrics, 24, 197–213. Peracchi, F. (1990), “Bounded-Influence Estimators for the Tobit Model,” Journal of Econometrics, 44, 107–126. — (1991), “Robust M-Tests,” Econometric Theory, 7, 69–84. Puhani, P. A. (2000), “The Heckman Correction for Sample Selection and Its Critique,” Journal of Economic Surveys, 14, 53–68. 27 R Core Team (2012), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0. Ronchetti, E. and Trojani, F. (2001), “Robust Inference with GMM Estimators,” Journal of Econometrics, 101, 37–69. Salazar, L. (2008), “A Robustness Study of Heckman’s Model,” Master’s Thesis, University of Geneva, Switzerland. Stolzenberg, R. M. and Relles, D. A. (1997), “Tools for Intuition about Sample Selection Bias and Its Correction,” American Sociological Review, 62, 494–507. Toomet, O. and Henningsen, A. (2008), “Sample Selection Models in R: Package SampleSelection,” Journal of Statistical Software, 27, 1–23. Vella, F. (1992), “Simple Tests for Sample Selection Bias in Censored and Discrete Choice Models,” Journal of Applied Econometrics, 7, 413–421. — (1998), “Estimating Models with Sample Selection Bias: A Survey,” The Journal of Human Resources, 33, 127–169. von Mises, R. (1947), “On the Asymptotic Distribution of Differentiable Statistical Functions,” The Annals of Mathematical Statistics, 18, 309–348. White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity,” Econometrica, 48, 817–838. Winship, C. and Mare, R. D. (1992), “Models for Sample Selection Bias,” Annual Review of Sociology, 18, 327–350. Zhelonkin, M., Genton, M. G., and Ronchetti, E. (2012), “On the Robustness of Two-Stage Estimators,” Statistics & Probability Letters, 82, 726–732. Zuehlke, T. W. and Zeman, A. R. (1991), “A Comparison of Two-Stage Estimators of Censored Regression Models,” The Review of Economics and Statistics, 73, 185–188. 28 Table 1: Bias, Variance and MSE of the classical and robust probit MLE at the model and under two types of contamination N = 200 Classical β10 β11 Robust β10 β11 Not contaminated Bias Var MSE x1 is contaminated, y1 = 1 Bias Var MSE x1 is contaminated, y1 = 0 Bias Var MSE 0.0053 0.0092 0.0100 0.0188 0.0101 0.0189 0.0524 −0.4458 0.0084 0.0566 0.0112 0.2553 −0.0439 −0.4522 0.0084 0.0520 0.0103 0.2565 0.0062 0.0106 0.0110 0.0270 0.0111 0.0272 0.0062 0.0111 0.0111 0.0272 0.0112 0.0273 0.0061 0.0124 0.0113 0.0281 0.0113 0.0282 Table 2: Bias, Variance and MSE of the classical and robust two-stage estimators at the model and under two types of contamination N = 200 Classical β20 β21 β2λ Robust probit + OLS β20 β21 β2λ Robust IMR β20 β21 β2λ Robust 2S β20 β21 β2λ Not contaminated Bias Var MSE x1 is contaminated, y1 = 1 Bias Var MSE x1 is contaminated, y1 = 0 Bias Var MSE −0.0061 0.0053 −0.0044 0.0273 0.0092 0.0526 0.0274 0.0092 0.0526 0.1307 −0.0055 −0.2837 0.0252 0.0094 0.0353 0.0424 0.0094 0.1159 −0.3384 0.0052 0.3878 0.1812 0.0093 0.2923 0.2957 0.0093 0.4427 −0.0089 0.0053 −0.0004 0.0282 0.0092 0.0558 0.0283 0.0092 0.0558 0.1924 −0.0038 −0.3760 0.0197 0.0094 0.0295 0.0567 0.0094 0.1709 −0.0090 0.0052 −0.0012 0.0284 0.0093 0.0561 0.0285 0.0093 0.0561 −0.0107 0.0055 0.0088 0.0294 0.0092 0.0594 0.0295 0.0092 0.0595 −0.0101 0.0056 0.0100 0.0263 0.0088 0.0549 0.0264 0.0088 0.0550 −0.0108 0.0053 0.0084 0.0296 0.0093 0.0601 0.0297 0.0093 0.0602 −0.0142 0.0044 0.0035 0.0312 0.0106 0.0677 0.0314 0.0106 0.0677 0.0189 0.0029 −0.0462 0.0277 0.0105 0.0553 0.0280 0.0105 0.0586 −0.0143 0.0040 0.0033 0.0314 0.0107 0.0681 0.0316 0.0107 0.0681 Figure 1: Influence of a single observation x11 on the sample selection bias (SSB) test. The data generating process is described in Section 4.1 with the only difference that here ρ = 0.75. The sample size is 200 and we generate 1000 replicates. The top panel represents the SSB test statistic, the middle panel its log10 (p-values), and the bottom panel represents its empirical power, i.e. the fraction of rejections of the null hypothesis among the 1000 replicates. On the horizontal axis, x11 varies between 0 and −6 and the corresponding y11 = 1. The dashed line in the top panel corresponds to 1.96, and in the middle panel to log10 (0.05). Figure 2: The solid line corresponds to the classical inverse Mills ratio. The dotted lines correspond to the robust inverse Mills ratio from Section 3.2. Figure 3: Comparison of classical and robust two-stage estimators with contamination of x1 . Top panel corresponds to y1 = 0, and bottom panel corresponds to y1 = 1. Case (a) corresponds to the classical estimator, (b) corresponds to robust probit with OLS on the second stage, (c) corresponds to robust IMR, and (d) to robust two-stage. Horizontal lines mark the true values of the parameters.
© Copyright 2024