Sample Selection, Heteroscedasticity, and Quantile Regression

Blaise Melly, Martin Huber

Preliminary. First draft: December 2006. Last changes: February 2008.

Abstract: Independence of the error term and the covariates is a crucial assumption in virtually all sample selection models. If this assumption is not satisfied, for instance due to heteroscedasticity, both mean and quantile regression estimators are inconsistent. If independence does hold, all quantile functions and the mean function are parallel, which naturally limits the usefulness of quantile estimators. However, quantile estimators can be used to build tests for the independence condition because they are consistent under the null hypothesis. Therefore, we propose powerful tests based on the whole conditional quantile regression process. If the independence assumption is violated, quantile functions are not point identified, but we show that it is still possible to bound the coefficients of interest. Our identified set shrinks to a single point either if independence holds or if some observations are selected and observed with probability one. Therefore, our model generalizes simultaneously the traditional sample selection models and the identification at infinity strategy.

Keywords: sample selection, quantile regression, heteroscedasticity, test, bootstrap, bounds
JEL classification: C12, C13, C14, C21

We have benefited from comments by Michael Lechner and seminar participants at the University of St. Gallen. Addresses for correspondence: Blaise Melly, MIT Department of Economics, 50 Memorial Drive, E52-251d, Cambridge, MA 02142, USA, [email protected], www.siaw.unisg.ch/lechner/melly; Martin Huber, SIAW, University of St. Gallen, Varnbüelstrasse 14, 9000 St. Gallen, Switzerland, [email protected].

1 Introduction

Selection bias arises when the outcome of interest is observable only for a subsample of individuals, conditional on selection, and when selection is not random. A prominent example in labor economics concerns the determinants of wages and the labor supply behavior of women. Individuals are assumed to offer a positive labor supply only if their potential wage exceeds their reservation wage, so that a selection bias arises if we try to estimate the wage offer function in the working subsample. The ability to consistently estimate econometric models in the presence of nonrandom sample selection is one of the most important innovations in microeconometrics, as illustrated by the Nobel Prize received by James Heckman. Gronau (1974) and Heckman (1974 and 1979) addressed the selectivity bias and proposed fully parametric estimators. Naturally, this approach leads to inconsistent results if the distribution of the error term is misspecified. Therefore, Cosslett (1991), Gallant & Nychka (1987), Powell (1987), Ahn & Powell (1993), and Newey (1991) have proposed semiparametric estimators for the sample selection model. More recently, Das, Newey & Vella (2003) have proposed a fully nonparametric estimator for this model. While these papers have weakened the parametric and distributional assumptions made originally, they all assume independence between the error term and the regressors. This assumption is crucial in parametric, semiparametric and nonparametric models, as conditioning on the selection probability is not sufficient to correct for the selection bias if it is violated.
However, dependence in general and heteroscedasticity in particular are ubiquitous phenomena in the fields where sample selection models have been used. As suggested by Mincer (1973) in his famous human capital earnings model, residual wage dispersion should increase with experience and education. In line with this finding, the large majority of applications of quantile regression in the empirical literature find different coefficients on different parts of the conditional distribution. Therefore, the independence assumption cannot be taken for granted in most economic applications. Donald (1995) has relaxed this assumption and proposed a two-step estimator that allows for conditional heteroscedasticity but requires the error terms to be bivariate normally distributed (see footnote 1). Since distributional assumptions are always difficult to motivate and it is not clear why the regressors should affect only the first two moments of the conditional distribution, we dispense with the normality assumption in this paper.

Footnote 1: Chen & Khan (2003) have proposed a semiparametric estimator allowing for heteroscedasticity. However, it appears that the proper identification of their model requires a variable affecting the variance but not the mean of the dependent variable conditionally on the regressors. This additional exclusion restriction renders the model unattractive.

In the absence of selection, Koenker & Bassett (1978) proposed a parametric (linear) estimator for conditional quantile models and derived its statistical properties. Due to its ability to capture heterogeneous effects, its theoretical properties have been studied extensively and it has been used in many empirical studies; see, for example, Powell (1986), Gutenbrunner & Jurečková (1992), Buchinsky (1994), Koenker & Xiao (2002), and Angrist, Chernozhukov & Fernández-Val (2006). Chaudhuri (1991) analyzed nonparametric estimation of conditional quantile functions. Buchinsky (1998b), Koenker & Hallock (2001), and Koenker (2005) provide comprehensive discussions of quantile regression models and their recent developments. Buchinsky (1998a and 2001) was the first to consider the semiparametric sample selection model for conditional quantiles. He extends the series estimator of Newey (1991) for the mean to the estimation of quantiles. The problem with this approach is that the independence assumption is required to obtain the convenient partially linear representation for the mean and the quantiles. Implicitly, Buchinsky (1998a and 2001) assumes independence between the error term and the regressors conditional on the selection probability. One implication of this assumption is that all quantile regression curves are parallel. Naturally, this restricts the usefulness of the estimator because it implies that all quantile regression slope coefficients are identical and equal to the mean slope coefficients. Thus, this approach does not allow one to estimate the effects of the regressors on the conditional distribution of the dependent variable. However, the Buchinsky (1998a) estimator can still be very useful. The first motivation for quantile regression was not the estimation of the effects of covariates on the conditional distribution (indeed, Koenker and Bassett assume independence in their seminal paper) but the robustness of the estimates in the presence of non-Gaussian errors. A similar result applies to the sample selection model, and we show that very significant efficiency gains can be achieved when the distribution of the error term has fat tails.
The second motivation for quantile regression was to provide robust and efficient tests for the presence of heteroscedasticity, as suggested by Koenker & Bassett (1982). Testing the independence assumption is even more important in the presence of sample selection. As explained above, both mean and quantile estimators are inconsistent when this assumption is violated. It is therefore surprising that such a test has not been proposed; we do so in Section 3. Under the null hypothesis of independence, the quantile regression estimator proposed by Buchinsky (1998a) consistently estimates the coefficients, which are the same for all quantiles. When the independence assumption is violated, the estimates are not consistent, but the slope coefficients differ from one quantile to another, which gives power to the test.

As suggested by Koenker & Bassett (1982), we could consider a finite number of quantiles and test whether the regression coefficients are the same at all of these quantiles. However, a more powerful test statistic can be built using the whole conditional quantile process. We therefore suggest a test procedure similar to that proposed by Chernozhukov & Fernández-Val (2005). The critical values for this test are obtained by subsampling the empirical quantile regression processes. Since the computation of the estimates is quite demanding, we also apply the suggestion of Chernozhukov & Hansen (2006), which consists in resampling the score instead of re-computing the whole process. Our Monte Carlo simulations show that both the size and the power of our Kolmogorov-Smirnov and Cramer-von-Mises-Smirnov statistics are very satisfactory.

This paper would be incomplete if we did not offer a solution when the independence assumption is rejected, which we expect to be the case in a wide range of applications. In this case, it appears that point identification of the mean and quantile coefficients is impossible. However, in the spirit of the work of Manski (1989 and 1994), we show that it is still possible to bound the quantile coefficients even in the absence of bounded support. Our bounds are more informative than the worst-case bounds of Manski because we maintain the linear functional form and we make an independence assumption for the ranks of the distribution function. This last assumption relaxes the traditional independence assumption since it allows for heteroscedasticity and all types of dependence between the covariates and the potential wage distribution. A very appealing feature of our bounds is that the identified interval collapses to a single point in two special cases: when there is independence between the error terms and the covariates, and when there are observations whose probability of selection is close to one. The first case is obvious since we are back to the classical sample selection model, but it is important since it implies that the upper and lower bounds will be quite close when there is only a small amount of dependence. The second case is an example of Chamberlain's (1986) "identification at infinity". This approach has been used by Heckman (1990) and Andrews & Schafgans (1998) to identify the constant in a traditional sample selection model. In the case of dependence between the error term and the covariates, it can even be used to identify the slope coefficients. Our bounds also generalize this identification strategy to the case where some observations are observed with a high, but below one, probability.
In this case, we may obtain a short identified interval for the coefficients of the quantile regression even when they are not point identified.

Two short applications illustrate our results. First, we apply our tests to the small textbook data set of Mroz (1987) and can reject the independence assumption at the 10% significance level. Second, using the data set of Mulligan & Rubinstein (2005), we first reject the null hypothesis of independence at the 0.1% level. We then bound the coefficients of the model under our weaker set of assumptions.

The remainder of this paper is organized as follows. In Section 2 we describe the sample selection model and discuss the role of the independence assumption. In Section 3 we outline the test procedure. Section 4 is devoted to the bounds on the coefficients when the independence assumption is rejected. In Section 5, Monte Carlo simulations show the possible efficiency gains of quantile regression in the sample selection model as well as the power and size properties of our tests. Section 6 revisits two typical applications of sample selection models. Section 7 concludes.

2 The Classical Sample Selection Model

The parametric two-step estimator to control for sample selection bias in economic applications was first proposed by Heckman (1976 and 1979) and is known as the type II tobit or heckit estimator. Newey (1991) suggested a semiparametric two-step estimator based on a series expansion of the inverse Mills ratio. Buchinsky (1998a) suggested an extension of this model to the estimation of conditional quantiles. As in these papers, we assume that the potential outcome is linear in X, a vector of covariates (see footnote 2):

Y_i^* = c(\tau) + X_i'\beta(\tau) + \varepsilon_i(\tau).   (1)

The error term is assumed to satisfy the \tau-th quantile restriction Q_\tau(\varepsilon(\tau) | X) = 0, such that \beta(\tau) could be estimated consistently by traditional quantile regression if there were no sample selection problem. However, Y^* is latent and only observed conditional on D_i = 1. Thus, the observed outcome Y is defined as Y_i = c(\tau) + X_i'\beta(\tau) + \varepsilon_i(\tau) if D_i = 1 and is not observed otherwise. D is an indicator function that depends on Z, a superset of X (see footnote 3). Identification of \beta(\tau) requires identification of Pr(D = 1 | Z). The rest of the paper does not depend on how Pr(D = 1 | Z) is identified, but for completeness we make the following assumption:

D_i = 1(Z_i'\gamma + u_i \geq 0).   (2)

Footnote 2: It is important to stress that all the insights of this paper (inconsistency in the presence of dependence, the possibility to test the independence assumption, bounds) are valid for a nonparametric sample selection model. We consider the classical parametric model because it is more often applied and for simplicity.

Footnote 3: For identification, Z has to include at least one continuous variable which is not in X and has a non-zero coefficient in the selection equation.

This is a parametric restriction for the sample selection equation. We implement our test statistic by using the estimator suggested by Klein & Spady (1993) to estimate \gamma. Therefore, we use their assumptions; in particular, we assume that u is independent of Z conditional on Z'\gamma. This independence assumption conditional on the index can be relaxed if Pr(D = 1 | Z) is estimated nonparametrically, as proposed in Ahn & Powell (1993). The conditional quantiles of the observed outcome can be formulated as

Q_\tau(Y | X, D = 1) = c(\tau) + X_i'\beta(\tau) + Q_\tau(\varepsilon(\tau) | X, D = 1).

If selection into observed outcomes were random, then Q_\tau(\varepsilon(\tau) | X, D = 1) = 0 and the outcome equation could be estimated consistently by quantile regression.
However, in general Q_\tau(\varepsilon(\tau) | X, D = 1) \neq 0. For identification, Buchinsky (1998a) assumes:

Assumption 1: (u, \varepsilon) has a continuous density.
Assumption 2: f_{u,\varepsilon}(\cdot | Z) = f_{u,\varepsilon}(\cdot | Z'\gamma).

Assumption 2 implies that Q_\tau(\varepsilon(\tau) | X, D = 1) depends on X only through the linear index Z'\gamma and provides us with the following representation:

Q_\tau(Y | X, D = 1) = c(\tau) + X_i'\beta(\tau) + h_\tau(Z_i'\gamma),

where h_\tau(Z_i'\gamma) = Q_\tau(\varepsilon(\tau) | Z_i'\gamma, D = 1) is an unknown nonlinear function and the residual v(\tau) satisfies the quantile restriction Q_\tau(v(\tau) | X, h_\tau(Z'\gamma), D = 1) = 0 by construction. This representation shows that \beta(\tau) can be estimated by a quantile regression of Y on X and on a series approximation of Z'\gamma, as long as there is an excluded element in Z with a corresponding nonzero coefficient in \gamma.

Footnote 4: More generally, we can allow for f_{u,\varepsilon}(\cdot | Z) = f_{u,\varepsilon}(\cdot | Pr(D = 1 | Z)). This weakens the first-step independence assumption but not the second-step one.

The problem is that Assumption 2 implies that the quantile slope coefficients \beta(\tau) are constant across the distribution and are equal to the mean slope coefficients \beta. This becomes obvious when remembering that X is a subset of Z: Assumption 2 therefore implies that \varepsilon is independent of X conditional on Z'\gamma. Conditional on h_\tau(Z'\gamma), the distribution of (u, \varepsilon) does not depend on the regressors' values X and the errors are homoscedastic. Thus, quantile regression does not provide more information on the effect of X on Y than mean regression. All conditional quantile functions are parallel and only the constant changes from one conditional quantile to another (see footnote 5).

Let us now assume that the conditional independence of \varepsilon and X given Z'\gamma does not hold, such that Assumption 2 is violated. In this case the errors are generally heteroscedastic and dependent on X, and f_{u,\varepsilon} is not independent of Z even when conditioning on Z'\gamma. Coefficients differ across quantiles and the quantile regression slopes are no longer parallel across \tau. Including h_\tau(Z_i'\gamma) in the outcome equation will then generally not allow us to estimate \beta(\tau) consistently. This is due to the fact that the selection rule is likely to select either low or high values of latent outcomes into the subsample of observed outcomes because of the covariance of (u, \varepsilon). This is innocuous when the errors are homoscedastic conditional on Z'\gamma, as selection then shifts only the location of the (parallel) quantile and mean regression curves, whereas their gradient remains unchanged. However, in the presence of heteroscedastic errors, positive or negative sample selection generally causes \hat{\beta}(\tau) and \hat{\beta} to be systematically over- or underestimated.

A graphical illustration elucidates the intuition. Figure 1 displays 500 simulated realizations of (X, Y) under homoscedastic (1a) and heteroscedastic (1b) errors, conditional on Z'\gamma. The true median regression curve (solid line) in 1a and 1b is horizontal, and thus the median slope coefficient \beta(0.5) on X given Z'\gamma is zero. Sample selection in 1a leaves the regression curve (dashed line) in the subsample of observed (X, Y), i.e. crosses with framing, unchanged. Merely its location is shifted upward, as realizations with small Y are more likely not to be observed (crosses without framing). However, when errors are heteroscedastic as in 1b, the disproportionate non-observability of realizations with low Y causes the slope coefficient among observed realizations to diverge from the true value. In the case considered, the upward-sloping regression curve indicates a positive relationship between the regressor and the outcome, even though the true coefficient is zero.
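The following small simulation is a sketch of this intuition (it is an illustration under assumed parameter values, not the authors' code). As in Figure 1, selection does not depend on X, mimicking the "conditional on Z'\gamma" setting: with homoscedastic errors the selected-sample median slope stays close to the true value of zero, while with an error scale that depends on X it is clearly tilted.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)                                        # enters only the selection equation
u = rng.normal(size=n)
eps = 0.8 * u + np.sqrt(1 - 0.8 ** 2) * rng.normal(size=n)    # Corr(u, eps) = 0.8 (assumed value)
d = z + u > 0                                                 # selection indicator, independent of x

y_hom = eps                                                   # homoscedastic: true median slope on x is 0
y_het = (1 + x) * eps                                         # heteroscedastic: error scale depends on x

for label, y in [("homoscedastic", y_hom), ("heteroscedastic", y_het)]:
    # median regression in the selected subsample only
    fit = sm.QuantReg(y[d], sm.add_constant(x[d])).fit(q=0.5)
    print(label, "selected-sample median slope:", round(float(fit.params[1]), 3))
```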
As one cannot observe whether the selection process is more likely to "cut out" high or low values of Y, nonconstant estimates of the quantile slope coefficients bear no economic interpretation. They merely tell us that at least one basic condition necessary in sample selection models is violated and that neither quantile nor mean coefficient estimators are consistent (see footnote 6). However, this obvious shortcoming is also the major attraction of the two-step quantile estimator. In fact, it can be used as a test for the independence assumption between errors and regressors by testing the null hypothesis H_0: \beta(\tau) = \beta for all \tau in [0, 1], where \beta is some unknown constant coefficient vector. If the null hypothesis holds, so does Assumption 2. If the coefficients differ significantly across quantiles, neither H_0 nor Assumption 2 holds and both the quantile and the mean estimators yield inconsistent estimates.

Footnote 5: The constant is not identified without further assumptions. Only an identification at infinity argument can solve this identification problem.

Footnote 6: Only if selection were random would the estimators be consistent, but then h_\tau(Z'\gamma) = 0 and the basic intuition for using sample selection models breaks down.

Figure 1: Regression slopes under homoscedasticity (1a) and heteroscedasticity (1b)

3 Test Procedure

Our test procedure can be sketched as follows. We first estimate the selection equation using the Klein & Spady (1993) estimator. We then estimate the conditional quantile regression process by approximating the bias by a series expansion of the inverse Mills ratio, as suggested by Newey (1991) for the mean and Buchinsky (1998a and 2001) for the quantiles. We test the independence assumption by testing whether the quantile regression slopes are the same over the whole distribution. The critical values of the test statistic are obtained by resampling as presented in Chernozhukov & Fernández-Val (2005). When resampling is computationally too costly, we use score resampling as suggested by Chernozhukov & Hansen (2006).

In detail, the semiparametric discrete choice estimator suggested in Klein & Spady (1993) is used to estimate the selection equation (2). Formally (see footnote 7):

\hat{\gamma} = \arg\max_{\gamma} \sum_{i=1}^{n} \{ (1 - D_i) \log[1 - \hat{E}(D | Z_i, \gamma)] + D_i \log[\hat{E}(D | Z_i, \gamma)] \},   (3)

where

\hat{E}(D | Z_i, \gamma) = \frac{\sum_{j \neq i} D_j \, \kappa((Z_i'\gamma - Z_j'\gamma)/b_n)}{\sum_{j \neq i} \kappa((Z_i'\gamma - Z_j'\gamma)/b_n)},   (4)

b_n is a bandwidth depending on the sample size n, and \kappa(\cdot) is a kernel function. The optimal bandwidth b_n^{opt} is determined by the generalized cross-validation criterion (GCV) as discussed in Craven & Wahba (1979), Golub, Heath & Wahba (1979) and Li (1985), among others. This estimator attains the semiparametric efficiency bound and is the most efficient among those semiparametric estimators that do not put any restrictions on the distribution of the error term. Furthermore, heteroscedasticity of unknown form is allowed as long as it depends on the regressors only via the index. Klein and Spady's Monte Carlo simulations indicate that efficiency losses are only modest compared to probit estimation when the error term is standard normal, while their estimator is considerably more efficient in finite samples when the errors are non-Gaussian.

Footnote 7: More precisely, we apply the estimator as used in Gerfin (1996), where the trimming term \zeta = 0.5/n is added in equation (3), such that \hat{\gamma} = \arg\max_{\gamma} \sum \{ (1 - d) \log[1 - \hat{E}(d | z, \gamma) + \zeta] + d \log[\hat{E}(d | z, \gamma) + \zeta] \}.
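The sketch below illustrates the objective in (3)-(4) (a minimal, assumption-laden illustration, not the authors' implementation): a leave-one-out Gaussian-kernel estimate of E(D | Z'\gamma) is plugged into the binary-choice quasi-likelihood, with the trimming constant of footnote 7. The fixed bandwidth and the scale normalization (first coefficient set to one) are simplifications introduced here.

```python
import numpy as np
from scipy.optimize import minimize

def klein_spady_loglik(gamma, D, Z, bandwidth):
    index = Z @ gamma
    diffs = (index[:, None] - index[None, :]) / bandwidth
    K = np.exp(-0.5 * diffs ** 2)                    # Gaussian kernel weights
    np.fill_diagonal(K, 0.0)                         # leave observation i out
    Ehat = (K @ D) / np.maximum(K.sum(axis=1), 1e-12)
    zeta = 0.5 / len(D)                              # trimming term, cf. Gerfin (1996)
    return np.sum((1 - D) * np.log(1 - Ehat + zeta) + D * np.log(Ehat + zeta))

def fit_klein_spady(D, Z, bandwidth=0.3):
    # Scale is not identified in a single-index model, so the first coefficient is normalized to one.
    neg_loglik = lambda free: -klein_spady_loglik(np.r_[1.0, free], D, Z, bandwidth)
    start = np.zeros(Z.shape[1] - 1)
    res = minimize(neg_loglik, start, method="Nelder-Mead")
    return np.r_[1.0, res.x]                         # estimated index coefficients
```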
In a second step, the function h_\tau(Z'\gamma) is approximated by a power series expansion. The exact form of the approximation is asymptotically irrelevant. As suggested by Buchinsky (1998a), we use a power series expansion of the inverse Mills ratio of the normalized estimated index. Thus, the first-order approximation will be sufficient if the error term is normally distributed. In any case, the estimator is consistent since the order of the approximation increases with the sample size. The coefficient estimates \hat{\beta}(\tau) and \hat{\delta}(\tau) are obtained by solving the following minimization problem:

(\hat{\beta}(\tau), \hat{\delta}(\tau)) = \arg\min_{\beta, \delta} \frac{1}{n} \sum_{i=1}^{n} \rho_\tau( Y_i - X_i'\beta - \lambda_J(Z_i'\hat{\gamma})'\delta ),   (5)

where \rho_\tau(a) = a(\tau - 1(a < 0)) is the check function suggested by Koenker & Bassett (1978) and \lambda_J(Z_i'\hat{\gamma}) = (1, \lambda(Z_i'\hat{\gamma}), \lambda(Z_i'\hat{\gamma})^2, ..., \lambda(Z_i'\hat{\gamma})^J) is a polynomial vector in the inverse Mills ratio \lambda(\cdot). Again, GCV is used to determine the optimal maximum order J.

As discussed in Section 2, one would like to test H_0: \beta(\tau) = \beta for all \tau in (0, 1), where \beta is some unknown constant coefficient vector. This can be done by defining a grid of q equidistant quantiles between zero and one, \tau_1, ..., \tau_q \in T \subset (0, 1), and considering the general null hypothesis

H_0: \beta(\tau) = \beta, \quad \tau \in T.   (6)

We estimate \beta by the median regression coefficient vector \hat{\beta}(0.5). Alternatively, we could use the mean estimate, but this would require the existence of at least the first two moments of Y given X. We test the null hypothesis using the Kolmogorov-Smirnov (KS) and Cramer-von-Mises-Smirnov (CMS) statistics applied to the empirical inference process \hat{\beta}(\tau) - \hat{\beta}(0.5):

T_n^{KS} = \sup_{\tau \in T} \sqrt{n} \, \| \hat{\beta}(\tau) - \hat{\beta}(0.5) \|_{\hat{\Lambda}(\tau)} \quad \text{and} \quad T_n^{CMS} = n \int_{T} \| \hat{\beta}(\tau) - \hat{\beta}(0.5) \|^2_{\hat{\Lambda}(\tau)} \, d\tau,   (7)

where \|a\|_{\hat{\Lambda}(\tau)} = \sqrt{a'\hat{\Lambda}(\tau) a} and \hat{\Lambda}(\tau) is a positive weighting matrix such that \hat{\Lambda}(\tau) = \Lambda(\tau) + o_p(1) uniformly in \tau, and \Lambda(\tau) is positive definite, continuous and symmetric, again uniformly in \tau. We simply use the identity matrix in our simulations and applications.

T_n^{KS} and T_n^{CMS} are by themselves not very useful since we do not know their asymptotic distributions. However, Chernozhukov & Fernández-Val (2005) show that asymptotically valid critical values can be obtained by bootstrapping the recentered test statistic. To this end, B subsamples of block size m are drawn out of the original sample with replacement to compute the inference processes

\hat{\beta}_{m,i}(\tau) - \hat{\beta}_{m,i}(0.5),   (8)

where 1 \leq i \leq B and \hat{\beta}_{m,i}(\tau) are the quantile slope coefficient estimates for draw i and block size m. The corresponding KS and CMS test statistics of the recentered bootstrapped process are

T_{n,m,i}^{KS} = \sup_{\tau \in T} \sqrt{m} \, \| \hat{\beta}_{m,i}(\tau) - \hat{\beta}_{m,i}(0.5) - (\hat{\beta}(\tau) - \hat{\beta}(0.5)) \|_{\hat{\Lambda}(\tau)} \quad \text{and}
T_{n,m,i}^{CMS} = m \int_{T} \| \hat{\beta}_{m,i}(\tau) - \hat{\beta}_{m,i}(0.5) - (\hat{\beta}(\tau) - \hat{\beta}(0.5)) \|^2_{\hat{\Lambda}(\tau)} \, d\tau.   (9)

The distribution-free p-values for the test statistics are obtained by estimating the probability Pr[T(\hat{\beta}(\tau) - \hat{\beta}(0.5) - (\beta(\tau) - \beta)) > T_n] by \frac{1}{B}\sum_{i=1}^{B} 1\{T_{n,m,i} > T_n\}, where Pr[\cdot] is a probability measure and 1\{\cdot\} is the indicator function. However, the repeated computation of the coefficient estimates for each resampling step can get quite costly, especially in large samples. For this reason, we follow Chernozhukov & Fernández-Val (2005) and use score resampling based on the linear approximation of the empirical inference process instead, which is considerably less burdensome. The linear representation of the inference process is given by

\sqrt{n} ( \hat{\beta}(\tau) - \hat{\beta}(0.5) - (\beta(\tau) - \beta) ) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} s_i(\tau) + o_p(1).   (10)

Again, B subsamples of estimated scores are drawn. Let m denote the block size and i, 1 \leq i \leq B, a specific subsample. The estimated inference process for subsample i is defined as \frac{1}{m} \sum_{j \in i} \hat{s}_j(\tau). The KS and CMS statistics are then

T_{n,m,i}^{KS} = \sup_{\tau \in T} \sqrt{m} \, \Big\| \frac{1}{m} \sum_{j \in i} \hat{s}_j(\tau) \Big\|_{\hat{\Lambda}(\tau)} \quad \text{and} \quad T_{n,m,i}^{CMS} = m \int_{T} \Big\| \frac{1}{m} \sum_{j \in i} \hat{s}_j(\tau) \Big\|^2_{\hat{\Lambda}(\tau)} \, d\tau.   (11)

The computation of the score function in the quantile sample selection model framework is presented in the appendix of Buchinsky (1998a), using the Ichimura (1993) estimator for the first-step estimation of the selection probability. In our appendix, we present a slightly different version, which is adapted to the estimator suggested by Klein & Spady (1993).
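The following is a schematic sketch of the resampling version of the test in (7) and (9) under simplifying assumptions: the identity weighting matrix is used, the integral over T is approximated by an average over the quantile grid, and fit_slopes is a placeholder for the two-step estimator (Klein-Spady first step plus the series-corrected quantile regression (5)) returning the slope process over the grid; it is not the authors' code.

```python
import numpy as np

def ks_cms(process, scale):
    """KS and CMS functionals of a (len(taus) x dim) inference process, identity weighting."""
    norms_sq = (process ** 2).sum(axis=1)
    return scale * np.sqrt(norms_sq.max()), scale ** 2 * norms_sq.mean()

def independence_test(data, fit_slopes, taus, B=250, m=None, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(data)                                      # data: array of observations (rows)
    m = n if m is None else m
    med = int(np.argmin(np.abs(np.asarray(taus) - 0.5)))   # position of the median in the grid
    beta = fit_slopes(data, taus)                      # beta_hat(tau) for tau in taus
    proc = beta - beta[med]                            # empirical inference process
    ks, cms = ks_cms(proc, np.sqrt(n))
    ks_b, cms_b = np.empty(B), np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=m)               # subsample of block size m, with replacement
        beta_b = fit_slopes(data[idx], taus)
        proc_b = (beta_b - beta_b[med]) - proc         # recentered bootstrap process, cf. (9)
        ks_b[b], cms_b[b] = ks_cms(proc_b, np.sqrt(m))
    return {"KS": ks, "CMS": cms,
            "p_KS": float(np.mean(ks_b > ks)), "p_CMS": float(np.mean(cms_b > cms))}
```

The same functionals could be applied to averages of estimated scores instead of re-estimated processes, which is the score-resampling shortcut described around equations (10) and (11).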
4 Bounds

It has been argued in Section 2 that the traditional mean and quantile sample selection estimators are inconsistent if the independence assumption is violated. Still, this does not necessarily mean that nothing can be said about the size of the coefficients. In the normal multiplicative heteroscedastic sample selection model, the coefficients are even point identified, as shown by Donald (1995). In a much more general setting, Manski (1989 and 1994) derives worst-case bounds for the conditional mean and quantiles in the absence of any further assumption. Here, an intermediate path between the worst-case bounds of Manski and the classical sample selection model is pursued. Apart from the independence assumption, all model assumptions made by Buchinsky (1998a) are maintained to derive bounds on the quantile regression coefficients. Thus, the distribution of Y is not restricted and the regressors are allowed to influence the whole conditional distribution of the dependent variable and not only the first moment(s). To this end, independence is replaced by the following, less restrictive, assumption:

F_Y(F_Y^{-1}(\tau | X = x) | X = x, P = p, D = 1) = F_Y(F_Y^{-1}(\tau | X = \tilde{x}) | X = \tilde{x}, P = p, D = 1) = \tilde{\tau}(p),   (12)

for all x and \tilde{x} in the support of X, where P denotes the selection probability defined as P = Pr(D = 1 | Z). Equation (12) states that the rank \tilde{\tau}(p) in the observed distribution that corresponds to rank \tau in the latent (true) distribution does not depend on X given P. This assumption is implied by the stronger independence assumption that has been made in the majority of studies using parametric and nonparametric selection models. In contrast to Manski (1989 and 1994), equation (12) excludes, for instance, the possibility of positive selection for low values of X and negative selection for high values of X. This is the main reason why tighter bounds are obtained. The second difference with Manski is that a linear parametric specification for the conditional quantile of the dependent variable is assumed:

F_Y^{-1}(\tau | X = x) = c(\tau) + x'\beta(\tau).   (13)

Linearity is not essential for the basic idea of the bounds, but it will be maintained for simplicity. By combining equations (12) and (13), one obtains

F_Y(c(\tau) + x'\beta(\tau) | X = x, P = p, D = 1) = F_Y(c(\tau) + \tilde{x}'\beta(\tau) | X = \tilde{x}, P = p, D = 1) = \tilde{\tau}(p).

The value at the \tilde{\tau}(p)-th conditional quantile of the observed outcome corresponds to the value at the \tau-th conditional quantile of the latent outcome. If \tilde{\tau}(p) were known, \beta(\tau) could be consistently estimated by regressing Y on X and on a nonparametric series expansion of P at the \tilde{\tau}(p)-th quantile, conditional on D = 1. However, \tilde{\tau}(p) is unknown to the researcher because the selection rule is unknown. If there were random or no sample selection, \tilde{\tau}(p) would be equal to \tau. Under nonrandom and highly positive selection, \tilde{\tau}(p) is equal to (\tau - (1 - p))/p, whereas under highly negative selection, \tilde{\tau}(p) is equal to \tau/p (see footnote 8). Along with assumption (12), this information can be used to bound the unknown \tilde{\tau}(p).
By bounding \tilde{\tau}(p), \beta(\tau) is bounded, too. Let \beta(\tau, p) denote the vector of the true slope coefficients at the \tau-th quantile for observations with P = p and let \tilde{\beta}(\tilde{\tau}, p) denote the vector of the slope coefficients at the \tilde{\tau}(p)-th quantile for the observed realizations with P = p. Let P_{D=1} denote the set of selection probabilities in the observed population, i.e. P_{D=1} contains all values of P conditional on D = 1. The true quantile coefficient at P = p, \beta(\tau, p), is located within the following bounds:

\beta(\tau, p) \in \Big[ \beta_L(p) = \min_{\tilde{\tau} \in [\frac{\tau - (1-p)}{p}, \frac{\tau}{p}]} \tilde{\beta}(\tilde{\tau}, p), \;\; \beta_U(p) = \max_{\tilde{\tau} \in [\frac{\tau - (1-p)}{p}, \frac{\tau}{p}]} \tilde{\beta}(\tilde{\tau}, p) \Big],   (14)

where p \in P_{D=1}.   (15)

Footnote 8: An example elucidates this point. Consider a distribution given by the set {1, 2, ..., 10}. The value at \tau = 0.4 is then 4. Let p = 0.8. Under highly positive selection, the lower 1 - p (= 0.2) share of the distribution is not observed after selection, i.e. {1, 2} are not selected and the observed set is {3, 4, ..., 10}. The value 4 is now at the 0.25-th quantile of the observed values, i.e. 0.25 = \tilde{\tau}(p) = (\tau - (1 - p))/p. Under highly negative selection, the upper 0.2 share of the distribution, i.e. {9, 10}, is not selected and the observed set is {1, 2, ..., 8}. The value 4 is now at the 0.5-th quantile of the observed values, i.e. 0.5 = \tilde{\tau}(p) = \tau/p. The same holds for any other value of \tau and p.

Let us additionally assume that \beta(\tau, p) is constant in p, i.e. \beta(\tau, p) = \beta(\tau). Then \beta(\tau) has to be an element of the intersection of the possible values for \beta(\tau, p) across all selection probabilities in the observed population:

\beta(\tau) \in \Big[ \max_{p \in P_{D=1}} \beta_L(p), \;\; \min_{p \in P_{D=1}} \beta_U(p) \Big].

Thus, the final bounds for \beta(\tau) are the best-case solution (in the sense that they minimize the range of possible values over the various p) out of all bounds obtained in equation (14). However, the identification of bounds on \tilde{\tau}(p) (and on \beta(\tau)) hinges on two conditions. Let \bar{p} denote the maximum of all P \in P_{D=1}, i.e. the maximum of the selection probabilities in the observed population. In order to obtain informative bounds, (i) \bar{p} > 0.5 and (ii) 1 - \bar{p} < \tau < \bar{p} have to hold. For \bar{p} \leq 0.5, bounds on \tilde{\tau}(p), which is naturally restricted to the interval [0, 1], are not informative, as either (\tau - (1 - \bar{p}))/\bar{p} \leq 0, or \tau/\bar{p} \geq 1, or both. In such cases it is obvious that \beta(\tau) cannot be bounded either. Secondly, the upper 1 - \bar{p} share and the lower 1 - \bar{p} share of the latent distribution determine the observed distribution in the worst cases of highly positive and highly negative selection, respectively. Informative bounds cannot be obtained for any \tau hitting the boundaries of, or falling outside of, the interval [1 - \bar{p}, \bar{p}], which gives rise to condition (ii), 1 - \bar{p} < \tau < \bar{p}.

We can use our framework of interval identification based on bounds to investigate under which conditions point identification of \beta(\tau) is obtained. Point identification can be considered as the special case in which the identified interval collapses to a single point. This is the case when either independence between the error terms and the covariates or identification at infinity (or both) is satisfied. If independence is satisfied, point identification of the slope coefficients is obtained because all slope coefficients are constant across quantiles and thus the minimum equals the maximum in (14). In this case, we are back in the framework of the classical sample selection model. If independence does not hold, point identification is still feasible if the data contain observations with selection probability P = 1. Note that P = 1 is not only sufficient, but also necessary for point identification under heteroscedasticity. This is obvious from the fact that only for P = 1 we have (\tau - (1 - p))/p = \tau/p = \tau = \tilde{\tau}(p), whereas for 0 < p < 1 the upper and lower bounds on \tilde{\tau}(p) differ by (1 - p)/p. Identification conditional on P = 1 is known as identification at infinity and was first discussed by Chamberlain (1986). The strategy suggested in this section, however, does not require having such observations at hand and still allows bounding the coefficients for the share of the population for which P > 0.5. Furthermore, the constant can be bounded by the same strategy. In contrast to the slope coefficients, the constant is not point identified even when independence is satisfied but P < 1 for all observations. However, point identification of the constant is feasible by identification at infinity, as discussed in Heckman (1990) and Andrews & Schafgans (1998). As a final remark, it is worth noting that one need not necessarily assume \beta(\tau, p) = \beta(\tau) (see the discussion in Heckman & Vytlacil (2005)). Without this assumption, it is not possible to integrate the results over different participation probabilities, but the bounds on \beta(\tau, p) at each P = p remain valid.
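A minimal sketch of the interval in (14)-(15) is given below (an assumption-laden illustration, not the authors' code). The function observed_slope(t, p) is a placeholder for an estimate of the slope at observed quantile t among observations with selection probability P = p.

```python
import numpy as np

def bounds_at_p(observed_slope, tau, p, grid_size=99):
    if not (1 - p < tau < p):                 # informative only if 1 - p < tau < p (hence p > 0.5)
        return -np.inf, np.inf
    # latent rank tau is bracketed by the observed ranks (tau-(1-p))/p and tau/p
    taus_obs = np.linspace((tau - (1 - p)) / p, tau / p, grid_size)
    values = np.array([observed_slope(t, p) for t in taus_obs])
    return values.min(), values.max()

def bounds(observed_slope, tau, p_values):
    """Intersection over observed selection probabilities, assuming beta(tau, p) = beta(tau)."""
    lows, highs = zip(*(bounds_at_p(observed_slope, tau, p) for p in p_values))
    return max(lows), min(highs)
```

If observed_slope is constant in its first argument (independence) or some p in p_values equals one, the two endpoints coincide, reproducing the point-identification cases discussed above.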
5 Monte Carlo Simulations

In this section, we present the results of Monte Carlo simulations on the efficiency and robustness of quantile regression in sample selection models as well as on the power and size properties of our tests in finite samples. In their seminal paper on quantile regression, Koenker & Bassett (1978) provide Monte Carlo results on the efficiency of various estimators for several distributions of the errors. One of their conclusions is that under Gaussian errors, the median estimator makes only small sacrifices of efficiency compared to the mean estimator. It is, however, considerably more efficient when the errors follow a non-Gaussian distribution, such as the Laplace, the Cauchy or a contaminated Gaussian distribution. Thus, even if the errors are independent of the covariates, quantile regression methods can be preferable to mean regression for the sake of efficiency gains. A second argument in favor of quantile regression is its increased robustness in the case of contaminated outcomes, resulting in smaller biases in the coefficient estimates. To illustrate that such efficiency and robustness considerations also apply to sample selection models, we conducted Monte Carlo simulations for (i) t-distributed error terms with three degrees of freedom, (ii) Cauchy distributed error terms, (iii) contaminated normal errors, and (iv) contaminated outcomes. The data generating process in specifications (i) to (iii) is defined as

D_i = 1\{X_i + Z_i + u_i > 0\}, \quad Y_i = \beta X_i + \varepsilon_i \text{ if } D_i = 1,
X \sim N(0, 1), \quad Z \sim N(0, 1), \quad \beta = 1,

with all selection coefficients equal to one. In specification (i), u, \varepsilon \sim t(df = 3) and in (ii), u, \varepsilon \sim Cauchy. The covariance between the errors is set to 0.8 in both cases, i.e. Cov(u, \varepsilon) = 0.8. In (iii), the errors are a Gaussian mixture constructed as

u_i = 0.95 u_{i1} + 0.05 u_{i2}, \quad \varepsilon_i = 0.95 \varepsilon_{i1} + 0.05 \varepsilon_{i2},
u_1, \varepsilon_1 \sim N(0, 1), \quad u_2, \varepsilon_2 \sim N(0, 100), \quad Cov(u_1, \varepsilon_1) = \Phi(0.8), \quad Cov(u_2, \varepsilon_2) = \Phi(8),

where \Phi denotes the cumulative distribution function. In (iv), the errors are standard normal, u \sim N(0, 1), \varepsilon \sim N(0, 1), Cov(u, \varepsilon) = 0.8, but the outcome is contaminated by the factor 10 with 5% probability:

Y_i = D_i [(1 - j)(\beta X_i + \varepsilon_i) + j(10 \beta X_i + \varepsilon_i)],

where j is a binary contamination indicator drawn from a uniform with Pr(j = 1) = 0.05. For each of the model specifications, a simulation with 1000 replications is conducted for samples of n = 400 observations. The bandwidth for the (first-step) selection estimator is set to b_n = 0.3.
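A sketch of these data generating processes is given below, under assumptions stated in the comments; in particular, the correlated non-Gaussian errors are drawn through a Gaussian copula, which is one of several ways to impose the stated dependence and is not necessarily the authors' implementation.

```python
import numpy as np
from scipy import stats

def simulate(spec, n=400, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    x, z = rng.normal(size=n), rng.normal(size=n)
    cov = [[1.0, 0.8], [0.8, 1.0]]
    un, en = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    if spec == "t3":                                  # (i) Student's t with 3 degrees of freedom
        u, e = stats.t.ppf(stats.norm.cdf(un), df=3), stats.t.ppf(stats.norm.cdf(en), df=3)
    elif spec == "cauchy":                            # (ii) Cauchy errors
        u, e = stats.cauchy.ppf(stats.norm.cdf(un)), stats.cauchy.ppf(stats.norm.cdf(en))
    else:                                             # (iii) and (iv) start from Gaussian errors
        u, e = un, en
    if spec == "mixture":                             # (iii) contaminated normal errors
        # contamination component: variance 100; correlation 0.8 is an assumption made here
        u2, e2 = rng.multivariate_normal([0.0, 0.0], [[100.0, 80.0], [80.0, 100.0]], size=n).T
        u, e = 0.95 * u + 0.05 * u2, 0.95 * e + 0.05 * e2
    d = (x + z + u > 0).astype(int)                   # selection equation, unit coefficients
    y = x + e                                         # outcome equation, beta = 1
    if spec == "contaminated_y":                      # (iv) outcome contaminated with probability 0.05
        j = rng.random(n) < 0.05
        y = np.where(j, 10 * x + e, y)
    return x, z, d, np.where(d == 1, y, np.nan)       # latent outcome unobserved when d = 0
```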
Table 1: Coefficient estimates and variances of the mean and median estimators
(400 observations, 1000 replications, bandwidth = 0.3)

                                       Median estimator        Mean estimator
Distribution                           Estimate   Variance     Estimate   Variance
(i)   Student's t (df = 3)             1.015      0.018        1.004      0.028
(ii)  Cauchy                           1.026      0.037        1.498      2756.653
(iii) Contaminated normal error        0.998      0.014        0.985      0.061
(iv)  Contaminated outcome             1.105      0.016        1.914      0.178

The estimates and variances of the \beta-coefficients for the median and mean estimators are reported in Table 1. In all specifications considered, the median estimator is more efficient than the mean estimator. In the contaminated outcome specification, its variance is more than 10 times smaller. In the case of Cauchy distributed error terms, the variance of the mean coefficient is theoretically unbounded, whereas it stays quite moderate for the median estimator. The median estimator is also superior in terms of robustness, which becomes particularly clear when looking at the coefficient estimates in the contaminated outcome specification. While the mean estimate is severely upward biased, the median estimate is still only moderately higher than the true coefficient \beta = 1.

In the second part of this section, we present results on the power and size properties of the Kolmogorov-Smirnov (KS) and Cramer-von-Mises-Smirnov (CMS) resampling tests in finite samples. We do so for three specifications based on (i) Gaussian, (ii) t-distributed and (iii) Cauchy distributed error terms u and \varepsilon. In specification (i), the data generating process is defined as

D_i = 1\{X_i + Z_i + u_i > 0\}, \quad Y_i = \beta X_i + (1 + \eta X_i)\varepsilon_i,
u \sim N(0, 1), \quad \varepsilon \sim N(0, 1), \quad Cov(u, \varepsilon) = 0.8, \quad X \sim N(0, 1), \quad Z \sim N(0, 1),
\beta = 1, \quad \eta = 0, 0.2, 0.5,

where \beta is the mean slope coefficient and \eta governs the degree of heteroscedasticity. One is interested in the rejection frequencies of the KS and CMS statistics testing the null hypothesis of constant coefficients across all quantiles, based on repeated bootstrap samples. As outlined in Section 2, H_0: \beta(\tau) = \beta for all 0 < \tau < 1, where \beta is some unknown constant coefficient. In the location shift model (\eta = 0), the error term \varepsilon is independent of the regressor X and hence H_0 is true. In this case, the rejection rates in the Monte Carlo simulations yield the tests' size properties. Heteroscedasticity is introduced if \eta is set to values different from zero. In the location-scale shift models (\eta = 0.2, 0.5), the null hypothesis is false and the rejection frequencies indicate the tests' power to reject the incorrect H_0. To construct the test statistics, the coefficients are estimated at equidistant quantiles with step size 0.01 and compared to the median estimate \hat{\beta}(0.5). Results are presented for three different quantile regions over which the quantile coefficients are estimated:

T_{[0.05, 0.95]} = {0.05, 0.06, ..., 0.95},
T_{[0.1, 0.9]} = {0.10, 0.11, ..., 0.90},
T_{[0.2, 0.8]} = {0.20, 0.21, ..., 0.80}.

Therefore, the number of estimated quantile coefficients differs across regions. In particular, the largest region [0.05, 0.95] includes quantiles that are relatively close to the boundaries 0 and 1, whereas all quantiles in the "narrow" region [0.2, 0.8] are situated well in the interior. As will be shown below, the choice of the region affects the tests' finite-sample properties, and it is the distribution of the error term \varepsilon that determines whether a narrow or a large quantile region is preferable.
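A schematic sketch of the size and power loop behind the rejection frequencies reported below is given next. It reuses independence_test and the placeholder fit_slopes from the sketch in Section 3 (both assumptions, not the authors' code); \eta = 0 yields the empirical size and \eta > 0 the power against the heteroscedastic alternative.

```python
import numpy as np

def location_scale_sample(n, eta, rng):
    x, z = rng.normal(size=n), rng.normal(size=n)
    u, e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n).T
    d = (x + z + u > 0).astype(float)
    y = np.where(d == 1, x + (1 + eta * x) * e, np.nan)
    # all rows are kept: fit_slopes is expected to use d = 0 observations in the first step
    return np.column_stack([y, x, z, d])

def rejection_frequency(eta, n, taus, reps=1000, level=0.05):
    rng = np.random.default_rng(0)
    m = 20 + int(round(n ** 0.25))                    # the smaller of the two block sizes considered
    hits = 0
    for _ in range(reps):
        data = location_scale_sample(n, eta, rng)
        p_value = independence_test(data, fit_slopes, taus, B=250, m=m, rng=rng)["p_KS"]
        hits += p_value < level
    return hits / reps
```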
Similarly to Chernozhukov & Fernández-Val (2005), 1000 Monte Carlo replications per simulation and 250 bootstrap replications within each replication are conducted to compute the critical values of the test statistics. Five samples of sizes n = 100 to n = 3200 are generated. In each sample, bootstrap subsamples are drawn with replacement for two different block sizes, m = 20 + n^{1/4} and m = n.

Table 2: Empirical rejection frequencies for 5% resampling tests
(\varepsilon \sim N(0, 1), 250 bootstrap draws, 1000 replications)

Kolmogorov-Smirnov statistic, m = 20 + n^{1/4}
            eta = 0                        eta = 0.2                      eta = 0.5
            [.05,.95] [.1,.9]  [.2,.8]     [.05,.95] [.1,.9]  [.2,.8]     [.05,.95] [.1,.9]  [.2,.8]
n = 100     0.006     0.000    0.000       0.007     0.003    0.001       0.034     0.011    0.002
n = 400     0.030     0.014    0.003       0.306     0.184    0.070       0.920     0.818    0.500
n = 800     0.024     0.026    0.012       0.644     0.517    0.283       1.000     0.998    0.957
n = 1600    0.040     0.032    0.020       0.936     0.906    0.717       1.000     1.000    1.000
n = 3200    0.030     0.032    0.031       0.998     1.000    0.973       1.000     1.000    1.000

Kolmogorov-Smirnov statistic, m = n
n = 100     0.003     0.000    0.000       0.005     0.001    0.001       0.021     0.004    0.001
n = 400     0.022     0.006    0.002       0.273     0.145    0.054       0.895     0.780    0.434
n = 800     0.024     0.019    0.009       0.636     0.482    0.253       0.996     0.994    0.952
n = 1600    0.038     0.033    0.017       0.934     0.885    0.705       1.000     1.000    1.000
n = 3200    0.042     0.029    0.027       0.999     1.000    0.975       1.000     1.000    1.000

Cramer-von-Mises-Smirnov statistic, m = 20 + n^{1/4}
n = 100     0.001     0.000    0.000       0.001     0.001    0.000       0.011     0.003    0.002
n = 400     0.011     0.006    0.002       0.191     0.103    0.047       0.933     0.835    0.468
n = 800     0.011     0.010    0.007       0.610     0.451    0.216       1.000     0.998    0.955
n = 1600    0.023     0.013    0.008       0.959     0.898    0.704       1.000     1.000    1.000
n = 3200    0.022     0.020    0.014       1.000     0.998    0.969       1.000     1.000    1.000

Cramer-von-Mises-Smirnov statistic, m = n
n = 100     0.001     0.000    0.000       0.001     0.000    0.000       0.006     0.005    0.003
n = 400     0.011     0.006    0.002       0.192     0.112    0.045       0.924     0.838    0.498
n = 800     0.017     0.009    0.010       0.617     0.463    0.252       0.999     0.998    0.957
n = 1600    0.026     0.020    0.014       0.958     0.911    0.735       1.000     1.000    1.000
n = 3200    0.026     0.024    0.021       1.000     0.999    0.978       1.000     1.000    1.000

The empirical rejection frequencies reported in Table 2 suggest that the resampling tests work quite well under standard normally distributed errors. In the case of homoscedastic errors (\eta = 0), both the KS and CMS statistics are conservative, rejecting less often than the nominal 5% level, at least for the sample sizes considered. However, both statistics generally converge to the nominal level as the sample size increases, although not monotonically. The KS test does so at a faster pace than the CMS test. For the latter, the larger resampling block size (m = n) works somewhat better than the smaller one (m = 20 + n^{1/4}). Under heteroscedastic errors, the rejection rates converge to 100% as the sample size increases. As expected, this happens at a faster pace for \eta = 0.5 than for \eta = 0.2. The power properties of the CMS and the KS statistics are rather similar and quite satisfactory, provided that the sample size is not too small. The convergence of the KS statistic is faster at both levels of heteroscedasticity for the smaller block size, whereas the converse seems to be true for the CMS statistic.
As one would expect, the empirical rejection frequencies converge faster to the true values as the quantile region increases, and this holds for both tests and any value of \eta. Summing up, both tests seem to perform well in finite samples with Gaussian errors. For moderate sample sizes of several thousand observations, the power is sufficiently high and size distortions hardly affect the test statistics. In our simulations, the KS test seems to be somewhat superior due to its faster convergence when \eta = 0.

In specification (ii), almost the same model is used as before; only the errors are changed to be t-distributed with three degrees of freedom, u \sim t(df = 3), \varepsilon \sim t(df = 3), Cov(u, \varepsilon) = \Phi(0.8). Table 3 reports the rejection frequencies for t-distributed error terms. As one would expect, deviations from the true rejection rates in finite samples generally increase due to the fatter tails compared to Gaussian errors. For \eta = 0, the rejection frequencies of the KS test "overshoot" in small samples and seem to converge to the true values as the sample size increases. At least in small samples, the CMS test seems to perform slightly better as it stays on the "safe side", i.e. it is more conservative than the nominal rejection rate. Furthermore, the CMS rejection frequencies converge faster to 100% under heteroscedastic errors. The larger block size (m = n) yields better results for both tests, provided that the sample size is not too small. Contrary to the case of Gaussian errors, the largest quantile region is generally not the best choice. Under heteroscedastic errors, T_{[0.1, 0.9]} is superior for both the KS and the CMS statistic, which is again due to the fat tails. Even though the power and size properties are quite appealing for both tests, the CMS-based procedure seems to be preferable in the case of t-distributed errors.
Table 3: Empirical rejection frequencies for 5% resampling tests
(\varepsilon \sim t(df = 3), 250 bootstrap draws, 1000 replications)

Kolmogorov-Smirnov statistic, m = 20 + n^{1/4}
            eta = 0                        eta = 0.2                      eta = 0.5
            [.05,.95] [.1,.9]  [.2,.8]     [.05,.95] [.1,.9]  [.2,.8]     [.05,.95] [.1,.9]  [.2,.8]
n = 100     0.058     0.007    0.000       0.062     0.012    0.001       0.087     0.014    0.004
n = 400     0.084     0.044    0.011       0.284     0.189    0.070       0.810     0.811    0.482
n = 800     0.094     0.063    0.019       0.410     0.441    0.281       0.962     0.991    0.950
n = 1600    0.056     0.056    0.032       0.570     0.719    0.651       0.999     1.000    1.000
n = 3200    0.052     0.056    0.043       0.827     0.957    0.942       1.000     1.000    1.000

Kolmogorov-Smirnov statistic, m = n
n = 100     0.037     0.006    0.000       0.041     0.005    0.001       0.064     0.006    0.001
n = 400     0.070     0.032    0.009       0.252     0.153    0.043       0.801     0.758    0.415
n = 800     0.103     0.068    0.013       0.447     0.414    0.253       0.966     0.986    0.943
n = 1600    0.066     0.061    0.026       0.642     0.743    0.632       0.999     1.000    1.000
n = 3200    0.068     0.055    0.045       0.909     0.969    0.945       1.000     1.000    1.000

Cramer-von-Mises-Smirnov statistic, m = 20 + n^{1/4}
n = 100     0.001     0.000    0.000       0.003     0.002    0.000       0.026     0.005    0.000
n = 400     0.022     0.012    0.002       0.215     0.107    0.026       0.894     0.815    0.466
n = 800     0.050     0.020    0.004       0.540     0.441    0.209       0.999     0.997    0.958
n = 1600    0.036     0.023    0.012       0.876     0.852    0.632       1.000     1.000    1.000
n = 3200    0.035     0.030    0.024       0.993     0.996    0.967       1.000     1.000    1.000

Cramer-von-Mises-Smirnov statistic, m = n
n = 100     0.000     0.000    0.000       0.001     0.000    0.000       0.014     0.003    0.001
n = 400     0.020     0.007    0.001       0.205     0.104    0.019       0.890     0.800    0.462
n = 800     0.053     0.017    0.004       0.581     0.462    0.230       1.000     0.998    0.960
n = 1600    0.050     0.034    0.020       0.896     0.862    0.665       1.000     1.000    1.000
n = 3200    0.049     0.036    0.036       0.995     0.998    0.976       1.000     1.000    1.000

Lastly, Table 4 displays the test rejection rates for Cauchy distributed errors, specification (iii): u \sim Cauchy, \varepsilon \sim Cauchy, Cov(u, \varepsilon) = \Phi(0.8). Because the Cauchy distribution has no defined first or higher moments, the sample size needs to be sufficiently large (at least several thousand observations) to obtain satisfactory results. Under homoscedastic errors, the CMS test is again more conservative than the KS test for any chosen block size. Provided that the sample size is not too small, the former outperforms the latter for \eta = 0.2, 0.5, as its rejection rates converge faster to 100%. In the majority of the scenarios considered, the smaller block size yields better results for both tests and across different levels of \eta, due to the smaller probability of extreme values related to the heavy tails of the Cauchy distribution. For the same reason, the small quantile region T_{[0.2, 0.8]} is preferable to its larger alternatives. Thus, the power gains obtained by reducing the probability of disturbing outliers clearly outweigh the power losses from considering a reduced range of the distribution in the inference process. Summing up, the CMS test seems to be somewhat superior to the KS test when errors are non-Gaussian, at least for the sample sizes considered.
Table 4: Empirical rejection frequencies for 5% resampling tests
(\varepsilon \sim Cauchy, 250 bootstrap draws, 1000 replications)

Kolmogorov-Smirnov statistic, m = 20 + n^{1/4}
            eta = 0                        eta = 0.2                      eta = 0.5
            [.05,.95] [.1,.9]  [.2,.8]     [.05,.95] [.1,.9]  [.2,.8]     [.05,.95] [.1,.9]  [.2,.8]
n = 100     0.127     0.032    0.008       0.117     0.037    0.003       0.149     0.051    0.006
n = 400     0.081     0.056    0.035       0.097     0.102    0.053       0.303     0.375    0.335
n = 800     0.085     0.064    0.041       0.104     0.095    0.097       0.437     0.607    0.739
n = 1600    0.062     0.036    0.025       0.150     0.176    0.262       0.698     0.900    0.981
n = 3200    0.059     0.034    0.038       0.261     0.350    0.508       0.913     0.988    1.000

Kolmogorov-Smirnov statistic, m = n
n = 100     0.088     0.018    0.004       0.088     0.025    0.003       0.113     0.030    0.006
n = 400     0.091     0.056    0.018       0.103     0.084    0.030       0.285     0.333    0.270
n = 800     0.086     0.063    0.036       0.097     0.076    0.072       0.433     0.553    0.661
n = 1600    0.069     0.031    0.023       0.132     0.144    0.220       0.704     0.881    0.963
n = 3200    0.079     0.041    0.035       0.279     0.325    0.472       0.926     0.989    0.999

Cramer-von-Mises-Smirnov statistic, m = 20 + n^{1/4}
n = 100     0.063     0.007    0.000       0.067     0.008    0.001       0.098     0.018    0.001
n = 400     0.073     0.041    0.004       0.104     0.075    0.017       0.420     0.489    0.279
n = 800     0.070     0.043    0.014       0.118     0.123    0.078       0.658     0.815    0.832
n = 1600    0.053     0.030    0.014       0.194     0.293    0.337       0.890     0.986    0.996
n = 3200    0.046     0.032    0.029       0.379     0.563    0.708       0.992     1.000    1.000

Cramer-von-Mises-Smirnov statistic, m = n
n = 100     0.053     0.005    0.000       0.057     0.007    0.000       0.066     0.013    0.001
n = 400     0.072     0.036    0.004       0.090     0.066    0.013       0.395     0.441    0.254
n = 800     0.083     0.040    0.016       0.110     0.092    0.063       0.625     0.773    0.812
n = 1600    0.052     0.033    0.014       0.174     0.245    0.292       0.888     0.980    0.993
n = 3200    0.061     0.031    0.025       0.385     0.525    0.668       0.994     1.000    1.000

6 Labor Market Applications

In this section, two applications of the test procedure to labor market data are presented. The first application is a textbook example of heckit estimation given by Greene (2003), using a sample of 753 married women originally investigated by Mroz (1987). The data set contains information on the wages and hours worked of the 428 women with positive labor supply (D = 1), along with a set of regressors for the whole sample. Estimation is based on the conventional selection model with normally distributed errors and an additive and linear bias correction function:

Y_i = X_i'\beta + \varepsilon_i \text{ if } D_i = 1, \quad D_i = 1\{X_i'\beta + \varepsilon_i \geq Z_i'\delta + u_i\},
E(\varepsilon | X) = 0, \quad E(\varepsilon | X, D = 1) \neq 0.   (16)

Here Z_i'\delta + u_i denotes the reservation wage, which depends on a set of characteristics Z and the error term u. An individual provides positive labor supply (D_i = 1) only if the offered wage X_i'\beta + \varepsilon_i is at least as high as the reservation wage. In the model presented in Greene (2003), Y is the hourly wage, X consists of experience, experience squared, education and a dummy for living in a large urban area, and Z contains age, age squared, family income, a dummy for having kids, and education. See page 786 in chapter 22 of Greene (2003) for the results.

Table 5: Labor market application I: p-values of the KS and CMS tests

Test    m = 20 + n^{1/4}    m = 20 + n^{1/2.01}    m = n    m = n/4
KS           0.037                0.037             0.039     0.045
CMS          0.094                0.097             0.086     0.094

Table 5 reports the p-values of the KS and CMS tests, where the bias correction h(\cdot) is approximated by a polynomial as outlined in Section 2 rather than assumed to be linear.
The number of bootstrap draws is B = 10,000 and, analogously to Chernozhukov & Fernández-Val (2005), four different block sizes m are used for the bootstrap subsamples. The coefficients are estimated at 81 equidistant quantiles, T_{81} = {0.10, 0.11, ..., 0.89, 0.90}. Again, GCV is applied to determine the optimal bandwidth b_n^{opt} in (4) and the optimal maximum order J^{opt} (see footnote 9). For all chosen m, the KS and CMS tests reject the null hypothesis of constant quantile coefficients at the 5% and 10% levels, respectively. So even in this very small sample, the tests prove to be quite powerful.

Footnote 9: b_n^{opt} = 0.2 and J^{opt} = 2.

The second application deals with a considerably larger data set. In their study on US women's relative wages, Mulligan & Rubinstein (2005) estimate the conditional mean wages of married white women using the heckit estimator. They investigate two repeated cross-sections covering the periods 1975-1979 and 1995-1999, stemming from the US Current Population Survey (CPS), and consider only observations on married white couples. In this application, Y represents the wife's log weekly wage, computed from total annual earnings deflated by the US Consumer Price Index (CPI). Only prime-age workers (25-54) who worked full time and at least 50 weeks in the respective year are considered. The vector of regressors X consists of the wife's working experience minus 15, the wife's (working experience minus 15) squared divided by 100, and the wife's education, including a teacher dummy. Z contains X and additionally includes the husband's education, the husband's working experience minus 15, the husband's (working experience minus 15) squared divided by 100, and the number of children aged 0-6 present in the household. In the period 1975-1979, 97,067 observations are available in total and D = 1 in 20,839 cases. For 1995-1999, the respective numbers are 87,004 and 35,206.

Table 6: Labor market application II: p-values of the KS and CMS tests

1975-1979
Test    m = 20 + n^{1/4}    m = 20 + n^{1/2.01}    m = n    m = n/4
KS           0.003                0.000             0.000     0.000
CMS          0.001                0.000             0.000     0.000

1995-1999
Test    m = 20 + n^{1/4}    m = 20 + n^{1/2.01}    m = n    m = n/4
KS           0.000                0.000             0.000     0.000
CMS          0.000                0.000             0.000     0.000

Mulligan & Rubinstein (2005) compare the coefficient estimates of the heckit regression to OLS estimates and find that the sample selection bias has changed over time from being negative (-0.075) in the first period to positive (0.161) in the second. This would suggest that from 1975 to 1979, married women out of the labor force had higher average potential earnings than their working counterparts, whereas the converse was true between 1995 and 1999. Given that all other parametric assumptions are correct, this conclusion holds only if \varepsilon is homoscedastic. However, the hypothesis of independent errors is rejected by both the KS and the CMS tests at the 0.1% level for B = 1000 and T_{81} = {0.10, 0.11, ..., 0.89, 0.90} (see footnote 10), as reported in Table 6. The highly significant p-values suggest that one can draw conclusions neither on the size nor on the sign of the selection bias. In fact, the bias might have been positive or negative throughout both periods. Therefore, the authors' conclusions about the development of potential earnings are not necessarily correct. As independence of \varepsilon and X does not hold, the coefficient estimates are inconsistent and bear no economic interpretation.

Footnote 10: Instead of the Gaussian kernel, the Epanechnikov kernel is used for \kappa(\cdot) in (4) to reduce the computational burden. b_n^{opt} is 0.13 and 0.06 for 1975-1979 and 1995-1999, respectively, but it is locally increased so as to include at least 10 observations in \kappa(\cdot). J^{opt} is 3 and 2, respectively.

7 Conclusions

Independence between regressors and errors is a conditio sine qua non for identification in any parametric or semiparametric sample selection model.
It is, however, a rather strong restriction that is likely to be violated in many fields of research where two-step estimators to correct for selection bias are heavily applied. In cases where homoscedasticity does hold, quantile regression methods do not seem particularly attractive in sample selection models at first glance, as all quantile regression curves are parallel and all conditional quantile slope coefficients are equal to the mean coefficients. Applications which have found (and were happy to find) significant differences between coefficients at different quantiles have merely proved that their model assumptions are wrong and that the estimator is inconsistent.

However, quantile regression methods have valuable properties that also apply to sample selection models. Firstly, and this was the first motivation for suggesting quantile regression in Buchinsky (1998a), quantile-based estimators are more robust and more efficient than mean estimators when distributions have fat tails. Secondly, as argued in Koenker & Bassett (1982), quantile coefficients can be used to detect heteroscedasticity and thus violations of the independence assumption between errors and regressors. Thirdly, if independence is not satisfied, quantile coefficients can be bounded quite easily, which is not the case for mean coefficients. In this paper, all three arguments in favor of quantile regression were discussed in the light of sample selection models. We suggested a quantile-based and distribution-free procedure to test for homoscedasticity, demonstrated its satisfactory size and power properties by Monte Carlo simulations, and applied it to labor market data. Given that independence between the errors and the covariates holds, we showed that quantile estimators are more efficient and more robust in sample selection models than mean estimators if the errors are non-Gaussian. For the case that independence does not hold, we proposed conditions under which upper and lower bounds for the interval identification of the quantile coefficients can be obtained. The application of the test procedure to labor market data previously investigated by Mulligan & Rubinstein (2005) clearly rejects homoscedasticity and indicates that the coefficients obtained by heckit estimation are inconsistent. We strongly suspect that this is not an exception, but that a considerable share of the results presented in the sample selection literature is questionable due to the violation of the standard assumption of homoscedasticity.

References

Ahn, H. & Powell, J. (1993), 'Semiparametric estimation of censored selection models with a nonparametric selection mechanism', Journal of Econometrics 58, 3-29.

Andrews, D. & Schafgans, M. (1998), 'Semiparametric estimation of the intercept of a sample selection model', Review of Economic Studies 65, 497-517.

Angrist, J., Chernozhukov, V. & Fernández-Val, I. (2006), 'Quantile regression under misspecification, with an application to the U.S. wage structure', Econometrica 74, 539-563.

Buchinsky, M. (1994), 'Changes in the U.S. wage structure 1963-1987: Application of quantile regression', Econometrica 62, 405-458.

Buchinsky, M. (1998a), 'The dynamics of changes in the female wage distribution in the USA: A quantile regression approach', Journal of Applied Econometrics 13, 1-30.
Buchinsky, M. (1998b), 'Recent advances in quantile regression models: A practical guideline for empirical research', The Journal of Human Resources 33(1), 88-126.

Buchinsky, M. (2001), 'Quantile regression with sample selection: Estimating women's return to education in the U.S.', Empirical Economics 26, 87-113.

Chamberlain, G. (1986), 'Asymptotic efficiency in semiparametric models with censoring', Journal of Econometrics 32, 189-218.

Chaudhuri, P. (1991), 'Global nonparametric estimation of conditional quantile functions and their derivatives', Journal of Multivariate Analysis 39, 246-269.

Chen, S. & Khan, S. (2003), 'Semiparametric estimation of a heteroskedastic sample selection model', Econometric Theory 19, 1040-1064.

Chernozhukov, V. & Fernández-Val, I. (2005), 'Subsampling inference on quantile regression processes', Sankhya: The Indian Journal of Statistics 67, 253-276.

Chernozhukov, V. & Hansen, C. (2006), 'Instrumental quantile regression inference for structural and treatment effect models', Journal of Econometrics 132, 491-525.

Cosslett, S. (1991), Distribution-free estimator of a regression model with sample selectivity, in W. Barnett, J. Powell & G. Tauchen, eds, 'Nonparametric and semiparametric methods in econometrics and statistics', Cambridge University Press, Cambridge, UK, pp. 175-198.

Craven, P. & Wahba, G. (1979), 'Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation', Numerische Mathematik 31, 377-403.

Das, M., Newey, W. & Vella, F. (2003), 'Nonparametric estimation of sample selection models', Review of Economic Studies 70, 33-58.

Donald, S. G. (1995), 'Two-step estimation of heteroskedastic sample selection models', Journal of Econometrics 65, 347-380.

Gallant, A. & Nychka, D. (1987), 'Semi-nonparametric maximum likelihood estimation', Econometrica 55, 363-390.

Gerfin, M. (1996), 'Parametric and semi-parametric estimation of the binary response model of labour market participation', Journal of Applied Econometrics 11(3), 321-339.

Golub, G. H., Heath, M. & Wahba, G. (1979), 'Generalized cross validation as a method for choosing a good ridge parameter', Technometrics 21(2), 215-224.

Greene, W. H. (2003), Econometric Analysis, New Jersey: Pearson Education.

Gronau, R. (1974), 'Wage comparisons - a selectivity bias', Journal of Political Economy 82(6), 1119-1143.

Gutenbrunner, C. & Jurečková, J. (1992), 'Regression quantile and regression rank score process in the linear model and derived statistics', Annals of Statistics 20, 305-330.

Heckman, J. (1974), 'Shadow prices, market wages and labor supply', Econometrica 42, 679-694.

Heckman, J. (1990), 'Varieties of selection bias', American Economic Review, Papers and Proceedings 80, 313-318.

Heckman, J. J. (1979), 'Sample selection bias as a specification error', Econometrica 47(1), 153-161.

Heckman, J. J. & Vytlacil, E. (2005), 'Structural equations, treatment effects, and econometric policy evaluation', Econometrica 73(3), 669-738.

Ichimura, H. (1993), 'Semiparametric least squares (SLS) and weighted SLS estimation of single-index models', Journal of Econometrics 58, 71-120.

Klein, R. W. & Spady, R. H. (1993), 'An efficient semiparametric estimator for binary response models', Econometrica 61(2), 387-421.

Koenker, R. (2005), Quantile Regression, Cambridge University Press.

Koenker, R. & Bassett, G. (1978), 'Regression quantiles', Econometrica 46(1), 33-50.

Koenker, R. & Bassett, G. (1982), 'Robust tests for heteroskedasticity based on regression quantiles', Econometrica 50(1), 43-62.
Koenker, R. & Hallock, K. F. (2001), 'Quantile regression', Journal of Economic Perspectives 15, 143-156.

Koenker, R. & Xiao, Z. (2002), 'Inference on the quantile regression process', Econometrica 70, 1583-1612.

Li, K. C. (1985), 'From Stein's unbiased risk estimates to the method of generalized cross validation', The Annals of Statistics 13(4), 1352-1377.

Manski, C. F. (1989), 'Anatomy of the selection problem', The Journal of Human Resources 24(3), 343-360.

Manski, C. F. (1994), The selection problem, in C. Sims, ed., 'Advances in Econometrics: Sixth World Congress', Cambridge University Press, pp. 143-170.

Mincer, J. (1973), Schooling, Experience, and Earnings, NBER, New York.

Mroz, T. A. (1987), 'The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions', Econometrica 55(4), 765-799.

Mulligan, C. B. & Rubinstein, Y. (2005), 'Selection, investment, and women's relative wages since 1975', NBER Working Paper.

Newey, W. K. (1991), 'Two-step series estimation of sample selection models', unpublished manuscript, M.I.T.

Powell, J. (1986), 'Censored regression quantiles', Journal of Econometrics 32, 143-155.

Powell, J. (1987), 'Semiparametric estimation of bivariate latent variable models', unpublished manuscript, University of Wisconsin-Madison.

8 Appendix

If h(g) were known, \theta_\tau = (\beta(\tau)', \delta(\tau)')' could be estimated in a GMM framework based on the moment condition

\psi(z, y, d; \theta, \gamma) = d \left[ \tau - \tfrac{1}{2} + \tfrac{1}{2} \, \mathrm{sgn}\big( y - x'\beta - h(g(z, \gamma)) \big) \right] r,

where r = (x', h(g))'. The quantile regression estimator for \theta solves approximately

\frac{1}{n} \sum_{i=1}^{n} \psi(z_i, y_i, d_i; \hat{\theta}, \hat{\gamma}) = 0.

Assuming \sqrt{n}-consistency of \hat{\gamma}, \hat{\gamma} \to^p \gamma, and sufficient smoothness of \psi(z, y, d; \theta, \gamma) with respect to \theta and \gamma, the mean value theorem can be applied to the first-order conditions to get

0 = \frac{1}{n} \sum_{i=1}^{n} \left[ \psi(\theta, \gamma) + \frac{\partial \psi(\theta_l, \gamma_l)}{\partial \theta'} (\hat{\theta} - \theta) + \frac{\partial \psi(\theta_l, \gamma_l)}{\partial \gamma'} (\hat{\gamma} - \gamma) \right],   (17)

where \theta_l and \gamma_l are on the line segments connecting \hat{\theta}, \theta and \hat{\gamma}, \gamma, respectively, and where \psi is short for \psi(z, y, d; \theta, \gamma). It follows from equation (17) that

\sqrt{n}(\hat{\theta} - \theta) = - \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \psi(\theta_l, \gamma_l)}{\partial \theta'} \right]^{-1} \left[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi(\theta, \gamma) + \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \psi(\theta_l, \gamma_l)}{\partial \gamma'} \sqrt{n}(\hat{\gamma} - \gamma) \right].   (18)

Buchinsky (1998a) shows that

\sqrt{n}(\hat{\theta} - \theta) \to^L N(0, \Lambda), \quad \Lambda = \Gamma_{fr}^{-1} \left[ \tau(1 - \tau) \Sigma_{rr} + \Gamma_{frx} \, \Omega_\gamma \, \Gamma_{frx}' \right] \Gamma_{fr}^{-1},   (19)

where

\Gamma_{fr} = E[d \, f_\varepsilon(0 | r) \, r r'], \quad \Gamma_{frx} = E\left[ d \, f_\varepsilon(0 | r) \frac{\partial h(g)}{\partial g} r z' \right], \quad \Sigma_{rr} = E[d \, r r'],   (20)

f_\varepsilon(\cdot | r) denotes the conditional density of the error term, and \Omega_\gamma denotes the asymptotic covariance matrix of \hat{\gamma}. The asymptotic covariance matrix of \hat{\beta}(\tau) is the top-left k x k block of \Lambda. By combining (18), (20) and (A4) of Buchinsky (1998a), one obtains

\sqrt{n}(\hat{\theta} - \theta) \to^p \Gamma_{fr}^{-1} \left[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \ell_{\tau, i} - \Gamma_{frx} \sqrt{n}(\hat{\gamma} - \gamma) \right],   (21)

where \ell_\tau = d(\tau - 1\{ y < x'\beta + h(g(z, \gamma)) \}) r. We now insert equation (49) of Klein & Spady (1993) into (21) to get

\sqrt{n}(\hat{\theta} - \theta) \to^p \Gamma_{fr}^{-1} \left[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left( \ell_{\tau, i} - \Gamma_{frx} \Gamma_p^{-1} k_i \right) \right],   (22)

where

k_i = \frac{\partial p(\gamma_0)}{\partial \gamma} \, \frac{d_i - p(\gamma_0)}{p(\gamma_0)(1 - p(\gamma_0))} \quad \text{and} \quad \Gamma_p = E\left[ \frac{\partial P(\gamma_0)}{\partial \gamma'} \frac{\partial P(\gamma_0)}{\partial \gamma} \frac{1}{p(\gamma_0)(1 - p(\gamma_0))} \right].   (23)