
Sample Selection, Heteroscedasticity, and Quantile Regression
Blaise Melly, Martin Huber
Preliminary
First draft: December 2006, Last changes: February 2008
Abstract:
Independence of the error term and the covariates is a crucial assumption in virtually all sample selection models. If this assumption is not satisfied, for instance due to heteroscedasticity, both mean and quantile regression estimators are inconsistent. If independence does hold, all quantile functions and the mean function are parallel, which naturally limits the usefulness of quantile estimators. However, quantile estimators can be used to build tests for the independence condition because they are consistent under the null hypothesis. We therefore propose powerful tests based on the whole conditional quantile regression process. If the independence assumption is violated, the quantile functions are not point identified, but we show that it is still possible to bound the coefficients of interest. Our identified set shrinks to a single point either if independence holds or if some observations are selected and observed with probability one. Our model therefore generalizes both the traditional sample selection models and the identification at infinity strategy.
Keywords: sample selection, quantile regression, heteroscedasticity, test, bootstrap, bounds
JEL classification: C12, C13, C14, C21
We have benefited from comments by Michael Lechner and seminar participants at the University of St. Gallen. Addresses for correspondence: Blaise Melly, MIT Department of Economics, 50 Memorial Drive, E52-251d, Cambridge, MA 02142, USA, [email protected], www.siaw.unisg.ch/lechner/melly; Martin Huber, SIAW, University of St. Gallen, Varnbüelstrasse 14, 9000 St. Gallen, Switzerland, [email protected].
1 Introduction
Selection bias arises when the outcome of interest is only observable for a subsample of individuals conditional on selection and when selection is not random. A prominent example in labor economics concerns the determinants of wages and the labor supply behavior of females. Individuals are assumed to offer a positive labor supply only if their potential wage exceeds their reservation wage, so that a selection bias arises if we try to estimate the wage offer function in the
working subsample. The ability to consistently estimate econometric models in the presence of
nonrandom sample selection is one of the most important innovations in microeconometrics, as
illustrated by the Nobel Prize received by James Heckman.
Gronau (1974) and Heckman (1974 and 1979) addressed the selectivity bias and proposed fully parametric estimators. Naturally, this approach leads to inconsistent results if the distribution of the error term is misspecified. Therefore, Cosslett (1991), Gallant & Nychka (1987), Powell (1987), Ahn & Powell (1993), and Newey (1991) have proposed semiparametric estimators for the sample selection model. More recently, Das, Newey & Vella (2003) have proposed a fully nonparametric estimator for this model. While these papers have weakened the parametric and distributional assumptions made originally, they all assume independence between the error term and the regressors. This assumption is crucial in parametric, semiparametric and nonparametric models, as conditioning on the selection probability is not sufficient to correct for the selection bias if it is violated.
However, dependence in general and heteroscedasticity in particular is a ubiquitous phenomenon in the fields where sample selection models have been used. As suggested by Mincer (1973) in his famous human capital earnings model, residual wage dispersion should increase with experience and education. In line with this finding, the large majority of the applications of quantile regression in the empirical literature finds different coefficients on different parts of the conditional distribution. Therefore, the independence assumption cannot be taken for granted in most economic applications. Donald (1995) has relaxed this assumption and proposed a two-step
estimator that allows for conditional heteroscedasticity but requires the error terms to be bivariate normally distributed.¹ Since distributional assumptions are always difficult to motivate and it is not clear why the regressors should affect only the first two moments of the conditional distribution, we dispense with the normality assumption in this paper.

¹ Chen & Khan (2003) have proposed a semiparametric estimator allowing for heteroscedasticity. However, it appears that the proper identification of their model requires a variable affecting the variance but not the mean of the dependent variable conditionally on the regressors. This additional exclusion restriction renders this model unattractive.

In the absence of selection, Koenker & Bassett (1978) proposed a parametric (linear) estimator for conditional quantile models and derived its statistical properties. Due to its ability to capture heterogeneous effects, its theoretical properties have been studied extensively and it has been used in many empirical studies; see, for example, Powell (1986), Gutenbrunner & Jurečková (1992), Buchinsky (1994), Koenker & Xiao (2002), and Angrist, Chernozhukov & Fernández-Val (2006). Chaudhuri (1991) analyzed nonparametric estimation of conditional quantile functions. Buchinsky (1998b), Koenker & Hallock (2001), and Koenker (2005) provide comprehensive discussions of quantile regression models and their recent developments.
Buchinsky (1998a and 2001) was the first to consider the semiparametric sample selection model for conditional quantiles. He extends the series estimator of Newey (1991) for the mean to the estimation of quantiles. The problem with this approach is that the independence assumption is required to obtain the convenient partially linear representation for the mean and the quantiles. Implicitly, Buchinsky (1998a and 2001) assumes independence between the error term and the regressors conditional on the selection probability. One implication of this assumption is that all quantile regression curves are parallel. Naturally, this restricts the usefulness of the estimator because it implies that all quantile regression coefficients are identical and equal to the mean slope coefficients. Thus, this approach does not allow estimating the effects of the regressors on the conditional distribution of the dependent variable.
However, Buchinsky's (1998a) estimator can still be very useful. The first motivation for quantile regression was not to estimate the effects of covariates on the conditional distribution (actually, Koenker and Bassett assume independence in their seminal paper) but the robustness of the estimates in the presence of non-Gaussian errors. A similar result applies to the sample selection model, and we show that very significant efficiency gains can be achieved when the distribution of the error term has fat tails. The second motivation for quantile regression was to provide robust and efficient tests for the presence of heteroscedasticity, as suggested by Koenker & Bassett (1982). Testing the independence assumption is even more important in the presence of sample selection. As explained above, both mean and quantile estimators are inconsistent when this assumption is violated. It is therefore surprising that such a test has not been proposed; we do so in section 3. Under the null hypothesis of independence, the quantile regression estimator proposed by Buchinsky (1998a) consistently estimates the coefficients, which are the same at all quantiles. When the independence assumption is violated, the estimates are not consistent but the slope coefficients differ from one quantile to another, which gives power to the test.
As suggested by Koenker & Bassett (1982), we could consider a finite number of quantiles and test whether the regression coefficients are the same at all of these quantiles. However, a more powerful test statistic can be built using the whole conditional quantile process. We therefore suggest a test procedure similar to that proposed by Chernozhukov & Fernández-Val (2005). The critical values for this test are obtained by subsampling the empirical quantile regression processes. Since the computation of the estimates is quite demanding, we also apply the suggestion of Chernozhukov & Hansen (2006), which consists of resampling the score instead of re-computing the whole process. Our Monte Carlo simulations show that both the size and power of our Kolmogorov-Smirnov and Cramer-Von-Mises-Smirnov statistics are very satisfactory.
This paper would be incomplete if we did not offer a solution for the case in which the independence assumption is rejected, which we expect to happen in a wide range of applications. In this case, point identification of the mean and quantile coefficients appears to be impossible. However, in the spirit of the work of Manski (1989 and 1994), we show that it is still possible to bound the quantile coefficients even in the absence of bounded support. Our bounds are more informative than the worst-case bounds of Manski because we maintain the linear functional form and we make an independence assumption for the ranks of the distribution function. This last assumption relaxes the traditional independence assumption since it allows for heteroscedasticity and all types of dependence between the covariates and the potential wage distribution.
A very appealing feature of our bounds is that the identified interval collapses to a single point in two special cases: when there is independence between the error terms and the covariates, and when there are observations whose probability of selection is close to one. The first case is obvious, since we are back to the classical sample selection model, but it is important because it implies that the upper and lower bounds will be quite close when there is only a small amount of dependence. The second case is an example of Chamberlain's (1986) "identification at infinity". This approach has been used by Heckman (1990) and Andrews & Schafgans (1998) to identify the constant in a traditional sample selection model. In the case of dependence between the error term and the covariates, it can even be used to identify the slope coefficients. Our bounds also generalize this identification strategy to the case where some observations are observed with a high, but below one, probability. In this case, we may obtain a short identified interval for the coefficients of the quantile regression even when they are not point identified.
Two short applications illustrate our results. First, we apply our tests to the small textbook data set of Mroz (1987) and can reject the independence assumption at the 10% significance level. Second, using the data set of Mulligan & Rubinstein (2005), we first reject the null hypothesis of independence at the 0.1% level. We then bound the coefficients of the model under our weaker set of assumptions.
The remainder of this paper is organized as follows. In section 2 we describe the sample selection model and discuss the role of the independence assumption. In section 3 we outline the test procedure. Section 4 is devoted to the bounds on the coefficients when the independence assumption is rejected. In section 5, Monte Carlo simulations show the possible efficiency gains of quantile regression in the sample selection model as well as the power and size properties of our tests. Section 6 revisits two typical applications of sample selection models. Section 7 concludes.
2 The Classical Sample Selection Model
The parametric two-step estimator to control for sample selection bias in economic applications was first proposed by Heckman (1976 and 1979) and is known as the type II tobit or heckit estimator. Newey (1991) suggested a semiparametric two-step estimator based on a series expansion of the inverse Mills ratio. Buchinsky (1998a) suggested an extension of this model to the estimation of conditional quantiles. As in these papers, we assume that the potential outcome is linear in X, a vector of covariates:²

Y_i* = c(τ) + X_i'β(τ) + ε_i(τ).   (1)

The error term is assumed to satisfy the τth quantile restriction Q_τ(ε(τ) | X) = 0, such that β(τ) could be estimated consistently by traditional quantile regression if there were no sample selection problem. However, Y* is latent and only observed conditional on D_i = 1. Thus, the observed outcome Y is defined as

Y_i = c(τ) + X_i'β(τ) + ε_i(τ) if D_i = 1, and is not observed otherwise.

D is an indicator function that depends on Z, a superset of X.³ Identification of β(τ) requires identification of Pr(D = 1 | Z). The rest of the paper does not depend on how Pr(D = 1 | Z) is identified, but for completeness we make the following assumption:

D_i = 1{Z_i'γ + u_i ≥ 0}.   (2)

This is a parametric restriction for the sample selection equation. We implement our test statistic by using the estimator suggested by Klein & Spady (1993) to estimate γ. Therefore, we use their assumptions; in particular, we assume that u ⊥ Z | Z'γ. This independence assumption conditional on the index can be relaxed if Pr(D = 1 | Z) is estimated nonparametrically, as proposed in Ahn & Powell (1993).

² It is important to emphasize that all the insights of this paper (inconsistency in the presence of dependence, the possibility to test the independence assumption, bounds) remain valid for a nonparametric sample selection model. We consider the classical parametric model because it is more often applied and for simplicity.

³ For identification, Z has to include at least one continuous variable which is not in X and has a non-zero coefficient in the selection equation.
The conditional quantiles of the observed outcome can be formulated as

Q_τ(Y | X, D = 1) = c(τ) + X_i'β(τ) + Q_τ(ε(τ) | X, D = 1).

If selection into observed outcomes were random, then Q_τ(ε(τ) | X, D = 1) = 0 and the outcome equation could be estimated consistently by quantile regression. However, in general Q_τ(ε(τ) | X, D = 1) ≠ 0. For identification, Buchinsky (1998a) assumes:

Assumption 1: (u, ε) has a continuous density;
Assumption 2: f_{u,ε}(· | Z) = f_{u,ε}(· | Z'γ).

Assumption 2 implies that Q_τ(ε(τ) | X, D = 1) depends on X only through the linear index Z'γ⁴ and provides us with the following representation:

Y_i = c(τ) + X_i'β(τ) + h_τ(Z_i'γ) + v_i(τ) if D_i = 1,

where h_τ(Z_i'γ) = Q_τ(ε(τ) | Z_i'γ, D = 1) is an unknown nonlinear function and the residual v(τ) satisfies the quantile restriction Q_τ(v(τ) | X, h_τ(Z'γ), D = 1) = 0 by construction. This representation shows that β(τ) can be estimated by a quantile regression of Y on X and on a series approximation of Z'γ, as long as there is an excluded element in Z with a corresponding nonzero coefficient in γ.

The problem is that Assumption 2 implies that the quantile slope coefficients β(τ) are constant across the distribution and equal to the mean slope coefficients β. This becomes obvious when remembering that X ⊂ Z: Assumption 2 therefore implies that ε ⊥ X | Z'γ. Conditional on h_τ(Z'γ), the distribution of (u, ε) does not depend on the regressors' values X and the errors are homoscedastic. Thus, quantile regression does not provide more information on the effect of X on Y than mean regression. All conditional quantile functions are parallel and only the constant changes from one conditional quantile to another.⁵

⁴ More generally, we can allow for f_{u,ε}(· | Z) = f_{u,ε}(· | Pr(D = 1 | Z)). This weakens the first-step independence assumption but not the second-step one.
Let us now assume that ε ⊥ X | Z'γ does not hold, such that Assumption 2 is violated. In this case ε and u are generally heteroscedastic and dependent on X, and f_{u,ε} is not independent of Z even when conditioning on Z'γ. Coefficients differ across quantiles and the regression curves X_i'β(τ) are no longer parallel across τ. Including h_τ(Z_i'γ) in the outcome equation will then generally not allow estimating β(τ) consistently. This is due to the fact that the selection rule is likely to select either low or high values of the latent outcomes into the subsample of observed outcomes, owing to the covariance of (u, ε). This is innocuous when the errors are homoscedastic conditional on Z'γ, as selection shifts only the location of the (parallel) quantile and mean regression curves, whereas their gradient remains unchanged. However, in the presence of heteroscedastic errors, positive or negative sample selection generally causes β̂(τ) and β̂ to be systematically over- or underestimated.
A graphical illustration elucidates the intuition. Figure 1 displays 500 simulated realizations of (X, Y) under homoscedastic (1a) and heteroscedastic (1b) errors conditional on Z'γ. The true median regression curve (solid line) in 1a and 1b is horizontal, and thus the median slope coefficient β(0.5) on X given Z'γ is zero. Sample selection in 1a leaves the regression curve (dashed line) in the subsample of observed (X, Y), i.e. the framed crosses, unchanged. Merely its location is shifted upward, as realizations with small Y* are more likely not to be observed (crosses without framing). However, when errors are heteroscedastic as in 1b, the disproportionate non-observability of realizations with low Y causes the slope coefficient among observed realizations to diverge from the true value. In the case considered, the upward-sloping regression curve indicates a positive relationship between the regressor and the outcome, even though the true coefficient is zero.
As one cannot observe whether the selection process more likely 'cuts out' high or low Y*, nonconstant estimates of quantile slope coefficients bear no economic interpretation. They merely tell us that at least one basic condition necessary in sample selection models is violated and that neither quantile nor mean coefficient estimators are consistent.⁶ However, this obvious shortcoming is also the major attraction of the two-step quantile estimator. In fact, it can be used as a test of the independence assumption between errors and regressors by testing the null hypothesis H₀: β(τ) = β for all τ ∈ [0, 1], where β is some unknown constant coefficient vector. If the null hypothesis holds, so does Assumption 2. If coefficients differ significantly across quantiles, neither H₀ nor Assumption 2 holds and both the quantile and mean estimators yield inconsistent estimates.

⁵ The constant is not identified without further assumptions. Only an identification at infinity argument can solve this identification problem.

⁶ Only if selection were random would the estimators be consistent, but then h_τ(Z'γ) = 0 and the basic intuition for using sample selection models breaks down.

Figure 1: Regression slopes under homoscedasticity (1a) and heteroscedasticity (1b)
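The mechanism illustrated in Figure 1 can also be reproduced numerically. The following minimal Python sketch (sample size, correlation and scale parameters are illustrative assumptions, not the exact design behind the figure) draws correlated selection and outcome errors, lets selection depend only on an excluded regressor, and compares naive median-regression slopes in the selected subsample: under homoscedastic errors only the intercept shifts, while under heteroscedastic errors the estimated slope deviates from the true value of zero.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
z = rng.normal(size=n)                                    # excluded variable driving selection
u, e = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], n).T
d = z + u > 0                                             # non-random selection, Corr(u, e) = 0.8

for label, a in [("homoscedastic", 0.0), ("heteroscedastic", 0.5)]:
    y = (1 + a * x) * e                                   # true median slope on x is zero
    fit = QuantReg(y[d], sm.add_constant(x[d])).fit(q=0.5)
    print(f"{label:>16}: naive selected-sample median slope = {fit.params[1]: .3f}")
```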
3 Test Procedure
Our test procedure can be sketched as follows. We first estimate the selection equation using the Klein & Spady (1993) estimator. We then estimate the conditional quantile regression process by approximating the bias term by a series expansion in the inverse Mills ratio, as suggested by Newey (1991) for the mean and Buchinsky (1998a and 2001) for the quantiles. We test the independence assumption by testing whether the quantile regression slopes are the same over the whole distribution. The critical values of the test statistic are obtained by resampling, as presented in Chernozhukov & Fernández-Val (2005). When resampling is computationally too costly, we use score resampling as suggested by Chernozhukov & Hansen (2006).
In detail, the semiparametric discrete choice estimator suggested in Klein & Spady (1993) is used to estimate the selection equation (2). Formally:⁷

γ̂ = arg max_γ Σ_{i=1}^n { (1 − D_i) log[1 − Ê(D | Z_i, γ)] + D_i log[Ê(D | Z_i, γ)] },   (3)

where

Ê(D | Z_i, γ) = Σ_{j≠i} D_j κ((Z_i'γ − Z_j'γ)/b_n) / Σ_{j≠i} κ((Z_i'γ − Z_j'γ)/b_n),   (4)

b_n is a bandwidth depending on the sample size n and κ(·) is a kernel function. The optimal bandwidth b_n^opt is determined by the generalized cross-validation criterion (GCV) as discussed in Craven & Wahba (1979), Golub, Heath & Wahba (1979) and Li (1985), among others. This estimator attains the semiparametric efficiency bound and is the most efficient among those semiparametric estimators that do not put any restrictions on the distribution of the error term. Furthermore, heteroscedasticity of unknown form is allowed as long as it depends on the regressors only via the index. Klein and Spady's Monte Carlo simulations indicate that efficiency losses are only modest compared to probit estimation when the error term is standard normal, while their estimator is considerably more efficient in finite samples when the errors are non-Gaussian.

⁷ More precisely, we apply the estimator as used in Gerfin (1996), where a trimming term ς = 0.5/n is added in equation (3), such that γ̂ = arg max_γ Σ { (1 − D_i) log[1 − Ê(D | Z_i, γ) + ς] + D_i log[Ê(D | Z_i, γ) + ς] }.
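To fix ideas, here is a minimal sketch of the Klein & Spady objective in Python. It assumes a Gaussian kernel, a simple clipping of the fitted probabilities instead of the exact trimming of footnote 7, and a scale normalization that fixes the first index coefficient at one; the bandwidth would be chosen by GCV as described above.

```python
import numpy as np
from scipy.optimize import minimize

def loo_index_prob(gamma, Z, D, bw):
    """Leave-one-out kernel estimate of E[D | Z'gamma] at every observation."""
    idx = Z @ gamma
    diff = (idx[:, None] - idx[None, :]) / bw
    K = np.exp(-0.5 * diff**2)                     # Gaussian kernel (assumption)
    np.fill_diagonal(K, 0.0)                       # leave one out
    return (K @ D) / K.sum(axis=1)

def klein_spady_objective(gamma, Z, D, bw, trim=1e-3):
    """Negative quasi-log-likelihood of equation (3), with clipped probabilities."""
    p = np.clip(loo_index_prob(gamma, Z, D, bw), trim, 1 - trim)
    return -np.mean(D * np.log(p) + (1 - D) * np.log(1 - p))

def fit_klein_spady(Z, D, bw, gamma0):
    """Maximize (3) over the non-normalized coefficients, first coefficient fixed at 1."""
    obj = lambda g: klein_spady_objective(np.r_[1.0, g], Z, D, bw)
    res = minimize(obj, gamma0, method="Nelder-Mead")
    return np.r_[1.0, res.x]
```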
In a second step, the function h_τ(Z'γ) is approximated by a power series expansion. The exact form of the approximation is asymptotically irrelevant. As suggested by Buchinsky (1998a), we use a power series expansion in the inverse Mills ratio of the normalized estimated index. Thus, the first-order approximation will be sufficient if the error term is normally distributed. In any case, the estimator is consistent since the order of the approximation increases with the sample size. The coefficient estimates β̂ and δ̂ are obtained by solving the following minimization problem:

(β̂(τ), δ̂(τ)) = arg min_{β,δ} (1/n) Σ_{i=1}^n ρ_τ(Y_i − X_i'β − δ'π_J(Z_i'γ̂)),   (5)

where ρ_τ(a) = a(τ − 1(a < 0)) is the check function suggested by Koenker & Bassett (1978) and π_J(Z_i'γ̂) is a polynomial vector in the inverse Mills ratio λ(·):

π_J(Z_i'γ̂) = (1, λ(Z_i'γ̂), λ(Z_i'γ̂)², ..., λ(Z_i'γ̂)^J).

Again, GCV is used to determine the optimal maximum order J.
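A minimal sketch of this second step, relying on statsmodels' QuantReg, may be helpful; the variable names and the default order J = 2 are illustrative assumptions, and in practice J would be chosen by GCV. The routine regresses the observed outcomes on X and on a power series in the inverse Mills ratio of the estimated index.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.regression.quantile_regression import QuantReg

def inverse_mills(v):
    # inverse Mills ratio of the (normalized) estimated index
    return norm.pdf(v) / norm.cdf(v)

def second_step_qr(y, X, index, tau, J=2):
    """Quantile regression of y on X and a J-th order polynomial in the inverse
    Mills ratio of the estimated selection index (selected sample only)."""
    lam = inverse_mills(index)
    series = np.column_stack([lam ** j for j in range(1, J + 1)])
    design = np.column_stack([np.ones(len(y)), X, series])
    fit = QuantReg(y, design).fit(q=tau)
    k = X.shape[1] if X.ndim > 1 else 1
    return fit.params[1:1 + k]          # slope coefficients beta_hat(tau)
```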
As discussed in section 2, one would like to test H₀: β(τ) = β for all τ ∈ (0, 1), where β is some unknown constant coefficient vector. This can be done by defining a grid of q equidistant quantiles between zero and one, τ₁, ..., τ_q ∈ T ⊂ (0, 1), and considering the general null hypothesis

H₀: β(τ) = β, τ ∈ T.   (6)

We estimate β by the median regression coefficient vector β̂(0.5). Alternatively, we could use the mean estimate, but this would require the existence of at least the first two moments of Y given X. We test the null hypothesis using the Kolmogorov-Smirnov (KS) and Cramer-Von-Mises-Smirnov (CMS) statistics on the empirical inference process β̂(τ) − β̂(0.5):

T_n^KS = sup_{τ∈T} √n ||β̂(τ) − β̂(0.5)||_Λ̂   and   T_n^CMS = n ∫_T ||β̂(τ) − β̂(0.5)||²_Λ̂ dτ,   (7)

where ||a||_Λ̂ = √(a'Λ̂a) and Λ̂ is a positive weighting matrix such that Λ̂ = Λ + o_p(1) uniformly in τ, with Λ positive definite, continuous and symmetric, again uniformly in τ. We simply use the identity matrix in our simulations and applications.
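For concreteness, the two statistics in (7) can be computed as follows once the coefficient process has been estimated on an equidistant grid; the sketch assumes the identity weighting matrix and approximates the integral by a Riemann sum (the array shapes are illustrative assumptions).

```python
import numpy as np

def ks_cms_statistics(beta_process, beta_med, n, taus):
    """KS and CMS statistics of the inference process beta_hat(tau) - beta_hat(0.5),
    with the identity matrix as weighting matrix. beta_process is (len(taus), k),
    beta_med is the median-regression coefficient vector."""
    diff = beta_process - beta_med             # contrasts against the median coefficients
    norms2 = np.sum(diff ** 2, axis=1)         # squared Euclidean norms
    ks = np.sqrt(n * norms2.max())             # sqrt(n) * sup-norm over the grid
    step = taus[1] - taus[0]                   # equidistant grid assumed
    cms = n * norms2.sum() * step              # Riemann approximation of the integral over T
    return ks, cms
```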
T_n^KS and T_n^CMS per se are not very useful since we do not know their asymptotic distribution. However, Chernozhukov & Fernández-Val (2005) show that asymptotically valid critical values can be obtained by bootstrapping the recentered test statistic. To this end, B subsamples of block size m are drawn out of the original sample with replacement to compute the inference process

β̂_{m,i}(τ) − β̂_{m,i}(0.5),   (8)

where 1 ≤ i ≤ B and β̂_{m,i}(τ) are the quantile slope coefficient estimates for draw i and block size m. The corresponding KS and CMS test statistics of the recentered bootstrapped process are

T_{n,m,i}^KS = sup_{τ∈T} √m ||β̂_{m,i}(τ) − β̂_{m,i}(0.5) − (β̂(τ) − β̂(0.5))||_Λ̂   and
T_{n,m,i}^CMS = m ∫_T ||β̂_{m,i}(τ) − β̂_{m,i}(0.5) − (β̂(τ) − β̂(0.5))||²_Λ̂ dτ.   (9)

The distribution-free p-values for the test statistics are obtained by simply estimating the probability Pr[T(β̂(τ) − β̂(0.5) − (β(τ) − β)) > T_n] by (1/B) Σ_{i=1}^B 1{T_{n,m,i} > T_n}, where Pr[·] is a probability measure and 1{·} is the indicator function. However, the repeated computation of coefficient estimates for each resampling step can get quite costly, especially in large samples. For this reason, we follow Chernozhukov & Fernández-Val (2005) and use score resampling based on the linear approximation of the empirical inference process instead, which is considerably less burdensome. The linear representation of the inference process is given by

√n(β̂(τ) − β̂(0.5) − (β(τ) − β)) = (1/√n) Σ_{i=1}^n s_i(τ) + o_p(1).   (10)
Again, B subsamples of estimated scores are drawn. Let I_i denote a specific subsample of block size m, with 1 ≤ i ≤ B. The estimated inference process is defined as (1/m) Σ_{j∈I_i} ŝ_j(τ). The KS and CMS statistics are then

T_{n,m,i}^KS = sup_{τ∈T} √m ||(1/m) Σ_{j∈I_i} ŝ_j(τ)||_Λ̂   and   T_{n,m,i}^CMS = m ∫_T ||(1/m) Σ_{j∈I_i} ŝ_j(τ)||²_Λ̂ dτ.   (11)

The computation of the score function in the quantile sample selection framework is presented in the appendix of Buchinsky (1998a), using the Ichimura (1993) estimator for the first-step estimation of the selection probability. In the appendix, we present a slightly different version, which is adapted to the estimator suggested by Klein & Spady (1993).
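The following sketch illustrates the recentred-subsampling p-values of equations (8)-(9) in the computationally expensive variant that re-estimates the coefficient process on every subsample; the score-resampling variant of equation (11) would replace the inner estimation by averages of the estimated scores. The wrapper beta_fn and the row-indexable data container are hypothetical assumptions.

```python
import numpy as np

def subsample_pvalues(beta_fn, data, taus, m, B, rng):
    """Recentred-subsampling p-values for the KS and CMS tests (identity weighting).
    beta_fn(data, taus) is a user-supplied wrapper returning the estimated
    quantile-coefficient process (rows correspond to the quantiles in `taus`)."""
    n = len(data)
    step = taus[1] - taus[0]

    def stats(contrast, size):                 # KS and CMS of a contrast process
        norms2 = np.sum(contrast ** 2, axis=1)
        return np.sqrt(size * norms2.max()), size * norms2.sum() * step

    beta = beta_fn(data, taus)
    contrast = beta - beta_fn(data, np.array([0.5]))[0]
    ks, cms = stats(contrast, n)

    ks_b, cms_b = np.empty(B), np.empty(B)
    for b in range(B):                         # blocks of size m, drawn with replacement
        sub = data[rng.choice(n, size=m, replace=True)]
        contrast_m = beta_fn(sub, taus) - beta_fn(sub, np.array([0.5]))[0]
        ks_b[b], cms_b[b] = stats(contrast_m - contrast, m)   # recentred process
    return float(np.mean(ks_b > ks)), float(np.mean(cms_b > cms))
```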
4 Bounds
It was argued in section 2 that the traditional mean and quantile sample selection estimators are inconsistent if the independence assumption is violated. Still, this does not necessarily mean that nothing can be said about the size of the coefficients. In the normal multiplicative heteroscedastic sample selection model, the coefficients are even point identified, as shown by Donald (1995). In a much more general setting, Manski (1989 and 1994) derives the worst-case bounds for the conditional mean and quantiles in the absence of any further assumption.
Here, an intermediate path between the worst-case bounds of Manski and the classical sample selection model is pursued. Apart from the independence assumption, all model assumptions made by Buchinsky (1998a) are maintained to derive bounds on the quantile regression coefficients. Thus, the distribution of Y is not restricted and the regressors are allowed to influence the whole conditional distribution of the dependent variable, not only its first moment(s). To this end, independence is replaced by the following, less restrictive, assumption:

F_Y(F_Y⁻¹(τ | X = x) | X = x, P = p, D = 1) = F_Y(F_Y⁻¹(τ | X = x̃) | X = x̃, P = p, D = 1) = λ_τ(p),   (12)

for all x and x̃ in the support of X, where P denotes the selection probability defined as P = Pr(D = 1 | Z). Equation (12) states that the rank λ_τ(p) in the observed distribution that corresponds to rank τ in the latent (true) distribution does not depend on X given P. This assumption is implied by the stronger independence assumption that has been made in the majority of studies using parametric and nonparametric selection models. In contrast to Manski (1989 and 1994), equation (12) excludes for instance the possibility of positive selection for low values of X and negative selection for high values of X. This is the main reason why tighter bounds are obtained. The second difference with Manski is that a linear parametric specification for the conditional quantiles of the dependent variable is assumed:

F_Y⁻¹(τ | X = x) = c(τ) + x'β(τ).   (13)

Linearity is not essential for the basic idea of the bounds but it will be maintained for simplicity. By combining equations (12) and (13), one obtains

F_Y(c(τ) + x'β(τ) | X = x, P = p, D = 1) = F_Y(c(τ) + x̃'β(τ) | X = x̃, P = p, D = 1) = λ_τ(p).
The value at the λ_τ(p)th conditional quantile of the observed outcome corresponds to the value at the τth conditional quantile of the latent outcome. If λ_τ(p) were known, β(τ) could be consistently estimated by regressing Y on X and on a nonparametric series expansion of P at the λ_τ(p)th quantile, conditional on D = 1. However, λ_τ(p) is unknown to the researcher because the selection rule is unknown. If there were random or no sample selection, λ_τ(p) would be equal to τ. Under nonrandom and highly positive selection, λ_τ(p) is equal to (τ − (1 − p))/p, whereas under highly negative selection, λ_τ(p) is equal to τ/p.⁸ Along with assumption (12), this information can be used to bound the unknown λ_τ(p). By bounding λ_τ(p), β(τ) is bounded, too. Let β(τ, p) denote the vector of the true slope coefficients at the τth quantile for observations with P = p, and let δ(λ, p) denote the vector of the slope coefficients at the λth quantile of the observed realizations with P = p. Let P_{D=1} denote the set of selection probabilities of the observed population, i.e., P_{D=1} collects all P conditional on D = 1. The true quantile coefficient at P = p, β(τ, p), is located within these bounds:

β(τ, p) ∈ [ L(p) = min_{λ ∈ [(τ−(1−p))/p, τ/p]} δ(λ, p) ;  U(p) = max_{λ ∈ [(τ−(1−p))/p, τ/p]} δ(λ, p) ],   (14)

where p ∈ P_{D=1}.   (15)

⁸ An example should elucidate this point. Consider a distribution that is equal to the set {1, 2, ..., 10}. The value at τ = 0.4 is then 4. Let p = 0.8. Under highly positive selection, the lower 1 − p (= 0.2) share of the distribution will not be observed after selection, i.e., {1, 2} are not selected and the observed set is {3, 4, ..., 10}; 4 is now at the 0.25th quantile of the observed values, i.e., 0.25 = λ_τ(p) = (τ − (1 − p))/p. Under highly negative selection, the upper 0.2 share of the distribution, i.e., {9, 10}, is not selected and the observed set is {1, 2, ..., 8}; 4 is now at the 0.5th quantile of the observed values, i.e., 0.5 = λ_τ(p) = τ/p. The same holds for any other value of τ and p.
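The bounds on λ_τ(p) implied by (14), together with the numerical example of footnote 8, can be checked with a few lines of Python (a minimal sketch; the clipping to [0, 1] reflects that λ_τ(p) is a rank).

```python
import numpy as np

def lambda_bounds(tau, p):
    """Bounds on the observed rank lambda_tau(p): highly positive selection gives
    (tau - (1 - p)) / p, highly negative selection gives tau / p, clipped to [0, 1]."""
    lower = max((tau - (1.0 - p)) / p, 0.0)
    upper = min(tau / p, 1.0)
    return lower, upper

# numerical check of the example in footnote 8 (tau = 0.4, p = 0.8)
tau, p = 0.4, 0.8
latent = np.arange(1, 11)            # latent "distribution" {1, ..., 10}; its 0.4 quantile is 4
positive_sel = latent[2:]            # lowest 1 - p share {1, 2} dropped
negative_sel = latent[:8]            # highest 1 - p share {9, 10} dropped
print(np.mean(positive_sel <= 4), lambda_bounds(tau, p)[0])   # both 0.25 = (tau - (1 - p)) / p
print(np.mean(negative_sel <= 4), lambda_bounds(tau, p)[1])   # both 0.50 = tau / p
```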
Let us additionally assume that β(τ, p) is constant in p, i.e., β(τ, p) = β(τ). Then β(τ) has to be an element of the intersection of the possible values for β(τ, p) across all selection probabilities in the observed population:

β(τ) ∈ [ max_p L(p) ;  min_p U(p) ].

Thus, the final bounds for β(τ) are the best-case solution (in the sense that they minimize the range of possible values for β(τ) over various p) out of all the bounds obtained in equation (14). However, the identification of bounds on λ_τ(p) (and β(τ)) hinges on two conditions. Let p̄ denote the maximum of all P ∈ P_{D=1}, i.e., the maximum of the selection probabilities in the observed population. In order to obtain informative bounds, (i) p̄ > 0.5 and (ii) 1 − p̄ < τ < p̄ have to hold. For p̄ ≤ 0.5, the bounds on λ_τ(p), which is naturally restricted to the interval [0, 1], are not informative, as either (τ − (1 − p̄))/p̄ ≤ 0, or τ/p̄ ≥ 1, or both. In such cases it is obvious that β(τ) cannot be bounded either. Secondly, τ has to lie within the 1 − p̄ upper share and the p̄ lower share of the latent distribution, as 1 − p̄ and p̄ determine the observed distribution for the worst cases of highly positive and highly negative selection, respectively. Informative bounds cannot be obtained for any τ hitting the boundaries of, or falling outside, the interval [1 − p̄, p̄], which gives rise to condition (ii), 1 − p̄ < τ < p̄.
We can use our framework of interval identification based on bounds to investigate under which conditions point identification of β(τ) is obtained. Point identification can be considered as the special case in which the identified interval collapses to a single point. This is the case when either independence between the error terms and the covariates or identification at infinity (or both) is satisfied. If independence is satisfied, point identification of the slope coefficients is obtained because all slope coefficients are constant across quantiles and thus the minimum equals the maximum in (14). In this case, we are back in the framework of the classical sample selection model. If independence does not hold, point identification is still feasible if the data contain observations with selection probability P = 1. Note that P = 1 is not only sufficient, but also necessary for point identification under heteroscedasticity. This is obvious from the fact that only for P = 1 do we have (τ − (1 − p))/p = τ/p = τ = λ_τ(p), whereas for 0 < p < 1 the upper and lower bounds of λ_τ(p) differ by (1 − p)/p. Identification conditional on P = 1 is known as identification at infinity and was first discussed by Chamberlain (1986). The strategy suggested in this section, however, does not require such observations and still allows bounding the coefficients for the share of the population for which P > 0.5. Furthermore, the constant can be bounded by the same strategy. In contrast to the slope coefficients, the constant is not point identified even when independence is satisfied but P < 1 for all observations. However, point identification of the constant is feasible by identification at infinity, as discussed in Heckman (1990) and Andrews & Schafgans (1998). As a final remark, it is worth noting that one need not necessarily assume β(τ, p) = β(τ) (see the discussion in Heckman & Vytlacil (2005)). Without this assumption, it is not possible to integrate the results over different participation probabilities, but the bounds on β(τ, p) at each P = p remain valid.

5 Monte Carlo Simulations
In this section, we present the results of Monte Carlo simulations on the efficiency and robustness of quantile regression in sample selection models, as well as on the power and size properties of our tests in finite samples.

In their seminal paper on quantile regression, Koenker & Bassett (1978) provide Monte Carlo results on the efficiency of various estimators for several distributions of the errors. One of their conclusions is that under Gaussian errors, the median estimator makes only small sacrifices of efficiency compared to the mean estimator. It is, however, considerably more efficient when errors follow a non-Gaussian distribution, such as the Laplace, Cauchy or contaminated Gaussian distributions. Thus, even if the errors are independent of the covariates, quantile regression methods can be preferable to mean regression for the sake of efficiency gains. A second argument in favor of quantile regression is its increased robustness in the case of contaminated outcomes, resulting in smaller biases in the coefficient estimates. To illustrate that such efficiency and robustness considerations also apply to sample selection models, we conducted Monte Carlo simulations for (i) t-distributed error terms with three degrees of freedom, (ii) Cauchy distributed error terms, (iii) contaminated normal errors, and (iv) contaminated outcomes. The data generating process in specifications (i) to (iii) is defined as
D_i = 1{γX_i + δZ_i + u_i > 0},
Y_i = βX_i + ε_i if D_i = 1,
X ~ N(0, 1), Z ~ N(0, 1), γ = 1, δ = 1, β = 1.

In specification (i) u, ε ~ t(df = 3) and in (ii) u, ε ~ Cauchy. The covariance between the errors is set to 0.8 in both cases, i.e. Cov(u, ε) = 0.8. In (iii), the errors are a Gaussian mixture which is constructed as

u_i = 0.95·u_{i1} + 0.05·u_{i2},   ε_i = 0.95·ε_{i1} + 0.05·ε_{i2},
u_1 ~ N(0, 1), ε_1 ~ N(0, 1), u_2 ~ N(0, 100), ε_2 ~ N(0, 100),
Cov(u_1, ε_1) = Φ(0.8),   Cov(u_2, ε_2) = Φ(8),

where Φ is the cumulative distribution function. In (iv) the errors are standard normal, u ~ N(0, 1), ε ~ N(0, 1), Cov(u, ε) = 0.8, but the outcome is contaminated by the factor 10 with 5% probability:

Y_i = D_i[(1 − j)(X_i + ε_i) + j(10X_i + ε_i)],   j = 1{v_i > 0.95}, v_i ~ U(0, 1), so that Pr(j = 1) = 0.05.

For each of the model specifications, a simulation with 1000 replications is conducted for a sample of n = 400 observations. The bandwidth for the (first-step) selection estimator is set to b_n = 0.3.
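As an illustration, the following Python sketch draws one sample from specification (iv); the other specifications only change the error distributions as described above, and the two-step median and mean estimators applied to each sample are omitted here.

```python
import numpy as np

def simulate_contaminated_outcome(n, rng):
    """One sample from specification (iv): standard normal errors with Cov(u, eps) = 0.8
    and the outcome contaminated by the factor 10 with 5% probability."""
    x = rng.normal(size=n)
    z = rng.normal(size=n)
    u, eps = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], n).T
    d = x + z + u > 0                                 # selection equation
    j = (rng.uniform(size=n) > 0.95).astype(float)    # contamination indicator, Pr(j = 1) = 0.05
    y_latent = (1 - j) * (x + eps) + j * (10 * x + eps)
    y = np.where(d, y_latent, np.nan)                 # outcome observed only if d = 1
    return y, x, z, d

# usage: one replication of n = 400; in the study this is repeated 1000 times
y, x, z, d = simulate_contaminated_outcome(400, np.random.default_rng(1))
```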
Table 1
Coefficient estimates and variances of mean and median estimators
400 observations, 1000 replications, bandwidth = 0.3

                                      Median estimator        Mean estimator
Distributions                         Estimate   Variance     Estimate   Variance
(i)   Student's t (df = 3)             1.015      0.018        1.004      0.028
(ii)  Cauchy                           1.026      0.037        1.498      2756.653
(iii) Contaminated normal error        0.998      0.014        0.985      0.061
(iv)  Contaminated outcome             1.105      0.016        1.914      0.178
The estimates and variances of the β coefficients for the median and mean estimators are reported in Table 1. In all specifications considered, the median estimator is more efficient than the mean estimator. In the contaminated outcome specification its variance is more than ten times smaller than that of the mean estimator. In the case of Cauchy distributed error terms, the variance of the mean coefficient is theoretically unbounded, whereas it stays quite moderate for the median estimator. The median estimator is also superior in terms of robustness, which becomes particularly clear when looking at the coefficient estimates in the contaminated outcome specification. While the mean estimate is severely upward biased, the median estimate is still only moderately higher than the true coefficient β = 1.
In the second part of this section, we present results on the power and size properties of the Kolmogorov-Smirnov (KS) and Cramer-Von-Mises-Smirnov (CMS) resampling tests in finite samples. We do so for three specifications based on (i) Gaussian, (ii) t-distributed and (iii) Cauchy distributed error terms u and ε. In specification (i), the data generating process is defined as:

D_i = 1{γX_i + δZ_i + u_i > 0},
Y_i = βX_i + (1 + αX_i)ε_i,
u ~ N(0, 1), ε ~ N(0, 1), Cov(u, ε) = 0.8, X ~ N(0, 1), Z ~ N(0, 1),
γ = 1, δ = 1, β = 1, α ∈ {0, 0.2, 0.5},

where γ, δ and β
are mean coefficients. One is interested in the rejection frequencies of the KS and CMS statistics testing the null hypothesis of constant coefficients across all quantiles by drawing repeated bootstrap samples. As outlined in section 2, H₀: β(τ) = β for all 0 < τ < 1, where β is some unknown constant coefficient. In the location shift model (α = 0), the error term ε is independent of the regressor X and hence H₀ is true. In this case, the rejection rates in the Monte Carlo simulations yield the tests' size properties. Heteroscedasticity is introduced if α is set to values different from zero. In the location-scale shift models (α = 0.2, 0.5), the null hypothesis is false and the rejection frequencies indicate the tests' power to reject the incorrect H₀. In order to construct the test statistics, the coefficients β(τ) are estimated at equidistant quantiles with step size 0.01 and compared to the median estimate β̂(0.5). Results are presented for three different quantile regions over which the quantile coefficients are estimated:
T_[0.05,0.95] = {0.05, 0.06, ..., 0.95}
T_[0.1,0.9] = {0.10, 0.11, ..., 0.90}
T_[0.2,0.8] = {0.20, 0.21, ..., 0.80}
Therefore, the number of estimated quantile coefficients differs across regions. In particular, the largest region [0.05, 0.95] includes quantiles that are relatively close to the boundaries 0 and 1, whereas all quantiles in the 'narrow' region [0.2, 0.8] are situated well in the interior. As will be shown below, the choice of the region affects the tests' finite sample properties, and it is the distribution of the error term ε that determines whether a narrow or a large quantile region is preferable. Similarly to Chernozhukov & Fernández-Val (2005), 1000 Monte Carlo replications per simulation and 250 bootstrap replications within each replication are conducted to compute the critical values of the test statistics. Five sample sizes from n = 100 to n = 3200 are generated. In each sample, we draw two different sets of subsamples with replacement for the bootstrap, with block sizes m = 20 + n^{1/4} and m = n, respectively.
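A compact sketch of one cell of this rejection-frequency exercise is given below; it reuses subsample_pvalues from the sketch in section 3 and treats estimate_process, a wrapper around the two-step quantile estimator, as a hypothetical user-supplied function.

```python
import numpy as np

def rejection_frequency(alpha_het, n, taus, m, B, R, estimate_process, rng):
    """Empirical rejection frequency of the 5% KS test for the location-scale DGP of
    specification (i); estimate_process(y, x, z, d, taus) returns the coefficient process."""
    rejections = 0
    for _ in range(R):
        x, z = rng.normal(size=n), rng.normal(size=n)
        u, eps = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], n).T
        d = x + z + u > 0
        y = x + (1 + alpha_het * x) * eps              # alpha_het = 0 gives the null of independence
        data = np.column_stack([y, x, z, d])
        beta_fn = lambda dat, t: estimate_process(dat[:, 0], dat[:, 1], dat[:, 2], dat[:, 3], t)
        p_ks, _ = subsample_pvalues(beta_fn, data, taus, m, B, rng)
        rejections += p_ks < 0.05
    return rejections / R
```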
Table 2
Empirical rejection frequencies for 5% resampling tests
ε ~ N(0,1), 250 bootstrap draws, 1000 replications
Columns: quantile regions [0.05,0.95], [0.1,0.9], [0.2,0.8], reported for α = 0, α = 0.2 and α = 0.5 (left to right).

Kolmogorov-Smirnov statistics, m = 20 + n^{1/4}
n = 100     0.006  0.000  0.000    0.007  0.003  0.001    0.034  0.011  0.002
n = 400     0.030  0.014  0.003    0.306  0.184  0.070    0.920  0.818  0.500
n = 800     0.024  0.026  0.012    0.644  0.517  0.283    1.000  0.998  0.957
n = 1600    0.040  0.032  0.020    0.936  0.906  0.717    1.000  1.000  1.000
n = 3200    0.030  0.032  0.031    0.998  1.000  0.973    1.000  1.000  1.000

Kolmogorov-Smirnov statistics, m = n
n = 100     0.003  0.000  0.000    0.005  0.001  0.001    0.021  0.004  0.001
n = 400     0.022  0.006  0.002    0.273  0.145  0.054    0.895  0.780  0.434
n = 800     0.024  0.019  0.009    0.636  0.482  0.253    0.996  0.994  0.952
n = 1600    0.038  0.033  0.017    0.934  0.885  0.705    1.000  1.000  1.000
n = 3200    0.042  0.029  0.027    0.999  1.000  0.975    1.000  1.000  1.000

Cramer-Von-Mises-Smirnov statistics, m = 20 + n^{1/4}
n = 100     0.001  0.000  0.000    0.001  0.001  0.000    0.011  0.003  0.002
n = 400     0.011  0.006  0.002    0.191  0.103  0.047    0.933  0.835  0.468
n = 800     0.011  0.010  0.007    0.610  0.451  0.216    1.000  0.998  0.955
n = 1600    0.023  0.013  0.008    0.959  0.898  0.704    1.000  1.000  1.000
n = 3200    0.022  0.020  0.014    1.000  0.998  0.969    1.000  1.000  1.000

Cramer-Von-Mises-Smirnov statistics, m = n
n = 100     0.001  0.000  0.000    0.001  0.000  0.000    0.006  0.005  0.003
n = 400     0.011  0.006  0.002    0.192  0.112  0.045    0.924  0.838  0.498
n = 800     0.017  0.009  0.010    0.617  0.463  0.252    0.999  0.998  0.957
n = 1600    0.026  0.020  0.014    0.958  0.911  0.735    1.000  1.000  1.000
n = 3200    0.026  0.024  0.021    1.000  0.999  0.978    1.000  1.000  1.000
The empirical rejection frequencies reported in Table 2 suggest that the resampling tests work quite well under standard normally distributed errors. In the case of homoscedastic errors (α = 0), both the KS and CMS statistics are more conservative than the nominal rejection frequency (5%) of the data generating process, at least for the sample sizes considered. However, both statistics generally seem to converge to the nominal level as the sample size increases, although not monotonically. The KS test does so at a faster pace than the CMS test. For the latter, the larger resampling block size (m = n) works somewhat better than the smaller one (m = 20 + n^{1/4}). Under heteroscedastic errors, the test statistics converge to the true rejection rate of 100% as the sample size increases. As expected, this happens at a faster pace for α = 0.5 than for α = 0.2. The power properties of the CMS and KS statistics are rather similar and quite satisfactory, given that the sample size is not too small. The convergence of the KS statistic is faster at both levels of heteroscedasticity for the smaller block size, whereas the converse seems to be true for the CMS statistic, given that the sample size is not too small. As one would expect, the empirical rejection frequencies converge faster to the true values as the quantile region increases, and this holds for both tests and any value of α. Summing up, both tests seem to perform quite well in finite samples with Gaussian errors. For moderate sample sizes of several thousand observations, the power is sufficiently high and size distortions hardly affect the test statistics. In our simulations, the KS test seems to be somewhat superior due to its faster convergence when α = 0.
In specification (ii), almost the same model is used as before; only the errors are changed to be t-distributed with three degrees of freedom, u ~ t(df = 3), ε ~ t(df = 3), Cov(u, ε) = 0.8. Table 3 reports the rejection frequencies for t-distributed error terms. As one would expect, deviations from the true rejection rates in finite samples generally increase due to the fatter tails compared to Gaussian errors. For α = 0, the rejection frequencies of the KS test 'overshoot' in small samples and seem to converge to the nominal level as the sample size increases. At least in small samples the CMS test seems to perform slightly better, as it stays on the 'safe side', i.e., it is more conservative than the nominal rejection rate. Furthermore, the CMS rejection frequencies converge faster to 100% under heteroscedastic errors. The larger block size (m = n) yields better results for both tests, given that the sample size is not too small. Contrary to the case of Gaussian errors, the largest quantile region is generally not the best choice. Under heteroscedastic errors, T_[0.1,0.9] is superior for both the KS and the CMS statistic, which is again due to the fat tails. Even though the power and size properties are quite appealing for both tests, the CMS-based procedure seems to be preferable in the case of t-distributed errors.
Table 3
Empirical rejection frequencies for 5% resampling tests
ε ~ t(df = 3), 250 bootstrap draws, 1000 replications
Columns: quantile regions [0.05,0.95], [0.1,0.9], [0.2,0.8], reported for α = 0, α = 0.2 and α = 0.5 (left to right).

Kolmogorov-Smirnov statistics, m = 20 + n^{1/4}
n = 100     0.058  0.007  0.000    0.062  0.012  0.001    0.087  0.014  0.004
n = 400     0.084  0.044  0.011    0.284  0.189  0.070    0.810  0.811  0.482
n = 800     0.094  0.063  0.019    0.410  0.441  0.281    0.962  0.991  0.950
n = 1600    0.056  0.056  0.032    0.570  0.719  0.651    0.999  1.000  1.000
n = 3200    0.052  0.056  0.043    0.827  0.957  0.942    1.000  1.000  1.000

Kolmogorov-Smirnov statistics, m = n
n = 100     0.037  0.006  0.000    0.041  0.005  0.001    0.064  0.006  0.001
n = 400     0.070  0.032  0.009    0.252  0.153  0.043    0.801  0.758  0.415
n = 800     0.103  0.068  0.013    0.447  0.414  0.253    0.966  0.986  0.943
n = 1600    0.066  0.061  0.026    0.642  0.743  0.632    0.999  1.000  1.000
n = 3200    0.068  0.055  0.045    0.909  0.969  0.945    1.000  1.000  1.000

Cramer-Von-Mises-Smirnov statistics, m = 20 + n^{1/4}
n = 100     0.001  0.000  0.000    0.003  0.002  0.000    0.026  0.005  0.000
n = 400     0.022  0.012  0.002    0.215  0.107  0.026    0.894  0.815  0.466
n = 800     0.050  0.020  0.004    0.540  0.441  0.209    0.999  0.997  0.958
n = 1600    0.036  0.023  0.012    0.876  0.852  0.632    1.000  1.000  1.000
n = 3200    0.035  0.030  0.024    0.993  0.996  0.967    1.000  1.000  1.000

Cramer-Von-Mises-Smirnov statistics, m = n
n = 100     0.000  0.000  0.000    0.001  0.000  0.000    0.014  0.003  0.001
n = 400     0.020  0.007  0.001    0.205  0.104  0.019    0.890  0.800  0.462
n = 800     0.053  0.017  0.004    0.581  0.462  0.230    1.000  0.998  0.960
n = 1600    0.050  0.034  0.020    0.896  0.862  0.665    1.000  1.000  1.000
n = 3200    0.049  0.036  0.036    0.995  0.998  0.976    1.000  1.000  1.000
Lastly, Table 4 displays the rejection rates for Cauchy distributed errors (iii): u ~ Cauchy, ε ~ Cauchy, Cov(u, ε) = 0.8. Because the first and higher moments of the Cauchy distribution are undefined, the sample size needs to be sufficiently large (at least several thousand observations) to obtain satisfactory results. Under homoscedastic errors, the CMS test is again more conservative than the KS test for any chosen block size. Given that the sample size is not too small, the former outperforms the latter for α = 0.2, 0.5, as its rejection rates converge faster to 100%. In the majority of the scenarios considered, the smaller block size yields better results for both tests and across different levels of α, due to the smaller probability of extreme values related to the undefined moments of the Cauchy distribution. For the same reason, the small quantile region T_[0.2,0.8] is preferable to its larger alternatives. Thus, the power gains obtained by reducing the probability of disturbing outliers clearly outweigh the power losses due to considering a smaller range of the distribution in the inference process. Summing up, the CMS test seems to be somewhat superior to the KS test when errors are non-Gaussian, at least for the sample sizes considered.
Table 4
Empirical rejection frequencies for 5% resampling tests
ε ~ Cauchy, 250 bootstrap draws, 1000 replications
Columns: quantile regions [0.05,0.95], [0.1,0.9], [0.2,0.8], reported for α = 0, α = 0.2 and α = 0.5 (left to right).

Kolmogorov-Smirnov statistics, m = 20 + n^{1/4}
n = 100     0.127  0.032  0.008    0.117  0.037  0.003    0.149  0.051  0.006
n = 400     0.081  0.056  0.035    0.097  0.102  0.053    0.303  0.375  0.335
n = 800     0.085  0.064  0.041    0.104  0.095  0.097    0.437  0.607  0.739
n = 1600    0.062  0.036  0.025    0.150  0.176  0.262    0.698  0.900  0.981
n = 3200    0.059  0.034  0.038    0.261  0.350  0.508    0.913  0.988  1.000

Kolmogorov-Smirnov statistics, m = n
n = 100     0.088  0.018  0.004    0.088  0.025  0.003    0.113  0.030  0.006
n = 400     0.091  0.056  0.018    0.103  0.084  0.030    0.285  0.333  0.270
n = 800     0.086  0.063  0.036    0.097  0.076  0.072    0.433  0.553  0.661
n = 1600    0.069  0.031  0.023    0.132  0.144  0.220    0.704  0.881  0.963
n = 3200    0.079  0.041  0.035    0.279  0.325  0.472    0.926  0.989  0.999

Cramer-Von-Mises-Smirnov statistics, m = 20 + n^{1/4}
n = 100     0.063  0.007  0.000    0.067  0.008  0.001    0.098  0.018  0.001
n = 400     0.073  0.041  0.004    0.104  0.075  0.017    0.420  0.489  0.279
n = 800     0.070  0.043  0.014    0.118  0.123  0.078    0.658  0.815  0.832
n = 1600    0.053  0.030  0.014    0.194  0.293  0.337    0.890  0.986  0.996
n = 3200    0.046  0.032  0.029    0.379  0.563  0.708    0.992  1.000  1.000

Cramer-Von-Mises-Smirnov statistics, m = n
n = 100     0.053  0.005  0.000    0.057  0.007  0.000    0.066  0.013  0.001
n = 400     0.072  0.036  0.004    0.090  0.066  0.013    0.395  0.441  0.254
n = 800     0.083  0.040  0.016    0.110  0.092  0.063    0.625  0.773  0.812
n = 1600    0.052  0.033  0.014    0.174  0.245  0.292    0.888  0.980  0.993
n = 3200    0.061  0.031  0.025    0.385  0.525  0.668    0.994  1.000  1.000
6 Labor Market Applications
In this section, two applications of the test procedure to labor market data are presented. The first application is a textbook example of heckit estimation given by Greene (2003), using a sample of 753 married women originally investigated by Mroz (1987). The data set contains information on the wages and hours worked of the 428 women with positive labor supply (D = 1), along with a set of regressors for the whole sample. Estimation is based on the conventional selection model with normally distributed errors and an additive, linear bias-correction function:

Y_i = X_i'β + ε_i if D_i = 1,
D_i = 1{X_i'β + ε_i ≥ Z_i'γ + u_i},
E(ε | X) = 0,  E(ε | X, D = 1) ≠ 0.   (16)

Z_i'γ + u_i denotes the reservation wage, which depends on a set of characteristics Z and the error term u. An individual provides positive labor supply (D_i = 1) only if the offered wage X_i'β + ε_i is at least as high as the reservation wage. In the model presented in Greene (2003), Y is the hourly wage, X consists of experience, experience², education and a dummy for living in a large urban area, and Z contains age, age², family income, a dummy for having kids, and education. See page 786 in chapter 22 of Greene (2003) for the results.
Table 5
Labor market application I: p-values of the KS and CMS tests

Test    m = 20 + n^{1/4}    m = 20 + n^{1/2.01}    m = n    m = n/4
KS           0.037                 0.037            0.039     0.045
CMS          0.094                 0.097            0.086     0.094
Table 5 reports the p-values of the KS and CMS tests, approximating h(·) by a polynomial as outlined in section 2 rather than assuming a linear relationship. The number of bootstrap draws is B = 10,000 and, analogously to Chernozhukov & Fernández-Val (2005), four different block sizes m are used for the bootstrap subsamples. The coefficients β(τ) are estimated at equidistant quantiles with step size 0.01, τ ∈ T_81 = {0.10, 0.11, ..., 0.89, 0.90}. Again, GCV is applied to determine the optimal bandwidth b_n^opt in (4) and the optimal maximum order J^opt.⁹ For all chosen m, the KS and CMS tests reject the null hypothesis of constant quantile coefficients at the 5% and 10% levels, respectively. So even in this very small sample, the tests prove to be quite powerful.

⁹ b_n^opt = 0.2 and J^opt = 2.
The second application deals with a considerably larger data set. In their study on US women's relative wages, Mulligan & Rubinstein (2005) estimate the conditional mean wages of married white women using the heckit estimator. They investigate two repeated cross-sections covering the periods 1975-1979 and 1995-1999, stemming from the US Current Population Survey (CPS), and consider only observations of married white couples. In this application, Y represents the wife's log weekly wage, computed from total annual earnings deflated by the US Consumer Price Index (CPI). Only prime-age workers (25-54) who worked full time and at least 50 weeks in the respective year are considered. The vector of regressors X consists of the wife's (working experience − 15), the wife's (working experience − 15)²/100 and the wife's education, including a teacher dummy. Z contains X and additionally includes the husband's education, the husband's (working experience − 15), the husband's (working experience − 15)²/100, and the number of children aged 0-6 present in the household. In the period 1975-79, 97,067 observations are available in total and D = 1 in 20,839 cases. For 1995-1999, the respective numbers are 87,004 and 35,206.
Table 6
Labor market application II: p-values of the KS and CMS tests

1975-1979
Test    m = 20 + n^{1/4}    m = 20 + n^{1/2.01}    m = n    m = n/4
KS           0.003                 0.000            0.000     0.000
CMS          0.001                 0.000            0.000     0.000

1995-1999
Test    m = 20 + n^{1/4}    m = 20 + n^{1/2.01}    m = n    m = n/4
KS           0.000                 0.000            0.000     0.000
CMS          0.000                 0.000            0.000     0.000
Mulligan & Rubinstein (2005) compare the coefficient estimates of the heckit regression to OLS estimates and find that the sample selection bias has changed over time from being negative (-0.075) in the first period to positive (0.161) in the second. This would suggest that from 1975 to 1979, married women out of the labor force had higher average potential earnings than their working counterparts, whereas the converse was true between 1995 and 1999. Given that all other parametric assumptions are correct, this conclusion holds only if ε is homoscedastic. However, the hypothesis of independent errors is rejected by both the KS and CMS tests at the 0.1% level for B = 1000 and T_81 = {0.10, 0.11, ..., 0.89, 0.90},¹⁰ as reported in Table 6. The highly significant p-values suggest that one can draw conclusions neither on the size nor on the sign of the selection bias. In fact, the bias might have been positive or negative throughout both periods. Therefore, the authors' conclusions about the development of potential earnings are not necessarily correct. As independence of ε and X does not hold, the coefficient estimates are inconsistent and bear no economic interpretation.

¹⁰ Instead of the Gaussian kernel, the Epanechnikov kernel is used for κ(·) in (4) to reduce the computational burden. b_n^opt is 0.13 and 0.06 for 1975-1979 and 1995-1999, respectively, but it is locally increased to include at least 10 observations in κ(·). J^opt is 3 and 2, respectively.
7 Conclusions
Independence between regressors and errors is a conditio sine qua non for identification in any parametric or semiparametric sample selection model. It is, however, a rather strong restriction that is likely to be violated in many fields of research where two-step estimators to correct for selection bias are heavily applied. In cases where homoscedasticity does hold, quantile regression methods do not seem particularly attractive in sample selection models at first glance, as all quantile regression curves are parallel and all conditional quantile coefficients are equal to the mean coefficients. Applications which have found (and were happy to find) significant differences between coefficients at different quantiles have merely proved that their model assumptions are wrong and that the estimator is inconsistent.
However, quantile regression methods have valuable properties that also apply to sample selection models. Firstly, and this was the first motivation for suggesting quantile regression in Buchinsky (1998a), quantile-based estimators are more robust and more efficient than mean estimators when distributions have fat tails. Secondly, as argued in Koenker & Bassett (1982), quantile coefficients can be used to detect heteroscedasticity and thus violations of the independence assumption between errors and regressors. Thirdly, if independence is not satisfied, quantile coefficients can be bounded quite easily, which is not the case for mean coefficients. In this paper, all three arguments in favor of quantile regression were discussed in the light of sample selection models. We suggested a quantile-based and distribution-free procedure to test for homoscedasticity, demonstrated its satisfactory size and power properties by Monte Carlo simulations and applied it to labor market data. Given that independence between the errors and covariates holds, we showed that quantile estimators are more efficient and more robust in sample selection models than mean estimators if the errors are non-Gaussian. For the case that independence does not hold, we proposed conditions to identify the upper and lower bounds for the interval identification of the quantile coefficients.

The application of the test procedure to labor market data previously investigated by Mulligan & Rubinstein (2005) clearly rejects homoscedasticity and indicates that the coefficients obtained by heckit estimation are inconsistent. We strongly suspect that this is not an exception, but that a considerable share of the results presented in the sample selection literature are questionable due to the violation of the standard assumption of homoscedasticity.
References

Ahn, H. & Powell, J. (1993), 'Semiparametric estimation of censored selection models with a nonparametric selection mechanism', Journal of Econometrics 58, 3–29.

Andrews, D. & Schafgans, M. (1998), 'Semiparametric estimation of the intercept of a sample selection model', Review of Economic Studies 65, 497–517.

Angrist, J., Chernozhukov, V. & Fernández-Val, I. (2006), 'Vouchers for private schooling in Colombia', Econometrica.

Buchinsky, M. (1994), 'Changes in the U.S. wage structure 1963-1987: Application of quantile regression', Econometrica 62, 405–458.

Buchinsky, M. (1998a), 'The dynamics of changes in the female wage distribution in the USA: A quantile regression approach', Journal of Applied Econometrics 13, 1–30.

Buchinsky, M. (1998b), 'Recent advances in quantile regression models: A practical guideline for empirical research', The Journal of Human Resources 33(1), 88–126.

Buchinsky, M. (2001), 'Quantile regression with sample selection: Estimating women's return to education in the U.S.', Empirical Economics 26, 87–113.

Chamberlain, G. (1986), 'Asymptotic efficiency in semiparametric models with censoring', Journal of Econometrics 32, 189–218.

Chaudhuri, P. (1991), 'Global nonparametric estimation of conditional quantile functions and their derivatives', Journal of Multivariate Analysis 39, 246–269.

Chen, S. & Khan, S. (2003), 'Semiparametric estimation of a heteroskedastic sample selection model', Econometric Theory 19, 1040–1064.

Chernozhukov, V. & Fernández-Val, I. (2005), 'Subsampling inference on quantile regression processes', Sankhya: The Indian Journal of Statistics 67, 253–276.

Chernozhukov, V. & Hansen, C. (2006), 'Instrumental quantile regression inference for structural and treatment effect models', Journal of Econometrics 132, 491–525.

Cosslett, S. (1991), 'Distribution-free estimator of a regression model with sample selectivity', in W. Barnett, J. Powell & G. Tauchen, eds, Nonparametric and Semiparametric Methods in Econometrics and Statistics, Cambridge University Press, Cambridge, UK, pp. 175–198.

Craven, P. & Wahba, G. (1979), 'Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation', Numerische Mathematik 31, 377–403.

Das, M., Newey, W. & Vella, F. (2003), 'Nonparametric estimation of sample selection models', Review of Economic Studies 70, 33–58.

Donald, S. G. (1995), 'Two-step estimation of heteroskedastic sample selection models', Journal of Econometrics 65, 347–380.

Gallant, A. & Nychka, D. (1987), 'Semi-nonparametric maximum likelihood estimation', Econometrica 55, 363–390.

Gerfin, M. (1996), 'Parametric and semi-parametric estimation of the binary response model of labour market participation', Journal of Applied Econometrics 11(3), 321–339.

Golub, G. H., Heath, M. & Wahba, G. (1979), 'Generalized cross-validation as a method for choosing a good ridge parameter', Technometrics 21(2), 215–224.

Greene, W. H. (2003), Econometric Analysis, New Jersey: Pearson Education.

Gronau, R. (1974), 'Wage comparisons - a selectivity bias', Journal of Political Economy 82(6), 1119–1143.

Gutenbrunner, C. & Jurečková, J. (1992), 'Regression quantile and regression rank score process in the linear model and derived statistics', Annals of Statistics 20, 305–330.

Heckman, J. (1974), 'Shadow prices, market wages and labor supply', Econometrica 42, 679–694.

Heckman, J. (1990), 'Varieties of selection bias', American Economic Review, Papers and Proceedings 80, 313–318.

Heckman, J. J. (1979), 'Sample selection bias as a specification error', Econometrica 47(1), 153–161.

Heckman, J. J. & Vytlacil, E. (2005), 'Structural equations, treatment effects, and econometric policy evaluation', Econometrica 73(3), 669–738.

Ichimura, H. (1993), 'Semiparametric least squares (SLS) and weighted SLS estimation of single-index models', Journal of Econometrics 58, 71–120.

Klein, R. W. & Spady, R. H. (1993), 'An efficient semiparametric estimator for binary response models', Econometrica 61(2), 387–421.

Koenker, R. (2005), Quantile Regression, Cambridge University Press.

Koenker, R. & Bassett, G. (1978), 'Regression quantiles', Econometrica 46(1), 33–50.

Koenker, R. & Bassett, G. (1982), 'Robust tests for heteroskedasticity based on regression quantiles', Econometrica 50(1), 43–62.

Koenker, R. & Hallock, K. F. (2001), 'Quantile regression', Journal of Economic Perspectives 15, 143–156.

Koenker, R. & Xiao, Z. (2002), 'Inference on the quantile regression process', Econometrica 70, 1583–1612.

Li, K. C. (1985), 'From Stein's unbiased risk estimates to the method of generalized cross-validation', The Annals of Statistics 13(4), 1352–1377.

Manski, C. F. (1989), 'Anatomy of the selection problem', The Journal of Human Resources 24(3), 343–360.

Manski, C. F. (1994), 'The selection problem', in C. Sims, ed., Advances in Econometrics: Sixth World Congress, Cambridge University Press, pp. 143–170.

Mincer, J. (1973), Schooling, Experience, and Earnings, NBER, New York.

Mroz, T. A. (1987), 'The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions', Econometrica 55(4), 765–799.

Mulligan, C. B. & Rubinstein, Y. (2005), 'Selection, investment, and women's relative wages since 1975', NBER Working Paper.

Newey, W. K. (1991), 'Two-step series estimation of sample selection models', unpublished manuscript, MIT.

Powell, J. (1986), 'Censored regression quantiles', Journal of Econometrics 32, 143–155.

Powell, J. (1987), 'Semiparametric estimation of bivariate latent variable models', unpublished manuscript, University of Wisconsin-Madison.
8 Appendix

If h(g) is known, θ = (β', δ')' can be estimated in a GMM framework based on the moment condition

ψ_τ(z, y, d, θ, γ) = d[τ − 1/2 + 1/2·sgn(y − x'β − δ'h(g(z, γ)))]·r,

where r = (x', h(g)')'. The quantile regression estimator for θ solves approximately

(1/n) Σ_{i=1}^n ψ_τ(z_i, y_i, d_i, θ̂, γ̂) = 0.

Assuming √n-consistency of γ̂, θ̂ →p θ and sufficient smoothness of ψ_τ(z, y, d, θ, γ) with respect to θ and γ, the mean value theorem is applied to the first-order conditions to get

0 = (1/n) Σ_{i=1}^n [ ψ_τ(θ, γ) + (∂ψ_τ(θ_l, γ_l)/∂θ')(θ̂ − θ) + (∂ψ_τ(θ_l, γ_l)/∂γ')(γ̂ − γ_0) ],   (17)

where θ_l and γ_l lie on the line segments connecting θ̂, θ and γ̂, γ, respectively, and where ψ_τ(θ, γ) is the short form of ψ_τ(z, y, d, θ, γ). It follows from equation (17) that

√n(θ̂ − θ) = −[(1/n) Σ_{i=1}^n ∂ψ_τ(θ_l, γ_l)/∂θ']⁻¹ [ (1/√n) Σ_{i=1}^n ψ_τ(θ, γ) + (1/n) Σ_{i=1}^n (∂ψ_τ(θ_l, γ_l)/∂γ') √n(γ̂ − γ_0) ].   (18)

Buchinsky (1998a) shows that

√n(θ̂ − θ) →d N(0, Ω)  and  Ω = Σ_fr⁻¹ [ τ(1 − τ)Σ_rr + Σ_frx Ω_γ Σ'_frx ] Σ_fr⁻¹,   (19)

where

Σ_fr = E[d f_ε(0|r) r r'],   Σ_frx = E[d f_ε(0|r) (∂h(g)/∂g) r z'],   Σ_rr = E[d r r'],   (20)

f_ε(·|r) denotes the conditional density of the error term, and Ω_γ is the asymptotic covariance matrix of √n(γ̂ − γ_0). The asymptotic covariance matrix of β̂ is the top-left k × k block of Ω. By combining (18), (20) and (A4) of Buchinsky (1998a), one obtains

√n(θ̂ − θ) → Σ_fr⁻¹ [ (1/√n) Σ_{i=1}^n ℓ_{τ,i} + Σ_frx √n(γ̂ − γ_0) ],   (21)

where ℓ_{τ,i} = d_i(τ − 1{y_i < x_i'β + δ'h(g(z_i, γ))})·r_i. We now insert equation (49) of Klein & Spady (1993) into (21) to get

√n(θ̂ − θ) → Σ_fr⁻¹ (1/√n) Σ_{i=1}^n ( ℓ_{τ,i} + Σ_frx Ω_p⁻¹ k_i ),   (22)

where

k_i = (∂p(Z_i'γ_0)/∂γ) · (d_i − p(Z_i'γ_0)) / [ p(Z_i'γ_0)(1 − p(Z_i'γ_0)) ]

and

Ω_p = E[ (∂P(Z'γ_0)/∂γ)(∂P(Z'γ_0)/∂γ') / ( p(Z'γ_0)(1 − p(Z'γ_0)) ) ]
(23)