
Econometrica, Vol. 68, No. 3 (May, 2000), 575-603
SAMPLE SPLITTING AND THRESHOLD ESTIMATION
BY BRUCE E. HANSEN¹
Threshold models have a wide variety of applications in economics. Direct applications
include models of separating and multiple equilibria. Other applications include empirical
sample splitting when the sample split is based on a continuously-distributed variable such
as firm size. In addition, threshold models may be used as a parsimonious strategy for
nonparametric function estimation. For example, the threshold autoregressive model (TAR) is popular in the nonlinear time series literature.
Threshold models also emerge as special cases of more complex statistical frameworks,
such as mixture models, switching models, Markov switching models, and smooth transition threshold models. It may be important to understand the statistical properties of
threshold models as a preliminary step in the development of statistical tools to handle
these more complicated structures.
Despite the large number of potential applications, the statistical theory of threshold
estimation is undeveloped. It is known that threshold estimates are super-consistent, but a
distribution theory useful for testing and inference has yet to be provided.
This paper develops a statistical theory for threshold estimation in the regression
context. We allow for either cross-section or time series observations. Least squares
estimation of the regression parameters is considered. An asymptotic distribution theory
for the regression estimates (the threshold and the regression slopes) is developed. It is
found that the distribution of the threshold estimate is nonstandard. A method to
construct asymptotic confidence intervals is developed by inverting the likelihood ratio
statistic. It is shown that this yields asymptotically conservative confidence regions. Monte
Carlo simulations are presented to assess the accuracy of the asymptotic approximations.
The empirical relevance of the theory is illustrated through an application to the multiple
equilibria growth model of Durlauf and Johnson (1995).
KEYWORDS: Confidence intervals, nonlinear regression, growth regressions, regime
shifts.
1. INTRODUCTION
A ROUTINE PART OF AN EMPIRICAL ANALYSIS of a regression model such as y_i = β′x_i + e_i is to see if the regression coefficients are stable when the model is estimated on appropriately selected subsamples. Sometimes the subsamples are selected on categorical variables, such as gender, but in other cases the subsamples are selected based on continuous variables, such as firm size. In the latter case, some decision must be made concerning the appropriate threshold (i.e., how big must a firm be to be categorized as "large") at which to split the sample. When this value is unknown, some method must be employed in its selection.
¹ This research was supported by a grant from the National Science Foundation and an Alfred P. Sloan Foundation Research Fellowship. Thanks go to Robert de Jong and James MacKinnon for insightful comments and Mehmet Caner and Metin Celebi for excellent research assistance. Special thanks go to two diligent referees whose careful readings of multiple drafts of this paper have eliminated several serious errors.
Such practices can be formally treated as a special case of the threshold regression model. These take the form

(1)  y_i = θ₁′x_i + e_i,  q_i ≤ γ,
(2)  y_i = θ₂′x_i + e_i,  q_i > γ,

where q_i may be called the threshold variable, and is used to split the sample into two groups, which we may call "classes," or "regimes," depending on the context. The random variable e_i is a regression error.
Formal threshold models arise in the econometrics literature. One example is the Threshold Autoregressive (TAR) model of Tong (1983, 1990), recently explored for U.S. GNP by Potter (1995). In Potter's model, y_i is GNP growth and x_i and q_i are lagged GNP growth rates. The idea is to allow important nonlinearities in the conditional expectation function without over-parameterization. From a different perspective, Durlauf and Johnson (1995) argue that models with multiple equilibria can give rise to threshold effects of the form given in model (1)-(2). In their formulation, the regression is a standard Barro-styled cross-country growth equation, but the sample is split into two groups, depending on whether the initial endowment is above a specific threshold.
The primary purpose of this paper is to derive a useful asymptotic approximation to the distribution of the least-squares estimate γ̂ of the threshold parameter γ. The only previous attempt (of which I am aware) is Chan (1993), who derives the asymptotic distribution of γ̂ for the TAR model. Chan finds that n(γ̂ − γ₀) converges in distribution to a functional of a compound Poisson process. Unfortunately, his representation depends upon a host of nuisance parameters, including the marginal distribution of x_i and all the regression coefficients. Hence, this theory does not yield a practical method to construct confidence intervals.
We take a different approach, taking a suggestion from the change-point literature, which considers an analog of model (1)-(2) with q_i = i. Let δ_n = θ₂ − θ₁ denote the "threshold effect." The proposed solution is to let δ_n → 0 as n → ∞. (We will hold θ₂ fixed and thereby make θ₁ approach θ₂ as n → ∞.) Under this assumption, it has been found (see Picard (1985) and Bai (1997)) that the asymptotic distribution of the changepoint estimate is nonstandard yet free of nuisance parameters (other than a scale effect). Interestingly, we find in the threshold model that the asymptotic distribution of the threshold estimate γ̂ is of the same form as that found for the change-point model, although the scale factor is different.
The changepoint literature has confined attention to the sampling distribution of the threshold estimate. We refocus attention on test statistics, and are the first to study likelihood ratio tests for the threshold parameter. We find that the likelihood ratio test is asymptotically pivotal when δ_n decreases with sample size, and that this asymptotic distribution is an upper bound on the asymptotic distribution for the case that δ_n does not decrease with sample size. This allows
us to construct asymptotically valid confidence intervals for the threshold based
on inverting the likelihood ratio statistic. This method is easy to apply in
empirical work. A GAUSS program that computes the estimators and test
statistics is available on request from the author or from his Web homepage.
The paper is organized as follows. Section 2 outlines the method of least
squares estimation of threshold regression models. Section 3 presents the
asymptotic distribution theory for the threshold estimate and the likelihood
ratio statistic for tests on the threshold parameter. Section 4 outlines methods to
construct asymptotically valid confidence intervals. Methods are presented for
the threshold and for the slope coefficients. Simulation evidence is provided to
assess the adequacy of the asymptotic approximations. Section 5 reports an
application to the multiple equilibria growth model of Durlauf and Johnson
(1995). The mathematical proofs are left to an Appendix.
2. ESTIMATION
The observed sample is {y_i, x_i, q_i}_{i=1}^n, where y_i and q_i are real-valued and x_i is an m-vector. The threshold variable q_i may be an element of x_i, and is assumed to have a continuous distribution. A sample-split or threshold regression model takes the form (1)-(2). This model allows the regression parameters to differ depending on the value of q_i. To write the model in a single equation, define the dummy variable d_i(γ) = 1{q_i ≤ γ}, where 1{·} is the indicator function, and set x_i(γ) = x_i d_i(γ), so that (1)-(2) equal
(3)  y_i = θ′x_i + δ_n′x_i(γ) + e_i,
where θ = θ₂. Equation (3) allows all of the regression parameters to switch between the regimes, but this is not essential to the analysis. The results generalize to the case where only a subset of parameters switch between regimes and to the case where some regressors only enter in one of the two regimes.
To express the model in matrix notation, define the n × 1 vectors Y and e by stacking the variables y_i and e_i, and the n × m matrices X and X_γ by stacking the vectors x_i′ and x_i(γ)′. Then (3) can be written as

(4)  Y = Xθ + X_γδ_n + e.
The regression parameters are (θ, δ_n, γ), and the natural estimator is least squares (LS). Let

(5)  S_n(θ, δ, γ) = (Y − Xθ − X_γδ)′(Y − Xθ − X_γδ)

be the sum of squared errors function. Then by definition the LS estimators θ̂, δ̂, γ̂ jointly minimize (5). For this minimization, γ is assumed to be restricted to a bounded set [γ_, γ̄] = Γ. Note that the LS estimator is also the MLE when e_i is iid N(0, σ²).
The computationally easiest method to obtain the LS estimates is through concentration. Conditional on γ, (4) is linear in θ and δ_n, yielding the conditional OLS estimators θ̂(γ) and δ̂(γ) by regression of Y on X*_γ = [X X_γ]. The concentrated sum of squared errors function is

S_n(γ) = S_n(θ̂(γ), δ̂(γ), γ) = Y′Y − Y′X*_γ(X*_γ′X*_γ)⁻¹X*_γ′Y,

and γ̂ is the value that minimizes S_n(γ). Since S_n(γ) takes on less than n distinct values, γ̂ can be defined uniquely as

γ̂ = argmin_{γ ∈ Γ_n} S_n(γ),

where Γ_n = Γ ∩ {q₁, . . . , q_n}, which requires less than n function evaluations. The slope estimates can be computed via θ̂ = θ̂(γ̂) and δ̂ = δ̂(γ̂).
If n is very large, Γ_n can be approximated by a grid. For some N < n, let q_(j) denote the (j/N)th quantile of the sample {q₁, . . . , q_n}, and let Γ_N = Γ ∩ {q_(1), . . . , q_(N)}. Then γ̂_N = argmin_{γ ∈ Γ_N} S_n(γ) is a good approximation to γ̂ which only requires N function evaluations.
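The concentration procedure above can be sketched in a few lines. This is not the author's GAUSS program; it is a minimal numpy sketch, in which the trimming fraction `trim` (used to restrict Γ to a proper subset of the support of q_i) and all names are our own choices.

```python
import numpy as np

def threshold_lsq(y, X, q, trim=0.15):
    """Concentrated least squares for y_i = theta'x_i + delta'x_i 1{q_i <= g} + e_i.

    For each candidate gamma in Gamma_n (the observed values of q inside a
    trimmed range), regress y on [X, X*1{q <= gamma}], record the sum of
    squared errors S_n(gamma), and return the minimizing gamma together with
    the slope estimates at that gamma."""
    lo, hi = np.quantile(q, [trim, 1.0 - trim])
    candidates = np.unique(q[(q >= lo) & (q <= hi)])
    best_sse, best_g, best_coef = np.inf, None, None
    for g in candidates:
        # stack the regime regressors: x_i and x_i * 1{q_i <= g}
        Xg = np.column_stack([X, X * (q <= g)[:, None]])
        coef, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        sse = np.sum((y - Xg @ coef) ** 2)
        if sse < best_sse:
            best_sse, best_g, best_coef = sse, g, coef
    k = X.shape[1]
    return best_g, best_coef[:k], best_coef[k:], best_sse
```

Since S_n(γ) is a step function of γ, only the observed values of q_i need to be searched; for very large n one would replace `candidates` with N sample quantiles, as described above.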
From a computational standpoint, the threshold model (1)-(2) is quite similar to the changepoint model (where the threshold variable equals time, q_i = i). Indeed, if the observed values of q_i are distinct, the parameters can be estimated by sorting the data based on q_i, and then applying known methods for changepoint models. When there are tied values of q_i, estimation is more delicate, as the sorting operation is no longer well defined nor appropriate.
From a distributional standpoint, however, the threshold model differs considerably from the changepoint model. One way to see this is to note that if the regressors x_i contain q_i, as is typical in applications, then sorting the data by q_i induces a trend into the regressors x_i, so the threshold model is somewhat similar to a changepoint model with trended data. The presence of trends is known to alter the asymptotic distributions of changepoint tests (see Hansen (1992, 2000) and Chu and White (1992)). More importantly, the distribution of changepoint estimates (or the construction of confidence intervals) has not been studied for this case. Another difference is that the stochastic process R_n(γ) = Σ_{i=1}^n x_i e_i 1{q_i ≤ γ} is a martingale (in γ) when q_i = i (the changepoint model), but it is not necessarily so in the threshold model (unless the data are independent across i). This difference may appear minor, but it requires the use of a different set of asymptotic tools.
3. DISTRIBUTION THEORY
3.1. Assumptions
Define the moment functionals

(6)  M(γ) = E(x_i x_i′ 1{q_i ≤ γ}),
(7)  D(γ) = E(x_i x_i′ | q_i = γ),

and

V(γ) = E(x_i x_i′ e_i² | q_i = γ).
Let f(q) denote the density function of q_i, γ₀ denote the true value of γ, D = D(γ₀), V = V(γ₀), f = f(γ₀), and M = E(x_i x_i′).
ASSUMPTION 1:
1. (x_i, q_i, e_i) is strictly stationary, ergodic and ρ-mixing, with ρ-mixing coefficients satisfying Σ_{m=1}^∞ ρ_m^{1/2} < ∞.
2. E(e_i | F_{i−1}) = 0.
3. E|x_i|⁴ < ∞ and E|x_i e_i|⁴ < ∞.
4. For all γ ∈ Γ, E(|x_i|⁴ e_i⁴ | q_i = γ) ≤ C and E(|x_i|⁴ | q_i = γ) ≤ C for some C < ∞, and f(γ) ≤ f̄ < ∞.
5. f(γ), D(γ), and V(γ) are continuous at γ = γ₀.
6. δ_n = c n^(−α) with c ≠ 0 and 0 < α < 1/2.
7. c′Dc > 0, c′Vc > 0, and f > 0.
8. M > M(γ) > 0 for all γ ∈ Γ.
Assumption 1.1 is relevant for time series applications, and is trivially satisfied for independent observations. The assumption of stationarity excludes time trends and integrated processes. The ρ-mixing assumption² controls the degree of time series dependence, and is weaker than uniform mixing, yet stronger than strong mixing. It is sufficiently flexible to embrace many nonlinear time series processes including threshold autoregressions. Indeed, Chan (1989) has demonstrated that a strong form of geometric ergodicity holds for TAR processes and, as discussed in the proof of Proposition 1 of Chan (1993), this implies ρ_m = O(r^m) with |r| < 1, which implies Assumption 1.1.
Assumption 1.2 imposes that (1)-(2) is a correct specification of the conditional mean. Assumptions 1.3 and 1.4 are unconditional and conditional moment bounds. Assumption 1.5 requires the threshold variable to have a continuous distribution, and essentially requires the conditional variance E(e_i² | q_i = γ) to be continuous at γ₀, which excludes regime-dependent heteroskedasticity.
Assumption 1.6 may appear unusual. It specifies that the difference in regression slopes gets small as the sample size increases. Conceptually, this implies that we are taking an asymptotic approximation valid for small values of δ_n. The parameter α controls the rate at which δ_n decreases to zero, i.e., how small we are forcing δ_n to be. Smaller values of α are thus less restrictive. The reason for Assumption 1.6 is that Chan (1993) found that with δ_n fixed, n(γ̂ − γ₀) converged to an asymptotic distribution that was dependent upon nuisance parameters and thus not particularly useful for calculation of confidence sets. The difficulty is due to the O_p(n⁻¹) rate of convergence. By letting δ_n tend towards zero, we reduce the rate of convergence and find a simpler asymptotic distribution.
Assumption 1.7 is a full-rank condition needed to have nondegenerate asymptotic distributions. While the restriction c′Dc > 0 might appear innocuous, it excludes the interesting special case of a "continuous threshold" model, which is (1)-(2) with x_i = (1 q_i)′ and δ_n′γ*₀ = 0, where γ*₀ = (1 γ₀)′. In this case the conditional mean takes the form of a continuous linear spline. From definition (7) we can calculate for this model that c′Dc = c′E(x_i x_i′ | q_i = γ₀)c = c′γ*₀γ*₀′c = 0. A recent paper that explores the asymptotic distribution of the least squares estimates in this model is Chan and Tsay (1998).

² For a definition, see Ibragimov (1975) and Peligrad (1982).
Assumption 1.8 is a conventional full-rank condition which excludes multicollinearity. Note that this assumption restricts Γ to a proper subset of the support of q_i. This is a technical condition which simplifies our consistency proof.
3.2. Asymptotic Distribution
A two-sided Brownian motion W(r) on the real line is defined as

W(r) = W₁(−r) if r < 0;  0 if r = 0;  W₂(r) if r > 0,

where W₁(r) and W₂(r) are independent standard Brownian motions on [0, ∞).
THEOREM 1: Under Assumption 1, n^(1−2α)(γ̂ − γ₀) →_d ωT, where

ω = c′Vc / ((c′Dc)² f)

and

T = argmax_{−∞ < r < ∞} [−(1/2)|r| + W(r)].
Theorem 1 gives the rate of convergence and asymptotic distribution of the threshold estimate γ̂. The rate of convergence is n^(1−2α), which is decreasing in α. Intuitively, a larger α decreases the threshold effect δ_n, which decreases the sample information concerning the threshold γ, reducing the precision of any estimator of γ.
Theorem 1 shows that the distribution of the threshold estimate under our "small effect" asymptotics takes a similar form to that found for changepoint estimates. For the latter theory, see Picard (1985), Yao (1987), Dümbgen (1991), and Bai (1997). The difference is that the asymptotic precision of γ̂ is proportional to the matrix E(x_i x_i′ | q_i = γ₀) while in the changepoint case the asymptotic precision is proportional to the unconditional moment matrix E(x_i x_i′). It is interesting to note that these moments are equal when x_i and q_i are independent, which would not be typical in applications.
The asymptotic distribution in Theorem 1 is scaled by the ratio ω. In the leading case of conditional homoskedasticity

(9)  E(e_i² | q_i) = σ²,

then V = σ²D and ω simplifies to

ω = σ² / ((c′Dc) f).

The asymptotic distribution of γ̂ is less dispersed when ω is small, which occurs when σ² is small, f(γ₀) is large (so that many observations are near the threshold), and/or |c| is large (a large threshold effect).
The distribution function for T is known (see Bhattacharya and Brockwell (1976)). Let Φ(x) denote the cumulative standard normal distribution function. Then for x ≥ 0,

P(T ≤ x) = 1 + √(x/(2π)) exp(−x/8) + (3/2) exp(x) Φ(−3√x/2) − ((x + 5)/2) Φ(−√x/2),

and for x < 0, P(T ≤ x) = 1 − P(T ≤ −x). A plot of the density function of T is given in Figure 1.

FIGURE 1. Asymptotic density of threshold estimator.
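The closed-form distribution function above is easy to evaluate numerically. A small sketch using only the standard library (the function names are ours):

```python
from math import erf, exp, pi, sqrt

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cdf_T(x):
    """P(T <= x) for T = argmax_r [-|r|/2 + W(r)], following
    Bhattacharya and Brockwell (1976); for moderate x only, since the
    exp(x) factor overflows for very large arguments."""
    if x < 0:
        # the distribution of T is symmetric about zero
        return 1.0 - cdf_T(-x)
    return (1.0
            + sqrt(x / (2.0 * pi)) * exp(-x / 8.0)
            + 1.5 * exp(x) * norm_cdf(-1.5 * sqrt(x))
            - 0.5 * (x + 5.0) * norm_cdf(-sqrt(x) / 2.0))
```

At x = 0 the terms reduce to 1 + (3/2)Φ(0) − (5/2)Φ(0) = 1/2, confirming the symmetry of the density plotted in Figure 1.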
3.3. Likelihood Ratio Test

To test the hypothesis H₀: γ = γ₀, a standard approach is to use the likelihood ratio statistic under the auxiliary assumption that e_i is iid N(0, σ²). Let

LR_n(γ) = n (S_n(γ) − S_n(γ̂)) / S_n(γ̂).

The likelihood ratio test of H₀ is to reject for large values of LR_n(γ₀).
THEOREM 2: Under Assumption 1,

LR_n(γ₀) →_d η²ξ,

where

ξ = max_{s ∈ ℝ} [2W(s) − |s|]

and

η² = c′Vc / (σ² c′Dc).

The distribution function of ξ is P(ξ ≤ x) = (1 − e^(−x/2))².
If homoskedasticity (9) holds, then η² = 1 and the asymptotic distribution of LR_n(γ₀) is free of nuisance parameters. If heteroskedasticity is suspected, η² must be estimated. We discuss this in the next section.
Theorem 2 gives the large sample distribution of the likelihood ratio test for hypotheses on γ. The asymptotic distribution is nonstandard, but free of nuisance parameters under (9). Since the distribution function is available in a simple closed form, it is easy to generate p-values for observed test statistics. Namely,

p_n = 1 − (1 − exp(−(1/2) LR_n(γ₀)))²

is the asymptotic p-value for the likelihood ratio test. Critical values can be calculated by direct inversion of the distribution function. Thus a test of H₀: γ = γ₀ rejects at the asymptotic level a if LR_n(γ₀) exceeds c_ξ(1 − a), where c_ξ(z) = −2 ln(1 − √z). Selected critical values are reported in Table I.
TABLE I
ASYMPTOTIC CRITICAL VALUES

P(ξ ≤ x)   .80    .85    .90    .925   .95    .975   .99
x          4.50   5.10   5.94   6.53   7.35   8.75   10.59
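The closed form for P(ξ ≤ x) makes p-values and critical values one-liners. A sketch (function names are ours):

```python
from math import exp, log, sqrt

def cdf_xi(x):
    """P(xi <= x) = (1 - exp(-x/2))^2, the asymptotic null distribution."""
    return (1.0 - exp(-x / 2.0)) ** 2 if x > 0 else 0.0

def pvalue(lr):
    """Asymptotic p-value for an observed likelihood ratio statistic."""
    return 1.0 - cdf_xi(lr)

def critical_value(z):
    """c_xi(z) = -2 ln(1 - sqrt(z)), the z-level critical value."""
    return -2.0 * log(1.0 - sqrt(z))
```

Evaluating `critical_value` at .90, .95, and .99 reproduces the 5.94, 7.35, and 10.59 entries of Table I.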
3.4. Estimation of η²

The asymptotic distribution of Theorem 2 depends on the nuisance parameter η². It is therefore necessary to consistently estimate this parameter. Let r_{1i} = (δ_n′x_i)²(e_i²/σ²) and r_{2i} = (δ_n′x_i)². Then

(10)  η² = E(r_{1i} | q_i = γ₀) / E(r_{2i} | q_i = γ₀)

is the ratio of two conditional expectations. Since r_{1i} and r_{2i} are unobserved, let r̂_{1i} = (δ̂′x_i)²(ê_i²/σ̂²) and r̂_{2i} = (δ̂′x_i)² denote their sample counterparts.
A simple estimator of the ratio (10) uses a polynomial regression, such as a quadratic. For j = 1 and 2, fit the OLS regressions

r̂_{ji} = μ̂_{j0} + μ̂_{j1} q_i + μ̂_{j2} q_i² + ε̂_{ji},

and then set

η̂² = (μ̂₁₀ + μ̂₁₁γ̂ + μ̂₁₂γ̂²) / (μ̂₂₀ + μ̂₂₁γ̂ + μ̂₂₂γ̂²).
An alternative is to use kernel regression. The Nadaraya-Watson kernel estimator is

η̂² = Σ_{i=1}^n K_h(γ̂ − q_i) r̂_{1i} / Σ_{i=1}^n K_h(γ̂ − q_i) r̂_{2i},

where K_h(u) = h⁻¹K(u/h) for some bandwidth h and kernel K(u), such as the Epanechnikov K(u) = (3/4)(1 − u²)1{|u| ≤ 1}. The bandwidth h may be selected according to a minimum mean square error criterion (see Härdle and Linton (1994)).
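The kernel estimator can be sketched directly. A minimal numpy version (the names and the fixed bandwidth argument are ours; in practice h would be chosen by a plug-in or MSE criterion as just noted):

```python
import numpy as np

def eta2_kernel(x, q, e_hat, delta_hat, gamma_hat, h):
    """Nadaraya-Watson estimate of eta^2 = E(r1|q=g0)/E(r2|q=g0) with an
    Epanechnikov kernel, where r1_i = (delta'x_i)^2 e_i^2/sigma^2 and
    r2_i = (delta'x_i)^2 are replaced by their sample counterparts."""
    sigma2 = np.mean(e_hat ** 2)
    r2 = (x @ delta_hat) ** 2
    r1 = r2 * e_hat ** 2 / sigma2
    u = (gamma_hat - q) / h
    # Epanechnikov kernel weights, zero outside |u| <= 1
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return float((k @ r1) / (k @ r2))
```

Under conditional homoskedasticity the estimate should be close to one, in which case the unscaled likelihood ratio of Theorem 2 is already pivotal.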
4. CONFIDENCE INTERVALS
4.1. Threshold Estimate
A common method to form confidence intervals for parameters is through the
inversion of Wald or t statistics. To obtain a confidence interval for g , this
would involve the distribution T from Theorem 1 and an estimate of the scale
parameter v . While T is parameter-independent, v is directly a function of dn
and indirectly a function of g 0 Žthrough DŽg 0 ... When asymptotic sampling
distributions depend on unknown parameters, the Wald statistic can have very
poor finite sample behavior. In particular, Dufour Ž1997. argues that Wald
statistics have particularly poorly-behaved sampling distributions when the parameter has a region where identification fails. The threshold regression model
is an example where this occurs, as the threshold g is not identified when
584
B. E. HANSEN
dn s 0. These concerns have encouraged us to explore the construction of
confidence regions based on the likelihood ratio statistic LR nŽg ..
Let C denote the desired asymptotic confidence level (e.g., C = .95), and let c = c_ξ(C) be the C-level critical value for ξ (from Table I). Set

Γ̂ = {γ : LR_n(γ) ≤ c}.

Theorem 2 shows that P(γ₀ ∈ Γ̂) → C as n → ∞ under the homoskedasticity assumption (9). Thus Γ̂ is an asymptotic C-level confidence region for γ. A graphical method to find the region Γ̂ is to plot the likelihood ratio LR_n(γ) against γ and draw a flat line at c. (Note that the likelihood ratio is identically zero at γ = γ̂.) Equivalently, one may plot the residual sum of squared errors S_n(γ) against γ, and draw a flat line at S_n(γ̂) + σ̂²c.
If the homoskedasticity condition (9) does not hold, we can define a scaled likelihood ratio statistic:

LR*_n(γ) = LR_n(γ)/η̂² = (S_n(γ) − S_n(γ̂)) / (σ̂²η̂²),

and an amended confidence region

Γ̂* = {γ : LR*_n(γ) ≤ c}.

Since η̂² is consistent for η², P(γ₀ ∈ Γ̂*) → C as n → ∞ whether or not (9) holds, so Γ̂* is a heteroskedasticity-robust asymptotic C-level confidence region for γ.
These confidence intervals are asymptotically correct under the assumption that δ_n → 0 as n → ∞, which suggests that the actual coverage may differ from the desired level for large values of δ_n. We now consider the case of α = 0, which implies that δ_n is fixed as n increases. We impose the stronger condition that the errors e_i are iid N(0, σ²), strictly independent of the regressors x_i and threshold variable q_i.

THEOREM 3: Under Assumption 1, modifying part 6 so that α = 0, and the errors e_i are iid N(0, σ²) strictly independent of the regressors x_i and threshold variable q_i, then
P(LR_n(γ₀) ≥ x) ≤ P(ξ ≥ x) + o(1).
Theorem 3 shows that, at least in the case of iid Gaussian errors, the likelihood ratio test is asymptotically conservative. Thus inferences based on the confidence region Γ̂ are asymptotically valid, even if δ_n is relatively large. Unfortunately, we do not know if Theorem 3 generalizes to the case of non-normal errors or regressors that are not strictly exogenous. The proof of Theorem 3 relies on the Gaussian error structure and it is not clear how the theorem would generalize.
4.2. Simulation Evidence

We use simulation techniques to compare the coverage probabilities of the confidence intervals for γ. We use the simple regression model (3) with iid data and x_i = (1 z_i)′, e_i ~ N(0, 1), q_i ~ N(2, 1), and γ = 2. The regressor z_i was either iid N(0, 1) or z_i = q_i. The likelihood ratio statistic is invariant to θ. Partitioning δ_n = (δ₁ δ₂)′, we set δ₁ = 0 and assessed the coverage probability of the confidence interval Γ̂ as we varied δ₂ and n. We set δ₂ = .25, .5, 1.0, 1.5, and 2.0 and n = 50, 100, 250, 500, and 1000. Using 1000 replications, Table II reports the coverage probabilities for nominal 90% confidence intervals.
The results are quite informative. For all cases, the actual coverage rates increase as n increases or δ₂ increases, which is consistent with the prediction of Theorem 3. For small sample sizes and small threshold effects, the coverage rates are lower than the nominal 90%. As expected, however, as the threshold effect δ₂ increases, the coverage rates rise and the intervals become quite conservative.
4.3. Slope Parameters
Letting u s Ž u , dn . and uˆs Ž uˆ, dˆ.. Lemma A.12 in the Appendix shows that
Ž 11.
'n Ž uˆy u . ªd Z ; N Ž0, Vu .
where Vu is the standard asymptotic covariance matrix if g s g 0 were fixed. This
means that we can approximate the distribution of uˆ by the conventional
ˆ Žg . denote the
normal approximation as if g were known with certainty. Let Q
conventional asymptotic C-level confidence region for u constructed under the
ˆ Žgˆ .. ª C as n ª `.
assumption that g is known. Ž11. shows that P Ž u g Q
In finite samples, this procedure seems likely to under-represent the true
sampling uncertainty, since it is not the case that g
ˆs g 0 in any given sample. It
may be desirable to incorporate this uncertainty into our confidence intervals
for u . This appears difficult to do using conventional techniques, as uˆŽg . is not
differentiable with respect to g , and g
ˆ is non-normally distributed. A simple yet
constructive technique is to use a Bonferroni-type bound. For any r - 1, let
TABLE II
CONFIDENCE INTERVAL COVERAGE FOR γ AT 10% LEVEL

                       z_i = q_i                       z_i ~ N(0, 1)
δ₂ =       .25   .5    1.0   1.5   2.0     .25   .5    1.0   1.5   2.0
n = 50     .86   .87   .93   .97   .99     .90   .87   .93   .93   .97
n = 100    .82   .90   .96   .98   .99     .84   .86   .92   .96   .95
n = 250    .83   .93   .97   .98   .99     .80   .92   .94   .96   .98
n = 500    .90   .93   .97   .98   .99     .81   .93   .95   .96   .98
n = 1000   .90   .93   .98   .99   .99     .86   .93   .94   .96   .97
Γ̂(ρ) denote the confidence interval for γ with asymptotic coverage ρ. For each γ ∈ Γ̂(ρ) construct the pointwise confidence region Θ̂(γ) and then set

Θ̂_ρ = ∪_{γ ∈ Γ̂(ρ)} Θ̂(γ).

Since Θ̂_ρ ⊇ Θ̂(γ̂), it follows that P(θ ∈ Θ̂_ρ) ≥ P(θ ∈ Θ̂(γ̂)) → C as n → ∞.
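The Bonferroni-type region is simple to assemble once one has, for each candidate threshold, the conditional slope estimate, its standard error, and LR_n. A stylized sketch with our own names; `z` is the two-sided normal critical value for the level C:

```python
import numpy as np

def bonferroni_slope_interval(est, se, lr, rho_crit, z=1.96):
    """Union of the pointwise normal intervals Theta_hat(gamma) over
    Gamma_hat(rho) = {gamma : LR_n(gamma) <= rho_crit}.  est[j] and se[j]
    are the slope estimate and standard error with the threshold fixed at
    candidate j; lr[j] is LR_n at that candidate."""
    keep = lr <= rho_crit
    # widest lower and upper endpoints over the retained thresholds
    return float((est - z * se)[keep].min()), float((est + z * se)[keep].max())
```

With ρ = .8, `rho_crit` would be c_ξ(.8) = 4.50 from Table I; as rho_crit shrinks toward zero, only the LS minimizer survives and the naive interval Θ̂(γ̂) is recovered.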
This procedure is assessed using a simple Monte Carlo simulation. In Table III we report coverage rates of a nominal 95% confidence interval (C = .95) on δ₂. The same design is used as in the previous section, although the results are reported only for the case x_i independent of q_i and a more narrow set of n and δ₂ to save space. We tried ρ = 0, .5, .8, and .95. As expected, the simple rule Θ̂₀ is somewhat liberal for small δ₂ and n, but is quite satisfactory for large n or δ₂. In fact, all choices for ρ lead to similar results for large δ₂. For small δ₂ and n, the best choice may be ρ = .8, although this may produce somewhat conservative confidence intervals for small δ₂.
In summary, while the naive choice Θ̂₀ = Θ̂(γ̂) works fairly well for large n and/or large threshold effects, it has insufficient coverage probability for small n or threshold effect. This problem can be solved through the conservative procedure Θ̂_ρ with ρ > 0, and the choice ρ = .8 appears to work reasonably well in a simulation.
5. APPLICATION: GROWTH AND MULTIPLE EQUILIBRIA
Durlauf and Johnson (1995) suggest that cross-section growth behavior may be determined by initial conditions. They explore this hypothesis using the Summers-Heston data set, reporting results obtained from a regression tree methodology. A regression tree is a special case of a multiple threshold regression. The estimation method for regression trees due to Breiman et al. (1984) is somewhat ad hoc, with no known distributional theory. To illustrate the usefulness of our estimation theory, we apply our techniques to regressions similar to those reported by Durlauf-Johnson.
TABLE III
CONFIDENCE INTERVAL COVERAGE FOR δ₂ AT 5% LEVEL
(Panels: n = 100, 250, 500. Columns within each panel: δ₂ = .25, .5, 1.0, 2.0. Rows: Θ̂₀, Θ̂.₅, Θ̂.₈, Θ̂.₉₅.)
The model seeks to explain real GDP growth. The specification is

(12)  ln(Y/L)_{i,1985} − ln(Y/L)_{i,1960} = ζ + β ln(Y/L)_{i,1960} + π₁ ln(I/Y)_i + π₂ ln(n_i + g + δ) + π₃ ln(SCHOOL)_i + e_i,
where for each country i:
• (Y/L)_{i,t} = real GDP per member of the population aged 15-64 in year t;
• (I/Y)_i = investment to GDP ratio;
• n_i = growth rate of the working-age population;
• (SCHOOL)_i = fraction of working-age population enrolled in secondary school.
The variables not indexed by t are annual averages over the period 1960-1985. Following Durlauf-Johnson, we set g + δ = 0.05.
Durlauf-Johnson estimate (12) for four regimes selected via a regression tree using two possible threshold variables that measure initial endowment: per capita output Y/L and the adult literacy rate LR, both measured in 1960. The authors argue that the error e_i is heteroskedastic so present their results with heteroskedasticity-corrected standard errors. We follow their lead and use heteroskedasticity-consistent procedures, estimating the nuisance parameter η² using an Epanechnikov kernel with a plug-in bandwidth.
Since the theory outlined in this paper only allows one threshold and one threshold variable, we first need to select among the two threshold variables, and verify that there is indeed evidence for a threshold effect. We do so by employing the heteroskedasticity-consistent Lagrange multiplier (LM) test for a threshold of Hansen (1996). Since the threshold γ is not identified under the null hypothesis of no threshold effect, the p-values are computed by a bootstrap analog, fixing the regressors from the right-hand side of (12) and generating the bootstrap dependent variable from the distribution N(0, ê_i²), where ê_i is the OLS residual from the estimated threshold model. Hansen (1996) shows that this bootstrap analog produces asymptotically correct p-values. Using 1000 bootstrap replications, the p-value for the threshold model using initial per capita output was marginally significant at 0.088 and that for the threshold model using initial literacy rate was insignificant at 0.214. This suggests that there might be a sample split based on output.
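The bootstrap procedure can be sketched as follows. This is not Hansen's (1996) heteroskedasticity-consistent LM statistic: for brevity the sketch substitutes a sup-F statistic over a quantile grid of candidate thresholds, while keeping the key ingredient, namely that the regressors are held fixed and the bootstrap dependent variable is drawn as y*_i ~ N(0, ê_i²).

```python
import numpy as np

def sup_f(y, X, q, cands):
    """sup over candidate thresholds of an F-type statistic for delta = 0;
    also returns the maximizing threshold and the residuals at that fit."""
    n = len(y)
    b0, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse0 = np.sum((y - X @ b0) ** 2)
    best = (-np.inf, None, None)
    for g in cands:
        Xg = np.column_stack([X, X * (q <= g)[:, None]])
        b, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        resid = y - Xg @ b
        stat = n * (sse0 - np.sum(resid ** 2)) / np.sum(resid ** 2)
        if stat > best[0]:
            best = (stat, g, resid)
    return best

def bootstrap_pvalue(y, X, q, B=99, n_grid=50, trim=0.15, seed=0):
    """p-value for H0: no threshold, by the fixed-regressor bootstrap."""
    rng = np.random.default_rng(seed)
    cands = np.quantile(q, np.linspace(trim, 1.0 - trim, n_grid))
    stat, _, resid = sup_f(y, X, q, cands)
    exceed = 0
    for _ in range(B):
        y_star = resid * rng.standard_normal(len(y))  # y*_i ~ N(0, e_hat_i^2)
        if sup_f(y_star, X, q, cands)[0] >= stat:
            exceed += 1
    return (exceed + 1) / (B + 1)  # add-one convention, our choice, avoids p = 0
```

The bootstrap p-value is simply the fraction of bootstrap sup statistics exceeding the observed one; since γ is unidentified under the null, no threshold effect is imposed when generating y*.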
Figure 2 displays a graph of the normalized likelihood ratio sequence LR*_n(γ) as a function of the threshold in output. The LS estimate of γ is the value that minimizes this graph, which occurs at γ̂ = $863. The 95% critical value of 7.35 is also plotted (the dotted line), so we can read off the asymptotic 95% confidence set Γ̂* = [$594, $1794] from the graph from where LR*_n(γ) crosses the dotted line. These results show that there is reasonable evidence for a two-regime
FIGURE 2. First sample split: Confidence interval construction for threshold.
specification, but there is considerable uncertainty about the value of the
threshold. While the confidence interval for g might seem rather tight by
viewing Figure 2, it is perhaps more informative to note that 40 of the 96
countries in the sample fall in the 95% confidence interval, so cannot be
decisively classified into the first or second regime.
If we fix $\gamma$ at the LS estimate \$863 and split the sample in two based on initial GDP, we can (mechanically) perform the same analysis on each subsample. It is not clear how our theoretical results extend to such procedures, but this will enable more informative comparisons with the Durlauf-Johnson results. Only 18 countries have initial output at or below \$863, so no further sample split is possible among this subsample. Among the 78 countries with initial output above \$863, a sample split based on initial output produces an insignificant p-value of 0.152, while a sample split based on the initial literacy rate produces a p-value of 0.078, suggesting a possible threshold effect in the literacy rate. The point estimate of the threshold in the literacy rate is 45%, with a 95% asymptotic confidence interval [19%, 57%]. The graph of the normalized likelihood ratio statistic as a function of the threshold in the literacy rate is displayed in Figure 3. This confidence interval contains 19 of the 78 countries in the subsample. We could try to further split these two subsamples, but none of the bootstrap test statistics were significant at the 10% level.
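The bootstrap p-values reported above test the null hypothesis of no threshold effect; because the threshold is unidentified under the null, the sup-F statistic has a nonstandard distribution and its null law is approximated by bootstrap. The sketch below is a stylized residual-resampling version of such a test, offered for illustration only and not the exact procedure of Hansen (1996); the function names and the simple iid residual bootstrap are assumptions.

```python
import numpy as np

def sup_f_pvalue(y, x, q, n_boot=199, trim=0.15, seed=1):
    """Bootstrap p-value for H0: linear model, against a threshold alternative.
    Stylized sketch: residuals are resampled iid under the null."""
    n = len(y)
    grid = np.sort(q)[int(trim * n):int((1 - trim) * n)]

    def split_ssr(yy, gamma):
        # SSR of the two-regime model at candidate threshold gamma
        d = (q <= gamma).astype(float)[:, None]
        X = np.hstack([x, x * d])
        b, *_ = np.linalg.lstsq(X, yy, rcond=None)
        r = yy - X @ b
        return r @ r

    def sup_f(yy):
        b0, *_ = np.linalg.lstsq(x, yy, rcond=None)
        r0 = yy - x @ b0
        s0 = r0 @ r0                   # SSR under the null (no threshold)
        s1 = min(split_ssr(yy, g) for g in grid)
        return n * (s0 - s1) / s1      # sup over gamma of the F-type statistic

    stat = sup_f(y)
    rng = np.random.default_rng(seed)
    b0, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ b0
    boot = np.array([sup_f(x @ b0 + rng.choice(resid, size=n, replace=True))
                     for _ in range(n_boot)])
    return stat, float(np.mean(boot >= stat))
```

A small p-value, as for the literacy-rate split above, indicates that the improvement in fit from the best sample split is larger than would typically arise under the linear null.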
FIGURE 3. Second sample split: Confidence interval construction for threshold.
Our point estimates are quite similar to those of Durlauf and Johnson (1995). What is different are our confidence intervals. The confidence intervals for the threshold parameters are sufficiently large that there is considerable uncertainty regarding their values, and hence concerning the proper division of countries into convergence classes as well.
6. CONCLUSION
This paper develops asymptotic methods to construct confidence intervals for least-squares estimates of threshold parameters. The confidence intervals are asymptotically conservative. It is possible that more accurate confidence intervals may be constructed using bootstrap techniques. This may be quite delicate, however, since the sampling distribution of the likelihood ratio statistic appears to be nonstandard and nonpivotal. This would be an interesting subject for future research.
Dept. of Economics, Social Science Bldg., University of Wisconsin, Madison, WI 53706-1393, U.S.A.; [email protected]; www.ssc.wisc.edu/~bhansen

Manuscript received July, 1996; final revision received April, 1999.
APPENDIX: MATHEMATICAL PROOFS
Define $h_i(\gamma_1, \gamma_2) = |x_i e_i|\,|d_i(\gamma_2) - d_i(\gamma_1)|$ and $k_i(\gamma_1, \gamma_2) = |x_i|\,|d_i(\gamma_2) - d_i(\gamma_1)|$.
LEMMA A.1: There is a $C_1 < \infty$ such that for $\underline{\gamma} \le \gamma_1 \le \gamma_2 \le \bar{\gamma}$ and $r \le 4$,

(12) $E h_i^r(\gamma_1, \gamma_2) \le C_1 |\gamma_2 - \gamma_1|$,

(13) $E k_i^r(\gamma_1, \gamma_2) \le C_1 |\gamma_2 - \gamma_1|$.

PROOF: For any random variable $Z$,

(14) $\frac{d}{d\gamma} E(Z d_i(\gamma)) = E(Z \mid q_i = \gamma) f(\gamma)$.

Thus under Assumption 1.4,

$\frac{d}{d\gamma} E(|x_i e_i|^r d_i(\gamma)) = E(|x_i e_i|^r \mid q_i = \gamma) f(\gamma) \le [E(|x_i e_i|^4 \mid q_i = \gamma)]^{r/4} f(\gamma) \le C^{r/4} \bar{f} \le C_1$,

setting $C_1 = \max[1, C]\,\bar{f}$. Since $d_i(\gamma_2) - d_i(\gamma_1)$ equals either zero or one,

$E h_i^r(\gamma_1, \gamma_2) = E(|x_i e_i|^r d_i(\gamma_2)) - E(|x_i e_i|^r d_i(\gamma_1)) \le C_1 |\gamma_2 - \gamma_1|$,

by a first-order Taylor series expansion, establishing (12). The proof of (13) is identical. Q.E.D.
LEMMA A.2: There is a $K < \infty$ such that for all $\underline{\gamma} \le \gamma_1 \le \gamma_2 \le \bar{\gamma}$,

(15) $E\left|\frac{1}{\sqrt{n}}\sum_{i=1}^n (h_i^2(\gamma_1,\gamma_2) - E h_i^2(\gamma_1,\gamma_2))\right|^2 \le K|\gamma_2 - \gamma_1|$,

(16) $E\left|\frac{1}{\sqrt{n}}\sum_{i=1}^n (k_i^2(\gamma_1,\gamma_2) - E k_i^2(\gamma_1,\gamma_2))\right|^2 \le K|\gamma_2 - \gamma_1|$.

PROOF: Lemma 3.4 of Peligrad (1982) shows that for $\rho$-mixing sequences satisfying Assumption 1.1, there is a $K' < \infty$ such that

$E\left|\frac{1}{\sqrt{n}}\sum_{i=1}^n (h_i^2(\gamma_1,\gamma_2) - E h_i^2(\gamma_1,\gamma_2))\right|^2 \le K' E(h_i^2(\gamma_1,\gamma_2) - E h_i^2(\gamma_1,\gamma_2))^2 \le 2K' E h_i^4(\gamma_1,\gamma_2) \le 2K' C_1|\gamma_2 - \gamma_1|$,

where the final inequality is (12). This is (15) with $K = 2K'C_1$. The proof of (16) is identical. Q.E.D.
Let

$J_n(\gamma) = \frac{1}{\sqrt{n}}\sum_{i=1}^n x_i e_i d_i(\gamma)$.
LEMMA A.3: There are finite constants $K_1$ and $K_2$ such that for all $\gamma_1$, $\eta > 0$, and $\delta \ge n^{-1}$, if $\sqrt{n} \ge K_2/\eta$, then

$P\left(\sup_{\gamma_1 \le \gamma \le \gamma_1 + \delta}|J_n(\gamma) - J_n(\gamma_1)| > \eta\right) \le \frac{K_1 \delta^2}{\eta^4}$.
PROOF: Let $m$ be an integer satisfying $n\delta/2 \le m \le n\delta$, which is possible since $n\delta \ge 1$. Set $\delta_m = \delta/m$. For $k = 1, \ldots, m+1$, set $\gamma_k = \gamma_1 + \delta_m(k-1)$, $h_{ik} = h_i(\gamma_k, \gamma_{k+1})$, and $h_{ijk} = h_i(\gamma_j, \gamma_k)$. Note that (12) implies $E h_{ik}^r \le C_1\delta_m$ and $E h_{ijk}^r \le C_1|k-j|\delta_m$ for $r \le 4$. Letting $H_{nk} = n^{-1}\sum_{i=1}^n h_{ik}$, observe that for $\gamma_k \le \gamma \le \gamma_{k+1}$,

$|J_n(\gamma) - J_n(\gamma_k)| \le \sqrt{n}H_{nk} \le \sqrt{n}|H_{nk} - EH_{nk}| + \sqrt{n}EH_{nk}$.

Thus

(17) $\sup_{\gamma_1 \le \gamma \le \gamma_1+\delta}|J_n(\gamma) - J_n(\gamma_1)| \le \max_{2 \le k \le m+1}|J_n(\gamma_k) - J_n(\gamma_1)| + \max_{1 \le k \le m}\sqrt{n}|H_{nk} - EH_{nk}| + \max_{1 \le k \le m}\sqrt{n}EH_{nk}$.

For any $1 \le j < k \le m+1$, by Burkholder's inequality (see, e.g., Hall and Heyde (1980, p. 23)) for some $C' < \infty$, Minkowski's inequality, $E h_{ik}^2 \le C_1\delta_m$, (15), $n^{-1} \le \delta_m$, and $(k-j)^{1/2} \le (k-j)$,

(18) $E|J_n(\gamma_k) - J_n(\gamma_j)|^4 = E\left|\frac{1}{\sqrt{n}}\sum_{i=1}^n x_ie_i(d_i(\gamma_k) - d_i(\gamma_j))\right|^4 \le C'E\left|\frac{1}{n}\sum_{i=1}^n h_{ijk}^2\right|^2 \le C'\left(E h_{ijk}^2 + \left(\frac{1}{n}E\left|\frac{1}{\sqrt{n}}\sum_{i=1}^n(h_{ijk}^2 - E h_{ijk}^2)\right|^2\right)^{1/2}\right)^2 \le C'\left(C_1(k-j)\delta_m + \left(\frac{K(k-j)\delta_m}{n}\right)^{1/2}\right)^2 \le C'[C_1 + \sqrt{K}]^2((k-j)\delta_m)^2$.

The bound (18) and Theorem 12.2 of Billingsley (1968, p. 94) imply that there is a finite $K_0$ such that

(19) $P\left(\max_{2 \le k \le m+1}|J_n(\gamma_k) - J_n(\gamma_1)| > \eta\right) \le K_0\frac{(m\delta_m)^2}{\eta^4} = K_0\frac{\delta^2}{\eta^4}$,

which bounds the first term on the right-hand side of (17).

We next consider the second term. Lemma 3.6 of Peligrad (1982) shows there is a $K'' < \infty$ such that

(20) $E|\sqrt{n}(H_{nk} - EH_{nk})|^4 \le K''\left(n^{-1}E h_{ik}^4 + (E h_{ik}^2)^2\right) \le K''\left(n^{-1}C_1\delta_m + (C_1\delta_m)^2\right) \le K''(C_1 + C_1^2)\delta_m^2$,

where the third inequality is $n^{-1} \le \delta_m$. Markov's inequality and (20) yield

(21) $P\left(\max_{1 \le k \le m}\sqrt{n}|H_{nk} - EH_{nk}| > \eta\right) \le m\frac{K''(C_1 + C_1^2)\delta_m^2}{\eta^4} \le \frac{K''(C_1 + C_1^2)\delta^2}{\eta^4}$,

where the final inequality uses $m\delta_m = \delta$ and $\delta_m \le \delta$.

Finally, note that

(22) $\sqrt{n}EH_{nk} = \sqrt{n}E h_{ik} \le \sqrt{n}C_1\delta_m \le \frac{2C_1}{\sqrt{n}}$,

since $\delta_m \le 2/n$. Together, (17), (19), (21), and (22) imply that when $2C_1/\sqrt{n} \le \eta$,

$P\left(\sup_{\gamma_1 \le \gamma \le \gamma_1+\delta}|J_n(\gamma) - J_n(\gamma_1)| > 3\eta\right) \le \frac{[K_0 + K''(C_1 + C_1^2)]\delta^2}{\eta^4}$,

which implies the result setting $K_1 = 3^4[K_0 + K''(C_1 + C_1^2)]$ and $K_2 = 6C_1$. Q.E.D.
Let "$\Rightarrow$" denote weak convergence with respect to the uniform metric.

LEMMA A.4: $J_n(\gamma) \Rightarrow J(\gamma)$, a mean-zero Gaussian process with almost surely continuous sample paths.
PROOF: For each $\gamma$, $x_ie_id_i(\gamma)$ is a square integrable stationary martingale difference, so $J_n(\gamma)$ converges pointwise to a Gaussian distribution by the CLT. This can be extended to any finite collection of $\gamma$ to yield the convergence of the finite dimensional distributions.

Fix $\epsilon > 0$ and $\eta > 0$. Set $\delta = \epsilon\eta^4/K_1$ and $\bar{n} = \max[\delta^{-1}, K_2^2/\eta^2]$, where $K_1$ and $K_2$ are defined in Lemma A.3. Then by Lemma A.3, for any $\gamma_1$, if $n \ge \bar{n}$,

$P\left(\sup_{\gamma_1 \le \gamma \le \gamma_1+\delta}|J_n(\gamma) - J_n(\gamma_1)| > \eta\right) \le \frac{K_1\delta^2}{\eta^4} = \delta\epsilon$.

This establishes the conditions for Theorem 15.5 of Billingsley (1968). (See also the proof of Theorem 16.1 of Billingsley (1968).) It follows that $J_n(\gamma)$ is tight, so $J_n(\gamma) \Rightarrow J(\gamma)$. Q.E.D.
LEMMA A.5: $\hat{\gamma} \to_p \gamma_0$.
PROOF: Lemma 1 of Hansen (1996) shows that Assumption 1 is sufficient for

(23) $M_n(\gamma) = \frac{1}{n}X_\gamma'X_\gamma = \frac{1}{n}\sum_{i=1}^n x_ix_i'd_i(\gamma) \to_p M(\gamma)$,

uniformly over $\gamma \in \mathbb{R}$. Observe that (23) implies $M_n = (1/n)X'X \to_p M$.

Let $X_0 = X_{\gamma_0}$. Then since $Y = X\theta + X_0\delta_n + e$ and $X$ lies in the space spanned by $P_\gamma = X_\gamma^*(X_\gamma^{*\prime}X_\gamma^*)^{-1}X_\gamma^{*\prime}$,

$S_n(\gamma) - e'e = Y'(I - P_\gamma)Y - e'e = -e'P_\gamma e + 2\delta_n'X_0'(I - P_\gamma)e + \delta_n'X_0'(I - P_\gamma)X_0\delta_n$.

Using Assumption 1.6, Lemma A.4, and (23), we see that uniformly over $\gamma \in \Gamma$,

$n^{-1+2\alpha}(S_n(\gamma) - e'e) = n^{-1}c'(X_0'(I - P_\gamma)X_0)c + o_p(1)$.

The projection $P_\gamma$ can be written as the projection onto $[X_\gamma, Z_\gamma]$, where $Z_\gamma = X - X_\gamma$ is a matrix whose $i$th row is $x_i'(1 - d_i(\gamma))$. Observe that $X_\gamma'Z_\gamma = 0$, and for $\gamma \ge \gamma_0$, $X_0'X_\gamma = X_0'X_0$ and $X_0'Z_\gamma = 0$. Then using (23) we calculate that uniformly over $\gamma \in [\gamma_0, \bar{\gamma}]$,

$n^{-1}c'(X_0'(I - P_\gamma)X_0)c = c'(M_n(\gamma_0) - M_n(\gamma_0)M_n(\gamma)^{-1}M_n(\gamma_0))c \to_p c'(M(\gamma_0) - M(\gamma_0)M(\gamma)^{-1}M(\gamma_0))c \equiv b_1(\gamma)$,

say. Note that $M(\gamma)^{-1}$ exists under Assumption 1.8, so $b_1(\gamma)$ is well defined.

A calculation based on (14) shows

(24) $\frac{d}{d\gamma}M(\gamma) = D(\gamma)f(\gamma)$,

where $M(\gamma)$ and $D(\gamma)$ are defined in (6) and (7). Using (24),

$\frac{d}{d\gamma}b_1(\gamma) = c'M(\gamma_0)M(\gamma)^{-1}D(\gamma)f(\gamma)M(\gamma)^{-1}M(\gamma_0)c \ge 0$,

so $b_1(\gamma)$ is continuous and weakly increasing on $[\gamma_0, \bar{\gamma}]$. Additionally,

$\frac{d}{d\gamma}b_1(\gamma_0) = c'Dfc > 0$

by Assumption 1.7, so $b_1(\gamma)$ is uniquely minimized at $\gamma_0$ on $[\gamma_0, \bar{\gamma}]$.

Symmetrically, we can show that uniformly over $\gamma \in [\underline{\gamma}, \gamma_0]$, $n^{-1}c'(X_0'(I - P_\gamma)X_0)c \to_p b_2(\gamma)$, a weakly decreasing continuous function which is uniquely minimized at $\gamma_0$. Thus uniformly over $\gamma \in \Gamma$, $n^{-1+2\alpha}(S_n(\gamma) - e'e) \to_p b_1(\gamma)1\{\gamma \ge \gamma_0\} + b_2(\gamma)1\{\gamma < \gamma_0\}$, a continuous function with unique minimizer $\gamma_0$. Since $\hat{\gamma}$ minimizes $S_n(\gamma) - e'e$, it follows that $\hat{\gamma} \to_p \gamma_0$ (see, e.g., Theorem 2.1 of Newey and McFadden (1994)). Q.E.D.
LEMMA A.6: $n^\alpha(\hat{\theta} - \theta) = o_p(1)$ and $n^\alpha(\hat{\delta} - \delta_n) = o_p(1)$.
PROOF: We show the result for $\hat{\delta}$, as the derivation for $\hat{\theta}$ is similar. Let $P_X = I - X(X'X)^{-1}X'$. Then on $\gamma \in \Gamma$, using (23),

$\frac{1}{n}X_\gamma'P_XX_\gamma = M_n(\gamma) - M_n(\gamma)M_n^{-1}M_n(\gamma) \Rightarrow M(\gamma) - M(\gamma)M^{-1}M(\gamma) = M^*(\gamma)$,

say. Observe that for $\gamma \in \Gamma$, $M^*(\gamma)$ is invertible since $M - M(\gamma)$ is invertible by Assumption 1.8. Also,

$\frac{1}{n}X_\gamma'P_XX_0 \Rightarrow M(\gamma \wedge \gamma_0) - M(\gamma)M^{-1}M(\gamma_0) = M_0^*(\gamma)$,

say. Thus

$n^\alpha(\hat{\delta}(\gamma) - \delta_n) = \left(\frac{1}{n}X_\gamma'P_XX_\gamma\right)^{-1}\left(\frac{1}{n}X_\gamma'P_X((X_0 - X_\gamma)c + en^\alpha)\right) \Rightarrow M^*(\gamma)^{-1}(M_0^*(\gamma) - M^*(\gamma))c = d(\gamma)$,

say. Note that $M_0^*(\gamma_0) = M^*(\gamma_0)$, so $d(\gamma_0) = 0$. Since $M(\gamma)$ is continuous and $M^*(\gamma)$ is invertible, $d(\gamma)$ is continuous in $\Gamma$. Lemma A.5 allows us to conclude that

$n^\alpha(\hat{\delta} - \delta_n) = n^\alpha(\hat{\delta}(\hat{\gamma}) - \delta_n) \to_p d(\gamma_0) = 0$. Q.E.D.
Define $G_n(\gamma) = n^{-1}\sum_{i=1}^n (c'x_i)^2|d_i(\gamma) - d_i(\gamma_0)|$ and $K_n(\gamma) = n^{-1}\sum_{i=1}^n k_i^2(\gamma_0, \gamma)$. Set $a_n = n^{1-2\alpha}$.
LEMMA A.7: There exist constants $B > 0$, $0 < d < \infty$, and $0 < k < \infty$ such that for all $\eta > 0$ and $\epsilon > 0$, there exists a $\bar{v} < \infty$ such that for all $n$,

(25) $P\left(\inf_{\bar{v}/a_n \le |\gamma - \gamma_0| \le B}\frac{G_n(\gamma)}{|\gamma - \gamma_0|} < (1-\eta)d\right) \le \epsilon$,

(26) $P\left(\sup_{\bar{v}/a_n \le |\gamma - \gamma_0| \le B}\frac{K_n(\gamma)}{|\gamma - \gamma_0|} > (1+\eta)k\right) \le \epsilon$.
PROOF: We show (25), as the proof of (26) is similar. First, note that for $\gamma \ge \gamma_0$,

(27) $EG_n(\gamma) = c'(M(\gamma) - M(\gamma_0))c$,

so by (24), $dEG_n(\gamma)/d\gamma = c'D(\gamma)f(\gamma)c$, and the sign is reversed if $\gamma < \gamma_0$. Since $c'D(\gamma_0)f(\gamma_0)c > 0$ (Assumption 1.8) and $c'D(\gamma)f(\gamma)c$ is continuous at $\gamma_0$ (Assumption 1.5), there is a $B$ sufficiently small such that

$d = \min_{|\gamma - \gamma_0| \le B}c'D(\gamma)f(\gamma)c > 0$.

Since $EG_n(\gamma_0) = 0$, a first-order Taylor series expansion about $\gamma_0$ yields

(28) $EG_n(\gamma) \ge d|\gamma - \gamma_0|$ for all $|\gamma - \gamma_0| \le B$.

Lemma A.2 (16) yields

(29) $E|G_n(\gamma) - EG_n(\gamma)|^2 \le |c|^4E|K_n(\gamma) - EK_n(\gamma)|^2 \le |c|^4n^{-1}K|\gamma - \gamma_0|$.

For any $\eta$ and $\epsilon$, set

(30) $b = \frac{1 - \eta/2}{1 - \eta} > 1$

and

(31) $\bar{v} = \frac{8|c|^4K}{\eta^2d^2(1 - 1/b)\epsilon}$.

We may assume that $n$ is large enough so that $\bar{v}/a_n \le B$, else the inequality (25) is trivial. For $j = 1, 2, \ldots, N+1$, set $\gamma_j = \gamma_0 + \bar{v}b^{j-1}/a_n$, where $N$ is the integer such that $\gamma_N - \gamma_0 = \bar{v}b^{N-1}/a_n \le B$ and $\gamma_{N+1} - \gamma_0 > B$. (Note that $N \ge 1$ since $\bar{v}/a_n \le B$.)

Markov's inequality, (29), and (28) yield

$P\left(\sup_{1 \le j \le N}\left|\frac{G_n(\gamma_j)}{EG_n(\gamma_j)} - 1\right| > \frac{\eta}{2}\right) \le \left(\frac{2}{\eta}\right)^2\sum_{j=1}^N\frac{E|G_n(\gamma_j) - EG_n(\gamma_j)|^2}{(EG_n(\gamma_j))^2} \le \frac{4}{\eta^2}\sum_{j=1}^N\frac{|c|^4Kn^{-1}|\gamma_j - \gamma_0|}{d^2|\gamma_j - \gamma_0|^2} = n^{-2\alpha}\frac{4|c|^4K}{\eta^2d^2\bar{v}}\sum_{j=1}^\infty\frac{1}{b^{j-1}} \le \frac{4|c|^4K}{\eta^2d^2\bar{v}(1 - 1/b)} = \epsilon/2$,

where the final equality is (31). Thus with probability exceeding $1 - \epsilon/2$,

(32) $\left|\frac{G_n(\gamma_j)}{EG_n(\gamma_j)} - 1\right| \le \frac{\eta}{2}$ for all $1 \le j \le N$.

So for any $\gamma$ such that $\bar{v}/a_n \le \gamma - \gamma_0 \le B$, there is some $j \le N$ such that $\gamma_j \le \gamma \le \gamma_{j+1}$, and on the event (32), since $G_n(\gamma)$ is nondecreasing in $\gamma$ for $\gamma \ge \gamma_0$,

(33) $\frac{G_n(\gamma)}{|\gamma - \gamma_0|} \ge \frac{G_n(\gamma_j)}{EG_n(\gamma_j)}\cdot\frac{EG_n(\gamma_j)}{|\gamma_{j+1} - \gamma_0|} \ge \left(1 - \frac{\eta}{2}\right)\frac{d|\gamma_j - \gamma_0|}{|\gamma_{j+1} - \gamma_0|} = (1-\eta)d$,

using (28), the construction $(\gamma_j - \gamma_0)/(\gamma_{j+1} - \gamma_0) = 1/b$, and (30). Since this event has probability exceeding $1 - \epsilon/2$, we have established

$P\left(\inf_{\bar{v}/a_n \le \gamma - \gamma_0 \le B}\frac{G_n(\gamma)}{|\gamma - \gamma_0|} < (1-\eta)d\right) \le \epsilon/2$.

A symmetric argument establishes a similar inequality where the infimum is taken over $-\bar{v}/a_n \ge \gamma - \gamma_0 \ge -B$, establishing (25). Q.E.D.
LEMMA A.8: For all $\eta > 0$ and $\epsilon > 0$, there exists some $\bar{v} < \infty$ such that for any $B < \infty$,

(34) $P\left(\sup_{\bar{v}/a_n \le |\gamma - \gamma_0| \le B}\frac{|J_n(\gamma) - J_n(\gamma_0)|}{\sqrt{a_n}|\gamma - \gamma_0|} > \eta\right) \le \epsilon$.
PROOF: Fix $\eta > 0$. For $j = 1, 2, \ldots$, set $\gamma_j - \gamma_0 = \bar{v}2^{j-1}/a_n$, where $\bar{v} < \infty$ will be determined later. A straightforward calculation shows that

(35) $\sup_{\bar{v}/a_n \le |\gamma - \gamma_0| \le B}\frac{|J_n(\gamma) - J_n(\gamma_0)|}{\sqrt{a_n}|\gamma - \gamma_0|} \le 2\sup_{j>0}\frac{|J_n(\gamma_j) - J_n(\gamma_0)|}{\sqrt{a_n}|\gamma_j - \gamma_0|} + 2\sup_{j>0}\sup_{\gamma_j \le \gamma \le \gamma_{j+1}}\frac{|J_n(\gamma) - J_n(\gamma_j)|}{\sqrt{a_n}|\gamma_j - \gamma_0|}$.

Markov's inequality, the martingale difference property, and (12) imply

(36) $P\left(2\sup_{j>0}\frac{|J_n(\gamma_j) - J_n(\gamma_0)|}{\sqrt{a_n}|\gamma_j - \gamma_0|} > \eta\right) \le \frac{4}{\eta^2}\sum_{j=1}^\infty\frac{E|J_n(\gamma_j) - J_n(\gamma_0)|^2}{a_n|\gamma_j - \gamma_0|^2} = \frac{4}{\eta^2}\sum_{j=1}^\infty\frac{E(|x_ie_i|^2|d_i(\gamma_j) - d_i(\gamma_0)|)}{a_n|\gamma_j - \gamma_0|^2} \le \frac{4C_1}{\eta^2}\sum_{j=1}^\infty\frac{1}{\bar{v}2^{j-1}} = \frac{8C_1}{\eta^2\bar{v}}$.

Set $\delta_j = \gamma_{j+1} - \gamma_j$ and $\eta_j = \sqrt{a_n}|\gamma_j - \gamma_0|\eta$. Then

(37) $P\left(2\sup_{j>0}\sup_{\gamma_j \le \gamma \le \gamma_{j+1}}\frac{|J_n(\gamma) - J_n(\gamma_j)|}{\sqrt{a_n}|\gamma_j - \gamma_0|} > \eta\right) \le 2\sum_{j=1}^\infty P\left(\sup_{\gamma_j \le \gamma \le \gamma_j + \delta_j}|J_n(\gamma) - J_n(\gamma_j)| > \eta_j\right)$.

Observe that if $\bar{v} \ge 1$, then $\delta_j \ge a_n^{-1} \ge n^{-1}$. Furthermore, if $\bar{v} \ge K_2/\eta$, then $\eta_j = a_n^{-1/2}2^{j-1}\bar{v}\eta \ge K_2a_n^{-1/2} \ge K_2n^{-1/2}$. Thus if $\bar{v} \ge \max[1, K_2/\eta]$ the conditions for Lemma A.3 hold, and the right-hand side of (37) is bounded by

(38) $2\sum_{j=1}^\infty\frac{K_1\delta_j^2}{\eta_j^4} = 2\sum_{j=1}^\infty\frac{K_1|\gamma_{j+1} - \gamma_j|^2}{a_n^2|\gamma_j - \gamma_0|^4\eta^4} = \frac{8K_1}{3\eta^4\bar{v}^2}$.

(35), (36), (37), and (38) show that if $\bar{v} \ge \max[1, K_2/\eta]$,

$P\left(\sup_{\bar{v}/a_n \le |\gamma - \gamma_0| \le B}\frac{|J_n(\gamma) - J_n(\gamma_0)|}{\sqrt{a_n}|\gamma - \gamma_0|} > 2\eta\right) \le \frac{8C_1}{\eta^2\bar{v}} + \frac{8K_1}{3\eta^4\bar{v}^2}$,

which can be made arbitrarily small by picking suitably large $\bar{v}$. Q.E.D.
LEMMA A.9: $a_n(\hat{\gamma} - \gamma_0) = O_p(1)$.
PROOF: Let $B$, $d$, and $k$ be defined as in Lemma A.7. Pick $\eta > 0$ and $\kappa > 0$ small enough so that

(39) $(1-\eta)d - 2(|c| + \kappa)\eta - 2(|c| + \kappa)\kappa(1+\eta)k - (2|c| + \kappa)\kappa(1+\eta)k > 0$.

Let $E_n$ be the joint event that $|\hat{\gamma} - \gamma_0| \le B$, $n^\alpha|\hat{\theta} - \theta| \le \kappa$, $n^\alpha|\hat{\delta} - \delta_n| \le \kappa$,

(40) $\inf_{\bar{v}/a_n \le |\gamma - \gamma_0| \le B}\frac{G_n(\gamma)}{|\gamma - \gamma_0|} \ge (1-\eta)d$,

(41) $\sup_{\bar{v}/a_n \le |\gamma - \gamma_0| \le B}\frac{K_n(\gamma)}{|\gamma - \gamma_0|} \le (1+\eta)k$,

and

(42) $\sup_{\bar{v}/a_n \le |\gamma - \gamma_0| \le B}\frac{|J_n(\gamma) - J_n(\gamma_0)|}{\sqrt{a_n}|\gamma - \gamma_0|} < \eta$.

Fix $\epsilon > 0$, and pick $\bar{v}$ and $\bar{n}$ so that $P(E_n) \ge 1 - \epsilon$ for all $n \ge \bar{n}$, which is possible under Lemmas A.5–A.8.

Let $S_n^*(\gamma) = S_n(\hat{\theta}, \hat{\delta}, \gamma)$, where $S_n(\cdot,\cdot,\cdot)$ is the sum of squared errors function (5). Since $Y = X\theta + X_0\delta_n + e$,

$Y - X\hat{\theta} - X_\gamma\hat{\delta} = (e - X(\hat{\theta} - \theta) - X_0(\hat{\delta} - \delta_n)) - \Delta X_\gamma\hat{\delta}$,

where $\Delta X_\gamma = X_\gamma - X_0$. Hence

(43) $S_n^*(\gamma) - S_n^*(\gamma_0) = (Y - X\hat{\theta} - X_\gamma\hat{\delta})'(Y - X\hat{\theta} - X_\gamma\hat{\delta}) - (Y - X\hat{\theta} - X_0\hat{\delta})'(Y - X\hat{\theta} - X_0\hat{\delta}) = \hat{\delta}'\Delta X_\gamma'\Delta X_\gamma\hat{\delta} - 2\hat{\delta}'\Delta X_\gamma'e + 2\hat{\delta}'\Delta X_\gamma'\Delta X_\gamma(\hat{\theta} - \theta) = \delta_n'\Delta X_\gamma'\Delta X_\gamma\delta_n - 2\hat{\delta}'\Delta X_\gamma'e + 2\hat{\delta}'\Delta X_\gamma'\Delta X_\gamma(\hat{\theta} - \theta) + (\delta_n + \hat{\delta})'\Delta X_\gamma'\Delta X_\gamma(\hat{\delta} - \delta_n)$.

Suppose $\gamma \in [\gamma_0 + \bar{v}/a_n, \gamma_0 + B]$ and suppose $E_n$ holds. Let $\hat{c} = n^\alpha\hat{\delta}$ so that $|\hat{c} - c| \le \kappa$. By (39)–(43),

$\frac{S_n^*(\gamma) - S_n^*(\gamma_0)}{a_n(\gamma - \gamma_0)} = \frac{c'\Delta X_\gamma'\Delta X_\gamma c}{n(\gamma - \gamma_0)} - \frac{2\hat{c}'\Delta X_\gamma'e}{n^{1-\alpha}(\gamma - \gamma_0)} + \frac{2\hat{c}'\Delta X_\gamma'\Delta X_\gamma n^\alpha(\hat{\theta} - \theta)}{n(\gamma - \gamma_0)} + \frac{(c + \hat{c})'\Delta X_\gamma'\Delta X_\gamma(\hat{c} - c)}{n(\gamma - \gamma_0)}$

$\ge \frac{G_n(\gamma)}{(\gamma - \gamma_0)} - 2|\hat{c}|\frac{|J_n(\gamma) - J_n(\gamma_0)|}{\sqrt{a_n}(\gamma - \gamma_0)} - 2|\hat{c}|\,|n^\alpha(\hat{\theta} - \theta)|\frac{K_n(\gamma)}{(\gamma - \gamma_0)} - |c + \hat{c}|\,|\hat{c} - c|\frac{K_n(\gamma)}{(\gamma - \gamma_0)}$

$\ge (1-\eta)d - 2(|c| + \kappa)\eta - 2(|c| + \kappa)\kappa(1+\eta)k - (2|c| + \kappa)\kappa(1+\eta)k > 0$.

We have shown that on the set $E_n$, if $\gamma \in [\gamma_0 + \bar{v}/a_n, \gamma_0 + B]$, then $S_n^*(\gamma) - S_n^*(\gamma_0) > 0$. We can similarly show that if $\gamma \in [\gamma_0 - B, \gamma_0 - \bar{v}/a_n]$, then $S_n^*(\gamma) - S_n^*(\gamma_0) > 0$. Since $S_n^*(\hat{\gamma}) - S_n^*(\gamma_0) \le 0$, this establishes that $E_n$ implies $|\hat{\gamma} - \gamma_0| \le \bar{v}/a_n$. As $P(E_n) \ge 1 - \epsilon$ for $n \ge \bar{n}$, this shows that $P(a_n|\hat{\gamma} - \gamma_0| > \bar{v}) \le \epsilon$ for $n \ge \bar{n}$, as required. Q.E.D.
Let $G_n^*(v) = a_nG_n(\gamma_0 + v/a_n)$ and $K_n^*(v) = a_nK_n(\gamma_0 + v/a_n)$.
LEMMA A.10: Uniformly on compact sets $C$,

(44) $G_n^*(v) \to_p \mu|v|$,

and

(45) $K_n^*(v) \to_p Df|v|$,

where $\mu = c'Dc\,f$.
PROOF: We show (44). Fix $v \in C$. Using (27),

(46) $EG_n^*(v) = a_nc'(M(\gamma_0 + v/a_n) - M(\gamma_0))c \to |v|c'Dfc = |v|\mu$

as $n \to \infty$. By (29),

(47) $E|G_n^*(v) - EG_n^*(v)|^2 = a_n^2E|G_n(\gamma_0 + v/a_n) - EG_n(\gamma_0 + v/a_n)|^2 \le \frac{a_n^2}{n}|c|^4K\left|\frac{v}{a_n}\right| = |c|^4K|v|n^{-2\alpha} \to 0$.

Markov's inequality, (46), and (47) show that $G_n^*(v) \to_p \mu|v|$.

Suppose $C = [0, \bar{v}]$. Since $G_n^*(v)$ is monotonically increasing on $C$ and the limit function is continuous, the convergence is uniform over $C$. To see this, set $G(v) = \mu v$. Pick any $\epsilon > 0$, then set $J = \bar{v}\mu/\epsilon$ and for $j = 0, 1, \ldots, J$, set $v_j = \bar{v}j/J$. Then pick $n$ large enough so that $\max_{j \le J}|G_n^*(v_j) - G(v_j)| \le \epsilon$ with probability greater than $1 - \epsilon$, which is possible by pointwise consistency. For any $j \ge 1$, take any $v \in (v_{j-1}, v_j)$. Both $G_n^*(v)$ and $G(v)$ lie in the interval $[G(v_{j-1}) - \epsilon, G(v_j) + \epsilon]$ (with probability greater than $1 - \epsilon$), which has length bounded by $3\epsilon$. Since $v$ is arbitrary, $|G_n^*(v) - G(v)| \le 3\epsilon$ uniformly over $C$.

An identical argument yields uniformity over sets of the form $[-\bar{v}, 0]$, and thus for arbitrary compact sets $C$. Q.E.D.
Let $R_n(v) = \sqrt{a_n}(J_n(\gamma_0 + v/a_n) - J_n(\gamma_0))$.
LEMMA A.11: On any compact set $C$, $R_n(v) \Rightarrow B(v)$, where $B(v)$ is a vector Brownian motion with covariance matrix $E(B(1)B(1)') = Vf$.
PROOF: Our proof proceeds by establishing the convergence of the finite dimensional distributions of $R_n(v)$ to those of $B(v)$ and then showing that $R_n(v)$ is tight.

Fix $v \in C$. Define $d_i^*(v) = d_i(\gamma_0 + v/a_n) - d_i(\gamma_0)$ and $u_{ni}(v) = \sqrt{a_n}x_ie_id_i^*(v)$, so that $R_n(v) = n^{-1/2}\sum_{i=1}^n u_{ni}(v)$, and let $V_n(v) = n^{-1}\sum_{i=1}^n u_{ni}(v)u_{ni}(v)'$. Under Assumption 1.2, $\{u_{ni}(v), \mathcal{F}_i\}$ is a martingale difference array (MDA). By the MDA central limit theorem (for example, Theorem 24.3 of Davidson (1994, p. 383)), sufficient conditions for $R_n(v) \to_d N(0, |v|Vf)$ are

(48) $V_n(v) \to_p |v|Vf$

and

(49) $n^{-1/2}\max_{1 \le i \le n}|u_{ni}(v)| \to_p 0$.

First, a calculation based on (14) shows $dE(x_ix_i'e_i^2d_i(\gamma))/d\gamma = V(\gamma)f(\gamma)$, where $V(\gamma)$ is defined in (8). Thus by the continuity of $V(\gamma)f(\gamma)$ at $\gamma_0$ (Assumption 1.5), if $v > 0$,

(50) $EV_n(v) = a_nE(x_ix_i'e_i^2|d_i^*(v)|) = a_n\left(E\left(x_ix_i'e_i^2d_i\left(\gamma_0 + \frac{v}{a_n}\right)\right) - E(x_ix_i'e_i^2d_i(\gamma_0))\right) \to vVf$,

with the sign reversed if $v < 0$. By (15),

(51) $E|V_n(v) - EV_n(v)|^2 = \frac{a_n^2}{n}E\left|\frac{1}{\sqrt{n}}\sum_{i=1}^n(|x_ie_i|^2|d_i^*(v)| - E(|x_ie_i|^2|d_i^*(v)|))\right|^2 \le \frac{a_n^2}{n}K\left|\frac{v}{a_n}\right| = n^{-2\alpha}K|v| \to 0$

as $n \to \infty$, which with (50) combines to yield (48). By (12),

$E\left|n^{-1/2}\max_{1 \le i \le n}|u_{ni}(v)|\right|^4 \le \frac{1}{n}E|u_{ni}(v)|^4 = \frac{a_n^2}{n}E(|x_ie_i|^4|d_i^*(v)|) \le \frac{a_n^2}{n}C_1\left|\frac{v}{a_n}\right| = n^{-2\alpha}C_1|v| \to 0$,

which establishes (49) by Markov's inequality. We conclude that $R_n(v) \to_d N(0, |v|Vf)$. This argument can be extended to include any finite collection $[v_1, \ldots, v_k]$ to yield the convergence of the finite dimensional distributions of $R_n(v)$ to those of $B(v)$.

We now show tightness. Fix $\epsilon > 0$ and $\eta > 0$. Set $\delta = \epsilon\eta^4/K_1$ and $\bar{n} = (\max[\delta^{-1/2}, K_2/\eta])^{1/\alpha}$, where $K_1$ and $K_2$ are defined in Lemma A.3. Set $\gamma_1 = \gamma_0 + v_1/a_n$. By Lemma A.3, for $n \ge \bar{n}$,

$P\left(\sup_{v_1 \le v \le v_1+\delta}|R_n(v) - R_n(v_1)| > \eta\right) = P\left(\sup_{\gamma_1 \le \gamma \le \gamma_1 + \delta/a_n}|J_n(\gamma) - J_n(\gamma_1)| > \frac{\eta}{a_n^{1/2}}\right) \le K_1\left(\frac{\delta}{a_n}\right)^2\frac{a_n^2}{\eta^4} = \frac{K_1\delta^2}{\eta^4} = \delta\epsilon$.

(The conditions for Lemma A.3 are met since $\delta/a_n \ge n^{-1}$ when $n^\alpha \ge \delta^{-1/2}$, and $\eta/a_n^{1/2} \ge K_2n^{-1/2}$ when $n^\alpha \ge K_2/\eta$, and these hold for $n \ge \bar{n}$.) As discussed in the proof of Lemma A.4, this shows that $R_n(v)$ is tight, so $R_n(v) \Rightarrow B(v)$. Q.E.D.
LEMMA A.12: $\sqrt{n}(\hat{\theta} - \hat{\theta}(\gamma_0)) \to_p 0$, and $\sqrt{n}(\hat{\theta}(\gamma_0) - \theta) \to_d Z \sim N(0, V_\theta)$.
PROOF: Let $J_n = n^{-1/2}X'e$, and observe that Lemma A.4 implies

$\frac{1}{\sqrt{n}}X_\gamma^{*\prime}e = \begin{pmatrix} J_n \\ J_n(\gamma) \end{pmatrix} \Rightarrow \begin{pmatrix} J \\ J(\gamma) \end{pmatrix}$,

a Gaussian process with continuous sample paths. Thus on any compact set $C$,

$J_n^*(v) = \begin{pmatrix} J_n \\ J_n(\gamma_0 + v/a_n) \end{pmatrix} \Rightarrow \begin{pmatrix} J \\ J(\gamma_0) \end{pmatrix}$,

and (23) implies

$M_n^*(v) = \begin{pmatrix} M_n & M_n(\gamma_0 + v/a_n) \\ M_n(\gamma_0 + v/a_n) & M_n(\gamma_0 + v/a_n) \end{pmatrix} \to_p \begin{pmatrix} M & M(\gamma_0) \\ M(\gamma_0) & M(\gamma_0) \end{pmatrix}$

uniformly on $C$. Also

$A_n(v) = \frac{a_n}{n}\Delta X_{\gamma_0 + v/a_n}'\Delta X_{\gamma_0 + v/a_n} \le K_n^*(v) = O_p(1)$

uniformly on $C$ by Lemma A.10, so $A_n(v)c\,a_n^{-1/2} \to_p 0$, and hence

(52) $\sqrt{n}\left(\hat{\theta}\left(\gamma_0 + \frac{v}{a_n}\right) - \theta\right) = \left[M_n^*(v)^{-1}\left(J_n^*(v) - \begin{pmatrix} I \\ I \end{pmatrix}A_n(v)c\,a_n^{-1/2}\right)\right]_1 \Rightarrow \left[\begin{pmatrix} M & M(\gamma_0) \\ M(\gamma_0) & M(\gamma_0) \end{pmatrix}^{-1}\begin{pmatrix} J \\ J(\gamma_0) \end{pmatrix}\right]_1 = Z$,

where $[\cdot]_1$ denotes the block corresponding to $\hat{\theta}$. This establishes $\sqrt{n}(\hat{\theta}(\gamma_0) - \theta) \to_d Z$ as stated.

Pick $\epsilon > 0$ and $\delta > 0$. Lemma A.9 and (52) show that there is a $\bar{v} < \infty$ and $\bar{n} < \infty$ so that for all $n \ge \bar{n}$, $P(a_n|\hat{\gamma} - \gamma_0| > \bar{v}) \le \epsilon$ and

$P\left(\sup_{-\bar{v} \le v \le \bar{v}}\sqrt{n}\left|\hat{\theta}\left(\gamma_0 + \frac{v}{a_n}\right) - \hat{\theta}(\gamma_0)\right| > \delta\right) \le \epsilon$.

Hence, for $n \ge \bar{n}$,

$P(\sqrt{n}|\hat{\theta} - \hat{\theta}(\gamma_0)| > \delta) \le P\left(\sup_{-\bar{v} \le v \le \bar{v}}\sqrt{n}\left|\hat{\theta}\left(\gamma_0 + \frac{v}{a_n}\right) - \hat{\theta}(\gamma_0)\right| > \delta\right) + P(a_n|\hat{\gamma} - \gamma_0| > \bar{v}) \le 2\epsilon$,

establishing $\sqrt{n}|\hat{\theta} - \hat{\theta}(\gamma_0)| \to_p 0$ as stated. Q.E.D.
Let $Q_n(v) = S_n(\hat{\theta}, \hat{\delta}, \gamma_0) - S_n(\hat{\theta}, \hat{\delta}, \gamma_0 + v/a_n)$.

LEMMA A.13: On any compact set $C$, $Q_n(v) \Rightarrow Q(v) = -\mu|v| + 2\sqrt{\lambda}W(v)$, where $\lambda = c'Vc\,f$.
X
PROOF: From Ž43. we find Q nŽ ¨ . s yGnU Ž ¨ . q 2 c R nŽ ¨ . q L nŽ ¨ ., where
Ž .
L n Ž ¨ . F 2'n < dˆy dn < R n Ž ¨ . q Ž 2 n a < ˆ
c < < uˆy u < q < c q ˆ
c< < ˆ
c y c <. K U
n ¨
«0
601
SAMPLE SPLITTING
' <ˆ <
Ž .
Ž .
Ž .
Ž . ' <ˆ
<
Ž .
Ž .
since Žuniformly on C . K U
n ¨ s Op 1 , R n ¨ s Op 1 , n u y u s Op 1 , and n d y d n s Op 1 , by
Lemmas A.10, A.11, and A.12. Applying again Lemmas A.10 and A.11, Q nŽ ¨ . « ym < ¨ < q 2 c9B Ž ¨ ..
The process c9B Ž ¨ . is a Brownian motion with variance l s c9Vcf, so can be written as 'l W Ž ¨ .,
Q.E.D.
where W Ž ¨ . is a standard Brownian motion.
PROOF OF THEOREM 1: By Lemma A.9, $a_n(\hat{\gamma} - \gamma_0) = \mathrm{argmax}_v\,Q_n(v) = O_p(1)$, and by Lemma A.13, $Q_n(v) \Rightarrow Q(v)$. The limit functional $Q(v)$ is continuous, has a unique maximum, and $\lim_{|v|\to\infty}Q(v) = -\infty$ almost surely (which is true since $\lim_{v\to\infty}W(v)/v = 0$ almost surely). It therefore satisfies the conditions of Theorem 2.7 of Kim and Pollard (1990), which implies

$a_n(\hat{\gamma} - \gamma_0) \to_d \mathrm{argmax}_{v\in\mathbb{R}}\,Q(v)$.

Making the change-of-variables $v = (\lambda/\mu^2)r$, noting the distributional equality $W(a^2r) =_d aW(r)$, and setting $\omega = \lambda/\mu^2$, we can rewrite the asymptotic distribution as

$\mathrm{argmax}_{-\infty<v<\infty}\left[-\mu|v| + 2\sqrt{\lambda}W(v)\right] = \frac{\lambda}{\mu^2}\,\mathrm{argmax}_{-\infty<r<\infty}\left[-\frac{\lambda}{\mu}|r| + 2\sqrt{\lambda}\,W\left(\frac{\lambda}{\mu^2}r\right)\right] =_d \omega\,\mathrm{argmax}_{-\infty<r<\infty}\left[-\frac{\lambda}{\mu}|r| + 2\frac{\lambda}{\mu}W(r)\right] = \omega\,\mathrm{argmax}_{-\infty<r<\infty}\left[-\frac{|r|}{2} + W(r)\right]$. Q.E.D.
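The limit variable $\omega\,\mathrm{argmax}_r[-|r|/2 + W(r)]$ can be simulated directly by approximating the two-sided Brownian motion with Gaussian random walks on a fine grid. The sketch below draws from the $\omega = 1$ case; the truncation width, step size, and number of draws are arbitrary choices, and the truncation introduces a small approximation error.

```python
import numpy as np

def argmax_limit_draw(rng, half_width=20.0, step=0.02):
    """One approximate draw from argmax_r [ -|r|/2 + W(r) ], with W a
    two-sided standard Brownian motion, truncated to [-half_width, half_width]
    and discretized with the given step."""
    m = int(half_width / step)
    r = np.arange(1, m + 1) * step
    # independent Brownian paths to the right and left of the origin
    w_pos = np.cumsum(rng.normal(0.0, np.sqrt(step), m))
    w_neg = np.cumsum(rng.normal(0.0, np.sqrt(step), m))
    obj = np.concatenate([(-r / 2 + w_neg)[::-1], [0.0], -r / 2 + w_pos])
    grid = np.concatenate([-r[::-1], [0.0], r])
    return grid[np.argmax(obj)]

rng = np.random.default_rng(0)
draws = np.array([argmax_limit_draw(rng) for _ in range(1000)])
print(draws.mean())   # the distribution is symmetric about zero
```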
PROOF OF THEOREM 2: Lemma A.12 and (23) show that

$\hat{\sigma}^2LR_n(\gamma_0) - Q_n(\hat{v}) = (S_n(\hat{\theta}(\gamma_0), \gamma_0) - S_n(\hat{\theta}, \hat{\gamma})) - (S_n(\hat{\theta}, \gamma_0) - S_n(\hat{\theta}, \hat{\gamma})) = S_n(\hat{\theta}(\gamma_0), \gamma_0) - S_n(\hat{\theta}, \gamma_0) = (\hat{\theta}(\gamma_0) - \hat{\theta})'X_\gamma^{*\prime}X_\gamma^*(\hat{\theta}(\gamma_0) - \hat{\theta}) \to_p 0$.

Now applying Lemma A.13 and the continuous mapping theorem,

$LR_n(\gamma_0) = \frac{Q_n(\hat{v})}{\hat{\sigma}^2} + o_p(1) = \frac{\sup_v Q_n(v)}{\hat{\sigma}^2} + o_p(1) \to_d \frac{\sup_v Q(v)}{\sigma^2}$.

This limiting distribution equals

$\frac{1}{\sigma^2}\sup_v\left[-\mu|v| + 2\sqrt{\lambda}W(v)\right] = \frac{1}{\sigma^2}\sup_r\left[-\frac{\lambda}{\mu}|r| + 2\sqrt{\lambda}\,W\left(\frac{\lambda}{\mu^2}r\right)\right] =_d \frac{\lambda}{\sigma^2\mu}\sup_r\left[-|r| + 2W(r)\right] = \eta^2\xi$

by the change-of-variables $v = (\lambda/\mu^2)r$, the distributional equality $W(a^2r) =_d aW(r)$, and the fact that $\eta^2 = \lambda/(\sigma^2\mu)$.

To find the distribution function of $\xi$, note that $\xi = 2\max[\xi_1, \xi_2]$, where $\xi_1 = \sup_{s \le 0}[W(s) - \frac{1}{2}|s|]$ and $\xi_2 = \sup_{0 \le s}[W(s) - \frac{1}{2}|s|]$. $\xi_1$ and $\xi_2$ are iid exponential random variables with distribution function $P(\xi_1 \le x) = 1 - e^{-x}$ (see Bhattacharya and Brockwell (1976)). Thus

$P(\xi \le x) = P(2\max[\xi_1, \xi_2] \le x) = P(\xi_1 \le x/2)P(\xi_2 \le x/2) = (1 - e^{-x/2})^2$. Q.E.D.
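Since the distribution function of $\xi$ has the closed form $(1 - e^{-x/2})^2$, its quantiles are available analytically: setting $(1 - e^{-x/2})^2 = p$ and solving gives $x = -2\log(1 - \sqrt{p})$. The short computation below reproduces the asymptotic critical values, including the 95% value 7.35 used in Section 5:

```python
import math

def lr_critical_value(p):
    """Asymptotic critical value of the threshold LR statistic:
    solves (1 - exp(-x/2))^2 = p, i.e. x = -2*log(1 - sqrt(p))."""
    return -2.0 * math.log(1.0 - math.sqrt(p))

for p in (0.80, 0.90, 0.95, 0.99):
    print(p, round(lr_critical_value(p), 2))   # 0.95 -> 7.35
```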
PROOF OF THEOREM 3: Note that by the invariance property of the likelihood ratio test, $LR_n(\gamma_0)$ is invariant to reparameterizations, including those of the form $\gamma \to \gamma^* = F_n(\gamma)$. Since the threshold variable $q_i$ only enters the model through the indicator variables $\{q_i \le \gamma\}$, by picking $F_n(x)$ to be the empirical distribution function of the $q_i$, we see that

$\{q_i \le \gamma\} = \{F_n(q_i) \le F_n(\gamma)\} = \left\{\frac{j}{n} \le \gamma^*\right\}$

for some $1 \le j \le n$. Without loss of generality, we therefore will assume that $q_i = i/n$ for the remainder of the proof. Let $j_0$ be the largest integer such that $j_0/n < \gamma_0$. Without loss of generality, we can also set $\sigma^2 = 1$.

If we set $\alpha = 0$, the proof of Lemma A.13 shows that uniformly in $v$,

(53) $Q_n(\gamma_0 + v/n) = -G_n^*(v) + 2\delta'R_n(v) + o_p(1)$,

where for $v > 0$

$G_n^*(v) = \sum_{i=1}^n(\delta'x_i)^2\,1\{\gamma_0 < q_i \le \gamma_0 + v/n\} = \sum_{i=1}^n(\delta'x_i)^2\,1\left\{\frac{j_0}{n} < q_i \le \frac{j_0 + v}{n}\right\}$

and

$R_n(v) = \sum_{i=1}^n x_ie_i\,1\left\{\frac{j_0}{n} < q_i \le \frac{j_0 + v}{n}\right\}$.

While Lemma A.13 assumed that $\alpha > 0$, it can be shown that (53) continues to hold when $\alpha = 0$.

Note that the processes $G_n^*(v)$ and $\delta'R_n(v)$ are step functions with steps at integer-valued $v$. Let $\mathbb{N}^+$ denote the set of positive integers and let $D_n(v)$ be any continuous, strictly increasing function such that $G_n^*(k) = D_n(k)$ for $k \in \mathbb{N}^+$. Let $N_n(v)$ be a mean-zero Gaussian process with covariance kernel

$E(N_n(v_1)N_n(v_2)) = D_n(v_1 \wedge v_2)$.

Since $\delta'R_n(v)$ is a mean-zero Gaussian process with covariance kernel $\sigma^2G_n^*(v_1 \wedge v_2)$, the restriction of $\delta'R_n(v)$ to the positive integers has the same distribution as $N_n(v)$.

Since $D_n(v)$ is strictly increasing, there exists a function $v = g(s)$ such that $D_n(g(s)) = s$. Note that $N_n(g(s)) = W(s)$ is a standard Brownian motion on $\mathbb{R}^+$. Let $G^+ = \{s : g(s) \in \mathbb{N}^+\}$.

It follows from (53) that

$\max_{\gamma > \gamma_0}\frac{Q_n(\gamma)}{\hat{\sigma}^2} = \max_{k\in\mathbb{N}^+}\left[-G_n^*(k) + 2\delta'R_n(k)\right] + o_p(1) =_d \max_{k\in\mathbb{N}^+}\left[-D_n(k) + 2N_n(k)\right] + o_p(1) = \max_{s\in G^+}\left[-D_n(g(s)) + 2N_n(g(s))\right] + o_p(1) = \max_{s\in G^+}\left[-s + 2W(s)\right] + o_p(1) \le \max_{s\in\mathbb{R}^+}\left[-s + 2W(s)\right] + o_p(1)$.

We conclude that

$LR_n(\gamma_0) = \max\left[\max_{\gamma>\gamma_0}\frac{Q_n(\gamma)}{\hat{\sigma}^2},\ \max_{\gamma<\gamma_0}\frac{Q_n(\gamma)}{\hat{\sigma}^2}\right] \le \max\left[\max_{s\ge 0}\left[-|s| + 2W(s)\right],\ \max_{s\le 0}\left[-|s| + 2W(s)\right]\right] + o_p(1) \to_d \max_{-\infty<s<\infty}\left[-|s| + 2W(s)\right] = \xi$.

This shows that $P(LR_n(\gamma_0) \ge x) \le P(\xi \ge x) + o(1)$, as stated. Q.E.D.
REFERENCES

BAI, J. (1997): "Estimation of a Change Point in Multiple Regression Models," Review of Economics and Statistics, 79, 551-563.
BHATTACHARYA, P. K., AND P. J. BROCKWELL (1976): "The Minimum of an Additive Process with Applications to Signal Estimation and Storage Theory," Z. Wahrschein. Verw. Gebiete, 37, 51-75.
BILLINGSLEY, P. (1968): Convergence of Probability Measures. New York: Wiley.
BREIMAN, L., J. L. FRIEDMAN, R. A. OLSHEN, AND C. J. STONE (1984): Classification and Regression Trees. Belmont, CA: Wadsworth.
CHAN, K. S. (1989): "A Note on the Ergodicity of a Markov Chain," Advances in Applied Probability, 21, 702-704.
——— (1993): "Consistency and Limiting Distribution of the Least Squares Estimator of a Threshold Autoregressive Model," The Annals of Statistics, 21, 520-533.
CHAN, K. S., AND R. S. TSAY (1998): "Limiting Properties of the Least Squares Estimator of a Continuous Threshold Autoregressive Model," Biometrika, 85, 413-426.
CHU, C. S., AND H. WHITE (1992): "A Direct Test for Changing Trend," Journal of Business and Economic Statistics, 10, 289-299.
DAVIDSON, J. (1994): Stochastic Limit Theory: An Introduction for Econometricians. Oxford: Oxford University Press.
DUFOUR, J. M. (1997): "Some Impossibility Theorems in Econometrics with Applications to Structural Variables and Dynamic Models," Econometrica, 65, 1365-1387.
DÜMBGEN, L. (1991): "The Asymptotic Behavior of Some Nonparametric Change Point Estimators," The Annals of Statistics, 19, 1471-1495.
DURLAUF, S. N., AND P. A. JOHNSON (1995): "Multiple Regimes and Cross-Country Growth Behavior," Journal of Applied Econometrics, 10, 365-384.
HALL, P., AND C. C. HEYDE (1980): Martingale Limit Theory and Its Application. New York: Academic Press.
HANSEN, B. E. (1992): "Tests for Parameter Instability in Regressions with I(1) Processes," Journal of Business and Economic Statistics, 10, 321-335.
——— (1996): "Inference when a Nuisance Parameter is not Identified under the Null Hypothesis," Econometrica, 64, 413-430.
——— (2000): "Testing for Structural Change in Conditional Models," Journal of Econometrics, forthcoming.
HÄRDLE, W., AND O. LINTON (1994): "Applied Nonparametric Methods," in Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden. Amsterdam: Elsevier Science, pp. 2295-2339.
IBRAGIMOV, I. A. (1975): "A Note on the Central Limit Theorem for Dependent Random Variables," Theory of Probability and Its Applications, 20, 135-141.
KIM, J., AND D. POLLARD (1990): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.
NEWEY, W. K., AND D. L. MCFADDEN (1994): "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden. New York: Elsevier, pp. 2113-2245.
PELIGRAD, M. (1982): "Invariance Principles for Mixing Sequences of Random Variables," The Annals of Probability, 10, 968-981.
PICARD, D. (1985): "Testing and Estimating Change-Points in Time Series," Advances in Applied Probability, 17, 841-867.
POTTER, S. M. (1995): "A Nonlinear Approach to U.S. GNP," Journal of Applied Econometrics, 10, 109-125.
TONG, H. (1983): Threshold Models in Nonlinear Time Series Analysis: Lecture Notes in Statistics, 21. Berlin: Springer.
——— (1990): Non-Linear Time Series: A Dynamical System Approach. Oxford: Oxford University Press.
YAO, Y.-C. (1987): "Approximating the Distribution of the ML Estimate of the Changepoint in a Sequence of Independent R.V.'s," The Annals of Statistics, 15, 1321-1328.