Estimation/Inference in Binary Choice Models

Qualitative Dependent
Variable Models
 Except for the LPM, estimation of binary

choice models is usually based on
maximum likelihood techniques.
Each observation is treated as if it were a
single draw from a Bernoulli distribution
 JHGLL, p. 44; Greene, p. 1026
 Consider an experiment in which only
two outcomes are possible, 0 or 1
 y, the outcome of the experiment, is
said to have a Bernoulli distribution
 P(y = 1) = p and P(y = 0) = 1 – p
where 0 ≤ p ≤ 1
 The PDF of y can be represented as:
$$f(y \mid p) = \begin{cases} p^{y}(1-p)^{1-y}, & y = 0, 1 \\ 0, & \text{otherwise} \end{cases}$$
where y can be either 0 or 1
 Maximum likelihood estimation of
the
discrete choice model
 Define Pi* as the probability of
observing whatever value of y that
actually occurs for a given observation
$$P_i^{*} = \begin{cases} \Pr(y_i = 1 \mid X_i), & \text{if } y_i = 1 \text{ is observed} \\ 1 - \Pr(y_i = 1 \mid X_i), & \text{if } y_i = 0 \text{ is observed} \end{cases}$$
 Pr(yi=1|Xi)=F(Xiβ)
Pr(yi=0|Xi)=1 − F(Xiβ)
where F(·) is a general CDF. This is the result obtained from the latent variable interpretation.
 The model

From above we represent the success
probability by F(Xβ)
Assume we have independent
“draws”
The joint probability of a sample of size T (a particular combination of 0’s and 1’s) is:
$$\Pr(y_1, y_2, \ldots, y_T \mid X) = \prod_{y_i = 0} \left[1 - F(X_i\beta)\right] \prod_{y_i = 1} F(X_i\beta)$$
The first product is over those obs. with yi = 0 and the second is over those obs. with yi = 1, where F(Xiβ) = P(yi = 1)
Total of T obs: i.e., an observation makes only 1 contribution to the joint PDF
Bernoulli Result:
$$f(y \mid p) = \begin{cases} p^{y}(1-p)^{1-y}, & y = 0, 1 \\ 0, & \text{otherwise} \end{cases}$$
 The total sample likelihood function for
T observations, where F(·) is a general CDF, is:
$$l(\beta \mid X, y) = \prod_{i=1}^{T} F(X_i\beta)^{y_i}\left[1 - F(X_i\beta)\right]^{1 - y_i}$$
 Notice what happens when yi = 1 or 0
 The contribution of an observation to
the sample likelihood function will
depend on whether it has an observed
0 or 1 value
An observation makes only one
contribution to the likelihood
function: F(Xβ) or 1-F(Xβ)
 The total sample log-likelihood function:
$$L = \sum_{i=1}^{T} \left\{ y_i \ln F(X_i\beta) + (1 - y_i) \ln\left[1 - F(X_i\beta)\right] \right\}$$
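The total sample log-likelihood function can be sketched in a few lines of code. This is a minimal illustration in Python, not the MATLAB code referred to in these notes; the function names and the small data set are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic CDF, Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, X, y, cdf):
    """L = sum_i { y_i ln F(X_i b) + (1 - y_i) ln[1 - F(X_i b)] } for a general CDF F."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(b * x for b, x in zip(beta, x_i))  # index X_i * beta
        F = cdf(z)
        # Each observation contributes either ln F or ln(1 - F), never both
        total += y_i * math.log(F) + (1 - y_i) * math.log(1.0 - F)
    return total

# Hypothetical data: an intercept plus one regressor
X = [[1.0, -1.0], [1.0, 0.5], [1.0, 2.0], [1.0, -0.3]]
y = [0, 1, 1, 0]

ll_probit_zero = log_likelihood([0.0, 0.0], X, y, normal_cdf)
ll_logit_zero = log_likelihood([0.0, 0.0], X, y, logistic_cdf)
ll_probit_fit = log_likelihood([0.1, 0.8], X, y, normal_cdf)
```

With β = 0 every observation has F = 0.5 under either CDF, so both log-likelihood values equal T ln(0.5).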
 The FOC for maximization of the total
sample LLF are:
$$\frac{\partial L}{\partial \beta} = \sum_{i=1}^{T} \left[ y_i \frac{f_i}{F_i} - (1 - y_i) \frac{f_i}{1 - F_i} \right] X_i = 0$$
where fi is the PDF ≡ dF/d(Zi) evaluated at Zi = Xiβ
Note what happens when y = 0 or 1
In general, the partial derivative
of a CDF wrt its argument is the
PDF evaluated at that argument’s
value:
$$\frac{\partial\,\mathrm{CDF}(Z)}{\partial Z} = \mathrm{PDF}(Z)$$
With Zi ≡ Xiβ:
$$\frac{\partial\,\mathrm{CDF}(Z_i)}{\partial \beta} = \mathrm{PDF}(Z_i)\, X_i$$
Recall the total sample log-likelihood function:
$$L = \sum_{i=1}^{T} \left\{ y_i \ln F(X_i\beta) + (1 - y_i) \ln\left[1 - F(X_i\beta)\right] \right\}$$
 Choice of a particular functional form


for F(·) generates the empirical model
Remember if the distribution is
symmetric (e.g. normal, logistic) then
1− F(Xβ) = F(−Xβ)
If F(·) is symmetric, define qi ≡ 2yi – 1
When yi = 0 → qi = −1
When yi = 1 → qi = 1
→ you can simplify the above LLF:
$$L = \sum_{i=1}^{T} \ln F(q_i X_i\beta), \qquad q_i \equiv 2y_i - 1$$
With Fi ≡ F(Xiβ) and fi ≡ f(Xiβ), the FOC are:
$$\frac{\partial L}{\partial \beta} = \sum_{i=1}^{T} \left[ y_i \frac{f_i}{F_i} - (1 - y_i) \frac{f_i}{1 - F_i} \right] X_i = 0$$
 The FOC’s are a system of K nonlinear
functions of β!!
 Under most likely conditions this
likelihood function is globally concave
 Ensures uniqueness of the ML
parameter estimates
 Properties of Maximum Likelihood
parameter estimates
 Consistent, asymptotically efficient
and asymptotically normally
distributed around β
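 How can we estimate Σβl, the covariance matrix of the ML estimates βl?
As we derived under the general ML
section:
In general:
$$\Sigma_{\beta_l} = -\left[ E\left( \frac{\partial^2 L}{\partial\beta\,\partial\beta'} \right) \right]^{-1} = \left[ I(\beta) \right]^{-1}$$
Newton-Raphson:
$$\Sigma_{\beta_l} = -\left[ \left. \frac{\partial^2 L}{\partial\beta\,\partial\beta'} \right|_{\beta = \beta_l} \right]^{-1}$$
BHHH:
$$\Sigma_{\beta_l} = \left[ \sum_{i=1}^{T} \left. \frac{\partial L_i}{\partial\beta} \frac{\partial L_i}{\partial\beta'} \right|_{\beta = \beta_l} \right]^{-1}$$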
The NR and BHHH are asymptotically
equivalent but in small samples they
often provide different covariance
estimates for the same model
 The above shows the relationship
between the curvature of the LF and
Σβl.
 The size of the variance is
inversely related to the 2nd
derivative.
 The smaller the derivative the
larger the variance.
Smaller 2nd derivatives
• → flatter LF
• → the harder to find a max
and the less confidence in the
solution
Recall the total sample log-likelihood function based on the Bernoulli result:
$$L(\beta \mid y, X) = \sum_{i=1}^{T} \left\{ y_i \ln F(X_i\beta) + (1 - y_i)\ln\left[1 - F(X_i\beta)\right] \right\}$$
 The above LLF does not assume a
particular CDF functional form
Standard Normal CDF: Probit Model
Logistic CDF: Logit Model
 If RV s is distributed standard normal:
s~N(0,1)
 PDF(s):
$$\phi(s) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{s^2}{2}\right)$$
 CDF(s):
$$\Phi(s) = \int_{-\infty}^{s} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$$
 If RV s is distributed logistic:
s~logistic(0,π²/3)
 PDF(s):
$$\lambda(s) = \frac{\exp(-s)}{\left[1 + \exp(-s)\right]^2}$$
 CDF(s):
$$\Lambda(s) = \frac{\exp(s)}{1 + \exp(s)} = \frac{1}{1 + \exp(-s)}$$
 Remember for both the normal and
logistic distributions we have:
$$\frac{\partial\,\mathrm{CDF}(s)}{\partial s} = \mathrm{PDF}(s)$$
 Let’s compare the logistic vs. standard
normal distributions
[Figures: general functional forms of the PDF’s and CDF’s]
 Probit Model CDF:
Φ(Xβ)
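With φ(·) the std. normal PDF, via the chain rule:
$$\frac{d\Phi(X\beta)}{dX_i} = \phi(X\beta)\,\beta_i$$
since, with Z ≡ Xβ:
$$\frac{\partial \Phi(Z)}{\partial X_i} = \frac{\partial \Phi(Z)}{\partial Z}\frac{\partial Z}{\partial X_i} = \phi(Z)\,\beta_i$$
In general:
$$\frac{\partial \phi(s)}{\partial s} = -s\,\phi(s)$$
so that:
$$\frac{d\phi(X\beta)}{dX_i} = -\beta_i\,(X\beta)\,\phi(X\beta)$$
By symmetry:
$$\Phi(-X\beta) = 1 - \Phi(X\beta)$$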
 Logit Model CDF:
$$\Lambda(X\beta) = \frac{e^{X\beta}}{1 + e^{X\beta}} = \frac{1}{1 + e^{-X\beta}}$$
 PDF:
$$\lambda(X\beta) = \frac{e^{-X\beta}}{\left(1 + e^{-X\beta}\right)^2}$$
$$\frac{d\Lambda(X\beta)}{dX_i} = \beta_i\,\lambda(X\beta) = \beta_i\,\frac{e^{-X\beta}}{\left(1 + e^{-X\beta}\right)^2}$$
$$\frac{d\lambda(X\beta)}{dX_i} = \beta_i\,\lambda(X\beta)\left[1 - 2\Lambda(X\beta)\right]$$
Note that λ(Xβ) = Λ(Xβ)[1 − Λ(Xβ)], since:
$$1 - \Lambda(X\beta) = 1 - \frac{1}{1 + e^{-X\beta}} = \frac{e^{-X\beta}}{1 + e^{-X\beta}}$$
 Under the probit model we have:
y*=Xβ+ε
εi~N(0,1) → Pr(yi = 1) = Φ(Xiβ)
 Probit total sample log-likelihood
function, where Φ(·) is the std. normal CDF:
$$L(\beta \mid y, X;\ \varepsilon \sim N(0,1)) = \sum_{i=1}^{T}\left\{ y_i \ln \Phi(X_i\beta) + (1-y_i)\ln\left[1 - \Phi(X_i\beta)\right] \right\}$$
 The FOC for maximizing L(β|y,X) are
$$\frac{\partial L}{\partial\beta} = \sum_{y_i=0} \frac{-\phi(X_i\beta)}{1-\Phi(X_i\beta)} X_i + \sum_{y_i=1} \frac{\phi(X_i\beta)}{\Phi(X_i\beta)} X_i = 0$$
With
$$\lambda_{0i} \equiv \frac{-\phi(X_i\beta)}{1-\Phi(X_i\beta)}, \qquad \lambda_{1i} \equiv \frac{\phi(X_i\beta)}{\Phi(X_i\beta)}$$
this becomes (Eq. 19.2.19 in JHGLL):
$$\frac{\partial L}{\partial\beta} = \sum_{y_i=0}\lambda_{0i}X_i + \sum_{y_i=1}\lambda_{1i}X_i = 0$$
In general, φ(−Xβ) = φ(Xβ) and 1 − Φ(Xβ) = Φ(−Xβ), so with qi ≡ (2yi − 1):
$$\frac{\partial L}{\partial\beta} = \sum_{i=1}^{T} \frac{q_i\,\phi(q_i X_i\beta)}{\Phi(q_i X_i\beta)} X_i = 0$$
 We can further simplify the FOC’s for the
probit model to be the following:
$$\frac{\partial L}{\partial\beta} = \sum_{i=1}^{T} \frac{q_i\,\phi(q_i X_i\beta)}{\Phi(q_i X_i\beta)} X_i = \sum_{i=1}^{T} \lambda_i X_i = 0$$
where:
$$\lambda_i \equiv \frac{q_i\,\phi(q_i X_i\beta)}{\Phi(q_i X_i\beta)}, \qquad q_i \equiv 2y_i - 1$$
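The simplified probit FOC can be checked numerically. The sketch below, in Python rather than the MATLAB used for these notes, computes the score Σ λi Xi and compares it with a finite-difference derivative of the log-likelihood; the data, parameter values and function names are hypothetical.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(beta, X, y):
    """L = sum_i ln Phi(q_i X_i beta), with q_i = 2 y_i - 1."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        q = 2 * y_i - 1
        z = sum(b * x for b, x in zip(beta, x_i))
        total += math.log(Phi(q * z))
    return total

def probit_score(beta, X, y):
    """dL/dbeta = sum_i lambda_i X_i, lambda_i = q_i phi(q_i z_i) / Phi(q_i z_i)."""
    grad = [0.0] * len(beta)
    for x_i, y_i in zip(X, y):
        q = 2 * y_i - 1
        z = sum(b * x for b, x in zip(beta, x_i))
        lam = q * phi(q * z) / Phi(q * z)
        for k, x in enumerate(x_i):
            grad[k] += lam * x
    return grad

def numeric_score(beta, X, y, h=1e-6):
    """Central finite-difference approximation of dL/dbeta."""
    grad = []
    for k in range(len(beta)):
        up, down = list(beta), list(beta)
        up[k] += h
        down[k] -= h
        grad.append((probit_loglik(up, X, y) - probit_loglik(down, X, y)) / (2.0 * h))
    return grad

# Hypothetical data: intercept plus one regressor
X = [[1.0, -1.0], [1.0, 0.5], [1.0, 2.0], [1.0, -0.3], [1.0, 1.2]]
y = [0, 1, 1, 0, 0]
beta = [0.2, 0.5]
analytic = probit_score(beta, X, y)
numeric = numeric_score(beta, X, y)
```

The two gradients should agree to several decimal places at any β; at the ML estimates both are zero.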
 One can show that the Hessian for the
probit model is:
(Eq. 19.2.21 in JHGLL)
$$H = \frac{\partial^2 L}{\partial\beta\,\partial\beta'} = -\sum_{i=1}^{T} \phi(X_i\beta)\left[ y_i\frac{\phi(X_i\beta) + X_i\beta\,\Phi(X_i\beta)}{\Phi(X_i\beta)^2} + (1-y_i)\frac{\phi(X_i\beta) - X_i\beta\left[1-\Phi(X_i\beta)\right]}{\left[1-\Phi(X_i\beta)\right]^2} \right] X_i'X_i$$
$$= -\sum_{i=1}^{T} \lambda_i\left(\lambda_i + X_i\beta\right) X_i'X_i$$
where:
$$\lambda_i \equiv \frac{q_i\,\phi(q_i X_i\beta)}{\Phi(q_i X_i\beta)}, \qquad q_i \equiv 2y_i - 1$$
 One can show that the above Hessian is

negative definite for all values of β
 → the LLF is globally concave
The asymptotic covariance matrix for the
ML estimator of β can be obtained from:
 The inverse of the Hessian evaluated at
βML (NR method):
$$\Sigma_{NR} = \left[ \sum_{i=1}^{T} \lambda_i\left(\lambda_i + X_i\beta\right) X_i'X_i \right]^{-1}$$
 BHHH estimator:
$$\Sigma_{B} = \left[ \sum_{i=1}^{T} \lambda_i^2\, X_i'X_i \right]^{-1}$$
 Based on the inverse of the expected
value of the Hessian evaluated at βML
$$E[H] = E\left[ -\sum_{i=1}^{T} \lambda_i\left(\lambda_i + X_i\beta\right) X_i'X_i \right]$$
Note there are yi’s in the Hessian, through:
$$\lambda_i \equiv \frac{q_i\,\phi(q_i X_i\beta)}{\Phi(q_i X_i\beta)}, \qquad q_i \equiv 2y_i - 1$$
 Under the logit model we have
y*=Xβ+ε
εi ~ logistic →
$$\Pr(y_i^* > 0) = \Pr(y_i = 1 \mid X_i) = \frac{e^{X_i\beta}}{1 + e^{X_i\beta}} = \Lambda(X_i\beta)$$
where Λ(·) is the logistic CDF
 Logit sample log-likelihood function (similar to probit):
$$L(\beta \mid y, X;\ \varepsilon \sim \text{logistic}) = \sum_{i=1}^{T}\left\{ y_i \ln \Lambda(X_i\beta) + (1-y_i)\ln\left[1 - \Lambda(X_i\beta)\right] \right\}$$
where:
$$1 - \Lambda(X\beta) = \frac{e^{-X\beta}}{1 + e^{-X\beta}}$$
 The FOC for maximizing the logit L are
$$\frac{\partial L}{\partial\beta} = \sum_{i=1}^{T} \left(y_i - \Lambda_i\right) X_i = 0$$
 One can show that the Hessian for the
logistic model is:
$$H = \frac{\partial^2 L}{\partial\beta\,\partial\beta'} = -\sum_{i=1}^{T} \Lambda(X_i\beta)\left[1 - \Lambda(X_i\beta)\right] X_i'X_i$$
 Note that the LLF Hessian does not
involve the RV yi unlike the Hessian
for the standard normal LLF
 One can show that the Hessian is
always negative definite
 →The LLF is globally concave
 Similar to the probit model, the
asymptotic covariance matrix for the ML
estimator of β can be obtained from our
ML-based methods:
 Based on the inverse of the Hessian
evaluated at βML
$$\Sigma_{NR} = \left[\sum_{i=1}^{T} \Lambda(X_i\beta)\left[1-\Lambda(X_i\beta)\right]X_i'X_i\right]^{-1}$$
 BHHH estimator:
$$\Sigma_{B} = \left[\sum_{i=1}^{T}\left[y_i - \Lambda(X_i\beta)\right]^2 X_i'X_i\right]^{-1}$$
 Since y is not in the Hessian,
ΣNR = ΣGN
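The logit FOC, Hessian and NR covariance estimate fit together in a short Newton-Raphson loop. The following is a minimal sketch in Python, assuming a model with an intercept and a single regressor so that the 2×2 inverse can be written out by hand; the data and function names are hypothetical and this is not the MATLAB code used for these notes.

```python
import math

def logistic(z):
    """Logistic CDF, Lambda(z)."""
    return 1.0 / (1.0 + math.exp(-z))

def score_and_information(b0, b1, x, y):
    """Score sum (y_i - Lambda_i) X_i and -H = sum Lambda_i (1 - Lambda_i) X_i'X_i."""
    g0 = g1 = 0.0
    a00 = a01 = a11 = 0.0
    for x_i, y_i in zip(x, y):
        p = logistic(b0 + b1 * x_i)
        g0 += (y_i - p)
        g1 += (y_i - p) * x_i
        w = p * (1.0 - p)
        a00 += w
        a01 += w * x_i
        a11 += w * x_i * x_i
    return (g0, g1), (a00, a01, a11)

def invert_2x2(a00, a01, a11):
    """Inverse of the symmetric matrix [[a00, a01], [a01, a11]]."""
    det = a00 * a11 - a01 * a01
    return (a11 / det, -a01 / det, a00 / det)

def logit_newton_raphson(x, y, iterations=25):
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        (g0, g1), info = score_and_information(b0, b1, x, y)
        c00, c01, c11 = invert_2x2(*info)
        # beta_new = beta - H^{-1} g = beta + (-H)^{-1} g
        b0 += c00 * g0 + c01 * g1
        b1 += c01 * g0 + c11 * g1
    score, info = score_and_information(b0, b1, x, y)
    cov = invert_2x2(*info)  # NR covariance estimate at the final values
    return (b0, b1), score, cov

# Hypothetical data with overlapping 0's and 1's
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
beta_hat, score, cov = logit_newton_raphson(x, y)
std_errors = (math.sqrt(cov[0]), math.sqrt(cov[2]))
```

Because the logit LLF is globally concave, the iterations settle on the unique maximum whenever the 0’s and 1’s are not perfectly separated by the regressor.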
$$L(\beta \mid y, X) = \sum_{i=1}^{T} \left\{ y_i \ln F(X_i\beta) + (1 - y_i)\ln\left[1 - F(X_i\beta)\right] \right\}$$
 In summary, probit and logit functional

forms are used to implement the above
log-likelihood based-models
In terms of hypothesis testing:
 Asymptotic characteristic of
parameter estimates obtained from
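both logit and probit models:
$$\hat\beta \sim N\left(\beta,\ \operatorname{cov}(\hat\beta)\right), \qquad \operatorname{cov}(\hat\beta) = -\left[\frac{\partial^2 L}{\partial\beta\,\partial\beta'}\right]^{-1}$$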
 Functional form varies and depends
on whether estimating a Logit or
Probit model
 Statistical significance of single
parameter
$$z = \frac{\hat\beta_k - \beta_k^0}{\sqrt{V(\hat\beta_k)}} \sim N(0,1)$$
 Tests of general hypotheses
H0: Rβ = r (J indep. hypotheses)
H1: Rβ ≠ r where $\hat\beta \sim N\left(\beta, \operatorname{cov}(\hat\beta)\right)$
Remember that the above implies:
$$(R\hat\beta - r) \sim N\left(R\beta - r,\ R\operatorname{cov}(\hat\beta)R'\right)$$
where β is the true, unknown value
 Under H0
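$$\lambda_w = (R\hat\beta - r)'\left[R\operatorname{cov}(\hat\beta)R'\right]^{-1}(R\hat\beta - r) \sim \chi^2_J$$
where R = ∂H0/∂β and R cov(β̂)R′ = Cov(Rβ̂)
If λw exceeds the χ²J critical value → reject H0
λw/J ≈ FJ,T-K
 Likelihood Ratio Test
$$\lambda_{LR} = 2\left[L(\hat\beta) - L(\beta_R)\right] \sim \chi^2_J$$
where β̂ are the unrestricted coefficients and βR are the restricted coefficients
Recall the Wald statistic:
$$\lambda_w = (R\hat\beta - r)'\left[R\operatorname{cov}(\hat\beta)R'\right]^{-1}(R\hat\beta - r)$$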
 Lets test joint hypothesis that a subset of
coefficients (say the last M) are zero
 R=[0(K-M)|IM]
 r=0M
$$W = \hat\beta_M'\,\Sigma_{\beta_M}^{-1}\,\hat\beta_M$$
where β̂M is an (Mx1) matrix and ΣβM = Rcov(β)R′ is an (MxM) submatrix
 ΣβM is created first and then inverted
 Joint test of all slope coefficients being 0

 Equivalent to the test that the model
explains significantly more of the
variability of the dependent variable
than the naïve model of no exogenous
variable impacts
 For both logit and probit models the
restricted model can be represented as:
L(βR) = T[P*ln(P*)+(1-P*)ln(1-P*)]
where P* is the sample proportion of
observations that have y = 1
 No need to actually estimate a
restricted model
As Greene notes (p. 704), don’t use the
LR test to test the Probit versus Logit
model → no parameter restrictions
available to go from one to the other
 As noted above:
$$\lambda_{LR} = 2\left[L(\hat\beta) - L(\beta_R)\right] \sim \chi^2_J$$
 Similar to the overall equation F-statistic in the CRM, you should
include the χ2 test statistic for jointly 0
exogenous variable coefficients as you
do not actually need to estimate a 2nd
model
 L(βR) = T[P*ln(P*) + (1-P*)ln(1-P*)]
where P* is the sample proportion with y = 1
 Whatever the assumed error term
distribution, the parameters of the
discrete choice model are not the
marginal effects
 In terms of the observed dependent (0/1)
variable with the general CDF [F]:
E(y|X) = 0[1-F(Xβ)] + 1F(Xβ) = F(Xβ)
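(the two possible values of y are 0 and 1)
 In general, with Z ≡ Xβ, we have:
$$\frac{\partial E(y \mid X)}{\partial X} = \frac{\partial F(Z)}{\partial Z}\frac{\partial Z}{\partial X} = \frac{\partial F(X\beta)}{\partial (X\beta)}\,\beta = f(X\beta)\,\beta$$
where f(·) is the PDF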
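 For the standard normal distribution
$$\frac{\partial E(y \mid X)}{\partial X} = \frac{\partial \Phi(X\beta)}{\partial (X\beta)}\,\beta = \phi(X\beta)\,\beta$$
 For the logistic distribution
$$\frac{d\Lambda(X\beta)}{d(X\beta)} = \Lambda(X\beta)\left[1 - \Lambda(X\beta)\right] > 0 \quad \text{(the PDF)}$$
$$\frac{\partial E(y \mid X)}{\partial X} = \Lambda(X\beta)\left[1 - \Lambda(X\beta)\right]\beta$$
Greene, p. 775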
 The above implies the marginal effect
 Has the same shape as the associated
PDF
 Inflated or deflated by βi
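The marginal effect formulas for the probit and logit models can be sketched as follows. This is an illustrative Python example with hypothetical coefficients and data, showing both evaluation at the sample means and the sample average of the individual effects; the entry for the intercept is carried along only for convenience and has no marginal effect interpretation.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Lambda(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_effects(beta, x):
    """phi(X beta) * beta, evaluated at the point x."""
    z = sum(b * v for b, v in zip(beta, x))
    return [phi(z) * b for b in beta]

def logit_effects(beta, x):
    """Lambda(X beta) [1 - Lambda(X beta)] * beta, evaluated at the point x."""
    p = Lambda(sum(b * v for b, v in zip(beta, x)))
    return [p * (1.0 - p) * b for b in beta]

def effects_at_means(beta, X, effect_fn):
    """Evaluate the marginal effects at the sample means of the data."""
    T = len(X)
    x_bar = [sum(row[k] for row in X) / T for k in range(len(beta))]
    return effect_fn(beta, x_bar)

def average_effects(beta, X, effect_fn):
    """Evaluate at every observation and average the individual effects."""
    T = len(X)
    per_obs = [effect_fn(beta, row) for row in X]
    return [sum(e[k] for e in per_obs) / T for k in range(len(beta))]

# Hypothetical coefficients and data (intercept plus one regressor)
beta = [0.3, -0.8]
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 2.0]]
me_means = effects_at_means(beta, X, probit_effects)
me_avg = average_effects(beta, X, probit_effects)
logit_at_zero = logit_effects([0.0, 2.0], [1.0, 0.0])
probit_at_zero = probit_effects([0.0, 2.0], [1.0, 0.0])
```

At Xβ = 0 the logit PDF is 0.25 and the standard normal PDF is about 0.399, so the same coefficient implies different marginal effects under the two models.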
 One can evaluate the above expressions
 At sample means of your data
 At every observation and use sample
average of individual marginal effects
 For small/moderate sized samples you
may get different marginal effects
depending on method used.
 Train(2003) has a good discussion of
predictions and the pros/cons of each
method
 When one has a dummy variable (D)
 In general the derivative is with
respect to a small continuous change
 →It is not appropriate to apply when
analyzing the effect of a dummy variable
 For discrete exogenous variables the
appropriate marginal effect (ME) is:
$$ME = \frac{\Delta E(y)}{\Delta D} = \Pr(y = 1 \mid \bar{X}_d, D = 1) - \Pr(y = 1 \mid \bar{X}_d, D = 0)$$
where X̄d is the mean of the other exogenous variables and Pr(y = 1)= F(Xβ|X*,D,β)
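For a dummy variable the discrete change in the predicted probability replaces the derivative. A minimal Python sketch under a probit specification, with hypothetical coefficient values and a hypothetical helper name:

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def discrete_change(beta, x_bar, d_index, cdf):
    """Pr(y=1 | x_bar, D=1) - Pr(y=1 | x_bar, D=0); x_bar holds the means of the other variables."""
    x_on, x_off = list(x_bar), list(x_bar)
    x_on[d_index] = 1.0   # dummy switched on
    x_off[d_index] = 0.0  # dummy switched off
    z_on = sum(b * v for b, v in zip(beta, x_on))
    z_off = sum(b * v for b, v in zip(beta, x_off))
    return cdf(z_on) - cdf(z_off)

# Hypothetical example: intercept 0, dummy coefficient 1, no other regressors
me_dummy = discrete_change([0.0, 1.0], [1.0, 0.0], 1, Phi)
# A zero coefficient on the dummy implies no change in probability
me_none = discrete_change([0.4, 0.0], [1.0, 0.0], 1, Phi)
```

In the first example the effect is Φ(1) − Φ(0), roughly 0.341.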
 Under logit and probit specifications of
the discrete choice model:
 predicted probabilities and estimated
marginal effects (for either continuous
or discrete cases) are nonlinear
functions of parameters.
 To compute standard errors one can
use the linear approximation (delta)
method (Greene, p.68)
 Can use numerical methods or use
analytical results when implementing
the delta method
 For example, the variance of the
predicted probability of undertaking the
activity:
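$$\text{Asy.Var}\left[\hat F(X^*\hat\beta)\right] = \left[\frac{\partial \hat F(X^*\hat\beta)}{\partial\hat\beta}\right]\Sigma_\beta\left[\frac{\partial \hat F(X^*\hat\beta)}{\partial\hat\beta}\right]'$$
where X* is a point of evaluation, F̂(X*β̂) is the predicted probability, β̂ are the estimated coeff. and Σβ = Asy.Var(β)
with, via the chain rule:
$$\frac{\partial \hat F(X^*\hat\beta)}{\partial\hat\beta} = \hat f(X^*\hat\beta)\,X^*$$
where f̂(X*β̂) is the predicted PDF, a (1 x 1) scalar
$$\rightarrow\ \text{Asy.Var}\left[\hat F(X^*\hat\beta)\right] = \left[\hat f(X^*\hat\beta)\right]^2 X^*\,\Sigma_\beta\,X^{*\prime}$$
Note that the above variance depends
on the point of evaluation, X*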
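The delta method variance of a predicted probability reduces to a scalar times a quadratic form. A minimal Python sketch for the probit case; the coefficient vector, covariance matrix and evaluation point below are hypothetical values chosen for illustration.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def predicted_prob_variance(beta, cov, x_star):
    """Asy.Var[F(X* b)] = f(X* b)^2 * X* Sigma X*' for the probit model."""
    z = sum(b * v for b, v in zip(beta, x_star))
    f = phi(z)  # predicted PDF, a scalar
    K = len(beta)
    quad = sum(x_star[j] * cov[j][k] * x_star[k] for j in range(K) for k in range(K))
    return f * f * quad

# Hypothetical estimates, covariance matrix and point of evaluation
beta_hat = [0.0, 0.0]
cov_hat = [[0.04, 0.0], [0.0, 0.01]]
x_star = [1.0, 2.0]
var_F = predicted_prob_variance(beta_hat, cov_hat, x_star)
se_F = math.sqrt(var_F)
```

Here the quadratic form is 0.04 + 4(0.01) = 0.08 and φ(0)² = 1/(2π), so the variance is 0.08/(2π); a different X* changes both pieces.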
 What are the variances of changes
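associated with dummy variables?
$$\Delta\hat F = (\hat F \mid D = 1) - (\hat F \mid D = 0)$$
where F̂ is the predicted probability
$$\text{Asy.Var}\left[\Delta\hat F\right] = \left[\frac{\partial\Delta\hat F}{\partial\hat\beta}\right]\Sigma_\beta\left[\frac{\partial\Delta\hat F}{\partial\hat\beta}\right]'$$
$$\frac{\partial\Delta\hat F}{\partial\hat\beta} = \hat f_1\,X^*_{D=1} - \hat f_0\,X^*_{D=0}, \qquad \hat f_1 = \hat f(X^*_{D=1}\hat\beta), \quad \hat f_0 = \hat f(X^*_{D=0}\hat\beta)$$
$$X^*_{D=1} = \left[\bar X,\ D = 1\right], \qquad X^*_{D=0} = \left[\bar X,\ D = 0\right]$$
where X̄ is the mean of the variables except D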
 What is the variance of the marginal
effects of a change in X on the
probability of occurrence where X is
continuous?
 i.e., is the Prob. of event related to X?
 Lets define ∂F(Xβ)/∂X as γ where
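$$\hat\gamma = \hat f(X\hat\beta)\,\hat\beta$$
where f̂(·) is the PDF and γ̂ is the marginal impact of X
 The variance of the above marginal
effects can be obtained from:
$$\text{Asy.Var}\left[\hat\gamma\right] = \left[\frac{\partial\hat\gamma}{\partial\hat\beta'}\right]\Sigma_\beta\left[\frac{\partial\hat\gamma}{\partial\hat\beta'}\right]'$$
We are taking the derivative of
the marginal effect wrt the β’s
What do these derivatives look like?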
$$\hat\gamma = \frac{\partial \hat F(X\hat\beta)}{\partial X} = \hat f(X\hat\beta)\,\hat\beta$$
 With z ≡ X*β, where X* is a point of evaluation,
we have via the chain rule:
$$\frac{\partial\hat\gamma}{\partial\hat\beta'} = \hat f(z)\frac{\partial\hat\beta}{\partial\hat\beta'} + \frac{d\hat f(z)}{dz}\,\hat\beta\,\frac{\partial z}{\partial\hat\beta'} = \hat f(z)\,I_K + \frac{d\hat f(z)}{dz}\,\hat\beta X^*$$
Note that at X*, f(z) is a scalar and ∂z/∂β̂′ = X* is a vector
The functional form of df(z)/dz will vary
across error specification,
i.e., probit or logit model
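Recall, with γ the marginal effect on Pr(y=1) of a change in X:
$$\frac{\partial\hat\gamma}{\partial\hat\beta'} = \hat f(X^*\hat\beta)\,I_K + \frac{d\hat f(z)}{dz}\,\hat\beta X^*, \qquad \text{Asy.Var}\left[\hat\gamma\right] = \left[\frac{\partial\hat\gamma}{\partial\hat\beta'}\right]\Sigma_\beta\left[\frac{\partial\hat\gamma}{\partial\hat\beta'}\right]'$$
 For the probit model we have:
z ≡ X*β
df/dz = dφ/dz = −zφ(z), where φ(·) is the standard normal PDF
$$\text{Asy.Var}\left[\hat\gamma\right] = \left[\phi(X^*\hat\beta)\,I_K - \phi(X^*\hat\beta)\,(X^*\hat\beta)\,\hat\beta X^*\right]\Sigma_\beta\left[\phi(X^*\hat\beta)\,I_K - \phi(X^*\hat\beta)\,(X^*\hat\beta)\,\hat\beta X^*\right]'$$
$$= \left[\phi(X^*\hat\beta)\right]^2\left[I_K - (X^*\hat\beta)\,\hat\beta X^*\right]\Sigma_\beta\left[I_K - (X^*\hat\beta)\,\hat\beta X^*\right]'$$
where X*β̂ is a scalar
With K parameters, the above
covariance matrix will be (KxK)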
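Recall, with γ the marginal effect:
$$\frac{\partial\hat\gamma}{\partial\hat\beta'} = \hat f(X^*\hat\beta)\,I_K + \frac{d\hat f(z)}{dz}\,\hat\beta X^*, \qquad \text{Asy.Var}\left[\hat\gamma\right] = \left[\frac{\partial\hat\gamma}{\partial\hat\beta'}\right]\Sigma_\beta\left[\frac{\partial\hat\gamma}{\partial\hat\beta'}\right]'$$
 For the logit model we have:
z ≡ X* β
$$\hat\Lambda(z) = \frac{1}{1 + \exp(-z)}, \qquad \hat f(z) = \hat\Lambda(z)\left[1 - \hat\Lambda(z)\right]$$
$$\frac{d\hat f}{dz} = \left[1 - 2\hat\Lambda(z)\right]\frac{d\hat\Lambda(z)}{dz} = \left[1 - 2\hat\Lambda(z)\right]\hat\Lambda(z)\left[1 - \hat\Lambda(z)\right]$$
$$\text{Asy.Var}\left[\hat\gamma\right] = \left\{\hat\Lambda(z)\left[1 - \hat\Lambda(z)\right]\right\}^2\left[I_K + \left(1 - 2\hat\Lambda(z)\right)\hat\beta X^*\right]\Sigma_\beta\left[I_K + \left(1 - 2\hat\Lambda(z)\right)\hat\beta X^*\right]'$$
where the leading term is f(z)²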
 Goodness-of-Fit measures for discrete
choice models
 Dependent variable is not continuous
so the use of R2 not appropriate
 Should always report:
 The LLF value for the full model
 The LLF value with only the
intercept which can be calculated
directly from the data
L(βR) = T[P ln(P) + (1-P)ln(1-P)]
where P is sample proportion with
y = 1 = average value of y
 →you only have to run 1 model to
obtain both LLF values
 Likelihood Ratio Index (Pseudo R2 Value)
$$LRI = 1 - \frac{L_U}{L_0}$$
where LU is the unrestricted L-value and L0 is the L-value when all exogenous variable coefficients are set to 0 (LU is less negative than L0)
 Bounded between 0 & 1
 If all slope coefficients are indeed 0
→ LU=L0→LRI=0
 No way to make LRI=1 although one
can come close.
Estimated Φi(Xiβ) = 1 when y=1
Φi(Xiβ) = 0 when yi = 0
→LU = 0 (perfect fit) → LRI=1
 Again, you only need to estimate the
original model to obtain both LU and L0:
L0 = T[P ln(P) + (1-P)ln(1-P)]
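Both ingredients of the LRI can be computed without a second estimation. A minimal Python sketch; the sample size, number of 1’s and unrestricted LLF value are hypothetical numbers chosen for illustration.

```python
import math

def restricted_llf(T, n_ones):
    """L0 = T [P ln(P) + (1 - P) ln(1 - P)], P = sample proportion with y = 1."""
    P = n_ones / T
    return T * (P * math.log(P) + (1.0 - P) * math.log(1.0 - P))

def likelihood_ratio_index(ll_u, ll_0):
    """LRI = 1 - LU / L0."""
    return 1.0 - ll_u / ll_0

# Hypothetical sample: 100 observations, 57 with y = 1, and an
# unrestricted LLF of -60.0 from some estimated model
ll_0 = restricted_llf(100, 57)
lri = likelihood_ratio_index(-60.0, ll_0)
lri_no_fit = likelihood_ratio_index(ll_0, ll_0)
```

When the slope coefficients add nothing, LU = L0 and the index is exactly zero, as in the last line.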
 Likelihood Ratio Index (Pseudo R2 Value)
 The value has no natural interpretation
with a value between 0 and 1 like R2
 However when comparing 2 models
estimated with the same data and with
same 0/1 choice (e.g., L0 the same
under both) it is usually valid to say
that the model with the higher LRI
value is preferred
 i.e., a higher value of L(•) is preferred
 →2 models estimated using non-identical samples or w/different
alternatives can’t be compared using
LRI
 Ben-Akiva and Lerman(1985) define a
measure of explanatory power based on
the probability of correct prediction:
$$R^2_{BL} = \frac{1}{T}\sum_{i=1}^{T}\left[y_i\hat F_i + (1 - y_i)\left(1 - \hat F_i\right)\right]$$
where F̂i is the predicted prob. of occurrence
 One problem is that for unbalanced
data, less frequent outcomes are
usually predicted poorly which may
not be readily apparent in the above
Cramer (1999) has suggested a measure
that addresses the above shortcoming:
$$\lambda_C = \left(\overline{\hat F} \mid y_i = 1\right) - \left(\overline{\hat F} \mid y_i = 0\right) = \left(\overline{1 - \hat F} \mid y_i = 0\right) - \left(\overline{1 - \hat F} \mid y_i = 1\right)$$
where the bars denote avg. pred. probabilities; higher values are better
 Penalizes incorrect prediction
 Given conditional means, not impacted
by disproportionate sample sizes
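The two prediction-based measures can be computed directly from the observed outcomes and predicted probabilities. A minimal Python sketch with hypothetical data and function names:

```python
def ben_akiva_lerman(y, F_hat):
    """Average predicted probability of the outcome that actually occurred."""
    T = len(y)
    return sum(y_i * F_i + (1 - y_i) * (1.0 - F_i) for y_i, F_i in zip(y, F_hat)) / T

def cramer_lambda(y, F_hat):
    """Mean predicted probability among the y = 1 obs. minus the mean among the y = 0 obs."""
    ones = [F_i for y_i, F_i in zip(y, F_hat) if y_i == 1]
    zeros = [F_i for y_i, F_i in zip(y, F_hat) if y_i == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

# Hypothetical outcomes and predicted probabilities
y = [1, 1, 0, 0]
F_hat = [0.8, 0.6, 0.3, 0.1]
r2_bl = ben_akiva_lerman(y, F_hat)
lam_c = cramer_lambda(y, F_hat)
```

Because Cramer’s measure uses conditional means, adding more observations of one outcome with the same average prediction leaves it unchanged.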
 Contingency Table used to summarize

results
 Categorizes hits/misses via the
following prediction rule
$$\hat y_i = \begin{cases} 1 & \text{if } \hat F_i > F^*, \text{ some critical value} \\ 0 & \text{otherwise} \end{cases}$$
 F* usually set to 0.5
Note, under the naïve model (everyone
has predicted value of 1)
 One always predicts 100P percent of
observations correctly where P is the
sample proportion with y=1
 → naïve predictor never has a zero fit
 If the sample is unbalanced (e.g. many
more 0’s or 1’s) you may not predict
either a 1 or 0 using the above rule, ŷi = 1 if F̂i > F*
 Contingency Table
                      Observed Values
Predicted Values      y=0      y=1
Fi < F*               TA       TB
Fi ≥ F*               TC       TD
TA+TB+TC+TD=T
Correctly “predicted”: TA and TD
You do not need to use 0.5 for F*
May want to set F* relatively high (e.g. 0.75)
 Adjusted Count R2 (Long)
 With a binary choice (0/1) model you
will always have a “good” prediction
as it is possible to predict at least 50%
of the cases by choosing the outcome
category w/the largest % of observed
cases (0’s or 1’s)
For example, 57% of sample is in
the paid labor force
→ if your model predicts all
individuals are working you will be
correct 57% of the time
 The Adjusted Count R2 controls for
this
 Adjusted Count R2
$$R^2_{AdjCount} = \frac{\sum_{j=1}^{NC} n_{jj} - \operatorname{Max}(N_{r+})}{T - \operatorname{Max}(N_{r+})}$$
where Σ njj is the no. of correct guesses
 NC = number of choices
 Nr+= marginal count of contingency
table for rth row (e.g., sum of rth row)
 Max(Nr+) ≡ maximum Nr+ value
 njj= count with the jth row and column
(e.g., correct prediction)
 R2AdjCount = proportion of correct
“guesses” beyond the number that
would be correctly guessed by choosing
the largest marginal, Max(Nr+) (e.g. predicting
everyone works)
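The contingency table and the Adjusted Count R2 can be sketched together. In this Python illustration the data are hypothetical, and the largest marginal is taken to be the count of the most frequent observed outcome, which is an assumption about how the rows of the table are laid out.

```python
def contingency_table(y, F_hat, cutoff=0.5):
    """Counts keyed by (predicted, observed), using y_hat = 1 if F_hat > cutoff."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for y_i, F_i in zip(y, F_hat):
        predicted = 1 if F_i > cutoff else 0
        table[(predicted, y_i)] += 1
    return table

def adjusted_count_r2(table):
    """(correct guesses - largest marginal) / (T - largest marginal)."""
    T = sum(table.values())
    correct = table[(0, 0)] + table[(1, 1)]
    observed_zeros = table[(0, 0)] + table[(1, 0)]
    observed_ones = table[(0, 1)] + table[(1, 1)]
    largest = max(observed_zeros, observed_ones)  # assumed: marginals of observed outcomes
    return (correct - largest) / (T - largest)

# Hypothetical outcomes (6 ones, 4 zeros) and predicted probabilities
y = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
F_hat = [0.9, 0.8, 0.4, 0.2, 0.6, 0.7, 0.3, 0.55, 0.65, 0.1]
table = contingency_table(y, F_hat)
count_r2 = (table[(0, 0)] + table[(1, 1)]) / len(y)
adj_r2 = adjusted_count_r2(table)
```

Here 8 of 10 cases are predicted correctly, but predicting everyone as a 1 already gets 6 right, so the adjusted measure is (8 − 6)/(10 − 6).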
 Akaike’s Information Criterion (AIC)
(ah-kah-ee-kay’s)
$$AIC = \frac{-2L_U + 2K}{T}$$
where K is the number of RHS variables including the intercept
 -2LU ranges from 0 to +∞ w/smaller
values indicating a better fitting model
Remember LU is ≤ 0 (i.e., sum of ln(Prob)):
$$L(\beta \mid y, X) = \sum_{i=1}^{T} \left\{ y_i \ln F(X_i\beta) + (1 - y_i)\ln\left[1 - F(X_i\beta)\right] \right\}$$
 As the number of parameters ↑
−2LU becomes smaller (better fit)
2K is added as a penalty for ↑ number
of parameters
 Since the number of obs. impacts the total sample
log-likelihood function value, LU is
divided by T to obtain a per obs. value
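The per-observation AIC is a one-line calculation. A minimal Python sketch comparing two hypothetical specifications estimated on the same sample; all numbers are invented for illustration.

```python
def aic_per_obs(ll_u, K, T):
    """AIC = (-2 LU + 2K) / T, where K counts the RHS variables including the intercept."""
    return (-2.0 * ll_u + 2.0 * K) / T

# Hypothetical models: B fits slightly better but uses two more parameters
aic_a = aic_per_obs(-60.0, 3, 100)
aic_b = aic_per_obs(-59.5, 5, 100)
preferred = "A" if aic_a < aic_b else "B"
```

The improvement in −2LU from model B (1.0) is smaller than the added penalty (4.0), so the smaller model has the smaller AIC.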
 Akaike’s Information Criterion (AIC)
Used to compare models with different
specifications
Used to compare models with different
sample sizes (this is why we divide by T)
Compare non-nested models that
cannot be evaluated with the LR test
In general, the model with the smaller
AIC value is considered to be the
better fitting model specification
Remember, model “fit” is only one
criterion in evaluation
e.g., individual marginal effect
results
 An example discrete choice problem
Choice of Auto or Bus for commuting
Net utility obtained from using an auto
to commute
Determined by the difference in
commute time (TD) between the
two modes
TD ≡ Bus Time - Auto Time
Hypothesize a positive impact of an
increase in TD on the probability of
choosing to use an auto to commute
Data for 21 commuters
 Example Discrete Choice problem
Auto Time  Bus Time  TD    Auto   |  Auto Time  Bus Time  TD    Auto
53         4.4       -49   0      |  19         84        66    1
4.1        29        24    0      |  82         38        -44   1
4.1        87        83    1      |  8.6        1.6       -7    0
56         32        -25   0      |  23         74        52    1
52         20        -32   0      |  51         84        32    1
0.2        91        91    1      |  81         19        -62   0
28         80        52    1      |  51         85        34    1
90         2.2       -88   0      |  62         90        28    1
42         25        -17   0      |  95         22        -73   0
95         44        -52   0      |  42         92        50    1
99         8.4       -91   0      |
TD is the RHS variable ≡ Bus Time – Auto Time
Auto gives the observed dependent variable values
Note that the displayed TD values are rounded
 Commuting choice problem summary

 yi*(Net Utility) = β1 + β2TDi + εi
Log-likelihood function with εi~N(0,1)
Assume errors are normally distributed,
homoscedastic and non-autocorrelated
$$L(\beta) = \sum_{i=1}^{T} \left\{ y_i \ln \Phi(X_i\beta) + (1 - y_i)\ln\left[1 - \Phi(X_i\beta)\right] \right\}$$
where Φ(·) is the std. normal CDF
[Figure: given the data, a plot of L(β) over all possible values of β1 and β2]
ML Estimation of Probit Model
[Flowchart with the following elements: proc defining the likelihood function; CRM starting values; dependent and exogenous variables; analytical gradients; analytical Hessian; maximum likelihood procedure (NR); estimates of β, Σβ and the LLF; likelihood ratio test of β2 = β3 = … = βK = 0; marginal effects]
 An example discrete choice problem
 Overview of MATLAB code
 Summary of Probit results
 Marginal impacts on probability of
auto use as bus commute time
increases
 Under the probit model:
$$\frac{d\Phi(X\beta)}{dTD} = \beta_{TD}\,\phi(X\beta)$$
where Φ(Xβ) is the probability of auto use and φ(·) is the standard normal PDF
 Takes the same shape as the pdf
 The variance of the marginal effect
calculated via delta method since
these are nonlinear functions of the
parameters
 Heteroscedasticity in the binary choice

model
 As noted by Greene (p.787-790),
unlike the CRM, when we have
heteroscedasticity in the binary choice
model → ML estimators are
 Inconsistent
 Traditionally evaluated covariance
matrix is inappropriate
Let’s look at the example given in Greene
(p. 789-790) where he incorporates
multiplicative heteroscedasticity
 This framework can be applied to
both the logit and probit specifications
 Latent regression: y* = Xβ + ε where
E(ε) = 0
Var(ε) = [exp(Zγ)]² (note the squared term)
Latent model: et ~ N(0,σt²), with std. deviation σt = exp(Ztγ)
 Given this assumption we have:
$$P(y_t = 1) = P(e_t > -X_t\beta) = P\left(\frac{e_t}{\exp(Z_t\gamma)} > \frac{-X_t\beta}{\exp(Z_t\gamma)}\right) = \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right)$$
given exp(Ztγ) > 0 and
$$\frac{e_t}{\exp(Z_t\gamma)} \sim N(0,1)$$
where Φ(·) is the std. normal CDF
 The resulting sample log-likelihood
function for the heteroscedastic probit
model is:
$$L = \sum_{t=1}^{T}\left\{ y_t \ln \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right) + (1 - y_t)\ln\left[1 - \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right)\right] \right\}$$
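The heteroscedastic probit log-likelihood differs from the standard one only in the scaling of the index. A minimal Python sketch with hypothetical data and parameter values; setting γ = 0 reproduces the homoscedastic probit.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def hetero_probit_loglik(beta, gamma, X, Z, y):
    """L = sum_t { y_t ln Phi(X_t b / exp(Z_t g)) + (1 - y_t) ln[1 - Phi(X_t b / exp(Z_t g))] }."""
    total = 0.0
    for x_t, z_t, y_t in zip(X, Z, y):
        index = dot(x_t, beta) / math.exp(dot(z_t, gamma))  # index scaled by the std. deviation
        p = Phi(index)
        total += y_t * math.log(p) + (1 - y_t) * math.log(1.0 - p)
    return total

# Hypothetical data: X has an intercept, Z has no constant term
X = [[1.0, -1.0], [1.0, 0.5], [1.0, 2.0], [1.0, -0.3]]
Z = [[1.0], [0.0], [1.0], [0.0]]
y = [0, 1, 1, 0]
beta = [0.1, 0.8]
ll_homo = hetero_probit_loglik(beta, [0.0], X, Z, y)
ll_hetero = hetero_probit_loglik(beta, [0.3], X, Z, y)
```

Twice the difference between the maximized values of the two versions is the LR statistic for homoscedasticity.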
 Given the normality assumption, the
gradients of the sample log-likelihood are
(Greene p. 789):
$$\frac{\partial L}{\partial\beta} = \sum_{t=1}^{T}\left[\frac{\phi_t\,(y_t - \Phi_t)}{\Phi_t\,(1 - \Phi_t)}\right]\exp(-Z_t\gamma)\,X_t$$
$$\frac{\partial L}{\partial\gamma} = \sum_{t=1}^{T}\left[\frac{\phi_t\,(y_t - \Phi_t)}{\Phi_t\,(1 - \Phi_t)}\right]\exp(-Z_t\gamma)\,Z_t\,(-X_t\beta)$$
where φt and Φt are the std. normal PDF and CDF evaluated at Xtβ/exp(Ztγ)
 The above gradients imply that the log-likelihood could be difficult to maximize
Greene (p. 789) notes that for
identification purposes, in order to
estimate all model parameters, the Z
matrix cannot have a constant term
Remember the restricted model has
σ2=1 (i.e., γj=0, for all j)
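$$P(y_t = 1) = \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right)$$
 Note what happens when variable wk is
in both X and Z
$$\frac{\partial\,\text{Prob}(y=1 \mid X, Z)}{\partial w_k} = \phi\left(\frac{X\beta}{\exp(Z\gamma)}\right)\frac{\beta_k - (X\beta)\,\gamma_k}{\exp(Z\gamma)}$$
φ(·) and exp(Zγ) are positive, but βk − (Xβ)γk is unclear as to sign
 Only the 1st term applies if wk appears
in X but not Z (same sign as βk)
$$\frac{\partial\,\text{Prob}(y=1 \mid X, Z)}{\partial w_k} = \phi\left(\frac{X\beta}{\exp(Z\gamma)}\right)\frac{\beta_k}{\exp(Z\gamma)}$$
 Only the 2nd term applies if wk
appears in Z but not X
$$\frac{\partial\,\text{Prob}(y=1 \mid X, Z)}{\partial w_k} = -\phi\left(\frac{X\beta}{\exp(Z\gamma)}\right)\frac{(X\beta)\,\gamma_k}{\exp(Z\gamma)}$$
Note the “–” sign; the result is unclear as to sign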
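The three cases can be handled by one function in which βk or γk is set to zero when wk is absent from X or Z. The Python sketch below uses hypothetical parameter values and checks the formula against a numerical derivative of the probability.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def hetero_marginal_effect(beta_k, gamma_k, xb, zg):
    """phi(Xb / exp(Zg)) * (beta_k - Xb * gamma_k) / exp(Zg).

    Use beta_k = 0 if w_k is not in X and gamma_k = 0 if w_k is not in Z.
    """
    scale = math.exp(zg)
    return phi(xb / scale) * (beta_k - xb * gamma_k) / scale

# Hypothetical case: w_k enters both X and Z, with X*beta = a + b*w and Z*gamma = g*w
a, b, g, w = 0.2, 0.5, 0.3, 1.0

def prob(w_value):
    return Phi((a + b * w_value) / math.exp(g * w_value))

analytic = hetero_marginal_effect(b, g, a + b * w, g * w)
h = 1e-6
numeric = (prob(w + h) - prob(w - h)) / (2.0 * h)

# With no heteroscedasticity the formula collapses to phi(Xb) * beta_k
homo_case = hetero_marginal_effect(b, 0.0, a + b * w, 0.0)
```

In this example βk − (Xβ)γk = 0.5 − 0.7(0.3) is positive, but a larger γk would reverse the sign.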
 Lets look at the labor supply example

presented in Greene (p. 789-790)
 Can motivate this model using the
latent variable approach
Individuals make decision whether
to accept an offer of employment
depending on whether wage offer
exceeds reservation wage (offered
wage – reservation wage > 0?)
 Reservation wage determined by
VMP in home production:
educational attainment, age,
presence of children in the
household, other sources of income
and marginal tax rates
We use Mroz’s (1987) data on labor
supply of married women from 1975
 Under the heteroscedastic discrete choice
model we developed earlier we have:
$$y_t^* = X_t\beta + \varepsilon_t \quad \text{where} \quad \varepsilon_t \sim N\left(0, \left[\exp(Z_t\gamma)\right]^2\right)$$
$$P(y_t = 1) = \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right), \qquad \frac{\varepsilon_t}{\exp(Z_t\gamma)} \sim N(0,1)$$
where Φ(·) is the std. normal CDF
 As we saw earlier, the probit model
sample log-likelihood function is:
$$L = \sum_{t=1}^{T}\left\{ y_t \ln \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right) + (1 - y_t)\ln\left[1 - \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right)\right] \right\}$$
 753 observations/428 were participants in
the formal labor market
 Prob(LFP=1)=F(Age, Age2,HHIncome,
Educ)
 LFP ≡ 1 if woman worked in 1975
 Age ≡ Wife’s age (years)
 HHIncome ≡ Family income in 1975$
 Educ ≡ Wife’s educational attainment
(years)
 Lets review the MATLAB program used

to estimate this model
Initially, homoscedastic error term
Estimate 3 models
 Full sample
 Kids present sub-sample
 Kids not present sub-sample
Evaluate marginal effects
→ the marginal effect of the age
variable using traditional formula is
not correct given presence of Age2
 In the program we use a separate
function to estimate marginal
impacts given quadratic age effect
 After we estimate the full homoscedastic


model, lets test the following null
hypothesis using the above MATLAB code:
 H0: No statistical difference in
coefficients whether or not children are
present (H1: at least one is different)
Restricted model based on pooled (full)
sample with rhs variables of Age, Age2,
HHIncome, and Educ
LLFR= -496.87
Unrestricted model obtained from
estimating the probit for two mutually
exclusive subgroups
 Kids=1 (524 obs)
Kids=0 (229 obs), for a total of 753 obs.
 → all estimated coefficients could
differ depending on kid status
 Unrestricted Model

 LLFKid=1 = – 347.87 (524 obs)
LLFKid=0 = – 141.61 (229 obs)
 Total sample LLF for unrestricted
model is the sum of the above LLFs
since all observations accounted for
only once → LLFU = – 347.87 –
141.61 = – 489.48 (753 obs)
χ2 for testing the 5 restrictions is:
LR = 2[LLFU – LLFR] = 2[– 489.48 – (– 496.87)] = 14.77
where LLFU is the unrestricted LLF and LLFR is the restricted LLF from the full model
 Critical χ2.05,5=11.07
 Reject null hypothesis of no statistical
difference in estimated coefficients
across child present sub-groups
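The arithmetic of this test can be reproduced from the reported LLF values. In the Python sketch below the variable names are illustrative; with the rounded LLFs shown on the slide the statistic comes out at 14.78, so the reported 14.77 presumably reflects unrounded values.

```python
# LLF values as reported above (rounded to two decimals)
llf_kids = -347.87      # Kids=1 sub-sample, 524 obs
llf_no_kids = -141.61   # Kids=0 sub-sample, 229 obs
llf_restricted = -496.87  # pooled (full) sample, 753 obs

# Unrestricted LLF is the sum over the two mutually exclusive sub-samples
llf_unrestricted = llf_kids + llf_no_kids

# LR statistic for the 5 restrictions
lr_stat = 2.0 * (llf_unrestricted - llf_restricted)

chi2_critical = 11.07   # 5% critical value with 5 degrees of freedom, from the slide
reject_null = lr_stat > chi2_critical
```

Since the statistic exceeds the critical value, the null of equal coefficients across the two sub-groups is rejected.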
 Lets now estimate a probit model of labor
supply where we allow for the possibility
of heteroscedastic errors in the latent
regression equation
 y* = Xβ + ε, where εt ~ N(0,σt²)
 Var(εt)=[exp(γ1Kidt + γ2HHIncomet)]², where Kidt = 1 if # of Kids > 0
 X matrix composed of Intercept, Age,
Age2, HHIncome, Educ, and Kid
 Note no intercept term in variance
function
γ1 = γ2 = 0→ Var(εt)=e0=1
→homoscedastic
I developed some MATLAB code to
estimate the heteroscedastic probit models
 Estimate both homoscedastic and
heteroscedastic specifications
 Prob(LFP=1)=F(Age, Age2,HHIncome,
Educ, Kid)
 LFP ≡ 1 if woman worked in 1975
 Age ≡ Wife’s age (years)
 HHIncome ≡ Family income in 1975 $
 Educ ≡ Wife’s educational attainment
(years)
 Kid ≡ 1 if children < 18 years old
 Var(εt)=[exp(γ1Kidt + γ2HHIncomet)]2
 Comparison of Results:
                 Homoscedastic          Heteroscedastic
Variable         Coef.      Std Err     Coef.      Std Err
Inter.           -4.1568    1.4021      -6.0299    2.4984
Age               0.1854    0.0660       0.2643    0.1182
Age2             -0.0024    0.0008      -0.0036    0.0014
Income            0.0458    0.0421       0.4244    0.2219
Edu               0.0982    0.0230       0.1402    0.0519
Kid              -0.4490    0.1309      -0.8791    0.3028
Heteroscedastic Component
Kid               ---       ---         -0.1407    0.3237
Income            ---       ---          0.3129    0.1228
LLF              -490.847               -487.636
LR Test:         β=0: χ2(5) = 48.05     γ=0: χ2(2) = 6.42
                 χ2(.05,5)Crit = 11.07  χ2(.05,2)Crit = 5.99
 Homoscedastic probit:

marginal impacts
of age on probability of LFP
 Use a separate procedure to estimate
marginal impacts given quadratic
function wrt age
 Negative and significant age impact
“+” age variable
“–” age2 variable
Use discrete change effect formulas to
evaluate whether or not the presence of
KIDS impacts LFP under the
homoscedastic probit model
 Eq. 23-25, 23-26 p. 781 in Greene
 Negative and significant impact
 Lets test heteroscedasticity versus
homoscedasticity
 H0: Homoscedastic Probit
(i.e., γ1 = γ2 = 0)
H1: Heteroscedastic Probit
 The 95% χ2 critical value for 2
restrictions is 5.99
 χ2 tests of whether our Probit is
homoscedastic versus heteroscedastic
LR value: 6.42
Wald Statistic: 6.43
 χ2 tests of whether our Probit is
homoscedastic versus heteroscedastic
LM value: 2.24
LM ≡ g′Σβg where g is the 1st deriv.
of the sample LLF of the
unrestricted (heteroscedastic) model
evaluated at restricted parameter
values (homoscedastic)
I use BHHH estimate of parameter
variance (evaluated at restricted
values)
As noted by Greene, the covariance
estimate based on E(H) may be
preferred
 LM inconsistent with LR and Wald
Our use of the BHHH estimator may
be the reason
 Separate procedure used to calculate
marginal effects of variables under
heteroscedastic formulation
 Using formulas from p. 788 in Greene
$$\frac{\partial\,\text{Prob}(y=1 \mid X, Z)}{\partial w_k} = \phi\left(\frac{X\beta}{\exp(Z\gamma)}\right)\frac{\beta_k - (X\beta)\,\gamma_k}{\exp(Z\gamma)}$$
 Used variables “in_x” and “in_z” to
determine correct formula
 Note: the AGE impacts are not correct
due to the nonlinear relationship, nor are the
KIDS impacts due to their discrete nature
Natural extension of the
homoscedastic probit results
Marginal income effect increases from
0.017 (not significant) under homo. to
0.038 with a Z-stat of 1.8 under hetero.
 These marginal impacts are not really

meaningful given the nature of the
dependent variable and units of income for
example
 Need to evaluate these impacts in
elasticity terms, ξw
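How would one go about this?
$$P(y_t = 1) = \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right), \qquad \frac{e_t}{\exp(Z_t\gamma)} \sim N(0,1)$$
$$\hat\xi_{w_k} = \frac{\partial\,\text{Prob}(y=1 \mid X^*, Z^*, \hat\beta, \hat\gamma)}{\partial w_k}\cdot\frac{w_k^*}{\text{Prob}(y=1 \mid X^*, Z^*, \hat\beta, \hat\gamma)}$$
where the denominator is the predicted value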
 X*,Z* usually set at sample mean
values but can vary depending on study
being undertaken