 Information geometry of
Statistical inference
with selective sample
S. Eguchi, ISM & GUAS
Local Sensitivity Approximation
for Selectivity Bias.
J. Copas and S. Eguchi
J. Royal Statist. Soc. B, 63 (2001), 871-895.
(http://www.ism.ac.jp/~eguchi/recent_preprint.html)
This talk is part of joint work with
J. Copas, University of Warwick
Summary
• Ignorability
• Selection bias
• Sensitivity analysis
⋅ Tubular neighborhood $\mathcal{N}$ of model $M$
⋅ Compare the inference
for the near model $M_\varepsilon$ in $\mathcal{N}$ with that for $M$
⋅ Get interpretable bounds
Near parametric
$(X_1, \ldots, X_n) \sim g(x_1, \ldots, x_n)$
$\min_{\theta \in \Theta} \mathrm{KL}(g, f(\cdot, \theta)) \le O(n^{-\beta})$
$\beta = 0$ ⇔ non-parametric
$0 < \beta < \infty$ ⇔ near-parametric
$\beta = \infty$ ⇔ exact parametric
Statistical model
Probability model
$P(X \in B) = \int_B p(x)\, \mu(dx)$
Statistical model
$(X_1, \ldots, X_n) \overset{\mathrm{iid}}{\sim} p(x, \theta)$
$x \in \mathcal{X} \subset \mathbb{R}^p$, $\theta \in \Theta \subset \mathbb{R}^d$
Ignorability
Y : response variable
X : covariate variable
Z : observational status
Def. (X, Y, Z) is ignorable
⇔ $Y \perp Z \mid X$
Observational status
Exm. 1. Z is a missing (censoring) indicator.
Y is observed if $Z = 1$, and missing (censored) if $Z = 0$.
Exm. 2. Z is a group label for the allocation process.
$Y \mid Z \sim f(y, \theta_z)$ for $z = 1, \ldots, G$
Distributional expression
Def. 2
$Y \perp Z \mid X$ ⇔ $f_{YZ}(y, z \mid x) = f_Y(y, \theta)\, f_Z(z, \psi)$
$p = \dim \theta$, $q = \dim \psi$
Exm. 2'
$f_{YZ} = \theta^{y} (1-\theta)^{1-y}\, \psi^{z} (1-\psi)^{1-z}$
Binary response with missing
Let Y be a binary response with missing index Z, $\theta = \mathrm{pr}(Y = 1)$.
model ($p_y = P(Z = 1 \mid y)$)
         y = 1                 y = 0
z = 1    $\theta p_1$          $(1-\theta) p_0$
z = 0    $\theta (1-p_1)$      $(1-\theta)(1-p_0)$
         $\theta$              $1-\theta$
data ($N$ = sample size, $n$ = number observed, $f$ = frequency of $Y = 1$)
         y = 1    y = 0
z = 1    $f$      $n-f$    $n$
z = 0    ?        ?        $N-n$
         ?        ?        $N$
MLE $\hat\theta = f/n$, variance $s^2 = \hat\theta(1-\hat\theta)/n$
Non-ignorable cases
$\mathrm{pr}(Y = 1 \mid Z = 1) = \dfrac{\theta p_1}{\theta p_1 + (1-\theta) p_0}$
$\tilde\theta = \dfrac{f p_0}{f p_0 + (n-f) p_1}$
Small degree of non-ignorability
$\tilde\theta_\eta = \hat\theta \pm n^{1/2} s\, \dfrac{N-n}{N}\, \eta + O(\eta^2)$
where
$\eta^2 = \mathrm{Var}_Y\left( \log \dfrac{\mathrm{pr}(Z = 1 \mid y)}{\mathrm{pr}(Z = 0 \mid y)} \right)$
Compare $\hat\theta$ with $\tilde\theta_\eta$!
How small is "small $\eta$"?
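The binary-response bound can be checked numerically. The sketch below is only an illustration: the counts `f`, `n`, `N` and the selection probabilities `p0`, `p1` are hypothetical values, not data from the talk, and the function names are invented here. It compares the exact adjusted estimate $\tilde\theta$ with the first-order interval $\hat\theta \pm n^{1/2} s\,(N-n)/N\,\eta$.

```python
import math


def mle_and_bounds(f, n, N, eta):
    """Return (theta_hat, s, lower, upper) for the first-order bound
    theta_hat -/+ sqrt(n) * s * (N - n) / N * eta."""
    theta_hat = f / n
    s = math.sqrt(theta_hat * (1.0 - theta_hat) / n)
    half_width = math.sqrt(n) * s * (N - n) / N * eta
    return theta_hat, s, theta_hat - half_width, theta_hat + half_width


def theta_tilde(f, n, p0, p1):
    """Exact adjusted estimate f*p0 / (f*p0 + (n - f)*p1)."""
    return f * p0 / (f * p0 + (n - f) * p1)


def eta_from_p(theta, p0, p1):
    """Standard deviation, over Y ~ Bernoulli(theta), of the log odds
    of Z = 1 given y."""
    def logit(p):
        return math.log(p / (1.0 - p))
    return math.sqrt(theta * (1.0 - theta)) * abs(logit(p1) - logit(p0))


# Hypothetical data: 150 of 200 responses observed, 60 of them with Y = 1.
f, n, N = 60, 150, 200
# Hypothetical selection probabilities P(Z = 1 | y) for y = 0 and y = 1.
p0, p1 = 0.77, 0.73

eta = eta_from_p(f / n, p0, p1)
theta_hat, s, lower, upper = mle_and_bounds(f, n, N, eta)
exact = theta_tilde(f, n, p0, p1)
print(theta_hat, lower, upper, exact)
```

With $p_1 < p_0$ the exact estimate moves upward from $\hat\theta$, so it should sit close to the upper end of the interval when $\eta$ is small.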
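Tangent space and neighborhood
Def. 3
$\mathcal{N} = \{ g_{YZ} : \min_{\theta, \psi} \mathrm{KL}(g_{YZ}, f_{YZ}) < \varepsilon \}$
$\Lambda = \{ \lambda(y, z) : E_{f_Y}\{\lambda(Y, z)\} = E_{f_Z}\{\lambda(y, Z)\} = 0,\ \mathrm{Var}_{f_{YZ}}\{\lambda(Y, Z)\} = 1 \}$
Exponential map
$g_{YZ}(y, z) := f_{YZ}(y, z) \exp\{ \varepsilon \lambda(y, z) - \kappa(\varepsilon) \}$
$\lambda \in \Lambda \Rightarrow g_{YZ} \in \mathcal{N}$ $(\forall \varepsilon \ll \varepsilon_0)$
$\mathrm{KL}(g_{YZ}, f_{YZ}) = \dfrac{\varepsilon^2}{2} + O(\varepsilon^3)$
Tubular Neighborhood
[Figure: the model $M$, the near model $M_\varepsilon$, and the tubular neighborhood $\mathcal{N}_\varepsilon$]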
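The expansion $\mathrm{KL}(g_{YZ}, f_{YZ}) = \varepsilon^2/2 + O(\varepsilon^3)$ can be illustrated on the smallest possible case. The sketch below assumes binary $Y$ and $Z$ with an independent Bernoulli model and takes $\lambda(y, z)$ to be the product of the two standardized scores, which satisfies the conditions defining $\Lambda$. That choice of $\lambda$, the parameter values, and the convention $\mathrm{KL}(g, f) = E_g \log(g/f)$ are assumptions made for this example.

```python
import math


def kl_of_tilted_model(theta, psi, eps):
    """KL(g, f) for g = f * exp(eps * lam - kappa) on {0,1} x {0,1}."""
    points = [(y, z) for y in (0, 1) for z in (0, 1)]
    # Independent Bernoulli model f_YZ.
    f = {(y, z): (theta if y else 1.0 - theta) * (psi if z else 1.0 - psi)
         for (y, z) in points}
    # Direction lam(y, z): product of standardized scores (mean 0, variance 1).
    lam = {(y, z): (y - theta) / math.sqrt(theta * (1.0 - theta))
                   * (z - psi) / math.sqrt(psi * (1.0 - psi))
           for (y, z) in points}
    # kappa(eps) normalizes g to a probability distribution.
    kappa = math.log(sum(f[pt] * math.exp(eps * lam[pt]) for pt in points))
    g = {pt: f[pt] * math.exp(eps * lam[pt] - kappa) for pt in points}
    return sum(g[pt] * math.log(g[pt] / f[pt]) for pt in points)


eps = 0.05
kl = kl_of_tilted_model(theta=0.3, psi=0.6, eps=eps)
ratio = kl / (eps ** 2 / 2.0)
print(kl, ratio)
```

The printed ratio should be close to 1 for small `eps`, and it drifts away as `eps` grows because of the cubic term.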
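Decomposition
tangent and normal
$\lambda(y, z) = \sum_{i,j} \lambda_{ij}\, u_i(y, \theta)\, v_j(z, \psi) = \lambda_{\cdot\cdot}\, u_\cdot v_\cdot$ $\left( \sum_{i,j} \lambda_{ij}^2 = 1 \right)$
where
$(u_1, \ldots, u_p)^T = I_\theta^{-1/2} S_Y(y, \theta)$, $E_{f_Y}(u_i u_j) = \delta_{ij}$
$(v_1, \ldots, v_q)^T = I_\psi^{-1/2} S_Z(z, \psi)$, $E_{f_Z}(v_j v_k) = \delta_{jk}$
$\lambda(y, z) = \sum_{i=1}^{p} \lambda_{i\cdot}\, u_i v_\cdot + \sum_{i=p+1}^{\infty} \lambda_{i\cdot}\, u_i v_\cdot$
$\phantom{\lambda(y, z)} = \sum_{j=1}^{q} \lambda_{\cdot j}\, u_\cdot v_j + \sum_{j=q+1}^{\infty} \lambda_{\cdot j}\, u_\cdot v_j$
Conditional Distribution
$g_{Y|Z}(y \mid z) = f_Y(y, \theta + \varepsilon I_\theta^{-1/2} \omega_z) \left( 1 + \varepsilon \sum_{i=p+1}^{\infty} \lambda_{i\cdot}\, u_i v_\cdot \right)$
$g_{Z|Y}(z \mid y) = f_Z(z, \psi + \varepsilon I_\psi^{-1/2} \omega_y^*) \left( 1 + \varepsilon \sum_{j=q+1}^{\infty} \lambda_{\cdot j}\, u_\cdot v_j \right)$
when
$\omega_z = (\lambda_{1\cdot} v_\cdot, \ldots, \lambda_{p\cdot} v_\cdot)^T$
$\omega_y^* = (\lambda_{\cdot 1} u_\cdot, \ldots, \lambda_{\cdot q} u_\cdot)^T$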
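Def. 4. $\eta_z^2 = \mathrm{Var}_{f_Y}\left( \log g_{Z|Y}(z \mid y) \right)$
• $\tfrac{1}{2} \eta_z^2 \cong \mathrm{KL}(f_Y, g_{Y|Z})$
• $E_{f_Z}(\eta_z^2) \cong \varepsilon^2$
• $\tilde\theta = \theta + \varepsilon I_\theta^{-1/2} \omega_z = \operatorname{argsolve}_{\theta^*} \{ E_{g_{Y|Z}} S(y, \theta^*) = 0 \}$
• $\varepsilon^2 \omega_z^T \omega_z \le \eta_z^2$ ("=" ⇔ $\lambda_{i\cdot} v_\cdot = 0$, $i = p+1, \ldots$)
Calibration
Rosenbaum's log odds ratio
$\eta = \log \left\{ \dfrac{P(r > 0 \mid y)\, P(r \le 0)}{P(r > 0)\, P(r \le 0 \mid y)} \right\}$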
Counterfactual
Testing a hypothesis: $H_0 : \varepsilon = 0$
Score test
$T = N^{-1/2} \sum_{k=1}^{N} \lambda(y_k, z_k) \sim N(0, 1)$
Reject $H_0$ if $|T| > 2$.
Guideline
local power: $\Phi\left( -2 + N^{1/2} \varepsilon \right) + \Phi\left( -2 - N^{1/2} \varepsilon \right) \cong \text{const}$
Non-ignorable missing: $\varepsilon = \dfrac{2}{N^{1/2}}$ ⇔ $\eta = 2 \left\{ \dfrac{N}{n (N - n)} \right\}^{1/2}$
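The local power expression and the resulting guideline value of $\eta$ are easy to evaluate. The following is a minimal sketch with function names chosen here for illustration. It uses the standard normal distribution function via `math.erf`, and evaluates the guideline for $N = 1435$, $n = 1323$, the sample sizes quoted for the Coventry work audit data.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def local_power(N, eps, crit=2.0):
    """Phi(-crit + sqrt(N)*eps) + Phi(-crit - sqrt(N)*eps)."""
    shift = math.sqrt(N) * eps
    return Phi(-crit + shift) + Phi(-crit - shift)


def eta_guideline(N, n):
    """eta = 2 * sqrt(N / (n * (N - n))), i.e. eps = 2 / sqrt(N)."""
    return 2.0 * math.sqrt(N / (n * (N - n)))


N, n = 1435, 1323
size_under_h0 = local_power(N, 0.0)
power_at_guideline = local_power(N, 2.0 / math.sqrt(N))
eta_star = eta_guideline(N, n)
print(size_under_h0, power_at_guideline, eta_star)
```

At $\varepsilon = 2/N^{1/2}$ the power is $\Phi(0) + \Phi(-4)$, so the test rejects about half the time; smaller departures are harder to detect from the data than that.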
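log-likelihood
$L(\theta, \varepsilon) = \sum_{k=1}^{n} \log f(y_k, \theta) + \varepsilon \sum_{k=1}^{n} \omega_1^T I_\theta^{-1/2} S(y_k, \theta) + \varepsilon \sum_{k=1}^{n} \sum_{i=p+1}^{\infty} \lambda_{i1}\, u_i(y_k, \theta)\, v_1(1, \psi)$
$\phantom{L(\theta, \varepsilon) =} + n \log \psi + (N - n) \log(1 - \psi) + O(\varepsilon^2)$
MLE $\tilde\theta_\varepsilon = \operatorname{argmax}_{\theta \mid \varepsilon} L(\theta, \varepsilon)$
• $\tilde\theta_\varepsilon = \hat\theta - \varepsilon I_{\hat\theta}^{-1/2} \omega_1 + O(\varepsilon^2)$
• $(\tilde\theta_\varepsilon - \hat\theta)^T I_{\hat\theta} (\tilde\theta_\varepsilon - \hat\theta) \cong \varepsilon^2 \omega_1^T \omega_1 \le \eta_1^2$ ("=" ⇔ $\lambda_{i\cdot} = 0$, $i = p+1, \ldots$)
If $p = 1$, $\tilde\theta_\varepsilon = \hat\theta \pm n^{1/2}\, \dfrac{N - n}{N}\, I_{\hat\theta}^{-1/2}\, \eta$
Selectivity region
• $v_1(z, \psi) = \dfrac{z - \psi}{\sqrt{\psi (1 - \psi)}}$, $\eta^2 = \dfrac{\varepsilon^2}{\psi (1 - \psi)}$
Can we estimate $\varepsilon$?
$\max_{\theta \mid \varepsilon} L(\theta, \varepsilon) = L(\hat\theta, 0) - \dfrac{n \sigma_*^2}{2} \left( \varepsilon - \dfrac{u_*}{\sigma_*^2} \right)^2 + \dfrac{n u_*^2}{2 \sigma_*^2} + O(\varepsilon^3)$
where
$u_* = \dfrac{1}{n} \sum_{k=1}^{n} \sum_{i=p+1}^{\infty} \lambda_{i1}\, u_i(y_k, \hat\theta)\, v_1(1, \hat\psi)$, $\sigma_*^2 = \mathrm{Var}_{f_Y}(u_*)$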
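Unstable or Misspecifying
$\hat\varepsilon = \dfrac{u_*}{\hat\sigma_*^2}$ is unstable if $\sum_{i=1}^{\infty} \lambda_{i1}^2 = 0$
Otherwise
$f(y, \theta) \sim N(\mu, \sigma^2)$, $\theta = (\mu, \sigma)$
$u_3(y)$ ← Hermite polynomial
$\lambda_{31} \ne 0 \Rightarrow u_*$ includes skewness statistics
Regression formulation
$\{ x_k ; k = 1, \ldots, N \}$ fully observed
$\theta_k = \theta(\beta^T x_k)$, $\psi_k = \psi(\alpha^T x_k)$, $p_k = f_Z(1, \psi_k)$
$L(\alpha, \beta) = \sum_{k=1}^{n} \log f_Y(y_k, \theta(\beta^T x_k)) + \sum_{k=1}^{n} \varepsilon_k\, \omega_{1k}^T I^{-1/2}(\theta_k)\, S(y_k, \theta(\beta^T x_k))$
$\phantom{L(\alpha, \beta) =} + \sum_{k=1}^{n} \log \psi(\alpha^T x_k) + \sum_{k=n+1}^{N} \log(1 - \psi(\alpha^T x_k)) + O(\varepsilon^2)$
Heckman model
$Y = \beta^T X + \sigma e_1$
$R = \gamma^T X + e_2$
$\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \varepsilon \\ \varepsilon & 1 \end{pmatrix} \right)$
$R > 0$ ⇔ $Z = 1$ (Y is observed)
$R \le 0$ ⇔ $Z = 0$ (Y is missing)
Likelihood
$f(y \mid x, r > 0) = \dfrac{1}{\sigma} \phi\left( \dfrac{y - \beta^T x}{\sigma} \right) \times \Phi\left( \dfrac{\gamma^T x}{\sqrt{1 - \varepsilon^2}} + \dfrac{\varepsilon}{\sqrt{1 - \varepsilon^2}} \dfrac{y - \beta^T x}{\sigma} \right) \Big/ \Phi(\gamma^T x)$
$E(y \mid x, r > 0) = \beta^T x + \sigma \varepsilon\, \lambda(\gamma^T x)$
$P(r > 0 \mid x, y) = \Phi\left( \dfrac{\gamma^T x}{\sqrt{1 - \varepsilon^2}} + \dfrac{\varepsilon}{\sqrt{1 - \varepsilon^2}} \dfrac{y - \beta^T x}{\sigma} \right)$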
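The conditional mean $E(y \mid x, r > 0) = \beta^T x + \sigma \varepsilon \lambda(\gamma^T x)$ can be checked by simulation. The sketch below makes two assumptions that the slide does not state: $\lambda(\cdot)$ is taken to be the inverse Mills ratio $\phi(\cdot)/\Phi(\cdot)$, as is usual for the Heckman model, and a single fixed covariate value is used so that $\beta^T x$ and $\gamma^T x$ reduce to the scalars `b` and `c`. All numerical values are hypothetical.

```python
import math
import random


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inverse_mills(c):
    """Assumed form of lambda(.): phi(c) / Phi(c)."""
    return std_normal_pdf(c) / std_normal_cdf(c)


def simulate_selected_mean(b, c, sigma, eps, n_draws, seed=0):
    """Mean of y = b + sigma*e1 over draws with r = c + e2 > 0,
    where corr(e1, e2) = eps."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_draws):
        e1 = rng.gauss(0.0, 1.0)
        e2 = eps * e1 + math.sqrt(1.0 - eps * eps) * rng.gauss(0.0, 1.0)
        if c + e2 > 0.0:  # Y is observed only when R > 0
            total += b + sigma * e1
            count += 1
    return total / count


b, c, sigma, eps = 1.0, 0.5, 1.5, 0.4  # hypothetical values
simulated = simulate_selected_mean(b, c, sigma, eps, n_draws=200_000)
predicted = b + sigma * eps * inverse_mills(c)
print(simulated, predicted)
```

Setting `eps = 0.0` makes the selected mean agree with `b` up to simulation noise, which is the ignorable case.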
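Likelihood analysis
$L(\beta, \sigma, \gamma, \varepsilon) = -n \log \sigma - \dfrac{1}{2 \sigma^2} \sum_{i=1}^{n} (y_i - \beta^T x_i)^2 + \sum_{i=1}^{n} \log \Phi(u_i) + \sum_{i=n+1}^{N} \log \Phi(-\gamma^T x_i)$,
where
$u_i = \dfrac{1}{\sqrt{1 - \varepsilon^2}}\, \gamma^T x_i + \dfrac{\varepsilon}{\sqrt{1 - \varepsilon^2}}\, \dfrac{y_i - \beta^T x_i}{\sigma}$
Profile likelihood of $\varepsilon$
$L^*(\varepsilon) = \max_{\beta, \sigma, \gamma \mid \varepsilon} L(\beta, \sigma, \gamma, \varepsilon)$
$L^{*\prime}(0) = 0$, $L^{*\prime\prime}(0) = 0$
$L^{*\prime\prime\prime}(0) = K_1 \sum_{i=1}^{n} \left( \dfrac{y_i - \hat\beta^T x_i}{\hat\sigma} \right)^3$
$L^{*\prime\prime\prime\prime}(0) = K_2 \sum_{i=1}^{n} \left[ \left( \dfrac{y_i - \hat\beta^T x_i}{\hat\sigma} \right)^4 - 3 \right]$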
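The third and fourth derivatives of the profile likelihood at $\varepsilon = 0$ depend on the data only through the skewness and kurtosis sums of the standardized residuals. The sketch below computes those two sums from a list of fitted residuals. The constants $K_1$ and $K_2$ are not given on the slide, so they are left out; taking $\hat\sigma$ as the maximum-likelihood scale (divisor $n$) is an assumption, and the residual values are hypothetical.

```python
import math


def skewness_kurtosis_sums(residuals):
    """Return (sum r_i^3, sum (r_i^4 - 3)) for r_i = residual_i / sigma_hat,
    with sigma_hat the ML scale estimate (divisor n, an assumption)."""
    n = len(residuals)
    sigma_hat = math.sqrt(sum(e * e for e in residuals) / n)
    standardized = [e / sigma_hat for e in residuals]
    third = sum(r ** 3 for r in standardized)
    fourth = sum(r ** 4 - 3.0 for r in standardized)
    return third, fourth


# Hypothetical residuals y_i - beta_hat^T x_i from a fitted regression.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
third, fourth = skewness_kurtosis_sums(residuals)
print(third, fourth)
```

Because the first two derivatives vanish, the information about $\varepsilon$ in the observed $y$ comes only from these shape statistics, which is why an estimate of $\varepsilon$ is so sensitive to the normality assumption.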
Coventry work audit data
y = income, x = (1, sex, age, age²)
N = 1435, n = 1323
Skin cancer data
Table 2: Melanoma Data
case / control by nevi > 10
$f_1 = 323$, $f_2 = 259$, $f_3 = 130$, $f_4 = 288$
Group comparison
$Y \mid Z \sim f_Y(y, \theta_z)$ $(z = 1, \ldots, G)$
random effect model
$f_Y(y, \theta_z) = \int f_{Y|T}(y, t + \theta_z)\, f_T(t)\, dt$
dependence model
$g_{TZ} = f_T(t)\, f_Z(z, \psi)\, e^{\varepsilon \lambda(t, z) - \kappa(\varepsilon)}$
Non-random allocation
$g_{Y|Z}(y \mid z) = E_{T|Z}\, f_{Y|T}(y, t + \theta_z)$
$\phantom{g_{Y|Z}(y \mid z)} \cong f_Y(y, \theta_z) \{ 1 - \varepsilon \lambda^*(y, z) \}$
where $\lambda^*(y, z) = E_{T + \theta \mid Y}\, \lambda(t, z)$
Various pattern of bias
Table 3: Simulation results
p1     p2     p3     p4     ε      ± bound   average
.1     .2     .1     .2     0      0         .011
.1     .2     .2     .1     .391   1.719     1.580
.1     .15    .15    .1     .222   .840      .812
.1     .1     .2     .2     .391   1.761     .002
.5     .533   .533   .5     .063   1.02      .698
Selection bias
N
~
β − βˆ ~ Iˆ −1 ¦
T
ψ zk = ψ z ( d z x k )
zk
ε k*γ zk )d z
k =1 z =1
{¦ p (1 − p ) } η
=
{2 ¦ p (ζ − ζ ) }
1
2
G
ε k*γ zk
z ′k
z ′ =1
z ′k
G
Y | z , x k ∼ f Y ( y , θ zk )
θ zk = β T d z + δ T x k
G
¦ (a
z ′ =1
Plot
z ′k
z ′k
{
1
2
I θ z k (ζ z − ζ zk )
~
S βˆ = β : (ζ 1 ,..., ζ G ) ∈ R G
}
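Logistic model
$\mathrm{logit}\{ E(Y \mid X, Z) \} = \beta_1 + d_{2z} \beta_2 + d_{3z} \beta_3 + X \beta_4$
Z = 1 prison
Z = 2 community service
Z = 3 probation
Y = ratio of reconviction
Selectivity regions
[Figure: Effect of sentence. Selectivity regions for the community service effect (horizontal axis) and the probation effect (vertical axis), both from -1 to 1, drawn for η = 0.1, η = 0.2, η = 0.5 together with the 95% C.I.]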
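two-group comparison
$y = \mu + \mathrm{sgn}(r)\, \delta + \sigma e_1$
$r = e_2$
$z = 1$ $(r \ge 0)$: $y_1, \ldots, y_{n_1}$
$z = 2$ $(r < 0)$: $y_{n_1 + 1}, \ldots, y_N$
Likelihood
$-N \log \sigma - \sum_{i=1}^{n_1} \dfrac{(y_i - \mu - \delta)^2}{2 \sigma^2} + \sum_{i=1}^{n_1} \log \Phi\left( \dfrac{\varepsilon}{\sqrt{1 - \varepsilon^2}}\, \dfrac{y_i - \mu - \delta}{\sigma} \right)$
$- \sum_{i=n_1+1}^{N} \dfrac{(y_i - \mu + \delta)^2}{2 \sigma^2} + \sum_{i=n_1+1}^{N} \log \Phi\left( -\dfrac{\varepsilon}{\sqrt{1 - \varepsilon^2}}\, \dfrac{y_i - \mu + \delta}{\sigma} \right)$
Analysis
$\hat\delta = \tfrac{1}{2} (\bar y_1 - \bar y_2)$
$E(\hat\delta) = \delta + \varepsilon \sigma \sqrt{\dfrac{2}{\pi}}$
$\mathrm{var}(\hat\delta) = \dfrac{\sigma^2}{N} \left( 1 - \dfrac{2 \varepsilon^2}{\pi} \right) + O(N^{-2})$
$\hat\delta(\varepsilon) = \hat\delta - \sqrt{\dfrac{2}{\pi}}\, \dfrac{\varepsilon}{\sqrt{1 - \varepsilon^2}}\, \hat\sigma + O(\varepsilon^3)$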
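The bias of $\hat\delta$ under non-random allocation, and the first-order correction $\hat\delta(\varepsilon)$, can be checked by simulating the two-group model. This sketch uses hypothetical parameter values and takes $\hat\sigma$ to be the pooled within-group standard deviation, which is an assumption; the slide does not say how $\hat\sigma$ is computed.

```python
import math
import random


def simulate_two_groups(mu, delta, sigma, eps, N, seed=1):
    """Draw y = mu + sgn(r)*delta + sigma*e1 with r = e2, corr(e1, e2) = eps."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(N):
        e1 = rng.gauss(0.0, 1.0)
        r = eps * e1 + math.sqrt(1.0 - eps * eps) * rng.gauss(0.0, 1.0)
        if r > 0.0:
            y1.append(mu + delta + sigma * e1)   # group z = 1
        else:
            y2.append(mu - delta + sigma * e1)   # group z = 2
    return y1, y2


def two_group_estimates(y1, y2):
    """Return (delta_hat, sigma_hat) with a pooled within-group s.d."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    ss = sum((y - m1) ** 2 for y in y1) + sum((y - m2) ** 2 for y in y2)
    return 0.5 * (m1 - m2), math.sqrt(ss / (n1 + n2 - 2))


def corrected_delta(delta_hat, sigma_hat, eps):
    """delta_hat - sqrt(2/pi) * eps / sqrt(1 - eps^2) * sigma_hat."""
    return delta_hat - math.sqrt(2.0 / math.pi) * eps / math.sqrt(1.0 - eps * eps) * sigma_hat


mu, delta, sigma, eps = 1.0, 0.5, 2.0, 0.3  # hypothetical values
y1, y2 = simulate_two_groups(mu, delta, sigma, eps, N=100_000)
delta_hat, sigma_hat = two_group_estimates(y1, y2)
expected_naive = delta + eps * sigma * math.sqrt(2.0 / math.pi)
delta_adj = corrected_delta(delta_hat, sigma_hat, eps)
print(delta_hat, expected_naive, delta_adj)
```

In practice $\varepsilon$ is unknown, so the correction is used as a sensitivity curve in $\varepsilon$ rather than as a point estimate.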
UK National Hearing Survey
The effect of occupational noise
Case (high level noise) $n_0 = 67$
Control $n_1 = 144$
Response Y is threshold of 3kHz sound
Conventional result
Case mean $\bar y_0 = 3.893$
Control mean $\bar y_1 = 3.710$
Pooled s.d. $s = 0.351$ (209 d.f.)
t-statistic $t = 3.52$
Standard analysis supports high significance
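The conventional result can be reproduced from the summary statistics quoted above. The sketch assumes the usual pooled two-sample t-statistic, $t = (\bar y_0 - \bar y_1) / \{ s \sqrt{1/n_0 + 1/n_1} \}$, which the slide does not write out.

```python
import math


def pooled_t(mean_case, mean_control, pooled_sd, n_case, n_control):
    """Two-sample t-statistic with a pooled standard deviation."""
    se = pooled_sd * math.sqrt(1.0 / n_case + 1.0 / n_control)
    return (mean_case - mean_control) / se


n0, n1 = 67, 144
t_stat = pooled_t(3.893, 3.710, 0.351, n0, n1)
degrees_of_freedom = n0 + n1 - 2
print(round(t_stat, 2), degrees_of_freedom)
```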
Future problem
• Ignorability
• Selection bias
• Sensitivity analysis
⋅ Tubular neighborhood $\mathcal{N}$ of model $M$
⋅ Compare the inference
for the near model $M_\varepsilon$ in $\mathcal{N}$ with that for $M$
⋅ Get interpretable bounds
Cornfield, J., Haenszel, W., Hammond, E.C., Lilienfeld, A.M., Shimkin, M.B. and Wynder, E.L. (1959) Smoking and lung cancer: recent evidence and a discussion of some questions. J. Nat. Cancer Institute, 22, 173-203.
Davis, A.C. (1995) Hearing in Adults. London: Whurr.
Forster, J.J. and Smith, P.W.F. (1998) Model based inference for categorical survey data subject to nonignorable nonresponse. J. Roy. Statist. Soc., B, 60, 57-70.
Heckman, J.J. (1976) The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimator for such models. Ann. Economic and Social Measurement, 5, 475-492.
Heckman, J.J. (1979) Sample selection bias as a specification error. Econometrica, 47, 153-161.
Kershaw, C. (1999) Reconvictions of offenders sentenced or discharged from prison in 1994, England and Wales. Home Office Statistical Bulletin, 5/99. London: HMSO.
Non-random allocation
$\tilde t(\delta) = 3.52 - 5.39\, \eta + O(\varepsilon^3)$
$\tilde t(\delta) < z_{0.05} = 1.96$ if $\eta > 0.29$
$t - z_{0.05} = \sqrt{\dfrac{n_1 n_2}{N}}\; \eta_{0.05}$
$\eta_{0.05} = 0.23 < 0.30$
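The two thresholds for $\eta$ follow from simple arithmetic on the hearing survey figures ($t = 3.52$, group sizes 67 and 144). In the sketch below, reading the slope 5.39 as $\sqrt{2/\pi}\, \sqrt{n_1 n_2 / N}$ is an inference from the two-group bias term $\varepsilon \sigma \sqrt{2/\pi}$; it matches the quoted value numerically but is not stated on the slide.

```python
import math

t_stat = 3.52          # conventional t-statistic from the survey
z_crit = 1.96          # critical value z_{0.05} quoted on the slide
n_case, n_control = 67, 144
N = n_case + n_control

# sqrt(n1 * n2 / N): scale linking eta to a shift in the t-statistic.
scale = math.sqrt(n_case * n_control / N)

# Assumed origin of the slope 5.39: sqrt(2/pi) * sqrt(n1 * n2 / N).
slope = math.sqrt(2.0 / math.pi) * scale

# eta at which the adjusted statistic 3.52 - 5.39*eta falls to 1.96.
eta_loses_significance = (t_stat - z_crit) / 5.39

# eta_{0.05} solving t - z_{0.05} = sqrt(n1 * n2 / N) * eta_{0.05}.
eta_005 = (t_stat - z_crit) / scale

print(round(slope, 2), round(eta_loses_significance, 2), round(eta_005, 2))
```

Both thresholds are small, which suggests that the apparently strong significance rests on the allocation to noise exposure being close to ignorable.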
References
Arnold, B.C. and Strauss, D.J. (1991) Bivariate distributions with conditionals in prescribed exponential families. J. Roy. Statist. Soc., B, 53, 365-376.
Begg, C.B., Satagopan, J.M. and Berwick, M. (1998) A new strategy for evaluating the impact of epidemiologic risk factors for cancer with application to melanoma. J. Am. Statist. Assoc., 93, 415-426.
Bowater, R.J., Copas, J.B., Machado, O.A. and Davis, A.C. (1996) Hearing impairment and the log-normal distribution. Applied Statistics, 45, 203-217.
Chambers, R.L. and Welsh, A.H. (1993) Log-linear models for survey data with nonignorable non-response. J. Roy. Statist. Soc., B, 55, 157-170.
Copas, J.B. and Li, H.G. (1997) Inference for non-random samples (with discussion). J. Roy. Statist. Soc., B, 59, 55-95.
Copas, J.B. and Marshall, P. (1998) The offender group reconviction scale: a statistical reconviction score for use by probation officers. Applied Statistics, 47, 159-171.
Lin, D.Y., Psaty, B.M. and Kronmal, R.A. (1998) Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics, 54, 948-963.
Little, R.J.A. (1985) A note about models for selectivity bias. Econometrica, 53, 1469-1474.
Little, R.J.A. (1995) Modelling the dropout mechanism in repeated-measures studies. J. Am. Statist. Assoc., 90, 1112-1121.
Little, R.J.A. and Rubin, D.B. (1987) Statistical Analysis with Missing Data. New York: Wiley.
McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. 2nd ed. London: Chapman and Hall.
Rosenbaum, P.R. (1987) Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika, 74, 13-26.
Rosenbaum, P.R. (1995) Observational Studies. New York: Springer.
Rosenbaum, P.R. and Krieger, A.M. (1990) Sensitivity of two-sample permutation inferences in observational studies. J. Am. Statist. Assoc., 85, 493-498.
Rosenbaum, P.R. and Rubin, D.B. (1983) Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J. Roy. Statist. Soc., B, 45, 212-218.
Scharfstein, D.O., Rotnitzky, A. and Robins, J.M. (1999) Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion). J. Am. Statist. Assoc., 94, 1096-1146.
Schlesselman, J.J. (1978) Assessing effects of confounding variables. Am. J. Epidemiology, 108, 3-8.
White, H. (1982) Maximum likelihood estimation of misspecified models. Econometrica, 50, 1-26.