Information geometry of statistical inference with a selective sample
S. Eguchi, ISM & GUAS

Local Sensitivity Approximation for Selectivity Bias
J. Copas and S. Eguchi, J. Royal Statist. Soc. B, 63 (2001), 871-895.
(http://www.ism.ac.jp/~eguchi/recent_preprint.html)
This talk is part of joint work with J. Copas, University of Warwick.

Summary
• Ignorability
• Selection bias
• Sensitivity analysis
  ⋅ Tubular neighborhood 𝒩 of the model M
  ⋅ Compare the inference for a near model M_ε of 𝒩 with that for M
  ⋅ Get interpretable bounds

Near-parametric
(X_1, ..., X_n) ∼ g(x_1, ..., x_n), with min_{θ∈Θ} KL(g, f(·, θ)) ≤ O(n^{−β}):
  β = 0  ⇐ non-parametric
  0 < β < ∞  ⇐ near-parametric
  β = ∞  ⇐ exact parametric

Statistical model
Probability model: P(X ∈ B) = ∫_B p(x) µ(dx).
Statistical model: (X_1, ..., X_n) iid ∼ p(x, θ), x ∈ 𝒳 ⊂ R^p, θ ∈ Θ ⊂ R^d.

Ignorability
Y: response variable, X: covariate variable, Z: observational status.
Def. (X, Y, Z) is ignorable ⇔ Y ⊥ Z | X.

Observational status
Exm. 1. Z is a missing (censoring) indicator: Y is observed if Z = 1 and missing (censored) if Z = 0.
Exm. 2. Z is a group label for the allocation process: Y | Z ∼ f(y, θ_z) for z = 1, ..., G.

Binary response with missing data
Y is a binary response with missing indicator Z; θ = pr(Y = 1), p_y = P(Z = 1 | y).

Model:
          y = 1          y = 0
  z = 1   θ p_1          (1 − θ) p_0
  z = 0   θ (1 − p_1)    (1 − θ)(1 − p_0)

Data (N = sample size, n = number observed, f = frequency of Y = 1 among the observed):
          y = 1   y = 0   total
  z = 1   f       n − f   n
  z = 0   ?       ?       N − n
  total   ?       ?       N

MLE θ̂ = f/n, with variance s² = θ̂(1 − θ̂)/n.

Non-ignorable cases
pr(Y = 1 | Z = 1) = θ p_1 / {θ p_1 + (1 − θ) p_0},  so  θ̃ = f p_0 / {f p_0 + (n − f) p_1}.
Small degree of non-ignorability:
  θ̃_η = θ̂ ± ((N − n)/N) s η + O(η²),  where η² = Var{ log [pr(Z = 1 | Y) / pr(Z = 0 | Y)] }.
Compare θ̂ with θ̃_η. How small is "small η"?

Distributional expression
Def. 2. Y ⊥ Z | X ⇔ f_YZ(y, z | x) = f_Y(y, θ) f_Z(z, ψ),  p = dim θ, q = dim ψ.
For the binary example, f_YZ = θ^y (1 − θ)^{1−y} ψ^z (1 − ψ)^{1−z}.

Tangent space and neighborhood
Def. 3. 𝒩 = { g_YZ : min_{θ,ψ} KL(g_YZ, f_YZ) < ε },
Λ = { λ(y, z) : E_{f_Y}{λ(Y, z)} = E_{f_Z}{λ(y, Z)} = 0, Var_{f_YZ}{λ(Y, Z)} = 1 }.

Exponential map, tubular neighborhood
g_YZ(y, z) := f_YZ(y, z) exp{ε λ(y, z) − κ(ε)}  (for all ε << ε_0, λ ∈ Λ)  ⇒  g_YZ ∈ 𝒩,
KL(g_YZ, f_YZ) = ε²/2 + O(ε³).
(Figure: the model M and its tubular neighborhood of near models M_ε.)

Decomposition into tangent and normal parts
λ(y, z) = Σ_{i,j} λ_{ij} u_i(y, θ) v_j(z, ψ),
where (u_1, ..., u_p)^T = I_θ^{−1/2} S_Y(y, θ), (v_1, ..., v_q)^T = I_ψ^{−1/2} S_Z(z, ψ),
E_{f_Y}(u_i u_j) = δ_{ij}, E_{f_Z}(v_j v_k) = δ_{jk}, and Σ_{i,j} λ_{ij}² = 1.
Grouping the expansion by rows and by columns,
λ(y, z) = Σ_{i=1}^p λ_{i·} u_i v_· + Σ_{i=p+1}^∞ λ_{i·} u_i v_·  =  Σ_{j=1}^q λ_{·j} u_· v_j + Σ_{j=q+1}^∞ λ_{·j} u_· v_j.

Conditional distributions
g_{Y|Z}(y | z) = f_Y(y, θ + ε I_θ^{−1/2} ω_z) (1 + ε Σ_{i=p+1}^∞ λ_{i·} u_i v_·),
g_{Z|Y}(z | y) = f_Z(z, ψ + ε I_ψ^{−1/2} ω*_y) (1 + ε Σ_{j=q+1}^∞ λ_{·j} u_· v_j),
where ω_z = (λ_{1·} v_·, ..., λ_{p·} v_·)^T and ω*_y = (λ_{·1} u_·, ..., λ_{·q} u_·)^T.

Def. 4. η_z² = Var_{f_Y}( log g_{Z|Y}(z | Y) ).
• (1/2) η_z² ≅ KL(f_Y, g_{Y|Z})
• E_{f_Z}(η_z²) ≅ ε²
• θ* = θ + ε I_θ^{−1/2} ω_z = argsolve{ E_{g_{Y|Z}} S(Y, θ*) = 0 }

Calibration
ε² ω_z^T ω_z ≤ η_z²   ( "=" ⇔ λ_{i·} = 0 for i = p + 1, ... ).

Rosenbaum's log odds ratio (counterfactual)
η = log{ P(r > 0 | y) P(r ≤ 0) / [P(r > 0) P(r ≤ 0 | y)] }.

Testing the hypothesis H_0 : ε = 0
Score test: T = N^{−1/2} Σ_{k=1}^N λ(y_k, z_k) ∼ N(0, 1) under H_0; reject H_0 if |T| > 2.
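As a numerical illustration of the binary missing-data example above, here is a minimal Python sketch that evaluates the first-order sensitivity interval θ̃_η = θ̂ ± ((N − n)/N) s η over a few values of the non-ignorability index η. The counts N, n and f are hypothetical and chosen only for illustration.

    # First-order sensitivity interval for the binary missing-data example
    # (hypothetical counts; theta_hat and s as defined on the slides above).
    from math import sqrt

    N, n, f = 1000, 800, 320            # hypothetical: sample size, number observed, observed count of Y = 1
    theta_hat = f / n                   # MLE under ignorability
    s = sqrt(theta_hat * (1 - theta_hat) / n)

    for eta in (0.0, 0.1, 0.2, 0.5):    # degree of non-ignorability
        shift = (N - n) / N * s * eta
        print(f"eta = {eta:.1f}: theta_tilde in ({theta_hat - shift:.4f}, {theta_hat + shift:.4f})")

The interval collapses to the single point θ̂ when every unit is observed (n = N) or when selection is ignorable (η = 0).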
Guideline for non-ignorable missing data
Testing the hypothesis H_0 : ε = 0.
Local power: Φ(−2 + N^{1/2} ε) + Φ(−2 − N^{1/2} ε).
Guideline: ε = 2 N^{−1/2}  ⇔  η = 2 {N / (n(N − n))}^{1/2}.

Log-likelihood
L(θ, ε) = Σ_{k=1}^n log f(y_k, θ) + ε Σ_{k=1}^n ω_1^T I_θ^{−1/2} S(y_k, θ)
  + ε Σ_{k=1}^n Σ_{i=p+1}^∞ λ_{i1} u_i(y_k, θ) v_1(1, ψ)
  + n log ψ + (N − n) log(1 − ψ) + O(ε²).

MLE
θ̃_ε = argmax_{θ | ε} L(θ, ε) = θ̂ − ε I_θ̂^{−1/2} ω_1 + O(ε²),
(θ̃_ε − θ̂)^T I_θ̂ (θ̃_ε − θ̂) ≅ ε² ω_1^T ω_1 ≤ η_1²   ( "=" ⇔ λ_{i1} = 0 for i = p + 1, ... ).
If p = 1, θ̃_ε = θ̂ ± n^{−1/2} ((N − n)/N) I_θ̂^{−1/2} η.

Selectivity region
v_1(z, ψ) = (z − ψ)/{ψ(1 − ψ)}^{1/2},  η² = ε²/{ψ(1 − ψ)}.

Can we estimate ε?
max_{θ | ε} L(θ, ε) = L(θ̂, 0) − (n σ*²/2)(ε − u*/σ*²)² + n u*²/(2σ*²) + O(ε³),
where u* = (1/n) Σ_{k=1}^n Σ_{i=p+1}^∞ λ_{i1} u_i(y_k, θ̂) v_1(1, ψ̂) and σ*² = Var_{f_Y}(u*).

Unstable or misspecified
ε̂ = u*/σ̂*² is unstable if Σ_{i=p+1}^∞ λ_{i1}² = 0; otherwise, e.g. λ_{31} ≠ 0, u* includes skewness statistics:
for f(y, θ) ∼ N(µ, σ²) with θ = (µ, σ), u_3(y) is the third Hermite polynomial.

Regression formulation
{x_k ; k = 1, ..., N} fully observed;  θ_k = θ(β^T x_k), ψ_k = ψ(α^T x_k), p_k = f_Z(1, ψ_k).
L(α, β) = Σ_{k=1}^n log f_Y(y_k, θ(β^T x_k)) + ε Σ_{k=1}^n ω_{1k}^T I(θ_k)^{−1/2} S(y_k, θ(β^T x_k))
  + Σ_{k=1}^n log ψ(α^T x_k) + Σ_{k=n+1}^N log(1 − ψ(α^T x_k)) + O(ε²).

Heckman model
Y = β^T X + σ e_1,  R = γ^T X + e_2,  (e_1, e_2) ∼ N_2( (0, 0), [[1, ε], [ε, 1]] ),
R > 0 ⇔ Z = 1 (Y observed),  R ≤ 0 ⇔ Z = 0 (Y missing).

Likelihood
f(y | x, r > 0) = (1/σ) φ((y − β^T x)/σ) × Φ( γ^T x/√(1 − ε²) + ε (y − β^T x)/(σ √(1 − ε²)) ) / Φ(γ^T x),
E(y | x, r > 0) = β^T x + σ ε λ(γ^T x)   (λ = φ/Φ, the inverse Mills ratio),
P(r > 0 | x, y) = Φ( γ^T x/√(1 − ε²) + ε (y − β^T x)/(σ √(1 − ε²)) ).

Likelihood analysis
L(β, σ, γ, ε) = −n log σ − (1/(2σ²)) Σ_{i=1}^n (y_i − β^T x_i)² + Σ_{i=1}^n log Φ(u_i) + Σ_{i=n+1}^N log Φ(−γ^T x_i),
where u_i = γ^T x_i/√(1 − ε²) + (ε/√(1 − ε²)) (y_i − β^T x_i)/σ.

Profile likelihood of ε
L*(ε) = max_{β,σ,γ | ε} L(β, σ, γ, ε);  L*′(0) = 0, L*′′(0) = 0,
L*′′′(0) = K_1 Σ_{i=1}^n ((y_i − β̂^T x_i)/σ̂)³,
L*′′′′(0) = K_2 Σ_{i=1}^n [ ((y_i − β̂^T x_i)/σ̂)⁴ − 3 ].

Coventry work audit data
y = income, x = (1, sex, age, age²), N = 1435, n = 1323.

Skin cancer data
Case-control study of melanoma with the risk factor nevi > 10; cell frequencies f_1 = 323, f_2 = 259, f_3 = 130, f_4 = 288.

Table 2: Melanoma data, various patterns of bias
  p_1   p_2    p_3    p_4    ε      ± bound
  .1    .2     .1     .2     0      0      .011
  .1    .2     .2     .1     .391   1.719  1.580
  .1    .15    .15    .1     .222   .840   .812
  .2    .1     .1     .2     .391   1.761  .002
  .5    .533   .533   .5     .063   1.02   .698

Group comparison
Y | Z ∼ f_Y(y, θ_z)  (z = 1, ..., G).
Random-effect model: f_Y(y, θ_z) = ∫ f_{Y|T}(y, t + θ_z) f_T(t) dt.
Dependence model: g_{TZ} = f_T(t) f_Z(z, ψ) exp{ε λ(t, z) − κ(ε)}.
Table 3: Simulation results.

Non-random allocation
g_{Y|Z}(y | z) = E_{T|Z} f_{Y|T}(y, T + θ_z) ≅ f_Y(y, θ_z) {1 − ε λ*(y, z)},
where λ*(y, z) = E_{T+θ_z | Y=y} λ(T, z).

Selection bias
Y | z, x_k ∼ f_Y(y, θ_{zk}),  θ_{zk} = β^T d_z + δ^T x_k,  ψ_{zk} = ψ_z(d_z^T x_k).
Local bias: β̃ − β̂ ≅ Î^{−1} Σ_{k=1}^N Σ_{z=1}^G ε_k* γ_{zk} d_z,  calibrated by η = {2 Σ_{z′=1}^G p_{z′k} (ζ_{z′} − ζ̄_k)²}^{1/2}.
Plot the selectivity region S_β̂ = { β̃ : (ζ_1, ..., ζ_G) ∈ R^G }.

Logistic model
logit{ E(Y | X, Z) } = β_1 + d_{2z} β_2 + d_{3z} β_3 + X β_4.
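Returning to the Heckman model slide above, the conditional-mean formula E(y | x, r > 0) = β^T x + σ ε λ(γ^T x), with λ = φ/Φ the inverse Mills ratio, can be checked by simulation. The sketch below is illustrative only: the scalar parameter values beta, sigma, gamma and eps are hypothetical, not estimates from any data set in the talk.

    # Simulation check of the selection-bias formula E(y | x, r > 0) = beta*x + sigma*eps*lambda(gamma*x)
    # for the Heckman model, with lambda = phi/Phi the inverse Mills ratio.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    beta, sigma, gamma, eps = 1.0, 2.0, 0.5, 0.3      # hypothetical parameter values
    n = 200_000
    x = rng.normal(size=n)

    # bivariate standard normal errors with corr(e1, e2) = eps
    e1 = rng.normal(size=n)
    e2 = eps * e1 + np.sqrt(1 - eps**2) * rng.normal(size=n)

    y = beta * x + sigma * e1          # outcome equation
    r = gamma * x + e2                 # selection equation: y is observed iff r > 0
    observed = r > 0

    # empirical selection bias near x = 0 versus sigma * eps * lambda(0)
    near0 = observed & (np.abs(x) < 0.05)
    empirical = np.mean(y[near0] - beta * x[near0])
    theoretical = sigma * eps * norm.pdf(0.0) / norm.cdf(0.0)
    print(f"empirical {empirical:.3f} vs theoretical {theoretical:.3f}")   # both close to 0.48

This bias is exactly the quantity the sensitivity analysis bounds; as the profile-likelihood slide notes, L*′(0) = L*′′(0) = 0, so the observed data carry little direct information about ε itself.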
Effect of sentence (reconviction data)
Z = 1 prison, Z = 2 community service, Z = 3 probation; Y = reconviction rate.
(Figure: selectivity regions for the effect of sentence, plotting the probation effect against the community service effect on axes from −1 to 1, with contours η = 0.1, 0.2, 0.5 and the 95% confidence interval.)

Two-group comparison
y = µ + sgn(r) δ + σ e_1,  r = e_2;
group z = 1 (r > 0): y_1, ..., y_{n_1};  group z = 2 (r ≤ 0): y_{n_1+1}, ..., y_N.

Analysis
δ̂ = (ȳ_1 − ȳ_2)/2,
E(δ̂) = δ + ε σ √(2/π),
var(δ̂) = (σ²/N)(1 − 2ε²/π) + O(N^{−2}).

Likelihood
−N log σ − Σ_{i=1}^{n_1} (y_i − µ − δ)²/(2σ²) − Σ_{i=n_1+1}^{N} (y_i − µ + δ)²/(2σ²)
  + Σ_{i=1}^{n_1} log Φ( ε (y_i − µ − δ) / (σ √(1 − ε²)) )
  + Σ_{i=n_1+1}^{N} log Φ( −ε (y_i − µ + δ) / (σ √(1 − ε²)) ),
giving δ̂(ε) = δ̂ − √(2/π) · ε/√(1 − ε²) · σ̂ + O(ε³).

UK National Hearing Survey
The effect of occupational noise. Case group (high-level noise), n_0 = 67; control group, n_1 = 144.
The response Y is the hearing threshold for a 3 kHz sound.

Conventional result
Case mean ȳ_0 = 3.893, control mean ȳ_1 = 3.710, pooled s.d. s = 0.351 (209 d.f.), t-statistic t = 3.52.
The standard analysis supports high significance.

Non-random allocation
t(δ̃) = 3.52 − 5.39 η + O(ε³),  so t(δ̃) < z_0.05 = 1.96 if η > 0.29.
Calibration: t − z_0.05 = √(n_0 n_1 / N) η_0.05, giving η_0.05 = 0.23 < 0.30.
(A numerical re-computation of these two values appears after the closing slide below.)

Future problems
• Ignorability
• Selection bias
• Sensitivity analysis
  ⋅ Tubular neighborhood 𝒩 of the model M
  ⋅ Compare the inference for a near model M_ε of 𝒩 with that for M
  ⋅ Get interpretable bounds
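As flagged above, the following sketch re-computes, from the numbers quoted on the hearing-survey slides (t = 3.52, local slope 5.39, z_0.05 = 1.96, n_0 = 67, n_1 = 144), the two values of η at which the conventional significance is lost; nothing beyond those quoted figures is assumed.

    # How large a non-ignorability eta overturns the hearing-survey t-test,
    # using only the numbers quoted on the slides above.
    from math import sqrt

    t0, slope, z = 3.52, 5.39, 1.96     # observed t, local slope of t(delta~) in eta, 5% critical value
    n0, n1 = 67, 144                    # group sizes (case, control)
    N = n0 + n1

    eta_linear = (t0 - z) / slope               # from t(delta~) = t0 - slope*eta: about 0.29
    eta_005 = (t0 - z) / sqrt(n0 * n1 / N)      # from t - z = sqrt(n0*n1/N) * eta_0.05: about 0.23
    print(f"eta (linear approximation): {eta_linear:.2f}")
    print(f"eta_0.05 (calibration):     {eta_005:.2f}")

So a non-ignorability index between about 0.23 and 0.29 is already enough to remove the apparent significance of the conventional analysis.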
References

Arnold, B.C. and Strauss, D.J. (1991) Bivariate distributions with conditionals in prescribed exponential families. J. Roy. Statist. Soc. B, 53, 365-376.
Begg, C.B., Satagopan, J.M. and Berwick, M. (1998) A new strategy for evaluating the impact of epidemiologic risk factors for cancer with application to melanoma. J. Am. Statist. Assoc., 93, 415-426.
Bowater, R.J., Copas, J.B., Machado, O.A. and Davis, A.C. (1996) Hearing impairment and the log-normal distribution. Applied Statistics, 45, 203-217.
Chambers, R.L. and Welsh, A.H. (1993) Log-linear models for survey data with non-ignorable non-response. J. Roy. Statist. Soc. B, 55, 157-170.
Copas, J.B. and Li, H.G. (1997) Inference for non-random samples (with discussion). J. Roy. Statist. Soc. B, 59, 55-95.
Copas, J.B. and Marshall, P. (1998) The offender group reconviction scale: a statistical reconviction score for use by probation officers. Applied Statistics, 47, 159-171.
Cornfield, J., Haenszel, W., Hammond, E.C., Lilienfeld, A.M., Shimkin, M.B. and Wynder, E.L. (1959) Smoking and lung cancer: recent evidence and a discussion of some questions. J. Nat. Cancer Institute, 22, 173-203.
Davis, A.C. (1995) Hearing in Adults. London: Whurr.
Forster, J.J. and Smith, P.W.F. (1998) Model based inference for categorical survey data subject to nonignorable nonresponse. J. Roy. Statist. Soc. B, 60, 57-70.
Heckman, J.J. (1976) The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimator for such models. Ann. Economic and Social Measurement, 5, 475-492.
Heckman, J.J. (1979) Sample selection bias as a specification error. Econometrica, 47, 153-161.
Kershaw, C. (1999) Reconvictions of offenders sentenced or discharged from prison in 1994, England and Wales. Home Office Statistical Bulletin, 5/99. London: HMSO.
Lin, D.Y., Psaty, B.M. and Kronmal, R.A. (1998) Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics, 54, 948-963.
Little, R.J.A. (1985) A note about models for selectivity bias. Econometrica, 53, 1469-1474.
Little, R.J.A. (1995) Modelling the dropout mechanism in repeated-measures studies. J. Am. Statist. Assoc., 90, 1112-1121.
Little, R.J.A. and Rubin, D.B. (1987) Statistical Analysis with Missing Data. New York: Wiley.
McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. 2nd ed. London: Chapman and Hall.
Rosenbaum, P.R. (1987) Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika, 74, 13-26.
Rosenbaum, P.R. (1995) Observational Studies. New York: Springer.
Rosenbaum, P.R. and Krieger, A.M. (1990) Sensitivity of two-sample permutation inferences in observational studies. J. Am. Statist. Assoc., 85, 493-498.
Rosenbaum, P.R. and Rubin, D.B. (1983) Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J. Roy. Statist. Soc. B, 45, 212-218.
Scharfstein, D.O., Rotnitzky, A. and Robins, J.M. (1999) Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion). J. Am. Statist. Assoc., 94, 1096-1146.
Schlesselman, J.J. (1978) Assessing effects of confounding variables. Am. J. Epidemiology, 108, 3-8.
White, H. (1982) Maximum likelihood estimation of misspecified models. Econometrica, 50, 1-26.