Bootstrapping Bayes Estimators in Complex Sample Designs

Michael Elliott
Survey Methodology Program, Institute for Social Research, and Department of Biostatistics, University of Michigan

Outline
- Overview of weights in models.
- Difficulties with weights in the Bayesian setting.
- Development of nonparametric synthetic population generation to generate pseudo simple random samples.
- Reweighting of simulated posterior draws to obtain "design-based" bootstrap estimators of posterior distributions.
- Simulation study.

Using sampling weights in models
Why do we care about incorporating sampling weights in models? If the model is correct, weights are unnecessary: model parameters will be consistently estimated, and weights will only add variability and increase mean square error.

Consider a simple no-intercept model:

  Yi = βXi + εi,  εi ~iid N(0, σ²).

The population MLE of β is B = ∑_{i=1}^N XiYi / ∑_{i=1}^N Xi². B is model-unbiased for β if the mean model is correctly specified:

  Eζ(B) = ∑_{i=1}^N Xi Eζ(Yi) / ∑_{i=1}^N Xi² = β ∑_{i=1}^N Xi² / ∑_{i=1}^N Xi² = β.

Using sampling weights in models: Unweighted estimator
The unweighted estimator of β is

  β̂ = ∑_{i=1}^n xiyi / ∑_{i=1}^n xi² = ∑_{i=1}^N IiXiYi / ∑_{i=1}^N IiXi²

for sampling indicator Ii. Taking the expectation with respect to both model and design,

  Eζp(β̂) = Eζ(Ep(β̂)) = Eζ( ∑_{i=1}^N Ep[ IiXi / ∑_{i=1}^N IiXi² ] Yi ).

Using sampling weights in models: Correct model specification, ignorable sampling
If sampling is ignorable and the model is correctly specified, taking the expectation with respect to both model and design yields

  Eζp(β̂) = ∑_{i=1}^N Ep[ IiXi / ∑_{i=1}^N IiXi² ] Eζ(Yi) = β Ep[ ∑_{i=1}^N IiXi² / ∑_{i=1}^N IiXi² ] = β.

[Figure, built up over three slides: scatterplot of the population with the fitted regression lines.]
Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi, .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
P(Ii = 1 | Xi) ∝ Xi for sampling indicator Ii; sample WOR, n = 50
(—) Unweighted fit α̂ + β̂xi; (—) Weighted fit α̂w + β̂wxi

Using sampling weights in models
If the model is misspecified, the weighted estimator is consistent for the misspecified model fit to the population.
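A minimal sketch of the simulation behind the figures above (correct mean model, selection probability proportional to X): both the unweighted and the weighted no-intercept slope estimators center on the true β = 1, with the weighted one somewhat noisier. Poisson sampling stands in for the slides' without-replacement scheme, and all variable names are illustrative, not the talk's.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_target, reps = 1000, 50, 400
b_unw, b_wtd = [], []

for _ in range(reps):
    X = rng.uniform(0.0, 1.0, N)
    Y = rng.normal(X, 0.5)                 # Y | X ~ N(X, .25), so sd = 0.5
    pi = n_target * X / X.sum()            # P(I_i = 1 | X_i) proportional to X_i
    I = rng.random(N) < pi                 # Poisson-sampling stand-in for WOR
    x, y, w = X[I], Y[I], 1.0 / pi[I]
    b_unw.append((x * y).sum() / (x * x).sum())          # no-intercept OLS slope
    b_wtd.append((w * x * y).sum() / (w * x * x).sum())  # design-weighted slope

print(round(np.mean(b_unw), 2), round(np.mean(b_wtd), 2))  # both near the true beta = 1
```

Under the correct model and ignorable selection, the extra weighting buys nothing except variance, which is exactly the talk's opening point.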
Using sampling weights in models: Misspecified model, ignorable sampling
If the mean model is misspecified, B provides the closest approximation β̃ relating Y to X through the origin: e.g., if Eζ(Yi | Xi) = β₀Xi + β₁Xi²,

  Eζ(B) = ∑_{i=1}^N Xi(β₀Xi + β₁Xi²) / ∑_{i=1}^N Xi² = β₀ + β₁ ∑_{i=1}^N Xi³ / ∑_{i=1}^N Xi² = β̃.

The unweighted estimator is biased unless I ⊥ X as well as I ⊥ Y | X:

  Eζp(β̂) = ∑_{i=1}^N Ep[ IiXi / ∑_{i=1}^N IiXi² ](β₀Xi + β₁Xi²) = β₀ + β₁ Ep[ ∑_{i=1}^N IiXi³ / ∑_{i=1}^N IiXi² ] ≈ β₀ + β₁ ∑_{i=1}^N πiXi³ / ∑_{i=1}^N πiXi²,  πi = P(Ii = 1).

The weighted estimator β̂w = ∑_{i=1}^n wixiyi / ∑_{i=1}^n wixi², with wi = P(Ii = 1)⁻¹, is asymptotically unbiased for β̃:

  Eζp(β̂w) ≈ β₀ + β₁ ∑_{i=1}^N πiwiXi³ / ∑_{i=1}^N πiwiXi² = β₀ + β₁ ∑_{i=1}^N Xi³ / ∑_{i=1}^N Xi² = β̃.

[Figure, built up over three slides: scatterplot of the population with the fitted regression lines.]
Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi + 2Xi², .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
P(Ii = 1 | Xi) ∝ Xi for sampling indicator Ii; sample WOR, n = 50
(—) Unweighted fit α̂ + β̂xi; (—) Weighted fit α̂w + β̂wxi

Using sampling weights in models
If the sampling is nonignorable, only the weighted estimator is consistent.

Using sampling weights in models: Misspecified model, nonignorable sampling
In the zero-intercept example, if P(Ii = 1) = nYi / ∑_{i=1}^N Yi, one can show that

  Eζp(β̂) ≈ β + σ² ∑_{i=1}^N Xi / (β ∑_{i=1}^N Xi³).

However, the same results as before show the weighted estimator is asymptotically unbiased.
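A hedged sketch of the misspecified-model case above, mirroring the slides' β₀ = 1, β₁ = 2 setup: the unweighted no-intercept slope drifts away from the census least-squares value B̃, while the weighted estimator tracks it. Poisson sampling again approximates the WOR design; the names are mine.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_target, reps = 1000, 50, 2000
X = rng.uniform(0.0, 1.0, N)
Y = rng.normal(X + 2 * X**2, 0.5)          # misspecified: the true mean is quadratic
B_tilde = (X * Y).sum() / (X * X).sum()    # census fit of the (wrong) linear model

pi = n_target * X / X.sum()                # selection probability proportional to X_i
b_unw, b_wtd = [], []
for _ in range(reps):
    I = rng.random(N) < pi
    x, y, w = X[I], Y[I], 1.0 / pi[I]
    b_unw.append((x * y).sum() / (x * x).sum())
    b_wtd.append((w * x * y).sum() / (w * x * x).sum())

# unweighted drifts above B_tilde; weighted stays close to it
print(round(np.mean(b_unw) - B_tilde, 2), round(np.mean(b_wtd) - B_tilde, 2))
```

This is the sense in which the weighted estimator is "consistent for the misspecified model fit to the population."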
Using sampling weights in models: Correct model specification, nonignorable sampling
[Figure, built up over three slides: scatterplot of the population with the fitted regression lines.]
Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi, .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
Zi ∼ N(Yi − Xi, .1); P(Ii = 1 | Xi) ∝ (3·I(Zi > 0) + 1)·Xi for sampling indicator Ii; sample WOR, n = 50
(—) Unweighted fit α̂ + β̂xi; (—) Weighted fit α̂w + β̂wxi

Pseudo maximum likelihood estimator
The idea of using design-adjusted estimators for models is encapsulated in the "pseudo maximum likelihood estimator" (PMLE) (Binder 1983, Pfeffermann 1993). Replace the unweighted score equation

  U(θ) = ∑_{i=1}^n Ui(θ),  Ui = ∂/∂θ log L(θ; yi),

with the weighted score equation

  Uw(θ) = ∑_{i=1}^n wiUi(θ).

Uw(θ) is a consistent estimator of the population score equation U_N(θ) = ∑_{i=1}^N Ui(θ), so θ̂w s.t. Uw(θ̂w) = 0 consistently estimates θ̂N s.t. U_N(θ̂N) = 0. Variance estimators of θ̂w can be obtained by linearizing the PMLE, or by jackknife or bootstrap.

Pseudo maximum likelihood estimator and Bayesian methods
However, note that the weighted log-likelihood lw(θ; y) = ∑_{i=1}^n wi log Li(θ; yi) cannot serve in place of a true log-likelihood for Bayesian estimation:

  pw(θ | y) ∝ exp{lw(θ; y)} p(θ) ???

Adding together sampled-unit log-likelihoods assumes independence; even if stratification/clustering is not present and we ignore joint selection probabilities, we do not have wi observations.
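For point estimation the weighted score works fine. A hedged sketch for a normal mean, where Ui(µ) ∝ yi − µ and so Uw(µ̂w) = 0 is solved by the Hájek-type weighted mean; Poisson sampling approximates a WOR design, and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n_target = 5000, 400
M = rng.uniform(1.0, 5.0, N)               # size measure driving selection
Y = rng.normal(M, 1.0)                     # outcome correlated with the size measure
pi = n_target * M / M.sum()
I = rng.random(N) < pi                     # Poisson-sampling stand-in for WOR
y, w = Y[I], 1.0 / pi[I]

# Normal-mean unit score U_i(mu) = y_i - mu (up to a 1/sigma^2 factor).
theta_N = Y.mean()                         # solves the population score U_N
theta_u = y.mean()                         # unweighted score solution: biased here
theta_w = (w * y).sum() / w.sum()          # solves the weighted score U_w
Uw_at_solution = (w * (y - theta_w)).sum() # should be zero up to rounding
print(round(theta_u - theta_N, 2), round(theta_w - theta_N, 2))
```

The unweighted solution inherits the selection bias; the weighted-score solution lands near the population score solution θ̂N, which is the PMLE consistency claim above.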
Options:
- Assume the model is correct and the sampling non-informative, and ignore the weights.
- Construct a model that allows for misspecification and that is sensitive to the unequal-probability-of-selection design: allow for interactions between the model parameters of interest and the probability of selection.

Bayesian finite population inference
Inference about a population quantity Q(Y) is based on the posterior predictive distribution p(Ynob | Yobs, I), where Ynob consists of the elements of Y for which the sampling indicator Ii equals 0:

  p(Ynob | Yobs, I) = ∫∫ p(Ynob | Yobs, θ, φ) p(I | Y, θ, φ) p(Yobs | θ) p(θ, φ) dθ dφ / ∫∫∫ p(Ynob | Yobs, θ, φ) p(I | Y, θ, φ) p(Yobs | θ) p(θ, φ) dYnob dθ dφ,

where φ models the inclusion indicator.

Bayesian finite population inference
If φ and θ are a priori independent and I is independent of Y, the sampling design is "unconfounded" or "noninformative"; if the distribution of I depends only on Yobs, the sampling design is "ignorable" (Rubin 1987). Under ignorable sampling designs, p(θ, φ) = p(θ)p(φ) and p(I | Y, θ, φ) = p(I | Yobs, φ), so

  p(Ynob | Yobs, I) = ∫ p(Ynob | Yobs, θ) p(Yobs | θ) p(θ) dθ / ∫∫ p(Ynob | Yobs, θ) p(Yobs | θ) p(θ) dθ dYnob = p(Ynob | Yobs).

This allows inference about Q(Y) to be made without explicitly modeling I (Ericson 1969, Holt and Smith 1979, Little 1993, Rubin 1987, Skinner et al. 1989). But ignorability requires a data model that is attentive to design features and robust enough to sufficiently capture all relevant aspects of the distribution of Y of interest.
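Under an ignorable design the posterior predictive machinery above can be sketched directly: draw θ from its posterior, impute Ynob, and read off Q(Y). This illustrative sketch uses an SRS, a normal data model, and the Jeffreys-style prior p(µ, σ²) ∝ σ⁻² (a slightly different diffuse prior than the σ⁻¹ used in the talk's simulation study); Q(Y) is the finite population mean.

```python
import numpy as np

rng = np.random.default_rng(4)
N, n, T = 2000, 100, 3000
Y = rng.normal(5.0, 2.0, N)                # the (normally unknown) finite population
yobs = rng.choice(Y, n, replace=False)     # SRS, so the design is ignorable

q_draws = []
for _ in range(T):
    # Draws from p(mu, sigma^2 | yobs) under p(mu, sigma^2) proportional to 1/sigma^2.
    sig2 = (n - 1) * yobs.var(ddof=1) / rng.chisquare(n - 1)
    mu = rng.normal(yobs.mean(), np.sqrt(sig2 / n))
    ynob = rng.normal(mu, np.sqrt(sig2), N - n)        # draw Y_nob | Y_obs, theta
    q_draws.append((yobs.sum() + ynob.sum()) / N)      # Q(Y) = population mean

lo, hi = np.percentile(q_draws, [2.5, 97.5])
print(round(np.mean(q_draws), 2), round(lo, 2), round(hi, 2))
```

The point of the ignorability result is visible in the code: nothing about the inclusion indicator I appears in the imputation step.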
Linear regression example
Denote the strata of a disproportionately stratified or poststratified design by h = 1, ..., H (group by [weighted] percentiles if continuous). Interact with the regression slope by allowing separate regression estimators for each weight stratum:

  yhi | βh, σ² ∼ N(x'hi βh, σ²)
  (β'1, ..., β'H)' | β*, G ∼ N_{Hp}(β*, G)
  p(φ, β*, G) ∝ p(ζ)

The posterior mean of the population slope B = (∑h ∑_{i=1}^{Nh} xhix'hi)⁻¹ ∑h ∑_{i=1}^{Nh} xhiyhi is given by

  E(B | y) = [ ∑h Wh ∑_{i=1}^{nh} xhix'hi ]⁻¹ [ ∑h Wh ( ∑_{i=1}^{nh} xhix'hi ) β̂h ],

where β̂h = E(βh | y).

Assuming a degenerate prior for βh (G ≡ 0) implies βh = β for all h, and thus

  E(B | y) = β̂ = [ ∑h ∑_{i=1}^{nh} xhix'hi ]⁻¹ [ ∑h ∑_{i=1}^{nh} xhiyhi ].

This is the standard unweighted regression slope estimator.

Assuming a flat prior for βh (G ≡ ∞) implies β̂h = [ ∑_{i=1}^{nh} xhix'hi ]⁻¹ ∑_{i=1}^{nh} xhiyhi, and thus

  E(B | y) = [ ∑h Wh ∑_{i=1}^{nh} xhix'hi ]⁻¹ [ ∑h Wh ∑_{i=1}^{nh} xhiyhi ].

This is the standard weighted regression slope estimator. Using a proper prior for βh (0 < G < ∞) allows for an estimator intermediate between the unweighted and fully-weighted regression estimators: "weight smoothing," a form of data-driven weight trimming (Elliott 2007).

Bayesian finite population inference with weights
This general approach has a few drawbacks:
- Quickly get a large number of parameters in even moderately complex models.
- Use of priors to smooth can help stabilize, but what if we want to keep the fully-weighted approach?
- Somewhat unclear how to handle hierarchical models ("cross-class" interaction between weighting effects and model random effects).
- Continuous weights (non-response, PPS) require the "ad-hocery" of pre-pooling.

Bayesian finite population inference with weights
What if we could "convert" our complex-design sample to a simple random sample? Use the finite population Bayesian bootstrap to obtain a draw of Psyn ∼ p(Ynob | y) that can be regarded as a simple random sample from the underlying superpopulation.

Non-parametric Synthetic Populations
Account for stratification and clustering by drawing L Bayesian bootstrap samples of the clusters within each stratum: for stratum h with Ch clusters, draw Ch − 1 random variables from U(0, 1) and order them as a1, ..., a_{Ch−1}; then sample Ch clusters with replacement with probability ac − a_{c−1}, where a0 = 0 and a_{Ch} = 1.

Generate the unobserved elements of the population within each cluster c in stratum h, accounting for the selection weights (Cohen 1997): draw a sample of size Nch − nch by drawing (yk, xk) from the ith unit among the nch sampled elements with probability

  [ wi − 1 + l_{i,k−1}(Nch − nch)/nch ] / [ Nch − nch + (k − 1)(Nch − nch)/nch ],

where l_{i,k−1} is the number of bootstrap draws of the ith unit among the previous k − 1 bootstrap selections. Repeat S times for each lth bootstrapped cluster. One can show (Dong et al. 2012) that this is equivalent to a Bayesian bootstrap Polya urn scheme (Lo 1988).

Bayesian Finite Population Inference with Nonparametric Synthetic Populations
Large-sample approximations (Raghunathan et al. 2003, Reiter et al. 2003) yield

  Q̂ = E(Q | y) = E(E(Q | Psyn) | y) ≈ L⁻¹ ∑_{l=1}^L S⁻¹ ∑_{s=1}^S Q̂ls = L⁻¹ ∑_{l=1}^L Q̂l·

  V̂(Q̂) = V(Q | y) = E(V(Q | Psyn) | y) + V(E(Q | Psyn) | y) ≈ (1 + L⁻¹)(L − 1)⁻¹ ∑_{l=1}^L (Q̂l· − Q̂··)²,  where Q̂·· = L⁻¹ ∑_{l=1}^L Q̂l·,

where Q̂ls is the estimator of Q derived from the sth finite population Bayesian bootstrap of the lth Bayesian bootstrap. For large L, Q̂ is approximately normally distributed; otherwise approximate with a t distribution with (L − 1) degrees of freedom.
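A sketch of the within-cluster weighted FPBB step above, reduced to the single-stratum, single-cluster case, using the Cohen (1997) selection probabilities; the function and variable names are mine, and the weights are first rescaled to sum to N so the denominator matches the slide's.

```python
import numpy as np

def weighted_fpbb(y, w, N, rng):
    """One weighted finite population Bayesian bootstrap draw of a size-N
    synthetic population from a sample (y, w) of size n (Polya-urn form)."""
    y = np.asarray(y, float)
    n = len(y)
    w = np.asarray(w, float) * N / np.sum(w)   # rescale weights to sum to N
    l = np.zeros(n)                            # prior bootstrap-draw counts l_{i,k-1}
    drawn = []
    for k in range(N - n):
        # Cohen (1997) numerator; clip guards rescaled weights below 1.
        p = np.clip(w - 1 + l * (N - n) / n, 0, None)
        j = rng.choice(n, p=p / p.sum())
        l[j] += 1
        drawn.append(y[j])
    return np.concatenate([y, np.array(drawn)])

rng = np.random.default_rng(5)
n, N = 20, 200
w = rng.uniform(1.0, 9.0, n)
y = rng.normal(w, 1.0)                         # outcome correlated with the weight
hajek = (w * y).sum() / w.sum()                # design-weighted sample mean
syn_means = [weighted_fpbb(y, w, N, rng).mean() for _ in range(200)]
print(round(hajek, 2), round(np.mean(syn_means), 2))  # synthetic means track Hajek
```

Because the urn's reinforcement is uniform across units, the expected mean of a synthetic population equals the Hájek weighted mean, which is why the synthetic populations can stand in for simple random samples downstream.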
"Design Based" Bayesian Inference
Use draws of nonparametric synthetic populations to feed into a Bayesian estimation procedure to obtain p(θ | Psyn) and Q̂ls. Large-sample approximations allow for "design based" Bayesian inference (Little 2011). If we have t = 1, ..., T draws from p(θ | y) obtained by treating y as an SRS, we can obtain p(θ | Psyn^(l,s)) by weighting the tth draw from p(θ | y) by

  qtls = [ ∏_{i=1}^n f(yi | θ^(t))^{ri^(l,s)} ]⁻¹,

where ri^(l,s) is the number of times that the ith observation was resampled in the synthetic population Psyn^(l,s) generated by the lth and sth finite population Bayesian bootstrap.

Simulation Study
Population: measure of size Mi ∼ U(101, 4100), i = 1, ..., N = 4000; Yi | Mi ∼ N((Mi − 2100.5)/2000, 10).
Two sample designs, both without replacement, size n = 100:
1. Simple random sample: P(Ii = 1) = n/N for all i.
2. Proportional to size: P(Ii = 1) = nMi / ∑i Mi.
Unweighted mean n⁻¹ ∑i IiYi:
1. (superpopulation) bias = 0, weight CV = 0.
2. (superpopulation) bias = .32, weight CV = 1.35.

Data model: Yi | µ, σ² ∼ N(µ, σ²), p(µ, σ²) ∝ σ⁻¹. Misspecified, but sufficient under SRS. Fit using a Gibbs sampler, 1,000 draws after 100 burn-in. Compute the posterior mean and 95% credible interval for E(Y | y) over 200 simulations, using the standard approach and the weighted finite population Bayesian bootstrap (L = 20, S = 5).

Results with Yi | Mi ∼ N((Mi − 2100.5)/2000, 10):

                       Bias   Emp. SD   Mean Post. SD   95% Coverage
  SRS  Standard        .080   .306      .320            98
       Weighted FPBB  -.075   .306      .323            97
  PPS  Standard        .286   .319      .326            86
       Weighted FPBB   .038   .380      .353            92

Both approaches perform well under SRS. The standard approach is biased under PPS; the Bayesian FPBB removes the bias and increases the variance, giving approximately correct coverage under PPS.

"Design Based" Bayesian Inference: Problems
Can anyone see what the problem with the proposed reweighting "shortcut" is?
"Design Based" Bayesian Inference: Problems
Reweighting is similar to importance sampling: it requires that the posterior generating the original draws, from the model that ignores the sampling design, cover the target FPBB posterior. If σ² = 1, the variability in p(µ | y) will be insufficient for reweighting to remove the bias. This can be diagnosed through the distribution of the qls.

[Figure: histograms of the importance weights with variance = 10 and with variance = 1.]

Simulation Study
Results with Yi | Mi ∼ N((Mi − 2100.5)/2000, 1):

                       Bias   Emp. SD   Mean Post. SD   95% Coverage
  SRS  Standard        .016   .115      .103            98
       Weighted FPBB   .016   .116      .107            98
  PPS  Standard        .334   .115      .115            14
       Weighted FPBB   .099   .133      .097            76

The FPBB is unable to fully correct for the bias because the unweighted posterior does not include the FPBB posterior.

Simple solution: increase the number of draws (T = 50,000) so that the tail of the unweighted posterior includes the FPBB posterior:

                       Bias   Emp. SD   Mean Post. SD   95% Coverage
  SRS  Standard        .016   .115      .103            98
       Weighted FPBB   .012   .136      .107            94
  PPS  Standard        .334   .115      .326            14
       Weighted FPBB   .053   .113      .123            94

Simple solution #2: draw ε^(t) from N(0, 3Σ), where Σ = var(θ | y), and compute the importance weights by qtls = [ ∏_{i=1}^n f(yi | θ^(t) + ε^(t))^{ri^(l,s)} ]⁻¹:

                       Bias   Emp. SD   Mean Post. SD   95% Coverage
  SRS  Standard        .016   .115      .103            98
       Weighted FPBB   .007   .123      .127            96
  PPS  Standard        .334   .115      .326            14
       Weighted FPBB   .039   .155      .143            91

Discussion
- Just starting to think through the idea.
- Need more realistic simulation studies (strata, clusters; hierarchical models, latent variables).
- Try in real-world applications to see if the variability in the importance weights is reasonable.
- Better ways to generate draws for importance weights (probably OK to generate ε from a normal if the posterior is approximately normal; otherwise may need a better approximation to the true posterior).
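For the diagnostic mentioned above (inspecting the distribution of the importance weights), a small effective-sample-size utility is one way to quantify "reasonable variability"; the function name and log-weight convention are mine, not the talk's.

```python
import numpy as np

def importance_diagnostics(log_q):
    """Normalize importance weights from the log scale and report the
    effective sample size ESS = 1 / sum(q_t^2): ESS near T means the
    proposal posterior covers the target; ESS near 1 means degeneracy."""
    log_q = np.asarray(log_q, float)
    q = np.exp(log_q - log_q.max())        # stabilize before exponentiating
    q = q / q.sum()
    return q, 1.0 / np.sum(q ** 2)

# Well-behaved weights: ESS stays close to the number of draws.
q, ess_good = importance_diagnostics(np.random.default_rng(6).normal(0, 0.1, 1000))
# Degenerate weights: one draw dominates and the ESS collapses toward 1.
q, ess_bad = importance_diagnostics(np.array([0.0] * 999 + [20.0]))
print(round(ess_good), round(ess_bad))
```

A collapsed ESS corresponds to the piled-up weight histograms in the σ² = 1 case, where the unweighted posterior fails to cover the FPBB posterior.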
© Copyright 2024