Bootstrapping Bayes Estimators in Complex
Sample Designs

Michael Elliott¹,²

¹ Survey Methodology Program, Institute for Social Research
² Department of Biostatistics, University of Michigan
1 / 42
Outline
Overview of weights in models.
Difficulties with weights in Bayesian setting.
Development of nonparametric synthetic population
generation to generate pseudo simple random samples.
Reweighting of simulated posterior draws to obtain
“design-based” bootstrap estimators of posterior
distributions.
Simulation study.
2 / 42
Using sampling weights in models
Why do we care about incorporating sampling weights in
models?
If the model is correct, weights are unnecessary: model
parameters will be consistently estimated, and weights will
only add variability and increase mean squared error.
3 / 42
Using sampling weights in models
Consider a simple no-intercept model:

$$Y_i = \beta X_i + \varepsilon_i, \quad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2).$$

The population MLE of $\beta$ is

$$B = \frac{\sum_{i=1}^N X_i Y_i}{\sum_{i=1}^N X_i^2}.$$

$B$ is model-unbiased for $\beta$ if the mean model is correctly specified:

$$E_\zeta(B) = \frac{\sum_{i=1}^N X_i E_\zeta(Y_i)}{\sum_{i=1}^N X_i^2} = \frac{\sum_{i=1}^N X_i^2 \beta}{\sum_{i=1}^N X_i^2} = \beta$$
4 / 42
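This model-unbiasedness can be checked with a quick Monte Carlo sketch (the choices β = 2, σ = 1, N = 1000, and 4,000 replications are arbitrary, not from the slides):

```python
import numpy as np

# Monte Carlo check that B = sum(X*Y)/sum(X^2) is model-unbiased for beta
# when Y_i = beta*X_i + eps_i. X is held fixed; Y is redrawn each replicate.
rng = np.random.default_rng(5)
beta, N = 2.0, 1000
X = rng.uniform(0, 1, N)

B_draws = []
for _ in range(4000):
    Y = beta * X + rng.normal(0, 1, N)           # fresh model realization
    B_draws.append((X * Y).sum() / (X**2).sum())

# The average of B over model realizations should sit on top of beta.
```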
Using sampling weights in models: Unweighted
estimator
The unweighted estimator of $\beta$ is

$$\hat\beta = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2} = \frac{\sum_{i=1}^N I_i X_i Y_i}{\sum_{i=1}^N I_i X_i^2}$$

for sampling indicator $I_i$.

Taking expectation with respect to both model and design:

$$E_{\zeta p}(\hat\beta) = E_\zeta(E_p(\hat\beta)) = E_\zeta\!\left(\sum_{i=1}^N E_p\!\left(\frac{I_i X_i}{\sum_{i=1}^N I_i X_i^2}\right) Y_i\right)$$
5 / 42
Using sampling weights in models: Correct model
specification, ignorable sampling
If sampling is ignorable and the model is correctly specified, taking
expectation with respect to both model and design yields

$$E_{\zeta p}(\hat\beta) = \sum_{i=1}^N E_p\!\left(\frac{I_i X_i}{\sum_{i=1}^N I_i X_i^2}\right) E_\zeta(Y_i) = \sum_{i=1}^N E_p\!\left(\frac{I_i X_i}{\sum_{i=1}^N I_i X_i^2}\right) X_i \beta = \beta\, E_p\!\left(\frac{\sum_{i=1}^N I_i X_i^2}{\sum_{i=1}^N I_i X_i^2}\right) = \beta$$
6 / 42
Using sampling weights in models: Correct model
specification, ignorable sampling

[Figure: scatterplot of the population, Y vs. X.]

Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi, .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
7 / 42
Using sampling weights in models: Correct model
specification, ignorable sampling

[Figure: population scatterplot, Y vs. X, with the unweighted fitted line (—) overlaid.]

Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi, .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
P(Ii = 1 | Xi) ∝ Xi for sampling indicator Ii; sample WOR n = 50
(—) Unweighted fit: α̂ + β̂ xi
8 / 42
Using sampling weights in models: Correct model
specification, ignorable sampling

[Figure: population scatterplot, Y vs. X, with the unweighted (—) and weighted (—) fitted lines overlaid.]

Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi, .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
P(Ii = 1 | Xi) ∝ Xi for sampling indicator Ii; sample WOR n = 50
(—) Unweighted fit: α̂ + β̂ xi
(—) Weighted fit: α̂w + β̂w xi
9 / 42
Using sampling weights in models
If the model is misspecified, the weighted estimator is
consistent for the misspecified model fit to the population.
10 / 42
Using sampling weights in models: Misspecified
model, ignorable sampling
If the mean model is misspecified, $B$ provides the closest
approximation $\tilde\beta$ relating $Y$ to $X$ through the origin: e.g., if
$E_\zeta(Y_i \mid X_i) = \beta_0 X_i + \beta_1 X_i^2$:

$$E_\zeta(B) = \frac{\sum_{i=1}^N X_i(\beta_0 X_i + \beta_1 X_i^2)}{\sum_{i=1}^N X_i^2} = \beta_0 + \beta_1 \frac{\sum_{i=1}^N X_i^3}{\sum_{i=1}^N X_i^2} = \tilde\beta$$
11 / 42
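The identity on this slide is exact when the fit is taken to the conditional means, which a short numeric check confirms (β0 = 1, β1 = 2 as in the later figures; the realized X values are arbitrary):

```python
import numpy as np

# Fit the misspecified no-intercept line Y = beta*X to the conditional means
# E(Y|X) = beta0*X + beta1*X^2 and compare with the closed form beta-tilde.
rng = np.random.default_rng(6)
X = rng.uniform(0, 1, 1000)
EY = 1.0 * X + 2.0 * X**2                 # beta0 = 1, beta1 = 2

B = (X * EY).sum() / (X**2).sum()         # census fit to the means
beta_tilde = 1.0 + 2.0 * (X**3).sum() / (X**2).sum()
# B and beta_tilde agree up to floating-point rounding.
```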
Using sampling weights in models: Misspecified
model, ignorable sampling
The unweighted estimator is biased unless $I \perp X$ as well as $I \perp Y \mid X$:

$$E_{\zeta p}(\hat\beta) = \sum_{i=1}^N E_p\!\left(\frac{I_i X_i}{\sum_{i=1}^N I_i X_i^2}\right)(\beta_0 X_i + \beta_1 X_i^2) = \beta_0 + \beta_1 E_p\!\left(\frac{\sum_{i=1}^N I_i X_i^3}{\sum_{i=1}^N I_i X_i^2}\right) \approx \beta_0 + \beta_1 \frac{\sum_{i=1}^N \pi_i X_i^3}{\sum_{i=1}^N \pi_i X_i^2}, \quad \pi_i = P(I_i = 1)$$

The weighted estimator $\hat\beta_w = \frac{\sum_{i=1}^n w_i x_i y_i}{\sum_{i=1}^n w_i x_i^2}$ for $w_i = P(I_i = 1)^{-1}$ will be
asymptotically unbiased for $\tilde\beta$:

$$E_{\zeta p}(\hat\beta_w) = \sum_{i=1}^N E_p\!\left(\frac{I_i w_i X_i}{\sum_{i=1}^N I_i w_i X_i^2}\right)(\beta_0 X_i + \beta_1 X_i^2) \approx \beta_0 + \beta_1 \frac{\sum_{i=1}^N \pi_i w_i X_i^3}{\sum_{i=1}^N \pi_i w_i X_i^2} = \beta_0 + \beta_1 \frac{\sum_{i=1}^N X_i^3}{\sum_{i=1}^N X_i^2} = \tilde\beta$$
12 / 42
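The contrast on this slide can be checked numerically. Below is a minimal sketch, with some simplifying assumptions not on the slides: Poisson sampling stands in for the WOR design, β0 = 1 and β1 = 2 as in the figures, and an expected sample size of 200 rather than 50 to reduce Monte Carlo noise.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_target, R = 2000, 200, 2000

# Fixed finite population: X ~ U(0,1), E(Y|X) = X + 2X^2 (quadratic truth)
X = rng.uniform(0, 1, N)
Y = rng.normal(X + 2 * X**2, 0.5)

# Census fit of the (misspecified) no-intercept linear model: the target
B = (X * Y).sum() / (X**2).sum()

pi = n_target * X / X.sum()          # inclusion probabilities prop. to X
bhat, bhat_w = [], []
for _ in range(R):
    I = rng.uniform(size=N) < pi     # Poisson sampling
    x, y, w = X[I], Y[I], 1 / pi[I]
    bhat.append((x * y).sum() / (x**2).sum())
    bhat_w.append((w * x * y).sum() / (w * x**2).sum())

bias_u = np.mean(bhat) - B           # unweighted: pulled away from B
bias_w = np.mean(bhat_w) - B         # weighted: tracks the census fit B
```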
Using sampling weights in models: Misspecified
model, ignorable sampling
[Figure: scatterplot of the quadratic population, Y vs. X.]

Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi + 2Xi², .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
13 / 42
Using sampling weights in models: Misspecified
model, ignorable sampling
[Figure: quadratic-population scatterplot, Y vs. X, with the unweighted fitted line (—) overlaid.]

Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi + 2Xi², .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
P(Ii = 1 | Xi) ∝ Xi for sampling indicator Ii; sample WOR n = 50
(—) Unweighted fit: α̂ + β̂ xi
14 / 42
Using sampling weights in models: Misspecified
model, ignorable sampling
[Figure: quadratic-population scatterplot, Y vs. X, with the unweighted (—) and weighted (—) fitted lines overlaid.]

Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi + 2Xi², .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
P(Ii = 1 | Xi) ∝ Xi for sampling indicator Ii; sample WOR n = 50
(—) Unweighted fit: α̂ + β̂ xi
(—) Weighted fit: α̂w + β̂w xi
15 / 42
Using sampling weights in models
If the sampling is nonignorable, only the weighted estimator is
consistent.
16 / 42
Using sampling weights in models: Correct model
specification, nonignorable sampling

In the zero-intercept example, if $P(I_i = 1) = \frac{n Y_i}{\sum_{i=1}^N Y_i}$, one can show that

$$E_{\zeta p}(\hat\beta) \approx \beta + \frac{\sigma^2 \sum_{i=1}^N X_i}{\beta \sum_{i=1}^N X_i^3}.$$

However, the same results as before show the weighted estimator is
asymptotically unbiased.
17 / 42
Using sampling weights in models: Correct model
specification, nonignorable sampling

[Figure: scatterplot of the population, Y vs. X.]

Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi, .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
18 / 42
Using sampling weights in models: Correct model
specification, nonignorable sampling
[Figure: population scatterplot, Y vs. X, with the unweighted fitted line (—) overlaid.]

Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi, .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
Zi ∼ N(Yi − Xi, .1)
P(Ii = 1 | Xi) ∝ (3 · I(Zi > 0) + 1) · Xi for sampling indicator Ii; sample WOR n = 50
(—) Unweighted fit: α̂ + β̂ xi
19 / 42
Using sampling weights in models: Correct model
specification, nonignorable sampling
[Figure: population scatterplot, Y vs. X, with the unweighted (—) and weighted (—) fitted lines overlaid.]

Population: Xi ∼ UNI(0, 1), Yi | Xi ∼ N(Xi, .25), i = 1, ..., 1000
Model: y | x ∼ N(α + βx, σ²)
Zi ∼ N(Yi − Xi, .1)
P(Ii = 1 | Xi) ∝ (3 · I(Zi > 0) + 1) · Xi for sampling indicator Ii; sample WOR n = 50
(—) Unweighted fit: α̂ + β̂ xi
(—) Weighted fit: α̂w + β̂w xi
20 / 42
Pseudo maximum likelihood estimator
This idea of using design-adjusted estimators for models is
encapsulated in the "pseudo-maximum likelihood estimator"
(PMLE) (Binder 1983, Pfeffermann 1993).

Replace the unweighted score equation

$$U(\theta) = \sum_{i=1}^n U_i(\theta), \quad U_i(\theta) = \frac{\partial}{\partial\theta} \log L(\theta; y_i),$$

with the weighted score equation

$$U_w(\theta) = \sum_{i=1}^n w_i U_i(\theta).$$

$U_w(\theta)$ is a consistent estimator of the population score equation

$$U_N(\theta) = \sum_{i=1}^N U_i(\theta),$$

so $\hat\theta_w$ s.t. $U_w(\hat\theta_w) = 0$ consistently estimates $\hat\theta_N$ s.t. $U_N(\hat\theta_N) = 0$.

Variance estimators of $\hat\theta_w$ can be obtained by linearizing
the PMLE, or by jackknife or bootstrap.
21 / 42
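As a toy illustration of solving the weighted score equation, consider the mean of a N(θ, 1) model, where U_i(θ) = y_i − θ and the PMLE has a closed form (the data and weights below are simulated placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, 500)
w = rng.uniform(1.0, 5.0, 500)       # stand-in inverse-probability weights

def Uw(theta):
    """Weighted score for the N(theta, 1) mean: sum_i w_i * (y_i - theta)."""
    return np.sum(w * (y - theta))

theta_w = np.sum(w * y) / np.sum(w)  # closed-form root: the weighted mean

# Newton-Raphson on the weighted score (dUw/dtheta = -sum(w)) finds the
# same root; for a score linear in theta it lands there in one step.
theta = 0.0
for _ in range(5):
    theta += Uw(theta) / np.sum(w)
```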
Pseudo maximum likelihood estimator and Bayesian
methods

However, note that the weighted log-likelihood

$$l_w(\theta; y) = \sum_{i=1}^n w_i \log L_i(\theta; y_i)$$

cannot serve in place of a true log-likelihood for Bayesian estimation:

$$p_w(\theta \mid y) \propto \exp\{l_w(\theta; y)\}\, p(\theta)\;???$$

Adding together sampled-unit log-likelihoods assumes
independence; even if stratification/clustering is not present
and we ignore joint selection probabilities, we do not have
$w_i$ observations.

Options:
Assume the model is correct and the sampling non-informative,
and ignore the weights.
Construct a model that allows for misspecification and is
sensitive to the unequal-probability-of-selection design:
allow for interactions between the model parameters of interest
and the probability of selection.
24 / 42
Bayesian finite population inference
Inference about a population quantity $Q(Y)$ is based on the
posterior predictive distribution $p(Y_{nob} \mid Y_{obs}, I)$, where
$Y_{nob}$ consists of the elements of $Y_i$ for which the sampling
indicator $I_i$ equals 0:

$$p(Y_{nob} \mid Y_{obs}, I) = \frac{\iint p(Y_{nob} \mid Y_{obs}, \theta, \phi)\, p(I \mid Y, \theta, \phi)\, p(Y_{obs} \mid \theta)\, p(\theta, \phi)\, d\theta\, d\phi}{\iiint p(Y_{nob} \mid Y_{obs}, \theta, \phi)\, p(I \mid Y, \theta, \phi)\, p(Y_{obs} \mid \theta)\, p(\theta, \phi)\, d\theta\, d\phi\, dY_{nob}}$$

where $\phi$ models the inclusion indicator.
25 / 42
Bayesian finite population inference
If $\phi$ and $\theta$ are a priori independent and $I$ is independent of $Y$,
the sampling design is "unconfounded" or "noninformative"; if the
distribution of $I$ depends only on $Y_{obs}$, the sampling design is
"ignorable" (Rubin 1987).

Under ignorable sampling designs, $p(\theta, \phi) = p(\theta)p(\phi)$ and
$p(I \mid Y, \theta, \phi) = p(I \mid Y_{obs}, \phi)$:

$$p(Y_{nob} \mid Y_{obs}, I) = \frac{\int p(Y_{nob} \mid Y_{obs}, \theta)\, p(Y_{obs} \mid \theta)\, p(\theta)\, d\theta}{\iint p(Y_{nob} \mid Y_{obs}, \theta)\, p(Y_{obs} \mid \theta)\, p(\theta)\, d\theta\, dY_{nob}} = p(Y_{nob} \mid Y_{obs}).$$

This allows inference about $Q(Y)$ to be made without explicitly
modeling $I$ (Ericson 1969, Holt and Smith 1979, Little 1993,
Rubin 1987, Skinner et al. 1989).

But ignorability requires a data model that is attentive to
design features and robust enough to sufficiently capture all
relevant aspects of the distribution of $Y$ of interest.
26 / 42
Linear regression example
Denote the strata of a disproportionately stratified or poststratified
design by $h = 1, \ldots, H$ (group by [weighted] percentiles if
continuous).

Interact with the regression slope by allowing separate
regression estimators for each weight stratum:

$$y_{hi} \mid \beta_h, \sigma^2 \sim N(x_{hi}^T \beta_h, \sigma^2)$$
$$(\beta_1^T, \ldots, \beta_H^T)^T \mid \beta^*, G \sim N_{Hp}(\beta^*, G)$$
$$p(\phi, \beta^*, G) \propto p(\zeta)$$

The posterior mean of the population slope
$B = \left(\sum_h \sum_{i=1}^{N_h} x_{hi} x_{hi}^T\right)^{-1} \sum_h \sum_{i=1}^{N_h} x_{hi} y_{hi}$ is given by

$$E(B \mid y) = \left[\sum_h W_h \sum_{i=1}^{n_h} x_{hi} x_{hi}^T\right]^{-1} \left[\sum_h W_h \left(\sum_{i=1}^{n_h} x_{hi} x_{hi}^T\right) \hat\beta_h\right]$$

where $\hat\beta_h = E(\beta_h \mid y)$.
27 / 42
Linear regression example
Assuming a degenerate prior for $\beta_h$ ($G \equiv 0$) implies $\beta_h = \beta$ for all $h$, and thus

$$E(B \mid y) = \left[\sum_h W_h \sum_{i=1}^{n_h} x_{hi} x_{hi}^T\right]^{-1} \left[\sum_h W_h \left(\sum_{i=1}^{n_h} x_{hi} x_{hi}^T\right) \hat\beta\right] = \hat\beta$$

where

$$\hat\beta = \left[\sum_h \sum_{i=1}^{n_h} x_{hi} x_{hi}^T\right]^{-1} \left[\sum_h \sum_{i=1}^{n_h} x_{hi} y_{hi}\right].$$

This is the standard unweighted regression slope estimator.

Assuming a flat prior for $\beta_h$ ($G \equiv \infty$) implies
$\hat\beta_h = \left[\sum_{i=1}^{n_h} x_{hi} x_{hi}^T\right]^{-1} \sum_{i=1}^{n_h} x_{hi} y_{hi}$, and thus

$$E(B \mid y) = \left[\sum_h W_h \sum_{i=1}^{n_h} x_{hi} x_{hi}^T\right]^{-1} \left[\sum_h W_h \left(\sum_{i=1}^{n_h} x_{hi} x_{hi}^T\right) \left(\sum_{i=1}^{n_h} x_{hi} x_{hi}^T\right)^{-1} \left(\sum_{i=1}^{n_h} x_{hi} y_{hi}\right)\right] = \left[\sum_h W_h \sum_{i=1}^{n_h} x_{hi} x_{hi}^T\right]^{-1} \left[\sum_h W_h \sum_{i=1}^{n_h} x_{hi} y_{hi}\right]$$

This is the standard weighted regression slope estimator.

Using a proper prior for $\beta_h$ ($0 < G < \infty$) allows for an intermediate estimator
between the unweighted and fully-weighted regression estimators: "weight
smoothing," a form of data-driven weight trimming (Elliott 2007).
28 / 42
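The two limiting cases on this slide can be verified numerically: with stratum-specific OLS fits β̂_h, the flat-prior combination collapses algebraically to the weighted estimator. A sketch with made-up strata, weights, and coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
H = 4
Wh = np.array([0.1, 0.2, 0.3, 0.4])              # hypothetical N_h / N
X = [rng.normal(size=(30, 2)) for _ in range(H)]
Y = [x @ np.array([1.0, -0.5]) + rng.normal(size=30) for x in X]

A = [x.T @ x for x in X]                         # sum_i x_hi x_hi^T
b = [x.T @ y for x, y in zip(X, Y)]              # sum_i x_hi y_hi

# G = 0: common slope across strata -> pooled unweighted OLS.
beta_unw = np.linalg.solve(sum(A), sum(b))

# G -> infinity: stratum-specific OLS, combined with stratum weights W_h.
beta_h = [np.linalg.solve(Ah, bh) for Ah, bh in zip(A, b)]
lhs = sum(W * Ah for W, Ah in zip(Wh, A))
EB = np.linalg.solve(lhs, sum(W * (Ah @ bh_)
                              for W, Ah, bh_ in zip(Wh, A, beta_h)))

# Since A_h beta_h = b_h, EB equals the standard weighted estimator.
beta_w = np.linalg.solve(lhs, sum(W * bh for W, bh in zip(Wh, b)))
```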
Bayesian finite population inference with weights
This general approach has a few drawbacks:
We quickly get a large number of parameters in even
moderately complex models.
The use of priors to smooth can help stabilize estimates, but
what if we want to keep the fully-weighted approach?
It is somewhat unclear how to handle hierarchical models
("cross-class" interaction between weighting effects and
model random effects).
Continuous weights (non-response, PPS) require the
"ad-hocery" of pre-pooling.
29 / 42
Bayesian finite population inference with weights
What if we could "convert" our complex design sample to a
simple random sample?
Use the finite population Bayesian bootstrap to obtain a draw of
$P_{syn} \sim p(Y_{nob} \mid y)$ that can be regarded as a simple
random sample from the underlying superpopulation.
30 / 42
Non-parametric Synthetic Populations
Account for stratification and clustering by drawing L
Bayesian bootstrap samples of the clusters within each
stratum.

For stratum $h$ with $C_h$ clusters, draw $C_h - 1$ random
variables from $U(0, 1)$ and order them as $a_1, \ldots, a_{C_h - 1}$; sample $C_h$
clusters with replacement with probability $a_c - a_{c-1}$, where
$a_0 = 0$ and $a_{C_h} = 1$.

Generate the unobserved elements of the population within
each cluster $c$ in stratum $h$, accounting for selection weights
(Cohen 1997): draw a sample of size $N_{ch} - n_{ch}$ by drawing $(y_k, x_k)$ from
the $i$th unit among the $n_{ch}$ sampled elements with probability

$$\frac{w_i - 1 + l_{i,k-1}(N_{ch} - n_{ch})/n_{ch}}{N_{ch} - n_{ch} + (k - 1)(N_{ch} - n_{ch})/n_{ch}}$$

where $l_{i,k-1}$ is the number of bootstrap draws of the $i$th unit among the
previous $k - 1$ bootstrap selections.

Repeat S times for each $l$th bootstrapped cluster.

One can show (Dong et al. 2012) that this is equivalent to a
Bayesian bootstrap Polya urn scheme (Lo 1988).
31 / 42
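A sketch of both steps (the Bayesian bootstrap of clusters via uniform spacings, and the Cohen-style weighted urn that fills in the unobserved units) under simplifying assumptions not on the slide: a single stratum/cluster at a time, and weights w_i ≥ 1 summing to N.

```python
import numpy as np

def bb_cluster_resample(cluster_ids, rng):
    """Bayesian bootstrap of C clusters: gaps between C-1 ordered U(0,1)
    draws give Dirichlet(1,...,1) resampling probabilities."""
    C = len(cluster_ids)
    a = np.sort(rng.uniform(size=C - 1))
    probs = np.diff(np.concatenate(([0.0], a, [1.0])))
    return rng.choice(cluster_ids, size=C, replace=True, p=probs)

def weighted_fpbb(y, w, N, rng):
    """Generate the N - n unsampled units from the n observed values using
    the weighted Polya urn probabilities on the slide (assumes w_i >= 1)."""
    n = len(y)
    f = (N - n) / n
    l = np.zeros(n)                  # bootstrap draws of each unit so far
    out = []
    for k in range(1, N - n + 1):
        p = (w - 1 + l * f) / (N - n + (k - 1) * f)
        p = p / p.sum()              # exact up to rounding when sum(w) = N
        i = rng.choice(n, p=p)
        l[i] += 1
        out.append(y[i])
    return np.concatenate([y, np.array(out)])   # synthetic pop., size N
```

One synthetic population is then a cluster resample followed by `weighted_fpbb` within each drawn cluster; repeating S times per bootstrap gives the P_syn draws used below.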
Bayesian Finite Population Inference with
Nonparametric Synthetic Populations
Large-sample approximations (Raghunathan et al. 2003, Reiter
et al. 2003) yield

$$\hat Q = E(Q \mid y) = E(E(Q \mid P_{syn}) \mid y) \approx L^{-1} \sum_{l=1}^L S^{-1} \sum_{s=1}^S \hat Q_{ls} = L^{-1} \sum_{l=1}^L \hat Q_{l\cdot}$$

$$\hat V(\hat Q) = V(Q \mid y) = E(V(Q \mid P_{syn}) \mid y) + V(E(Q \mid P_{syn}) \mid y) \approx (1 + L^{-1})(L - 1)^{-1} \sum_{l=1}^L \left(\hat Q_{l\cdot} - L^{-1} \sum_{l=1}^L \hat Q_{l\cdot}\right)^2$$

where $\hat Q_{ls}$ is the estimator of $Q$ derived from the $s$th finite
population Bayesian bootstrap of the $l$th Bayesian bootstrap.

For large $L$, $\hat Q$ is approximately normally distributed; otherwise
approximate with a $t$ distribution with $(L - 1)$ degrees of freedom.
32 / 42
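The combining rules reduce to a few lines of array code; `Q_ls` below is an L × S matrix of per-synthetic-population estimates:

```python
import numpy as np

def combine_fpbb(Q_ls):
    """Point estimate and variance from an L x S matrix of FPBB estimates,
    following the Raghunathan/Reiter-style approximations on the slide."""
    L, S = Q_ls.shape
    Q_l = Q_ls.mean(axis=1)                  # within-Bayesian-bootstrap means
    Q_hat = Q_l.mean()                       # overall point estimate
    V_hat = (1 + 1 / L) * np.sum((Q_l - Q_hat) ** 2) / (L - 1)
    return Q_hat, V_hat
```

Interval estimates then use a t reference with L − 1 degrees of freedom unless L is large.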
“Design Based” Bayesian Inference
Use draws of nonparametric synthetic populations to feed
into a Bayesian estimation procedure to obtain $p(\theta \mid P_{syn})$
and $\hat Q_{ls}$.

Large-sample approximations allow for "design based"
Bayesian inference (Little 2011).

If we have $t = 1, \ldots, T$ draws from $p(\theta \mid y)$ obtained by
treating $y$ as an SRS, we can obtain $p(\theta \mid P_{syn}^{(l,s)})$ by
weighting the $t$th draw from $p(\theta \mid y)$ by

$$q_{tls} = \prod_{i=1}^n f(y_i \mid \theta^{(t)})^{r_i^{(l,s)} - 1}$$

where $r_i^{(l,s)}$ is the number of times the $i$th observation
appears in the synthetic population $P_{syn}^{(l,s)}$ generated
by the $l$th and $s$th finite population Bayesian bootstrap.
33 / 42
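For a concrete f, the reweighting is a few lines on the log scale. The sketch below assumes a N(µ, σ²) data model (as in the simulation study) and takes r_i as the appearance count of observation i in P_syn, so the exponent is r_i − 1:

```python
import numpy as np

def fpbb_importance_weights(theta_draws, y, r):
    """Normalized weights q_t proportional to prod_i f(y_i|theta_t)^(r_i - 1),
    computed on the log scale to avoid underflow. theta_draws is a (T, 2)
    array of (mu, sigma2) posterior draws from the SRS-style fit."""
    mu = theta_draws[:, 0][:, None]
    s2 = theta_draws[:, 1][:, None]
    loglik = -0.5 * (np.log(2 * np.pi * s2) + (y[None, :] - mu) ** 2 / s2)
    logq = ((r - 1)[None, :] * loglik).sum(axis=1)
    q = np.exp(logq - logq.max())            # stabilize, then normalize
    return q / q.sum()
```

Resampling the θ^(t) with probabilities q then gives approximate draws from p(θ | P_syn).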
Simulation Study
Population:
Measure of Size Mi ∼ U(101, 4100), i = 1, ..., N = 4000
Yi | Mi ∼ N((Mi − 2100.5)/2000, 10)
Two sample designs: simple random sample (SRS) and probability
proportional to size (PPS), both without replacement, size n = 100:
1. P(Ii = 1) = n/N for all i.
2. P(Ii = 1) = nMi / ∑i Mi
Unweighted mean n−1 ∑i Ii Yi :
1. (superpopulation) bias=0, weight CV=0.
2. (superpopulation) bias=.32, weight CV=1.35.
34 / 42
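Design 2 can be realized by systematic PPS sampling, a standard device when every nM_i/ΣM_i < 1 (the implementation below is one common variant, not taken from the slides):

```python
import numpy as np

def systematic_pps(M, n, rng):
    """Systematic PPS-without-replacement sample of size n: lay the units'
    probabilities n*M_i/sum(M) end to end and take n points spaced 1 apart
    from a random start. Indices are distinct whenever every p_i < 1."""
    p = n * M / M.sum()
    cum = np.cumsum(p)
    u = rng.uniform()                          # random start in (0, 1)
    idx = np.searchsorted(cum, u + np.arange(n), side="right")
    return np.minimum(idx, len(M) - 1)         # guard against rounding

rng = np.random.default_rng(4)
M = rng.uniform(101, 4100, size=4000)          # measures of size, as above
s = systematic_pps(M, 100, rng)
```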
Simulation Study
Data model:
Yi | µ, σ² ∼ N(µ, σ²)
p(µ, σ²) ∝ σ⁻¹

Misspecified model, but sufficient under SRS.
Fit using a Gibbs sampler, 1,000 draws after 100 burn-in.
Compute the posterior mean and 95% credible interval for
E(Y | y) over 200 simulations:
Using the standard approach.
Using the weighted finite population Bayesian bootstrap
(L = 20, S = 5).
35 / 42
Simulation Study
Mi ∼ U(101, 4100)
Yi | Mi ∼ N((Mi − 2100.5)/2000, 10)

                      Bias    Emp. SD   Mean Post. SD   95% Coverage
SRS  Standard         .080    .306      .320            98
SRS  Weighted FPBB    -.075   .306      .323            97
PPS  Standard         .286    .319      .326            86
PPS  Weighted FPBB    .038    .380      .353            92

Both approaches perform well under SRS.
The standard approach is biased under PPS; the weighted
FPBB removes the bias and increases the variance to
approximate correct coverage under PPS.
36 / 42
“Design Based” Bayesian Inference: Problems
Can anyone see what the problem with the proposed
reweighting “shortcut” is?
37 / 42
“Design Based” Bayesian Inference: Problems
Can anyone see what the problem with the proposed
reweighting “shortcut” is?
Reweighting is similar to importance sampling: it requires
that the posterior generating the original draws (from the
model that ignores the sampling design) cover the target
posterior.

If σ² = 1, the variability in p(µ | y) will be insufficient to
reweight to remove the bias.

Can diagnose through the distribution of the q_ls.

[Figure: histograms of the importance weights q_ls for the variance = 10 and variance = 1 cases.]
38 / 42
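Beyond eyeballing the histograms, a standard one-number diagnostic (not from the slides) is the effective sample size of the normalized importance weights:

```python
import numpy as np

def ess(q):
    """Effective sample size 1/sum(q_tilde^2): near T when weights are flat
    (good overlap with the target), near 1 when a few draws dominate."""
    q = np.asarray(q, dtype=float)
    q = q / q.sum()
    return 1.0 / np.sum(q ** 2)
```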
Simulation Study
Mi ∼ U(101, 4100)
Yi | Mi ∼ N((Mi − 2100.5)/2000, 1)

                      Bias    Emp. SD   Mean Post. SD   95% Coverage
SRS  Standard         .016    .115      .103            98
SRS  Weighted FPBB    .016    .116      .107            98
PPS  Standard         .334    .115      .115            14
PPS  Weighted FPBB    .099    .133      .097            76

The FPBB is unable to fully correct for the bias because the
unweighted posterior does not include the FPBB posterior.
39 / 42
Simulation Study
Mi ∼ U(101, 4100)
Yi | Mi ∼ N((Mi − 2100.5)/2000, 1)

                      Bias    Emp. SD   Mean Post. SD   95% Coverage
SRS  Standard         .016    .115      .103            98
SRS  Weighted FPBB    .012    .136      .107            94
PPS  Standard         .334    .115      .326            14
PPS  Weighted FPBB    .053    .113      .123            94

Simple solution: increase the number of draws (T = 50,000)
so that the tail of the unweighted posterior includes the FPBB
posterior.
40 / 42
Simulation Study
Mi ∼ U(101, 4100)
Yi | Mi ∼ N((Mi − 2100.5)/2000, 1)

                      Bias    Emp. SD   Mean Post. SD   95% Coverage
SRS  Standard         .016    .115      .103            98
SRS  Weighted FPBB    .007    .123      .127            96
PPS  Standard         .334    .115      .326            14
PPS  Weighted FPBB    .039    .155      .143            91

Simple solution #2: draw $\varepsilon^{(t)}$ from $N(0, 3\Sigma)$, where
$\Sigma = \mathrm{var}(\theta \mid y)$, and compute the importance weights by

$$q_{tls} = \prod_{i=1}^n f(y_i \mid \theta^{(t)} + \varepsilon^{(t)})^{r_i^{(l,s)} - 1}.$$
41 / 42
Discussion
Just starting to think through this idea.
Need more realistic simulation studies (strata, clusters;
hierarchical models, latent variables).
Try in real-world applications to see if the variability in the
importance weights is reasonable.
Better ways to generate draws for the importance weights
(probably OK to generate ε from a normal if the posterior is
approximately normal; otherwise may need a better
approximation to the true posterior).
42 / 42