
Bayesian Inference
Homework 5
Booth/UChicago
RBG/Spring 2015
Solutions.
1. Calculate the distribution of x1 conditional on x2 by treating the x2 term in the joint
distribution as known:
π(x1 | x2) ∝ exp{ −(1/2)(x1 − 1)²(x2 − 2)² } ∼ N(1, 1/(x2 − 2)²).
Similarly,
π(x2 | x1) ∼ N(2, 1/(x1 − 1)²).
So both full conditionals are normal, and therefore easy to sample from. But π(x1, x2)
is improper, i.e.,

∫_{−∞}^{∞} ∫_{−∞}^{∞} π(x1, x2) dx1 dx2 = ∞.
Thus, π is not a density and so using a Gibbs sampler to sample from it makes no
sense.
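To see the pathology concretely, here is a minimal R sketch (not part of any posted solution code) that runs the two conditional updates anyway; the chain drifts off rather than settling into a stationary distribution:

    ## Gibbs updates for the improper "joint": the chain diverges
    set.seed(1)
    S <- 1000
    x <- matrix(NA, nrow = S, ncol = 2)
    x[1, ] <- c(1.5, 2.5)                                # arbitrary start
    for (s in 2:S) {
      x[s, 1] <- rnorm(1, 1, 1 / abs(x[s - 1, 2] - 2))   # x1 | x2
      x[s, 2] <- rnorm(1, 2, 1 / abs(x[s, 1] - 1))       # x2 | x1
    }
    ## plot(log(abs(x[, 1] - 1)), type = "l")  # |x1 - 1| grows without bound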
2. (a) θA and θB are clearly not independent. It is easy to see that one is defined in terms
of the other: θB = θA γ. This is a sensible prior when the parameter(s) describe
similar populations which are largely identical except for a small characteristic.
Men of similar age but differing in education is one example.
(b) To derive the conditional(s) it helps to first obtain an expression for the joint
posterior.
π(θ, γ | yA, yB) ∝ π(yA, yB | θ, γ) × π(θ, γ) = π(yA | θ) × π(yB | θ, γ) × π(θ, γ)

    ∝ [∏_{i=1}^{nA} θ^{yAi} e^{−θ}] × [∏_{j=1}^{nB} (θγ)^{yBj} e^{−θγ}] × θ^{αθ−1} e^{−βθ θ} γ^{αγ−1} e^{−βγ γ}

    = e^{−θ nA} θ^{Σ_i yAi} × e^{−θγ nB} (θγ)^{Σ_j yBj} × θ^{αθ−1} e^{−βθ θ} γ^{αγ−1} e^{−βγ γ}.
Then the conditionals can be derived by treating the parameters conditioned upon
as constants of proportionality.
We have

π(θ | γ, yA, yB) ∝ e^{−θ(nA + γ nB + βθ)} θ^{Σ_i yAi + Σ_j yBj + αθ − 1},

so {θ | γ, yA, yB} ∼ G(Σ_i yAi + Σ_j yBj + αθ, nA + γ nB + βθ).
[Figure 1: Posterior expectation of θB − θA as a function of aγ = bγ.]
(c) Similarly,

π(γ | θ, yA, yB) ∝ e^{−γ(θ nB + βγ)} γ^{Σ_j yBj + αγ − 1},

so {γ | θ, yA, yB} ∼ G(Σ_j yBj + αγ, θ nB + βγ).
(d) See R code in birth_edu_male_gibbs.R.
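For reference, here is a minimal sketch of such a Gibbs sampler using the two full conditionals derived above; the function and argument names are hypothetical, not necessarily those used in birth_edu_male_gibbs.R:

    gibbs <- function(yA, yB, at, bt, ag, bg, S = 10000) {
      nA <- length(yA); nB <- length(yB)
      theta <- gam <- numeric(S)
      g <- 1                              # start gamma at its prior mean (ag/bg = 1)
      for (s in 1:S) {
        th <- rgamma(1, sum(yA) + sum(yB) + at, nA + g * nB + bt)  # theta | gamma, y
        g  <- rgamma(1, sum(yB) + ag, th * nB + bg)                # gamma | theta, y
        theta[s] <- th; gam[s] <- g
      }
      data.frame(thetaA = theta, thetaB = theta * gam)
    }

With out <- gibbs(...), mean(out$thetaB - out$thetaA) then approximates the posterior expectation plotted in Figure 1.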
Figure 1 shows the resulting posterior expectations as a function of the prior
values for γ. The γ parameter links θA to θB, and the prior mean is αγ/βγ = 1
for all choices of these values. Notice that the prior sample size increases with
bγ. So as these values increase, the prior for γ becomes more concentrated near
one, forcing θA and θB to become more similar a priori, and simultaneously
contributing more weight to the prior in the posterior (relative to the data via
the likelihood). However, we see from the figure that the prior sample size must
become very large before we would conclude that θB ≈ θA given these particular
data.
3. The MH acceptance ratio is

α(θ, φ) = min{ 1, [π(φ) q(φ, θ)] / [π(θ) q(θ, φ)] }.
Here

π(θ) = (1/√(2πσ²)) exp{−θ²/(2σ²)},   and   q(θ, φ) = (1/√(2πτ²)) exp{−(φ − aθ)²/(2τ²)}.

Therefore

[π(φ) q(φ, θ)] / [π(θ) q(θ, φ)] = exp{ −(1/2)(φ² − θ²) [1/σ² + (a² − 1)/τ²] }.
This sampler never rejects if α(θ, φ) ≡ 1 for all θ and φ. Observe that this happens if
and only if 1/σ² + (a² − 1)/τ² = 0, i.e., when τ² = σ²(1 − a²). Note this requires
|a| < 1 (so that τ² > 0), and hence implies τ² ≤ σ², with equality only when a = 0.
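Equivalently, with this choice the proposal is the transition of a stationary AR(1) process whose marginal distribution is exactly N(0, σ²), which is why nothing is ever rejected. A quick numerical check of the identity (a sketch, not from the solution code):

    sigma2 <- 2; a <- 0.5
    tau2 <- sigma2 * (1 - a^2)       # the never-reject choice
    lr <- function(theta, phi) {     # log of pi(phi)q(phi,theta) / [pi(theta)q(theta,phi)]
      dnorm(phi, 0, sqrt(sigma2), log = TRUE) -
        dnorm(theta, 0, sqrt(sigma2), log = TRUE) +
        dnorm(theta, a * phi, sqrt(tau2), log = TRUE) -
        dnorm(phi, a * theta, sqrt(tau2), log = TRUE)
    }
    lr(rnorm(5), rnorm(5))           # all (numerically) zero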
4. See clouds.R for R code.
(a) Figure 2 shows histograms of the two sets of clouds, and histograms of their
logged values. The original values are all positive, and there is a heavy right-hand
tail, both of which preclude (reasonable) modeling with a normal distribution. It
may be sensible to model the logged values with a normal distribution though, as
these are much more symmetric and have less of a heavy tail. That may lead to
an interesting comparison.

[Figure 2: Histograms of the original data (clouds$none and clouds$seed) and their logged values.]
(b) The code uses RW normal proposals centered around the previous values,
with σa² = 0.25 for both clouds' "a" parameters and σb² = 0.00001 for their "b"
parameters. These values were found by several pilot runs, inspecting ESSs and
autocorrelations.
For the unseeded clouds (the "none" column), with u subscripts, in round s + 1
we propose a′u ∼ N(au^(s), 0.25). Draw u ∼ Unif(0, 1). Then, if

u < [∏_{i=1}^{26} G(yu,i; a′u, bu^(s))] / [∏_{i=1}^{26} G(yu,i; au^(s), bu^(s))]    (G is the Gamma pdf)

we take au^(s+1) = a′u; otherwise au^(s+1) = au^(s). Then we propose b′u ∼ N(bu^(s), 0.00001).
Draw u ∼ Unif(0, 1). Then, if

u < [∏_{i=1}^{26} G(yu,i; au^(s+1), b′u)] / [∏_{i=1}^{26} G(yu,i; au^(s+1), bu^(s))]

we take bu^(s+1) = b′u; otherwise bu^(s+1) = bu^(s).
The procedure is exactly the same for the seeded clouds, but with s subscripts.
Both chains were initialized at (a, b) = (2, 0.015).
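A minimal sketch of one such chain follows; the names are hypothetical (not necessarily those in clouds.R), log densities are used for numerical stability, and proposals outside the positive parameter space are rejected outright:

    mh_gamma <- function(y, S = 10000, a0 = 2, b0 = 0.015) {
      ll <- function(a, b) sum(dgamma(y, shape = a, rate = b, log = TRUE))
      a <- rep(a0, S); b <- rep(b0, S)
      for (s in 2:S) {
        a[s] <- a[s - 1]; b[s] <- b[s - 1]
        ap <- rnorm(1, a[s - 1], sqrt(0.25))     # RW proposal for "a"
        if (ap > 0 && log(runif(1)) < ll(ap, b[s - 1]) - ll(a[s - 1], b[s - 1]))
          a[s] <- ap
        bp <- rnorm(1, b[s - 1], sqrt(0.00001))  # RW proposal for "b"
        if (bp > 0 && log(runif(1)) < ll(a[s], bp) - ll(a[s], b[s - 1]))
          b[s] <- bp
      }
      data.frame(a = a, b = b)
    }
    ## e.g., none <- mh_gamma(clouds$none); seeded <- mh_gamma(clouds$seed)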
[Figure 3: Traces of samples from the (independent) posteriors (panels: seed a, seed b, none a, none b).]
Figure 3 shows the traces of the four parameters, two independent sets, one for
each cloud type. The chains quickly converge to their respective target distributions
and seem to mix well.
(c) As in Question 2, it helps to have an expression for the full posterior in order
to derive a conditional. Let's consider the unseeded clouds, since the derivation
for the seeded clouds is similar. The u subscript will be omitted for brevity.
π(α, β | y) ∝ [∏_{i=1}^{n} (β^α / Γ(α)) yi^{α−1} e^{−β yi}] × λα e^{−λα α} λβ e^{−λβ β}.

Therefore,

π(β | α, y) ∝ β^{nα} e^{−β Σ_i yi} × λβ e^{−λβ β} = β^{nα} e^{−β(Σ_i yi + λβ)},

so {βs | αs, ys} ∼ G(ns αs + 1, Σ_{i=1}^{ns} ysi + λβs)
and {βu | αu, yu} ∼ G(nu αu + 1, Σ_{i=1}^{nu} yui + λβu).
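In R this conditional draw is a one-liner; a hypothetical version (with lambda_b standing in for the prior rate λβ) that would replace the RW update for b in the sketch from part (b):

    b[s] <- rgamma(1, shape = length(y) * a[s] + 1, rate = sum(y) + lambda_b)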
(d) The sampler is the same as the one in part (b), except that we draw from the
corresponding gamma full conditional instead of making RW proposals for βu and
βs. See the R code. We measure the quality of the MC approximation(s) by
effective sample size (ESS). Below are the results comparing the full MH version
[part (b)] to the one that uses Gibbs sampling for β. Only the seeded samples are
shown; the unseeded ones are similar.
              a          b
    gibbs  977.7568  1927.7788
    mh     596.9219   543.7367
Notice how both α and β marginal chains have better ESSs for the Gibbs version,
even though only β is being sampled from a full conditional. The lower autocorrelation in the marginal β chain reduces the correlation in the marginal α chain
too.
(e) Consider two ways of comparing seeded to unseeded clouds via the posterior
distribution: (1) using the posterior mean(s) (α/β), and (2) using the predictive
distribution(s). Both can use the samples obtained from the (partial) Gibbs
sampler, or the original ones. See the R code. The posterior probability that
the posterior mean of the seeded clouds' rainfall is greater than that of the
unseeded ones is 0.99, whereas the probability that a new seeded cloud would
produce more rainfall than a new unseeded one is lower, at 0.67. Both suggest
strong evidence that seeding clouds leads to a higher level of rainfall.
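A minimal sketch of both comparisons, assuming data frames seeded and none hold the posterior samples of (a, b) for the two cloud types (hypothetical names):

    mean(seeded$a / seeded$b > none$a / none$b)     # (1) compare posterior means a/b
    ys <- rgamma(nrow(seeded), seeded$a, seeded$b)  # (2) posterior predictive draws
    yu <- rgamma(nrow(none), none$a, none$b)
    mean(ys > yu)                                   # P(new seeded > new unseeded | y)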