Bayesian Inference Homework 5
Booth/UChicago, RBG, Spring 2015
Solutions

1. Calculate the distribution of x_1 conditional on x_2 by treating the x_2 term in the joint distribution as known:
\[
\pi(x_1 \mid x_2) \propto \exp\left\{-\frac{1}{2}(x_1 - 1)^2 (x_2 - 2)^2\right\} \sim N\!\left(1, \frac{1}{(x_2 - 2)^2}\right).
\]
Similarly, \(\pi(x_2 \mid x_1) \sim N\!\left(2, \frac{1}{(x_1 - 1)^2}\right)\). So both full conditionals are normal, and therefore easy to sample from. But \(\pi(x_1, x_2)\) is improper, i.e.,
\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \pi(x_1, x_2)\, dx_1\, dx_2 = \infty.
\]
Thus \(\pi\) is not a density, and so using a Gibbs sampler to sample from it makes no sense.

2. (a) θ_A and θ_B are clearly not independent: one is defined in terms of the other via θ_B = θ_A γ. This is a sensible prior when the parameters describe similar populations that are largely identical except for one small characteristic. Men of similar age who differ in education is one example.

(b) To derive the conditional(s) it helps to first obtain an expression for the joint posterior:
\[
\begin{aligned}
\pi(\theta, \gamma \mid y_A, y_B) &\propto \pi(y_A, y_B \mid \theta, \gamma) \times \pi(\theta, \gamma) = \pi(y_A \mid \theta) \times \pi(y_B \mid \theta, \gamma) \times \pi(\theta, \gamma) \\
&\propto \prod_{i=1}^{n_A} \theta^{y_{Ai}} e^{-\theta} \times \prod_{j=1}^{n_B} (\theta\gamma)^{y_{Bj}} e^{-\theta\gamma} \times \theta^{\alpha_\theta - 1} e^{-\beta_\theta \theta}\, \gamma^{\alpha_\gamma - 1} e^{-\beta_\gamma \gamma} \\
&= e^{-\theta n_A}\, \theta^{\sum_i y_{Ai}} \times e^{-\theta\gamma n_B}\, (\theta\gamma)^{\sum_j y_{Bj}} \times \theta^{\alpha_\theta - 1} e^{-\beta_\theta \theta}\, \gamma^{\alpha_\gamma - 1} e^{-\beta_\gamma \gamma}.
\end{aligned}
\]
Then the conditionals can be derived by treating the parameters conditioned upon as constants of proportionality. We have
\[
\pi(\theta \mid \gamma, y_A, y_B) \propto e^{-\theta(n_A + \gamma n_B + \beta_\theta)}\, \theta^{\sum_i y_{Ai} + \sum_j y_{Bj} + \alpha_\theta - 1},
\]
so \(\{\theta \mid \gamma, y_A, y_B\} \sim G\left(\sum_i y_{Ai} + \sum_j y_{Bj} + \alpha_\theta,\; n_A + \gamma n_B + \beta_\theta\right)\).

[Figure 1: Posterior expectation of θ_B − θ_A as a function of a_γ and b_γ.]

(c) Similarly,
\[
\pi(\gamma \mid \theta, y_A, y_B) \propto e^{-\gamma(\theta n_B + \beta_\gamma)}\, \gamma^{\sum_j y_{Bj} + \alpha_\gamma - 1},
\]
so \(\{\gamma \mid \theta, y_A, y_B\} \sim G\left(\sum_j y_{Bj} + \alpha_\gamma,\; \theta n_B + \beta_\gamma\right)\).

(d) See the R code in birth_edu_male_gibbs.R (a minimal sketch of the sampler also appears below, after Question 3). Figure 1 shows the resulting posterior expectations as a function of the prior values for γ. The γ parameter links θ_A to θ_B, and the prior mean is α_γ/β_γ = 1 for all choices of these values. Notice that the prior sample size increases with b_γ. So as these values increase, the prior for γ becomes more concentrated near one, forcing the populations described by θ_A and θ_B to become more similar a priori, and simultaneously contributing more weight to the prior in the posterior (relative to the data via the likelihood). However, we see from the figure that the prior sample size must become very large before we would conclude that θ_B ≈ θ_A given these particular data.

3. The MH acceptance ratio is
\[
\alpha(\theta, \phi) = \min\left\{1,\; \frac{\pi(\phi)\, q(\phi, \theta)}{\pi(\theta)\, q(\theta, \phi)}\right\}.
\]
Here
\[
\pi(\theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{\theta^2}{2\sigma^2}}
\quad \text{and} \quad
q(\theta, \phi) = \frac{1}{\sqrt{2\pi\tau^2}} \exp\left\{-\frac{1}{2\tau^2}(\phi - a\theta)^2\right\}.
\]
Therefore, since the \(-2a\theta\phi\) cross terms in \((\phi - a\theta)^2\) and \((\theta - a\phi)^2\) cancel,
\[
\frac{\pi(\phi)\, q(\phi, \theta)}{\pi(\theta)\, q(\theta, \phi)}
= \exp\left\{-\frac{1}{2}\left(\phi^2 - \theta^2\right)\left(\frac{1}{\sigma^2} + \frac{a^2 - 1}{\tau^2}\right)\right\}.
\]
This sampler never rejects if α(θ, φ) ≡ 1 for all θ and φ. Observe that this happens if and only if \(\frac{1}{\sigma^2} + \frac{a^2 - 1}{\tau^2} = 0\), i.e., when \(\tau^2 = \sigma^2(1 - a^2)\). Note this also implies that we must have \(a^2 < 1\), and hence \(\tau^2 < \sigma^2\). (Intuitively, the proposal is then an AR(1) process whose stationary distribution is exactly \(N(0, \sigma^2) = \pi\), so every proposal is already a draw from the target.)
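For Question 2, here is a minimal R sketch of the two-step Gibbs sampler described in parts (b) and (c). It is a hypothetical stand-in for birth_edu_male_gibbs.R: the argument names and default prior settings (at, bt, ag, bg) are placeholders, not necessarily the values used to produce Figure 1.

    ## Gibbs sampler for (theta, gamma) given Poisson counts yA and yB,
    ## with priors theta ~ G(at, bt) and gamma ~ G(ag, bg)
    gibbs <- function(yA, yB, T = 10000, at = 2, bt = 1, ag = 8, bg = 8)
    {
        nA <- length(yA); nB <- length(yB)
        theta <- gam <- rep(NA, T)
        g <- ag / bg                    ## initialize gamma at its prior mean
        for (t in 1:T) {
            ## theta | gamma, y ~ G(sum(yA) + sum(yB) + at, nA + gamma*nB + bt)
            th <- rgamma(1, sum(yA) + sum(yB) + at, nA + g * nB + bt)
            ## gamma | theta, y ~ G(sum(yB) + ag, theta*nB + bg)
            g <- rgamma(1, sum(yB) + ag, th * nB + bg)
            theta[t] <- th; gam[t] <- g
        }
        data.frame(thetaA = theta, thetaB = theta * gam)
    }

With samples samps <- gibbs(yA, yB), the quantity plotted in Figure 1 is approximated by mean(samps$thetaB - samps$thetaA).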
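And a quick numerical sanity check of the never-reject condition from Question 3; the particular values of a and σ² below are arbitrary illustrations.

    ## with tau2 = sigma2*(1 - a^2), the log MH ratio is identically zero
    sigma2 <- 2; a <- 0.5
    tau2 <- sigma2 * (1 - a^2)
    theta <- rnorm(5, 0, sqrt(sigma2))          ## current states
    phi <- rnorm(5, a * theta, sqrt(tau2))      ## proposals from q(theta, .)
    lr <- dnorm(phi, 0, sqrt(sigma2), log = TRUE) +        ## log pi(phi)
          dnorm(theta, a * phi, sqrt(tau2), log = TRUE) -  ## + log q(phi, theta)
          dnorm(theta, 0, sqrt(sigma2), log = TRUE) -      ## - log pi(theta)
          dnorm(phi, a * theta, sqrt(tau2), log = TRUE)    ## - log q(theta, phi)
    print(lr)   ## all zero up to floating point, so alpha(theta, phi) = 1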
4. See clouds.R for R code.

(a) Figure 2 shows histograms of the two sets of clouds, and histograms of their logged values. The original values are all positive, and there is a heavy right-hand tail, both of which preclude (reasonable) modeling with a normal distribution. It may be sensible to model the logged values with a normal distribution, though, as these are much more symmetric and have less of a heavy tail. That may lead to an interesting comparison.

[Figure 2: Histograms of the original data (clouds$none, clouds$seed) and their logged values.]

(b) The code uses RW normal proposals centered around the previous values, with σ_a² = 0.25 for both clouds' "a" parameters and σ_b² = 0.00001 for their "b" parameters. These values were found via several pilot runs, inspecting ESSs and autocorrelations. (A minimal sketch of this sampler appears at the end of this question.)

For the unseeded clouds (the "none" column), with u subscripts, in round s + 1 we propose \(a_u' \sim N(a_u^{(s)}, 0.25)\). Draw \(u \sim \mathrm{Unif}(0, 1)\). Then, if
\[
u < \frac{\prod_{i=1}^{26} G(y_{u,i};\, a_u', b_u^{(s)})}{\prod_{i=1}^{26} G(y_{u,i};\, a_u^{(s)}, b_u^{(s)})}
\]
(G is the gamma pdf), we take \(a_u^{(s+1)} = a_u'\); otherwise \(a_u^{(s+1)} = a_u^{(s)}\). Then we propose \(b_u' \sim N(b_u^{(s)}, 0.00001)\). Draw \(u \sim \mathrm{Unif}(0, 1)\). Then, if
\[
u < \frac{\prod_{i=1}^{26} G(y_{u,i};\, a_u^{(s+1)}, b_u')}{\prod_{i=1}^{26} G(y_{u,i};\, a_u^{(s+1)}, b_u^{(s)})},
\]
we take \(b_u^{(s+1)} = b_u'\); otherwise \(b_u^{(s+1)} = b_u^{(s)}\). The procedure is exactly the same for the seeded clouds, but with s subscripts. Both chains were initialized at (a, b) = (2, 0.015).

[Figure 3: Traces of samples from the (independent) posteriors; panels show "seed a", "seed b", "none a", and "none b".]

Figure 3 shows the traces of the four parameters: two independent sets, one for each cloud type. The chains quickly converge to their respective target distributions and seem to mix well.

(c) As in Question 2, it helps to have an expression for the full posterior in order to derive a conditional. Let's consider the unseeded clouds, since the derivation for the seeded clouds is similar; the u subscript is omitted for brevity.
\[
\pi(\alpha, \beta \mid y) \propto \prod_{i=1}^{n} \frac{\beta^\alpha}{\Gamma(\alpha)}\, y_i^{\alpha - 1} e^{-\beta y_i} \times \lambda_\alpha e^{-\lambda_\alpha \alpha}\, \lambda_\beta e^{-\lambda_\beta \beta}.
\]
Therefore,
\[
\pi(\beta \mid \alpha, y) \propto \beta^{n\alpha} e^{-\beta \sum_{i=1}^{n} y_i} \times \lambda_\beta e^{-\lambda_\beta \beta} \propto \beta^{n\alpha}\, e^{-\beta\left(\sum_{i=1}^{n} y_i + \lambda_\beta\right)},
\]
so \(\{\beta_s \mid \alpha_s, y_s\} \sim G\left(n_s \alpha_s + 1,\; \sum_{i=1}^{n_s} y_{si} + \lambda_{\beta_s}\right)\) and \(\{\beta_u \mid \alpha_u, y_u\} \sim G\left(n_u \alpha_u + 1,\; \sum_{i=1}^{n_u} y_{ui} + \lambda_{\beta_u}\right)\).

(d) The sampler is the same as the one in part (b), except that we draw from the corresponding gamma full conditional instead of making RW proposals for β_u and β_s (see the one-line modification sketched below). See the R code. We measure the quality of the MC approximation(s) by effective sample size (ESS). Below are the results comparing the full MH version [part (b)] to the one that uses Gibbs sampling for β. Only the seeded samples are shown; the unseeded ones are similar.

                  a          b
    gibbs  977.7568  1927.7788
    mh     596.9219   543.7367

Notice how both the α and β marginal chains have better ESSs under the Gibbs version, even though only β is being sampled from a full conditional. The lower autocorrelation in the marginal β chain reduces the correlation in the marginal α chain too.

(e) Consider two ways of comparing seeded to unseeded clouds via the posterior distribution: (1) using the posterior mean(s) (α/β), and (2) using the predictive distribution(s). Both can use the samples obtained from the (partial) Gibbs sampler, or the original ones. See the R code, and the sketch at the end of this question. The posterior probability that the posterior mean rainfall of the seeded clouds is greater than that of the unseeded ones is 0.99, whereas the probability that a new seeded cloud would produce more rainfall than a new unseeded one is lower, at 0.67 (the predictive comparison folds in the large within-population variability, not just uncertainty about the means). Both suggest strong evidence that seeding clouds leads to a higher level of rainfall.
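As promised in part (b), here is a minimal R sketch of the random-walk MH sampler for a single cloud type. It is a hypothetical stand-in for the version in clouds.R; the explicit rejection of non-positive proposals is an implementation detail (the gamma likelihood is zero there) that the description above leaves implicit.

    ## RW-MH for the gamma likelihood y_i ~ G(a, b), i = 1, ..., n
    mh <- function(y, T = 10000, a = 2, b = 0.015,
                   sa = sqrt(0.25), sb = sqrt(0.00001))
    {
        ll <- function(a, b) sum(dgamma(y, a, b, log = TRUE))  ## log likelihood
        A <- B <- rep(NA, T)
        for (s in 1:T) {
            ap <- rnorm(1, a, sa)          ## RW proposal for a
            if (ap > 0 && log(runif(1)) < ll(ap, b) - ll(a, b)) a <- ap
            bp <- rnorm(1, b, sb)          ## RW proposal for b
            if (bp > 0 && log(runif(1)) < ll(a, bp) - ll(a, b)) b <- bp
            A[s] <- a; B[s] <- b
        }
        data.frame(a = A, b = B)
    }

Running seeded <- mh(clouds$seed) and none <- mh(clouds$none) gives chains like those traced in Figure 3.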
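Part (d)'s partial Gibbs variant replaces the two b-proposal lines in the sketch above with a single draw from the full conditional derived in (c); lb below stands for the prior rate λ_β and is a placeholder name.

    ## beta | alpha, y ~ G(n*alpha + 1, sum(y_i) + lambda_beta)
    b <- rgamma(1, length(y) * a + 1, sum(y) + lb)

Strictly speaking, once the exponential priors from (c) are in play, the acceptance ratio for a should also include a prior term for α; the likelihood-only ratio in (b) corresponds to a flat prior on α.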
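Finally, a sketch of the two comparisons from part (e), assuming posterior sample data frames ss (seeded) and su (unseeded) with columns a and b; the object names are hypothetical.

    ## (1) compare posterior means: P(alpha_s/beta_s > alpha_u/beta_u | y)
    mean(ss$a / ss$b > su$a / su$b)            ## approx 0.99
    ## (2) compare posterior predictive draws of new rainfall
    ynew.s <- rgamma(nrow(ss), ss$a, ss$b)     ## new seeded cloud
    ynew.u <- rgamma(nrow(su), su$a, su$b)     ## new unseeded cloud
    mean(ynew.s > ynew.u)                      ## approx 0.67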