Introduction to Bayesian Statistics
Lecture 5: Monte Carlo Approximation

Rung-Ching Tsai
Department of Mathematics, National Taiwan Normal University
April 1, 2015

Monte Carlo Approximation

• We are often interested in summarizing not only the posterior distribution of the parameter(s) but also other aspects of a posterior distribution, for example:
  1. Pr(θ ∈ A | y_1, …, y_n) for arbitrary sets A;
  2. the posterior distribution of |θ_1 − θ_2|;
  3. the posterior distribution of max{θ_1, …, θ_m}.
• Obtaining exact values for these posterior quantities can be difficult or impossible.
• Instead, we can generate random samples of the parameters from their posterior distributions; all of these posterior quantities of interest can then be approximated to an arbitrary degree of precision using the Monte Carlo method.

Monte Carlo approximation

Once we have iid samples θ^(1), …, θ^(S) from p(θ | y_1, …, y_n):

• The empirical distribution of {θ^(1), …, θ^(S)} is known as a Monte Carlo approximation to p(θ | y_1, …, y_n).
• θ̄ = Σ_{s=1}^S θ^(s)/S → E[θ | y_1, …, y_n];
• Σ_{s=1}^S (θ^(s) − θ̄)²/(S − 1) → Var[θ | y_1, …, y_n];
• #(θ^(s) ≤ c)/S → Pr(θ ≤ c | y_1, …, y_n);
• the median of {θ^(1), …, θ^(S)} → θ_{1/2};
• the α-percentile of {θ^(1), …, θ^(S)} → θ_α.
• More generally, for any function g(θ),
  (1/S) Σ_{s=1}^S g(θ^(s)) → E[g(θ) | y_1, …, y_n] = ∫ g(θ) p(θ | y_1, …, y_n) dθ  as S → ∞.

Example: Y_1, …, Y_n | θ ∼ iid Poisson(θ)

Y_1, …, Y_n: numbers of children for the n women without college degrees; Y_1, …, Y_n | θ ∼ iid Poisson(θ).

• Data: n = 44, Σ_{i=1}^n y_i = 66, ȳ = 1.50
• Prior: θ ∼ gamma(α = 2, β = 1)
• Posterior: θ | y ∼ gamma(α + Σ y_i, β + n), so
  θ | {n = 44, Σ_{i=1}^n y_i = 66} ∼ gamma(68, 45)
• Quantities of interest:
  ◦ Probability: Pr(θ < c | y)
  ◦ Expectation: E(θ | y)
  ◦ Quantiles: 100(1 − α)% quantile-based credible interval
• Methods:
  ◦ Numerical integration
  ◦ Monte Carlo method

Example (cont.): θ | y ∼ gamma(68, 45)

• Probabilities: the posterior probability that {θ < 1.75}
  ◦ Numerical integration:
    Pr(θ < 1.75 | y) = ∫_0^1.75 p(θ | y_1, …, y_n) dθ = ∫_0^1.75 [45^68/Γ(68)] θ^67 e^(−45θ) dθ
                     = pgamma(1.75, 68, 45) = 0.8998
  ◦ Monte Carlo, with S = 10, 100, 1000 (R code):

    a <- 2; b <- 1; sy <- 66; n <- 44
    theta.mc10 <- rgamma(10, a + sy, b + n)
    theta.mc100 <- rgamma(100, a + sy, b + n)
    theta.mc1000 <- rgamma(1000, a + sy, b + n)
    > mean(theta.mc10 < 1.75)
    0.9
    > mean(theta.mc100 < 1.75)
    0.94
    > mean(theta.mc1000 < 1.75)
    0.899

Example (cont.): θ | y ∼ gamma(68, 45)

• Expectation:
  ◦ Numerical integration: posterior mean = (α + Σ y_i)/(β + n) = 68/45 = 1.51
  ◦ Monte Carlo:

    > mean(theta.mc10)
    1.532794
    > mean(theta.mc100)
    1.513947
    > mean(theta.mc1000)
    1.501015

• Quantiles: 95% quantile-based credible interval
  ◦ Posterior quantiles: qgamma(c(.025, .975), 68, 45) = (1.173, 1.891)
  ◦ Monte Carlo:

    > quantile(theta.mc10, c(.025, .975))
        2.5%    97.5%
    1.260291 1.750068
    > quantile(theta.mc100, c(.025, .975))
        2.5%    97.5%
    1.231646 1.813752
    > quantile(theta.mc1000, c(.025, .975))
        2.5%    97.5%
    1.180194 1.892473

Posterior inference for arbitrary functions g(θ)

Quantity of interest: a function g(θ) of θ.

  sample θ^(1) ∼ p(θ | y_1, …, y_n), compute γ^(1) = g(θ^(1))
  sample θ^(2) ∼ p(θ | y_1, …, y_n), compute γ^(2) = g(θ^(2))
  ⋮
  sample θ^(S) ∼ p(θ | y_1, …, y_n), compute γ^(S) = g(θ^(S))

with the draws made independently.

• γ^(1), …, γ^(S) are iid samples from p(γ | y_1, …, y_n):
  ◦ γ̄ = Σ_{s=1}^S γ^(s)/S → E[γ | y_1, …, y_n] as S → ∞;
  ◦ Σ_{s=1}^S (γ^(s) − γ̄)²/(S − 1) → Var[γ | y_1, …, y_n] as S → ∞.
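These convergence claims are easy to check empirically in the gamma(68, 45) example above: as S grows, each Monte Carlo summary approaches the corresponding exact value from the gamma distribution functions. A minimal R sketch (the variable names and the choice S = 100000 are illustrative, not from the slides):

    # Compare Monte Carlo summaries of theta | y ~ gamma(68, 45)
    # with the exact values from the gamma distribution functions.
    set.seed(1)                        # for reproducibility
    S <- 100000
    theta.mc <- rgamma(S, 68, 45)      # iid posterior draws

    mean(theta.mc)                     # approximates E[theta | y] = 68/45 = 1.51
    mean(theta.mc < 1.75)              # approximates pgamma(1.75, 68, 45) = 0.8998
    quantile(theta.mc, c(.025, .975))  # approximates qgamma(c(.025, .975), 68, 45)

With S = 100000 these estimates typically agree with the exact values to two or three decimal places, in contrast to the noisier S = 10 and S = 100 results above.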
Example: Log-odds

• Data: 54% of the respondents in the 1998 General Social Survey reported their religious preference as Protestant, leaving non-Protestants in the minority. Respondents were also asked if they agreed with a Supreme Court ruling that prohibited state or local governments from requiring the reading of religious texts in public schools. Of the n = 860 individuals in the religious minority (non-Protestant), y = 441 (51%) said they agreed with the Supreme Court ruling, whereas 353 of the 1011 Protestants (35%) agreed with the ruling.
• θ: the population proportion agreeing with the ruling in the minority population
• Prior: θ ∼ uniform(0, 1)
• Posterior: θ | y ∼ beta(442, 420)
• Quantity of interest: the log-odds γ = log[θ/(1 − θ)]

Example: Log-odds γ = log[θ/(1 − θ)] (cont.)

• Posterior: θ | y ∼ beta(442, 420)
• Monte Carlo (R code):
  ◦ Samples from the prior:

    a <- 1; b <- 1
    theta.prior.mc <- rbeta(10000, a, b)
    gamma.prior.mc <- log(theta.prior.mc / (1 - theta.prior.mc))
    density(gamma.prior.mc)

  ◦ Samples from the posterior:

    n0 <- 860 - 441; n1 <- 441
    theta.post.mc <- rbeta(10000, a + n1, b + n0)
    gamma.post.mc <- log(theta.post.mc / (1 - theta.post.mc))
    density(gamma.post.mc)

Example: Functions of two parameters

• Goal: to compare the birthrates of women with college degrees to those without, in terms of their numbers of children.
  ◦ Y_{1,1}, …, Y_{n1,1}: numbers of children for the n_1 women without college degrees; Y_{1,1}, …, Y_{n1,1} | θ_1 ∼ iid Poisson(θ_1).
  ◦ Y_{1,2}, …, Y_{n2,2}: numbers of children for the n_2 women with college degrees; Y_{1,2}, …, Y_{n2,2} | θ_2 ∼ iid Poisson(θ_2).
• Data:
  Less than bachelor's: n_1 = 111, Σ_{i=1}^{n_1} y_{i,1} = 217, ȳ_1 = 1.95
  Bachelor's or higher: n_2 = 44, Σ_{i=1}^{n_2} y_{i,2} = 66, ȳ_2 = 1.50
• (Conjugate) prior: θ_1, θ_2 ∼ iid gamma(α = 2, β = 1)
• Posterior distributions: θ | y ∼ gamma(α + Σ y_i, β + n), so
  θ_1 | {n_1 = 111, Σ_{i=1}^{n_1} y_{i,1} = 217} ∼ gamma(219, 112)
  θ_2 | {n_2 = 44, Σ_{i=1}^{n_2} y_{i,2} = 66} ∼ gamma(68, 45)

Monte Carlo method

• Samples of (θ_1, θ_2):

  sample θ_1^(1) ∼ p(θ_1 | Σ_{i=1}^{111} y_{i,1} = 217), sample θ_2^(1) ∼ p(θ_2 | Σ_{i=1}^{44} y_{i,2} = 66)
  sample θ_1^(2) ∼ p(θ_1 | Σ_{i=1}^{111} y_{i,1} = 217), sample θ_2^(2) ∼ p(θ_2 | Σ_{i=1}^{44} y_{i,2} = 66)
  ⋮
  sample θ_1^(S) ∼ p(θ_1 | Σ_{i=1}^{111} y_{i,1} = 217), sample θ_2^(S) ∼ p(θ_2 | Σ_{i=1}^{44} y_{i,2} = 66)

  {(θ_1^(1), θ_2^(1)), …, (θ_1^(S), θ_2^(S))} are iid samples from the joint posterior distribution of θ_1 and θ_2.
• R code:

    a <- 2; b <- 1; sy1 <- 217; n1 <- 111; sy2 <- 66; n2 <- 44
    theta1.mc <- rgamma(10000, a + sy1, b + n1)
    theta2.mc <- rgamma(10000, a + sy2, b + n2)

Example: Functions of two parameters

• Quantity of interest: Pr(θ_1 > θ_2 | y_{1,1}, …, y_{n1,1}, y_{1,2}, …, y_{n2,2})
• Methods:
  ◦ Numerical integration:
    Pr(θ_1 > θ_2 | y) = ∫_0^∞ ∫_0^{θ_1} p(θ_1, θ_2 | y_{1,1}, …, y_{n1,1}, y_{1,2}, …, y_{n2,2}) dθ_2 dθ_1
                      = ∫_0^∞ ∫_0^{θ_1} [112^219 · 45^68 / (Γ(219) Γ(68))] θ_1^218 e^(−112θ_1) θ_2^67 e^(−45θ_2) dθ_2 dθ_1
  ◦ Monte Carlo: Pr(θ_1 > θ_2 | y) ≈ (1/S) Σ_{s=1}^S 1(θ_1^(s) > θ_2^(s))

    > mean(theta1.mc > theta2.mc)
    0.9708
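The double integral above can also be evaluated numerically in R as a check on the Monte Carlo estimate: integrating out θ_2 first leaves a one-dimensional integral, since the inner integral is just the gamma(68, 45) CDF. A minimal sketch under that reduction (not code from the slides):

    # Pr(theta1 > theta2 | y) as a one-dimensional integral:
    # integrate dgamma(t, 219, 112) * Pr(theta2 < t) over t, where
    # theta2 ~ gamma(68, 45), so the inner probability is pgamma(t, 68, 45).
    integrand <- function(t) dgamma(t, 219, 112) * pgamma(t, 68, 45)
    integrate(integrand, 0, Inf)   # approximately 0.97, matching the MC estimate

This reduction is special to the two-parameter case; for more parameters or more complicated events, the Monte Carlo approach remains just as simple.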
Example: Functions of two parameters

• Quantity of interest: the posterior distribution of θ_1/θ_2
• Monte Carlo: we can use the empirical distribution of {θ_1^(1)/θ_2^(1), θ_1^(2)/θ_2^(2), …, θ_1^(S)/θ_2^(S)} to approximate the posterior distribution of θ_1/θ_2.
• R code:

    a <- 2; b <- 1; sy1 <- 217; n1 <- 111; sy2 <- 66; n2 <- 44
    theta1.mc <- rgamma(10000, a + sy1, b + n1)
    theta2.mc <- rgamma(10000, a + sy2, b + n2)
    ratiooftheta <- theta1.mc / theta2.mc
    density(ratiooftheta)

Sampling from predictive distributions

• Prior predictive distribution: p(Ỹ = ỹ) = ∫ p(ỹ | θ) p(θ) dθ
• Posterior predictive distribution:
  p(Ỹ = ỹ | y_1, …, y_n) = ∫ p(ỹ | θ) p(θ | y_1, …, y_n) dθ
  ◦ Samples of (θ, ỹ):

    sample θ^(1) ∼ p(θ | y_1, …, y_n), sample ỹ^(1) ∼ p(ỹ | θ^(1))
    sample θ^(2) ∼ p(θ | y_1, …, y_n), sample ỹ^(2) ∼ p(ỹ | θ^(2))
    ⋮
    sample θ^(S) ∼ p(θ | y_1, …, y_n), sample ỹ^(S) ∼ p(ỹ | θ^(S))

  ◦ {(θ^(1), ỹ^(1)), …, (θ^(S), ỹ^(S))} are iid samples from the joint posterior distribution of θ and Ỹ;
  ◦ {ỹ^(1), …, ỹ^(S)} are samples from the posterior predictive distribution of Ỹ.

Example: Poisson model

• Y_{1,1}, …, Y_{n1,1}: numbers of children for the n_1 women without college degrees; Y_{1,2}, …, Y_{n2,2}: numbers of children for the n_2 women with college degrees;
  Y_{1,1}, …, Y_{n1,1} | θ_1 ∼ iid Poisson(θ_1); Y_{1,2}, …, Y_{n2,2} | θ_2 ∼ iid Poisson(θ_2).
• Data:
  Less than bachelor's: n_1 = 111, Σ_{i=1}^{n_1} y_{i,1} = 217, ȳ_1 = 1.95
  Bachelor's or higher: n_2 = 44, Σ_{i=1}^{n_2} y_{i,2} = 66, ȳ_2 = 1.50
• (Conjugate) prior: θ_1, θ_2 ∼ iid gamma(α = 2, β = 1)
• Posterior distributions: θ | y ∼ gamma(α + Σ y_i, β + n), so
  θ_1 | {n_1 = 111, Σ y_{i,1} = 217} ∼ gamma(219, 112)
  θ_2 | {n_2 = 44, Σ y_{i,2} = 66} ∼ gamma(68, 45)
• Question: consider two randomly sampled individuals, one from each of the two populations. To what extent do we expect the one without the bachelor's degree to have more children than the other? That is, find Pr(Ỹ_1 > Ỹ_2 | y).

Posterior predictive distribution of Ỹ

• Prior of θ: θ ∼ gamma(α, β)
• Sampling model: Y_1, …, Y_n | θ ∼ iid Poisson(θ)
• Posterior of θ given y: θ | y ∼ gamma(α + Σ y_i, β + n)
• Posterior predictive of Ỹ given y:

  p(ỹ | y_1, …, y_n) = ∫_0^∞ p(ỹ | θ, y) p(θ | y) dθ = ∫_0^∞ p(ỹ | θ) p(θ | y_1, …, y_n) dθ
    = ∫_0^∞ { [1/ỹ!] θ^ỹ e^(−θ) } { [(β + n)^(α+Σy_i)/Γ(α + Σ y_i)] θ^(α+Σy_i−1) e^(−(β+n)θ) } dθ
    = [(β + n)^(α+Σy_i) / (Γ(ỹ + 1) Γ(α + Σ y_i))] ∫_0^∞ θ^(α+Σy_i+ỹ−1) e^(−(β+n+1)θ) dθ
    = [Γ(α + Σ y_i + ỹ) / (Γ(ỹ + 1) Γ(α + Σ y_i))] · [(β + n)/(β + n + 1)]^(α+Σy_i) · [1/(β + n + 1)]^ỹ

Posterior predictive distribution of Ỹ (cont.)

• Posterior predictive of Ỹ given y:
  p(ỹ | y_1, …, y_n) = [Γ(α + Σ y_i + ỹ) / (Γ(ỹ + 1) Γ(α + Σ y_i))] · [(β + n)/(β + n + 1)]^(α+Σy_i) · [1/(β + n + 1)]^ỹ,
  i.e., Ỹ | y ∼ Negative Binomial(α + Σ y_i, β + n)
  ◦ E(Ỹ | y) = (α + Σ y_i)/(β + n)
  ◦ Var(Ỹ | y) = [(α + Σ y_i)/(β + n)²] (β + n + 1) = [(α + Σ y_i)/(β + n)] · [(β + n + 1)/(β + n)]
• Remark on the Negative Binomial distribution: if X ∼ Neg-bin(a, b), then
  p(x) = C(x + a − 1, a − 1) [b/(b + 1)]^a [1/(b + 1)]^x = [Γ(x + a)/(Γ(x + 1) Γ(a))] [b/(b + 1)]^a [1/(b + 1)]^x,
  E(X) = a/b, Var(X) = (a/b²)(b + 1).

Posterior predictive distribution of Ỹ (cont.)

Monte Carlo method:

• Using the known Negative Binomial form:

    a <- 2; b <- 1; sy1 <- 217; n1 <- 111; sy2 <- 66; n2 <- 44
    tildey1.mc <- rnbinom(10000, size = a + sy1, mu = (a + sy1) / (b + n1))
    tildey2.mc <- rnbinom(10000, size = a + sy2, mu = (a + sy2) / (b + n2))
    mean(tildey1.mc > tildey2.mc)

• As a mixture of Poisson distributions (sample θ, then ỹ | θ):

    a <- 2; b <- 1; sy1 <- 217; n1 <- 111; sy2 <- 66; n2 <- 44
    theta1.mc <- rgamma(10000, a + sy1, b + n1)
    theta2.mc <- rgamma(10000, a + sy2, b + n2)
    tildey1.mc <- rpois(10000, theta1.mc)
    tildey2.mc <- rpois(10000, theta2.mc)
    > mean(tildey1.mc > tildey2.mc)
    0.4823
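Since both predictive distributions here are negative binomial, Pr(Ỹ_1 > Ỹ_2 | y) can also be computed to high accuracy without simulation, as a check on the Monte Carlo value above. A minimal sketch (not from the slides), assuming a grid of 0 to 100 children carries essentially all of the predictive mass:

    # Pr(Ytilde1 > Ytilde2 | y) = sum over y2 of p2(y2) * Pr(Ytilde1 > y2),
    # with both predictive distributions negative binomial as derived above.
    a <- 2; b <- 1; sy1 <- 217; n1 <- 111; sy2 <- 66; n2 <- 44
    ygrid <- 0:100                 # counts above 100 have negligible probability
    p2 <- dnbinom(ygrid, size = a + sy2, mu = (a + sy2) / (b + n2))
    tail1 <- 1 - pnbinom(ygrid, size = a + sy1, mu = (a + sy1) / (b + n1))
    sum(p2 * tail1)                # approximately 0.48, matching the MC estimate

Note that this probability is much smaller than Pr(θ_1 > θ_2 | y) ≈ 0.97: even though θ_1 is almost certainly larger than θ_2, the two predictive distributions overlap heavily, and ties (Ỹ_1 = Ỹ_2) do not count as Ỹ_1 > Ỹ_2.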