AMS 132/206: Classical and Bayesian Inference (Winter 2015)

Data illustration of Bayesian inference

Consider the lactic acid concentration data example discussed in Examples 8.2.3, 8.5.4 and 8.6.2 of DeGroot and Schervish (2012). The data set we work with here combines the 10 measurements from Example 8.5.4 with the 20 measurements given in Exercise 8.6.16, for a full sample of size n = 30. To illustrate Bayesian analysis for a normal distribution under the conjugate prior, we assume a N(µ, τ) distribution for lactic acid concentration, where µ is the mean and τ is the precision parameter (hence, the variance is σ^2 = 1/τ). Simple exploratory analysis of the data does not indicate obvious deviations from symmetry and unimodality for the underlying distribution, and thus the normal distribution assumption provides a reasonable starting point.

Prior specification. The Bayesian model is applied with a gamma(α0, β0) prior for τ and a N(µ0, λ0τ) conditional prior for µ given τ, where α0 = 2, β0 = 0.125, µ0 = 1.45 and λ0 = 4.5. Details for this choice of prior hyperparameters will be given in class. Note that the value α0 = 2 ensures finite marginal mean and variance for µ, but yields infinite prior variance for σ^2; recall that the gamma(α0, β0) prior for τ implies an inverse-gamma(α0, β0) prior for σ^2. The remaining prior hyperparameters are specified based on a “prior guess” of (0.7, 2.2) for the range of plausible lactic acid concentration values; in the absence of actual prior information in this illustrative example, this interval is chosen by expanding the range of observed values in the data. A useful means of studying the combined effect of the priors for the model parameters is the implied prior predictive distribution; refer to Figure 2 and the details given in the relevant section below.

Inferences based on the posterior distribution. Under the specific prior given above, the posterior distribution for (µ, τ) is analytically available as

ξ(µ, τ | x) = N(µ | 1.443, 34.5τ) × gamma(τ | 17, 1.46),

and we can also obtain the marginal posterior distribution, ξ(µ | x), from the transformation 20.04(µ − 1.443) | x ∼ t_34 (refer to Theorems 8.6.1 and 8.6.2). The joint posterior density, ξ(µ, τ | x), and the marginal posterior densities, ξ(µ | x) and ξ(τ | x), can be plotted and compared with the respective prior densities. Point and interval estimates for the parameters can be computed using moments and percentiles of the t and gamma marginal posterior distributions. For instance, the 0.975 percentile of the t_34 distribution is T_34^{-1}(0.975) ≈ 2.033, so that Pr(−2.033 < 20.04(µ − 1.443) < 2.033 | x) = 0.95, from which we obtain Pr(1.3416 < µ < 1.5444 | x) = 0.95.
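To make the conjugate updating behind these posterior hyperparameters concrete, the following R sketch applies the updating formulas of Theorem 8.6.1 and the t-based interval. The sample summaries xbar (sample mean) and ss (sum of squared deviations) are not listed in the handout; the values below are assumptions, back-solved from the reported posterior quantities.

n <- 30
xbar <- 1.442     # assumed sample mean (not reported in the handout)
ss <- 2.671       # assumed sum of (x_i - xbar)^2, back-solved from beta1
mu0 <- 1.45; lambda0 <- 4.5; alpha0 <- 2; beta0 <- 0.125

mu1 <- (lambda0 * mu0 + n * xbar) / (lambda0 + n)    # posterior mean, ~ 1.443
lambda1 <- lambda0 + n                               # 34.5
alpha1 <- alpha0 + n / 2                             # 17
beta1 <- beta0 + ss / 2 +
  (lambda0 * n * (xbar - mu0)^2) / (2 * (lambda0 + n))   # ~ 1.46

# 95% posterior interval for mu: sqrt(lambda1 * alpha1 / beta1) * (mu - mu1)
# follows a t distribution with 2 * alpha1 = 34 degrees of freedom
mu1 + c(-1, 1) * qt(0.975, df = 2 * alpha1) / sqrt(lambda1 * alpha1 / beta1)
# ~ (1.3416, 1.5444)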
[Figure 1: Simulation-based inference for the model parameters: bivariate plot of the posterior samples for (µ, τ) (upper left panel); probability histograms of samples from ξ(µ | x) (upper right panel), ξ(τ | x) (lower left panel), and ξ(σ | x) (lower right panel). The red line in the upper right, lower left and lower right panels indicates the respective prior density.]

Although the interpretation is different, the endpoints of the posterior probability interval can be contrasted with the 95% confidence interval for µ (based on the t_29 distribution), given by (1.3287, 1.5553). The results become even more similar as the prior distribution for (µ, τ) becomes more dispersed, that is, as the amount of prior information is reduced.
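For comparison, a minimal sketch of the classical interval, again under the assumed sample summaries used above:

# Classical 95% confidence interval for mu, based on the t_29 distribution;
# xbar and ss are the assumed summaries from the previous sketch
n <- 30; xbar <- 1.442; ss <- 2.671
s <- sqrt(ss / (n - 1))                                  # sample standard deviation
xbar + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)    # ~ (1.3287, 1.5553)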
Sampling from the posterior distribution. Inference for the model parameters and posterior predictive inference are facilitated by posterior simulation, that is, by sampling from the posterior distribution, ξ(µ, τ | x), and using the posterior samples for simulation-based inference and prediction. In the context of this specific problem, direct sampling from ξ(µ, τ | x) is straightforward using its decomposition in terms of the conditional posterior distribution for µ given τ and the marginal posterior distribution for τ. In particular, to collect B posterior samples, {(µb, τb) : b = 1, ..., B}: for b = 1, ..., B, we draw τb ∼ gamma(17, 1.46), and then draw µb ∼ N(1.443, 34.5τb). Note that {µb : b = 1, ..., B} is (by definition) a sample from the marginal posterior distribution ξ(µ | x), and we can therefore obtain inference for the mean even if we do not have access to the t distribution result. In particular, using the empirical 0.025 and 0.975 percentiles of the posterior samples for µ, we obtain Pr(1.3409 < µ < 1.5432 | x) = 0.95, which is essentially the same interval as the one based on the analytically available form of ξ(µ | x). Moreover, the posterior density of any function of the parameters can be estimated by simply evaluating the function on the posterior samples; see Figure 1 for an illustration with the standard deviation, σ = τ^{-1/2}.

[Figure 2: Histogram of the lactic acid concentration data (left panel), along with simulation-based estimates of the prior and posterior predictive densities (middle and right panels, respectively).]

Prior and posterior predictive distributions. The posterior samples for (µ, τ) can be used to sample from the posterior predictive distribution. Recall the form of the posterior predictive density:

f(x0 | x) = ∫∫ N(x0 | µ, τ) ξ(µ, τ | x) dµ dτ.

Therefore, if for every sample (µb, τb) from ξ(µ, τ | x) we draw x0,b from N(µb, τb), the resulting {x0,b : b = 1, ..., B} provide a sample from the posterior predictive distribution, which can be used for predictive inference given the data. The right panel of Figure 2 plots the simulation-based estimate of the posterior predictive density in the form of a histogram based on the posterior predictive samples. Following a similar approach in which the posterior samples are replaced with samples from the prior distribution, ξ(µ, τ), we can estimate the prior predictive density, f(x0) = ∫∫ N(x0 | µ, τ) ξ(µ, τ) dµ dτ; see the middle panel of Figure 2 for the results under the prior hyperparameters discussed above.
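A minimal, vectorized R sketch of this direct sampling scheme (the loop-based session used for the predictive densities is given in the R code section below):

B <- 10000
tau <- rgamma(B, shape = 17, rate = 1.46)                  # tau_b ~ gamma(17, 1.46)
mu <- rnorm(B, mean = 1.443, sd = sqrt(1 / (34.5 * tau)))  # mu_b | tau_b ~ N(1.443, 34.5 tau_b)
sigma <- 1 / sqrt(tau)                  # induced samples from xi(sigma | x)
quantile(mu, c(0.025, 0.975))           # ~ (1.3409, 1.5432), up to Monte Carlo error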
R code

R code to estimate the prior predictive density:

> B <- 10000
> pr.mu <- rep(0,B)
> pr.tau <- rep(0,B)
> pr.pred <- rep(0,B)
> for(i in 1:B)
+ {
+ pr.tau[i] <- rgamma(1,shape=2,rate=0.125)                  # tau ~ gamma(alpha0, beta0)
+ pr.mu[i] <- rnorm(1,mean=1.45,sd=sqrt(1/(4.5*pr.tau[i])))  # mu | tau ~ N(mu0, lambda0 tau)
+ pr.pred[i] <- rnorm(1,mean=pr.mu[i],sd=sqrt(1/pr.tau[i]))  # x0 | mu, tau ~ N(mu, tau)
+ }
> hist(prob=T,main="",xlab="Prior predictive density",xlim=c(0,3),ylim=c(0,1.3),pr.pred)

and for the posterior predictive density:

> B <- 10000
> mu <- rep(0,B)
> tau <- rep(0,B)
> post.pred <- rep(0,B)
> for(i in 1:B)
+ {
+ tau[i] <- rgamma(1,shape=17,rate=1.46)                   # tau ~ gamma(17, 1.46)
+ mu[i] <- rnorm(1,mean=1.443,sd=sqrt(1/(34.5*tau[i])))    # mu | tau ~ N(1.443, 34.5 tau)
+ post.pred[i] <- rnorm(1,mean=mu[i],sd=sqrt(1/tau[i]))    # x0 | mu, tau ~ N(mu, tau)
+ }
> hist(prob=T,main="",xlab="Posterior predictive density",xlim=c(0,3),ylim=c(0,1.3),post.pred)
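The prior density overlays in Figure 1 (the red lines) can be reproduced along the same lines; as a sketch, for the precision τ, reusing the posterior draws tau from the session above:

> hist(tau,prob=T,main="",xlab="tau")
> curve(dgamma(x,shape=2,rate=0.125),add=T,col="red")   # gamma(2, 0.125) prior density for tau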