AMS 132/206: Classical and Bayesian Inference (Winter 2015)
Data illustration of Bayesian inference
Consider the acid concentration data example discussed in Examples 8.2.3, 8.5.4 and
8.6.2 of DeGroot and Schervish (2012). The data set we work with here combines the 10
measurements from Example 8.5.4 with the 20 measurements given in Exercise 8.6.16 for the
full sample of size n = 30.
To illustrate Bayesian analysis for a normal distribution under the conjugate prior, we
assume a N(µ, τ) distribution for lactic acid concentration, where µ is the mean and τ is the precision parameter (hence, the variance σ² = 1/τ). Simple exploratory analysis of the data
does not indicate obvious deviations from symmetry and unimodality for the underlying distribution, and thus the normal distribution assumption provides a reasonable starting point.
Prior specification. The Bayesian model is applied with a gamma(α0, β0) prior for τ
and a N(µ0, λ0τ) conditional prior for µ given τ, where α0 = 2, β0 = 0.125, µ0 = 1.45 and
λ0 = 4.5. Details for this choice of prior hyperparameters will be given in class. Note that
the value of α0 = 2 ensures finite marginal mean and variance for µ, but yields infinite prior
variance for σ²; recall that the gamma(α0, β0) prior for τ implies an inverse-gamma(α0, β0)
prior for σ². The remaining prior hyperparameters are specified based on a “prior guess” of
(0.7, 2.2) for the range of plausible lactic acid concentration values; in the absence of actual
prior information in this illustrative example, the interval above is chosen by expanding the
range of observed values in the data. A useful means to study the combined effect of the
priors for the model parameters is through the implied prior predictive distribution; refer to
Figure 2 and the details given in the relevant section below.
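As an informal check of this specification, the implied prior moments can be computed directly from the hyperparameters; the following small R sketch (illustrative, and separate from the analysis code given at the end of this handout) also draws from the prior for (µ, τ) to examine the implied central prior interval for µ:

> a0 <- 2; b0 <- 0.125; m0 <- 1.45; l0 <- 4.5
> b0/(a0-1)          # E(sigma^2) = E(1/tau) = 0.125; finite since a0 > 1, variance infinite since a0 <= 2
> (1/l0)*b0/(a0-1)   # Var(mu) = E(1/(l0*tau)), approx. 0.028 (prior sd of mu approx. 0.167)
> prior.tau <- rgamma(10000,shape=a0,rate=b0)                  # tau ~ gamma(a0,b0)
> prior.mu <- rnorm(10000,mean=m0,sd=sqrt(1/(l0*prior.tau)))   # mu | tau ~ N(m0, l0*tau)
> quantile(prior.mu,c(0.025,0.975))                            # central 95% prior interval for mu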
Inferences based on the posterior distribution. Under the specific prior given above,
the posterior distribution for (µ, τ) is analytically available through
ξ(µ, τ | x) = N(µ | 1.443, 34.5τ) gamma(τ | 17, 1.46),
and we can also obtain the marginal posterior distribution, ξ(µ | x), from the transformation
20.04(µ − 1.443) | x ∼ t34, where 20.04 = (34.5 × 17/1.46)^{1/2} and the degrees of freedom
are 2 × 17 = 34 (refer to Theorems 8.6.1 and 8.6.2). The joint posterior density,
ξ(µ, τ | x), and marginal posterior densities, ξ(µ | x) and ξ(τ | x), can be plotted and compared with the respective prior densities. Point and interval estimates for the parameters
can be computed using moments and percentiles of the t and gamma marginal posterior
distributions. For instance, the 0.975 percentile of the t34 distribution is T34^{-1}(0.975) ≈ 2.033,
and therefore Pr(−2.033 < 20.04(µ − 1.443) < 2.033 | x) = 0.95, from which we obtain
Pr(1.3416 < µ < 1.5444 | x) = 0.95. Although the interpretation is different, the endpoints
of the posterior probability interval can be contrasted with the 95% confidence interval for
µ (based on the t29 distribution) given by (1.3287, 1.5553). The results become even more
similar as the prior distribution for (µ, τ) becomes more dispersed, that is, as the amount of
prior information is reduced.

Figure 1: Simulation-based inference for the model parameters: bivariate plot of the posterior samples
for (µ, τ) (upper left panel), probability histogram of samples from ξ(µ | x) (upper right panel), from
ξ(τ | x) (lower left panel), and from ξ(σ | x) (lower right panel). The red line in the upper right,
lower left and lower right panels indicates the respective prior density.
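As a supplementary sketch (not part of the handout's code), the posterior hyperparameters and the probability interval above can be reproduced in R using the updating formulas of Theorem 8.6.1; the vector x is assumed to hold the n = 30 measurements, which are not listed here:

> n <- length(x)                         # assumes x contains the 30 measurements
> xbar <- mean(x)
> mu1 <- (4.5*1.45 + n*xbar)/(4.5 + n)   # posterior mean of mu; approx. 1.443
> lambda1 <- 4.5 + n                     # = 34.5
> alpha1 <- 2 + n/2                      # = 17
> beta1 <- 0.125 + 0.5*sum((x-xbar)^2) + n*4.5*(xbar-1.45)^2/(2*(4.5+n))   # approx. 1.46
> mu1 + c(-1,1)*qt(0.975,df=2*alpha1)/sqrt(lambda1*alpha1/beta1)   # approx. (1.3416, 1.5444)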
Sampling from the posterior distribution. Inference for the model parameters and
posterior predictive inference are facilitated by posterior simulation, that is, by sampling
from the posterior distribution, ξ(µ, τ | x), and using the posterior samples for simulation-based
inference and prediction. In the context of the specific problem, direct sampling from
ξ(µ, τ | x) is straightforward using its decomposition in terms of the conditional posterior distribution for µ given τ and the marginal posterior distribution for τ. In particular, to collect
B posterior samples, {(µb, τb) : b = 1, ..., B}: for b = 1, ..., B, we draw τb ∼ gamma(17, 1.46),
and then draw µb ∼ N(1.443, 34.5τb). Note that {µb : b = 1, ..., B} is (by definition) a sample from the marginal posterior distribution ξ(µ | x), and we can therefore obtain inference
for the mean even if we do not have access to the t distribution result. For instance, using the empirical 0.025 and 0.975 percentiles from the posterior samples for µ, we obtain
Pr(1.3409 < µ < 1.5432 | x) = 0.95, which is essentially the same interval as the one
based on the analytically available marginal posterior distribution, ξ(µ | x). Moreover, the
posterior density of any function of the parameters can be estimated by simply evaluating
the function on the posterior samples; see Figure 1 for an illustration with the standard
deviation, σ = τ^{-1/2}.

Figure 2: Histogram of the lactic acid concentration data (left panel) along with simulation-based
estimates of the prior and posterior predictive density (middle and right panels, respectively).
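For instance, assuming the posterior samples mu and tau generated by the code in the R code section at the end of this handout, the interval and the inference for σ can be obtained as:

> quantile(mu,c(0.025,0.975))                  # approx. (1.3409, 1.5432)
> sigma <- 1/sqrt(tau)                         # posterior samples for sigma = tau^{-1/2}
> hist(sigma,prob=TRUE,main="",xlab="sigma")   # simulation-based estimate of the posterior density of sigma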
Prior and posterior predictive distributions. The posterior samples for (µ, τ) can be
used to sample from the posterior predictive distribution. Recall the form of the posterior
predictive density:
f(x0 | x) = ∫∫ N(x0 | µ, τ) ξ(µ, τ | x) dµ dτ.
Therefore, if for every sample (µb, τb) from ξ(µ, τ | x) we draw x0,b from N(µb, τb), the
resulting {x0,b : b = 1, ..., B} provide a sample from the posterior predictive distribution,
which can be used for predictive inference given the data. The right panel of Figure 2 plots
the simulation-based estimate of the posterior predictive density in the form of a histogram
based on the posterior predictive samples. Following a similar approach where the posterior
samples are replaced with samples from the prior distribution, ξ(µ, τ), we can estimate the
prior predictive density, f(x0) = ∫∫ N(x0 | µ, τ) ξ(µ, τ) dµ dτ; see the middle panel of Figure
2 for the results under the prior hyperparameters discussed above.
R code
R code to estimate the prior predictive density:
> B <- 10000                 # number of Monte Carlo samples
> pr.mu <- rep(0,B)
> pr.tau <- rep(0,B)
> pr.pred <- rep(0,B)
> for(i in 1:B)
+ {
+ pr.tau[i] <- rgamma(1,shape=2,rate=0.125)                   # tau ~ gamma(a0,b0)
+ pr.mu[i] <- rnorm(1,mean=1.45,sd=sqrt(1/(4.5*pr.tau[i])))   # mu | tau ~ N(mu0, l0*tau)
+ pr.pred[i] <- rnorm(1,mean=pr.mu[i],sd=sqrt(1/pr.tau[i]))   # x0 | mu,tau ~ N(mu, tau)
+ }
> hist(prob=TRUE,main="",xlab="Prior predictive density",xlim=c(0,3),ylim=c(0,1.3),pr.pred)
and for the posterior predictive density:
> B <- 10000                 # number of Monte Carlo samples
> mu <- rep(0,B)
> tau <- rep(0,B)
> post.pred <- rep(0,B)
> for(i in 1:B)
+ {
+ tau[i] <- rgamma(1,shape=17,rate=1.46)                   # tau | x ~ gamma(a1,b1)
+ mu[i] <- rnorm(1,mean=1.443,sd=sqrt(1/(34.5*tau[i])))    # mu | tau,x ~ N(mu1, l1*tau)
+ post.pred[i] <- rnorm(1,mean=mu[i],sd=sqrt(1/tau[i]))    # x0 | mu,tau,x ~ N(mu, tau)
+ }
> hist(prob=TRUE,main="",xlab="Posterior predictive density",xlim=c(0,3),ylim=c(0,1.3),post.pred)
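Since rgamma and rnorm are vectorized in R, the same samples can be drawn without an explicit loop; for instance, an equivalent version of the posterior predictive code is:

> tau <- rgamma(B,shape=17,rate=1.46)
> mu <- rnorm(B,mean=1.443,sd=sqrt(1/(34.5*tau)))
> post.pred <- rnorm(B,mean=mu,sd=sqrt(1/tau))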