where the last step follows because a transition from x' is guaranteed to occur.
The transition probability q(x → x') defined by the sampling step in GIBBS-ASK is
actually a special case of the more general definition of Gibbs sampling, according to which
each variable is sampled conditionally on the current values of all the other variables. We
start by showing that this general definition of Gibbs sampling satisfies the detailed balance
equation with a stationary distribution equal to P(x | e) (the true posterior distribution on
the nonevidence variables). Then, we simply observe that, for Bayesian networks, sampling
conditionally on all variables is equivalent to sampling conditionally on the variable's Markov
blanket (see page 517).
To analyze the general Gibbs sampler, which samples each X_i in turn with a transition
probability q_i that conditions on all the other variables, we define $\bar{\mathbf{X}}_i$ to be these other variables (except the evidence variables); their values in the current state are $\bar{\mathbf{x}}_i$. If we sample a
new value $x_i'$ for X_i conditionally on all the other variables, including the evidence, we have

$$ q_i(\mathbf{x} \rightarrow \mathbf{x}') = q_i((x_i, \bar{\mathbf{x}}_i) \rightarrow (x_i', \bar{\mathbf{x}}_i)) = P(x_i' \mid \bar{\mathbf{x}}_i, \mathbf{e}) . $$
Now we show that the transition probability for each step of the Gibbs sampler is in detailed
balance with the true posterior:
$$
\begin{aligned}
\pi(\mathbf{x})\, q_i(\mathbf{x} \rightarrow \mathbf{x}')
  &= P(\mathbf{x} \mid \mathbf{e})\, P(x_i' \mid \bar{\mathbf{x}}_i, \mathbf{e})
   = P(x_i, \bar{\mathbf{x}}_i \mid \mathbf{e})\, P(x_i' \mid \bar{\mathbf{x}}_i, \mathbf{e}) \\
  &= P(x_i \mid \bar{\mathbf{x}}_i, \mathbf{e})\, P(\bar{\mathbf{x}}_i \mid \mathbf{e})\, P(x_i' \mid \bar{\mathbf{x}}_i, \mathbf{e})
     \quad \text{(using the chain rule on the first term)} \\
  &= P(x_i \mid \bar{\mathbf{x}}_i, \mathbf{e})\, P(x_i', \bar{\mathbf{x}}_i \mid \mathbf{e})
     \quad \text{(using the chain rule backward)} \\
  &= \pi(\mathbf{x}')\, q_i(\mathbf{x}' \rightarrow \mathbf{x}) .
\end{aligned}
$$
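To see the detailed balance identity at work numerically, the following sketch checks it on a hypothetical two-variable Boolean network A → B with no evidence (the CPT numbers are illustrative, not from the text); q_A is the Gibbs transition that resamples A conditioned on B:

```python
import itertools

# Hypothetical CPTs for a tiny network A -> B (both Boolean), chosen only for illustration.
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True:  {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}

def joint(a, b):
    """pi(a, b) = P(a) P(b | a) -- the stationary distribution (no evidence)."""
    return P_A[a] * P_B_given_A[a][b]

def q_A(a_new, b):
    """Gibbs transition for resampling A: P(a_new | b), obtained by normalization."""
    norm = sum(joint(a, b) for a in (True, False))
    return joint(a_new, b) / norm

# Check detailed balance: pi(a, b) q_A(a') == pi(a', b) q_A(a) for every a, a', b.
for a, a_new, b in itertools.product((True, False), repeat=3):
    lhs = joint(a, b) * q_A(a_new, b)
    rhs = joint(a_new, b) * q_A(a, b)
    assert abs(lhs - rhs) < 1e-12
print("Detailed balance holds for the A-resampling step.")
```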
We can think of the loop "for each Z_i in Z do" in Figure 14.16 as defining one large transition
probability q that is the sequential composition q_1 ∘ q_2 ∘ ⋯ ∘ q_n of the transition probabilities
for the individual variables. It is easy to show (Exercise 14.19) that if each of q_i and q_j has
π as its stationary distribution, then the sequential composition q_i ∘ q_j does too; hence the
transition probability q for the whole loop has P(x | e) as its stationary distribution. Finally,
unless the CPTs contain probabilities of 0 or 1—which can cause the state space to become
disconnected—it is easy to see that q is ergodic. Hence, the samples generated by Gibbs
sampling will eventually be drawn from the true posterior distribution.
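The composition property invoked above is quick to verify; the following is a sketch of the argument behind Exercise 14.19, not the book's own solution. If π is stationary for both q_i and q_j, then

$$ \sum_{\mathbf{x}'} \Big( \sum_{\mathbf{x}} \pi(\mathbf{x})\, q_i(\mathbf{x} \rightarrow \mathbf{x}') \Big)\, q_j(\mathbf{x}' \rightarrow \mathbf{x}'') = \sum_{\mathbf{x}'} \pi(\mathbf{x}')\, q_j(\mathbf{x}' \rightarrow \mathbf{x}'') = \pi(\mathbf{x}'') , $$

so π is also stationary for the sequential composition q_i ∘ q_j.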
The final step is to show how to perform the general Gibbs sampling step—sampling
X_i from $P(X_i \mid \bar{\mathbf{x}}_i, \mathbf{e})$—in a Bayesian network. Recall from page 517 that a variable is independent of all other variables given its Markov blanket; hence,
$$ P(x_i' \mid \bar{\mathbf{x}}_i, \mathbf{e}) = P(x_i' \mid mb(X_i)) , $$

where mb(X_i) denotes the values of the variables in X_i's Markov blanket, MB(X_i). As
shown in Exercise 14.7, the probability of a variable given its Markov blanket is proportional
to the probability of the variable given its parents times the probability of each child given its
respective parents:
$$ P(x_i' \mid mb(X_i)) = \alpha\, P(x_i' \mid parents(X_i)) \times \prod_{Y_j \in Children(X_i)} P(y_j \mid parents(Y_j)) . \qquad (14.12) $$
Hence, to flip each variable Xi conditioned on its Markov blanket, the number of multiplications required is equal to the number of Xi's children.
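As a concrete illustration of Equation (14.12), here is a minimal sketch of one Gibbs resampling step. The dictionary-based network representation (parents, children, and a cpt table keyed by a value and a tuple of parent values) is an assumption made for this sketch, not the book's GIBBS-ASK pseudocode, and variables are taken to be Boolean:

```python
import random

def markov_blanket_distribution(X, state, parents, children, cpt):
    """Normalized P(X = x | mb(X)) for each value x, following Equation (14.12).

    parents[V]  -- list of V's parents
    children[V] -- list of V's children
    cpt[V]      -- dict mapping (value_of_V, tuple_of_parent_values) -> probability
    state       -- current assignment to every variable (dict)
    """
    def prob(V, value, assignment):
        parent_vals = tuple(assignment[p] for p in parents[V])
        return cpt[V][(value, parent_vals)]

    weights = {}
    for x in (True, False):          # this sketch assumes Boolean variables
        trial = dict(state)
        trial[X] = x
        w = prob(X, x, trial)        # P(x | parents(X_i))
        for Y in children[X]:        # times P(y_j | parents(Y_j)) for each child Y_j
            w *= prob(Y, trial[Y], trial)
        weights[x] = w
    total = sum(weights.values())    # the normalization constant alpha is 1/total
    return {x: w / total for x, w in weights.items()}

def gibbs_step(X, state, parents, children, cpt):
    """One Gibbs move: resample X from P(X | mb(X)) and write it back into state."""
    dist = markov_blanket_distribution(X, state, parents, children, cpt)
    state[X] = random.random() < dist[True]
```

The inner loop multiplies one factor per child of X_i, matching the operation count claimed in the text.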
[Figure 14.17 shows two Bayes nets; the node labels visible in the original are Honest(C1), Kindness(C1), Quality(B1), and Recommendation(C1,B1), with panels (a) and (b).]

Figure 14.17   (a) Bayes net for a single customer C1 recommending a single book B1. Honest(C1) is Boolean, while the other variables have integer values from 1 to 5. (b) Bayes net with two customers and two books.
14.6 RELATIONAL AND FIRST-ORDER PROBABILITY MODELS
In Chapter 8, we explained the representational advantages possessed by first-order logic in
comparison to propositional logic. First-order logic commits to the existence of objects and
relations among them and can express facts about some or all of the objects in a domain. This
often results in representations that are vastly more concise than the equivalent propositional
descriptions. Now, Bayesian networks are essentially propositional: the set of random variables is fixed and finite, and each has a fixed domain of possible values This fact limits the
applicability of Bayesian networks. If we can ,find a way to combine probability theory with
the expressive power of first order representations, we expect to he able to increase draMall-
catty the range of problems that can be handled.
For example, suppose that an online book retailer would like to provide overall evaluations of products based on recommendations received from its customers. The evaluation
will take the form of a posterior distribution over the quality of the book, given the available evidence. The simplest solution is to base the evaluation on the average recommendation,
perhaps with a variance determined by the number of recommendations, but this fails to take
into account the fact that some customers are kinder than others and some are less honest than
others. Kind customers tend to give high recommendations even to fairly mediocre books,
while dishonest customers give very high or very low recommendations for reasons other
than quality—for example, they might work for a publisher. 6
For a single customer C1, recommending a single book B1, the Bayes net might look
like the one shown in Figure 14.17(a). (Just as in Section 9.1, expressions with parentheses
such as Honest(C1) are just fancy symbols—in this case, fancy names for random variables.)
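The structure of Figure 14.17(a) can be written down in the same dictionary style used in the Gibbs sketch earlier; this assumes, as the figure indicates, that Recommendation(C1,B1) has parents Honest(C1), Kindness(C1), and Quality(B1), and the conditional distributions themselves are left as placeholders:

```python
# Hypothetical structure-only encoding of Figure 14.17(a); CPT entries still to be supplied.
parents = {
    "Honest(C1)": [],
    "Kindness(C1)": [],
    "Quality(B1)": [],
    "Recommendation(C1,B1)": ["Honest(C1)", "Kindness(C1)", "Quality(B1)"],
}
domains = {
    "Honest(C1)": [True, False],          # Boolean, as stated in the caption
    "Kindness(C1)": [1, 2, 3, 4, 5],      # integer values 1 to 5
    "Quality(B1)": [1, 2, 3, 4, 5],
    "Recommendation(C1,B1)": [1, 2, 3, 4, 5],
}
```

Writing out the two-customer, two-book network of Figure 14.17(b) by hand already means repeating this pattern for every customer-book pair; the first-order models introduced in this section are meant to generate that repetition from a single template.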
6  A game theorist would advise a dishonest customer to avoid detection by occasionally recommending a good book from a competitor. See Chapter 17.