Reasoning with Uncertainty

Topics:
- Why is uncertainty important?
- How do we represent and reason with uncertain knowledge?

Progress in research:
- 1980s: rule-based representation (MYCIN, Prospector)
- 1990s to present: network models (Bayesian networks, e.g. Munin)
- latest developments: integration of network models and logic (Markov logic)

Why important: biomedical
Have you got Mexican Flu?
- P(m, c, s) = 0.009215        P(¬m, c, s) = 9.9 · 10^-6
  P(m, ¬c, s) = 0.000485       P(¬m, ¬c, s) = 0.0098901
  P(m, c, ¬s) = 0.000285       P(¬m, c, ¬s) = 0.0009801
  P(m, ¬c, ¬s) = 1.5 · 10^-5   P(¬m, ¬c, ¬s) = 0.97912
- M: Mexican flu; C: chills; S: sore throat
- Probability of Mexican flu given sore throat?

Why important: embedded systems
- Control of the behaviour of a large production printer
- Speed v given available power P and required energy

Why important: agents
- Agents (robots) perceive an incomplete image of the world using sensors that are inherently unreliable
- Partially observable state

Representation of uncertainty
- Representation of uncertainty is clearly important! How to do it?
- For example rule-based, with e evidence and h a hypothesis:
    e1 ∧ · · · ∧ en → h^x
  If e1, e2, ..., en are true (observed), then conclusion h is true with certainty x
- How to proceed when the ei, i = 1, ..., n, are themselves uncertain?
  ⇒ uncertainty propagation/inference/reasoning

Rule-based uncertain knowledge
- Early, simple approach – the certainty-factor calculus:
    fever ∧ myalgia → flu  with CF = 0.8
- Example of how it works: CF(fever, B) = 0.6; CF(myalgia, B) = 1 (B is background knowledge)
- Combination functions:
    CF(flu, {fever, myalgia} ∪ B) = 0.8 · max{0, min{CF(fever, B), CF(myalgia, B)}}
                                  = 0.8 · max{0, min{0.6, 1}} = 0.48

Propagation rules
- You need some basic 'theory' (e.g. the certainty-factor calculus)
- Define for this theory combination functions f∧, f∨, fprop, fco, where
  - f∧(x, y): combine uncertainty w.r.t. x and y using the ∧-rule
  - f∨(x, y): similar for ∨
  - fprop(x, y): propagation of uncertain evidence e^x to h given e → h^y
  - fco(x, y): for two co-concluding (same conclusion) rules e1 → h^x and e2 → h^y,
    e.g. contact-chicken → flu^0.01 and train-contact-humans → flu^0.1
- Examples of such theories:
  - Certainty-factor model (MYCIN)
  - Subjective Bayesian method (Prospector) – not discussed
  - Ishizuka's application of Dempster-Shafer theory – not discussed

Certainty-factor calculus
- Developed by E.H. Shortliffe and B.G. Buchanan for rule-based expert systems
- Applied in MYCIN, the expert system for the diagnosis of infectious disease
- Certainty factors (CFs): subjective estimates of uncertainty with CF(x, y) ∈ [−1, 1]
  (CF(x, y) = −1: false, CF(x, y) = 0: unknown, CF(x, y) = 1: true)
- There is a relationship to probability theory
- The CF calculus offers a fill-in for the combination functions f∧, f∨, fprop and fco

Combination functions
- fprop: given a rule e → h^CF(h,e) and uncertain evidence w.r.t. e, i.e. CF(e, e′)
  (e′ includes all evidence so far), then:
    CF(h, e′) = CF(h, e) · max{0, CF(e, e′)}
- f∧ (and f∨): given a rule e1 ∧ e2 → h^CF(h,e) with uncertain evidence CF(e1, e′) and CF(e2, e′), then:
    CF(e1 ∧ e2, e′) = min{CF(e1, e′), CF(e2, e′)}
    CF(e1 ∨ e2, e′) = max{CF(e1, e′), CF(e2, e′)}
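The combination functions defined so far are simple enough to capture directly in code. The following is a minimal Python sketch (the names cf_prop, cf_and and cf_or are my own labels for fprop, f∧ and f∨, not part of the original calculus); it reproduces the fever/myalgia example from the slide above.

```python
def cf_prop(cf_rule, cf_evidence):
    """f_prop: CF(h, e') = CF(h, e) * max(0, CF(e, e'))."""
    return cf_rule * max(0.0, cf_evidence)

def cf_and(*cfs):
    """f_and: CF(e1 AND ... AND en, e') = min of the individual CFs."""
    return min(cfs)

def cf_or(*cfs):
    """f_or: CF(e1 OR ... OR en, e') = max of the individual CFs."""
    return max(cfs)

# fever AND myalgia -> flu with CF = 0.8, given CF(fever, B) = 0.6 and CF(myalgia, B) = 1
cf_flu = cf_prop(0.8, cf_and(0.6, 1.0))
print(cf_flu)   # 0.48, as on the slide
```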
Combination functions: fco
- fco: given two co-concluding rules e1 → h^CF(h,e1) and e2 → h^CF(h,e2) with uncertain evidence
  CF(e1, e′) and CF(e2, e′), let x = CF(h, e′1) and y = CF(h, e′2) (obtained using fprop); then:
    CF(h, e′1 co e′2) = x + y(1 − x)                  if x, y > 0
                      = (x + y)/(1 − min{|x|, |y|})   if −1 < x · y ≤ 0
                      = x + y(1 + x)                  if x, y < 0

Example
- R = {R1: flu → fever with CF(fever, flu) = 0.8,
       R2: common-cold → fever with CF(fever, common-cold) = 0.3}
- Evidence: CF(flu, e′) = 0.6 and CF(common-cold, e′) = 1
- For rule R1 (using fprop):
    CF(fever, e′1) = CF(fever, flu) · max{0, CF(flu, e′)} = 0.8 · 0.6 = 0.48
- For rule R2 this yields CF(fever, e′2) = 0.3
- Application of fco:
    CF(fever, e′1 co e′2) = CF(fever, e′1) + CF(fever, e′2)(1 − CF(fever, e′1))
                          = 0.48 + 0.3 · (1 − 0.48) = 0.636

However ...
- fever ∧ myalgia → flu^(CF=0.8) leaves many questions unanswered:
  - How likely is the occurrence of fever or myalgia given that the patient has flu?
  - How likely is the occurrence of fever or myalgia in the absence of flu?
  - How likely is the presence of flu when just fever is present?
  - How likely is the absence of flu when just fever is present?

Bayesian networks
- [Figure: a Bayesian network with nodes myalgia (my, yes/no), flu (fl, yes/no),
  pneumonia (pn, yes/no), fever (fe, yes/no) and temp (≤ 37.5 / > 37.5);
  flu is a parent of myalgia and fever, pneumonia is a parent of fever,
  and fever is a parent of temp]

Specification of probabilities
- Without independences:
    P(temp, fe, my, fl, pn) = P(temp | fe, my, fl, pn) × P(fe | my, fl, pn)
                              × P(my | fl, pn) × P(fl | pn) × P(pn)
  Number of probabilities: 31 (×2)
- With independences:
    P(temp, fe, my, fl, pn) = P(temp | fe) × P(fe | fl, pn) × P(my | fl) × P(fl) × P(pn)
  Number of probabilities: 10 (×2)
- In general:
    P(X1, ..., Xn) = ∏_{i=1}^{n} P(Xi | π(Xi))
  where π(Xi) denotes the parents of node Xi

Specification of probabilities (continued)
- P(fl = y) = 0.1,  P(pn = y) = 0.05
- P(my = y | fl = y) = 0.96,  P(my = y | fl = n) = 0.20
- P(fe = y | fl = y, pn = y) = 0.95,   P(fe = y | fl = y, pn = n) = 0.88
  P(fe = y | fl = n, pn = y) = 0.80,   P(fe = y | fl = n, pn = n) = 0.001
- P(temp ≤ 37.5 | fe = y) = 0.1,  P(temp ≤ 37.5 | fe = n) = 0.99

Inference: evidence propagation
- [Figure: marginal distributions of MYALGIA, FLU, FEVER, TEMP and PNEUMONIA when nothing is known]
- [Figure: updated distributions after entering the evidence temperature > 37.5 degrees Celsius]
- [Figure: which symptoms belong to flu? — distributions after entering evidence on FLU]

Probabilistic reasoning
- We are interested in conditional probabilities P(Xi | E) = P_E(Xi) for (possibly empty) evidence E
- Examples: P(FLU = yes | TEMP ≤ 37.5),  P(FLU = yes | MYALGIA = yes, TEMP ≤ 37.5)
- Mostly (marginal) probability distributions of single variables

Probabilistic reasoning: general
- Joint probability distribution P(V):
    P(V) = P(X1, X2, ..., Xn) = ∏_{X ∈ V} P(X | π(X))
- Marginalisation:
    P(W) = Σ_{V \ W} P(V)
- Conditional probabilities and Bayes' rule:
    P(Y | X, Z) = P(Y, X, Z)/P(X, Z) = P(X, Z | Y) P(Y)/P(X, Z)
- There are many efficient algorithms
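Because this network is so small, the posterior queries above can also be answered by brute-force enumeration of the joint distribution. The following Python sketch (the helper name joint and the 'y'/'n'/'low'/'high' encoding are mine) uses the CPTs from the slides and computes P(fl = y | temp > 37.5); the efficient algorithms mentioned above exist precisely to avoid this exponential enumeration.

```python
from itertools import product

# CPTs from the slides; 'y'/'n' for yes/no, temp is 'low' (<= 37.5) or 'high' (> 37.5).
P_FL_Y = 0.1
P_PN_Y = 0.05
P_MY_Y = {'y': 0.96, 'n': 0.20}                      # P(my = y | fl)
P_FE_Y = {('y', 'y'): 0.95, ('y', 'n'): 0.88,        # P(fe = y | fl, pn)
          ('n', 'y'): 0.80, ('n', 'n'): 0.001}
P_LOW  = {'y': 0.10, 'n': 0.99}                      # P(temp <= 37.5 | fe)

def joint(fl, pn, my, fe, temp):
    """P(temp, fe, my, fl, pn) via the factorisation P(temp|fe) P(fe|fl,pn) P(my|fl) P(fl) P(pn)."""
    p  = P_FL_Y if fl == 'y' else 1 - P_FL_Y
    p *= P_PN_Y if pn == 'y' else 1 - P_PN_Y
    p *= P_MY_Y[fl] if my == 'y' else 1 - P_MY_Y[fl]
    p *= P_FE_Y[(fl, pn)] if fe == 'y' else 1 - P_FE_Y[(fl, pn)]
    p *= P_LOW[fe] if temp == 'low' else 1 - P_LOW[fe]
    return p

# P(fl = y | temp > 37.5): marginalise out pn, my and fe, then normalise.
num = sum(joint('y', pn, my, fe, 'high') for pn, my, fe in product('yn', repeat=3))
den = sum(joint(fl, pn, my, fe, 'high') for fl, pn, my, fe in product('yn', repeat=4))
print(num / den)   # roughly 0.66 with these CPTs
```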
Have you got Mexican Flu?
- P(m, c, s) = 0.009215        P(¬m, c, s) = 9.9 · 10^-6
  P(m, ¬c, s) = 0.000485       P(¬m, ¬c, s) = 0.0098901
  P(m, c, ¬s) = 0.000285       P(¬m, c, ¬s) = 0.0009801
  P(m, ¬c, ¬s) = 1.5 · 10^-5   P(¬m, ¬c, ¬s) = 0.97912
- M: Mexican flu; C: chills; S: sore throat
- Probability of Mexican flu and sore throat? 0.0097
- Probability of Mexican flu given sore throat? 0.495

How did we compute these probabilities?
- Marginalisation:
    P(m, s) = P(m, c, s) + P(m, ¬c, s) = 0.009215 + 0.000485 = 0.0097
- Conditional probability (Bayes' rule):
    P(m | s) = P(m, s)/P(s) = 0.0097/0.0196 = 0.495

Naive probabilistic reasoning
- [Figure: a small network of binary (y/n) variables, with X1 and X2 parents of X3, and X3 parent of X4]
- P(x1) = 0.6,  P(x2) = 0.2
- P(x3 | x1, x2) = 0.3,  P(x3 | ¬x1, x2) = 0.5,  P(x3 | x1, ¬x2) = 0.7,  P(x3 | ¬x1, ¬x2) = 0.9
- P(x4 | x3) = 0.4,  P(x4 | ¬x3) = 0.1
- P_E(x2) = P(x2 | x4) = P(x4 | x2) P(x2)/P(x4)   (Bayes' rule)
    = [Σ_{X3} P(x4 | X3) Σ_{X1} P(X3 | X1, x2) P(X1) P(x2)]
      / [Σ_{X3} P(x4 | X3) Σ_{X1,X2} P(X3 | X1, X2) P(X1) P(X2)]
    ≈ 0.14

Judea Pearl's algorithm
- [Figure: a Bayesian network partitioned into subgraphs G1, ..., G4 around a central node V0,
  with π- and λ-messages passed between V0 and its neighbours V1, ..., V4]
- Object-oriented: nodes are objects that contain local information and carry out local computations
- Updating via message passing: arrows are communication channels

Probabilistic interpretation of the CF calculus?
- Rule-based uncertainty, e → h^x:
  - propagation from antecedent e to conclusion h (fprop)
  - combination of ∧ and ∨ evidence in e (f∧ and f∨)
  - co-concluding rules (fco): e1 → h^x and e2 → h^y
- Bayesian networks: joint probability distribution P(X1, ..., Xn) with
  marginalisation Σ_Y P(Y, Z) and conditioning P(Y | Z)

Propagation
- fprop (propagation):
    CF(h, e′) = CF(h, e) · max{0, CF(e, e′)}
  [Figure: e′ → e → h, labelled with CF(e, e′) and CF(h, e)]
- Corresponding Bayesian network (with P(e′) extra): E′ → E → H, with P(E | E′) and P(H | E)
- NB: P(h | e′) = P(h | e, e′) P(e | e′) + P(h | ¬e, e′) P(¬e | e′)
               = P(h | e) P(e | e′) + P(h | ¬e) P(¬e | e′)
  so the propagation rule corresponds to assuming P(h | ¬e) = 0 (assumption of the CF model)

Co-concluding
- fco (co-concluding):
  [Figure: e′1 → h and e′2 → h, labelled with CF(h, e′1) and CF(h, e′2)]
- Idea: see this as an uncertain deterministic interaction ⇒ causal independence model
  [Figure: E′1 → I1 and E′2 → I2, with I1 and I2 combined by a function f into H]

Causal independence
- [Figure: causes C1, C2, ..., Cn with intermediate variables I1, I2, ..., In (conditionally
  independent given the causes), combined by an interaction function f into the effect E]
- P(e | C1, ..., Cn) = Σ_{I1,...,In} P(e | I1, ..., In) ∏_{k=1}^{n} P(Ik | Ck)
                     = Σ_{f(I1,...,In)=e} ∏_{k=1}^{n} P(Ik | Ck)
- Boolean functions: P(E | I1, ..., In) ∈ {0, 1}, with f(I1, ..., In) = 1 if P(e | I1, ..., In) = 1

Example: noisy OR
- [Figure: C1, C2 → I1, I2 → OR → E]
- Interactions between the 'causes': logical OR
- Meaning: presence of any of the intermediate causes Ik produces effect e (i.e. E = true)
- P(e | C1, C2) = Σ_{I1∨I2=e} P(e | I1, I2) ∏_{k=1,2} P(Ik | Ck)
               = P(i1 | C1) P(i2 | C2) + P(¬i1 | C1) P(i2 | C2) + P(i1 | C1) P(¬i2 | C2)
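The causal-independence sum is easy to write down explicitly. Below is a small Python sketch (the function name p_effect and its arguments are my own): it computes P(e | C1, ..., Cn) for an arbitrary Boolean interaction function by summing over the intermediate variables, and checks the noisy-OR case against the closed form P(i1 | C1) + P(i2 | C2)(1 − P(i1 | C1)) used just below.

```python
from itertools import product

def p_effect(p_i_given_c, interaction):
    """Causal independence: P(e | C1,...,Cn) = sum over all I1..In with f(I) = true
    of the product of P(Ik | Ck).  p_i_given_c[k] is P(ik | Ck) for the given cause values."""
    total = 0.0
    for i_values in product([False, True], repeat=len(p_i_given_c)):
        if interaction(i_values):
            prob = 1.0
            for i_k, p_k in zip(i_values, p_i_given_c):
                prob *= p_k if i_k else 1.0 - p_k
            total += prob
    return total

# Noisy OR with P(i1 | c1) = 0.8 and P(i2 | c2) = 0.3 (the flu / common-cold numbers):
print(p_effect([0.8, 0.3], interaction=any))   # 0.86
print(0.8 + 0.3 * (1 - 0.8))                   # same value via the noisy-OR closed form
```

Swapping interaction=any for all (logical AND) or any other Boolean function gives the corresponding causal-independence model with the same two lines of evidence.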
Noisy OR and fco
- fco (for CF(h, e′1), CF(h, e′2) ∈ [0, 1]):
    CF(h, e′1 co e′2) = CF(h, e′1) + CF(h, e′2)(1 − CF(h, e′1))
- Causal independence with logical OR (noisy OR):
    P(e | C1, C2) = Σ_{I1∨I2=e} P(e | I1, I2) ∏_{k=1,2} P(Ik | Ck)
                  = P(i1 | C1) P(i2 | C2) + P(¬i1 | C1) P(i2 | C2) + P(i1 | C1) P(¬i2 | C2)
                  = P(i1 | C1) + P(i2 | C2)(1 − P(i1 | C1))
  i.e. the noisy OR has exactly the same functional form as fco

Example fco
- The consequences of 'flu' and 'common cold' on 'fever' are modelled by the variables I1 and I2:
  P(i1 | flu) = 0.8 and P(i2 | common-cold) = 0.3
- Furthermore, P(ik | w) = 0, k = 1, 2, if w ∈ {¬flu, ¬common-cold}
- Interaction between FLU and COMMON-COLD as a noisy OR:
    P(fever | I1, I2) = 0 if I1 = false and I2 = false
                      = 1 otherwise

Result
- Bayesian network (computed marginals):
    FLU:          false 0.400, true 0.600
    COMMON-COLD:  false 0.000, true 1.000
    I1:           false 0.520, true 0.480
    I2:           false 0.700, true 0.300
    FEVER:        false 0.364, true 0.636
- Fragment of the CF model:
    CF(fever, e′1 co e′2) = CF(fever, e′1) + CF(fever, e′2)(1 − CF(fever, e′1))
                          = 0.48 + 0.3 · (1 − 0.48) = 0.636

Conclusions
- Bayesian networks and other probabilistic graphical models (Markov networks, chain graphs)
  are the state of the art for reasoning with uncertainty
- Earlier rule-based uncertainty reasoning can be mapped (at least partially) to Bayesian networks
- The restrictions imposed by the early methods become apparent, and new ideas for modelling may emerge

Logic and graphical models: decomposition
- The structure of a joint probability distribution P can also be described by undirected graphs
  (instead of directed graphs as in Bayesian networks)
- [Figure: an undirected graph over X1, ..., X7]
- Together with P(V) = P(X1, X2, X3, X4, X5, X6, X7) this constitutes a Markov network
- Marginalisation (example):
    P(¬x2) = Σ_{X1,X3,X4,X5,X6,X7} P(X1, ¬x2, X3, X4, X5, X6, X7)

Markov networks
- Probability distribution of a Markov network:
    P(V) = (1/Z) ∏_k φk(X{k}) = (1/Z) exp(Σ_k wk fk(V))
  with X{k} ⊆ V, wk ∈ R a weight and fk(V) ∈ {0, 1} a feature
- The functions φk are called potentials
- When a Markov network is chordal (i.e. given three subsequent connected nodes a − b − c,
  the nodes a and c are always connected by a link):
    φk(X{k}) = P(Rk | Sk)
  with Rk called the residual and Sk the separator, i.e. each potential is a conditional
  probability distribution (and Z = 1)

Markov logic network (MLN)
- A Markov logic network (MLN) is a set of pairs
    L = {(Fk, wk) | k = 1, ..., n}
  where Fk is a formula in first-order logic and wk ∈ R
- Example (smoking causes cancer; if one friend smokes, the other smokes as well):
    0.8  ∀x (S(x) → C(x))
    0.3  ∀x ∀y (F(x, y) → (S(x) ↔ S(y)))
  where S: Smoking, C: Cancer, F: Friends

Semantics of an MLN
- Let C = {c1, ..., cn} be a set of constants; the corresponding Markov network M_{L,C}:
  - M_{L,C} contains a node with a corresponding binary variable for each ground atom in B_L
    (the Herbrand base); the value of the variable is 1 if the ground atom is true, and 0 otherwise
  - M_{L,C} contains a complete subgraph with feature fk for each instance (grounding) of formula Fk
- For the probability distribution it holds that:
    P(V) = (1/Z) ∏_k φk(X{k})^nk(V) = (1/Z) exp(Σ_k wk nk(V))
  where nk(V) is the number of true instances (groundings) of Fk in V
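As a small illustration of this semantics (a sketch with my own naming, not an Alchemy implementation), the following Python fragment enumerates the worlds of the single-formula MLN worked out in the example that follows: F ≡ ∀x (S(x) → C(x)) with weight w = 0.8 and constants {a, b}. For each world it counts the true groundings nF(V) and computes P(V) = exp(w · nF(V))/Z.

```python
from itertools import product
from math import exp

w = 0.8
constants = ['a', 'b']
atoms = [('S', 'a'), ('C', 'a'), ('S', 'b'), ('C', 'b')]   # ground atoms of the Herbrand base

def n_true_groundings(world):
    """Number of groundings of F = forall x (S(x) -> C(x)) that are true in the world."""
    return sum(1 for c in constants
               if (not world[('S', c)]) or world[('C', c)])

# Enumerate all 2^4 worlds over S(a), C(a), S(b), C(b) and normalise.
worlds = [dict(zip(atoms, values)) for values in product([False, True], repeat=len(atoms))]
Z = sum(exp(w * n_true_groundings(v)) for v in worlds)

# The world {S(a), C(a), S(b), C(b)} has 2 true groundings, so P = exp(2w)/Z.
world_all_true = {atom: True for atom in atoms}
print(exp(w * n_true_groundings(world_all_true)) / Z)
```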
Example
- Formula F ≡ ∀x (S(x) → C(x)) with weight w, where S stands for 'smoking' and C for 'cancer'
- Constants C = {a, b} (interpretations of x)
- Some combined interpretations of F (worlds), with the number of groundings of F they satisfy:
    {S(a), C(a), S(b), C(b)}      2 models
    {S(a), ¬C(a), S(b), ¬C(b)}    0 models
    {S(a), ¬C(a), S(b), C(b)}     1 model
    ...
- P(S(a), C(a), S(b), C(b)) = (1/Z) e^(w·2)
- [Figure: the corresponding Markov network over the ground atoms S(a), C(a), S(b), C(b)]

Alchemy system
Input:
    C = {A,B}
    Friends(C,C)
    Smoking(C)
    Cancer(C)
    0.8 Smoking(x) => Cancer(x)
    0.3 Friends(x,y) => (Smoking(x) <=> Smoking(y))
Output:
    Friends(A,A) 0.484002
    Friends(A,B) 0.476002
    Friends(B,A) 0.458004
    Friends(B,B) 0.519998
    Cancer(A) 0.569993
    Cancer(B) 0.566993

Alchemy system: sampling log (same input)
    Initializing MC-SAT with MaxWalksat on hard clauses...
    Running MC-SAT sampling...
    Sample (per pred) 100, time elapsed = 0 secs, num. clauses = 6
    Done burning. 100 samples.
    Sample (per pred) 200, time elapsed = 0.01 secs, num. clauses = 6
    ...
    Sample (per pred) 1000, time elapsed = 0.04 secs, num. clauses = 6
    Done MC-SAT sampling. 1000 samples.

Exact calculations
- P(Cancer(A)) ≡ P(C(A)):
    P(C(A)) = P(S(A), C(A)) + P(¬S(A), C(A))
    Z = 1 + 3 exp(0.8), as {S(A), ¬C(A)} is the only world inconsistent with the formula
    ⇒ P(C(A)) = 2 · (1/Z) · exp(0.8 · 1) ≈ 4.45/7.68 ≈ 0.58
- P(Friends(A, A)) = 2 · (1/Z) · exp(0.3 · 1), with Z = 4 exp(0.3 · 1), since all four
  worlds over the pair (A, A) are models
    ⇒ P(Friends(A, A)) = 0.5

Summary
- The early rule-based (logical) approach to reasoning with uncertainty was attractive
- However, these methods had severe limitations (which ones?)
- Therefore: exploitation of probability theory
- However, probability theory also has its limitations (which ones?)
- "Best of all worlds" (Leibniz): probabilistic logics (Markov logic, Bayesian logic, etc.)
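To connect the exact calculations above with the approximate Alchemy output, here is a brute-force Python sketch (my own code, not Alchemy, and the atom encoding is an assumption for illustration) that grounds the two-formula MLN over the constants {A, B}, enumerates all 2^8 worlds, and computes exact marginals. With these weights it should give P(Cancer(A)) ≈ 0.58 and P(Friends(A,A)) = 0.5, in line with the hand calculation and close to the MC-SAT estimates shown earlier.

```python
from itertools import product
from math import exp

CONSTANTS = ['A', 'B']
ATOMS = ([('Smoking', (c,)) for c in CONSTANTS] +
         [('Cancer', (c,)) for c in CONSTANTS] +
         [('Friends', (x, y)) for x in CONSTANTS for y in CONSTANTS])

def world_weight(world):
    """exp(sum_k w_k * n_k(world)) for the two formulas of the example MLN."""
    n_cancer = sum(1 for x in CONSTANTS
                   if (not world[('Smoking', (x,))]) or world[('Cancer', (x,))])
    n_friends = sum(1 for x in CONSTANTS for y in CONSTANTS
                    if (not world[('Friends', (x, y))])
                    or (world[('Smoking', (x,))] == world[('Smoking', (y,))]))
    return exp(0.8 * n_cancer + 0.3 * n_friends)

worlds = [dict(zip(ATOMS, values))
          for values in product([False, True], repeat=len(ATOMS))]
Z = sum(world_weight(v) for v in worlds)

def marginal(atom):
    """Exact P(atom = true): sum of the weights of all worlds in which it holds, normalised."""
    return sum(world_weight(v) for v in worlds if v[atom]) / Z

print(marginal(('Cancer', ('A',))))        # roughly 0.58
print(marginal(('Friends', ('A', 'A'))))   # exactly 0.5
```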
© Copyright 2024