Machine Learning: Probabilistic

13.0 Stochastic and Dynamic Models of Learning
13.1 Hidden Markov Models (HMMs)
13.2 Dynamic Bayesian Networks and Learning
13.3 Stochastic Extensions to Reinforcement Learning
13.4 Epilogue and References
13.5 Exercises

George F. Luger
ARTIFICIAL INTELLIGENCE 6th edition
Structures and Strategies for Complex Problem Solving

Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
DEFINITION

HIDDEN MARKOV MODEL

A graphical model is called a hidden Markov model (HMM) if it is a
Markov model whose states are not directly observable but are hidden
by a further stochastic system interpreting their output. More formally,
given a set of states S = {s1, s2, ..., sn} and a set of state transition
probabilities A = {a11, a12, ..., a1n, a21, a22, ..., ann}, there is a set of
observation likelihoods, O = {pi(ot)}, each expressing the probability of
an observation ot (at time t) being generated by a state st.
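The definition can be made concrete with a tiny generative sketch of the two-coin HMM that follows: a hidden state picks which biased coin is flipped, and only the flips are visible. The transition and bias values below are illustrative choices, not taken from the text.

```python
import random

# Two-state coin-flipping HMM sketch; a11, a22, b1, b2 are invented values.
a11, a22 = 0.7, 0.6          # P(stay in S1), P(stay in S2)
b = {1: 0.9, 2: 0.4}         # b_i = P(Heads | state S_i)

def sample_hmm(T, seed=0):
    """Generate T coin flips; the state sequence itself stays hidden."""
    rng = random.Random(seed)
    state, observations = 1, []
    for _ in range(T):
        observations.append('H' if rng.random() < b[state] else 'T')
        stay = a11 if state == 1 else a22
        if rng.random() >= stay:            # leave the current state
            state = 2 if state == 1 else 1
    return observations

print(sample_hmm(10))
```

An observer sees only the H/T sequence; inferring which coin produced each flip is exactly the hidden-state estimation problem the chapter develops.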
[Figure: two-state transition diagram. States S1 and S2 have self-transitions a11 and a22, cross-transitions a12 = 1 - a11 and a21 = 1 - a22, and emission probabilities p(H) = b1, p(T) = 1 - b1 for S1 and p(H) = b2, p(T) = 1 - b2 for S2.]

Figure 13.1 A state transition diagram of a hidden Markov model of two
states designed for the coin-flipping problem. The aij are
determined by the elements of the 2 x 2 transition matrix, A.
[Figure: three-state transition diagram. Each state Si has a self-transition aii, transitions aij to the other two states, and emission probabilities p(H) = bi, p(T) = 1 - bi.]

Figure 13.2 The state transition diagram for a three-state hidden Markov
model of coin flipping. Each coin/state, Si, has its bias, bi.
Figure 13.3 a. The hidden, S, and observable, O, states of the AR-HMM, where
p(Ot | St, Ot-1). b. The values of the hidden St of the example AR-HMM: safe,
unsafe, and faulty.
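The defining feature of the AR-HMM is that each observation depends both on the current hidden state and on the previous observation, p(Ot | St, Ot-1). A minimal sketch of such an observation model, using the safe/unsafe/faulty states of Figure 13.3 (the weight rho, the per-state drift values, and the noise scale are all invented for illustration):

```python
import random

# Sketch of the autoregressive observation model p(Ot | St, Ot-1).
# rho, drift, and the noise scale are illustrative, not from the text.
drift = {"safe": 0.0, "unsafe": 0.5, "faulty": 2.0}  # state-dependent offset
rho = 0.8                                            # weight on O_{t-1}

def sample_observation(state, prev_obs, rng):
    # Ot depends on BOTH the hidden state St (via drift) and Ot-1 (via rho)
    return rho * prev_obs + drift[state] + rng.gauss(0.0, 0.1)

rng = random.Random(0)
print(sample_observation("faulty", 1.0, rng))  # near 0.8 + 2.0 = 2.8
```

A faulty state thus shifts the whole observed signal, but the autoregressive term keeps successive observations correlated, which is what makes the model suitable for the time-series data of the following figures.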
Figure 13.4 A selection of the real-time data across multiple time
periods, with one time slice expanded, for the AR-HMM.
Figure 13.5 The time-series data of Figure 13.4 processed by a fast
Fourier transform into the frequency domain. This was
the data submitted to the AR-HMM for each time period.
Figure 13.6 An auto-regressive factorial HMM, where the observable state Ot
at time t is dependent on multiple (St) subprocesses, Sit, and Ot-1.
[Figure: a probabilistic finite state machine from a Start (#) node to an End (#) node over the phone sequences of "neat" (.00013), "need" (.00056), "new" (.001), and "knee" (.000024), with branch probabilities including .48/.52, .89/.11, and .36/.64.]

Figure 13.7 A PFSM representing a set of phonetically related English
words. The probability of each word occurring is below that
word. Adapted from Jurafsky and Martin (2008).
Start = 1.0

word (prob)      paths    #    n                        iy                       #
neat (.00013)    2 paths  1.0  1.0 x .00013 = .00013    .00013 x 1.0 = .00013    .00013 x .52 = .000067
need (.00056)    2 paths  1.0  1.0 x .00056 = .00056    .00056 x 1.0 = .00056    .00056 x .11 = .000062
new (.001)       2 paths  1.0  1.0 x .001 = .001        .001 x .36 = .00036      .00036 x 1.0 = .00036
knee (.000024)   1 path   1.0  1.0 x .000024 = .000024  .000024 x 1.0 = .000024  .000024 x 1.0 = .000024

Total best: .00036

Figure 13.8 A trace of the Viterbi algorithm on several of the paths of Figure 13.7. Rows
report the maximum value for Viterbi on each word for each input value (top row).
Adapted from Jurafsky and Martin (2008).
function Viterbi(Observations of length T, Probabilistic FSM)
begin
   N := number of states in FSM
   create probability matrix viterbi[R = N + 2, C = T + 2];
   viterbi[0, 0] := 1.0;
   for each time step (observation) t from 0 to T do
      for each state si from i = 0 to N do
         for each transition from si to sj in the Probabilistic FSM do
         begin
            new-count := viterbi[si, t] x path[si, sj] x p(sj | si);
            if ((viterbi[sj, t + 1] = 0) or (new-count > viterbi[sj, t + 1]))
            then
            begin
               viterbi[sj, t + 1] := new-count;
               append back-pointer [sj, t + 1] to back-pointer list
            end
         end;
   return viterbi[R, C] and the back-pointer list
end.
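The pseudocode above can be rendered as runnable Python. The dictionary-based encoding of the probabilistic FSM and the variable names below are assumptions of this sketch, not Luger's; the example at the end reuses the two-coin HMM parameters, with invented values.

```python
def viterbi(observations, states, start_p, trans_p, obs_p):
    """Return the most probable hidden state path and its probability."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * obs_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for sj in states:
            # maximize over predecessor states, keeping a back-pointer
            prev, p = max(((si, V[t-1][si] * trans_p[si][sj]) for si in states),
                          key=lambda x: x[1])
            V[t][sj] = p * obs_p[sj][observations[t]]
            back[t][sj] = prev
    # follow back-pointers from the best final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, V[-1][last]

# Example: two biased coins (illustrative numbers, not from the text).
states = ['S1', 'S2']
start = {'S1': 0.5, 'S2': 0.5}
trans = {'S1': {'S1': 0.7, 'S2': 0.3}, 'S2': {'S1': 0.4, 'S2': 0.6}}
obs_p = {'S1': {'H': 0.9, 'T': 0.1}, 'S2': {'H': 0.4, 'T': 0.6}}

path, p = viterbi(['H', 'H', 'T'], states, start, trans, obs_p)
print(path, p)  # ['S1', 'S1', 'S2'] 0.05103
```

As in Figure 13.8, each cell keeps only the maximum-probability path into it, so the cost is linear in the observation length rather than exponential in the number of paths.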
Figure 13.9 A DBN example of two time slices. The set Q of random
variables are hidden, the set O observed; t indicates time.
Figure 13.10 A Bayesian belief net for the burglar alarm, earthquake,
burglary example.
Figure 13.11 A Markov random field reflecting the potential functions of the
random variables in the BBN of Figure 13.10, together with the
two observations about the system.
Figure 13.12 A learnable node L is added to the Markov random field of Figure
13.11. The Markov random field iterates across three time periods.
For simplicity, the EM iteration is only indicated at time 1.
DEFINITION

A MARKOV DECISION PROCESS, or MDP

A Markov Decision Process is a tuple <S, A, P, R> where:

S is a set of states, and
A is a set of actions.

pa(st, st+1) = p(st+1 | st, at = a) is the probability that if the agent executes
action a ∈ A from state st at time t, it results in state st+1 at time t+1. Since
the probability pa ∈ P is defined over the entire state-space of actions, it is
often represented with a transition matrix.

R(s) is the reward received by the agent when in state s.
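Given an MDP in this form, an optimal state value function can be computed by value iteration, repeatedly applying V(s) := R(s) + gamma * max over a of the expected value of the successor state. The sketch below assumes P and R are plain dictionaries; the discount gamma and the convergence threshold are illustrative choices not fixed by the definition.

```python
# Value-iteration sketch for an MDP <S, A, P, R> given as dictionaries.
# P[(s, a)] maps successor states to probabilities; R[s] is the state reward.

def value_iteration(S, A, P, R, gamma=0.9, eps=1e-6):
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            # best expected successor value over the actions available in s
            best = max(sum(p * V[s2] for s2, p in P[(s, a)].items())
                       for a in A if (s, a) in P)
            new = R[s] + gamma * best
            delta = max(delta, abs(new - V[s]))
            V[s] = new
        if delta < eps:          # stop once values have stabilized
            return V

# Toy example (invented): from 'a' the agent can stay or go to 'b'.
S, A = ['a', 'b'], ['go', 'stay']
P = {('a', 'go'): {'b': 1.0}, ('a', 'stay'): {'a': 1.0},
     ('b', 'stay'): {'b': 1.0}}
R = {'a': 0.0, 'b': 1.0}
V = value_iteration(S, A, P, R)
print(V)  # V['b'] near 1/(1-0.9) = 10, V['a'] near 0.9 * 10 = 9
```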
DEFINITION
A PARTIALLY OBSERVABLE MARKOV DECISION PROCESS, or POMDP
A Partially Observable Markov Decision Process is a tuple <S, A, O, P, R> where:
S is a set of states, and
A is a set of actions.
O is the set of observations denoting what the agent can see about its world.
Since the agent cannot directly observe its current state, the observations are
probabilistically related to the underlying actual state of the world.
pa(st, o, st+1) = p(st+1, ot = o | st, at = a) is the probability that when the agent
executes action a from state st at time t, it results in an observation o that
leads to an underlying state st+1 at time t+1.
R(st, a, st+1) is the reward received by the agent when it executes action a in
state st and transitions to state st+1.
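Because the POMDP agent cannot observe its state directly, it maintains a belief b(s), a probability distribution over states, and updates it after each action and observation: b'(s') is proportional to p(o | s') * sum over s of p(s' | s, a) * b(s). A sketch of that update, where the dictionary encodings of the transition and observation models are assumptions of this example:

```python
# POMDP belief-state update sketch.
# trans_p[(s, a)] maps successor states to probabilities p(s' | s, a);
# obs_p[s2][o] is the observation likelihood p(o | s2).

def belief_update(belief, action, obs, trans_p, obs_p):
    new_b = {}
    for s2 in belief:
        # predict: propagate the belief through the transition model,
        # then correct: weight by the likelihood of the observation
        new_b[s2] = obs_p[s2][obs] * sum(
            trans_p[(s, action)].get(s2, 0.0) * b for s, b in belief.items())
    z = sum(new_b.values())                 # normalize to a distribution
    return {s: v / z for s, v in new_b.items()}

# Toy example (invented): two states, one action, observation 'o' is much
# likelier in state 'x', so the belief shifts sharply toward 'x'.
trans_p = {('x', 'a'): {'x': 1.0}, ('y', 'a'): {'y': 1.0}}
obs_p = {'x': {'o': 0.9}, 'y': {'o': 0.1}}
b = belief_update({'x': 0.5, 'y': 0.5}, 'a', 'o', trans_p, obs_p)
print(b)  # {'x': 0.9, 'y': 0.1}
```

The belief state is itself a sufficient statistic for the history of actions and observations, which is why a POMDP can be treated as an MDP over belief states.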
st     st+1   at         pa(st, st+1)   Ra(st, st+1)
high   high   search     a              R_search
high   low    search     1 - a          R_search
low    high   search     1 - b          -3
low    low    search     b              R_search
high   high   wait       1              R_wait
high   low    wait       0              R_wait
low    high   wait       0              R_wait
low    low    wait       1              R_wait
low    high   recharge   1              0
low    low    recharge   0              0

Table 13.1 Transition probabilities and expected rewards for the finite MDP of
the recycling robot example. The table contains all possible combinations of
the current state, st, the next state, st+1, and the actions and rewards
possible from the current state, st.
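Table 13.1 can be encoded directly as data. The table leaves a, b, R_search, and R_wait symbolic, so the numeric values below are placeholders chosen only to make the sketch runnable:

```python
from collections import defaultdict

# Placeholders for the symbolic quantities in Table 13.1.
alpha, beta = 0.8, 0.6
R_search, R_wait = 5.0, 1.0

# (s_t, a_t, s_t+1) -> (transition probability, reward)
recycling_mdp = {
    ('high', 'search',   'high'): (alpha,     R_search),
    ('high', 'search',   'low'):  (1 - alpha, R_search),
    ('low',  'search',   'high'): (1 - beta,  -3.0),   # robot must be rescued
    ('low',  'search',   'low'):  (beta,      R_search),
    ('high', 'wait',     'high'): (1.0,       R_wait),
    ('high', 'wait',     'low'):  (0.0,       R_wait),
    ('low',  'wait',     'high'): (0.0,       R_wait),
    ('low',  'wait',     'low'):  (1.0,       R_wait),
    ('low',  'recharge', 'high'): (1.0,       0.0),
    ('low',  'recharge', 'low'):  (0.0,       0.0),
}

# Sanity check: outgoing probabilities from each (state, action) sum to 1.
totals = defaultdict(float)
for (s, a, s2), (p, r) in recycling_mdp.items():
    totals[(s, a)] += p
assert all(abs(t - 1.0) < 1e-9 for t in totals.values())

def expected_reward(s, a):
    """Expected immediate reward of taking action a in state s."""
    return sum(p * r for (s0, a0, s2), (p, r) in recycling_mdp.items()
               if s0 == s and a0 == a)

print(expected_reward('low', 'search'))  # (1-beta)*(-3) + beta*R_search
```

Encoding the table this way makes the probability-sum check mechanical and feeds directly into value-iteration or policy-evaluation code.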
Figure 13.13 The transition graph for the recycling robot. The state nodes are
the large circles and the action nodes are the small black states.