Hidden Markov Models
• Hidden Markov Models (HMMs) – probabilistic models for learning patterns in sequences (e.g. DNA, speech, weather, cards...) (2nd-order model)
• an observable Markov model
  – directly get the sequence of states
  – p(s1, s2, ..., sn) = p(s1) · Πi=2..n p(si | si-1)
  – (why I don’t like the Urn example in the book)
• Hidden Markov model
  – only observe the sequence of symbols generated by the states
  – for each state, there is a probability distribution over a finite set of symbols (emission probabilities)
  – example: think of a soda machine
    • states: coins inserted so far add up to N cents...
    • observations: message display (“insert 20 cents more”), output can, give change
    • state transitions are determined by the coins input
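In an observable Markov model the chain rule above is all we need: the probability of a fully observed state sequence is the initial-state probability times the product of transition probabilities. A minimal sketch, using a made-up two-state weather chain (states and numbers are illustrative, not from the slides):

```python
# Observable Markov model: p(s1, ..., sn) = p(s1) * prod_{i=2..n} p(si | s_{i-1}).
# The weather chain below is a hypothetical example for illustration.

p_initial = {"sunny": 0.6, "rainy": 0.4}
p_trans = {  # p_trans[prev][curr] = p(curr | prev)
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.3, "rainy": 0.7},
}

def sequence_probability(seq):
    """p(s1) times the product of transition probabilities along seq."""
    p = p_initial[seq[0]]
    for prev, curr in zip(seq, seq[1:]):
        p *= p_trans[prev][curr]
    return p

print(sequence_probability(["sunny", "sunny", "rainy"]))  # 0.6 * 0.8 * 0.2 ≈ 0.096
```

In a *hidden* Markov model this direct product is unavailable, because the states themselves are never observed – which is what motivates the tasks below.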
• tasks
  1) given a sequence, compute the probability it came from one of a set of models (e.g. most likely phoneme) – classification: evaluate p(O | λ)
  2) infer the most likely sequence of states underlying a sequence of symbols – find Q* such that Q* = argmaxQ p(Q | O, λ)
  3) train the HMM by learning the parameters (transition and emission probabilities) from a set of examples – given sequences X, find λ* such that λ* = argmaxλ p(X | λ)
• given an observation sequence O = o1...oT
  – if we also knew the state sequence Q = q1...qT, then we could easily calculate p(O | Q, λ)
  – joint probability: p(O, Q | λ) = p(q1) · Πi=2..T p(qi | qi-1) · Πi=1..T p(oi | qi)
  – could calculate by marginalization: p(O | λ) = ΣQ p(O, Q | λ)
    • intractable as written – have to sum over all N^T possible sequences Q
  – the forward-backward algorithm is a recursive procedure that solves this efficiently (via dynamic programming)
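The brute-force marginalization can be written directly for a tiny HMM – it enumerates all N^T state sequences, which is exactly the sum the forward-backward algorithm computes efficiently. The two-state HMM below is a hypothetical example (matrix conventions: A[i][j] = p(state j | state i), B[i][o] = p(symbol o | state i)):

```python
# Brute-force p(O | λ) = Σ_Q p(O, Q | λ) by enumerating all N^T state
# sequences of a small, illustrative HMM.  Feasible only for tiny T.
from itertools import product

pi = [0.5, 0.5]                   # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]      # A[i][j] = p(q_t = j | q_{t-1} = i)
B = [[0.9, 0.1], [0.2, 0.8]]      # B[i][o] = p(symbol o | state i)

def brute_force_likelihood(O):
    total = 0.0
    for Q in product(range(len(pi)), repeat=len(O)):
        # p(O, Q | λ) = p(q1) b_{q1}(o1) * prod_t a_{q_{t-1} q_t} b_{q_t}(o_t)
        p = pi[Q[0]] * B[Q[0]][O[0]]
        for t in range(1, len(O)):
            p *= A[Q[t-1]][Q[t]] * B[Q[t]][O[t]]
        total += p
    return total

print(brute_force_likelihood([0, 1, 0]))
```

Each added observation multiplies the number of terms by N, which is why this only serves as a ground-truth check against the dynamic-programming solution.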
• Forward variable: αt(i) is the probability of observing the prefix o1...ot and ending in state Si
  – αt(i) ≡ p(o1...ot, qt = Si | λ)
  – initialization: α1(i) = πi · bi(o1)
  – recursion: αt+1(j) = [Σi αt(i) · aij] · bj(ot+1)
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)
• Backward variable: βt(i) is the probability of being in state Si at time t and observing the suffix ot+1...oT
  – βt(i) ≡ p(ot+1...oT | qt = Si, λ)
  – initialization: βT(i) = 1
  – recursion: βt(i) = Σj aij · bj(ot+1) · βt+1(j)
Forward-backward algorithm, O(N²T)
• forward pass: for each time step t = 1...T, calculate αt(j) by summing over all predecessor states k
• reverse pass: for each time step t = T...1, calculate βt(j) by summing over all successor states k
function ForwardBackward(O, S, π, A, B): returns p(O | π, A, B)
  for each state si do
    α1(i) ← πi · Bi(O1)
  end for
  for t ← 2, 3, ..., T do
    for each state sj do
      αt(j) ← Σk (αt-1(k) · Akj · Bj(Ot))
    end for
  end for
  for each state si do
    βT(i) ← 1
  end for
  for t ← T-1, ..., 1 do
    for each state sj do
      βt(j) ← Σk (Ajk · Bk(Ot+1) · βt+1(k))
    end for
  end for
  return Σi αT(i)   // β is not needed for the output, but is often computed for other purposes
end function
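The pseudocode above transcribes almost line for line into Python. A sketch, reusing the same illustrative two-state HMM and matrix conventions (A[i][j] = p(state j | state i), B[i][o] = p(symbol o | state i)):

```python
# Forward-backward in O(N^2 T): alpha via the forward pass, beta via the
# reverse pass; p(O|λ) is the sum of the final alphas.

def forward_backward(O, pi, A, B):
    """Return (p(O|λ), alpha, beta) for observation sequence O."""
    N, T = len(pi), len(O)
    # forward pass: alpha[t][i] = p(o_1..o_t, q_t = S_i | λ)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][k] * A[k][j] for k in range(N)) * B[j][O[t]]
    # reverse pass: beta[t][i] = p(o_{t+1}..o_T | q_t = S_i, λ)
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):
        beta[T-1][i] = 1.0
    for t in range(T-2, -1, -1):
        for j in range(N):
            beta[t][j] = sum(A[j][k] * B[k][O[t+1]] * beta[t+1][k] for k in range(N))
    return sum(alpha[T-1]), alpha, beta

pi = [0.5, 0.5]                   # illustrative HMM, as before
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
likelihood, alpha, beta = forward_backward([0, 1, 0], pi, A, B)
print(likelihood)
```

A useful sanity check: Σi αt(i)·βt(i) equals p(O | λ) at every time step t, which is also why β, while unused in the return value, is kept around for the state-posterior computations on the later slides.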
Finding the State Sequence
• choose the state that has the highest probability, for each time step: qt* = argmaxi γt(i)
• No! – the individually most likely states need not form a feasible sequence (it may require a zero-probability transition)
Viterbi’s Algorithm
δt(i) ≡ maxq1q2...qt-1 p(q1 q2 ... qt-1, qt = Si, O1...Ot | λ)
• Initialization: δ1(i) = πi · bi(O1), ψ1(i) = 0
• Recursion: δt(j) = maxi [δt-1(i) · aij] · bj(Ot), ψt(j) = argmaxi δt-1(i) · aij
  – note: I think the book has the wrong formula for ψt(j)
• Termination: p* = maxi δT(i), qT* = argmaxi δT(i)
• Path backtracking: qt* = ψt+1(qt+1*), t = T-1, T-2, ..., 1
function Viterbi(O, S, π, A, B): returns state sequence q1*...qT*
  for each state si do
    δ1(i) ← πi · Bi(O1)
    ψ1(i) ← 0
  end for
  for t ← 2, 3, ..., T do
    for each state sj do
      δt(j) ← maxk (δt-1(k) · Akj · Bj(Ot))
      ψt(j) ← argmaxk (δt-1(k) · Akj)    // Bj(Ot) does not depend on k, so it drops out of the argmax
    end for
  end for
  // traceback: extract sequence of states
  p* ← maxi δT(i)
  qT* ← argmaxi δT(i)
  for t ← T-1, T-2, ..., 1 do
    qt* ← ψt+1(qt+1*)
  end for
  return q1*...qT*
end function
Learning
• learn the model parameters (transition probabilities aij and emission probabilities bj(m)) with highest likelihood for a given set of training examples
• define ξt(i, j) as the probability of being in Si at time t and Sj at time t+1, given the sequence of observations O:
  ξt(i, j) ≡ p(qt = Si, qt+1 = Sj | O, λ)
• define latent variables ztj and ztij as indicators of which state (and which state transition) a sequence passes through at each time step
Baum-Welch (EM)
• E-step:
  – recall γt(i) = αt(i)·βt(i) / Σj αt(j)·βt(j) – probability of being in state i at time t
  – ξt(i, j) = αt(i) · aij · bj(Ot+1) · βt+1(j) / p(O | λ) – expectation of transition i→j at time t
• M-step:
  – âij = Σt=1..T-1 ξt(i, j) / Σt=1..T-1 γt(i)
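One E-step plus the transition re-estimation can be sketched on a single sequence, reusing forward-backward from the earlier slide (the HMM numbers remain illustrative; a real implementation would iterate this to convergence and also re-estimate π and B):

```python
# One Baum-Welch iteration (E-step gamma/xi, M-step for A) on one sequence.
# gamma[t][i] = alpha[t][i]*beta[t][i] / p(O|λ)
# xi[t][i][j] = alpha[t][i]*A[i][j]*B[j][O[t+1]]*beta[t+1][j] / p(O|λ)
# A_hat[i][j] = Σ_t xi[t][i][j] / Σ_t gamma[t][i]

def forward_backward(O, pi, A, B):
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][O[0]]
        beta[T-1][i] = 1.0
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][k] * A[k][j] for k in range(N)) * B[j][O[t]]
    for t in range(T-2, -1, -1):
        for j in range(N):
            beta[t][j] = sum(A[j][k] * B[k][O[t+1]] * beta[t+1][k] for k in range(N))
    return alpha, beta

def baum_welch_step(O, pi, A, B):
    """E-step (gamma, xi) and re-estimated transition matrix A_hat."""
    N, T = len(pi), len(O)
    alpha, beta = forward_backward(O, pi, A, B)
    p_O = sum(alpha[T-1])
    gamma = [[alpha[t][i] * beta[t][i] / p_O for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / p_O
            for j in range(N)] for i in range(N)] for t in range(T-1)]
    A_hat = [[sum(xi[t][i][j] for t in range(T-1)) /
              sum(gamma[t][i] for t in range(T-1))
              for j in range(N)] for i in range(N)]
    return gamma, xi, A_hat

pi = [0.5, 0.5]                   # illustrative HMM, as before
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
gamma, xi, A_hat = baum_welch_step([0, 1, 0], pi, A, B)
print(A_hat)
```

Two invariants make good unit tests here: each γt(·) sums to 1 (it is a posterior over states), and Σj ξt(i, j) = γt(i), which in turn guarantees that each row of the re-estimated A_hat sums to 1.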