CHAPTER 9 Hidden Markov Models (cont.) Markov Decision Processes

Markov Models

Conditional Independence

Weather Example

Mini-Forward Algorithm

Example

Stationary Distributions
• If we simulate the chain long enough, what happens? Uncertainty accumulates; eventually, we have no idea what the state is!
• Stationary distributions: for most chains, the distribution we end up in is independent of the initial distribution. It is called the stationary distribution of the chain.
• Usually, we can only predict a short time out. (A simulation sketch follows below.)
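
A minimal sketch of reaching the stationary distribution by simulation, assuming a two-state sun/rain weather chain; the transition probabilities and variable names here are made up for illustration, not taken from the slides.

```python
import numpy as np

# Hypothetical transition matrix for a two-state weather chain.
# Rows = current state, columns = next state; states: 0 = sun, 1 = rain.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

def push_forward(initial, T, steps=100):
    """Repeatedly apply the mini-forward update p_{t+1} = p_t T."""
    p = np.array(initial, dtype=float)
    for _ in range(steps):
        p = p @ T
    return p

# Two very different starting beliefs end up at the same distribution,
# illustrating independence from the initial distribution.
print(push_forward([1.0, 0.0], T))   # start: certainly sun
print(push_forward([0.0, 1.0], T))   # start: certainly rain
```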

Example: Web Link Analysis

Mini-Viterbi Algorithm

Hidden Markov Models

HMM Applications

Filtering: Forward Algorithm

Filtering Example

MLE: Viterbi Algorithm

Viterbi Properties

Markov Decision Processes

MDP Solutions

Example Optimal Policies

Stationarity

How (Not) to Solve an MDP
• The inefficient way:
  • Enumerate policies
  • Calculate the expected utility (discounted rewards) starting from the start state, e.g. by simulating a bunch of runs
  • Choose the best policy
• We’ll return to a (better) idea like this later. (A rough sketch of the inefficient approach follows below.)
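
A rough sketch of that inefficient approach, assuming a hypothetical MDP object with states(), actions(s), and a sample_transition(s, a) simulator returning (next_state, reward); none of these names come from the slides, and the discount, run, and horizon values are placeholders.

```python
import itertools

def evaluate_policy(mdp, policy, start, gamma=0.9, runs=100, horizon=50):
    """Monte Carlo estimate of the expected discounted reward of a fixed policy."""
    total = 0.0
    for _ in range(runs):
        s, ret, discount = start, 0.0, 1.0
        for _ in range(horizon):
            s, r = mdp.sample_transition(s, policy[s])  # simulate one step
            ret += discount * r
            discount *= gamma
        total += ret
    return total / runs

def enumerate_policies(mdp, start):
    """Try every deterministic policy: exponentially many in the number of states."""
    states = list(mdp.states())
    best_policy, best_value = None, float("-inf")
    for choice in itertools.product(*(mdp.actions(s) for s in states)):
        policy = dict(zip(states, choice))
        value = evaluate_policy(mdp, policy, start)
        if value > best_value:
            best_policy, best_value = policy, value
    return best_policy, best_value
```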

Utilities of States

Infinite Utilities?

The Bellman Equation

Example: Bellman Equations

Value Iteration

Policy Iteration
• Alternate approach:
  • Policy evaluation: calculate utilities for a fixed policy
  • Policy improvement: update the policy based on the resulting utilities
  • Repeat until convergence
• This is policy iteration. It can converge faster under some conditions. (A sketch follows below.)
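
A sketch of policy iteration under assumed data structures: T[s][a] is a list of (probability, next_state) pairs, R[s] is the reward of state s, and actions[s] lists the actions available in s; these structures and the discount value are illustrative assumptions, not the slides' notation.

```python
def policy_iteration(states, actions, T, R, gamma=0.9, eval_sweeps=50):
    """Alternate policy evaluation and policy improvement until the policy is stable."""
    policy = {s: actions[s][0] for s in states}          # arbitrary initial policy
    while True:
        # Policy evaluation: iteratively estimate utilities for the fixed policy.
        U = {s: 0.0 for s in states}
        for _ in range(eval_sweeps):
            U = {s: R[s] + gamma * sum(p * U[s2] for p, s2 in T[s][policy[s]])
                 for s in states}
        # Policy improvement: act greedily with respect to the resulting utilities.
        new_policy = {
            s: max(actions[s],
                   key=lambda a: sum(p * U[s2] for p, s2 in T[s][a]))
            for s in states
        }
        if new_policy == policy:                          # policy stable: converged
            return policy, U
        policy = new_policy
```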

Comparison
• In value iteration:
  • Every pass (or “backup”) updates both the policy (based on current utilities) and the utilities (based on the current policy); a backup of this form is sketched below
• In policy iteration:
  • Several passes to update utilities
  • Occasional passes to update policies
• Hybrid approaches (asynchronous policy iteration):
  • Any sequence of partial updates to either policy entries or utilities will converge if every state is visited infinitely often
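
For comparison, a value iteration sketch using the same assumed T, R, and actions structures as in the policy iteration sketch above; each sweep applies one Bellman backup per state, and the greedy policy is implicit in the max over actions.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-6):
    """Repeated Bellman backups: U(s) <- R(s) + gamma * max_a sum_s' T(s,a,s') U(s')."""
    U = {s: 0.0 for s in states}
    while True:
        new_U = {
            s: R[s] + gamma * max(sum(p * U[s2] for p, s2 in T[s][a])
                                  for a in actions[s])
            for s in states
        }
        if max(abs(new_U[s] - U[s]) for s in states) < tol:   # converged
            return new_U
        U = new_U
```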