CHAPTER 9 Hidden Markov Models (cont.) Markov Decision Processes

Markov Models

Conditional Independence

Weather Example

Mini-Forward Algorithm

Example

Stationary Distributions
• If we simulate the chain long enough, what happens? Uncertainty accumulates; eventually, we have no idea what the state is!
• Stationary distributions: for most chains, the distribution we end up in is independent of the initial distribution. It is called the stationary distribution of the chain.
• Usually, we can only predict a short time out. (A simulation sketch follows below.)
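
A minimal sketch of reaching the stationary distribution by simulation, assuming a two-state sun/rain weather chain; the transition probabilities and variable names here are made up for illustration, not taken from the slides.

```python
import numpy as np

# Hypothetical transition matrix for a two-state weather chain.
# Rows = current state, columns = next state; states: 0 = sun, 1 = rain.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

def push_forward(initial, T, steps=100):
    """Repeatedly apply the mini-forward update p_{t+1} = p_t T."""
    p = np.array(initial, dtype=float)
    for _ in range(steps):
        p = p @ T
    return p

# Two very different starting beliefs end up at the same distribution,
# illustrating independence from the initial distribution.
print(push_forward([1.0, 0.0], T))   # start: certainly sun
print(push_forward([0.0, 1.0], T))   # start: certainly rain
```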

Example: Web Link Analysis

Mini-Viterbi Algorithm

Hidden Markov Models

HMM Applications

Filtering: Forward Algorithm

Filtering Example

MLE: Viterbi Algorithm

Viterbi Properties

Markov Decision Processes

MDP Solutions

Example Optimal Policies

Stationarity

How (Not) to Solve an MDP
• The inefficient way:
  • Enumerate policies
  • Calculate the expected utility (discounted rewards) starting from the start state, e.g. by simulating a bunch of runs
  • Choose the best policy
• We’ll return to a (better) idea like this later. (A rough sketch of the inefficient approach follows below.)
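
A rough sketch of that inefficient approach, assuming a hypothetical MDP object with states(), actions(s), and a sample_transition(s, a) simulator returning (next_state, reward); none of these names come from the slides, and the discount, run, and horizon values are placeholders.

```python
import itertools

def evaluate_policy(mdp, policy, start, gamma=0.9, runs=100, horizon=50):
    """Monte Carlo estimate of the expected discounted reward of a fixed policy."""
    total = 0.0
    for _ in range(runs):
        s, ret, discount = start, 0.0, 1.0
        for _ in range(horizon):
            s, r = mdp.sample_transition(s, policy[s])  # simulate one step
            ret += discount * r
            discount *= gamma
        total += ret
    return total / runs

def enumerate_policies(mdp, start):
    """Try every deterministic policy: exponentially many in the number of states."""
    states = list(mdp.states())
    best_policy, best_value = None, float("-inf")
    for choice in itertools.product(*(mdp.actions(s) for s in states)):
        policy = dict(zip(states, choice))
        value = evaluate_policy(mdp, policy, start)
        if value > best_value:
            best_policy, best_value = policy, value
    return best_policy, best_value
```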

Utilities of States

Infinite Utilities?

The Bellman Equation

Example: Bellman Equations

Value Iteration

Policy Iteration
• Alternate approach:
  • Policy evaluation: calculate utilities for a fixed policy
  • Policy improvement: update the policy based on the resulting utilities
  • Repeat until convergence
• This is policy iteration. It can converge faster under some conditions. (A sketch follows below.)
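
A sketch of policy iteration under assumed data structures: T[s][a] is a list of (probability, next_state) pairs, R[s] is the reward of state s, and actions[s] lists the actions available in s; these structures and the discount value are illustrative assumptions, not the slides' notation.

```python
def policy_iteration(states, actions, T, R, gamma=0.9, eval_sweeps=50):
    """Alternate policy evaluation and policy improvement until the policy is stable."""
    policy = {s: actions[s][0] for s in states}          # arbitrary initial policy
    while True:
        # Policy evaluation: iteratively estimate utilities for the fixed policy.
        U = {s: 0.0 for s in states}
        for _ in range(eval_sweeps):
            U = {s: R[s] + gamma * sum(p * U[s2] for p, s2 in T[s][policy[s]])
                 for s in states}
        # Policy improvement: act greedily with respect to the resulting utilities.
        new_policy = {
            s: max(actions[s],
                   key=lambda a: sum(p * U[s2] for p, s2 in T[s][a]))
            for s in states
        }
        if new_policy == policy:                          # policy stable: converged
            return policy, U
        policy = new_policy
```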

Comparison
• In value iteration:
  • Every pass (or “backup”) updates both the policy (based on current utilities) and the utilities (based on the current policy); a backup of this form is sketched below
• In policy iteration:
  • Several passes to update utilities
  • Occasional passes to update policies
• Hybrid approaches (asynchronous policy iteration):
  • Any sequence of partial updates to either policy entries or utilities will converge if every state is visited infinitely often
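
For comparison, a value iteration sketch using the same assumed T, R, and actions structures as in the policy iteration sketch above; each sweep applies one Bellman backup per state, and the greedy policy is implicit in the max over actions.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-6):
    """Repeated Bellman backups: U(s) <- R(s) + gamma * max_a sum_s' T(s,a,s') U(s')."""
    U = {s: 0.0 for s in states}
    while True:
        new_U = {
            s: R[s] + gamma * max(sum(p * U[s2] for p, s2 in T[s][a])
                                  for a in actions[s])
            for s in states
        }
        if max(abs(new_U[s] - U[s]) for s in states) < tol:   # converged
            return new_U
        U = new_U
```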