Stationary Distributions
• If we simulate the chain long enough:
  - What happens? Uncertainty accumulates
  - Eventually, we have no idea what the state is!
• Stationary distributions:
  - For most chains, the distribution we end up in is independent of the initial distribution
  - Called the stationary distribution of the chain
  - Usually, can only predict a short time out
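To make this concrete, here is a minimal Python sketch (not from the slides) that simulates a two-state chain forward: repeatedly applying the transition model to a starting distribution settles into the same stationary distribution no matter where we start. The transition probabilities are made up for illustration.

```python
import numpy as np

# Illustrative 2-state transition model (sun/rain); the numbers are assumptions.
T = np.array([[0.9, 0.1],   # P(next | current = sun)
              [0.3, 0.7]])  # P(next | current = rain)

p = np.array([1.0, 0.0])    # initial distribution: certain it is sunny
for _ in range(1000):
    p_next = p @ T          # push the distribution one step forward
    if np.allclose(p_next, p):
        break               # no longer changing: stationary
    p = p_next

print(p)  # ~[0.75, 0.25], regardless of the initial distribution
```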
Example: Web Link Analysis
Mini-Viterbi Algorithm
Hidden Markov Models
HMM Applications
Filtering: Forward Algorithm
Filtering Example
MLE: Viterbi Algorithm
Viterbi Properties
Markov Decision Processes
MDP Solutions
Example Optimal Policies
Stationarity
How (Not) to Solve an MDP
• The inefficient way:
  - Enumerate policies
  - Calculate the expected utility (discounted rewards) starting from the start state
  - E.g. by simulating a bunch of runs
  - Choose the best policy
• We’ll return to a (better) idea like this later
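A rough sketch of this inefficient approach on a tiny made-up MDP (the states, actions, transitions, and rewards below are illustrative assumptions, not from the lecture): enumerate every deterministic policy, estimate each one's expected discounted reward from the start state by simulating a bunch of runs, and keep the best.

```python
import itertools
import random

# Illustrative 3-state, 2-action MDP; all dynamics below are made up.
states = ['a', 'b', 'c']
actions = ['left', 'right']
gamma = 0.9

def step(s, a):
    """Return (next_state, reward); 'c' is an absorbing goal state."""
    if s == 'c':
        return 'c', 0.0
    if random.random() < 0.1:
        return s, 0.0                       # action "slips": stay put
    if a == 'right':
        return ('b', 0.0) if s == 'a' else ('c', 1.0)
    return 'a', -0.1                        # going left drifts back to 'a'

def evaluate(policy, runs=200, horizon=50):
    """Estimate expected discounted reward from the start state by simulation."""
    total = 0.0
    for _ in range(runs):
        s, discount, ret = 'a', 1.0, 0.0
        for _ in range(horizon):
            s, r = step(s, policy[s])
            ret += discount * r
            discount *= gamma
        total += ret
    return total / runs

# Enumerate all |A|^|S| deterministic policies and keep the best one.
best = max(
    (dict(zip(states, choice))
     for choice in itertools.product(actions, repeat=len(states))),
    key=evaluate,
)
print(best)  # expect 'right' in every state where it matters
```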
Utilities of States
Infinite Utilities?
The Bellman Equation
Example: Bellman Equations
Value Iteration
Policy Iteration
• Alternate approach:
  - Policy evaluation: calculate utilities for a fixed policy
  - Policy improvement: update policy based on resulting utilities
  - Repeat until convergence
• This is policy iteration
  - Can converge faster under some conditions
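A minimal sketch of the two alternating steps, on a made-up two-state, two-action MDP (the transition matrices and rewards are assumptions for illustration): evaluation solves for the utilities of the current fixed policy, improvement acts greedily with respect to those utilities, and the loop stops when the policy no longer changes.

```python
import numpy as np

# Illustrative MDP: T[a][s][s'] = P(s' | s, a), R[s] = reward in state s.
T = {
    0: np.array([[0.8, 0.2], [0.1, 0.9]]),  # action 0
    1: np.array([[0.5, 0.5], [0.6, 0.4]]),  # action 1
}
R = np.array([0.0, 1.0])
gamma, n_states = 0.9, 2

policy = np.zeros(n_states, dtype=int)       # start with action 0 everywhere
while True:
    # Policy evaluation: solve U = R + gamma * T_pi @ U exactly (linear system).
    T_pi = np.array([T[policy[s]][s] for s in range(n_states)])
    U = np.linalg.solve(np.eye(n_states) - gamma * T_pi, R)

    # Policy improvement: act greedily with respect to the resulting utilities.
    new_policy = np.array([max(T, key=lambda a: T[a][s] @ U)
                           for s in range(n_states)])
    if np.array_equal(new_policy, policy):
        break                                 # policy is stable: converged
    policy = new_policy

print(policy, U)
```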
Comparison
• In value iteration:
  - Every pass (or “backup”) updates both policy (based on current utilities) and utilities (based on current policy)
• In policy iteration:
  - Several passes to update utilities
  - Occasional passes to update policies
• Hybrid approaches (asynchronous policy iteration):
  - Any sequence of partial updates to either policy entries or utilities will converge if every state is visited infinitely often
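For contrast, a value iteration sketch on the same kind of made-up MDP as above (again, the transition matrices and rewards are illustrative assumptions): every pass applies one Bellman backup to every state's utility, and the implicit greedy policy can be read off the utilities at any point.

```python
import numpy as np

# Same illustrative MDP shape: T[a][s][s'] = P(s' | s, a), R[s] = reward in s.
T = {
    0: np.array([[0.8, 0.2], [0.1, 0.9]]),
    1: np.array([[0.5, 0.5], [0.6, 0.4]]),
}
R = np.array([0.0, 1.0])
gamma = 0.9

U = np.zeros(2)
while True:
    # One pass of Bellman backups: U(s) <- R(s) + gamma * max_a sum_s' T(s,a,s') U(s')
    U_new = np.array([R[s] + gamma * max(T[a][s] @ U for a in T)
                      for s in range(2)])
    if np.max(np.abs(U_new - U)) < 1e-6:     # stop when backups barely change U
        break
    U = U_new

policy = [max(T, key=lambda a: T[a][s] @ U) for s in range(2)]  # implicit greedy policy
print(policy, U)
```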