
Chapter 10: Dimensions of Reinforcement Learning

Objectives of this chapter:
- Review the treatment of RL taken in this course
- What have we left out?
- What are the hot research areas?

R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction

Three Common Ideas
- Estimation of value functions
- Backing up values along real or simulated trajectories
- Generalized Policy Iteration: maintain an approximate optimal value function and an approximate optimal policy; use each to improve the other
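The first two ideas can be sketched together: tabular TD(0) estimates a value function by backing up values along trajectories. This is a minimal illustration on a hypothetical three-state chain (states 0 → 1 → 2, terminal, with reward +1 on termination); the MDP and constants are invented for the example.

```python
# Minimal tabular TD(0) sketch: back up values along simulated
# trajectories of a hypothetical 3-state chain MDP.
# States 0 -> 1 -> 2 (terminal); reward +1 on entering the terminal state.
ALPHA, GAMMA = 0.1, 1.0
V = [0.0, 0.0, 0.0]          # value estimates; V[2] stays 0 (terminal)

def step(s):
    """Hypothetical dynamics: always move right; reward 1 on entering state 2."""
    s2 = s + 1
    r = 1.0 if s2 == 2 else 0.0
    return s2, r

for episode in range(1000):
    s = 0
    while s != 2:
        s2, r = step(s)
        # TD(0) backup: move V[s] toward the one-step bootstrapped target.
        V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])
        s = s2

# With this deterministic chain, both non-terminal values approach
# the true return of 1.0.
```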

Backup Dimensions

Other Dimensions
- Function approximation
  - tables
  - aggregation
  - other linear methods
  - many nonlinear methods
- On-policy / Off-policy
  - On-policy: learn the value function of the policy being followed
  - Off-policy: try to learn the value function of the best policy, irrespective of what policy is being followed
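The on-policy/off-policy distinction shows up directly in the update target. A hedged sketch contrasting the standard Sarsa and Q-learning updates on a hypothetical tabular task (states and actions are plain integers; the step size and discount are invented for the example):

```python
from collections import defaultdict

# Q maps (state, action) pairs to value estimates;
# defaultdict(float) gives every unseen pair an initial value of 0.
ALPHA, GAMMA = 0.5, 0.9

def sarsa_update(Q, s, a, r, s2, a2):
    # On-policy (Sarsa): the target uses a2, the action the behavior
    # policy actually selected in s2.
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, n_actions):
    # Off-policy (Q-learning): the target uses the greedy action in s2,
    # irrespective of what the behavior policy will actually do there.
    best = max(Q[(s2, b)] for b in range(n_actions))
    Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])
```

The only difference is the bootstrap term: Sarsa evaluates the policy being followed, Q-learning evaluates the greedy policy.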

Still More Dimensions
- Definition of return: episodic, continuing, discounted, etc.
- Action values vs. state values vs. afterstate values
- Action selection/exploration: ε-greedy, softmax, more sophisticated methods
- Synchronous vs. asynchronous
- Replacing vs. accumulating traces
- Real vs. simulated experience
- Location of backups (search control)
- Timing of backups: part of selecting actions, or only afterward?
- Memory for backups: how long should backed-up values be retained?
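The two named exploration rules can be sketched in a few lines; here Q is a hypothetical list of action-value estimates, and the default ε and temperature are invented for the example:

```python
import math
import random

def epsilon_greedy(Q, epsilon=0.1):
    """With probability epsilon pick a random action, else a greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(Q))
    return max(range(len(Q)), key=lambda a: Q[a])

def softmax(Q, tau=1.0):
    """Boltzmann exploration: an action's probability rises with its
    estimated value; tau controls how greedy the distribution is."""
    prefs = [math.exp(q / tau) for q in Q]
    total = sum(prefs)
    probs = [p / total for p in prefs]
    return random.choices(range(len(Q)), weights=probs)[0]
```

As ε → 0 or tau → 0, both rules collapse toward pure greedy selection.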

Frontier Dimensions
- Prove convergence for bootstrapping control methods
- Trajectory sampling
- Non-Markov case:
  - Partially Observable MDPs (POMDPs)
    - Bayesian approach: belief states
    - construct state from sequence of observations
  - Try to do the best you can with non-Markov states
- Modularity and hierarchies
  - Learning and planning at several different levels
    - Theory of options
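The Bayesian belief-state approach mentioned above maintains a distribution over hidden states and updates it after each action a and observation o via b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). A minimal sketch, assuming dictionary-based transition and observation models (all names and data layouts here are invented for the example):

```python
def belief_update(b, a, o, T, O, states):
    """Bayesian belief-state update for a hypothetical discrete POMDP.
    b: dict state -> probability; T[(s, a, s2)]: transition probability;
    O[(s2, a, o)]: observation probability. Returns the posterior belief."""
    unnorm = {}
    for s2 in states:
        pred = sum(T[(s, a, s2)] * b[s] for s in states)  # prediction step
        unnorm[s2] = O[(s2, a, o)] * pred                 # correction step
    z = sum(unnorm.values())                              # normalizer P(o | b, a)
    return {s2: p / z for s2, p in unnorm.items()}
```

The belief itself is a Markov state, so in principle ordinary RL methods apply on top of it, at the cost of working in a continuous belief space.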

More Frontier Dimensions
- Using more structure
  - factored state spaces: dynamic Bayes nets
  - factored action spaces

Still More Frontier Dimensions
- Incorporating prior knowledge
  - advice and hints
  - trainers and teachers
  - shaping
  - Lyapunov functions
  - etc.