Artificial Intelligence: Representation and Problem Solving
Sequential Decision Making (4): Active Reinforcement Learning
15-381 / 681
Instructors: Fei Fang (This Lecture) and Dave Touretzky
feifang@cmu.edu
Wean Hall 4126

Recap: Know exactly how the world works / Don’t know how the world works. Fei Fang, 11/30/2020

Model-Based Active RL with Random Actions
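One common reading of this slide title, sketched under stated assumptions: the agent acts uniformly at random, counts the transitions and rewards it observes to estimate a model, and then plans on that estimated model with value iteration. The `step(s, a, rng)` environment interface, the two-state toy world, and every name below are hypothetical and not taken from the slides.

```python
import random
from collections import defaultdict

def learn_model_random(states, actions, step, n_steps=5000, seed=0):
    """Estimate T(s, a, s') and the mean reward R(s, a) from random actions."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> s' -> count
    reward_sum = defaultdict(float)                 # (s, a) -> summed reward
    s = rng.choice(states)
    for _ in range(n_steps):
        a = rng.choice(actions)                     # uniformly random action
        s2, r = step(s, a, rng)
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
        s = s2
    T, R = {}, {}
    for (s, a), nxt in counts.items():
        n = sum(nxt.values())
        T[(s, a)] = {s2: c / n for s2, c in nxt.items()}
        R[(s, a)] = reward_sum[(s, a)] / n
    return T, R

def value_iteration(states, actions, T, R, gamma=0.9, iters=200):
    """Plan on the estimated model; unvisited (s, a) pairs are treated as worth 0."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {
            s: max(
                R.get((s, a), 0.0)
                + gamma * sum(p * V[s2] for s2, p in T.get((s, a), {}).items())
                for a in actions
            )
            for s in states
        }
    return V

def toy_step(s, a, rng):
    """Hypothetical world: 'move' flips the state; landing in state 1 pays 1."""
    s2 = s if a == "stay" else 1 - s
    return s2, (1.0 if s2 == 1 else 0.0)

STATES, ACTIONS = [0, 1], ["stay", "move"]
T, R = learn_model_random(STATES, ACTIONS, toy_step)
V = value_iteration(STATES, ACTIONS, T, R)
```

Because the toy world is deterministic, the counted model is exact once every state-action pair has been tried; in a stochastic world the estimates would only approach the true probabilities as the counts grow.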

Q-Value

Optimal Q-Value
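For reference, the standard textbook definitions of these two quantities are given below. The notation is an assumption rather than something recovered from the slides: transition model $T$, reward $R$, discount $\gamma$, policy $\pi$.

```latex
\begin{align}
Q^{\pi}(s,a) &= \sum_{s'} T(s,a,s')\,\bigl[R(s,a,s') + \gamma\, V^{\pi}(s')\bigr] \\
Q^{*}(s,a)   &= \sum_{s'} T(s,a,s')\,\bigl[R(s,a,s') + \gamma \max_{a'} Q^{*}(s',a')\bigr] \\
V^{*}(s)     &= \max_{a} Q^{*}(s,a), \qquad \pi^{*}(s) = \arg\max_{a} Q^{*}(s,a)
\end{align}
```

The last line is why Q-values matter for model-free methods: once $Q^{*}$ is known, the optimal action can be read off without knowing $T$ or $R$.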

SARSA
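For reference, the standard SARSA update is Q(s,a) ← Q(s,a) + α [r + γ Q(s',a') - Q(s,a)], where a' is the action the agent actually takes in s'. A minimal sketch follows; the learning rate, discount, state and action labels, and numeric values are illustrative assumptions, not values from the slides.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9, terminal=False):
    """One SARSA step: move Q(s, a) toward r + gamma * Q(s', a')."""
    # At a terminal next state there is no follow-up action, so the target is r.
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)                 # all Q-values start at 0
Q[("S2", "a23")] = 2.0                 # made-up current estimate
new_value = sarsa_update(Q, "S1", "a12", r=1.0, s_next="S2", a_next="a23")
end_value = sarsa_update(Q, "S3", "a36", r=1.0, s_next="S6", a_next=None, terminal=True)
```

The update uses the tuple (s, a, r, s', a'), which is where the name SARSA comes from.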

On-Policy vs Off-Policy Methods
Two types of RL approaches:
• On-policy methods attempt to evaluate or improve the policy that is used to make decisions
• Off-policy methods evaluate or improve a policy different from that used to generate the data

Q-Learning

Q-Learning: an off-policy algorithm. The policy being evaluated (the estimation policy) can be unrelated to the policy being followed (the behavior policy).
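For reference, the standard Q-learning update is Q(s,a) ← Q(s,a) + α [r + γ max over a' of Q(s',a') - Q(s,a)]. The max over next actions is what makes it off-policy: the target follows the greedy policy no matter which action the agent takes next. A minimal sketch, with an illustrative learning rate, discount, and Q-values that are assumptions rather than slide content:

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, next_actions, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    # An empty next_actions list (terminal state) contributes a value of 0.
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    target = r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)
Q[("S2", "a21")] = 0.0                 # made-up current estimates
Q[("S2", "a23")] = 2.0
Q[("S2", "a25")] = 4.0
new_value = q_learning_update(
    Q, "S1", "a12", r=1.0, s_next="S2", next_actions=["a21", "a23", "a25"]
)
```

Here the target uses the best next estimate (4.0 for a25), even if the behavior policy goes on to pick a21 or a23.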

Q-Learning Example
[Figure: state-transition graph with states S1 to S5 and terminal state S6 (END). Action labels, grouped by their first index: a12, a14; a21, a23, a25; a32, a36; a41, a45; a52, a54, a56.]

Q-Learning Example: Get reward 0, get to state S2. Update the Q-value (state-action value) estimate.

Q-Learning Example: Get reward 0, get to state S3. Update the Q-value (state-action value) estimate.

Q-Learning Example: Get reward 0, get to state S6. Update the Q-value (state-action value) estimate.

Q-Learning Example: [Figure: the same graph, with no step annotation.]

Q-Learning Example: Get reward 0, get to state S3. Update the Q-value (state-action value) estimate.

Q-Learning Example: Get reward 0, get to state S6. Update the Q-value (state-action value) estimate.

Q-Learning

Q-Learning Example

Q-Learning Example: After trial 1

Q-Learning Example: Trial 2

Q-Learning Example: After trial 2

Q-Learning Example: Trial 3

Q-Learning Example: After trial 3

Q-Learning Properties
• If acting randomly, Q-learning converges to the optimal state-action values, and therefore also finds the optimal policy (assuming every state-action pair keeps being tried and the learning rate is decayed appropriately)
• Off-policy learning
  • Can act in one way
  • But learn the values of another policy (the optimal one!)
• Acting randomly is sufficient, but not necessary, to learn the optimal values and policy
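The first property can be illustrated with a small experiment: a purely random behavior policy still yields Q-values whose greedy policy is optimal. The sketch below reuses the action labels from the Q-Learning Example graph. It assumes that a_ij moves deterministically from S_i to S_j, and it uses a hypothetical reward of 1 for entering S6 (the slides' actual rewards are not used). The learning rate, discount, and episode count are also assumptions.

```python
import random

# Successor states per state, read off the action labels a_ij on the example graph.
# Assumption: action a_ij deterministically moves the agent from S_i to S_j.
ACTIONS = {1: [2, 4], 2: [1, 3, 5], 3: [2, 6], 4: [1, 5], 5: [2, 4, 6], 6: []}
TERMINAL = 6

def reward(s, s_next):
    # Hypothetical reward, NOT from the slides: 1 for reaching END, else 0.
    return 1.0 if s_next == TERMINAL else 0.0

def q_learning_random(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # Q is keyed by (state, successor), i.e. one entry per action a_ij.
    Q = {(s, t): 0.0 for s, targets in ACTIONS.items() for t in targets}
    for _ in range(episodes):
        s = rng.choice([1, 2, 3, 4, 5])
        while s != TERMINAL:
            t = rng.choice(ACTIONS[s])   # behavior policy: uniformly random
            best_next = max((Q[(t, u)] for u in ACTIONS[t]), default=0.0)
            Q[(s, t)] += alpha * (reward(s, t) + gamma * best_next - Q[(s, t)])
            s = t
    return Q

Q = q_learning_random()
# Greedy policy extracted from the learned values (the policy being evaluated).
greedy = {s: max(ACTIONS[s], key=lambda t: Q[(s, t)]) for s in ACTIONS if ACTIONS[s]}
```

Under these assumptions the learned values approach Q*(S3, a36) = 1 and Q*(S1, a12) = 0.81, and the greedy policy heads for S6 even though the agent never acted greedily while learning.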

Quiz 1
Is the following algorithm guaranteed to learn the optimal policy?
(Some Algorithm)
A: Yes
B: No
C: Not sure

Exploration vs Exploitation

Greedy in Limit of Infinite Exploration (GLIE)
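A GLIE scheme keeps trying every action infinitely often while the policy becomes greedy in the limit. A standard example is ε-greedy with ε decaying like 1/k, where k counts episodes (or visits to the state). A minimal sketch, in which the action names and Q-values are illustrative assumptions:

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_values.get)

def glie_epsilon(k):
    """Decay schedule epsilon_k = 1/k for k = 1, 2, 3, ..."""
    return 1.0 / k

rng = random.Random(0)
q_values = {"left": 0.2, "right": 0.7}   # made-up estimates for one state
picks = [epsilon_greedy(q_values, glie_epsilon(k), rng) for k in range(1, 1001)]
late_greedy_fraction = picks[500:].count("right") / 500
```

Early picks are close to uniformly random, while later picks are almost always the greedy action, yet the exploration probability never reaches exactly zero.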

Boltzmann Policy
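A Boltzmann (softmax) policy chooses action a with probability proportional to exp(Q(s,a) / τ), where the temperature τ controls how much the agent explores. A minimal sketch for a single state; the Q-values and temperatures are illustrative assumptions:

```python
import math
import random

def boltzmann_probs(q_values, temperature):
    """P(a) proportional to exp(Q(a) / temperature)."""
    m = max(q_values.values())  # subtract the max for numerical stability
    weights = {a: math.exp((q - m) / temperature) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def boltzmann_action(q_values, temperature, rng):
    """Sample one action from the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]

q_values = {"a12": 1.0, "a14": 2.0}      # made-up estimates for one state
hot = boltzmann_probs(q_values, temperature=100.0)   # close to uniform
cold = boltzmann_probs(q_values, temperature=0.01)   # close to greedy
```

A high temperature gives near-uniform exploration and a low temperature gives near-greedy behavior, so lowering τ over time is another way to shift from exploration to exploitation.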

Quiz 2

Summary
Reinforcement Learning (RL)
  Active RL
    Model-free Active RL: SARSA, Q-Learning (with some exploratory policy)
    Model-based Active RL

SARSA vs Q-Learning (two columns: SARSA, Q-Learning)

Acknowledgment: Some slides are borrowed from previous slides made by Tai Sing Lee and Zico Kolter, and some examples are borrowed from Meg Aycinena and Emma Brunskill.

Other Resources
• http://courses.csail.mit.edu/6.825/fall05/rl_lecture/rl_examples.pdf
• http://www.cs.cmu.edu/afs/cs/academic/class/15780-s16/www/slides/rl.pdf
• http://incompleteideas.net/bookdraft2017nov5.pdf

Backup Slides

Terminal States and Reward