Welcome to IST 597: Reinforcement Learning and Its Applications
Many slides adapted from Emma Brunskill’s CS 234 course at Stanford

Teaching Team
• Instructor: Zihan Zhou
  – Ph.D. in Electrical and Computer Engineering from UIUC
  – Research interests: computer vision, machine learning
  – Office: E367 Westgate Building
  – Office hours: 2:00-3:00 pm Tuesday, or by appointment
• TA: Huaxiu Yao
  – IST Ph.D. student
  – Research interests: data mining, machine learning
  – Office hours: 3:00-4:00 pm Thursday, or by appointment
  – Location: E301 Westgate Building

Course Website
• Log in to Canvas
• Go to Home Schedule
  – You will see all course information on wikispace

What Is Reinforcement Learning?
Learn to make good sequences of decisions
• Repeated interactions with the world
• Reward for decisions
• Don’t know how the world works in advance
[Diagram: the agent sends action a to the world; the world returns observation o and reward r]
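
The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative sketch: `CorridorWorld`, `run_episode`, and `make_random_policy` are hypothetical names invented here, and the five-cell corridor with a reward of 1 at the far end is an assumed toy world, not an environment used in the course.

```python
import random


class CorridorWorld:
    """Hypothetical toy world: a corridor of cells; reaching the last cell pays reward 1."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial observation o

    def step(self, action):
        # action a is -1 (move left) or +1 (move right); the walls clip the move
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # reward r arrives only at the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one agent-world interaction loop and record (o, a, r) triples."""
    observation = env.reset()
    total_reward = 0.0
    history = []
    for _ in range(max_steps):
        action = policy(observation)                       # agent picks a from o
        next_observation, reward, done = env.step(action)  # world returns o and r
        history.append((observation, action, reward))
        total_reward += reward
        observation = next_observation
        if done:
            break
    return total_reward, history


def make_random_policy(seed=0):
    """A policy that ignores the observation and moves left or right at random."""
    rng = random.Random(seed)
    return lambda observation: rng.choice([-1, 1])


if __name__ == "__main__":
    total, history = run_episode(CorridorWorld(), make_random_policy())
    print("steps taken:", len(history), "total reward:", total)
```

The agent here never sees the rules inside `CorridorWorld.step`; it only sees the observations and rewards that come back, which is the "don't know how the world works in advance" condition from the slide.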

Example: Atari
• https://www.youtube.com/watch?v=V1eYniJ0Rnk
• What are the observation, action, and reward in the example?
Mnih et al., 2013. Playing Atari with Deep Reinforcement Learning.

Example: AlphaGo
• https://www.alphagomovie.com/
• https://deepmind.com/blog/alphago-zero-learning-scratch/
Silver et al., 2016. Mastering the game of Go with deep neural networks and tree search.

Example: Robotics
• https://www.youtube.com/watch?v=CE6fBDHPbP8
Levine et al., 2016. End-to-End Training of Deep Visuomotor Policies.

Example: Navigation
• https://www.youtube.com/watch?v=2yjWDNXYh5s
Mirowski et al., 2018. Learning to Navigate in Cities Without a Map.

Example: NLP
• https://vimeo.com/234955545
Guu et al., 2017. From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood.

Example: Language and Vision
• https://www.youtube.com/watch?v=R4hugGnNr7s&t=283s
Das et al., 2017. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning.

Example: Healthcare
• https://www.youtube.com/watch?time_continue=187&v=xUtifY3_1gE
Wang et al., 2018. Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation.

RL Applications
• Business operations
• Intelligent transportation
• Education
• and many more…

Why is RL hard?
• The goal is to find an optimal way to make decisions
  – Yielding the best outcomes
  – Or at least a very good strategy

Delayed Consequences
• Decisions now can impact things much later
• This introduces two challenges
  – When planning: decisions involve reasoning about not just the immediate benefit of a decision but also its longer-term ramifications
  – When learning: temporal credit assignment is hard
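
To make the credit assignment problem concrete, here is a small sketch that computes, for each time step, the total reward that follows it. The function name `rewards_to_go` and the discount factor `gamma` are assumptions brought in from the standard RL formulation; the slide itself does not define them.

```python
def rewards_to_go(rewards, gamma=1.0):
    """Return, for each step t, the (optionally discounted) sum of rewards from t onward.

    gamma is an assumed discount factor; gamma=1.0 means no discounting.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backward so each step can reuse the total already computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three steps with no feedback, then a single delayed reward at the end.
    print(rewards_to_go([0, 0, 0, 1]))
    print(rewards_to_go([0, 0, 0, 1], gamma=0.5))
```

With a single reward at the end and no discounting, every earlier step receives the same total, so the totals alone do not say which of the earlier decisions actually mattered.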

Exploration
• Learning about the world by making decisions
  – Agent as scientist
  – Learn to ride a bike by trial and error
• Censored data
  – Only get a reward (label) for the decision made
  – Don’t know what would have happened if you had taken the red pill instead of the blue pill
• Decisions impact what you learn about
  – If you choose to go to Stanford instead of Penn State, you will have different later experiences…
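
The censored-data point can be illustrated with a two-option decision problem in which the agent sees a reward only for the option it picks. The sketch below uses epsilon-greedy action selection, which is one common exploration heuristic and is chosen here purely for illustration; the function name, the success probabilities, and the parameter values are all assumptions.

```python
import random


def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Simulate repeated choices among options with hidden success probabilities.

    With probability epsilon the agent explores (picks a random option);
    otherwise it exploits the option with the highest estimated reward.
    Only the chosen option's reward is ever observed (censored feedback).
    """
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))
        else:
            # Ties are broken in favor of the lowest index.
            arm = max(range(len(true_means)), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental average of the rewards seen for this option so far.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


if __name__ == "__main__":
    # Hypothetical options: the second one is better, but the agent does not know that.
    print("with exploration:", epsilon_greedy_bandit([0.2, 0.8], epsilon=0.1))
    print("purely greedy:   ", epsilon_greedy_bandit([0.2, 0.8], epsilon=0.0))
```

In the purely greedy run the agent starts with the first option and never tries the second, so its estimate for the better option stays at its initial value. That is the "decisions impact what you learn about" effect in miniature.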

Generalization
• A policy is a mapping from past experience to action
• Why not just pre-program a policy?
  – Input: image
  – How many images are there?
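
A rough count answers the question on the slide. The sketch below assumes an 84 x 84 grayscale image with 256 intensity levels per pixel; those dimensions are an assumption chosen for illustration, and the helper names are hypothetical.

```python
import math


def count_images(height, width, channels=1, levels=256):
    """Number of distinct images: each of height*width*channels values takes one of `levels` settings."""
    return levels ** (height * width * channels)


def decimal_digits(n):
    """Approximate number of decimal digits of a positive integer (math.log10 accepts very large ints)."""
    return math.floor(math.log10(n)) + 1


if __name__ == "__main__":
    tiny = count_images(2, 2, channels=1, levels=2)  # 2x2 black-and-white image
    print("2x2 binary images:", tiny)
    big = count_images(84, 84)  # assumed 84x84 grayscale input
    print("digits in the count of 84x84 grayscale images:", decimal_digits(big))
```

Even for this small assumed image size the count has thousands of digits, so a lookup table with one pre-programmed action per image is out of reach, and the policy has to generalize from the inputs it has seen.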

RL vs. AI Planning
• AI Planning:
  – Computes a good sequence of decisions
  – But is given a model of the world
• Example: Solitaire
  – Single-player card game
  – Know all rules of the game
  – Can compute a probability distribution over the next state and potential score

RL vs. Supervised Learning
• Supervised Learning:
  – Learns from experience
  – But is provided with correct labels

RL as Supervised Learning
• Imitation learning:
  – Learns from experience
  – Given demos of good policies
[Diagram labels: o_t, a_t]
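
One way to read "RL as supervised learning" is that each demonstration supplies a labeled pair: the observation is the input and the expert's action is the label. The sketch below fits the simplest possible classifier, a per-observation majority vote. The function name and the toy demonstration data are hypothetical, and a practical system would use a learned function approximator rather than a lookup table.

```python
from collections import Counter, defaultdict


def behavior_cloning(demonstrations):
    """Fit a tabular policy from (observation, action) pairs.

    For every observation seen in the demos, copy the expert's most frequent action.
    """
    votes = defaultdict(Counter)
    for observation, action in demonstrations:
        votes[observation][action] += 1
    return {obs: counter.most_common(1)[0][0] for obs, counter in votes.items()}


if __name__ == "__main__":
    # Hypothetical demos: positions along a corridor, expert action +1 (right) or -1 (left).
    demos = [(0, 1), (1, 1), (2, 1), (2, 1), (2, -1), (3, 1)]
    policy = behavior_cloning(demos)
    print("cloned policy:", policy)
    # An observation the expert never visited has no label to copy.
    print("action for unseen observation 7:", policy.get(7))
```

No exploration is needed to fit this policy, but it has nothing to say about observations that never appeared in the demonstrations.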

Example: NVIDIA AI Car
• https://www.youtube.com/watch?v=-96BEoXJMs0

Imitation Learning
• Benefits
  – Great tools for supervised learning
  – Avoids exploration
  – With big data, there is lots of data about the outcomes of decisions
• Limitations
  – Data can be expensive to capture
  – Limited by the data collected

How Do We Proceed?
• Explore the world and use feedback to guide future decisions
• More challenges
  – Where do rewards come from?
  – Robustness / risk sensitivity
  – Multiple agents
  – …

How Will This Course Be Taught?
• First half: lectures
  – Key concepts
  – In-class exercises
  – Discussion of your answers
• Second half: student presentations
  – Each student will sign up for a topic related to RL, and students will take turns giving an in-depth tutorial to the class.
  – A list of suggested topics will be provided soon.

Group Project
• Work in a group to develop, implement, evaluate, and document novel ideas in RL and its applications
• 2-3 people per group
• You can choose whom to work with

Prerequisites
• Calculus, Linear Algebra, Probability
• Foundations of Machine Learning
  – Familiarity with concepts such as loss functions, derivatives, and gradient descent
• Proficiency in Python
  – Necessary for implementing algorithms for the course project, unless you plan to focus on RL theory

Readings for Lectures
• Text
  – Reinforcement Learning: An Introduction
    • Sutton and Barto
    • 2nd Edition
  – Always check the latest schedule in Canvas
• Reading the text before class is strongly encouraged!
  – See the course website for a reading list before each class

In Class
• Attendance and participation
  – Attendance is required for every class
  – 15% of the final grade
  – If you are unable to attend class and have a reasonable excuse, notify the instructor before class

Grading
• Student presentation: 35%
• Course project: 50%
• Class attendance and participation: 15%

After today’s class
• ML quiz