Welcome to IST 597: Reinforcement Learning and Its Applications • Many slides adapted from Emma Brunskill’s CS 234 course at Stanford
Teaching Team • Instructor: Zihan Zhou – Ph.D. in Electrical and Computer Engineering from UIUC – Research interests: computer vision, machine learning – Office: E367 Westgate Building – Office hours: 2:00-3:00 pm Tuesday or by appointment • TA: Huaxiu Yao – IST Ph.D. student – Research interests: data mining, machine learning – Office hours: 3:00-4:00 pm Thursday or by appointment – Location: E301 Westgate Building
Course Website • Log into your Canvas • Go to Home Schedule – You will see all course information on wikispace
What Is Reinforcement Learning? Learn to make good sequences of decisions • Repeated interactions with the world • Reward for decisions • Don’t know how the world works in advance [Diagram: the agent sends action a to the world; the world returns observation o and reward r]
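The interaction loop on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: `ToyWorld`, `run_episode`, `random_policy`, and `parity_policy` are hypothetical names invented here, and the parity game is an assumed stand-in for a real environment. The hand-written `parity_policy` is included for comparison only; it represents what a learning agent would have to discover from rewards, since it does not know the rule in advance.

```python
import random


class ToyWorld:
    """Hypothetical world: shows a digit and rewards the agent for naming its parity."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.digit = self.rng.randint(0, 9)

    def observe(self):
        return self.digit

    def step(self, action):
        # Reward r is 1 when the action (0 = even, 1 = odd) matches the current digit.
        reward = 1 if action == self.digit % 2 else 0
        self.digit = self.rng.randint(0, 9)  # the world moves on to a new digit
        return self.digit, reward


def run_episode(policy, world, num_steps=10):
    """Repeated interaction: observe o, choose action a, receive reward r and the next o."""
    observation = world.observe()
    total_reward = 0
    for _ in range(num_steps):
        action = policy(observation)
        observation, reward = world.step(action)
        total_reward += reward
    return total_reward


def random_policy(observation):
    # Ignores the observation entirely.
    return random.choice([0, 1])


def parity_policy(observation):
    # Uses the observation; this is the behavior a learner would need to find.
    return observation % 2


print("parity policy reward:", run_episode(parity_policy, ToyWorld()))
print("random policy reward:", run_episode(random_policy, ToyWorld()))
```

The loop itself never inspects how `ToyWorld` computes rewards; the agent only sees observations and rewards, which is the setting the slide describes.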
Example: Atari • https://www.youtube.com/watch?v=V1eYniJ0Rnk • What are the observation, action, and reward in the example? • Mnih et al., 2013. Playing Atari with Deep Reinforcement Learning.
Example: AlphaGo • https://www.alphagomovie.com/ • https://deepmind.com/blog/alphago-zero-learning-scratch/ • Silver et al., 2016. Mastering the game of Go with deep neural networks and tree search.
Example: Robotics • https://www.youtube.com/watch?v=CE6fBDHPbP8 • Levine et al., 2016. End-to-End Training of Deep Visuomotor Policies.
Example: Navigation • https://www.youtube.com/watch?v=2yjWDNXYh5s • Mirowski et al., 2018. Learning to Navigate in Cities Without a Map.
Example: NLP • https://vimeo.com/234955545 • Guu et al., 2017. From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood.
Example: Language and Vision • https://www.youtube.com/watch?v=R4hugGnNr7s&t=283s • Das et al., 2017. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning.
Example: Healthcare • https://www.youtube.com/watch?time_continue=187&v=xUtifY3_1gE • Wang et al., 2018. Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation.
RL Applications • Business operations • Intelligent transportation • Education • and many more…
Why is RL hard? • Goal is to find an optimal way to make decisions – Yielding the best outcomes – Or at least a very good strategy
Delayed Consequences • Decisions now can impact things much later • Introduces two challenges – When planning: decisions involve reasoning about not just the immediate benefit of a decision but also its longer-term ramifications – When learning: temporal credit assignment is hard
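A small worked example may make the planning challenge concrete. The reward sequences and function names below are hypothetical, chosen only to show that ranking decisions by immediate reward and ranking them by total reward can disagree.

```python
# Hypothetical reward sequences that follow two possible first decisions.
reward_sequences = {
    "grab_small_reward": [1, 0, 0, 0],   # pays off immediately, then nothing
    "invest_for_later": [0, 0, 0, 10],   # pays off only three steps later
}


def best_by_immediate_reward(sequences):
    """Pick the decision whose first-step reward is largest."""
    return max(sequences, key=lambda name: sequences[name][0])


def best_by_total_reward(sequences):
    """Pick the decision whose summed reward over all steps is largest."""
    return max(sequences, key=lambda name: sum(sequences[name]))


print("best by immediate reward:", best_by_immediate_reward(reward_sequences))
print("best by total reward:", best_by_total_reward(reward_sequences))
```

The same example hints at the learning challenge: the reward of 10 arrives at the last step, yet the decision responsible for it was made at the first step, so a learner has to work out which earlier decision deserves the credit.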
Exploration • Learning about the world by making decisions – Agent as scientist – Learn to ride a bike by trial and error • Censored data – Only get a reward (label) for the decision made – Don’t know what would have happened if you had taken the red pill instead of the blue pill • Decisions impact what you learn about – If you choose to go to Stanford instead of Penn State, you will have different later experiences…
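The censored-data point can be illustrated with a two-option choice in which only the chosen option's outcome is ever observed. The sketch below uses epsilon-greedy selection, a common exploration heuristic that the slides do not themselves name; the success rates, step count, and the function `run_bandit` are assumptions made for illustration.

```python
import random


def run_bandit(epsilon, num_steps=2000, seed=0):
    """Epsilon-greedy choice between two pills whose payoff rates are unknown to the agent."""
    rng = random.Random(seed)
    true_success_rate = {"blue": 0.4, "red": 0.8}  # hidden from the agent
    counts = {"blue": 0, "red": 0}
    estimates = {"blue": 0.0, "red": 0.0}

    for _ in range(num_steps):
        if rng.random() < epsilon:
            pill = rng.choice(["blue", "red"])        # explore: try a random pill
        else:
            pill = max(estimates, key=estimates.get)  # exploit: ties go to "blue"
        # Censored feedback: only the chosen pill's outcome is observed.
        reward = 1 if rng.random() < true_success_rate[pill] else 0
        counts[pill] += 1
        # Incremental update of the running mean reward for the chosen pill.
        estimates[pill] += (reward - estimates[pill]) / counts[pill]
    return counts, estimates


greedy_counts, _ = run_bandit(epsilon=0.0)
exploring_counts, exploring_estimates = run_bandit(epsilon=0.1)
print("never exploring:", greedy_counts)
print("exploring 10% of the time:", exploring_counts, exploring_estimates)
```

With `epsilon=0.0` the agent starts on the blue pill and never has a reason to switch, because it has no data at all about the red one. This is the sense in which decisions determine what the agent gets to learn about.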
Generalization • A policy is a mapping from past experience to action • Why not just pre-program a policy? – Input: image – How many possible images are there?
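A rough count shows why a pre-programmed lookup table over images is hopeless. The frame size below (210 x 160 pixels, 3 color channels, 256 levels per channel) is an assumption chosen to resemble an Atari-style screen; the slides do not specify a resolution.

```python
import math

# Assumed frame format (not stated in the slides): Atari-like RGB screen.
height, width, channels = 210, 160, 3
levels_per_value = 256

# Each pixel value can independently take any level, so the number of
# distinct images is levels_per_value ** num_values.
num_values = height * width * channels

# The count is far too large to print, so report its number of decimal digits.
digits = math.floor(num_values * math.log10(levels_per_value)) + 1

print(f"{num_values} pixel values per image")
print(f"number of distinct images has about {digits} decimal digits")
```

No table can enumerate that many inputs, so a policy has to generalize from the images it has seen to the ones it has not.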
RL vs. AI Planning • AI Planning: – computes a good sequence of decisions – but is given a model of the world • Example: Solitaire – single-player card game – Know all the rules of the game – Can compute the probability distribution over the next state and potential score
RL vs. Supervised Learning • Supervised Learning: – learns from experience – but is provided with correct labels
RL as Supervised Learning • Imitation learning: – learns from experience – given demos of good policies [Diagram: observation o_t is mapped to action a_t]
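Treating imitation as supervised learning means fitting a classifier from demonstrated observations to the expert's actions. The sketch below uses a 1-nearest-neighbor rule as the simplest possible classifier; the demonstration data, the lane-offset observation, and the name `cloned_policy` are hypothetical and serve only as illustration.

```python
# Hypothetical demonstrations: (lane offset in meters, expert steering action).
demonstrations = [
    (-1.0, "steer_right"),
    (-0.4, "steer_right"),
    (0.0, "straight"),
    (0.1, "straight"),
    (0.5, "steer_left"),
    (1.2, "steer_left"),
]


def cloned_policy(observation, demos=demonstrations):
    """1-nearest-neighbor classifier: copy the expert's action from the closest demo."""
    _, action = min(demos, key=lambda pair: abs(pair[0] - observation))
    return action


for offset in (-0.8, 0.02, 0.9, 5.0):
    print(offset, "->", cloned_policy(offset))
```

The last query (an offset of 5.0) lies far outside the demonstrated range, and the policy can only repeat the nearest demonstrated action there, since the demonstrations say nothing about that situation.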
Example: NVIDIA AI Car • https://www.youtube.com/watch?v=-96BEoXJMs0
Imitation Learning • Benefits – Great tools for supervised learning – Avoids exploration – With big data, there is lots of data about outcomes of decisions • Limitations – Data can be expensive to capture – Limited by the data collected
How Do We Proceed? • Explore the world and use feedback to guide future decisions • More challenges – Where do rewards come from? – Robustness / risk sensitivity – Multiple agents – …
How Will This Course Be Taught? • First half: lectures – Key concepts – In-class exercises – Discussion of your answers • Second half: student presentations – Each student will sign up for a topic related to RL, and students will take turns giving an in-depth tutorial to the class – A list of suggested topics will be provided soon
Group Project • Work in groups to develop, implement, evaluate, and document novel ideas in RL and its applications • 2-3 people per group • You can choose whom to work with
Prerequisites • Calculus, Linear Algebra, Probability • Foundations of Machine Learning – familiar with concepts like loss function, derivative, gradient descent • Proficiency in Python – Necessary for implementing algorithms for the course project – unless you plan to focus on RL theory
Readings for Lectures • Text – Reinforcement Learning: An Introduction – Sutton and Barto – 2nd Edition • Always check the latest schedule in Canvas • Reading the text before class is strongly encouraged! – See the course website for a reading list before each class
In Class • Attendance and participation – Attendance is required for every class – 15% of the final grade – If you cannot attend class for a valid reason, notify the instructor before class
Grading • Student presentation: 35% • Course project: 50% • Class attendance and participation: 15%
After today’s class • ML quiz