A: $$$ Use free interaction data rather than expensive labels
AI: A function programmed with data
AI: An economically viable digital agent that explores, learns, and acts
user profile news story user’s response Goal:
Ex: Which advice?
Repeatedly:
1. Observe features of user+advice
2. Choose a piece of advice
3. Observe steps walked
Goal: Healthy behaviors
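The three-step loop above can be sketched in code. This is a minimal illustration, not the method from the talk: it assumes discrete, hashable contexts, a running-average reward estimate per (context, action) pair, and epsilon-greedy exploration. The names `choose_action`, `run_loop`, `observe_context`, and `observe_reward` are hypothetical and stand in for whatever the real system provides.

```python
import random
from collections import defaultdict


def choose_action(scores, epsilon, rng):
    """Epsilon-greedy choice; returns (action, probability of choosing it)."""
    n = len(scores)
    greedy = max(range(n), key=lambda a: scores[a])
    if rng.random() < epsilon:
        action = rng.randrange(n)  # explore uniformly at random
    else:
        action = greedy            # exploit the current best estimate
    # Every action gets epsilon/n; the greedy action gets the remaining mass.
    prob = epsilon / n + ((1.0 - epsilon) if action == greedy else 0.0)
    return action, prob


def run_loop(rounds, n_actions, epsilon, observe_context, observe_reward, seed=0):
    """Run the observe / choose / observe loop and return the interaction log.

    observe_context(rng) -> hashable context      (hypothetical callback)
    observe_reward(context, action, rng) -> float (hypothetical callback)
    """
    rng = random.Random(seed)
    totals = defaultdict(float)  # summed reward per (context, action)
    counts = defaultdict(int)    # times each (context, action) was tried
    log = []
    for _ in range(rounds):
        x = observe_context(rng)                   # 1. observe features
        scores = [
            totals[(x, a)] / counts[(x, a)] if counts[(x, a)] else 0.0
            for a in range(n_actions)
        ]
        a, p = choose_action(scores, epsilon, rng)  # 2. choose an action
        r = observe_reward(x, a, rng)               # 3. observe the outcome
        totals[(x, a)] += r
        counts[(x, a)] += 1
        log.append((x, a, r, p))                    # keep p for later evaluation
    return log
```

Recording the probability `p` alongside each choice is the design point worth noting: it is what makes the logged data reusable for evaluating other policies offline.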
Take-aways 1) Good fit for many real problems
Outline
1) Algs & Theory Overview
   1) Evaluate?
   2) Learn?
   3) Explore?
2) Things that go wrong in practice
3) Systems for going right
4) Really doing it in practice
[Figure: diagram with components labeled "Policy" and "Randomization"]
1 2 0.5
<action> <loss> <probability>
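Logged triples of this shape (action, loss, probability) are enough to estimate how a different policy would have performed. The sketch below uses inverse propensity scoring, a standard estimator for this setting; the source does not confirm which estimator the talk presents, so treat this as an assumption. The function name `ips_estimate`, the record layout, and the example contexts are hypothetical.

```python
def ips_estimate(logged, policy):
    """Inverse-propensity-score estimate of a policy's average loss.

    logged: iterable of (context, action, loss, probability) records, where
            probability is the chance the logging system gave to `action`.
    policy: function mapping a context to the action it would choose.
    """
    total = 0.0
    n = 0
    for context, action, loss, prob in logged:
        if prob <= 0.0:
            raise ValueError("logged probability must be positive")
        # Count the loss only when the policy agrees with the logged action,
        # and reweight by 1/prob to correct for how often it was tried.
        if policy(context) == action:
            total += loss / prob
        n += 1
    if n == 0:
        raise ValueError("no logged records")
    return total / n


# Hypothetical log: the first record matches the example triple 1 2 0.5.
logged = [
    ("user-1", 1, 2.0, 0.5),
    ("user-2", 2, 1.0, 0.5),
]
always_one = lambda context: 1
print(ips_estimate(logged, always_one))  # (2.0 / 0.5 + 0) / 2 = 2.0
```

The estimate is unbiased only if every action the evaluated policy can choose had nonzero probability under the logging system, which is why the randomization has to be recorded at decision time.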
Offline L test set error.
http://hunch.net/~jl/interact.pdf
http://hunch.net/~mltf
http://alekhagarwal.net/bandits_and_rl/
Take-aways 1) Good fit for many problems 2) Fundamental questions have useful answers