Alekh Agarwal, John Langford
KDD Tutorial, August 19


Slides and full references at http://hunch.net/~rwil/kdd2018.html


How about news?
Repeatedly:
1. Observe features of user+articles
2. Choose a news article.
3. Observe click-or-not
Goal: Maximize fraction of clicks
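The three steps above can be sketched as a small simulation. This is only an illustrative toy, not code from the tutorial: the user types, the click simulator `toy_click_prob`, and both policies are hypothetical names invented here to show how the click fraction depends on the policy that picks the article.

```python
import random

# Hypothetical toy world: each user type prefers one article index.
PREFERRED = {"sports_fan": 0, "news_junkie": 1}

def toy_click_prob(user, article):
    # Made-up simulator: high click chance only on the preferred article.
    return 0.8 if article == PREFERRED[user] else 0.1

def uniform_policy(user, n_articles, rng):
    # Ignores the features and picks an article uniformly at random.
    return rng.randrange(n_articles)

def matched_policy(user, n_articles, rng):
    # Uses the features to pick the article this user type prefers.
    return PREFERRED[user]

def run_interaction_loop(policy, click_prob, n_rounds, n_articles, seed=0):
    """Run the observe / choose / observe-click loop; return the click fraction."""
    rng = random.Random(seed)
    clicks = 0
    for _ in range(n_rounds):
        # 1. Observe features of user+articles (here just a user type).
        user = rng.choice(sorted(PREFERRED))
        # 2. Choose a news article.
        article = policy(user, n_articles, rng)
        # 3. Observe click-or-not, for the chosen article only.
        clicks += rng.random() < click_prob(user, article)
    return clicks / n_rounds

if __name__ == "__main__":
    for policy in (uniform_policy, matched_policy):
        frac = run_interaction_loop(policy, toy_click_prob, 5000, 2)
        print(policy.__name__, round(frac, 3))
```

Note that step 3 only reveals the outcome for the article that was actually shown, which is what separates this setting from ordinary supervised learning.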

Is childproofing interesting to Alekh? A: Need Right Signal for Right Answer

[Figure: Model value over time. y-axis: fraction of value retained (0 to 1); x-axis ticks: Day 1/Day 1, Day 1/Day 2, Day 1/Day 3]
A: The world changes!

Learning loop: Features (user history, news stories) → Action (selected news story) → Consequence (click-or-not)

A: $$$ Use free interaction data rather than expensive labels

AI: A function programmed with data
AI: An economically viable digital agent that explores, learns, and acts

user profile news story user’s response Goal:

Ex: Which advice?
Repeatedly:
1. Observe features of user+advice
2. Choose an advice.
3. Observe steps walked
Goal: Healthy behaviors

Take-aways 1) Good fit for many real problems

Outline
1) Algs & Theory Overview
   1) Evaluate?
   2) Learn?
   3) Explore?
2) Things that go wrong in practice
3) Systems for going right
4) Really doing it in practice


Policy

Randomization Policy
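The slide body did not survive extraction, so the following is only a sketch of one common way to randomize over a policy's choice, namely epsilon-greedy. It is an assumption that this matches what the slide showed; the function name `epsilon_greedy` and its parameters are hypothetical. The point it illustrates is that the randomizer returns the probability of the chosen action alongside the action, so that the probability can be logged.

```python
import random

def epsilon_greedy(greedy_action, n_actions, epsilon, rng):
    """Return (action, probability of having chosen that action)."""
    if rng.random() < epsilon:
        # Explore: uniform over all actions, including the greedy one.
        action = rng.randrange(n_actions)
    else:
        # Exploit: follow the underlying policy's choice.
        action = greedy_action
    # Every action gets epsilon / n_actions from exploration;
    # the greedy action gets the remaining 1 - epsilon on top.
    prob = epsilon / n_actions
    if action == greedy_action:
        prob += 1.0 - epsilon
    return action, prob

if __name__ == "__main__":
    rng = random.Random(0)
    print(epsilon_greedy(greedy_action=2, n_actions=4, epsilon=0.1, rng=rng))
```

With 4 actions and epsilon 0.1, the greedy action is reported with probability 0.925 and every other action with probability 0.025.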

1 2 0.5
<action> <loss> <probability>
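One standard use of logged action, loss, and probability triples is inverse propensity scoring (IPS), which estimates the average loss a different policy would have incurred on the same data. The sketch below is a minimal illustration under the assumption that each record also carries the observed features; `ips_loss_estimate` and the sample records are hypothetical, with the first record mirroring the example line above.

```python
def ips_loss_estimate(logged, target_policy):
    """Estimate the target policy's average loss from logged data.

    logged: list of (features, action, loss, probability) records, where
    probability is the chance the logging policy gave to the logged action.
    """
    if not logged:
        raise ValueError("need at least one logged record")
    total = 0.0
    for features, action, loss, prob in logged:
        # Only records where the target policy agrees with the logged
        # action contribute; reweighting by 1 / prob corrects for how
        # often the logging policy took that action.
        if target_policy(features) == action:
            total += loss / prob
    return total / len(logged)

if __name__ == "__main__":
    # Hypothetical log; the first record is action 1, loss 2, probability 0.5.
    log = [("u1", 1, 2.0, 0.5), ("u2", 2, 0.0, 0.5), ("u3", 1, 1.0, 0.25)]
    print(ips_loss_estimate(log, lambda features: 1))
```

The estimate is unbiased only if the logged probabilities are correct and the logging policy gave nonzero probability to every action the target policy might choose.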


Offline L test set error.


http://hunch.net/~jl/interact.pdf
http://hunch.net/~mltf
http://alekhagarwal.net/bandits_and_rl/

Take-aways
1) Good fit for many problems
2) Fundamental questions have useful answers