Alekh Agarwal John Langford KDD Tutorial August 19

Alekh Agarwal, John Langford. KDD Tutorial, August 19. Slides and full references at http://hunch.net/~rwil/kdd2018.html

1 1 5 4 3 7 5 3 5 5 9 0 6 3 5 2 0 0

How about news? Repeatedly:
1. Observe features of user+articles
2. Choose a news article.
3. Observe click-or-not
Goal: Maximize fraction of clicks
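The observe/choose/observe loop above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the tutorial's own code: the click rates in `CLICK_RATE`, the two user types standing in for "features", and the epsilon-greedy chooser are all hypothetical choices made for the example.

```python
import random

# Hypothetical click probabilities: CLICK_RATE[user_type][article].
CLICK_RATE = [[0.05, 0.30, 0.10],
              [0.25, 0.05, 0.10]]


def choose(estimates, epsilon, rng):
    """Epsilon-greedy choice; returns (action, probability of that action)."""
    n = len(estimates)
    greedy = max(range(n), key=lambda a: estimates[a])
    action = rng.randrange(n) if rng.random() < epsilon else greedy
    # Every action gets epsilon/n; the greedy one gets the remaining mass too.
    prob = epsilon / n + ((1.0 - epsilon) if action == greedy else 0.0)
    return action, prob


def run(rounds=5000, epsilon=0.1, seed=0):
    """Simulate the loop; returns (fraction of clicks, interaction log)."""
    rng = random.Random(seed)
    n_users, n_articles = len(CLICK_RATE), len(CLICK_RATE[0])
    shown = [[0] * n_articles for _ in range(n_users)]
    clicked = [[0] * n_articles for _ in range(n_users)]
    log, total_clicks = [], 0
    for _ in range(rounds):
        # 1. Observe features (here just a user type).
        user = rng.randrange(n_users)
        # Estimated click rate per article; unseen articles start optimistic.
        estimates = [
            clicked[user][a] / shown[user][a] if shown[user][a] else 1.0
            for a in range(n_articles)
        ]
        # 2. Choose a news article.
        action, prob = choose(estimates, epsilon, rng)
        # 3. Observe click-or-not, for the chosen article only.
        click = 1 if rng.random() < CLICK_RATE[user][action] else 0
        shown[user][action] += 1
        clicked[user][action] += click
        total_clicks += click
        log.append((user, action, click, prob))
    return total_clicks / rounds, log
```

Calling `run()` returns the fraction of clicks together with a log of (features, action, click, probability) tuples. Recording the probability of the chosen action is what makes the log reusable for offline evaluation later.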

Is childproofing interesting to Alekh? A: Need Right Signal for Right Answer

[Figure: Model value over time. Y-axis: fraction of value retained, 0 to 1. X-axis: Day 1/Day 1 through Day 1/Day 3.]
A: The world changes!

Learning loop: Features (user history, news stories), Action (selected news story), Consequence (click-or-not)

A: $$$ Use free interaction data rather than expensive labels

AI: A function programmed with data
AI: An economically viable digital agent that explores, learns, and acts

user profile news story user’s response Goal:

Ex: Which advice? Repeatedly:
1. Observe features of user+advice
2. Choose an advice.
3. Observe steps walked
Goal: Healthy behaviors

Take-aways 1) Good fit for many real problems

Outline
1) Algs & Theory Overview
   1) Evaluate?
   2) Learn?
   3) Explore?
2) Things that go wrong in practice
3) Systems for going right
4) Really doing it in practice

Policy

Randomization Policy

1:2:0.5
<action>:<loss>:<probability>
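A logged record of this kind (which action was taken, what loss was observed, and with what probability the action was chosen) is enough to estimate how a different policy would have performed. The sketch below assumes the three fields are written colon-separated, as in `1:2:0.5`, and uses the standard inverse propensity scoring (IPS) estimator. The helper names `parse_record` and `ips_loss` are hypothetical, chosen for illustration only.

```python
def parse_record(text):
    """Parse 'action:loss:probability' into (int, float, float)."""
    action, loss, prob = text.split(":")
    return int(action), float(loss), float(prob)


def ips_loss(policy, logged):
    """Inverse propensity scoring estimate of a policy's average loss.

    logged: list of (features, action, loss, probability) tuples, where
    probability is the chance the logging policy gave to the logged action.
    """
    total = 0.0
    for features, action, loss, prob in logged:
        # Only rounds where the new policy agrees with the logged action
        # contribute; dividing by prob corrects for how often that happens.
        if policy(features) == action:
            total += loss / prob
    return total / len(logged)
```

For example, with two logged rounds chosen uniformly at random between two actions, `[("x", 1, 2.0, 0.5), ("x", 2, 1.0, 0.5)]`, the policy that always picks action 1 gets an estimate of (2.0 / 0.5 + 0) / 2 = 2.0, matching the loss observed for action 1.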

Offline L test set error.

http://hunch.net/~jl/interact.pdf
http://hunch.net/~mltf
http://alekhagarwal.net/bandits_and_rl/

Take-aways
1) Good fit for many problems
2) Fundamental questions have useful answers