Machine Teaching
Patrice Simard, Microsoft
12/9/2017


Outline
• Problem and Definition: Machine Teaching
• Insight: Teaching is a Form of Programming
• Machine Teaching in Action
• Summary

The teacher is a human. The learner is an ML algorithm.
“Machine Learning” is about extracting knowledge from data. There is a large community working on this…
“Machine Teaching” is about extracting knowledge from teachers. This is a new art…
More info on Machine Teaching at: https://arxiv.org/abs/1707.06742

Examples of Human Machine Teachers:
Saleema Amershi, Alicia Edelman Pelton, Patrice Simard, Soroush Ghorashi, Jina Suh, Matthew Hurst, Johan Verwey, Max Chickering, Geoff Cox, Riham Mansour, Chris Meek, Gonzalo Ramos, Mo Wang, John Wernsing, Jason Williams

What is Teacher Knowledge?
[Figure: the teacher's view of examples drawn from the sampling set, shown as rows of Concept(example) values.
  Concept in teacher’s head: 1 1 0 0
  has “recipe”: 0 1 0 0
  has “the”: 1 1 0 1
Other labels in the figure: Concepts, Feature assessment, Label assessment, Examples, Sampling set, Training set.]
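The idea that a concept or feature is a function applied to an example, Concept(example), can be sketched in a few lines of Python. This is only an illustration: the four example texts are hypothetical (the slide does not list them) and were chosen so that the rows come out like the values that survive from the figure, and treating has “recipe” and has “the” as simple word tests is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical examples; the slide does not show the underlying texts.
EXAMPLES: List[str] = [
    "the best way to bake bread",
    "the recipe for pancakes",
    "stock prices fell",
    "the weather is nice",
]

# Features named on the slide, written as word-membership tests (an assumption).
FEATURES: Dict[str, Callable[[str], int]] = {
    'has "recipe"': lambda text: int("recipe" in text.split()),
    'has "the"': lambda text: int("the" in text.split()),
}

# Labels for a hypothetical "is about cooking" concept in the teacher's head.
LABELS: List[int] = [1, 1, 0, 0]


def assess(examples: List[str],
           features: Dict[str, Callable[[str], int]]) -> Dict[str, List[int]]:
    """Apply every feature to every example: one row of 0/1 values per feature."""
    return {name: [feature(text) for text in examples]
            for name, feature in features.items()}


matrix = assess(EXAMPLES, FEATURES)
print(f"{'label':14} {LABELS}")
for name, row in matrix.items():
    print(f"{name:14} {row}")
```

Note that has “the” fires on a negative example, so on its own it says little about the concept; judging which features are useful is part of what the teacher contributes.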

Machine Teaching Summary
• Machine Teaching is different from Machine Learning
• Labels (1 bit) are but one form of teaching
• Other ways to teach shape the hypothesis space:
  • Features
  • Label schema
  • Feature schema
  • Example selection
  • Regularizers
  • …

Outline
• Problem and Definition: Machine Teaching?
• Insight: Teaching is a Form of Programming
• Machine Teaching in Action
• Summary

Commonalities between Machine Teaching and Programming
• A human (or a group of humans) creates a desired function
• The desired function needs to be specified
• The desired function can be decomposed into sub-functions
• The functions need to be debugged
• The functions need to be maintained
• The functions can be shared
• The desired function needs to be deployed

• ImageNet, word2vec features
• Features, training set ((X, Y) pairs), schemas, …
• Teaching expertise (feature tuning, modularity, exploration)
• Source control
• IDEs
• Open source
• Data collection (unlabeled), ground truth test set, PM-ing

History of Programming: Scale and Productivity Trend
Programming Evolution
• 1950s: Weather forecasting
• 1960s: Management Information Systems
• 1980s: PCs, games
• 1990s: Web programming
• 2000s: Web, apps
• 2014 (IDC): 18.5 million programmers, 7.5 M hobbyists
[Figure labels: Scientists; Scientists, Engineers; Hobbyists, software engineers. Axis labels: Performance, Productivity.]
Machine Learning Evolution
• 1990s: USPS OCR
• 2000s: Basic recognition, diagnostics
• 2010s: Perceptual tasks (vision, speech)
• 2015s: Perceptual tasks, bots, dialog, IoT
• 2020: 20 million teachers?
[Figure labels: Scientists, Engineers, Data Analysts; Engineers, Hobbyists, Domain experts. Axis labels: Performance, Productivity.]

Next Generation of “Teachers”
• ML expert (10Ks): Has a profound understanding of ML. Can modify an ML algorithm or architecture to improve performance.
• Data analyst (100Ks): Can analyze big data and detect trends and correlations using ML. Can train ML models on existing data to extract value.
• Programmer (10Ms): Has a sophisticated understanding of problem decomposition and problem solving. Can create features programmatically.
• Domain expert (100Ms): Understands the semantics of a problem. Can provide examples and counterexamples, and explain the difference between them.
[Axis labels: Performance, Productivity]
Machine Teaching Goal: Enable domain experts to transfer their knowledge into functions through “teaching”.

Tasks which are bottlenecked by “teaching”
• IoT: an appliance recognizing spoken commands.
• Back-end services: routing customer feedback, suggestions, and bug reports to the right department.
• Front-end services: ordering services (e.g. fast food drive-through).
• One-time assistant: a doctor or a lawyer building a model to sift through 100s of 1000s of cases.
• Large problems broken into many small problems: multiple topic classifiers/extractors for queries and documents to improve matching.
• …

Insights from Programming: Summary
• Programming and Machine Teaching are about encoding procedural knowledge into functions
• Both are subject to an explosion in demand
• Both have become bottlenecked by human productivity
• We have decades of lessons from the evolution of programming
• We are applying these lessons to Machine Teaching

Outline
• Scope and Definition: Machine Teaching?
• Insight: Teaching is a Form of Programming
• Machine Teaching in Action
• Summary

Assumptions that empower teachers to remove all errors:
• Learning power: the ML algorithm is consistent
• Featuring power: “teaching completeness”
• Composition power: models can be isolated and composed
• Data power: the sampling set is infinite and searchable

ML Prediction Errors
• E: How well we could possibly do (Bayes error)
• E(H): How well we can do in (hypothesis) space H
• E(H, D): How well we can do in space H with data D
$E(H, D) - E = [E(H, D) - E(H)] + [E(H) - E]$
Generalization error = estimation error + approximation error
• Estimation error, $E(H, D) - E(H)$: more examples needed
• Approximation error, $E(H) - E$: more features needed (change the hypothesis space)
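A small worked example may make the decomposition concrete. The three error rates below are hypothetical numbers chosen for illustration, and the function name `decompose` is not from the talk.

```python
from typing import Tuple


def decompose(bayes_error: float, best_in_space: float,
              achieved: float) -> Tuple[float, float]:
    """Split E(H, D) - E into (estimation error, approximation error)."""
    estimation = achieved - best_in_space        # E(H, D) - E(H)
    approximation = best_in_space - bayes_error  # E(H) - E
    return estimation, approximation


# Hypothetical error rates: E = 0.02, E(H) = 0.05, E(H, D) = 0.11.
E, E_H, E_HD = 0.02, 0.05, 0.11
estimation, approximation = decompose(E, E_H, E_HD)

print(f"generalization error: {E_HD - E:.2f}")
print(f"estimation error:     {estimation:.2f}  (more examples needed)")
print(f"approximation error:  {approximation:.2f}  (more features needed)")
```

With these numbers most of the gap is estimation error, so under this reading adding examples would help more than adding features.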

More Complete View: 4 Kinds of Errors
• Ignorance errors (estimation errors) -> add examples to training
• Feature blindness errors (approximation errors) -> add features
• Mislabel errors -> correct label
• Uncertainty errors -> ignore or postpone

Training Process: The 0-Error Design Pattern
Repeat until quality criteria are met:
    While there are no training errors do
        Search sampling set for example “test” errors
        Add test error to training set (fix ignorance errors)
    Fix training set error (feature blindness)
        If mislabel or uncertainty then correct label
        else add features
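The loop above can be sketched in Python on a toy problem. This is a minimal sketch under stated assumptions, not the implementation behind the talk: the learner is a lookup table over feature vectors, the teacher is an oracle function, the quality criterion is "no error left on a small finite sampling set", uncertainty errors are not modelled, and every name (`teach`, `feature_pool`, the example texts) is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Feature = Callable[[str], bool]
Example = Tuple[str, bool]            # (text, label)
Model = Dict[Tuple[bool, ...], bool]  # feature vector -> label


def featurize(text: str, features: List[Feature]) -> Tuple[bool, ...]:
    return tuple(f(text) for f in features)


def train(training: List[Example], features: List[Feature]) -> Model:
    # Toy learner (an assumption): keep the latest label seen per feature vector.
    return {featurize(text, features): label for text, label in training}


def predict(model: Model, text: str, features: List[Feature]) -> bool:
    return model.get(featurize(text, features), False)


def first_error(model: Model, examples: List[Example],
                features: List[Feature]) -> Optional[int]:
    for i, (text, label) in enumerate(examples):
        if predict(model, text, features) != label:
            return i
    return None


def teach(sampling_set: List[str], teacher: Callable[[str], bool],
          training: List[Example], features: List[Feature],
          feature_pool: List[Feature]):
    training, features, pool = list(training), list(features), list(feature_pool)
    while True:
        model = train(training, features)
        bad = first_error(model, training, features)
        if bad is None:
            # No training error: search the sampling set for a "test" error.
            labelled = [(text, teacher(text)) for text in sampling_set]
            miss = first_error(model, labelled, features)
            if miss is None:
                return model, training, features  # quality criterion met
            training.append(labelled[miss])       # fix an ignorance error
            continue
        # Training error: review the examples that share the clashing vector.
        vector = featurize(training[bad][0], features)
        clash = [i for i, (text, _) in enumerate(training)
                 if featurize(text, features) == vector]
        wrong = [i for i in clash if teacher(training[i][0]) != training[i][1]]
        if wrong:
            for i in wrong:                       # mislabel: correct the label
                training[i] = (training[i][0], teacher(training[i][0]))
        elif pool:
            features.append(pool.pop(0))          # feature blindness: add a feature
        else:
            raise RuntimeError("no feature left to separate the clashing examples")


# Hypothetical "is about cooking" concept and sampling set.
COOKING = {"chicken soup recipe", "the best recipe for bread", "how to bake a cake"}
SAMPLING_SET = ["chicken soup recipe", "the best recipe for bread",
                "the weather today", "stock market news",
                "how to bake a cake", "recipe for disaster movie review"]


def teacher(text: str) -> bool:
    return text in COOKING


def has_recipe(text: str) -> bool:
    return "recipe" in text


def has_bake(text: str) -> bool:
    return "bake" in text


def has_movie(text: str) -> bool:
    return "movie" in text


# The second starting example is deliberately mislabeled.
START = [("chicken soup recipe", True), ("the weather today", True)]
model, training, features = teach(SAMPLING_SET, teacher, START,
                                  [has_recipe], [has_bake, has_movie])
print(len(training), "training examples,", len(features), "features")
```

In this run the mislabeled starting example is corrected once a correctly labeled copy clashes with it, and the two pooled features are added only when correctly labeled examples cannot be told apart. A real teacher would invent the features rather than draw them from a fixed pool.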

Machine Teaching in Action: www.LUIS.ai

Machine Teaching in Action (continued)
“LUIS always performs best, independent from the domain, API.ai always worst, also independent from the domain, merely the second and third place changes.”
Reference: http://www.sigdial.org/workshops/conference18/proceedings/pdf/SIGDIAL22.pdf

Outline
• Scope and Definition: Machine Teaching?
• Insight: Teaching is a Form of Programming
• Machine Teaching in Action
• Summary

Summary
• Machine Teaching is about extracting knowledge from teachers.
• The teaching language is far richer than just providing labels.
• There is a deep connection between teaching and programming.
• The Machine Teaching languages and design patterns are active topics of research.