Principles of Machine Learning
Po-Chen Wu
Media IC and System Lab
Graduate Institute of Electronics Engineering
National Taiwan University

Outline
• Introduction to Machine Learning
• Theory of Generalization
• Learning Algorithm
• Hazard of Overfitting
• Blending and Bagging

Media IC & System Lab Po-Chen Wu (吳柏辰)

Outline: Introduction to Machine Learning

Mars One Project
• A one-way ticket to Mars.
• There is a total of 2,782 applicants.
• The application consists of the applicant’s:
  • General information
  • Motivational letter
  • Résumé
  • Video

Admission Ticket Approval
Applicant Information
  Age: 37 years
  Gender: Male
  Occupation: Professor
  Annual Salary: NTD 2,000
  Year in Job: 11 years
  Current Debt: NTD 110,000
• Unknown target function to be learned: “Should we approve the admission ticket or not?”

Formalize the Learning Problem
[Figure label: ML]

Learning Flow for Ticket Approval
[Figure: learning flow. Labels: ideal credit approval formula; historical records; ‘learned’ formula to be used]

The Learning Model
[Figure: learning flow. Labels: ideal credit approval formula; historical records; ‘learned’ formula to be used; set of candidate formulas]

Practical Definition of Machine Learning
[Figure: learning flow. Labels: ideal credit approval formula; historical records; ‘learned’ formula to be used; set of candidate formulas]

Outline: Theory of Generalization

Sex Ratio of EE Students
[Figure labels: Population; Sample]

Hoeffding’s Inequality
[Figure labels: Population; Sample; Hoeffding’s Inequality]
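
As a rough illustration of the inequality named on this slide, the sketch below assumes the standard two-sided form P[|ν − μ| > ε] ≤ 2·exp(−2ε²N), where μ is the population fraction, ν the sample fraction, and N the sample size. The function names, the seed, and the simulated population fraction of 0.5 are illustrative assumptions, not values from the slides.

```python
import math
import random


def hoeffding_bound(eps: float, n: int) -> float:
    """Upper bound on P[|nu - mu| > eps] for a sample of size n."""
    return 2.0 * math.exp(-2.0 * eps * eps * n)


def deviation_rate(mu: float, eps: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated samples whose sample fraction misses mu by more than eps."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        # Draw n items; each one belongs to the counted group with probability mu.
        nu = sum(rng.random() < mu for _ in range(n)) / n
        if abs(nu - mu) > eps:
            bad += 1
    return bad / trials


bound = hoeffding_bound(eps=0.1, n=100)
observed = deviation_rate(mu=0.5, eps=0.1, n=100, trials=2000)
print(f"bound={bound:.3f}, observed={observed:.3f}")
```

The observed rate should sit well below the bound, since Hoeffding's inequality holds for every μ and is usually loose for any particular one.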

Connection to Learning
[Figure labels: EE; Learning]

Error Measure
• In-sample Error
• Out-of-sample Error

Find a Separation Line
[Figure label: Classifier]

The Formal Guarantee

Find a Separation Line

• Vapnik-Chervonenkis (VC) bound:
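
A commonly quoted form of the VC bound is written out below. It is assumed here to match the referenced course material rather than copied from the slide. N is the number of examples, d_VC the VC dimension of the hypothesis set, and the stated form assumes N ≥ 2 and d_VC ≥ 2.

```latex
\mathbb{P}\Big[\,\big|E_{\text{in}}(g) - E_{\text{out}}(g)\big| > \epsilon\,\Big]
\;\le\; 4\,(2N)^{d_{\text{VC}}}\exp\!\Big(-\tfrac{1}{8}\,\epsilon^{2}N\Big)
```

Setting the right-hand side to δ and solving for ε gives the equivalent statement that, with probability at least 1 − δ, E_out(g) ≤ E_in(g) + sqrt((8/N) · ln(4(2N)^{d_VC} / δ)).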

Find a Separation Line
• Linear Separable!

Noise & Model Complexity
[Figure labels: High Complexity; Low Complexity]

Statistical Learning Flow
[Figure label: set of candidate formulas]

Outline: Learning Algorithm

A Simple Hypothesis Set: Perceptron
• Called the ‘perceptron’ hypothesis historically

Vector Form of Perceptron Hypothesis

Perceptron Learning Algorithm
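
A minimal sketch of the perceptron learning algorithm (PLA), assuming the usual formulation: the hypothesis is h(x) = sign(w·x) with a constant feature x0 = 1, and each step corrects one misclassified example with w ← w + y·x. The toy data, the random choice of which mistake to correct, and the update cap are illustrative assumptions.

```python
import random


def sign(value: float) -> int:
    """Return +1 for positive values and -1 otherwise (0 counts as -1)."""
    return 1 if value > 0 else -1


def predict(w: list[float], x: list[float]) -> int:
    """Perceptron hypothesis h(x) = sign(w . x); x[0] is the constant 1."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)))


def pla(data, max_updates: int = 10_000, seed: int = 0) -> list[float]:
    """Correct one misclassified point at a time until none remain."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for _ in range(max_updates):
        mistakes = [(x, y) for x, y in data if predict(w, x) != y]
        if not mistakes:
            break  # every training point is classified correctly
        x, y = rng.choice(mistakes)
        w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


# Toy linearly separable data: the label is +1 when x1 + x2 > 1.
points = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.3), (0.9, 0.8), (0.1, 0.6), (1.0, 0.5)]
data = [([1.0, x1, x2], 1 if x1 + x2 > 1 else -1) for x1, x2 in points]
w = pla(data)
print(w, all(predict(w, x) == y for x, y in data))
```

On linearly separable data the loop stops after finitely many corrections. On non-separable data it would only stop at the update cap.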

Line with Noise Tolerance

Pocket Algorithm
• Modify the PLA algorithm (black lines) by keeping the best weights in a pocket.
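
The pocket idea can be sketched as follows, assuming the same PLA update as before and counting training mistakes to decide which weights are "best". The toy data, the number of updates, and the helper names are illustrative assumptions.

```python
import random


def predict(w, x):
    """Perceptron hypothesis: sign of the inner product (0 counts as -1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


def error_count(w, data):
    """Number of training points that w misclassifies."""
    return sum(predict(w, x) != y for x, y in data)


def pocket(data, updates=200, seed=0):
    """Run PLA updates, but keep the best weights seen so far in the pocket."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    best_w, best_err = list(w), error_count(w, data)
    for _ in range(updates):
        mistakes = [(x, y) for x, y in data if predict(w, x) != y]
        if not mistakes:
            break
        x, y = rng.choice(mistakes)
        w = [wi + y * xi for wi, xi in zip(w, x)]
        err = error_count(w, data)
        if err < best_err:  # the only change relative to plain PLA
            best_w, best_err = list(w), err
    return best_w, best_err


# Toy data with one flipped label, so no line separates it perfectly.
points = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.3), (0.9, 0.8), (0.1, 0.6), (1.0, 0.5)]
data = [([1.0, a, b], 1 if a + b > 1 else -1) for a, b in points]
data.append(([1.0, 0.95, 0.9], -1))  # noisy point between two positive points
w, err = pocket(data)
print(w, err)
```

Plain PLA would keep cycling on this data. The pocket version returns the weights with the fewest training mistakes among those it visited.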

• Linear regression: find lines/hyperplanes with small residuals

Error Measure
• In-sample Error
• Out-of-sample Error

Recap: Matrix Calculus
• Denominator-layout notation

Linear Regression Algorithm
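
A minimal sketch of least-squares linear regression, assuming the squared-error measure and the closed-form solution of the normal equations (XᵀX) w = Xᵀy. The slide may state the solution through the pseudo-inverse instead; solving the normal equations is swapped in here and assumes XᵀX is invertible. The `solve` helper and the toy data are illustrative.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def linear_regression(xs, ys):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(x) for x in xs]  # prepend the constant feature x0 = 1
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


# Points generated from y = 1 + 2*x, so the fit should recover those weights.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
w = linear_regression(xs, ys)
print(w)
```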

Logistic Hypothesis

Logistic Function
• Smooth, monotonic, sigmoid function of s
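
For reference, a small sketch of the logistic function, assuming the standard definition θ(s) = 1 / (1 + e^(−s)). The branch on the sign of s is an implementation choice to avoid overflow, not something stated on the slide.

```python
import math


def logistic(s: float) -> float:
    """theta(s) = 1 / (1 + exp(-s)), written to avoid overflow for large |s|."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


for s in (-5.0, 0.0, 5.0):
    print(s, round(logistic(s), 4))
```

The values increase monotonically from near 0 to near 1 and satisfy θ(−s) = 1 − θ(s), which is the symmetry the cross-entropy error relies on.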

Cross-Entropy Error
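
A sketch of the cross-entropy error for logistic regression, assuming labels y in {−1, +1} and the usual in-sample form E_in(w) = (1/N) Σ ln(1 + exp(−y_n · w·x_n)). The function name and the two-point data set are illustrative.

```python
import math


def cross_entropy_error(w, data):
    """Average of ln(1 + exp(-y * w.x)) over the data, with y in {-1, +1}."""
    total = 0.0
    for x, y in data:
        s = y * sum(wi * xi for wi, xi in zip(w, x))
        # ln(1 + exp(-s)), computed stably for both signs of s
        total += max(-s, 0.0) + math.log1p(math.exp(-abs(s)))
    return total / len(data)


data = [([1.0, 2.0], 1), ([1.0, -1.0], -1)]
print(cross_entropy_error([0.0, 0.0], data))  # ln 2 for the all-zero weights
print(cross_entropy_error([0.0, 3.0], data))  # smaller: both points are on the correct side
```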

• No closed-form solution!

Iterative Optimization

Gradient Descent
• Gradient descent: a simple & popular optimization tool

[Figure panels: too slow; too unstable; better]
• The fixed learning rate
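
The effect of the learning rate can be sketched on a toy objective. The example assumes the plain fixed-learning-rate update w ← w − η·∇E(w) and uses E(w) = w² purely for illustration. The three η values are assumptions chosen to show a slow, an unstable, and a well-behaved run.

```python
def gradient_descent(grad, w0: float, eta: float, steps: int) -> float:
    """Repeat w <- w - eta * grad(w) with a fixed learning rate eta."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w


def grad(w: float) -> float:
    """Gradient of the toy objective E(w) = w**2, whose minimum is at w = 0."""
    return 2.0 * w


for label, eta in (("small", 0.01), ("large", 1.1), ("moderate", 0.3)):
    print(label, eta, gradient_descent(grad, w0=1.0, eta=eta, steps=20))
```

With the small rate the iterate barely moves toward 0 in 20 steps, with the large rate it overshoots and grows in magnitude, and with the moderate rate it ends up very close to the minimum.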

Logistic Regression Algorithm

Stochastic Gradient Descent (SGD)
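
A minimal sketch of SGD applied to logistic regression, assuming the cross-entropy error with labels in {−1, +1}, for which the single-example update is w ← w + η·θ(−y·w·x)·y·x. The learning rate, step count, seed, and toy data are illustrative assumptions.

```python
import math
import random


def logistic(s):
    """Numerically stable logistic function theta(s)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def sgd_logistic(data, eta=0.1, steps=2000, seed=0):
    """One randomly chosen example per update: w <- w + eta * theta(-y w.x) * y * x."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        x, y = rng.choice(data)
        s = y * sum(wi * xi for wi, xi in zip(w, x))
        scale = eta * logistic(-s) * y
        w = [wi + scale * xi for wi, xi in zip(w, x)]
    return w


# Toy 1-D data with a constant feature: positive label when the input exceeds 0.
data = [([1.0, v], 1 if v > 0 else -1) for v in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
w = sgd_logistic(data)
print(w)
```

Each update costs one example instead of a full pass over the data, at the price of a noisier path. The factor θ(−y·w·x) makes badly misclassified points move the weights the most.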

Three Linear Models
• Linear classification
• Linear regression
• Logistic regression

Outline: Hazard of Overfitting

Circular Separable
[Figure: scatter plot]

Circular Separable and Linear Separable
[Figure: two scatter plots]
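
One way to see why circularly separable data becomes linearly separable is a quadratic feature transform. The sketch assumes the transform z = (1, x1², x2²) and an example boundary circle of radius 2. Both are illustrative choices and not values read from the figure.

```python
def transform(x1: float, x2: float) -> list[float]:
    """Map a point to z = (1, x1^2, x2^2), so circles centered at the origin become lines."""
    return [1.0, x1 * x1, x2 * x2]


def classify(w: list[float], z: list[float]) -> int:
    """Linear classifier in the transformed space."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) > 0 else -1


# Assumed example boundary: the circle x1^2 + x2^2 = 4, i.e. w = (4, -1, -1) in z-space.
w = [4.0, -1.0, -1.0]
inside = classify(w, transform(1.0, 1.0))    # 4 - 1 - 1 > 0
outside = classify(w, transform(3.0, -2.0))  # 4 - 9 - 4 < 0
print(inside, outside)
```

Any linear algorithm (PLA, pocket, logistic regression) can then be run on the transformed points. The price is a richer hypothesis set, which raises model complexity.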

Hazard of Overfitting
• Vapnik-Chervonenkis (VC) bound (remember?):

Regularization: The Magic
[Figure: two panels (legend: Data, Target, Fit). Labels: overfitting; step back; ‘regularized fit’]

Stepping Back as Constraint

Regression with Looser Constraint

Regression with Softer Constraint

The Lagrange Multiplier

Ridge Regression
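
A sketch of ridge regression, assuming the standard closed form obtained by adding λ·‖w‖² to the squared error: solve (ZᵀZ + λI) w = Zᵀy. Here every weight, including the constant term, is penalized. The `solve` helper, the data, and the λ values are illustrative assumptions.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_regression(rows, ys, lam):
    """Weights minimizing squared error plus lam * ||w||^2: solve (Z^T Z + lam I) w = Z^T y."""
    d = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(a, b)


rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # first column is the constant feature
ys = [1.0, 3.0, 5.0, 7.0]
for lam in (0.0, 1.0, 100.0):
    w = ridge_regression(rows, ys, lam)
    print(lam, [round(v, 3) for v in w])
```

With λ = 0 this reduces to ordinary least squares. Larger λ shrinks the weights toward zero, which is the "stepping back" that trades a little in-sample error for lower model complexity.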

The Results
[Figure: three panels (legend: Data, Target, Fit). Panel labels: overfitting; ‘regularized fit’; underfitting]

The VC Message
[Figure: error versus model complexity. Curves: out-of-sample error; in-sample error. Regions: underfitting; overfitting]

Model Selection Problem
• Which one is better?

[Figure label: pick the best]

V-fold Cross Validation
[Figure labels: training; validation]
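
V-fold cross validation can be sketched as below, assuming the usual scheme: split the data into V parts, train on V − 1 of them, validate on the remaining part, and average the V validation errors. The squared-error measure, the interleaved fold assignment, and the `fit_mean` / `predict_mean` candidate model are hypothetical choices for illustration.

```python
def v_fold_splits(n: int, v: int):
    """Yield (train_indices, validation_indices) for each of the v folds."""
    folds = [list(range(i, n, v)) for i in range(v)]  # fold i takes every v-th index
    for i, val in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, val


def cross_validation_error(xs, ys, v, fit, predict):
    """Average validation squared error over the v folds."""
    total = 0.0
    for train, val in v_fold_splits(len(xs), v):
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        total += sum((predict(model, xs[j]) - ys[j]) ** 2 for j in val) / len(val)
    return total / v


# Hypothetical candidate model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(cross_validation_error(xs, ys, v=3, fit=fit_mean, predict=predict_mean))
```

For model selection, the same function would be called once per candidate model and the candidate with the lowest cross-validation error picked.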

Outline: Blending and Bagging

Blending (Aggregation)

Bagging (Bootstrap Aggregation)
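
A minimal sketch of bagging, assuming the usual recipe: draw bootstrap samples (N examples with replacement), train one hypothesis per sample, and combine them by uniform voting. The 1-D threshold "stump" used as the base learner, the number of rounds, and the toy data are hypothetical choices for illustration.

```python
import random


def bootstrap(data, rng):
    """Resample len(data) examples uniformly with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(data):
    """Hypothetical base learner: best threshold classifier sign(x - t) on 1-D data."""
    best_t, best_err = 0.0, len(data) + 1
    for t, _ in data:
        err = sum((1 if x > t else -1) != y for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def bagging(data, rounds=25, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap(data, rng)) for _ in range(rounds)]


def vote(thresholds, x):
    """Uniform vote of the stumps; ties go to -1."""
    total = sum(1 if x > t else -1 for t in thresholds)
    return 1 if total > 0 else -1


data = [(-3.0, -1), (-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1), (3.0, 1)]
stumps = bagging(data)
print(vote(stumps, -2.5), vote(stumps, 2.5))
```

The bootstrap step is what creates diversity among the hypotheses. Uniform voting is the simplest blending rule, and a weighted vote would be a natural variation.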

Reference
• Machine learning slides by Prof. Hsuan-Tien Lin: http://www.csie.ntu.edu.tw/~htlin/course/ml14fall/