Classification: Logistic Regression Hung-yi Lee 李宏毅


Step 1: Function Set
Function set: including all different w and b. f_{w,b}(x) = P_{w,b}(C1 | x) = σ(z), with z = w · x + b = Σ_i w_i x_i + b. If z ≥ 0 (so P_{w,b}(C1 | x) ≥ 0.5), output class 1; if z < 0, output class 2.

Step 1: Function Set
z is passed through the sigmoid function σ(z) = 1 / (1 + exp(-z)), which maps any real z to a value between 0 and 1, so the output can be read as the posterior probability P_{w,b}(C1 | x).
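As a concrete illustration, the function set can be written in a few lines of NumPy. This is a minimal sketch; the names sigmoid and f_wb and the example values are mine, not from the slides.

```python
import numpy as np

def sigmoid(z):
    """Squash any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def f_wb(x, w, b):
    """P(C1 | x) under one member of the function set, defined by (w, b)."""
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.3])
b = 0.1
p_c1 = f_wb(x, w, b)
print(p_c1, "-> class 1" if p_c1 >= 0.5 else "-> class 2")
```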

Step 2: Goodness of a Function
Training data: (x^1, ŷ^1), (x^2, ŷ^2), …, (x^N, ŷ^N). Given a set of w and b, what is its probability of generating the data? The best parameters w*, b* are the ones that maximize this likelihood L(w, b), which is equivalent to minimizing -ln L(w, b).

Step 2: Goodness of a Function
Each term of -ln L(w, b) is the cross entropy between two Bernoulli distributions.
Distribution p (the target): p(x = 1) = ŷ^n, p(x = 0) = 1 - ŷ^n.
Distribution q (the output): q(x = 1) = f(x^n), q(x = 0) = 1 - f(x^n).
Cross entropy: H(p, q) = -Σ_x p(x) ln q(x).

Step 2: Goodness of a Function
With the ground truth coded as ŷ^n = 1.0 (class 1) or 0.0 (class 2), minimizing -ln L(w, b) means minimizing the total cross entropy between the targets and the outputs: L(f) = Σ_n C(f(x^n), ŷ^n), where C(f(x^n), ŷ^n) = -[ŷ^n ln f(x^n) + (1 - ŷ^n) ln(1 - f(x^n))].
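A minimal sketch of this loss, assuming ŷ^n is coded as 1.0 for class 1 and 0.0 for class 2; the variable names and the eps clipping are my additions.

```python
import numpy as np

def cross_entropy_loss(f_x, y_hat, eps=1e-12):
    """-ln L(w, b): sum of cross entropies between the Bernoulli target
    distribution (y_hat, 1 - y_hat) and the model output (f_x, 1 - f_x)."""
    f_x = np.clip(f_x, eps, 1 - eps)  # avoid log(0)
    return -np.sum(y_hat * np.log(f_x) + (1 - y_hat) * np.log(1 - f_x))

y_hat = np.array([1.0, 1.0, 0.0])      # two class-1 examples, one class-2 example
f_x   = np.array([0.9, 0.6, 0.2])      # model outputs P(C1 | x)
print(cross_entropy_loss(f_x, y_hat))  # smaller is better
```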

Step 3: Find the best function
Minimize the cross entropy with gradient descent. Differentiating gives ∂(-ln L)/∂w_i = Σ_n -(ŷ^n - f_{w,b}(x^n)) x_i^n, so the update is w_i ← w_i - η Σ_n -(ŷ^n - f_{w,b}(x^n)) x_i^n.
Larger difference between the target ŷ^n and the output f_{w,b}(x^n) means a larger update.
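A sketch of the resulting update rule; the learning rate, the toy data, and the number of iterations are illustrative choices, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, y_hat, w, b, eta=0.1):
    """One gradient-descent step on -ln L(w, b).
    Equivalent to w_i <- w_i - eta * sum_n -(y_hat^n - f(x^n)) * x_i^n,
    so the update is larger when the output is further from the target."""
    f = sigmoid(X @ w + b)
    error = y_hat - f            # difference between target and output
    w = w + eta * X.T @ error
    b = b + eta * np.sum(error)
    return w, b

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_hat = np.array([1.0, 1.0, 0.0])
w, b = np.zeros(2), 0.0
for _ in range(100):
    w, b = gradient_step(X, y_hat, w, b)
print(w, b, sigmoid(X @ w + b))
```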

Logistic Regression + Square Error
Step 1: same function set f_{w,b}(x) = σ(Σ_i w_i x_i + b).
Step 2: loss L(f) = ½ Σ_n (f_{w,b}(x^n) - ŷ^n)².
Step 3: the gradient is ∂L/∂w_i = Σ_n (f_{w,b}(x^n) - ŷ^n) f_{w,b}(x^n) (1 - f_{w,b}(x^n)) x_i^n. If ŷ^n = 1: when f_{w,b}(x^n) = 1 (close to target) the gradient is 0, but when f_{w,b}(x^n) = 0 (far from target) the gradient is also 0. The same happens for ŷ^n = 0, so training can get stuck far from the target.
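The vanishing gradient is easy to check numerically. The sketch below compares the per-example gradients of square error and cross entropy for a single scalar feature when the target is ŷ = 1; the setup is an assumed toy example, not code from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads(z, y_hat, x=1.0):
    """Per-example dL/dw for one scalar feature x."""
    f = sigmoid(z)
    square_error  = (f - y_hat) * f * (1 - f) * x   # contains f(1-f): vanishes at f ~ 0 or 1
    cross_entropy = -(y_hat - f) * x                # stays large when f is far from y_hat
    return square_error, cross_entropy

for z in [-10.0, 0.0, 10.0]:   # far from target, in between, close to target (y_hat = 1)
    print(z, grads(z, y_hat=1.0))
```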

Cross Entropy v.s. Square Error
[Figure: total loss plotted over parameters w1 and w2 for cross entropy vs. square error; the square-error surface is very flat far from the minimum, while cross entropy keeps a useful gradient. Source: http://jmlr.org/proceedings/papers/v9/glorot10a.pdf]

Logistic Regression v.s. Linear Regression
• Step 1: Logistic regression: f_{w,b}(x) = σ(Σ_i w_i x_i + b), output between 0 and 1. Linear regression: f_{w,b}(x) = Σ_i w_i x_i + b, output can be any value.
• Step 2: Logistic regression: ŷ^n is 1 for class 1 and 0 for class 2; the loss is the cross entropy L(f) = Σ_n C(f(x^n), ŷ^n). Linear regression: ŷ^n is a real number; the loss is the square error L(f) = ½ Σ_n (f(x^n) - ŷ^n)².
• Step 3: Both are trained with the same gradient-descent update: w_i ← w_i - η Σ_n -(ŷ^n - f_{w,b}(x^n)) x_i^n.
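To make the comparison concrete, here is a sketch in which the two models share one update function and differ only in f; the data and learning rate are arbitrary illustrations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def update(X, y_hat, w, b, f, eta=0.1):
    """Shared update: w_i <- w_i - eta * sum_n -(y_hat^n - f(x^n)) x_i^n."""
    error = y_hat - f(X, w, b)
    return w + eta * X.T @ error, b + eta * np.sum(error)

logistic_f = lambda X, w, b: sigmoid(X @ w + b)   # output between 0 and 1
linear_f   = lambda X, w, b: X @ w + b            # output can be any value

X = np.array([[1.0, 2.0], [2.0, 0.5]])
w, b = np.zeros(2), 0.0
print(update(X, np.array([1.0, 0.0]), w, b, logistic_f))   # classification targets
print(update(X, np.array([3.2, -1.5]), w, b, linear_f))    # regression targets
```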

Discriminative v.s. Generative
P(C1 | x) = σ(w · x + b). The discriminative approach (logistic regression) directly finds w and b; the generative approach estimates the class means and covariance and derives w and b from them. Will we obtain the same set of w and b? The model (function set) is the same, but a different function may be selected from the same training data.

Generative v.s. Discriminative
On the Pokémon example with all features (HP, Att, SP Att, Def, Speed), the generative model reaches 73% accuracy and the discriminative model 79% accuracy.

Generative v.s. Discriminative
• Example with two binary features x1, x2.
Training data: 1 example with (x1, x2) = (1, 1) in Class 1; 4 examples of (1, 0), 4 of (0, 1), and 4 of (0, 0), all in Class 2.
Testing data: x = (1, 1). Class 1 or Class 2? How about Naïve Bayes, i.e. assuming P(x | Ci) = P(x1 | Ci) P(x2 | Ci)?

With Naïve Bayes on this training data: P(C1) = 1/13, P(C2) = 12/13, P(x1 = 1 | C1) = P(x2 = 1 | C1) = 1, and P(x1 = 1 | C2) = P(x2 = 1 | C2) = 4/12 = 1/3. For the testing point x = (1, 1): P(C1 | x) = P(x | C1) P(C1) / [P(x | C1) P(C1) + P(x | C2) P(C2)] = (1/13) / (1/13 + (1/9)(12/13)) = 3/7 < 0.5, so the generative model predicts Class 2, even though the only (1, 1) in the training data belongs to Class 1.
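The posterior can be reproduced in a few lines; this is a sketch of the Naïve Bayes computation, with the counts taken from the example above.

```python
# Training counts from the toy example: 1 x (1,1) in class 1;
# 4 x (1,0), 4 x (0,1), 4 x (0,0) in class 2.
p_c1, p_c2 = 1 / 13, 12 / 13
p_x1_given_c1, p_x2_given_c1 = 1.0, 1.0      # every class-1 example has x1 = x2 = 1
p_x1_given_c2, p_x2_given_c2 = 4 / 12, 4 / 12

# Naive Bayes for the test point x = (1, 1): P(x | C) = P(x1 | C) * P(x2 | C)
num = p_x1_given_c1 * p_x2_given_c1 * p_c1
den = num + p_x1_given_c2 * p_x2_given_c2 * p_c2
print(num / den)   # = 3/7, about 0.43 < 0.5, so the generative model picks class 2
```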

Generative v.s. Discriminative
• People usually believe the discriminative model is better.
• Benefits of the generative model:
  • With an assumed probability distribution, less training data is needed and the model is more robust to noise.
  • Priors and class-dependent probabilities can be estimated from different sources.

Multi-class Classification (3 classes as example)
C1: w^1, b_1, z_1 = w^1 · x + b_1. C2: w^2, b_2, z_2 = w^2 · x + b_2. C3: w^3, b_3, z_3 = w^3 · x + b_3.
Softmax: y_i = exp(z_i) / Σ_j exp(z_j). For example, z = (3, 1, -3) gives exp(z) ≈ (20, 2.7, 0.05) and a softmax output of about (0.88, 0.12, ≈0).
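The numbers on the slide can be reproduced directly; this is a minimal softmax sketch (the stability shift by max(z) is my addition).

```python
import numpy as np

def softmax(z):
    """y_i = exp(z_i) / sum_j exp(z_j); subtract max(z) for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([3.0, 1.0, -3.0])
print(np.exp(z))     # ~ [20.09, 2.72, 0.05]
print(softmax(z))    # ~ [0.88, 0.12, 0.00]
```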

Multi-class Classification (3 classes as example) [Bishop, P209-210]
The target ŷ is a one-hot vector (e.g. (1, 0, 0) for class 1), and the loss is the cross entropy between the softmax output y and the target: -Σ_i ŷ_i ln y_i.
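A sketch of the multi-class loss with a one-hot target; the example values are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

y = softmax(np.array([3.0, 1.0, -3.0]))     # model output
y_hat = np.array([1.0, 0.0, 0.0])           # one-hot target: the example belongs to class 1
cross_entropy = -np.sum(y_hat * np.log(y))  # reduces to -ln y_1
print(cross_entropy)
```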

Limitation of Logistic Regression
Input features x1, x2 with labels: (0, 0) → Class 2, (0, 1) → Class 1, (1, 0) → Class 1, (1, 1) → Class 2. Can we classify these with logistic regression? A single logistic regression gives one linear boundary (z ≥ 0 on one side, z < 0 on the other), and no straight line can put (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.

Limitation of Logistic Regression
• Feature transformation: for example, let x1' be the distance of x to (0, 0) and x2' the distance of x to (1, 1); in the transformed space the two classes become linearly separable (sketched below). Finding a good transformation is not always easy; domain knowledge can be helpful.
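A sketch of this distance-based transformation on the four points; the boundary x1' + x2' ≥ 1.7 is my own illustration of one possible separating line, not taken from the slides.

```python
import numpy as np

points = {(0, 0): "class 2", (0, 1): "class 1", (1, 0): "class 1", (1, 1): "class 2"}

def transform(x):
    """x1' = distance to (0, 0), x2' = distance to (1, 1)."""
    x = np.asarray(x, dtype=float)
    return (np.linalg.norm(x - np.array([0.0, 0.0])),
            np.linalg.norm(x - np.array([1.0, 1.0])))

for x, label in points.items():
    x1p, x2p = transform(x)
    # in the transformed space, x1' + x2' >= 1.7 holds only for the class-1 points
    print(x, "->", (round(x1p, 2), round(x2p, 2)), label)
```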

Limitation of Logistic Regression
• Cascading logistic regression models: one set of logistic regressions performs the feature transformation, and a final logistic regression on the transformed features performs the classification (bias terms are ignored in the figure).

[Figure: with weights (-2, 2) and (2, -2) and biases -1 on the two feature-transforming logistic regressions, the inputs map to transformed features of roughly (0.73, 0.05) for (0, 1), (0.05, 0.73) for (1, 0), and (0.27, 0.27) for both (0, 0) and (1, 1), which the final logistic regression can separate with a straight line.]
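A sketch of the cascade, assuming the weights read off the figure, i.e. (-2, 2) and (2, -2) with bias -1 for the two feature-transforming logistic regressions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def transform(x1, x2):
    """Two logistic regressions used purely for feature transformation."""
    x1p = sigmoid(-2 * x1 + 2 * x2 - 1)   # weights (-2, 2), bias -1 (assumed from the figure)
    x2p = sigmoid(2 * x1 - 2 * x2 - 1)    # weights (2, -2), bias -1
    return x1p, x2p

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x1p, x2p = transform(*x)
    print(x, "->", (round(x1p, 2), round(x2p, 2)))
# (0,1) -> (0.73, 0.05), (1,0) -> (0.05, 0.73), and both (0,0) and (1,1) -> (0.27, 0.27),
# so a final logistic regression on (x1', x2') can separate the two classes.
```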

Deep Learning!
All the parameters of the logistic regressions are jointly learned. Each cascaded logistic regression is a "neuron", and connecting them for feature transformation and classification gives a neural network.

Reference
• Bishop: Chapter 4.3

Appendix


Three Steps
• Step 1. Function Set (Model): f_{w,b}(x) = σ(w · x + b); if f_{w,b}(x) ≥ 0.5, output y = class 1, otherwise output y = class 2.
• Step 2. Goodness of a function: define a loss function L(f) measuring how bad f is.
• Step 3. Find the best function: gradient descent.

Step 2: Loss function
Label the classes ŷ^n = +1 (class 1) and ŷ^n = -1 (class 2). The ideal loss counts the number of misclassified examples: L(f) = Σ_n δ(g(x^n) ≠ ŷ^n). Plotted as a function of ŷ^n z^n, the ideal loss is a 0/1 step function, which cannot be optimized with gradient descent, so we minimize a smooth approximation l(f) instead.

Step 2: Loss function
Cross entropy: with the ground truth putting probability 1.0 on the correct class, the per-example cross-entropy loss is -ln σ(ŷ^n z^n) = ln(1 + exp(-ŷ^n z^n)), a smooth, decreasing function of ŷ^n z^n.

Step 2: Loss function
Dividing the cross-entropy loss by ln 2 makes it an upper bound of the ideal loss, so pushing the cross entropy down also pushes down the number of classification errors.
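A small numerical check of this bound (a sketch; ŷ^n z^n > 0 means a correct classification, and the loss form ln(1 + exp(-ŷz)) / ln 2 is the cross entropy from the previous slide).

```python
import numpy as np

def ideal_loss(yz):
    """0/1 loss as a function of y_hat * z."""
    return (yz < 0).astype(float)

def scaled_cross_entropy(yz):
    """ln(1 + exp(-y_hat * z)) divided by ln 2."""
    return np.log(1 + np.exp(-yz)) / np.log(2)

yz = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(ideal_loss(yz))            # [1, 1, 0, 0, 0]
print(scaled_cross_entropy(yz))  # always >= the ideal loss, so it is an upper bound
```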