Support Vector Machine

  • Slides: 14
Linear Discriminant Function

• g(x) = w^T x + b is a linear function: w^T x + b = 0 defines a hyperplane in the feature space.
• w / ||w|| is the (unit-length) normal vector of the hyperplane.
• Points with w^T x + b > 0 lie on one side of the hyperplane; points with w^T x + b < 0 lie on the other.
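The discriminant above can be sketched in a few lines of NumPy; the weight vector `w`, offset `b`, and the two test points below are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Hypothetical 2-D example: w and b are chosen for illustration only.
w = np.array([1.0, -1.0])   # normal vector of the hyperplane w^T x + b = 0
b = 0.5

def g(x):
    """Linear discriminant function g(x) = w^T x + b."""
    return w @ x + b

# Points on opposite sides of the hyperplane get opposite signs.
x_pos = np.array([2.0, 0.0])   # g = 2.5 > 0  -> class +1
x_neg = np.array([0.0, 2.0])   # g = -1.5 < 0 -> class -1
print(int(np.sign(g(x_pos))), int(np.sign(g(x_neg))))
```

The predicted class is simply the sign of g(x), which is why the hyperplane g(x) = 0 is the decision boundary.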

Linear Discriminant Function

• How would you classify these points (legend: +1 vs. -1) using a linear discriminant function so as to minimize the error rate?
• There are infinitely many answers!
• Which one is the best?

Large Margin Linear Classifier

• The linear discriminant function (classifier) with the maximum margin is the best.
• The margin is defined as the width by which the boundary could be widened before hitting a data point (the "safe zone").
• Why is it the best? It is robust to outliers and thus has strong generalization ability.

Large Margin Linear Classifier

• Given a set of data points {(x_i, y_i)}, where y_i ∈ {+1, -1}: correct separation requires y_i (w^T x_i + b) > 0 for all i.
• With a scale transformation on both w and b, the above is equivalent to y_i (w^T x_i + b) ≥ 1 for all i.
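The scale transformation can be checked numerically. This is a minimal sketch with an assumed toy data set and an arbitrary separating hyperplane; rescaling (w, b) leaves the boundary unchanged, so we may normalise until the closest point satisfies the constraint with equality:

```python
import numpy as np

# Toy data and hyperplane chosen for illustration (an assumption).
X = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([1.0, -1.0])
w, b = np.array([1.0, 1.0]), -2.0          # some separating hyperplane

margins = y * (X @ w + b)                  # y_i (w^T x_i + b), all > 0
s = margins.min()                          # rescale by the smallest margin
w2, b2 = w / s, b / s                      # same hyperplane, new scale
print((y * (X @ w2 + b2)).min())           # closest point now has margin 1
```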

Large Margin Linear Classifier

• We know that w^T x+ + b = 1 and w^T x- + b = -1, where x+ and x- are support vectors lying on the two margin boundaries (w^T x + b = ±1).
• The margin width is: M = (x+ - x-) · (w / ||w||) = 2 / ||w||.
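The margin formula 2 / ||w|| is a one-liner; the weight vector below is an assumed example chosen so the norm is easy to verify by hand:

```python
import numpy as np

# The margin is the distance between the planes w^T x + b = 1 and
# w^T x + b = -1, which equals 2 / ||w||.  w here is illustrative.
w = np.array([3.0, 4.0])                 # ||w|| = 5
margin = 2.0 / np.linalg.norm(w)
print(margin)                            # 0.4
```

Note that shrinking ||w|| widens the margin, which is why the optimization below minimizes ||w||.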

Large Margin Linear Classifier

• Formulation: maximize the margin 2 / ||w||, i.e. minimize (1/2) ||w||^2
• such that y_i (w^T x_i + b) ≥ 1 for all i.
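The hard-margin formulation can be illustrated by checking a candidate solution against its objective and constraints. The toy data and the (w, b) below are assumptions for illustration; this verifies feasibility rather than solving the quadratic program:

```python
import numpy as np

# Hard-margin objective: minimise (1/2)||w||^2
# subject to y_i (w^T x_i + b) >= 1.  Data and (w, b) are assumed.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([0.5, 0.0]), 0.0      # margin 2 / ||w|| = 4

constraints = y * (X @ w + b)         # each entry must be >= 1
objective = 0.5 * (w @ w)             # the quantity being minimised
print(constraints.min() >= 1, objective)
```

In practice this constrained quadratic program is handed to a QP or SMO solver; the support vectors are exactly the points where the constraint holds with equality.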

Large Margin Linear Classifier

• What if the data is not linearly separable (noisy data, outliers, etc.)?
• Slack variables ξ_i can be added to allow misclassification of difficult or noisy data points: y_i (w^T x_i + b) ≥ 1 - ξ_i, with ξ_i ≥ 0.

Large Margin Linear Classifier

• Formulation: minimize (1/2) ||w||^2 + C Σ_i ξ_i
• such that y_i (w^T x_i + b) ≥ 1 - ξ_i and ξ_i ≥ 0 for all i.
• The parameter C can be viewed as a way to control over-fitting: large C penalizes slack heavily (approaching the hard margin), while small C tolerates more violations for a wider margin.
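The soft-margin objective above can be minimised directly. This is a minimal sketch using plain subgradient descent on the equivalent hinge-loss form (an assumption for illustration; real SVMs are usually trained with dedicated QP/SMO solvers), on assumed toy data:

```python
import numpy as np

# Soft-margin objective: (1/2)||w||^2 + C * sum_i xi_i, where
# xi_i = max(0, 1 - y_i (w^T x_i + b)) is the slack (hinge loss).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C, lr = 1.0, 0.1
w, b = np.zeros(2), 0.0

for _ in range(200):
    margins = y * (X @ w + b)
    viol = margins < 1                        # points with nonzero slack
    # Subgradient of the objective w.r.t. w and b:
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

print(all(np.sign(X @ w + b) == y))           # training data separated
```

Raising C pushes the solution toward zero slack; lowering it lets the regularization term (1/2)||w||^2 dominate, trading training errors for a wider margin.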