Support vector machines
Support vector machines
• When the data is linearly separable, which of the many possible solutions should we prefer?
• SVM criterion: maximize the margin, or distance between the hyperplane and the closest training example
[Figure: linearly separable data with the max-margin hyperplane; the margin and support vectors are labeled]
Finding the maximum margin hyperplane
• Find separating hyperplane that maximizes the distance to the closest training example
[Figure: maximum margin hyperplane with the margin and support vectors labeled]
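In the standard notation (a sketch, assuming labels y_i ∈ {−1, +1} and hyperplane w · x + b = 0), the hard-margin problem is

\[ \min_{w,\,b}\ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{ for all } i \]

The constraints put every training point on the correct side with functional margin at least 1, and the resulting geometric margin is 2/‖w‖, so minimizing ‖w‖ maximizes the margin.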
“Soft margin” formulation
• What about non-separable data?
• And even for separable data, we may prefer a larger margin with a few constraints violated
“Soft margin” formulation: hinge loss
[Figure: hinge loss plotted against y·f(x), with ticks at −1, 0, +1; the loss is zero for correctly classified points beyond the margin and increases linearly for incorrectly classified points]
• Recall the hinge loss used by the perceptron update algorithm!
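In symbols (the standard definition, with score f(x) = w · x + b):

\[ \ell_{\text{hinge}}(y, f(x)) = \max\bigl(0,\ 1 - y\,f(x)\bigr) \]

The loss is zero once a point is correctly classified with margin y f(x) ≥ 1, and grows linearly as y f(x) falls below 1.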
“Soft margin” formulation
• Maximize the margin (a.k.a. regularization) while minimizing the misclassification loss
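Combining the two terms gives the usual unconstrained soft-margin objective (one common parameterization, the λ form used by Pegasos on the next slide):

\[ \min_{w,\,b}\ \frac{\lambda}{2}\|w\|^2 \;+\; \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\ 1 - y_i\,(w \cdot x_i + b)\bigr) \]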
SGD optimization (omitting bias)
S. Shalev-Shwartz et al., Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, Mathematical Programming, 2011
SVM vs. perceptron
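A standard way to draw the contrast is through the per-example loss (with score f(x) = w · x + b):

\[ \ell_{\text{perceptron}} = \max\bigl(0,\ -y\,f(x)\bigr) \qquad \text{vs.} \qquad \ell_{\text{hinge}} = \max\bigl(0,\ 1 - y\,f(x)\bigr) \]

The perceptron stops updating as soon as every point is on the correct side, accepting any separator; the hinge loss keeps pushing until every point clears a unit margin, and together with the ‖w‖² penalty this yields the maximum-margin solution.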
Dual SVM formulation
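For reference, the standard soft-margin dual problem (with box constraint C) is

\[ \max_{\alpha}\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \]

The solution gives w = Σ_i α_i y_i x_i, and the support vectors are exactly the examples with α_i > 0. Note that the data enters only through dot products x_i · x_j, which is what makes the kernel trick on the following slides possible.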
Nonlinear SVMs
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x)
Slide credit: Andrew Moore
Nonlinear SVMs
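Concretely, since the dual uses the data only through dot products, we can replace x_i · x_j with a kernel that computes the dot product in feature space without ever forming φ explicitly (the standard kernel trick):

\[ K(x, x') = \varphi(x) \cdot \varphi(x'), \qquad f(x) = \sum_{i \in \text{SVs}} \alpha_i y_i\, K(x_i, x) + b \]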
Example
[Figure: 1D data along the x axis is not linearly separable; mapping x → (x, x²) makes it linearly separable in 2D]
Kernel example 1: Polynomial
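The polynomial kernel of degree d (standard form; c ≥ 0 is a free offset parameter):

\[ K(x, x') = (x \cdot x' + c)^d \]

For instance, with d = 2, c = 1, and x ∈ ℝ², expanding (x · x′ + 1)² shows it equals φ(x) · φ(x′) for an explicit six-dimensional φ containing all monomials of degree at most 2.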
Kernel example 2: Gaussian

\[ K(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right) \]

[Figure: kernel value K(x, x′) decaying with the distance ‖x − x′‖]
• It is also called a Radial Basis Function (RBF) kernel
[Figure: nonlinear decision boundary of a Gaussian-kernel SVM, with the support vectors (SVs) highlighted]
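A small sketch of fitting an RBF-kernel SVM in Python with scikit-learn (the library and its parameters are real; the toy data is made up for illustration):

import numpy as np
from sklearn.svm import SVC

# 1D data that is not linearly separable: class +1 near the origin,
# class -1 further out (same intuition as the x -> x^2 example above)
X = np.array([[-0.5], [0.2], [0.4], [-2.0], [1.8], [2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# In scikit-learn's RBF kernel, gamma plays the role of 1/(2*sigma^2)
clf = SVC(kernel="rbf", gamma=0.5, C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)           # support vectors found during training
print(clf.predict([[0.0], [3.0]]))    # expected: [ 1 -1]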
SVM: Pros and cons
• Pros
  • Margin maximization and the kernel trick are elegant, amenable to convex optimization and theoretical analysis
  • The SVM loss gives very good accuracy in practice
  • Linear SVMs can scale to large datasets
  • Kernel SVMs are flexible and can be used with problem-specific kernels
  • Perfect “off-the-shelf” classifier; many packages are available
• Cons
  • Kernel SVM training does not scale to large datasets: memory cost is quadratic and computation cost even worse
  • “Shallow” method: the predictor is a “flat” combination of kernel functions of support vectors and the test example; no explicit feature representations are learned