Recent Results in Support Vector Machines
Dave Musicant

Title-slide graphic generated with the Lucent Technologies demonstration 2-D Pattern Recognition Applet at http://svm.research.bell-labs.com/SVT/SVMsvt.html

Simple Linear Perceptron
[Figure: training points from Class -1 and Class 1 separated by a line]
• Goal: Find the best line (or hyperplane) to separate the training data.
• How to formalize?
  – In two dimensions, the equation of the line is given by: w_1 x_1 + w_2 x_2 = b
  – Better notation for n dimensions: treat each data point x and the coefficients w as vectors. Then the equation is given by: w · x = b

Simple Linear Perceptron (cont.)
[Figure: separating line with Class -1 on one side and Class 1 on the other]
• The simple linear perceptron is a classifier, as shown in the picture
  – Points that fall on the right are classified as "1"
  – Points that fall on the left are classified as "-1"
• Therefore: using the training set, find a hyperplane (line) so that
  w · x_i > b for points in class 1, and w · x_i < b for points in class -1
• This is a good starting point. But we can do better!
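
A minimal sketch of this decision rule (not from the original slides; the points, labels, w, and b below are made-up values for illustration):

```python
import numpy as np

# Hypothetical 2-D training points and their labels (+1 / -1).
X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A candidate hyperplane w . x = b (values chosen by hand for illustration).
w = np.array([1.0, 1.0])
b = 0.0

# Perceptron-style decision rule: the sign of w . x - b.
predictions = np.sign(X @ w - b)
print(predictions)               # [ 1.  1. -1. -1.]
print(np.all(predictions == y))  # True: this plane separates the training set
```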

Finding the Best Plane
[Figure: two candidate separating planes drawn through the same training set]
• Not all planes are equal. Which of the two planes shown is better?
• Both planes accurately classify the training set.
• The solid green plane is the better choice, since it is more likely to do well on future test data: it is further away from the data.
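
"Further away from the data" can be made concrete as the geometric distance from each point to the plane. A small sketch, reusing the made-up w, b, and points from above:

```python
import numpy as np

X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
w = np.array([1.0, 1.0])
b = 0.0

# Geometric distance from each point to the plane w . x = b is |w . x - b| / ||w||.
distances = np.abs(X @ w - b) / np.linalg.norm(w)
print(distances)
print(distances.min())  # how far the closest training point is from this particular plane
```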

Separating the planes
[Figure: classification plane with two parallel bounding planes touching the nearest points of Class -1 and Class 1]
• Construct the bounding planes:
  – Draw two planes parallel to the classification plane.
  – Push them as far apart as possible, until they hit data points.
  – The classification plane whose bounding planes are furthest apart is the best one.

Recap: Finding the Best Plane
[Figure: bounding planes w · x = b + 1 and w · x = b - 1 drawn around the classification plane]
• Details
  – All points in class 1 should be to the right of bounding plane 1: w · x_i ≥ b + 1
  – All points in class -1 should be to the left of bounding plane -1: w · x_i ≤ b - 1
  – Pick y_i to be +1 or -1 depending on the classification. Then the above two inequalities can be written as one: y_i (w · x_i - b) ≥ 1
  – The distance between the bounding planes should be maximized.
  – The distance between the bounding planes is given by: 2 / ||w||
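
The combined inequality and the margin formula are easy to check numerically. A minimal sketch with hypothetical values:

```python
import numpy as np

X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([0.5, 0.5])
b = 0.0

# Combined bounding-plane constraint: y_i * (w . x_i - b) >= 1 for every training point.
constraints_ok = np.all(y * (X @ w - b) >= 1)

# Distance between the two bounding planes: 2 / ||w||.
margin = 2.0 / np.linalg.norm(w)
print(constraints_ok, margin)  # True, margin ~ 2.83 for this particular w
```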

The Optimization Problem
• The previous slide can be rewritten as:
  minimize over w, b:  (1/2) ||w||^2
  subject to:          y_i (w · x_i - b) ≥ 1 for every training point i
• This is a mathematical program.
  – An optimization problem subject to constraints
  – More specifically, a quadratic program
  – There are high-powered software tools for solving this kind of problem (both commercial and academic)
  – These general-purpose tools are slow for this particular problem
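
As a rough illustration of what "solving the quadratic program" means, here is a sketch using cvxpy, a general-purpose modeling tool (not one of the solvers referenced in the talk); the data are made up:

```python
import numpy as np
import cvxpy as cp  # generic QP modeling layer, used here only for illustration

X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Maximizing the margin 2/||w|| is the same as minimizing (1/2)||w||^2.
objective = cp.Minimize(0.5 * cp.sum_squares(w))
# One constraint per training point: y_i * (w . x_i - b) >= 1.
constraints = [cp.multiply(y, X @ w - b) >= 1]

cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```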

Data Which is Not Linearly Separable
[Figure: overlapping classes; an arrow marks the error of a point lying on the wrong side of its bounding plane]
• What if a separating plane does not exist?
• Find the plane that maximizes the margin and minimizes the errors on the training points.
• Take the original inequality and add a slack variable ξ_i ≥ 0 to measure the error: y_i (w · x_i - b) ≥ 1 - ξ_i
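
The slack values can be computed directly for any candidate plane as ξ_i = max(0, 1 - y_i (w · x_i - b)). A small sketch with made-up data in which one point is on the wrong side:

```python
import numpy as np

# Hypothetical non-separable data: the last point sits on the wrong side of the plane below.
X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [1.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([0.5, 0.5])
b = 0.0

# Slack xi_i = max(0, 1 - y_i * (w . x_i - b)): zero for points that satisfy the
# bounding-plane constraint, positive in proportion to how badly they violate it.
slack = np.maximum(0.0, 1.0 - y * (X @ w - b))
print(slack)  # only the last point gets a positive slack value
```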

The Support Vector Machine
• Push the planes apart and minimize the error at the same time:
  minimize over w, b, ξ:  (1/2) ||w||^2 + C Σ_i ξ_i
  subject to:             y_i (w · x_i - b) ≥ 1 - ξ_i  and  ξ_i ≥ 0
• C is a positive number that is chosen to balance these two goals.
• This problem is called a Support Vector Machine, or SVM.
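
Off-the-shelf libraries solve this soft-margin problem directly. A sketch using scikit-learn's SVC, whose C parameter plays the same balancing role (scikit-learn is not mentioned in the talk; the data are hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical non-separable training data.
X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [1.0, 1.0]])
y = np.array([1, 1, -1, -1])

# kernel='linear' keeps the separating surface a plane; C trades margin width
# against training errors (large C -> fewer errors tolerated, smaller margin).
clf = SVC(kernel='linear', C=10.0)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)  # the plane's w, and the negative of b in the slides' notation
print(clf.predict([[0.0, 0.5]]))
```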

Terminology
• Those points that touch a bounding plane, or lie on the wrong side of it, are called support vectors.
• If all the data points except the support vectors were removed, the solution would turn out the same.
• The SVM is mathematically equivalent to a force and torque equilibrium (hence the name "support vectors").
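
A fitted scikit-learn SVC exposes its support vectors, which makes the "only the support vectors matter" claim easy to check; a sketch with made-up data:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0], [0.5, 0.5]])
y = np.array([1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=10.0).fit(X, y)

# The fitted model exposes the support vectors directly.
print(clf.support_vectors_)

# Refitting on the support vectors alone gives (essentially) the same plane.
sv = clf.support_  # indices of the support vectors
clf_sv = SVC(kernel='linear', C=10.0).fit(X[sv], y[sv])
print(clf.coef_, clf_sv.coef_)
```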

Example from Carleton College
• 1850 students
• 4-year undergraduate liberal arts college
• Ranked 4th in the nation by US News and World Report
• 15-20 computer science majors per year
• All research assistants are full-time undergraduates

Student Research Example
• Goal: automatically generate a "frequently asked questions" list from discussion groups
• Subgoal #1: Given a corpus of discussion group postings, identify those messages that contain questions
  – Recruit student volunteers to identify questions
  – Learn the classification
• Work by students Sarah Allen, Janet Campbell, Ester Gubbrud, Rachel Kirby, Lillie Kittredge

Building A Training Set
[Figure only]

Building A Training Set
• Which sentences are questions in the following text?

  From: oehler@yar.cs.wisc.edu (Wonko the Sane)

  I was recently talking to a possible employer (mine! :-) ) and he made a reference to a 48-bit graphics computer/image processing system. I seem to remember it being called IMAGE or something akin to that. Anyway, he claimed it had 48-bit color + a 12-bit alpha channel. That's 60 bits of info--what could that possibly be for? Specifically the 48-bit color? That's 280 trillion colors, many more than the human eye can resolve. Is this an anti-aliasing thing? Or is this just some magic number to make it work better with a certain processor.

Representing the training set
• Each document is a point
• Each potential word is a column (bag of words)
• Other pre-processing tricks
  – Remove punctuation
  – Remove "stop words" such as "is", "a", etc.
  – Use stemming to remove "ing", "ed", etc. from similar words
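
A sketch of this bag-of-words representation using scikit-learn's CountVectorizer (not a tool named in the talk); the two postings are invented, and stemming is left out:

```python
from sklearn.feature_extraction.text import CountVectorizer

# A couple of hypothetical postings.
docs = [
    "Does anyone know how to configure this card?",
    "For sale: graphics card, barely used.",
]

# Bag of words: each document becomes a row, each word a column.
# stop_words='english' drops common words like "is" and "a"; punctuation is
# stripped by the default tokenizer. Stemming (e.g. with a Porter stemmer)
# would be an additional step and is omitted here.
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())
print(X.toarray())
```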

Results
• If you just make the brain-dead guess "every message contains a question", you get 55% right
• If you use a Support Vector Machine, you get 66.5% of them right
• What words do you think were strong indicators of questions?
  – anyone, does, any, what, thanks, how, help, know, there, do, question
• What words do you think were strong contraindicators of questions?
  – re, sale, m, references, not, your
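
One way such indicator and contraindicator lists can be read off a trained linear SVM is by sorting the learned word weights. A sketch with an invented four-message corpus (the words it surfaces will not match the talk's lists):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Tiny hypothetical labeled corpus: 1 = contains a question, -1 = does not.
docs = [
    "Does anyone know what card this is?",
    "How do I install the driver, thanks",
    "For sale: graphics card and references",
    "Not your typical re: sale posting",
]
labels = [1, 1, -1, -1]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

clf = LinearSVC(C=1.0).fit(X, labels)

# In a linear model, large positive weights mark indicator words and large
# negative weights mark contraindicators.
words = vectorizer.get_feature_names_out()
order = np.argsort(clf.coef_[0])
print("contraindicators:", words[order[:3]])
print("indicators:      ", words[order[-3:]])
```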

Nonlinear SVMs
[Figure: nonlinearly separated data, generated with the Lucent Technologies Demonstration 2-D Pattern Recognition Applet at http://svm.research.bell-labs.com/SVT/SVMsvt.html]
• Some datasets may not be best separated by a plane.
• How can we find nonlinear separating surfaces?
• Simple method: map into a higher-dimensional space, and do the same thing we have already done.

Finding nonlinear surfaces
• How do we modify the algorithm to find nonlinear surfaces?
• First idea (simple and effective): map each data point into a higher-dimensional space, and find a linear fit there
• Example: To find a quadratic surface for x = (x_1, x_2), map each point to the new coordinates (x_1^2, x_2^2, x_1 x_2, x_1, x_2)
• Use the new coordinates in the regular linear SVM
• A plane in this quadratic space is equivalent to a quadratic surface in our original space
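
A sketch of this idea: apply an explicit quadratic map and then run an ordinary linear SVM on the new coordinates (the data and the use of scikit-learn are assumptions for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data that no plane separates: the positive class lies far from the
# origin in every direction, the negative class sits near it.
X = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0],
              [0.2, 0.1], [-0.1, 0.2], [0.1, -0.2], [-0.2, -0.1]])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

def quadratic_map(X):
    """Map (x1, x2) to the quadratic coordinates (x1^2, x2^2, x1*x2, x1, x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1**2, x2**2, x1 * x2, x1, x2])

# A linear SVM in the mapped space corresponds to a quadratic surface in the original space.
clf = SVC(kernel='linear', C=10.0).fit(quadratic_map(X), y)
print(clf.predict(quadratic_map(np.array([[3.0, 0.0], [0.0, 0.1]]))))  # expected: [ 1 -1]
```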

Problems with this method
• If the dimensionality of the space is high, there are lots of calculations
  – For a high-degree polynomial space, the number of coordinate combinations explodes
  – All these calculations must be done for every training point, and for each testing point
  – Infinite-dimensional spaces are impossible to handle this way
• Nonlinear surfaces can be used without these problems through the use of a kernel function.
  – Demonstration: http://svm.cs.rhul.ac.uk/pagesnew/GPat.shtml
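
The kernel function computes the dot product in the mapped space without ever constructing that space. A sketch showing that a degree-2 polynomial kernel (x · z)^2 matches the dot product under a (√2-scaled) quadratic map, and that an SVM can use the kernel directly; the data and parameters are made up:

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Explicit degree-2 map whose dot product reproduces the polynomial kernel (x . z)^2.
def phi(v):
    return np.array([v[0]**2, v[1]**2, np.sqrt(2.0) * v[0] * v[1]])

print(np.dot(phi(x), phi(z)))  # 1.0
print(np.dot(x, z) ** 2)       # 1.0 -- same value, computed without leaving 2-D

# An SVM with a polynomial kernel works directly in the original coordinates;
# the mapped space is never built explicitly.
X = np.array([[2.0, 0.0], [0.0, 2.0], [0.2, 0.1], [-0.1, -0.2]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel='poly', degree=2, gamma=1.0, coef0=0.0, C=10.0).fit(X, y)
print(clf.predict([[3.0, 0.0], [0.0, 0.1]]))
```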

Example: Checkerboard
[Figure: two-class checkerboard dataset]

5-Nearest Neighbor
[Figure: 5-nearest-neighbor classifier applied to the checkerboard dataset]

Sixth degree polynomial kernel
[Figure: SVM with a sixth-degree polynomial kernel applied to the checkerboard dataset]
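
A rough stand-in for this experiment (the data below are synthetically generated, not the original checkerboard set, and the reported accuracy will not match the talk's figures):

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for the checkerboard dataset: label each random 2-D point
# by the parity of the grid cell it falls into (a 4x4 checkerboard on [0, 4)^2).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 4.0, size=(2000, 2))
y = np.where((np.floor(X[:, 0]) + np.floor(X[:, 1])) % 2 == 0, 1, -1)

# A sixth-degree polynomial kernel gives the SVM enough flexibility to carve out
# the alternating squares.
clf = SVC(kernel='poly', degree=6, coef0=1.0, C=10.0)
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
```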