Support vector machines for classification
Radek Zíka
zikar@img.cas.cz
http://bio.img.cas.cz/zikar

Support vector machines for classification
• History
• Statistical learning
• SVM principles
• SVM applications
• SVM implementations
• Examples
• References

History
• Vapnik, V., 1979, Estimation of dependencies based on empirical data
• Vapnik, V., 1995, The nature of statistical learning theory
• Microarray gene expression data analysis, protein structural classification, ~1999-2000

Statistical learning
• Data
• Hypothesis => errors
    • Expectation of the test error (expected risk), approximated in practice by the empirical risk measured on the training data
• Learning machines
    • NN
    • SVR ~ regression
    • SVC ~ classification
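The slide names the error quantities without defining them. As a hedged illustration, the formulation below follows the notation of Burges' tutorial (listed in the references), assuming labels y in {-1, +1}, a learning machine f(x, α) with adjustable parameters α, an unknown data distribution P(x, y), and l training examples. The symbols are assumptions taken from that tutorial, not from the slide.

```latex
\begin{align*}
\text{Expected risk (expectation of the test error):} \quad
  & R(\alpha) = \int \tfrac{1}{2}\,\lvert y - f(\mathbf{x}, \alpha)\rvert \, dP(\mathbf{x}, y) \\
\text{Empirical risk (mean error on the training set):} \quad
  & R_{\mathrm{emp}}(\alpha) = \frac{1}{2l} \sum_{i=1}^{l} \lvert y_i - f(\mathbf{x}_i, \alpha)\rvert
\end{align*}
```

Only the empirical risk can be computed directly, because P(x, y) is unknown.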

SVM principles (SVC) I.
• Training data (set of vector, scalar pairs)
    [0.32, 0.1], -1; [0.8, 0.9, 2.1], +1; [1.1, 3.1, 2.1], +1; …
• Model (parameters: Lagrange multipliers, hyperplane parameters)
    a1 = 0.57, a2 = 1.37, …, w = [0.91, 0.81, 0.74], b = 1.2
• Unclassified data (vector set)
• Classification using model parameters (scalars)
    y1 = -1, y2 = +0.9, y3 = +1
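The last step of this flow, classifying new vectors with a trained linear model, can be sketched in a few lines of Python. This is a minimal illustration, not the code of any SVM package: it assumes the decision function has the form w·x + b, reuses the example values of w and b from the slide, and applies them to made-up input vectors. The names `decision_value`, `classify`, and `unclassified` are hypothetical.

```python
# Hyperplane parameters copied from the slide's example model.
w = [0.91, 0.81, 0.74]
b = 1.2


def decision_value(x, w, b):
    """Signed score w.x + b (assumed sign convention); its sign gives the class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w, b):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(x, w, b) >= 0 else -1


# Hypothetical unclassified vectors, invented for this illustration.
unclassified = [[-1.0, -1.0, -1.0], [0.5, 0.2, 0.1], [1.0, 1.0, 1.0]]
labels = [classify(x, w, b) for x in unclassified]
print(labels)
```

The raw decision value also indicates how far a point lies from the hyperplane, which may be why the slide shows a non-integer output such as +0.9.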

SVM principles (SVC) II.
• Data
• Functions
    • Hyperplane
    • Distance
    • Margin
    • Lagrangian
• Parameters of hyperplane
• Classification
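The slide lists these functions by name only. For orientation, the standard hard-margin formulation for linearly separable data is written out below, in the form found in the tutorials listed in the references. It assumes training pairs (x_i, y_i) with y_i in {-1, +1}, i = 1, …, l, and Lagrange multipliers α_i (written a1, a2, … on the previous slide); the exact notation used in the original talk is not known.

```latex
\begin{align*}
\text{Hyperplane:} \quad
  & \mathbf{w}\cdot\mathbf{x} + b = 0 \\
\text{Distance of } \mathbf{x}_i \text{ from the hyperplane:} \quad
  & \frac{\lvert \mathbf{w}\cdot\mathbf{x}_i + b \rvert}{\lVert \mathbf{w} \rVert} \\
\text{Margin (canonical form):} \quad
  & \frac{2}{\lVert \mathbf{w} \rVert},
    \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \\
\text{Lagrangian:} \quad
  & L(\mathbf{w}, b, \boldsymbol{\alpha}) = \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
    - \sum_{i=1}^{l} \alpha_i \left[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \right],
    \quad \alpha_i \ge 0 \\
\text{Parameters of hyperplane:} \quad
  & \mathbf{w} = \sum_{i=1}^{l} \alpha_i y_i \mathbf{x}_i,
    \quad \sum_{i=1}^{l} \alpha_i y_i = 0 \\
\text{Classification:} \quad
  & f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}\cdot\mathbf{x} + b)
\end{align*}
```

Maximizing the margin is equivalent to minimizing the norm of w under the constraints, which is what the Lagrangian encodes.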

SVM principles (SVC) III.
• Linearly separable data
• Linearly non-separable data
    • Generalized optimal separating hyperplane
    • Generalisation in high dimensional space
    • Kernel functions
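The slide does not say which kernel functions are meant. As an illustration only, three commonly used kernels (linear, polynomial, and Gaussian RBF) can be sketched in plain Python as follows. The function names and the default values for `degree`, `coef0`, and `gamma` are assumptions chosen for this example, not the defaults of any package mentioned in this talk.

```python
import math


def linear_kernel(x, z):
    """Plain dot product: corresponds to no feature-space mapping."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """(x.z + coef0)^degree: implicit mapping to polynomial features."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2): similarity that decays with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z))
```

In a kernel SVM, every dot product between data vectors is replaced by one of these functions, so the separating hyperplane is found in the high-dimensional feature space without computing the mapping explicitly.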

SVM applications
• Pattern recognition
    • Features: word counts
• DNA array expression data analysis
    • Features: expression levels in different conditions
• Protein classification
    • Features: amino acid (AA) composition
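To make the last feature type concrete, the sketch below turns a protein sequence into a 20-dimensional amino acid composition vector. It is a hypothetical illustration: the function name `aa_composition`, the alphabet ordering, and the example sequence are assumptions, and the studies cited in the references may have computed their features differently.

```python
# The 20 standard amino acids in one-letter code (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard letters are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [count / total for count in counts]


# Made-up example sequence, not taken from any dataset.
features = aa_composition("MKVLAAGIVA")
print(features)
```

Each protein becomes one fixed-length vector regardless of its sequence length, which is the form of input an SVM expects.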

SVM implementations I.
• SVMlight - satyr.net2.private:/usr/local/bin
    • svm_learn, svm_classify
• bsvm - satyr.net2.private:/usr/local/bin
    • svm-train, svm-classify, svm-scale
• libsvm - satyr.net2.private:/usr/local/bin
    • svm-train, svm-predict, svm-scale, svm-toy
• mySVM
• MATLAB svm toolbox
• Differences: available kernel functions, optimization, multi-class classification, user interfaces

SVM implementations II.
• SVMlight
    • Simple text data format
    • Fast, C routines
• bsvm
    • Multi-class classification
• LIBSVM
    • GUI: svm-toy
• MATLAB svm toolbox
    • Graphical interface, 2D

Data format
• Universal, simple, human-readable text
• SVMlight
• libsvm
    • 2D graphical interface
• bsvm
    • Multi-class classification
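The slide does not reproduce the format itself. SVMlight and libsvm both document a sparse text format in which each line holds one example as `<label> <index>:<value> ...`, with feature indices starting at 1 and listed in increasing order. The helper below is a minimal sketch of a writer for such lines; the function name `to_sparse_line` and the choice to omit zero-valued features are assumptions made for this example.

```python
def to_sparse_line(label, features):
    """Format one example as '<label> <index>:<value> ...' with 1-based indices."""
    parts = [f"{label:+d}"]
    for index, value in enumerate(features, start=1):
        if value != 0:  # zero-valued features may be omitted in the sparse format
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


# Two of the training examples shown on the "SVM principles (SVC) I." slide.
line_positive = to_sparse_line(1, [0.8, 0.9, 2.1])
line_negative = to_sparse_line(-1, [0.32, 0.1])
print(line_positive)
print(line_negative)
```

A training file is then simply one such line per example, which is what makes the format easy to generate and inspect by hand.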

References
• Steve R. Gunn: SVM for Classification and Regression (1998)
• Ch. J. C. Burges: A Tutorial on SVM for Pattern Recognition (1998)
• T. Evgeniou, M. Pontil, T. Poggio: Regularization Networks and SVM (2000)
• SVM for predicting protein structural class, BMC Bioinformatics, (2001), 2:3
• Knowledge-based analysis of microarray gene expression data by using support vector machines, PNAS, 97, 262-267
• SVM classification and validation of cancer tissue samples using microarray expression data, Bioinformatics, (2000), 16(10), 906-914
• http://www.kernel-machines.org/publications.html