Novelty Detection OneClass SVM OCSVM 1 Outline Introduction
- Slides: 36
Novelty Detection & One-Class SVM (OCSVM) 1
Outline Introduction Quantile Estimation OCSVM – Theory OCSVM – Application to Jet Engines 2
Novelty Detection is An unsupervised learning problem (data unlabeled) About the identification of new or unknown data or signal that a machine learning system is not aware of during training 3
Example 1 “Normal” “Novel” 4
So what’s seems to be the problem? It’s a 2 -Class problem. “Normal vs. “Novel” 5
The Problem is That “All positive examples are alike but each negative example is negative in its own way”. 6
Example 2 Suppose we want to build a classifier that recognizes web pages about “pickup sticks”. How can we collect a training data? n We can surf the web and pretty easily assemble a sample to be our collection of positive examples. What about negative examples ? n The negative examples are… the rest of the web. That is ~(“pickup sticks web page”) So the negative examples come from an unknown # of negative classes. 7
Applications Many exist n n n n Intrusion detection Fraud detection Fault detection Robotics Medical diagnosis E-Commerce And more… 8
Possible Approaches Density Estimation: n n Estimate a density based on training data Threshold the estimated density for test points Quantile Estimation: n n Estimate a quantile of the distribution underlying the training data: for a fixed constant , attempt to find a small set such that Check whether test points are inside or outside 9
Quantile Estimation (QE) A quantile function with respect to is defined as : - a class of measurable subsets of - a real valued function. denotes the that attains the infimum 10
Quantile Estimation (QE) The empirical quantile function is defined as above where is the empirical distribution: - denotes the infimum on the training set. Thus the goal is to estimate that attains the through 11
Quantile Estimation Choosing Intelligently and is important On one hand large class many small sets that contain a fraction of the training examples. On the other hand, if we allowed just any set, the chosen set could consist of only the training points poor generalization 12
Complex vs. Simple 13
Support Vector Method for Novelty Detection Bernhard Schölkof, Robert Williams, Alex Smola, John Shawe-Taylor, John Platt 14
Problem Formulation Suppose we are given a training ample drawn from an underlying distribution We want to a estimate a “simple” subset such that for a test point drawn from the distribution , We approach the problem by trying to estimate a function which is positive on and negative on the complement 15
The SV Approach to QE The class is defined as the set of halfspaces in a feature space (via kernel ) Here we define , where are respectively a weight vector and an offset parameterizing a hyperplane in 16
“Hey, Just a second” If we use hyperplanes & offsets, doesn’t it mean we separate the “positive” sample? But, separate from what? From the Origin 17
OCSVM 18
OCSVM Serves as a penalizer like “C” in the 2 -class svm (recall thatthe data set) from the origin To separate we solve the following quadric program: subject to Notice that no “y”s are incorporated in the constraint since there are no labels 19
OCSVM The decision is therefore: Since the slack variables are penalized in the objective function, we can expect that if and solve the problem then will equal 1 for most example in the training set, while still stays small 20
OCSVM Using multipliers we get the Lagrangian, 21
OCSVM Setting the derivatives of yields: w. r. t to 0 1) 2) , 22
OCSVM Eq. 1 transforms into a kernel expansion: The offset eq can by exploiting Substituting. 1 be & 2 recovered into yields the dual that for any the corresponding problem: pattern satisfies: subject to , 23
- Property Assume the solution of the primal problem satisfies. The following statements hold: n is an upper bound on the fraction outliers. n is a lower bound on the fraction SVs n With probability 1, asymptotically, equals both the fraction of SVs and the fraction of outliers. (under certain conditions of P(x) and the kernel) 24
X axis – svm magnitude y axis – frequency Results – USPS (“ 0”) For = 50%, we get: 50% SVs 49% Outliers For = 5%, we get: 6% SVs 4% Outliers 25
OCSM - Shortcomings Implicitly assumes that the “negative” data lies around the origin. Ignores completely “negative” data even if such data partially exist. 26
Support Vector Novelty Detection Applied to Jet Engine Vibration Spectra Paul Hyton, Bernhard Schölkof, Lionel Tarassenko, Paul Anuzis 27
Intro. Jet engines have pass-off tests before they can be delivered to the customer. Through vibration tests an engine’s “vibration signature” can be extracted While normal vibration signatures are common, we may be short of abnormal signatures. Or even worse, the engine under test may show up a type of abnormality which has never been seen before. 28
Feature Selection A vibration gauges are attached to the engine’s case The engine under test is slowly accelerated from idle to full speed and decelerated back to idle The vibration signal is then recorded The final feature is calculated over a weighted average of the vibration for 10 different speed ranges Thus yielding a 10 -D vector 29
Algorithm Slightly more general than the regular OCSVM In addition to the “normal” data points we take into account some abnormal points Rather than separating from the origin we separate from the mean of 30
Primal Form subject to and the decision function is 31
Dual Form where and subject to , 32
2 D Toy Example 33
Training Data 99 Normal Engines were used as training data 40 Normal Engines were used as validation data 23 Abnormal Engines used as test data 34
Standard OCSM Results 35
Modified OCSM Results 36
- Ocsvm
- How to write novelty lead
- Trait theory consumer behaviour
- Novelty stage
- Internal dialectics examples
- Observatiematrix
- Novelty search
- Novelty news value
- Novelty seeking
- Novelty stage
- Margin in svm
- Svm exercise solutions
- Andrew ng support vector machine
- Svm cost function
- Sgu svm academic calendar
- Svm pwm
- Smartschool svm
- Latent svm
- Svm in weka
- Svm rvm
- Svm
- Svm martin
- Svm lecture
- Svm classifier
- Svm.fox_
- Quadprog matlab svm
- Svm kernel
- Svm advantages and disadvantages
- Soft margin svm sklearn
- Svm
- Convolution layer
- Svm advantages and disadvantages
- Tufts cluster
- Transductive support vector machines
- Svm lecture
- Example of a sentence outline
- Paragraphs "conclusion introduction" introductory