  • Slides: 34

Anomaly Detection in Data Science
One-class Classification with Privileged Information for Malware Detection
Pavel Erofeev, IITP RAS, Airbus Group Russia

Find the Panda

Anomaly Detection: Hadlum vs Hadlum
◎ The birth of a child to Mrs. Hadlum happened 349 days after Mr. Hadlum left for military service
◎ Average human pregnancy period is 280 days (40 weeks)
◎ Statistically, 349 days is an outlier
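
The Hadlum argument can be made concrete with a z-score. The slide gives only the mean (280 days), so the standard deviation below is an assumed, illustrative value and not a figure from the talk; the 3-sigma threshold is likewise a common rule of thumb rather than the author's criterion.

```python
# Mean pregnancy length from the slide; the spread is NOT given there.
MEAN_DAYS = 280.0
ASSUMED_STD_DAYS = 13.0  # assumption for illustration only


def z_score(value, mean, std):
    """Number of standard deviations between value and mean."""
    return (value - mean) / std


hadlum_z = z_score(349, MEAN_DAYS, ASSUMED_STD_DAYS)
is_outlier = abs(hadlum_z) > 3.0  # rule-of-thumb threshold, also an assumption
print(f"z = {hadlum_z:.2f}, outlier: {is_outlier}")
```

Under these assumptions the observation lies more than five standard deviations above the mean; a different assumed spread changes the number but not the qualitative conclusion unless the spread is far larger.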

“An outlier is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism.”
Hawkins, 1980

Defining Anomaly Detection
◎ Digital representation: vectors describing observations
◎ Mixture of “nominal” and “abnormal” points
◎ Anomaly points are generated by a different generative process than the nominal points
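
The mixture view above can be sketched by sampling from two processes. The distributions, their parameters, and the 2% contamination rate are hypothetical choices for illustration; the slides do not specify them.

```python
import random


def sample_mixture(n, anomaly_rate, rng):
    """Draw n labeled 1-D points from a two-component mixture."""
    points = []
    for _ in range(n):
        if rng.random() < anomaly_rate:
            # Anomalies come from a different process (here: a shifted, wider Gaussian).
            points.append((rng.gauss(8.0, 3.0), "abnormal"))
        else:
            # Nominal points come from the main process.
            points.append((rng.gauss(0.0, 1.0), "nominal"))
    return points


rng = random.Random(0)  # fixed seed for reproducibility
data = sample_mixture(1000, anomaly_rate=0.02, rng=rng)
n_abnormal = sum(1 for _, label in data if label == "abnormal")
print(f"{n_abnormal} abnormal points out of {len(data)}")
```

In practice the labels are usually unavailable; the detector sees only the values and has to recover which process each point came from.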

Possible Settings in CS
◎ Supervised (known attacks)
  ○ Training data labeled with “nominal” or “anomaly”
◎ Clean (zero-day attacks)
  ○ Training data are all “nominal”; test data may be contaminated with “anomaly” points
◎ Unsupervised (unknown attacks)
  ○ Training data consist of a mixture of “nominal” and “anomaly” points

Real World Data Problems
◎ Data is multivariate
◎ There is usually more than one generating mechanism underlying the “normal” data
◎ Anomalies may represent a different class of objects, so there are many of them
◎ Domain-specific definition of what counts as an anomaly
◎ Normality evolves over time

Anomaly Taxonomy: Point Anomaly

Anomaly Taxonomy: Contextual Anomaly

Anomaly Taxonomy: Causal Anomaly

Taxonomy

Imbalanced classification
■ Normal data: a lot of samples
■ Abnormal data: very few samples
■ Standard methods do not work as expected
■ Standard metrics do not apply
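
A small example shows why a standard metric such as accuracy misleads on imbalanced data. The class sizes (990 normal, 10 abnormal) and the trivial "always normal" classifier are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the abnormal class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


# Hypothetical data: 990 normal samples (0) and 10 abnormal samples (1).
y_true = [0] * 990 + [1] * 10
# A useless classifier that always answers "normal".
y_pred = [0] * 1000

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0
print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

The classifier scores 99% accuracy while detecting no anomalies at all, which is why recall and precision on the abnormal class (or measures built on them) are more informative here.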

Imbalanced classification
◎ Weights for classes
  ○ Proved not to be helpful in most cases
◎ Resampling methods
  ○ Oversampling (Bootstrap, SMOTE, etc.)
  ○ Undersampling
◎ How to choose which method to use?
◎ How to choose the resampling parameter?
  ○ We compared several methods
  ○ We proposed a meta-model that on average gives the best results [Papanov, Erofeev, Burnaev, 2015]
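
The two resampling families listed above can be sketched as follows. This is plain random resampling, not SMOTE and not the meta-model from the cited work; the `ratio` argument stands in for the "resampling parameter" and is a hypothetical name.

```python
import random


def random_oversample(majority, minority, ratio, rng):
    """Bootstrap the minority class until minority/majority reaches `ratio`."""
    target = int(ratio * len(majority))
    n_extra = max(0, target - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]  # sampling with replacement
    return majority, minority + extra


def random_undersample(majority, minority, ratio, rng):
    """Drop majority samples at random until minority/majority reaches `ratio`."""
    target = min(len(majority), int(len(minority) / ratio))
    return rng.sample(majority, target), minority  # sampling without replacement


rng = random.Random(42)
normal = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
abnormal = [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(10)]

over_major, over_minor = random_oversample(normal, abnormal, 0.5, rng)
under_major, under_minor = random_undersample(normal, abnormal, 0.5, rng)
print(len(over_major), len(over_minor), len(under_major), len(under_minor))
```

Oversampling keeps all the normal data but repeats abnormal points, while undersampling discards most of the normal data; which trade-off works better depends on the dataset, which is the selection problem the slide raises.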

Statistics-based models
◎ Assumption on the normal data generation procedure (e.g. Gaussian distribution)
◎ PCA is a method commonly used to extract the directions of highest variance in the data
◎ PCA-based anomaly detection works well in highly correlated environments
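
One common way to use PCA for anomaly detection is to score a point by its reconstruction error after projection onto the leading components. The sketch below does this for 2-D data with a closed-form principal axis; the synthetic data (y roughly equal to 2x) and the test points are assumptions made for illustration, and the slides do not say which PCA-based score the author used.

```python
import math
import random


def fit_principal_axis(points):
    """Return the mean and the unit vector of the first principal component (2-D only)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def reconstruction_error(point, mean, axis):
    """Squared distance between a point and its projection onto the principal axis."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    proj = dx * axis[0] + dy * axis[1]
    rx, ry = dx - proj * axis[0], dy - proj * axis[1]
    return rx * rx + ry * ry


rng = random.Random(1)
nominal = []
for _ in range(500):
    x = rng.gauss(0.0, 1.0)
    nominal.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))  # strongly correlated features

mean, axis = fit_principal_axis(nominal)
err_nominal = reconstruction_error((1.0, 2.0), mean, axis)   # follows the correlation
err_anomaly = reconstruction_error((1.0, -2.0), mean, axis)  # breaks the correlation
print(f"nominal: {err_nominal:.4f}, anomaly: {err_anomaly:.4f}")
```

Both test points have unremarkable values in each coordinate taken separately; only the second one violates the correlation between them, which is the situation where a PCA-based score is useful.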

Density-based models
◎ SVM-based and nearest-neighbour-based methods
◎ How to choose the best kernel parameter?
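
A minimal sketch of a nearest-neighbour density score, assuming the distance to the k-th nearest training point is used as the anomaly score. The value k=5 and the synthetic Gaussian training data are illustrative choices, not taken from the slides.

```python
import math
import random


def knn_score(point, train, k=5):
    """Anomaly score: distance to the k-th nearest training point (larger = sparser region)."""
    dists = sorted(math.dist(point, t) for t in train)
    return dists[k - 1]


rng = random.Random(7)
train = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(300)]

score_center = knn_score((0.0, 0.0), train)  # inside the dense region
score_far = knn_score((6.0, 6.0), train)     # far from all training points
print(f"center: {score_center:.3f}, far: {score_far:.3f}")
```

The neighbourhood size k plays a role similar to the kernel parameter in SVM-based methods: it sets the scale at which density is measured, and it has to be chosen without labeled anomalies.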

One-class SVM with Privileged Information
Evgeny Burnaev, Dmitry Smolyakov
Skoltech, IITP RAS

One-Class SVM
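
The formulas on these slides did not survive extraction. For reference, the standard one-class SVM (in the formulation of Schölkopf et al.) is sketched below; the notation is an assumption and may differ from what the slides used. Here $n$ is the number of training points, $\phi$ the feature map, and $\nu \in (0, 1]$ an upper bound on the fraction of training points treated as outliers.

```latex
\min_{w,\ \xi,\ \rho}\quad \frac{1}{2}\lVert w \rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i - \rho
\qquad \text{s.t.}\quad
  \langle w, \phi(x_i) \rangle \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1, \dots, n

f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{n} \alpha_i K(x_i, x) - \rho \right),
\qquad K(x, x') = \langle \phi(x), \phi(x') \rangle
```

Points with $f(x) = -1$ fall outside the learned region and are flagged as anomalies; the coefficients $\alpha_i$ come from the dual problem.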

One-Class SVM Kernel Trick

Kernel Trick
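
The idea behind the kernel trick can be checked numerically: a kernel evaluates an inner product in a feature space without constructing the feature vectors. The degree-2 polynomial kernel and the RBF kernel below are standard textbook examples chosen for illustration; the slides do not state which kernel was shown.

```python
import math


def dot(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def explicit_features(v):
    """Explicit feature map for the homogeneous degree-2 polynomial kernel in 2-D."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(a, b):
    """Same inner product, computed without building the feature vectors."""
    return dot(a, b) ** 2


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel; its feature space is infinite-dimensional."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


a, b = (1.0, 2.0), (3.0, -1.0)
via_features = dot(explicit_features(a), explicit_features(b))
via_kernel = poly2_kernel(a, b)
print(via_features, via_kernel)
print(rbf_kernel(a, b, gamma=0.1), rbf_kernel(a, b, gamma=1.0))
```

For the RBF kernel, a larger `gamma` makes similarity decay faster with distance, so the learned boundary follows the training points more tightly.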

Hyper-parameter Influence

Decision Functions

Learning with Privileged Info
Example: image classification with textual description

Learning with Privileged Info

Microsoft Malware Classification Challenge
Kaggle competition data (2015)

Problem Description
◎ 9 malware families
  ○ Ramnit, Lollipop, Kelihos ver 3, Vundo, Simda, Tracur, Kelihos ver 1, Obfuscator.ACY, Gatak
◎ Raw data
  ○ Hexadecimal representation of the raw binary content
  ○ Metadata extracted from the binaries, including function calls, strings, etc.
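
A hexadecimal dump has to be turned back into bytes before features can be computed. The sketch below assumes a layout in which each line starts with an address token followed by two-character hex byte tokens, with `??` marking unknown bytes; this layout and the sample text are assumptions for illustration and should be checked against the actual competition files.

```python
def parse_hex_dump(text):
    """Return the byte values from a hex dump; unparseable tokens are skipped."""
    values = []
    for line in text.splitlines():
        tokens = line.split()
        for token in tokens[1:]:  # tokens[0] is assumed to be the address
            if len(token) != 2:
                continue
            try:
                values.append(int(token, 16))
            except ValueError:
                pass  # placeholder such as "??"
    return bytes(values)


# Hypothetical two-line sample in the assumed layout.
sample = "00401000 56 8D 44 24\n00401004 08 ?? C3 00"
parsed = parse_hex_dump(sample)
print(parsed.hex())
```

Skipping unknown bytes is one possible policy; counting them as a separate symbol is another and may itself be an informative feature.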

Features
◎ Original features
  ○ Information from binary files such as
    ◉ Frequencies of bytes
    ◉ Number of different N-grams, etc.
◎ Privileged features
  ○ Information from disassembled code such as
    ◉ Frequencies of commands
    ◉ Number of calls to external DLLs
  ○ Bytecode as an image
    ◉ Features based on image texture, which are commonly used for image classification
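
The two original features named above can be sketched directly on a byte string. The function names and the sample payload are hypothetical; the exact feature definitions used in the experiments are not given in the slides.

```python
from collections import Counter


def byte_frequencies(data):
    """Relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data)
    return [counts.get(b, 0) / total if total else 0.0 for b in range(256)]


def distinct_ngrams(data, n):
    """Number of different byte n-grams occurring in the data."""
    return len({data[i:i + n] for i in range(len(data) - n + 1)})


# Hypothetical 9-byte payload standing in for a real binary.
payload = bytes.fromhex("568d4424085650ff15")
freqs = byte_frequencies(payload)
n_bigrams = distinct_ngrams(payload, 2)
print(f"freq(0x56) = {freqs[0x56]:.3f}, distinct 2-grams = {n_bigrams}")
```

These features need only the raw binary, so they are available at test time; the privileged features require disassembly and are used only during training.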

Features

Experimental Setup

Results

Thanks! Any questions? pavel. erofeev@phystech. e
