Data and Applications Security Developments and Directions
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Lecture #20
Guest Lecture: Data Mining for Intrusion Detection

Data Mining & Intrusion Detection Systems
Mamoun Awad
Dept. of Computer Science
University of Texas at Dallas

Outline
- Intrusion Detection
- Data Mining
- Approach
- Data set & Results

What is an intrusion?
- An intrusion can be defined as “any set of actions that attempt to compromise the:
  - integrity,
  - confidentiality, or
  - availability
  of a resource”.

Intrusion Examples
- Virus
- Buffer overflows
  - 2000 Outlook Express vulnerability.
- Denial of Service (DOS)
  - An explicit attempt by attackers to prevent legitimate users of a service from using that service.
- Address spoofing
  - A malicious user uses a fake IP address to send malicious packets to a target.
- Many others
  - R2L, U2R, Probe, …

Intrusion Detection System (IDS)
- An Intrusion Detection System (IDS) inspects all inbound and outbound network activity and identifies suspicious patterns that may indicate a network or system attack from someone attempting to break into or compromise a system.

Attack Types
- Host-based attacks
  - Gain access to privileged services or resources on a machine.
- Network-based attacks
  - Make it difficult for legitimate users to access various network services.

IDS Categories
- Intrusion detection systems are split into two groups:
  - Anomaly detection systems
    - Identify malicious traffic based on deviations from established normal network behavior.
  - Misuse detection systems
    - Identify intrusions based on known patterns (signatures) of malicious activity.
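The two categories above can be contrasted with a small sketch. It is illustrative only: the signature strings, the connections-per-minute baseline, and the three-standard-deviation threshold are assumptions chosen for the example, not part of the lecture.

```python
from statistics import mean, stdev
from typing import Sequence

# Hypothetical signatures: substrings of known-bad payloads (illustrative only).
SIGNATURES = ("/etc/passwd", "' OR 1=1")

def misuse_detect(payload: str) -> bool:
    """Misuse detection: flag a payload that contains any known signature."""
    return any(sig in payload for sig in SIGNATURES)

def anomaly_detect(baseline: Sequence[float], observed: float, k: float = 3.0) -> bool:
    """Anomaly detection: flag a value more than k standard deviations
    away from the mean of the established normal baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(observed - mu) > k * sigma

# Assumed baseline, e.g. connections per minute seen during normal operation.
normal_rates = [98.0, 102.0, 100.0, 101.0, 99.0]
```

The misuse detector can only catch what its signature list already describes, while the anomaly detector flags anything unusual, including harmless but rare activity.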

Problem Statement
- Goal of Intrusion Detection Systems (IDS):
  - To detect an intrusion as it happens and be able to respond to it.
- False positives:
  - A false positive is a situation where something abnormal (as defined by the IDS) happens, but it is not an intrusion.
  - Too many false positives:
    - Users will quit monitoring the IDS because of the noise.
- False negatives:
  - A false negative is a situation where an intrusion is really happening, but the IDS doesn't catch it.
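A minimal sketch of how the two error types can be counted, assuming each event has a ground-truth label and an IDS verdict (True meaning intrusion). The rate definitions used here (false positives over all normal events, false negatives over all intrusions) are the common ones and are an assumption; the slides do not state which definition they use.

```python
def error_rates(actual, predicted):
    """Return (false_positive_rate, false_negative_rate).

    actual, predicted: equal-length sequences of booleans, True = intrusion.
    """
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    negatives = sum(1 for a in actual if not a)   # normal events
    positives = sum(1 for a in actual if a)       # real intrusions
    fp_rate = fp / negatives if negatives else 0.0
    fn_rate = fn / positives if positives else 0.0
    return fp_rate, fn_rate

# Six hypothetical events: four normal, two intrusions.
actual    = [False, False, False, False, True, True]
predicted = [True,  False, False, False, True, False]
fp_rate, fn_rate = error_rates(actual, predicted)
```

Here one of the four normal events is flagged (a false positive) and one of the two intrusions is missed (a false negative).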

Layered Security Mechanism

Problem Statement
- Misuse Detection

Firewalls

Firewall Rules
Order | Protocol | Source IP | Source Port | Destination IP | Destination Port | Action
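The rule columns above could be modeled as follows. This is a sketch under stated assumptions: the slide gives only the column names, so the first-match evaluation in ascending order, the "*" and None wildcards, the default-deny fallback, and the example addresses (taken from documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    order: int
    protocol: str             # "tcp", "udp", or "*" for any
    src_ip: str               # exact address or "*" for any
    src_port: Optional[int]   # None means any port
    dst_ip: str
    dst_port: Optional[int]
    action: str               # "allow" or "deny"

def _match(field, value):
    """A rule field matches if it is a wildcard or equals the packet value."""
    return field in ("*", None) or field == value

def decide(rules, protocol, src_ip, src_port, dst_ip, dst_port, default="deny"):
    """Return the action of the first matching rule, scanning by order (assumed)."""
    for r in sorted(rules, key=lambda r: r.order):
        if (_match(r.protocol, protocol) and _match(r.src_ip, src_ip)
                and _match(r.src_port, src_port) and _match(r.dst_ip, dst_ip)
                and _match(r.dst_port, dst_port)):
            return r.action
    return default

rules = [
    Rule(1, "tcp", "*", None, "192.0.2.10", 80, "allow"),  # web traffic to one host
    Rule(2, "*", "*", None, "*", None, "deny"),            # everything else
]
```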

Hierarchical Distributed Firewall Setup

Problem Statement
- Anomaly Detection

Our Approach
[Diagram labels: Training Data, SVM, Class Training, Testing Data, Class; annotated "Problem???"]

Our Approach
[Diagram labels: Training Data, Hierarchical Clustering (DGSOT), SVM, Class Training, Testing Data, Class]

Dynamically Growing Self-Organizing Tree Algorithm (DGSOT)

DGSOT
- Learning Process
  - Winner Node
  - Update the Tree
  - Stopping Criteria
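The three steps of the learning process can be sketched in a heavily simplified form. This is not the full DGSOT algorithm: it works on a fixed set of sibling nodes, does not grow the tree, and updates only the winner. The learning rate, tolerance, and the relative-change stopping rule are assumptions made for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def winner(nodes, x):
    """Winner node: index of the reference vector closest to x."""
    return min(range(len(nodes)), key=lambda i: distance(nodes[i], x))

def learn(nodes, data, rate=0.5, tol=1e-3, max_epochs=100):
    """Simplified learning loop; nodes is a list of reference vectors, updated in place."""
    prev = None
    for _ in range(max_epochs):
        total = 0.0
        for x in data:
            w = winner(nodes, x)
            total += distance(nodes[w], x)
            # Update step: move the winner's reference vector toward x.
            nodes[w] = [r + rate * (xi - r) for r, xi in zip(nodes[w], x)]
        avg = total / len(data)
        # Stopping criterion (assumed): relative change in average distortion is small.
        if prev is not None and abs(prev - avg) <= tol * max(prev, 1e-12):
            break
        prev = avg
    return nodes

data = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
nodes = learn([[1.0, 1.0], [4.0, 4.0]], data)
```

After training, each reference vector sits near one of the two groups of points, so new inputs are assigned to the nearer group by `winner`.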

Support Vector Machine
- Support Vector Machines (SVM)
  - One of the most powerful classification techniques.
  - Find a hyperplane that separates the classes.
  - Based on the idea of mapping data points to a high-dimensional feature space where a separating hyperplane can be found.
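Once a separating hyperplane w·x + b = 0 is known, classification reduces to checking which side a point falls on. The sketch below assumes a hand-picked w and b rather than a trained model; the values are illustrative and chosen so that the two example points lie exactly on the margins.

```python
import math

def decision(w, b, x):
    """Signed score w.x + b; zero on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Class +1 on the non-negative side of the hyperplane, -1 otherwise."""
    return 1 if decision(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the two margin hyperplanes w.x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
neg_point, pos_point = [1.0, 1.0], [2.0, 2.0]  # score -1 and +1: on the margins
```

Points whose score is exactly +1 or -1 play the role of support vectors; points further from the hyperplane do not affect its position.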

The value of support vectors and non-support vectors

The effect of adding new data points on the margins

Feature Mapping
Feature mapping from a two-dimensional input space to a two-dimensional feature space.
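As one possible illustration of such a mapping, the sketch below squares each coordinate. The specific map and the sample points are assumptions; the slide does not say which mapping its figure uses. Points inside and outside the unit circle cannot be split by a line in the input space, but they can after mapping.

```python
def phi(x):
    """Hypothetical feature map from 2-D input space to 2-D feature space."""
    x1, x2 = x
    return (x1 * x1, x2 * x2)

def linear_rule(z):
    """Linear separator z1 + z2 = 1 in feature space."""
    return 1 if z[0] + z[1] - 1.0 > 0 else -1

inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3)]   # inside the unit circle
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]   # outside the unit circle

inner_labels = [linear_rule(phi(p)) for p in inner]
outer_labels = [linear_rule(phi(p)) for p in outer]
```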

SVM Limitations
- Long training time limits use.
- Clustering has a positive impact on the training of an SVM: each cluster is represented by only one reference.
  - Reduces training time.
  - Degrades generalization, because fewer points are used.
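The idea of representing each cluster by a single reference can be sketched as below. Using the cluster centroid as the reference is an assumption made for the example, and the cluster assignments are taken as given rather than computed by DGSOT.

```python
from collections import defaultdict

def cluster_references(points, cluster_ids):
    """Return {cluster_id: centroid}, one reference point per cluster."""
    groups = defaultdict(list)
    for p, c in zip(points, cluster_ids):
        groups[c].append(p)
    refs = {}
    for c, members in groups.items():
        dim = len(members[0])
        refs[c] = tuple(sum(m[i] for m in members) / len(members)
                        for i in range(dim))
    return refs

# Five hypothetical points already assigned to two clusters.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0), (12.0, 10.0)]
cluster_ids = [0, 0, 1, 1, 1]
refs = cluster_references(points, cluster_ids)
```

Training on the two references instead of the five points is faster, but the classifier sees less of the data, which is the trade-off the slide describes.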

Hierarchical clustering with SVM flow chart

Training set
- 1998 DARPA data that originated from the MIT Lincoln Lab
  - http://kdd.ics.uci.edu/databases/kddcup99.html
- Size: 1,012,477 data points

Data set / Attack Types
- DOS
  - Denial of service.
- R2L
  - Unauthorized access from a remote machine, e.g., guessing a password.
- U2R
  - Unauthorized access to local super user (root) privileges, e.g., various "buffer overflow" attacks.
- Probing
  - Surveillance and other probing, e.g., port scanning.
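A small sketch of grouping raw record labels into the four categories above. The label names are a partial list as they are commonly found in the KDD Cup 99 data; treat the mapping and the trailing-period handling as assumptions to check against the data set's own documentation.

```python
# Partial, illustrative mapping from attack label to category.
CATEGORY = {
    "smurf": "DOS", "neptune": "DOS",
    "guess_passwd": "R2L", "ftp_write": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
    "portsweep": "Probe", "nmap": "Probe",
}

def categorize(label):
    """Map a raw label such as 'smurf.' to its attack category."""
    name = label.strip().rstrip(".").lower()
    if name == "normal":
        return "Normal"
    return CATEGORY.get(name, "Unknown")
```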

Results
Method                | Weighted Accuracy | Average Accuracy | Average Training Time | Average FP Rate | Average FN Rate
Random Selection      | 62.5%             | 62.61%           | 0.049 hours           | 22.40%          | 37.38%
Pure SVM              | 62.74%            | 62.75%           | 0.51 hours            | 30.75%          | 37.24%
SVM + Rocchio Bundling | 63.09%           | 63.11%           | 0.93 hours            | 30.98%          | 36.89%
SVM + DGSOT           | 63.34%            | 63.36%           | 0.26 hours            | 51.56%          | 36.64%

Relevant and Important Publications
- “A Dynamical Growing Self-Organizing Tree (DGSOT) for Hierarchical Clustering Gene Expression Profiles,” Feng Luo, Latifur Khan, Farokh Bastani, I-Ling Yen, and J. Zhou, Bioinformatics, Oxford University Press, UK, 20(16), November 2004, pp. 2605-2617.
- “Automatic Image Annotation and Retrieval using Weighted Feature Selection,” Lei Wang and Latifur Khan, to appear in a special issue of Multimedia Tools and Applications, Kluwer Publishers.
- “Hierarchical Clustering for Complex Data,” Latifur Khan and Feng Luo, to appear in International Journal on Artificial Intelligence Tools, World Scientific Publishers.
- “A New Intrusion Detection System using Support Vector Machines and Hierarchical Clustering,” Latifur Khan, Mamoun Awad, and Bhavani Thuraisingham, to appear in VLDB Journal: The International Journal on Very Large Databases, ACM/Springer-Verlag Publishing.

Relevant and Important Publications
- R. Lippmann, J. Haines, D. Fried, J. Korba, and K. Das, “The 1999 DARPA off-line intrusion detection evaluation,” Computer Networks, 34, pp. 579-595, 2000.