Data and Applications Security Developments and Directions
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Lecture #20: Guest Lecture, Data Mining for Intrusion Detection

Data Mining & Intrusion Detection Systems
Mamoun Awad
Dept. of Computer Science, University of Texas at Dallas

Outline
• Intrusion Detection
• Data Mining
• Approach
• Data set & Results

What is an intrusion?
• An intrusion can be defined as “any set of actions that attempt to compromise the:
  - integrity,
  - confidentiality, or
  - availability of a resource”.

Intrusion Examples
• Virus
• Buffer overflows
  - e.g., the 2000 Outlook Express vulnerability
• Denial of Service (DoS)
  - An explicit attempt by attackers to prevent legitimate users of a service from using that service.
• Address spoofing
  - A malicious user uses a fake IP address to send malicious packets to a target.
• Many others
  - R2L, U2R, Probe, …

Intrusion Detection System (IDS)
• An Intrusion Detection System (IDS) inspects all inbound and outbound network activity and identifies suspicious patterns that may indicate a network or system attack from someone attempting to break into or compromise a system.

Attack Types
• Host-based attacks
  - Gain access to privileged services or resources on a machine.
• Network-based attacks
  - Make it difficult for legitimate users to access various network services.

IDS Categories
• Intrusion detection systems are split into two groups:
  - Anomaly detection systems: identify malicious traffic based on deviations from established normal network behavior.
  - Misuse detection systems: identify intrusions based on known patterns (signatures) of malicious activity.
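The two categories can be contrasted in a few lines. This is a minimal sketch under stated assumptions: the signature strings, the connections-per-minute baseline, and the 3-standard-deviation threshold are all hypothetical, and neither function is taken from the lecture's system.

```python
import statistics
from typing import Iterable, List

# Hypothetical signature database for misuse detection: each entry maps an
# attack name to a pattern assumed to appear only in that attack.
SIGNATURES = {
    "path_traversal": "../..",
    "sql_injection": "' OR '1'='1",
}


def misuse_detect(payload: str) -> List[str]:
    """Misuse detection: return the names of all known signatures in payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def anomaly_detect(baseline: Iterable[float], value: float, threshold: float = 3.0) -> bool:
    """Anomaly detection: flag value if it lies more than `threshold` standard
    deviations away from the mean of the normal baseline observations."""
    history = list(baseline)
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation, needs >= 2 points
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


# Hypothetical "normal" profile: connections per minute seen during training.
normal_rates = [48, 52, 50, 47, 53, 49, 51, 50]

print(misuse_detect("GET /../../etc/passwd"))  # ['path_traversal']
print(anomaly_detect(normal_rates, 51))        # False: close to the mean of 50
print(anomaly_detect(normal_rates, 400))       # True: far outside the profile
```

The misuse detector can only report attacks it has a signature for, while the anomaly detector needs no signatures but will alarm on any unusual, possibly benign, value.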

Problem Statement
• Goal of Intrusion Detection Systems (IDS):
  - To detect an intrusion as it happens and be able to respond to it.
• False positives:
  - A false positive is a situation where something abnormal (as defined by the IDS) happens, but it is not an intrusion.
  - Too many false positives: the user will stop monitoring the IDS because of the noise.
• False negatives:
  - A false negative is a situation where an intrusion is really happening, but the IDS doesn't catch it.
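The two error types can be turned into rates from labeled IDS output. The sketch below assumes the common definitions FP rate = FP / (FP + TN) and FN rate = FN / (FN + TP); the slides do not define these rates, and the alarm and attack lists are made-up data.

```python
from typing import List, Tuple


def error_rates(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate).

    predicted[i] is True when the IDS raised an alarm for event i;
    actual[i] is True when event i really was an intrusion.
    """
    pairs = list(zip(predicted, actual))
    fp = sum(p and not a for p, a in pairs)      # alarm, but no intrusion
    tn = sum(not p and not a for p, a in pairs)  # no alarm, no intrusion
    fn = sum(a and not p for p, a in pairs)      # intrusion missed by the IDS
    tp = sum(p and a for p, a in pairs)          # intrusion correctly flagged
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    fn_rate = fn / (fn + tp) if fn + tp else 0.0
    return fp_rate, fn_rate


# Hypothetical IDS output for seven events.
alarms  = [True, False, True, False, False, True, True]
attacks = [True, False, False, False, True, True, False]
print(error_rates(alarms, attacks))  # (0.5, 0.3333333333333333)
```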

Layered Security Mechanism

Problem Statement: Misuse Detection

Firewalls

Firewall Rules
[Table columns: Order | Protocol | Source IP | Source Port | Destination IP | Destination Port | Action]
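A rule table with these columns is typically evaluated in order, and the first matching rule decides the packet's fate. This is a minimal sketch assuming first-match semantics, CIDR notation for the IP columns, None/"*" as wildcards, and a default-deny policy; the rule contents are hypothetical because the slide's table rows are not available.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass
class Rule:
    order: int               # lower numbers are checked first
    protocol: str            # "tcp", "udp", ... or "*" for any protocol
    src_ip: str              # source network in CIDR form; "0.0.0.0/0" = any
    src_port: Optional[int]  # None = any source port
    dst_ip: str              # destination network in CIDR form
    dst_port: Optional[int]  # None = any destination port
    action: str              # "accept" or "deny"

    def matches(self, proto: str, src: str, sport: int, dst: str, dport: int) -> bool:
        return (
            self.protocol in ("*", proto)
            and ip_address(src) in ip_network(self.src_ip)
            and self.src_port in (None, sport)
            and ip_address(dst) in ip_network(self.dst_ip)
            and self.dst_port in (None, dport)
        )


def filter_packet(rules: List[Rule], proto: str, src: str, sport: int,
                  dst: str, dport: int, default: str = "deny") -> str:
    """First-match semantics: the action of the lowest-order matching rule wins."""
    for rule in sorted(rules, key=lambda r: r.order):
        if rule.matches(proto, src, sport, dst, dport):
            return rule.action
    return default


# Hypothetical rule set protecting a 192.168.1.0/24 network.
rules = [
    Rule(3, "*", "0.0.0.0/0", None, "192.168.1.0/24", None, "deny"),
    Rule(1, "tcp", "0.0.0.0/0", None, "192.168.1.10/32", 80, "accept"),
    Rule(2, "tcp", "10.0.0.0/8", None, "192.168.1.0/24", 22, "accept"),
]

print(filter_packet(rules, "tcp", "203.0.113.5", 40000, "192.168.1.10", 80))  # accept
print(filter_packet(rules, "tcp", "203.0.113.5", 40000, "192.168.1.20", 22))  # deny
```

Because evaluation stops at the first match, the Order column matters: moving the broad deny rule to order 1 would block the web server as well.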

Hierarchical Distributed Firewall Setup

Problem Statement: Anomaly Detection

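Our Approach
[Figure: class-labeled training data is used to train an SVM, which is then applied to the testing data; SVM training is marked as the problem (“Training Problem???”)]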

Our Approach
[Figure: class-labeled training data → hierarchical clustering (DGSOT) → SVM training → testing data]

Dynamically Growing Self-Organizing Tree Algorithm (DGSOT)

DGSOT
• Learning Process
  - Winner Node
  - Update the Tree
  - Stopping Criteria
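The three learning steps can be sketched on a fixed two-leaf tree. This is a simplified, hypothetical illustration and not the authors' DGSOT implementation: it omits the growing (node-splitting) step that gives DGSOT its name, and the learning rates eta_w, eta_s, eta_p and the threshold epsilon are assumed values.

```python
import math
from typing import List, Optional, Sequence


class Node:
    def __init__(self, vector: Sequence[float], parent: Optional["Node"] = None):
        self.vector = list(vector)   # reference vector of this tree node
        self.parent = parent
        self.children: List["Node"] = []
        if parent is not None:
            parent.children.append(self)


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def winner(root: Node, x: Sequence[float]) -> Node:
    """Winner node: the leaf whose reference vector is closest to x."""
    return min(leaves(root), key=lambda leaf: math.dist(leaf.vector, x))


def move_toward(node: Node, x: Sequence[float], rate: float) -> None:
    node.vector = [v + rate * (xi - v) for v, xi in zip(node.vector, x)]


def update_tree(root: Node, x: Sequence[float], eta_w: float = 0.1,
                eta_s: float = 0.01, eta_p: float = 0.005) -> None:
    """Update the tree: pull the winner strongly toward x, and its sibling
    leaves and its parent more weakly (assumed learning rates)."""
    w = winner(root, x)
    move_toward(w, x, eta_w)
    if w.parent is not None:
        for sibling in w.parent.children:
            if sibling is not w and not sibling.children:
                move_toward(sibling, x, eta_s)
        move_toward(w.parent, x, eta_p)


def train(root: Node, data: List[Sequence[float]], epsilon: float = 1e-4,
          max_epochs: int = 100) -> float:
    """Stopping criterion: stop when the mean distance between each input and
    its winner changes by less than epsilon from one epoch to the next."""
    previous = error = math.inf
    for _ in range(max_epochs):
        for x in data:
            update_tree(root, x)
        error = sum(math.dist(winner(root, x).vector, x) for x in data) / len(data)
        if abs(previous - error) < epsilon:
            break
        previous = error
    return error


root = Node([0.0, 0.0])
left, right = Node([0.0, 0.0], parent=root), Node([1.0, 1.0], parent=root)
data = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
train(root, data)
print(winner(root, (5.0, 5.0)) is right)  # True: the right leaf now represents the far group
```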

Support Vector Machine
• Support Vector Machines (SVM)
  - One of the most powerful classification techniques
  - Find a hyperplane that separates the classes
  - Based on the idea of mapping data points to a high-dimensional feature space where a separating hyperplane can be found
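Finding a separating hyperplane can be illustrated with a linear SVM in plain Python. The sketch below swaps in Pegasos-style stochastic subgradient descent on the regularized hinge loss as the training method; it uses no kernel (no feature mapping), and the regularization weight lam, the iteration count, and the toy points are assumptions, not settings from the lecture.

```python
import random
from typing import List, Sequence


def train_linear_svm(points: Sequence[Sequence[float]], labels: Sequence[int],
                     lam: float = 0.01, iterations: int = 2000, seed: int = 0) -> List[float]:
    """Linear SVM trained by Pegasos-style stochastic subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)). Labels must be +1 or -1.
    A constant feature 1.0 is appended so the last weight acts as a bias
    (this bias is also regularized, a simplification)."""
    rng = random.Random(seed)
    data = [list(p) + [1.0] for p in points]
    w = [0.0] * len(data[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(data))
        x, y = data[i], labels[i]
        eta = 1.0 / (lam * t)                       # decreasing step size
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1.0 - eta * lam) * wj for wj in w]    # step on the regularizer
        if margin < 1.0:                            # hinge loss is active
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Sequence[float], point: Sequence[float]) -> int:
    """Which side of the hyperplane w.x + b = 0 the point falls on."""
    score = sum(wj * xj for wj, xj in zip(w, list(point) + [1.0]))
    return 1 if score >= 0 else -1


points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-3, -2)]
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(points, labels)
print([predict(w, p) for p in points])  # [1, 1, 1, -1, -1, -1]
```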

The value of support vectors and non-support vectors

The effect of adding new data points on the margins

Feature Mapping
Feature mapping from a two-dimensional input space to a two-dimensional feature space.
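A 2-D to 2-D mapping can already make a non-linear boundary linear. The map phi(x1, x2) = (x1^2, x2^2) used below is an assumed illustration, not necessarily the one pictured on the slide: it turns the circle x1^2 + x2^2 = 1 into the straight line z1 + z2 = 1.

```python
import math


def phi(x1: float, x2: float) -> tuple:
    """Map a 2-D input point (x1, x2) to the 2-D feature point (x1**2, x2**2)."""
    return (x1 ** 2, x2 ** 2)


# Class A: points on a circle of radius 0.5; class B: points on a circle of
# radius 2. For these points no straight line in the input space separates A from B.
inside = [(0.5 * math.cos(a), 0.5 * math.sin(a)) for a in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)]
outside = [(2.0 * math.cos(a), 2.0 * math.sin(a)) for a in (0.5, 1.5, 2.5, 3.5, 4.5, 5.5)]


def linear_rule(z: tuple) -> str:
    """A linear decision rule in feature space: the line z1 + z2 = 1."""
    return "A" if z[0] + z[1] < 1.0 else "B"


print([linear_rule(phi(*p)) for p in inside])   # all "A"
print([linear_rule(phi(*p)) for p in outside])  # all "B"
```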

SVM Limitations
• Long training time limits use.
• Clustering has a positive impact on the training of an SVM: each cluster is represented by only one reference point.
  - Reduces training time
  - Degrades generalization, because fewer points are used
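Representing each cluster by one reference point can be sketched as follows. Plain k-means (Lloyd's algorithm, initialized with the first k points) is swapped in here as a stand-in for DGSOT, and k, the random seed, and the Gaussian toy data are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain Lloyd's k-means, initialized with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids


def reduce_training_set(points: Sequence[Sequence[float]], labels: Sequence[int],
                        k: int) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Replace the points of each class with k cluster centroids, so each
    cluster contributes a single reference point to SVM training."""
    reduced_points, reduced_labels = [], []
    for label in sorted(set(labels)):
        members = [p for p, y in zip(points, labels) if y == label]
        for centroid in kmeans(members, min(k, len(members))):
            reduced_points.append(tuple(centroid))
            reduced_labels.append(label)
    return reduced_points, reduced_labels


rng = random.Random(0)
X = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(100)]
X += [(rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)) for _ in range(100)]
y = [1] * 100 + [-1] * 100

reduced_X, reduced_y = reduce_training_set(X, y, k=3)
print(len(X), "->", len(reduced_X))  # 200 -> 6
```

An SVM trained on the 6 reduced points instead of all 200 trains faster, but the detail thrown away near the class boundary is what costs generalization.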

Hierarchical clustering with SVM flow chart

Training set
• 1998 DARPA data that originated from the MIT Lincoln Lab
  - http://kdd.ics.uci.edu/databases/kddcup99.html
• Size: 1,012,477 data points

Data set / Attack Types
• DoS
  - denial-of-service
• R2L
  - unauthorized access from a remote machine, e.g., guessing a password
• U2R
  - unauthorized access to local superuser (root) privileges, e.g., various “buffer overflow” attacks
• Probing
  - surveillance and other probing, e.g., port scanning
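In the KDD Cup 99 files each record is a comma-separated line whose last field is the label, written with a trailing period (for example `smurf.` or `normal.`). The sketch below maps the training-set attack names to the four categories above; the helper name `record_category` and the abbreviated example records are hypothetical (real records carry 41 features before the label).

```python
# Training-set attack names of the KDD Cup 99 data grouped into the four
# attack categories. Names not listed here map to "unknown".
ATTACK_CATEGORY = {
    "back": "dos", "land": "dos", "neptune": "dos", "pod": "dos",
    "smurf": "dos", "teardrop": "dos",
    "ipsweep": "probe", "nmap": "probe", "portsweep": "probe", "satan": "probe",
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l", "multihop": "r2l",
    "phf": "r2l", "spy": "r2l", "warezclient": "r2l", "warezmaster": "r2l",
    "buffer_overflow": "u2r", "loadmodule": "u2r", "perl": "u2r", "rootkit": "u2r",
}


def record_category(line: str) -> str:
    """Return 'normal' or the attack category of one comma-separated record.
    The label is the last field and ends with a period, e.g. 'smurf.'."""
    label = line.strip().split(",")[-1].rstrip(".")
    if label == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(label, "unknown")


# Abbreviated, hypothetical records: only a few leading features are kept.
for record in ["0,icmp,ecr_i,SF,1032,0,smurf.", "0,tcp,http,SF,181,5450,normal."]:
    print(record_category(record))  # dos, then normal
```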

Results
Method | Weighted Accuracy | Average Accuracy | Average Training Time | Average FP Rate | Average FN Rate
Random Selection | 62.5% | 62.61% | 0.049 hours | 22.40% | 37.38%
Pure SVM | 62.74% | 62.75% | 0.51 hours | 30.75% | 37.24%
SVM + Rocchio Bundling | 63.09% | 63.11% | 0.93 hours | 30.98% | 36.89%
SVM + DGSOT | 63.34% | 63.36% | 0.26 hours | 51.56% | 36.64%

Relevant and Important Publications
• “A Dynamical Growing Self-Organizing Tree (DGSOT) for Hierarchical Clustering Gene Expression Profiles,” Feng Luo, Latifur Khan, Farokh Bastani, I-Ling Yen and J. Zhou, Bioinformatics, Oxford University Press, UK, 20(16), November 2004, pp. 2605-2617.
• “Automatic Image Annotation and Retrieval using Weighted Feature Selection,” Lei Wang and Latifur Khan, to appear in a special issue of Multimedia Tools and Applications, Kluwer.
• “Hierarchical Clustering for Complex Data,” Latifur Khan and Feng Luo, to appear in International Journal on Artificial Intelligence Tools, World Scientific.
• “A New Intrusion Detection System using Support Vector Machines and Hierarchical Clustering,” Latifur Khan, Mamoun Awad, and Bhavani Thuraisingham, to appear in VLDB Journal: The International Journal on Very Large Databases, ACM/Springer-Verlag.

Relevant and Important Publications
• R. Lippmann, J. Haines, D. Fried, J. Korba, and K. Das, “The 1999 DARPA off-line intrusion detection evaluation,” Computer Networks, 34, pp. 579-595, 2000.