Data and Applications Security Developments and Directions Dr

- Slides: 30
Data and Applications Security Developments and Directions Dr. Bhavani Thuraisingham The University of Texas at Dallas Lecture #20 Guest Lecture Data Mining for Intrusion Detection
Data Mining & Intrusion Detection Systems Mamoun Awad Dept. of Computer Science University of Texas at Dallas
Outline
- Intrusion Detection
- Data Mining
- Approach
- Data set & Results
What is an intrusion?
- An intrusion can be defined as “any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource”.
Intrusion Examples
- Virus
- Buffer overflows
  - 2000 Outlook Express vulnerability.
- Denial of Service (DoS)
  - Explicit attempt by attackers to prevent legitimate users of a service from using that service.
- Address spoofing
  - A malicious user uses a fake IP address to send malicious packets to a target.
- Many others
  - R2L, U2R, Probe, …
Intrusion Detection System (IDS)
- An Intrusion Detection System (IDS) inspects all inbound and outbound network activity and identifies suspicious patterns that may indicate a network or system attack from someone attempting to break into or compromise a system.
Attack Types
- Host-based attacks
  - Gain access to privileged services or resources on a machine.
- Network-based attacks
  - Make it difficult for legitimate users to access various network services.
IDS Categories
- Intrusion detection systems are split into two groups:
  - Anomaly detection systems
    - Identify malicious traffic based on deviations from established normal network traffic patterns.
  - Misuse detection systems
    - Identify intrusions based on known patterns (signatures) of malicious activity.
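The two categories can be contrasted with a minimal sketch. Everything here is illustrative: the signature strings, the z-score test, and the threshold of 3 standard deviations are assumptions chosen for the example, not the detection logic of any particular IDS.

```python
import statistics
from typing import Sequence

# Hypothetical signatures: substrings that mark known-bad payloads.
SIGNATURES = ["/etc/passwd", "' OR '1'='1"]


def misuse_alert(payload: str) -> bool:
    """Misuse detection: flag a payload that contains a known signature."""
    return any(sig in payload for sig in SIGNATURES)


def anomaly_alert(baseline: Sequence[float], observed: float,
                  threshold: float = 3.0) -> bool:
    """Anomaly detection: flag a value far from the baseline mean (z-score)."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(observed - mean) > threshold * stdev
```

Misuse detection only catches what is already in the signature list, while anomaly detection can flag unseen behavior at the cost of flagging unusual but harmless traffic.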
Problem Statement
- Goal of Intrusion Detection Systems (IDS):
  - To detect an intrusion as it happens and be able to respond to it.
- False positives:
  - A false positive is a situation where something abnormal (as defined by the IDS) happens, but it is not an intrusion.
  - Too many false positives: users will quit monitoring the IDS because of the noise.
- False negatives:
  - A false negative is a situation where an intrusion is really happening, but the IDS doesn't catch it.
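A small sketch of how the two error rates can be computed from alert counts. The slides do not define the rates formally; the formulas below are the common confusion-matrix definitions and are an assumption here, as are the function and parameter names.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int):
    """Return (false positive rate, false negative rate).

    tp: intrusions that raised an alert
    fp: normal events that raised an alert
    tn: normal events that raised no alert
    fn: intrusions that raised no alert
    """
    fp_rate = fp / (fp + tn)  # share of normal events wrongly flagged
    fn_rate = fn / (fn + tp)  # share of intrusions that were missed
    return fp_rate, fn_rate
```

For example, with 100 normal events of which 10 raise alerts, and 100 intrusions of which 20 are missed, the function returns a false positive rate of 0.1 and a false negative rate of 0.2.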
Layered Security Mechanism
Problem Statement
- Misuse Detection
Firewalls
Firewall Rules
- Rule table columns: Order, Protocol, Source IP, Source Port, Destination IP, Destination Port, Action
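A minimal sketch of how an ordered rule table with these columns might be evaluated. It assumes first-match semantics, a default action of "deny", and "*" or `None` as wildcards; these are illustrative choices, and the `Rule`, `matches`, and `decide` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Rule:
    order: int
    protocol: str             # "tcp", "udp", or "*" for any
    src_ip: str               # exact address or "*" for any
    src_port: Optional[int]   # None means any port
    dst_ip: str
    dst_port: Optional[int]
    action: str               # "allow" or "deny"


Packet = Tuple[str, str, int, str, int]  # protocol, src_ip, src_port, dst_ip, dst_port


def matches(rule: Rule, packet: Packet) -> bool:
    protocol, src_ip, src_port, dst_ip, dst_port = packet
    return (rule.protocol in ("*", protocol)
            and rule.src_ip in ("*", src_ip)
            and rule.src_port in (None, src_port)
            and rule.dst_ip in ("*", dst_ip)
            and rule.dst_port in (None, dst_port))


def decide(rules: Sequence[Rule], packet: Packet, default: str = "deny") -> str:
    """Apply rules in ascending order; the first matching rule wins."""
    for rule in sorted(rules, key=lambda r: r.order):
        if matches(rule, packet):
            return rule.action
    return default
```

Because the first match wins, rule order matters: a broad "deny" placed before a specific "allow" would shadow it.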
Hierarchical Distributed Firewall Setup
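Problem Statement
- Anomaly Detection
Our Approach
[Figure: SVM training and testing pipeline. Labels: Training Data, SVM, Class, Training, Testing Data, "Problem???"]
Our Approach
[Figure: SVM training and testing pipeline with clustering. Labels: Training Data, Hierarchical Clustering (DGSOT), SVM, Class, Training, Testing Data]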
Dynamically Growing Self-Organizing Tree Algorithm (DGSOT)
DGSOT
- Learning Process
  - Winner Node
  - Update the Tree
  - Stopping Criteria
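The three steps of the learning process can be illustrated with a simplified, flat sketch. This is not the full DGSOT algorithm (it omits tree growth and the hierarchy entirely); the Euclidean distance, the fixed learning rate, and the relative-error stopping rule are assumptions made for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_winner(nodes, point):
    """Winner node: index of the reference vector closest to the point."""
    return min(range(len(nodes)), key=lambda i: distance(nodes[i], point))


def update_node(node, point, learning_rate):
    """Update step: move the reference vector toward the point."""
    return [w + learning_rate * (x - w) for w, x in zip(node, point)]


def learn(nodes, data, learning_rate=0.5, tolerance=1e-3, max_epochs=100):
    """Repeat winner selection and update until the error stops changing."""
    nodes = [list(n) for n in nodes]
    previous_error = None
    for _ in range(max_epochs):
        total_error = 0.0
        for point in data:
            w = find_winner(nodes, point)
            total_error += distance(nodes[w], point)
            nodes[w] = update_node(nodes[w], point, learning_rate)
        # Stopping criterion (assumed): relative change in error is small.
        if (previous_error is not None and
                abs(previous_error - total_error)
                <= tolerance * max(previous_error, 1e-12)):
            break
        previous_error = total_error
    return nodes
```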
Support Vector Machine
- Support Vector Machines (SVM)
  - One of the most powerful classification techniques
  - Find a hyperplane that separates the classes
  - Based on the idea of mapping data points to a high-dimensional feature space where a separating hyperplane can be found
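Once a separating hyperplane is found, classification reduces to checking which side of it a point falls on. The sketch below assumes a linear SVM that has already been trained; the weight vector `w` and bias `b` are hypothetical values supplied by the caller, and no training is shown.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; zero lies exactly on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Width of the margin between the two classes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

For instance, with `w = [1.0, 1.0]` and `b = -3.0`, the point `[3, 3]` is classified as +1 and `[0, 0]` as -1.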
The value of support vectors and non-support vectors
The effect of adding new data points on the margins
Feature Mapping
- Feature mapping from a two-dimensional input space to a two-dimensional feature space.
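One way to see why such a mapping helps: points inside and outside a circle cannot be split by a straight line in the input space, but they can after squaring each coordinate. The specific map (x1, x2) -> (x1^2, x2^2) and the unit-circle threshold are assumptions chosen for this sketch, not necessarily the mapping shown on the slide.

```python
def feature_map(x):
    """Map a 2-D input point to a 2-D feature point by squaring coordinates."""
    x1, x2 = x
    return (x1 * x1, x2 * x2)


def linear_classifier(z, threshold=1.0):
    """In feature space the circle x1^2 + x2^2 = 1 becomes the line z1 + z2 = 1."""
    return 1 if z[0] + z[1] > threshold else -1


inner_points = [(0.5, 0.5), (-0.3, 0.2), (0.0, -0.9)]   # inside the unit circle
outer_points = [(1.5, 0.0), (-1.0, -1.0), (0.0, 2.0)]   # outside the unit circle

inner_labels = [linear_classifier(feature_map(p)) for p in inner_points]
outer_labels = [linear_classifier(feature_map(p)) for p in outer_points]
```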
SVM Limitations
- Long training time limits use.
- Clustering has a positive impact on the training of an SVM: each cluster is represented by only one reference point.
  - Reduces training time.
  - Degrades generalization, because fewer points are used.
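The reduction can be sketched as replacing every cluster with a single reference point before training. Using the centroid as the reference is an assumption made for illustration; the cluster assignments are taken as given, and `cluster_references` is a hypothetical helper name.

```python
from collections import defaultdict


def cluster_references(points, cluster_ids):
    """Return one reference point (the centroid) per cluster id."""
    groups = defaultdict(list)
    for point, cluster_id in zip(points, cluster_ids):
        groups[cluster_id].append(point)
    return {
        cluster_id: tuple(sum(column) / len(members) for column in zip(*members))
        for cluster_id, members in groups.items()
    }
```

With four points in two clusters, the SVM would train on two reference points instead of four, which is where both the speed-up and the loss of detail come from.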
Hierarchical clustering with SVM flow chart
Training set
- 1998 DARPA data that originated from the MIT Lincoln Lab
  - http://kdd.ics.uci.edu/databases/kddcup99.html
- Size: 1,012,477 data points
Data set / Attack Types
- DoS
  - Denial of service.
- R2L
  - Unauthorized access from a remote machine, e.g., guessing a password.
- U2R
  - Unauthorized access to local superuser (root) privileges, e.g., various "buffer overflow" attacks.
- Probing
  - Surveillance and other probing, e.g., port scanning.
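A short sketch of grouping individual attack labels into these four categories. The label names below are those commonly listed for the KDD Cup 99 data and are an assumption here, since the slides do not enumerate them; the trailing-period handling reflects how labels are usually written in that file.

```python
# Assumed mapping from KDD Cup 99 attack labels to the four categories.
ATTACK_CATEGORY = {
    "back": "dos", "land": "dos", "neptune": "dos",
    "pod": "dos", "smurf": "dos", "teardrop": "dos",
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l",
    "multihop": "r2l", "phf": "r2l", "spy": "r2l",
    "warezclient": "r2l", "warezmaster": "r2l",
    "buffer_overflow": "u2r", "loadmodule": "u2r",
    "perl": "u2r", "rootkit": "u2r",
    "ipsweep": "probe", "nmap": "probe",
    "portsweep": "probe", "satan": "probe",
}


def categorize(label: str) -> str:
    """Map a raw label such as 'smurf.' to its attack category."""
    name = label.strip().rstrip(".")
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(name, "unknown")
```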
Results
Methods                  Weighted Accuracy   Average Accuracy   Average Training Time
Random Selection         62.5%               62.61%             0.049 hours
Pure SVM                 62.74%              62.75%             0.51 hours
SVM + Rocchio Bundling   63.09%              63.11%             0.93 hours
SVM + DGSOT              63.34%              63.36%             0.26 hours
Average FP rate / Average FN rate: 22.40% 37.38% 30.75% 37.24% 30.98% 36.89% 51.56% 36.64%
Relevant and Important Publications
- “A Dynamical Growing Self-Organizing Tree (DGSOT) for Hierarchical Clustering Gene Expression Profiles,” Feng Luo, Latifur Khan, Farokh Bastani, I-Ling Yen and J. Zhou, the Bioinformatics Journal, Oxford University Press, UK, 20(16), November 2004, pp. 2605-2617.
- “Automatic Image Annotation and Retrieval using Weighted Feature Selection,” Lei Wang and Latifur Khan, to appear in a special issue of Multimedia Tools and Applications, Kluwer Publisher.
- “Hierarchical Clustering for Complex Data,” Latifur Khan and Feng Luo, to appear in International Journal on Artificial Intelligence Tools, World Scientific publishers.
- “A New Intrusion Detection System using Support Vector Machines and Hierarchical Clustering,” Latifur Khan, Mamoun Awad, and Bhavani Thuraisingham, to appear in VLDB Journal: The International Journal on Very Large Databases, ACM/Springer-Verlag Publishing.
Relevant and Important Publications
- R. Lippmann, J. Haines, D. Fried, J. Korba, and K. Das, “The 1999 DARPA off-line intrusion detection evaluation,” Computer Networks, 34, pp. 579-595, 2000.