Network Intrusion Detection Using Random Forests
Jiong Zhang, Mohammad Zulkernine
School of Computing, Queen's University
Kingston, Ontario, Canada

Outline
- Motivation
- Intrusion detection system
- Data mining meets intrusion detection
- Proposed architecture
- Challenges and solutions
- Experimental results
- Conclusion and future work
PST 2005, Jiong Zhang and Mohammad Zulkernine

Motivation
- An intrusion prevention system (firewall) cannot prevent all attacks.
[Figure: diagram with the labels Intruder, Victim, Firewall, and Internet]

Motivation (contd.)
Statistical data for intrusions
- Total losses of 2004 (reported): $141,496,560.
  • Source: FBI survey for Year 2004
- 50% of security breaches are undetected.
  • Source: FBI Statistics for Year 2000

Intrusion Detection Techniques
- Misuse Detection
  • Extracts patterns of known intrusions
  • Cannot detect novel intrusions
  • Has low false positive rate
- Anomaly Detection
  • Builds profiles for normal activities
  • Uses the deviations from the profiles to detect attacks
  • Can detect unknown attacks
  • Has high false positive rate
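
The anomaly detection idea above (build a profile of normal activity, then flag deviations from it) can be illustrated with a minimal sketch. The monitored quantity (connections per second), the z-score rule, and the threshold of 3 are assumptions chosen for illustration; the slides do not specify how profiles are built.

```python
from statistics import mean, stdev

def build_profile(normal_values):
    """Summarize normal activity as (mean, standard deviation)."""
    return mean(normal_values), stdev(normal_values)

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value whose z-score against the profile exceeds the threshold."""
    mu, sigma = profile
    if sigma == 0:
        # Degenerate profile: anything different from the single normal value deviates.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical connections-per-second observed during normal operation.
normal_rates = [10, 12, 11, 9, 10, 13, 11, 10]
profile = build_profile(normal_rates)

print(is_anomalous(11, profile))   # close to the profile
print(is_anomalous(200, profile))  # far from the profile
```

Lowering the threshold in such a scheme flags more unknown attacks but also more legitimate traffic, which is one way to read the high false positive rate listed above.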

Network Intrusion Detection System (NIDS)
- Monitors network traffic to detect intrusions
- Monitors more targets on a network
- Detects some attacks that host-based systems miss
- Does not affect network operations

Current NIDS
Many current NIDSs (like Snort):
- Rule-based
- Unable to detect novel attacks
- High maintenance cost

Rule Based vs. Data Mining
- Rule based systems
  • Intrusion Data -> Security Experts -> Rules
- Data mining based systems
  • Labeled Data -> Data Mining Engine -> Patterns

Data Mining Meets Intrusion Detection
- Extract patterns of intrusions for misuse detection
- Build profiles of normal activities for anomaly detection
- Build classifiers to detect attacks
- Some IDSs have successfully applied data mining techniques in intrusion detection

Proposed Architecture
[Figure: Architecture of the proposed NIDS. Labels in the diagram: Networks, Packets, Sensors, Audited data, On-line Preprocessors, Feature vectors, Detector, Alarmer, Alarms, Database (On line), Data Set, Training data, Off-line Preprocessor, Feature vectors, Pattern Builder, Patterns, Database (Off line). The diagram is divided into an on-line part and an off-line part.]

Random Forests
- Unsurpassed in accuracy among the current data mining algorithms
- Runs efficiently on large data sets with many features
- Gives estimates of which features are important
- No nominal data problem
- No over-fitting

Imbalanced Intrusion
- Problems
  • Higher error rate for minority intrusions
  • Some minority intrusions are more dangerous
  • Need to improve the performance for the minority intrusions
- Proposed Solution
  • Down-sample the majority intrusions and over-sample the minority intrusions
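
The proposed solution above can be sketched as follows. This is a minimal illustration, assuming a single per-class target size and simple random sampling; the function name `balance_by_class`, the target of 50, and the toy data are hypothetical, and the slides do not state the sampling ratios actually used.

```python
import random
from collections import defaultdict

def balance_by_class(records, labels, target_per_class, seed=0):
    """Down-sample classes above target_per_class, over-sample those below it."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)

    balanced = []
    for label, items in by_class.items():
        if len(items) >= target_per_class:
            # Majority class: draw without replacement.
            chosen = rng.sample(items, target_per_class)
        else:
            # Minority class: keep every record, then draw extras with replacement.
            extra = rng.choices(items, k=target_per_class - len(items))
            chosen = items + extra
        balanced.extend((record, label) for record in chosen)
    rng.shuffle(balanced)
    return balanced

# Toy data: 1000 "dos" records and 5 "u2r" records (record ids stand in for feature vectors).
records = list(range(1005))
labels = ["dos"] * 1000 + ["u2r"] * 5
balanced = balance_by_class(records, labels, target_per_class=50)
print(len(balanced))
```

Down-sampling the majority class also shrinks the training set, which is consistent with the shorter pattern-building time reported in the conclusion.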

Feature Selection
- Essential for improving detection rate
- Reduces the computational cost
- Many NIDSs select features by intuition or domain knowledge

Feature Selection over the KDD’99 Dataset
- Calculate variable importance using random forests.
- Select the 38 most important features in detection.
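
Random forests estimate variable importance by permuting one feature's values in the out-of-bag samples and measuring how much accuracy drops. The sketch below applies the same permutation idea to a hypothetical, already-trained classifier (`predict`) on toy two-feature data, then keeps the top-k features. It is not a forest and not the authors' code; all names and data are assumptions.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows the classifier labels correctly."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with feature j replaced by a shuffled value.
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(baseline - accuracy(predict, permuted, labels))
    return importances

def top_k_features(importances, k):
    """Indices of the k features with the largest importance."""
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:k])

# Toy data: feature 0 decides the label, feature 1 is pure noise.
data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]

def predict(row):
    """Hypothetical stand-in for a trained model."""
    return int(row[0] > 0.5)

scores = permutation_importance(predict, rows, labels, n_features=2)
selected = top_k_features(scores, k=1)
print(scores, selected)
```

With the 41 KDD’99 features, the analogous call would be `top_k_features(scores, k=38)`.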

Some Features
- The two most important features
  • Feature 3. service type, such as http, telnet, and ftp
  • Feature 23. count, number of connections to the same host as the current one during the past two seconds
- The three least important features
  • Feature 7. land, 1 if connection is from/to the same host/port; 0 otherwise
  • Feature 20. num_outbound_cmds, number of outbound commands in an ftp session
  • Feature 21. is_hot_login, 1 if the login belongs to the “hot” list; 0 otherwise
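
To make the time-based `count` feature concrete, here is a minimal sketch that derives it from a connection log with a sliding two-second window. The log format `(timestamp, destination host)` is hypothetical, and the sketch assumes the current connection itself is not counted; the slides do not settle that detail.

```python
from collections import deque

def count_same_host(connections, window=2.0):
    """For each (timestamp, dst_host) in time order, count the earlier
    connections to the same host within the past `window` seconds."""
    recent = deque()  # (timestamp, dst_host) pairs still inside the window
    counts = []
    for ts, host in connections:
        # Drop connections that fell out of the time window.
        while recent and ts - recent[0][0] > window:
            recent.popleft()
        counts.append(sum(1 for _, h in recent if h == host))
        recent.append((ts, host))
    return counts

# Hypothetical connection log: (seconds, destination host).
log = [(0.0, "a"), (0.5, "a"), (1.0, "b"), (1.9, "a"), (4.5, "a")]
counts = count_same_host(log)
print(counts)  # [0, 1, 0, 2, 0]
```

A burst of connections to one host drives this value up quickly, which gives some intuition for why it ranks among the most important features.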

Parameter Optimization for Random Forests
- Optimize the parameter Mtry of random forests (the number of features tried at each split) to improve detection rate.
- Choose 15 as the optimal value, which reaches the minimum of the out-of-bag (oob) error rate.
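
The selection procedure above amounts to evaluating the oob error for several Mtry values and keeping the minimum. The sketch below shows one way to do this, assuming scikit-learn as the random forest implementation (the slides do not name a library); `oob_error_rate` and `choose_mtry` are hypothetical helper names, and `max_features` plays the role of Mtry.

```python
def oob_error_rate(X, y, mtry, n_trees=100, seed=0):
    """OOB error of a forest that tries `mtry` features per split.
    Uses scikit-learn, which is an assumption of this sketch."""
    from sklearn.ensemble import RandomForestClassifier
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        max_features=mtry,   # number of features considered at each split
        oob_score=True,      # score each sample with trees that did not see it
        random_state=seed,
    )
    forest.fit(X, y)
    return 1.0 - forest.oob_score_

def choose_mtry(candidates, error_fn):
    """Return (best_mtry, errors), where best_mtry minimizes error_fn."""
    errors = {m: error_fn(m) for m in candidates}
    best = min(errors, key=errors.get)
    return best, errors
```

With training features `X` and labels `y` already loaded, a call such as `choose_mtry(range(5, 40, 5), lambda m: oob_error_rate(X, y, m))` would return the candidate with the lowest oob error. Because the oob estimate comes from the training data itself, no separate validation split is needed.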

Performance Comparison on the KDD’99 Dataset
- Our approach provides a lower overall error rate and cost than the best KDD’99 result.
- Feature selection can improve the performance of intrusion detection.
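
Both metrics named above can be computed from a confusion matrix: the overall error rate is the off-diagonal fraction, and the cost is the cost-weighted average per example. The sketch below shows the arithmetic on a made-up three-class example; the numbers are purely illustrative and are neither the KDD’99 cost matrix nor results from this work.

```python
def overall_error_rate(confusion):
    """Fraction of examples off the diagonal (rows: actual, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

def average_cost(confusion, cost):
    """Mean misclassification cost per example."""
    total = sum(sum(row) for row in confusion)
    weighted = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return weighted / total

# Illustrative values only: 3 classes, 100 examples each.
confusion = [[90, 5, 5], [10, 80, 10], [2, 3, 95]]
cost = [[0, 1, 2], [1, 0, 2], [3, 2, 0]]

print(overall_error_rate(confusion))  # 35 errors out of 300
print(average_cost(confusion, cost))  # total cost 57 over 300 examples
```

A cost matrix lets dangerous misclassifications weigh more than harmless ones, so two classifiers with the same error rate can still differ in cost.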

Conclusion and Future Work
- The random forests algorithm can help improve detection performance and select features.
- Sampling techniques can reduce the time to build patterns and increase the detection rate of minority intrusions.
- In the future, we will focus on anomaly detection and a multiple classifier architecture.
