Building ICAS with Hadoop and HBase
Wei-Yu Chen, Yao-Tsung Wang
National Center for High-Performance Computing, Taiwan
{waue, jazz}@nchc.org.tw
DATE: 25/6/09
Outline
  - Background & Issues
  - Motivation & Proposal
  - ICAS Architecture
  - ICAS Procedure
  - Experiment Results
  - Pros and Cons
  - Conclusions
IDS Introduction
Intrusion Detection System (IDS)
  - Detects unwanted attempts to access, manipulate, or disable computer systems through the Internet.
Alert
  - Produced when the IDS detects something malicious.
HIDS vs. NIDS
Host IDS (HIDS)
  - Easy to control and maintain
Network IDS (NIDS)
  - Monitors both incoming and outgoing network traffic
  - Alerts are far more numerous and more complex
A host IDS is easy to maintain
A network IDS can only show a flood of alerts!
What is the problem with NIDS alerts?
  1. It is difficult to grasp the overall incident.
  2. The huge volume of data makes analysis less efficient.
  3. Crucial information is easily overlooked!
  4. Nothing is left if the database crashes.
Our Motivation
To resolve the problems above, which come with the huge amount of anomaly information generated by NIDS.
So, we need:
  - Reduced redundancy
  - Merging of related alerts
  - Higher capability
  - Fault tolerance
Our Idea: ICAS, IDS Cloud Analysis System
Applying cloud computing techniques
  - Higher capability
  - Fault tolerance
Designing an alert algorithm to generate a clear report
  - Reduced redundancy
  - Merging of related alerts
System Architecture: ICAS Component Overview
System Architecture: Snort
  - Snort is an open source network intrusion prevention and detection system.
  - The most widely deployed intrusion detection system
System Architecture: Why Snort?
  - The most popular (over 1 million downloads and 200,000 registered users)
  - Open source network IDS
  - Supports Windows and Linux
  - Lightweight and easy to extend
  - High accuracy and performance
System Architecture: Hadoop
  - Apache Hadoop Core is a software platform that lets one easily write and run applications that process vast amounts of data.
  - Inspired by Google's MapReduce and Google File System (GFS) papers
  - Implements MapReduce and the Hadoop Distributed File System (HDFS)
  - Operates on <key, value> pairs
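The <key, value> model can be illustrated with a toy, single-process sketch. It does not use the Hadoop API; the names run_mapreduce, count_mapper, and sum_reducer are hypothetical and exist only for this illustration. The mapper emits pairs, the pairs are grouped by key, and the reducer sees each key together with all of its values.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run a toy, in-memory map -> shuffle -> reduce pass (not the Hadoop API)."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: one input record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: values that share a key are collected together.
            groups[key].append(value)
    # Reduce phase: each key is reduced together with all of its values.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_mapper(line):
    """Emit <token, 1> for every whitespace-separated token in the line."""
    for token in line.split():
        yield token, 1


def sum_reducer(key, values):
    """Add up the counts collected for one key."""
    return sum(values)


counts = run_mapreduce(["tcp udp tcp", "udp tcp"], count_mapper, sum_reducer)
print(counts)
```

On a real cluster the map and reduce calls run on different nodes and the grouping step moves data over the network; the sketch only shows the shape of the data at each phase.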
System Architecture: Why Hadoop?
  - The most popular open source cloud platform
  - Provides an API for development
  - Scalable, economical, efficient, and reliable
    - Scaling Hadoop to 4000 nodes at Yahoo! (2008-09)
    - Hadoop sorts a petabyte in 16.25 hours and a terabyte in 62 seconds (2009-05)
System Architecture: HBase
  - HBase is the Hadoop database.
  - An open-source, distributed, column-oriented store modeled after the Google paper, Bigtable
System Architecture: Why HBase?
  - The Hadoop database
    - Output can be stored in HBase directly
  - Provides an API for development
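As a rough picture of what "column-oriented store" means here, the sketch below models a Bigtable-style table in memory: a row key maps to columns named family:qualifier, and each cell keeps versioned values. This is not the HBase client API. The class ToyColumnStore and the column names under "alert:" are hypothetical; the slides do not describe the actual ICAS table schema.

```python
from collections import defaultdict


class ToyColumnStore:
    """In-memory stand-in for a Bigtable-style table (not the HBase API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> {version timestamp -> value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, ts):
        """Store one version of a cell."""
        self._rows[row_key][column][ts] = value

    def get(self, row_key, column):
        """Return the newest value of a cell, or None if the cell is absent."""
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]


store = ToyColumnStore()
# Hypothetical layout: one row per victim host, one column per alert field.
store.put("Host_1", "alert:signature", "Trojan", ts=1)
store.put("Host_1", "alert:dst_port", "80", ts=1)
store.put("Host_1", "alert:dst_port", "80,443", ts=2)
print(store.get("Host_1", "alert:dst_port"))
```

Rows need not share the same set of columns, which suits merged alerts whose value lists vary in length from host to host.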
System Architecture: Four Components
Regular Parser
  - Parses the original Snort log and transfers it to HDFS (the Hadoop file system)
Analysis Procedure
  - Dispatches a job if the pool is not empty and inserts the result into the database
Data Mapper
  - <key, value> mapping
Data Reducer
  - <"key 1", value 1 … value N>
  - <"key 2", value 1 … value N>
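The parsing step of the Regular Parser might look like the sketch below. It assumes Snort's single-line "fast" alert layout; the slides do not say which Snort output mode ICAS reads, so the pattern, the function name parse_alert, and the sample line are all illustrative assumptions. The transfer to HDFS is not shown.

```python
import re

# Assumed input: Snort's single-line "fast" alert layout, for example
#   MM/DD-HH:MM:SS.usec  [**] [gid:sid:rev] message [**]
#   [Classification: ...] [Priority: N] {PROTO} src_ip:port -> dst_ip:port
ALERT_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"\[\*\*\]\s+\[(?P<rule_id>\d+:\d+:\d+)\]\s+(?P<message>.*?)\s+\[\*\*\]\s+"
    r"(?:\[Classification:\s*(?P<classification>[^\]]*)\]\s+)?"
    r"(?:\[Priority:\s*(?P<priority>\d+)\]\s+)?"
    r"\{(?P<protocol>\w+)\}\s+"
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)(?::(?P<src_port>\d+))?\s+->\s+"
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+)(?::(?P<dst_port>\d+))?\s*$"
)


def parse_alert(line):
    """Return the alert fields as a dict, or None if the line does not match."""
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for name in ("src_port", "dst_port"):
        # Port-less protocols such as ICMP leave these groups empty.
        fields[name] = int(fields[name]) if fields[name] else None
    return fields


# Made-up sample line using documentation-range IP addresses.
sample = (
    "06/25-10:15:32.123456  [**] [1:2000001:3] Example trojan activity [**] "
    "[Classification: A Network Trojan was detected] [Priority: 1] "
    "{TCP} 198.51.100.7:4077 -> 192.0.2.10:80"
)
alert = parse_alert(sample)
print(alert["dst_ip"], alert["rule_id"], alert["protocol"])
```

Returning None for unmatched lines lets the caller skip or log malformed input instead of aborting the whole batch.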
Program Procedure
Alert Integration Procedure
Key - Values
  - The victim's IP address
  - A unique ID used to identify the attack method in Snort rules
  - The time when the attack was launched
  - TCP/IP protocol
  - Victim ports
  - The IP address from which the attacker launched the attack
  - The port from which the attack was launched
Alert Merge Example

Before merging:
Destination IP  Attack Signature  Source IP               Destination Port  Source Port  Packet Protocol  Timestamp
Host_1          Trojan            Sip1                    80                4077         tcp              T1
Host_1          Trojan            Sip2                    80                4077         tcp              T2
Host_1          Trojan            Sip1                    443               5002         tcp              T3
Host_2          Trojan            Sip1                    443               5002         tcp              T4
Host_3          D.D.O.S           Sip3                    53                6007         udp              T5
Host_3          D.D.O.S           Sip4                    53                6008         tcp              T5
Host_3          D.D.O.S           Sip5                    53                6007         udp              T5
Host_3          D.D.O.S           Sip6                    53                6008         tcp              T5

After merging (Key / Values):
Destination IP  Attack Signature  Source IP               Destination Port  Source Port  Packet Protocol  Timestamp
Host_1          Trojan            Sip1, Sip2              80, 443           4077, 5002   tcp              T1, T2, T3
Host_2          Trojan            Sip1                    443               5002         tcp              T4
Host_3          D.D.O.S           Sip3, Sip4, Sip5, Sip6  53                6007, 6008   tcp, udp         T5
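The merge in this example can be reproduced with a short in-memory sketch. It assumes the grouping key is the pair (destination IP, attack signature); the slide labels the columns only as "Key" and "Values", so that choice, the field names, and the function merge_alerts are assumptions made for illustration and are not the ICAS mapper and reducer themselves.

```python
from collections import defaultdict

# Value fields collected for each key (names are illustrative).
FIELDS = ("src_ip", "dst_port", "src_port", "protocol", "timestamp")


def merge_alerts(alerts):
    """Group alerts by (dst_ip, signature) and keep distinct values per field."""
    merged = defaultdict(lambda: {field: [] for field in FIELDS})
    for alert in alerts:
        key = (alert["dst_ip"], alert["signature"])  # assumed grouping key
        for field in FIELDS:
            value = alert[field]
            # Keep first-seen order and drop repeated values.
            if value not in merged[key][field]:
                merged[key][field].append(value)
    return dict(merged)


COLUMNS = ("dst_ip", "signature") + FIELDS
ROWS = [
    ("Host_1", "Trojan", "Sip1", 80, 4077, "tcp", "T1"),
    ("Host_1", "Trojan", "Sip2", 80, 4077, "tcp", "T2"),
    ("Host_1", "Trojan", "Sip1", 443, 5002, "tcp", "T3"),
    ("Host_2", "Trojan", "Sip1", 443, 5002, "tcp", "T4"),
    ("Host_3", "D.D.O.S", "Sip3", 53, 6007, "udp", "T5"),
    ("Host_3", "D.D.O.S", "Sip4", 53, 6008, "tcp", "T5"),
    ("Host_3", "D.D.O.S", "Sip5", 53, 6007, "udp", "T5"),
    ("Host_3", "D.D.O.S", "Sip6", 53, 6008, "tcp", "T5"),
]
alerts = [dict(zip(COLUMNS, row)) for row in ROWS]
merged = merge_alerts(alerts)

for (dst_ip, signature), values in merged.items():
    print(dst_ip, signature, values)
```

Eight input alerts collapse into three merged records. Values are kept in first-seen order, so the Host_3 protocols come out as udp, tcp rather than in the order shown on the slide.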
Experiment Environment
Machines (6 nodes)
  - CPU: Intel quad-core; Memory: 2 GB
  - OS: Linux (Ubuntu 8.04 Server)
Software
  - Hadoop: core 0.16.4
  - HBase: 0.1.3
  - Java: 6
Alert Data Sets
  - MIT Lincoln Laboratory, Lincoln Lab Data Sets
  - Computer Security group at UC Davis, tcpdump file
Experimental Result: Time Consumed for Each Number of Data Sets
Experimental Result: Overall Data Throughput
Pros & Cons
Pros
  - Legible
  - Efficient
  - Scalable
  - Economical
  - Reliable
Cons
  - Non-realtime
  - Latency
  - Immature
Hadoop Development Issues
  - Absolute dependency on the Hadoop version (neither backward nor forward compatible)
    - ICAS works on Hadoop 0.16.4.
    - ICAS has 8 errors and 8 deprecations on Hadoop 0.18.3.
    - ICAS has 26 errors and 22 deprecations on Hadoop 0.20.0.
    - Word-count sample code for Hadoop 0.20 does not work on Hadoop 0.18.
  - HBase version "A" works only with the corresponding Hadoop version "A" (neither higher nor lower versions work).
  - Sample code is hard to find.
Conclusions and Future Work
Conclusions
  - ICAS supplies an efficient way to analyze and merge a huge number of alerts on a cloud platform.
Future Work
  - Support more types of IDS
  - Visualize the final results
  - Prepare larger-scale and more complete experiments
Thank You! Questions?