Netflow Data-Mining Techniques
Chris Poetzel, Argonne National Laboratory, cpoetzel@anl.gov
Scott Pinkerton
22 July 2004, ESCC Meeting
Netflow Data Mining
• Argonne Background Information
• Sliding Window Analysis
• Using Contextual Knowledge to Adjust Data Mining
• Incident Investigation
• Integration, Integration
• Future
• Conclusions
ANL Background
• Use OSU’s flow-tools, written by Mark Fullmer
• Collecting from 14 different routers/switches at ANL-East
• ~600 GB currently stored and growing
• 1-year retention period desired; backing off as we add devices
• Current collection/analysis station: IBM 360, Red Hat Linux, 8 GB RAM, 4 × 1.6 GHz CPUs
Sliding Window Analysis
• The raw volume of Netflow data can make data mining slow and cumbersome
• Implemented a 5-minute sliding window for analysis
  – Every minute, check the previous 5 minutes of data (via cron jobs)
  – Reduces processing time (~20 seconds)
  – Catches the vast majority of scans/probes in near real time
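The window selection described above can be sketched in Python. This is a minimal illustration, not the authors' scripts: it assumes flow records have already been exported (for example from flow-tools) into simple tuples, and the `Flow` type, its field names, and the sample addresses are all hypothetical.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical flow record; the field names are illustrative only.
Flow = namedtuple("Flow", "end_time src dst dst_port")

WINDOW = timedelta(minutes=5)  # window length taken from the slide

def flows_in_window(flows, now, window=WINDOW):
    """Return the flows that ended within the last `window` before `now`."""
    start = now - window
    return [f for f in flows if start < f.end_time <= now]

# Made-up sample data (documentation address ranges).
now = datetime(2004, 7, 22, 12, 0)
flows = [
    Flow(now - timedelta(minutes=7), "198.51.100.9", "192.0.2.10", 22),
    Flow(now - timedelta(minutes=4), "198.51.100.9", "192.0.2.11", 22),
    Flow(now - timedelta(seconds=30), "198.51.100.9", "192.0.2.12", 22),
]
recent = flows_in_window(flows, now)  # keeps the last two flows only
```

When a job like this runs once a minute, consecutive windows overlap by four minutes, so each run only has to process five minutes of data while no flow is missed between runs.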
Contextual Knowledge
• Which way is the data flowing?
• Contextual knowledge affects what we search for and what we do with the results
[Diagram: 2×2 matrix of flow direction, Source (IN/OUT) by Destination (IN/OUT)]
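One way to derive the four directions is to test each endpoint of a flow against the site's own address space. The sketch below assumes a single hypothetical internal prefix (`INTERNAL_NETS`); a real site would list its actual networks.

```python
import ipaddress

# Hypothetical site prefix; replace with the site's real networks.
INTERNAL_NETS = [ipaddress.ip_network("192.0.2.0/24")]

def side(addr):
    """Return "IN" if the address is inside the site, otherwise "OUT"."""
    ip = ipaddress.ip_address(addr)
    return "IN" if any(ip in net for net in INTERNAL_NETS) else "OUT"

def direction(src, dst):
    """Label a flow with one of the four source/destination directions."""
    return f"{side(src)} -> {side(dst)}"

inbound = direction("198.51.100.9", "192.0.2.10")   # external to internal
outbound = direction("192.0.2.10", "198.51.100.9")  # internal to external
```

Each direction can then be routed to its own checks, since the same pattern (for example, one source contacting many destinations) means different things depending on where the source sits.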
OUT -> IN
• Receive many class B/C scans a day
• Only watch for scans on open FW ports
  – Dynamically read the FW config every ½ hour to determine which ports are open
• Use Netflow data to look for scans on open FW ports
  – Fast scans: script executed every minute, looking at the past 5 minutes of data
  – Slow scans: script run every hour, looking at the previous 24 hours of data
• Once a scanner is detected, send its IP for a FW shun
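A minimal sketch of scan detection restricted to open firewall ports follows. It assumes the flows have already been narrowed to OUT -> IN traffic within the chosen time window; the function name, the `min_targets` threshold, and the sample data are hypothetical, and the slides do not state what threshold the authors use.

```python
from collections import defaultdict

def find_scanners(flows, open_ports, min_targets):
    """Return external sources that hit at least `min_targets` distinct
    internal hosts on ports the firewall leaves open.

    flows: iterable of (src, dst, dst_port) tuples for OUT -> IN traffic.
    """
    targets = defaultdict(set)
    for src, dst, port in flows:
        if port in open_ports:  # scans on closed ports are ignored
            targets[src].add(dst)
    return sorted(s for s, dsts in targets.items() if len(dsts) >= min_targets)

open_ports = {22, 80}  # hypothetical result of reading the firewall config
flows = (
    [("198.51.100.9", f"192.0.2.{i}", 22) for i in range(1, 6)]       # 5 hosts, open port
    + [("198.51.100.20", f"192.0.2.{i}", 445) for i in range(1, 11)]  # closed port
    + [("198.51.100.30", "192.0.2.1", 80)]                            # single web client
)
scanners = find_scanners(flows, open_ports, min_targets=5)
```

The same function could serve both the fast and the slow check by feeding it a 5-minute or a 24-hour set of flows with a threshold suited to each window.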
IN -> OUT
• Looking for problem machines at the Lab
  – First approximation: look for machines that have contacted a large number of Internet hosts in a short period of time
  – Can indicate a compromised/infected machine
• Exclude a number of internal machines based on a priori knowledge
  – Email servers, domain controllers, network scanning machines (ignore)
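The first approximation above amounts to counting distinct external peers per internal host and skipping known-busy machines. The sketch below assumes the flows are already limited to IN -> OUT traffic in a short window; `max_peers`, the exclusion set, and the sample addresses are hypothetical.

```python
from collections import defaultdict

def high_fanout_hosts(flows, excluded, max_peers):
    """Return {internal_host: peer_count} for hosts that contacted more
    than `max_peers` distinct external hosts, skipping excluded machines.

    flows: iterable of (src, dst) tuples for IN -> OUT traffic.
    """
    peers = defaultdict(set)
    for src, dst in flows:
        if src not in excluded:
            peers[src].add(dst)
    return {s: len(d) for s, d in peers.items() if len(d) > max_peers}

excluded = {"192.0.2.25"}  # hypothetical mail server, known to talk to many hosts
flows = (
    [("192.0.2.25", f"203.0.113.{i}") for i in range(1, 51)]    # excluded server
    + [("192.0.2.77", f"203.0.113.{i}") for i in range(1, 41)]  # 40 peers: suspicious
    + [("192.0.2.78", f"203.0.113.{i}") for i in range(1, 4)]   # 3 peers: normal
)
suspects = high_fanout_hosts(flows, excluded, max_peers=20)
```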
IN -> IN
• Requires collection on multiple internal switches/routers
• Detect internal scanning
  – Cron job runs every hour
  – Infected host scanning local subnet/supernet
  – Detect unauthorized internal network scans
• Post-mortem forensic value
  – What did an internally compromised machine do once it was compromised?
  – Track down cross-contamination
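For the post-mortem case, one simple query is "which internal hosts did the compromised machine contact after the suspected time of compromise?" The sketch below assumes flows are available as (timestamp, src, dst) tuples with numeric timestamps; the function name and sample data are hypothetical.

```python
def contacts_after(flows, host, since):
    """Return {peer: first_contact_time} for every peer that `host`
    initiated a flow to at or after `since`.

    flows: iterable of (timestamp, src, dst) tuples.
    """
    first_seen = {}
    for ts, src, dst in sorted(flows):  # process in time order
        if src == host and ts >= since and dst not in first_seen:
            first_seen[dst] = ts
    return first_seen

flows = [
    (100, "192.0.2.77", "192.0.2.10"),  # before the suspected compromise
    (200, "192.0.2.77", "192.0.2.11"),
    (250, "192.0.2.50", "192.0.2.77"),  # inbound to the host, not counted
    (300, "192.0.2.77", "192.0.2.12"),
    (400, "192.0.2.77", "192.0.2.11"),  # repeat contact, first time is kept
]
touched = contacts_after(flows, "192.0.2.77", since=150)
```

The resulting peers are candidates for cross-contamination and can each be checked in turn with the same query.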
OUT -> OUT
• May not apply to every site
• Co-location personnel or transport traffic constitutes OUT -> OUT traffic on a network
• Scans in the OUT <-> OUT direction are detected, and the appropriate network admin/security personnel are notified
Incident Investigation 1/2
• What to do when an incident happens? (Besides pull your hair out)
• Netflow data is invaluable in cyber security investigations
• Start by classifying IP addresses into a taxonomy
  – Possible bad guy
  – Possible victims
  – Possible intermediary (stepping stone, rootkit resource site, etc.)
• This process can be aided by host syslog, etc.
Incident Investigation 2/2
• By identifying the possible victims, the process of containment and clean-up becomes much easier
• Netflow has become an invaluable tool for our cyber security team
Integration³
• Correlating Netflow data with other data sources has been very helpful in improving the signal-to-noise ratio of cyber security events
  – IDS logs
  – ARP/CAM tables
  – MAC “persistence”
  – Firewall logs
  – DHCP/VPN logs
  – Host-based syslog
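As one illustration of such correlation, an IP address seen in a flow can be tied to a physical machine by looking up which MAC address held the DHCP lease at that time. The sketch below assumes the DHCP logs have already been parsed into (start, end, ip, mac) tuples; that layout, the function name, and the sample values are hypothetical.

```python
def mac_for(leases, ip, when):
    """Return the MAC address that held `ip` at time `when`, or None.

    leases: iterable of (start, end, ip, mac) tuples, with `end` exclusive.
    """
    for start, end, lease_ip, mac in leases:
        if lease_ip == ip and start <= when < end:
            return mac
    return None

# Made-up leases: the same IP is handed to two different machines in turn.
leases = [
    (0, 3600, "192.0.2.77", "00:11:22:33:44:55"),
    (3600, 7200, "192.0.2.77", "66:77:88:99:aa:bb"),
]
owner = mac_for(leases, "192.0.2.77", when=4000)
```

Without this step, an alarm on a dynamically assigned address may point at whichever machine holds the address now rather than the one that generated the traffic.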
IDS & Netflow Logs
• Used to cross-validate an IDS alarm and a Netflow alarm against each other
• IDS alarms usually give specific points of attack
• Netflow can be used to provide the background or framework of an attack
• Netflow + IDS can provide a better perspective on cyber security events
• Store IDS and Netflow logs in the same directory structure to make searching easier
VPN/Dial-Up Scan/Virus Detection
• Marriage of many data sources
• Each dial-up/VPN login initiates a virus scan of the connected host
• The dial-up/VPN connected host is monitored via Netflow for outbound scanning activity
• If a remotely connected host is determined to be virally infected or behaving maliciously, the connection is terminated and the user account is locked
• All actions are performed via automated scripts, with no human intervention
Future
• Host profiling via Netflow
  – Determine what “normal” behavior for a host is, and then alert when it varies from the norm
  – Some IDS products are attempting this approach (Network Flight Recorder, Lancope)
• Visualization of Netflow data
  – Charts, graphs, animations of network conversations
  – Work being done by NCSA
• Better integration with other data sources
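Host profiling could take many forms; a very simple one compares a host's current activity with its own history. The sketch below is an assumption for illustration, not a method from the slides: it treats the number of distinct peers per hour as the profiled metric, flags values more than `k` standard deviations from the historical mean, and applies a floor of 1.0 to the deviation so that a perfectly flat history does not alert on tiny changes.

```python
from statistics import mean, stdev

def is_anomalous(history, current, k=3.0):
    """Return True if `current` deviates from the mean of `history`
    by more than `k` standard deviations (deviation floored at 1.0)."""
    if len(history) < 2:  # not enough data to form a baseline
        return False
    mu, sigma = mean(history), stdev(history)
    return abs(current - mu) > k * max(sigma, 1.0)

# Made-up hourly counts of distinct peers for one host.
history = [10, 12, 11, 9, 13, 10, 11]
normal_hour = is_anomalous(history, 12)
burst_hour = is_anomalous(history, 250)
```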
Conclusions
• Collecting Netflow data to support cyber security activities is tremendously helpful
• It is an invaluable data source for post-mortem forensic analysis, and an extremely helpful tool for real-time detection, notification, and active response (such as blocking an IP address)
Thanks
• Chris Poetzel – cpoetzel@anl.gov – 630-252-7431
• Scott Pinkerton – pinkerton@anl.gov – 630-252-9770