DARPA Challenges for Anomaly Detection of Program Exploits

  • Slides: 32
DARPA Challenges for Anomaly Detection of Program Exploits
Anup K. Ghosh, Ph.D., DARPA/ATO
JHU Workshop on Intrusion Detection, Johns Hopkins University, June 13, 2002

Overview
• Detecting Code-Driven Threats
• Prior work in program anomaly detection
• Applying anomaly detection to Windows processes
• Challenges to anomaly detection

Code-Driven Threats

Background
• 20 years of intrusion detection research has yielded tools sometimes capable of detecting malicious hackers
• A viable commercial anti-virus industry has emerged in the same period in the wake of PC viruses

However…
• While both approaches are very good at detecting known attacks/viruses…
• They do not perform well in
  – detecting novel attacks/malicious code
  – scaling to Internet-wide attacks
  – responding in computer time

A New Threat… Code-driven attacks
• Malicious hackers spend their time breaking into systems one at a time
• Code-driven attacks are written once, unleashed everywhere

How Big is this Problem?
• Code Red cost an estimated $2.6 billion
• Worms will continue to exploit vulnerabilities in online software
[Figure: Vulnerabilities newly reported to CERT/CC from 1995 to 2001. Copyright IEEE, Security and Privacy, 2002, supplement to IEEE Computer.]

Why Don’t Existing Defenses Work?
• Code-driven attacks:
  – Go through firewalls unimpeded
  – Go unnoticed by intrusion detection systems
  – Propagate too fast for anti-virus vendors to disseminate signatures in time
  – Have complete access to our network and file systems
  – Execute with our own privileges
  – Can send sensitive information out over networks
  – Can spy on our computer and Web usage patterns

The Future is Ominous: Nimda was a harbinger
• Future worms will be:
  – Architecture independent
  – Stealthy to their victims, using process hiding
  – Autonomous, so they can migrate independently
  – Intelligent, so they can learn new exploits on the fly
  – Polymorphic, to avoid signature detection
  – Programmable, to learn vulnerabilities and be remotely controllable

This Problem Requires New Thinking
• Consider:
  – Intrusion detection techniques are designed to handle Internet and network-based attacks
  – Anti-virus software is designed to address malicious code attacks
• But neither handles code-driven attacks effectively
• We need either to learn from the strengths of these approaches or to develop a new approach entirely

Prior Work in Program-Based Anomaly Detection

Intrusion Detection Approaches
Misuse Detection
• Scans packets, logs, and commands for known malicious patterns (pattern matching).
• Upside: known attacks can be detected.
• Downside: unknown, novel threats are not detected. Reactionary.
Anomaly Detection
• Detects intrusions by statistical aberrations from normal usage.
• Upside: novel or unknown intrusions can be detected.
• Downside: well-known intrusions may go undetected.

Network-Based vs. Host-Based Intrusion Detection
Network-based
• Scans network packet logs for signatures of intrusive activities.
• Increasing bandwidth is a challenge.
• End-to-end encryption could make this approach obsolete.
Host-based
• Scans machine audit logs for signatures of intrusive activities.
• Traditionally monitors users’ behavior.
• Many sensors/hosts require enterprise management.

Process-Based Anomaly Detection
• Premise of process-based approach: “Abnormally behaving programs are a primary indicator of computer misuse.”
• Approach:
  – Build program behavior profiles for monitored programs and use these to detect intrusions.
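The profile-building idea above can be illustrated with a minimal sketch. It assumes a program's behavior is recorded as a sequence of system call names and uses a simple table lookup over fixed-length windows of calls, a common technique in this area. The function names, the window length `n`, and the sample traces are all hypothetical, and this is not presented as any of the specific systems described in these slides.

```python
def build_profile(traces, n=3):
    """Collect every length-n window of system calls seen in benign traces."""
    profile = set()
    for trace in traces:
        for i in range(len(trace) - n + 1):
            profile.add(tuple(trace[i:i + n]))
    return profile


def anomaly_score(trace, profile, n=3):
    """Return the fraction of length-n windows in `trace` absent from the profile."""
    windows = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    if not windows:
        return 0.0
    misses = sum(1 for w in windows if w not in profile)
    return misses / len(windows)


# Hypothetical benign traces used for training.
benign = [
    ["open", "read", "write", "close"],
    ["open", "read", "read", "write", "close"],
]
profile = build_profile(benign)

# A trace that matches training behavior, and one that deviates from it.
normal_score = anomaly_score(["open", "read", "write", "close"], profile)
attack_score = anomaly_score(["open", "read", "execve", "socket", "write"], profile)
```

A deployment would compare the score against a threshold chosen to balance detections against false alarms; the threshold is left out here because the slides do not specify one.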

Goals of Process-Based Anomaly Detection
• Learn Benign Program Behavior
• Flag Deviations from Learned Behavior
• Generalize from Observed Behavior

Cigital’s Three Systems for Anomaly Detection
• Recurrent Neural Network
• State Tester
• String Transducer

Summary of Cigital System Performance
Scope: Detects program misuse, mainly U2R attacks.
Systems: Recurrent Neural Network, State Tester, String Transducer
Results:
• 100% of U2R attacks at a rate of 3 FA/day.
• 100% of U2R attacks at a rate of 9 FA/day.

Comparison, Strengths, Weaknesses
• Systems perform comparably; the short training times of the string transducer and state tester make them more desirable.
• Detects program misuse attacks very reliably with few false alarms.
• Will not detect attacks on programs that are not monitored, or attacks that are legitimate uses of programs.

Performance as a Function of Training Data
• The horizontal axis represents the percentage of available data used for training.
• The vertical axis is the percentage of sessions creating false alarms when all possible attacks are detected.
[Figure: curves for Table lookup, State tester, and String transducer.]

Applying Anomaly Detection to Windows Processes

Approach for Windows NT
• Collects system events and identifies anomalous patterns
• Ported to use Windows NT/2000 base-object audit data
• Cigital algorithms show high performance with low false positive rates.

Using strace for NT
• Data needs to be collected as it is created and streamed to the ID system
  – NT auditing does not meet these requirements
• Advantages of using strace for NT
  – Provides additional information such as thread IDs
  – Can be altered to stream data directly to our system
  – Selectively captures the system calls that we need
  – Can be turned on/off on the fly

Collecting Data in Real-Time
• Streams of events arrive from multiple processes and multiple threads and need to be sorted accordingly.
[Figure: an interleaved stream of events (1 through 6) enters a Process Splitter, which separates it into per-process streams.]
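The sorting step can be sketched as a small demultiplexer. This is an illustration only: the `(pid, tid, call)` tuple layout is an assumed event format rather than the actual output of strace for NT, and the process IDs and call names in the sample stream are made up for the example.

```python
from collections import defaultdict


def split_by_process(events):
    """Group an interleaved event stream into per-(process, thread) sequences.

    Each event is assumed to be a (pid, tid, call) tuple. Arrival order is
    preserved within each (pid, tid) stream.
    """
    streams = defaultdict(list)
    for pid, tid, call in events:
        streams[(pid, tid)].append(call)
    return dict(streams)


# Hypothetical interleaved events from two processes, one with two threads.
events = [
    (100, 1, "NtOpenFile"),
    (200, 1, "NtCreateFile"),
    (100, 1, "NtReadFile"),
    (100, 2, "NtWriteFile"),
    (200, 1, "NtClose"),
    (100, 1, "NtClose"),
]
streams = split_by_process(events)
```

Keying on both process and thread ID keeps concurrent threads of one program from interleaving their call sequences, which would otherwise look like anomalous orderings to a sequence-based detector.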

Performing Anomaly Detection
• Data from each application must be matched with the appropriate model and the state must be updated by the ID algorithm.
[Figure: the algorithm takes the current state, an incoming event, and the model, and produces a new state.]
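One way to picture the model-matching and state-update loop is the sketch below. It assumes the per-process state is simply the last `n` calls seen and that each application's model is a set of allowed call windows; this window lookup is a stand-in chosen for brevity, not the algorithm used in the systems described here. The class name, method name, and sample model are hypothetical.

```python
from collections import deque


class StreamingDetector:
    """Keep per-process state and check each new event against the right model."""

    def __init__(self, models, n=3):
        self.models = models  # application name -> set of allowed n-call windows
        self.n = n
        self.state = {}       # process id -> deque holding the last n calls

    def observe(self, pid, app, call):
        """Update the state for `pid`; return True if the newest window is anomalous."""
        window = self.state.setdefault(pid, deque(maxlen=self.n))
        window.append(call)
        if len(window) < self.n:
            return False  # not enough history yet to judge
        return tuple(window) not in self.models[app]


# Hypothetical model for one application with a single allowed window.
models = {"editor": {("open", "read", "close")}}
detector = StreamingDetector(models)
flags = [detector.observe(1, "editor", c) for c in ["open", "read", "close", "exec"]]
```

Holding only a bounded window per process keeps memory use constant per monitored process, which matters when events are streamed in real time.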

Performance Against Code Red
• 11-fold cross-validation
• Includes 2 Code Red attack traces
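For readers unfamiliar with k-fold cross-validation, the sketch below shows the general splitting procedure with k = 11. The slide does not say how many traces were used or how the folds were composed, so the 22 placeholder trace names and the round-robin fold assignment are assumptions made purely for illustration.

```python
def k_fold_splits(items, k):
    """Yield (train, test) pairs; each fold serves as the test set exactly once."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Placeholder trace names; the real data set size is not given in the slides.
traces = [f"trace_{i}" for i in range(22)]
splits = list(k_fold_splits(traces, 11))
```

Each model is trained on ten folds and evaluated on the held-out fold, so every trace is scored by a model that never saw it during training.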

Anomaly Detection Challenges

Training
• Statistical and machine learning techniques that rely on baseline behavior profiles require extensive training.
  – Time consuming
  – Determines quality of results
  – Training in one environment may not map well to another environment
  – Overtraining is a problem for some classes of machine learning

False Positives
• Operators have low thresholds for false positives
• An acceptable rate might be < 1 per day
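To see how demanding a budget of under one false alarm per day is, the sketch below relates a per-session false alarm probability to the expected number of alarms per day. The workload of 500 sessions per day and the 1% per-session rate are invented figures, and the calculation assumes false alarms occur independently across sessions.

```python
def expected_alarms_per_day(sessions_per_day, false_alarm_rate):
    """Expected daily false alarms given a per-session false alarm probability."""
    return sessions_per_day * false_alarm_rate


def max_false_alarm_rate(sessions_per_day, alarms_per_day=1.0):
    """Largest per-session false alarm probability that stays within the budget."""
    return alarms_per_day / sessions_per_day


# Hypothetical workload: 500 monitored sessions per day.
daily_alarms = expected_alarms_per_day(500, 0.01)
rate_budget = max_false_alarm_rate(500)
```

Under these assumed numbers, a detector that is wrong on only 1% of sessions still produces several alarms a day, so the per-session rate would need to fall well below 1% to meet the budget.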

Identification
• Anomaly detection approaches tell you when something is wrong, but not what is wrong, which specific attack is executing, or where it is coming from.

Real-Time Response
• Once an intrusion is detected, systems need to identify, alert, isolate, and respond according to local security policies.

Summary
• Much work has been performed in process-based anomaly detection
• Many challenges remain…
• Foremost among them: can we leverage process-based anomaly detection to detect future code-driven threats?

Questions?
Anup Ghosh, aghosh@darpa.mil
For more info, see: C. Michael & A. Ghosh, “Simple state-based approaches to program-based anomaly detection”, to appear in ACM Transactions on Information and System Security (TISSEC), 2002.