Behavior-Based Malware Detection Mihai Christodorescu University of Wisconsin, Madison mihai@cs.wisc.edu
The Rising Malware Tide
• Malware is software with unwanted functionality: viruses, trojans, backdoors, bots, adware, spyware, browser hijackers, downloaders, droppers, keyloggers, password stealers, ...
• “Blended” threats
100,000 machines are infected. [Vint Cerf, World Economic Forum 2007]
University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 2
Organized Cyber-Crime
• Boom in online fraud:
– Spamming
– Trade in stolen data
– Financial fraud
– ID theft
Malware is the tool of the trade.
The Changing Threat Landscape
1995: Hobby malware, for fun
• Show programming prowess
• Single author
(Pictured: creator of the Melissa worm)
2007: Professional malware, for profit
• Collaborative development
• Bug-fix releases, code reuse
Botnets: distributed computing has finally arrived.
Let’s Go Phishing!
Botnet = infected hosts remotely controlled by attacker
10,000 to 1.5 M bots per botnet!
Phishing in 5 easy steps: Email list, Email msg., Send email, Host site, Steal money
Malware Economy: $300-$700 botnet-hour; “Mule” gets 10% [SonicWall]
Failure of Signature Detectors
Malware detectors still use signatures.
[Diagram: known malware, New Malware 1, and New Malware 2 arrive from the Internet at a virus scanner]
Sample signatures:
ac028c0e86009d8edfac075fb e81cfd72ef50b91000f7f15052b9
0:*:504b03040a0001000800*...
*:188420:181779:*:8ad6900f5088cab9356678e43c...
3:*:3e3c623e6c696e6b3c2f6...
Paradigm shift in malware creation, yet no change in malware detection!
Malware is obfuscated/transformed easily.
Software diversity used successfully by malware.
Focus On Behavior
[Chart: new malware and malware families over time, 2001-2006, logarithmic scale. New malware: 8,821; 11,136; 20,731; 31,726; 53,950; 86,876. Families: 325; 335; 274; 202 (est.)] [Kaspersky Labs, Symantec]
Number of variants grows exponentially.
Number of families stays constant.
Family = malware with a common code base.
A family is a collection of behaviors.
A behavior can be shared by many families.
My Research Contributions
Behavior-based malware-detection technique that is provably resilient to obfuscation
• Malspec: new formalism for behavior
– 1 malspec = 30+ malware instances
• Efficient, semantics-aware detection
• Analysis of resilience to obfuscation
– Proof of soundness and completeness
Making the World a Better Place
Before:
• 1 signature = 1 malware, e.g., 420 signatures for email worms (“Happy New Year,” Stration, Bagle, Netsky, MyDoom, Sober)
After:
• 1 malspec = a class of malware, e.g., 1 malspec for email worms (including all of the above)
Raise the bar for new malware
Behavior-based Malware Detection
Four problems:
1. Behavior specification
2. Efficient detection
3. Behavior mining
4. Detector analysis
Specifying Behavior
Syntactic: Byte signatures allow for fast detection.
– But not resilient to obfuscation.
Semantic: High-level descriptions require expensive detection.
– Resilient to obfuscation.
“Execution of program M causes the system to reach a state where a copy of M has been sent by email.”
Malicious Behavior Specification
[Spectrum: Syntactic to Semantic]
Balance semantics and syntax:
• Syntactic elements must be necessary.
• Semantic elements must be sufficient.
Elements: instructions, library calls, system calls, ...
Malspec: Self-Propagation by Email
[Code listing from Netsky.B: x86 push/call/add sequence invoking connect, wsprintfA, lstrlenA, and send; the connect and send calls are labeled Connect and Send]
Malspec: Self-Propagation by Email
[Code listing from Netsky.B: x86 push/call/add sequence invoking connect, wsprintfA, lstrlenA, and send]
Malspec = syntactic component (describes temporal constraints) + semantic component (describes dependency constraints)
Connect: X := Arg1
Send: Arg1 = X & Arg2 = “EHLO.*”
Building a Real Malspec
“Send Email”: X := socket(), connect(X), send(X, “EHLO”), send(X, “DATA”), send(X, T)
“Read Own ExeImage”: S := process_name(), Z := open(S), Y := read(Z)
Malspec: Self-Propagation by Email
AND-OR graph over: X := socket(), connect(X), send(X, “EHLO”), send(X, “DATA”), S := process_name(), Z := open(S), Y := read(Z), send(X, T)
Automating Malspec Creation: Malspec Mining (Malware Sample minus Benign Program)
Malspec Constraints
Malspec operations: X := socket(), connect(X), send(X, “EHLO”), send(X, “DATA”), S := process_name(), Z := open(S), Y := read(Z), send(X, T)
• Dependence constraint: X after socket = X before connect
• Local constraint
• Dependence constraint
Malspec Benefits
Malspec operations: X := socket(), connect(X), send(X, “EHLO”), send(X, “DATA”), S := process_name(), Z := open(S), Y := read(Z), send(X, T)
• Symbolic variables
• Constraint-based execution order
• Independent of obfuscation artifacts
Expressive enough to describe even obfuscated behavior.
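As a minimal sketch of how symbolic variables and constraint-based execution order could be represented, the following Python matches the email-propagation malspec against a recorded call trace. The `Var`, `Op`, and `matches` names, the trace format `(function, args, result)`, and the `after` ordering edges are all assumptions made for illustration; they are not taken from the talk or from any tool it describes. The variable T is left unconstrained here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Symbolic malspec variable such as X, S, or Z (hypothetical type)."""
    name: str


@dataclass(frozen=True)
class Op:
    func: str
    args: tuple = ()       # Var instances or literal regex strings
    result: object = None  # Var bound to the return value, if any
    after: tuple = ()      # indices of operations that must occur earlier


MALSPEC = (
    Op("socket", result=Var("X")),
    Op("connect", args=(Var("X"),), after=(0,)),
    Op("send", args=(Var("X"), "EHLO.*"), after=(1,)),
    Op("send", args=(Var("X"), "DATA"), after=(2,)),
    Op("process_name", result=Var("S")),
    Op("open", args=(Var("S"),), result=Var("Z"), after=(4,)),
    Op("read", args=(Var("Z"),), result=Var("Y"), after=(5,)),
    Op("send", args=(Var("X"), Var("T")), after=(3, 6)),
)


def bind(pattern, value, env):
    """Return an extended environment, or None if the binding is inconsistent."""
    if isinstance(pattern, Var):
        if pattern.name in env:
            return env if env[pattern.name] == value else None
        return {**env, pattern.name: value}
    # Literal patterns are treated as regular expressions over string values.
    if isinstance(value, str) and re.fullmatch(pattern, value):
        return env
    return None


def match_event(op, event, env):
    func, args, result = event
    if func != op.func or len(args) != len(op.args):
        return None
    for pattern, value in zip(op.args, args):
        env = bind(pattern, value, env)
        if env is None:
            return None
    if op.result is not None:
        env = bind(op.result, result, env)
    return env


def matches(spec, trace, i=0, done=frozenset(), env=None):
    """Backtracking search: each event is matched to a ready operation or skipped."""
    env = {} if env is None else env
    if len(done) == len(spec):
        return True
    if i == len(trace):
        return False
    for k, op in enumerate(spec):
        if k in done or not all(a in done for a in op.after):
            continue
        env2 = match_event(op, trace[i], env)
        if env2 is not None and matches(spec, trace, i + 1, done | {k}, env2):
            return True
    return matches(spec, trace, i + 1, done, env)


# The two chains are interleaved and an unrelated call (getpid) is inserted.
WORM_TRACE = [
    ("process_name", (), "worm.exe"),
    ("socket", (), 3),
    ("getpid", (), 100),
    ("open", ("worm.exe",), 7),
    ("connect", (3,), 0),
    ("send", (3, "EHLO example.org"), 16),
    ("read", (7,), b"MZ"),
    ("send", (3, "DATA"), 4),
    ("send", (3, "TVqQ"), 4),
]

# A mail client sends email but never reads its own executable.
BENIGN_TRACE = [
    ("socket", (), 3),
    ("connect", (3,), 0),
    ("send", (3, "EHLO example.org"), 16),
    ("send", (3, "DATA"), 4),
    ("send", (3, "hello"), 5),
]

print(matches(MALSPEC, WORM_TRACE), matches(MALSPEC, BENIGN_TRACE))
```

Because the operations are ordered only by their `after` edges and variables are bound by value, reordering independent calls or inserting unrelated ones does not affect the result in this sketch.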
Malspec Detection Strategies
Malspec operations: X := socket(), connect(X), send(X, “EHLO”), send(X, “DATA”), S := process_name(), Z := open(S), Y := read(Z), send(X, T)
• Static analysis
• Dynamic analysis
• Host-based IDS
• Inline Reference Monitors
Malspecs are independent of detection method.
Detection of Malicious Behavior
Inputs to the Malware Detector: Binary File and malspec (X := socket(), connect(X), send(X, “EHLO”), send(X, “DATA”), S := process_name(), Z := open(S), Y := read(Z), send(X, T))
Goal: Find a program path that matches the malspec.
Find A Malicious Program Path
Malspec operations: X := socket(), connect(X), send(X, “EHLO”), send(X, “DATA”), S := process_name(), Z := open(S), Y := read(Z), send(X, T)
[Diagram: Interprocedural Control-Flow Graph]
1) Match Malspec Operations
Malspec operations: X := socket(), connect(X), send(X, “EHLO”), send(X, “DATA”), S := process_name(), Z := open(S), Y := read(Z), send(X, T)
2) Match Malspec Constraints
Malspec operations: X := socket(), connect(X), send(X, “EHLO”), send(X, “DATA”), S := process_name(), Z := open(S), Y := read(Z), send(X, T)
Malspec Constraint: Z after open = Z before read
Program Constraint: The program fragment preserves the program expression bound to Z. Like a semantic def-use constraint.
2) Match Malspec Constraints
Program Constraint: The program fragment preserves the program expression bound to Z.
Semantic nop wrt E = program fragment preserving an expression E.
Solution: Compare fragment with known semantic nops.
My Implementation: SAFE [Christodorescu & Jha, USENIX Security 2003]
Inputs: Malspec, Binary File
Front end: Disassembler (IDA Pro), CFG constructor (CodeSurfer), producing the CFG
SAFE Malware Detector:
• Malspec operations: graph matching
• Malspec constraints: semantic-nop matching (Pattern DB)
Output: Yes / No
SAFE: Better Than Byte Signatures [Christodorescu & Jha, USENIX Security 2003]
• Obfuscate known malware
• Test it against SAFE and commercial antivirus tools
Known malware: Chernobyl, z0mbie, f0sf0r0, Hare (several obfuscated variants of each)
Commercial AV: 0%
SAFE detector: 100%
SAFE Benefits [Christodorescu & Jha, USENIX Security 2003]
SAFE features:
• Flow sensitivity
• Binding procedure
• Obfuscation pattern DB
Obfuscation resilience:
• Code reordering
• Register renaming
• (Limited) junk code
Limitations of SAFE
• Restricted to equality constraints
– Malspec constraint: Z after open = Z before read
• Restricted to a known set of obfuscations
– Pattern DB = { push X; pop X | inc X; dec X | add X, 0 | ... }
Enumerating obfuscation patterns is limiting.
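The limitation can be illustrated with a small sketch of pattern-based semantic-nop matching. The three patterns are the ones listed on the slide; the tuple encoding of instructions and the `is_known_nop` function are assumptions for illustration and do not reflect how SAFE is actually implemented.

```python
# Each pattern is a sequence of instructions; "X" is a placeholder that must
# bind to the same operand everywhere within one pattern.
NOP_PATTERNS = [
    [("push", "X"), ("pop", "X")],
    [("inc", "X"), ("dec", "X")],
    [("add", "X", "0")],
]


def match_pattern(pattern, window):
    """Check one pattern against an equally long window of instructions."""
    binding = None
    for pat, ins in zip(pattern, window):
        if len(pat) != len(ins) or pat[0] != ins[0]:
            return False
        for p, operand in zip(pat[1:], ins[1:]):
            if p == "X":
                if binding is None:
                    binding = operand
                elif binding != operand:
                    return False
            elif p != operand:
                return False
    return True


def is_known_nop(fragment):
    """True if the fragment is a concatenation of patterns from the database."""
    if not fragment:
        return True
    for pattern in NOP_PATTERNS:
        n = len(pattern)
        if (len(fragment) >= n
                and match_pattern(pattern, fragment[:n])
                and is_known_nop(fragment[n:])):
            return True
    return False


print(is_known_nop([("push", "eax"), ("pop", "eax"), ("add", "ebx", "0")]))
print(is_known_nop([("sub", "eax", "0")]))
```

The second call returns False even though `sub eax, 0` also leaves eax unchanged: a fragment is recognized only when someone has already enumerated its pattern.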
Better Constraint Satisfaction Need an Oracle. . . Program Constraint: The program fragment preserves the program expression bound to Z. University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 31
Advances in Decision Procedures
Dramatic improvements in SAT solvers:
– SATO [Zhang, CADE 1997]
– GRASP [Marques-Silva & Sakallah, 1999]
– zChaff [Moskewicz et al., DAC 2001]
– BerkMin [Goldberg & Novikov, DATE 2002]
SAT-based Bounded Model Checking: [Clarke et al., FMSD 2001]
– SAT-specific speedups [Strichman, CHARME 2001]
– Richer logics [Seshia et al., DAC 2003]
A decision procedure can approximate an Oracle.
Using Decision Procedures
Program fragment P: add esp, 0Ch; push [ebp+hMem]
Program Constraint: The program fragment preserves the program expression bound to Z.
Decision procedure output: True/False
My Semantic-Nop Detector [Christodorescu & Jha, USENIX Security 2003] [Christodorescu et al., Oakland 2005]
Inputs: Malspec, Binary File
Front end: Disassembler (IDA Pro), CFG constructor, producing the CFG
Semantic-Nop Malware Detector:
• Malspec operations: graph matching
• Malspec constraints: constraint satisfaction using Simplify [Detlefs et al., “Simplify,” 2004] and UCLID [Lahiri & Seshia, CAV 2004], plus the Pattern DB
Output: Yes / No
Semantic-Nop Detection Results [Christodorescu et al., Oakland 2005]
Generate malspec, then detect related variants:
• Netsky.B: mass-mailing malspec 1 detects variants C, D, O, P, T, W
• Bagle.I: mass-mailing malspec 2 detects variants J, N, O, P, R, Y
Semantic-nop detector provides forward detection.
Resilient to changes within malware families.
Dynamic Detection Results
• Very recent results from the dynamic detection prototype.
1 malspec for mass-mailing:
– Bagle: Gen-1, -54, B, Z, AG, AU
– Sasser: B
1 malspec for self-installation: 266 different malware instances
Semantic-Nop Detection Benefits [Christodorescu et al., Oakland 2005]
Semantic-Nop features:
• Flow sensitivity
• Binding procedure
• Decision procedures
• Rich constraints
Obfuscation resilience:
• Code reordering
• Register renaming
• Junk code
• Code substitution
Detection Performance
Powerful decision procedures are expensive.
• SAFE pattern matching: 1-9 s
• Simplify theorem prover / UCLID bounded model checker: 300-800 s
Idea: Use expensive decision procedures only if cheap decision procedures do not provide a definitive answer.
Stack of Decision Procedures
Input: program fragment and constraint
1. SAFE pattern matching: answers Yes, otherwise “?”
2. Random execution: answers No (“No, code does not satisfy constraint!”), otherwise “?”
3. Simplify theorem prover: answers Yes, otherwise “?”
4. UCLID bounded model checker: answers Yes/No
Average cost, same decision power.
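The cascade can be sketched as a list of procedures tried in order of cost, each returning Yes (True), No (False), or "?" (None). Everything below is a toy stand-in under stated assumptions: the three-opcode, 8-bit instruction model, the function names, and in particular `exhaustive_check`, which simply enumerates all 256 register values in place of a real theorem prover or bounded model checker.

```python
import random

REGS = ("eax", "ebx")
_rng = random.Random(0)


def run(fragment, state):
    """Execute a toy fragment of (op, reg, imm) instructions on 8-bit registers."""
    state = dict(state)
    for op, reg, imm in fragment:
        if op == "add":
            state[reg] = (state[reg] + imm) & 0xFF
        elif op == "sub":
            state[reg] = (state[reg] - imm) & 0xFF
        elif op == "xor":
            state[reg] = (state[reg] ^ imm) & 0xFF
        else:
            raise ValueError(f"unknown opcode: {op}")
    return state


def pattern_matcher(fragment, reg):
    # Cheapest step: only recognizes the empty fragment; says Yes or "?".
    return True if not fragment else None


def random_execution(fragment, reg, trials=32):
    # Can only refute: one counterexample proves the value is not preserved.
    for _ in range(trials):
        state = {r: _rng.randrange(256) for r in REGS}
        if run(fragment, state)[reg] != state[reg]:
            return False
    return None


def exhaustive_check(fragment, reg):
    # Expensive, definitive step. In this toy model each instruction touches
    # one register with an immediate, so enumerating reg's 256 values suffices.
    for value in range(256):
        state = {r: 0 for r in REGS}
        state[reg] = value
        if run(fragment, state)[reg] != value:
            return False
    return True


STACK = [
    ("pattern matching", pattern_matcher),
    ("random execution", random_execution),
    ("exhaustive check", exhaustive_check),
]


def decide(fragment, reg):
    """Ask 'does the fragment preserve reg?', stopping at the first definite answer."""
    for name, procedure in STACK:
        answer = procedure(fragment, reg)
        if answer is not None:
            return answer, name
    return None, None


print(decide([("add", "eax", 1)], "eax"))
print(decide([("add", "eax", 5), ("sub", "eax", 5)], "eax"))
```

In this sketch a fragment that changes the register is rejected by random execution without reaching the expensive step, while a true semantic nop falls through to the definitive check.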
Performance Results
Detection times in seconds:
Malware                        Minimum   Average   Maximum
Netsky (B, C, D, O, P, T, W)     60.56     99.57    140.08
Bagle (I, J, N, O, P, R, Y)      36.00     56.41     97.13
Bagle (obfuscated variants)      74.81    140.14    186.50
Test setup: 1 GHz CPU, 1 GB RAM
Comparison:
• Commercial signature-based detector: <1 s
• Decision procedure-based detector: >300 s
Detection vs. Obfuscation Goal: Prove a detection technique is resilient to obfuscation. Behavior-based detection vs. Code-reordering obfuscation Register-renaming obfuscation Junk-insertion obfuscation Code-substitution obfuscation University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 42
Detection vs. Obfuscation
Goal: Prove a detection technique is resilient to obfuscation.
Malware detection using model checking [Kinder et al., DIMVA 2005] vs.
• Code-reordering obfuscation
• Register-renaming obfuscation
• Junk-insertion obfuscation
• Code-substitution obfuscation
Tedious, error-prone, time-consuming. Need a principled approach!
Challenges for Formal Analysis • Language gap – Detectors and obfuscators are defined using different languages. • Quantity – Many obfuscation schemes exist. University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 44
Assessing Detector Resilience [Christodorescu et al. , POPL 2007] Resilience = Soundness + Completeness Soundness: If P is detected as a variant of M, then P is an obfuscated version of M. University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 45
Assessing Detector Resilience [Christodorescu et al. , POPL 2007] Resilience = Soundness + Completeness Soundness: If P contains a path that is an obfuscated version of a path in malicious program M, then P is an obfuscated version of M. The path in P is “equivalent” to the path in M. University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 46
Assessing Detector Resilience [Christodorescu et al. , POPL 2007] Resilience = Soundness + Completeness Soundness: If P contains a path that is an obfuscated version of a path in malicious program M, then P is an obfuscated version of M. Under abstract interpretation, the paths in P and M map to the same abstract path. University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 47
Assessing Detector Resilience [Christodorescu et al., POPL 2007]
Resilience = Soundness + Completeness
Soundness: Under abstract interpretation, the paths in P and M map to the same abstract path.
Completeness is similar.
I introduced a uniform way to prove resilience.
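One possible way to write the two properties down is sketched below. The symbols D(P, M) for the detector's verdict and \mathcal{O} for a class of obfuscations are notation assumed here, not taken from the slides. The completeness line is the usual converse of soundness, which the slide describes only as "similar".

```latex
\begin{align*}
\text{Soundness:}    \quad & D(P, M) = 1 \;\Longrightarrow\; \exists\, o \in \mathcal{O}.\; P = o(M) \\
\text{Completeness:} \quad & \bigl(\exists\, o \in \mathcal{O}.\; P = o(M)\bigr) \;\Longrightarrow\; D(P, M) = 1
\end{align*}
```

Read this way, soundness rules out false positives with respect to \mathcal{O}, and completeness rules out false negatives.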
Resilience of Behavior Detection [Christodorescu et al., POPL 2007]
Behavior-based detector depends on 3 oracles: disassembler, CFG constructor, decision procedure
I showed relative completeness (and relative soundness) of behavior detection with respect to:
– Code-reordering obfuscation
– Register-renaming obfuscation
– Code-substitution obfuscation
– Junk-insertion obfuscation
Summary: Behavior Detection
Malware Detector: Binary File and Malspec as inputs; Disassembler, CFG Constructor, Decision Procedure
Decision procedure stack: SAFE pattern matching, random execution, Simplify theorem prover, UCLID bounded model checker
Efficient resilience to common obfuscations and across malware variants.
Behavior-based Malware Detection
1. Behavior specification
2. Efficient detection
3. Behavior mining
4. Detector analysis
Next: Future Work
Short Term: Hybrid Detection
[Spectrum: Syntactic (byte signatures) to Semantic, with behavior-based detection in between]
Static Analysis + Dynamic Analysis
[Madou et al., DRM 2005] [Halfond & Orso, ICSE 2005] [Giffin et al., Oakland 2004]
Hybrid = Static + Dynamic
Static Analysis:
• All program paths
• Before execution
• Imprecise
Dynamic Analysis:
• Current program path
• Runtime overhead
• Precise
Hybrid Analysis: “Will this execution be malicious?”
1. Apply static analysis when precise
2. Instrument the imprecise program paths
3. Trigger dynamic analysis where instrumented
Runtime detection with minimal overhead.
Long-Term: Software Accountability
• What will this program do?
• What is this program doing?
• What did this program do?
Behaviors as first-order (system) objects.
Applications: adaptive self-protection, cooperative defenses, execution undo
Other Research
Malware Analysis:
– Testing malware detectors [ISSTA 2004]
– String analysis for binaries [PASTE 2005]
– Malspec mining [in preparation]
– Malware normalization [in submission]
Software Protection:
– Attacking license-checking code [PPL 2001]
– Strengthening self-checksumming [ACSAC 2005]
Fraud detection on eBay [CCS 2005]
Cooperative intrusion detection [book chapter]
Thank you! Mihai Christodorescu University of Wisconsin, Madison mihai@cs.wisc.edu
Here Be Dragons! University of Wisconsin, Madison
What Is Malware? Malicious Software viruses, trojans, backdoors, bots, adware, spyware, browser hijackers, downloaders, droppers, drive-by downloads, keyloggers, password stealers, infectors, . . . University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 58
Growing Malware Threat
[Chart: new malware per year. 2001: 8,821; 2002: 11,136; 2003: 20,731; 2004: 31,726; 2005: 53,950; 2006: 86,876]
Source: Kaspersky Labs
Low Prevalence of New Behaviors
[Chart: new malware and malware families per year. 2001: 8,821; 2002: 11,136; 2003: 20,731; 2004: 31,726; 2005: 53,950; 2006: 86,876] [Kaspersky Labs, Symantec]
Malware family: Malware instances that are modifications of the same previous malware. [McAfee]
Offense-Defense Mismatch
Evolving threats, stagnant defenses
20 years ago: infrequent outbreaks; created for fun and bragging rights; limited evasion
Today: many, daily malware; created for profit and control; evasion via obfuscation
Defense then and now: byte signature scanning
Malware Race
Malware writers use obfuscation and evolution to generate lots of malware.
• Polymorphism
• Metamorphism
• Server-side polymorphism
“Happy New Year!”
1,000+ infected machines [Symantec]
Infected machines connect together over P2P and form a botnet.
Even small attacks lead to Internet-level threats.
Malicious Activity
“Happy New Year!”:
• Self-installs.
• Disables security software.
• Gathers email addresses.
• Emails copies of itself.
Earlier worms: Stration (2006), Bagle (2005), Netsky (2004), MyDoom (2004), Sober (2003)
Same behavior persists across generations.
A Day in the Life of a Bot Master
Botnet = compromised machines remotely controlled by one attacker
Size ranges from 10,000 to 1.5 M machines!
• Check computers infected overnight
• Remotely install spyware (+$0.15/install)
• Download stolen data ($3/cc#, $20/cc#+dob)
• Generate a new version of the bot code
• Launch email run with new bot
Current Malware Defenses
[Diagram: a suspicious program from the Internet reaches the virus scanner; new malware goes to a virus analyst, who adds to the signature database]
Signature database (more than 200,000 signatures!):
ac028c0e86009d8edfac075fb e81cfd72ef50b91000f7f15052b9
0:*:504b03040a0001000800*...
*:188420:181779:*:8ad6900f5088cab9356678e43c...
3:*:3e3c623e6c696e6b3c2f6...
Current detectors fail to detect new variants.
Malware Writers Avoid Detection
[Diagram: Malware is transformed into Modified Malware, which no longer matches the signatures]
Signatures:
ac028c0e86009d8edfac075fb e81cfd72ef50b91000f7f15052b9
0:*:504b03040a0001000800*...
*:188420:181779:*:8ad6900f5088cab9356678e43c...
3:*:3e3c623e6c696e6b3c2f6...
Transformations for evasion:
– Recompile
– Add/modify code
– Obfuscations
Change syntax, preserve semantics!
Attackers know how to evade syntactic detectors.
Low Prevalence of New Behaviors
[Chart: new malware and malware families over time, 2001-2006, logarithmic scale. New malware: 8,821; 11,136; 20,731; 31,726; 53,950; 86,876. Families: 325; 335; 274; 202 (est.)] [Kaspersky Labs, Symantec]
Lots of malware = few behaviors × many transformations
Goal: Detect behavior, ignore obfuscation.
How does malware arrive on a system?
• User starts it
• Drive-by downloads
• Remote vulnerabilities
Malware Race Malware writers use obfuscation and evolution to generate lots of malware. Common obfuscation techniques: nop insertion, register renaming, junk insertion, instruction reordering, encryption, compression, branch inversion, equivalent instruction substitution, basic block reordering, . . . University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 71
Obfuscation vs. Byte Signatures 7 minutes Need many sigs. in a short period of time! Source: Commtouch Labs University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 72
Evasion Through Junk Insertion
Original code and its byte signature:
  lea  eax, [ebp+Data]               8D 85 D8 FE FF FF
  push offset aServices_exe          68 78 8E 40 00
  push eax                           50
  call _strcat                       E8 69 06 00 00
  pop  ecx                           59
  lea  eax, [ebp+Data]               8D 85 D8 FE FF FF
  pop  ecx                           59
  push edi                           57
  push eax                           50
  lea  eax, [ebp+ExistingFileName]   8D 85 D4 FD FF FF
  push eax                           50
  call ds:CopyFileA                  FF 15 C0 60 40 00
Obfuscated variant: the same instructions with nop instructions inserted between them, so the contiguous byte signature no longer matches.
Evasion Through Reordering
Obfuscated variant: the instruction sequence (lea eax, [ebp+Data]; push offset aServices_exe; push eax; call _strcat; pop ecx; ...; push eax; call ds:CopyFileA) is split into blocks label_one, label_two, and label_three, which are stored out of order, connected by jmp instructions, and padded with nops.
Regex signature:
8D 85 D8 FE FF FF 90* 68 78 8E 40 00 90* 50 90* E8 69 06 00 00 90* 59 90* ... 90* 50 90* FF 15 C0 60 40 00
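The two evasions can be reproduced on a few bytes. This sketch uses the first three instructions of the listing (the byte values come from the slides); the exact placement of the nops and the short-jump encodings (`EB` followed by an 8-bit displacement) in the reordered variant are illustrative choices, not bytes from the talk.

```python
import re

LEA_DATA = bytes.fromhex("8D85D8FEFFFF")   # lea eax, [ebp+Data]
PUSH_OFFSET = bytes.fromhex("68788E4000")  # push offset aServices_exe
PUSH_EAX = bytes.fromhex("50")             # push eax
NOP = b"\x90"

# Plain byte signature: the three instructions back to back.
SIGNATURE = LEA_DATA + PUSH_OFFSET + PUSH_EAX

# Regex signature: the same instructions with any number of nops in between.
REGEX_SIGNATURE = re.compile(
    rb"(?:\x90)*".join(re.escape(p) for p in (LEA_DATA, PUSH_OFFSET, PUSH_EAX)),
    re.DOTALL,
)

original = SIGNATURE + bytes.fromhex("E869060000")

# Junk insertion: nops between instructions.
junked = LEA_DATA + NOP + PUSH_OFFSET + NOP * 2 + PUSH_EAX

# Reordering: blocks stored out of order and chained with short jumps so that
# execution order is still lea, push offset, push eax.
reordered = (
    LEA_DATA
    + bytes.fromhex("EB03")  # jmp to the push-offset block
    + PUSH_EAX
    + bytes.fromhex("EB07")  # jmp past the end
    + PUSH_OFFSET
    + bytes.fromhex("EBF6")  # jmp back to the push-eax block
)

print(SIGNATURE in original, SIGNATURE in junked)
print(bool(REGEX_SIGNATURE.search(junked)), bool(REGEX_SIGNATURE.search(reordered)))
```

The plain signature misses the nop-padded variant, and the regex signature, which tolerates nops, misses the reordered one.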
Evasion Through Encryption
Obfuscated variant (as far as recoverable): a decryptor stub points esi at data_area, then runs
  mov  ecx, 37
again:
  xor  byte ptr [esi+ecx], 0x01
  loop again
  jmp  data_area
data_area:
  db 8C 84 D9 FF ...
  ...
  db FE 14 C1 61 ...
Regex signature (no longer matches the encrypted bytes):
8D 85 D8 FE FF FF 90* 68 78 8E 40 00 90* 50 90* E8 69 06 00 00 90* 59 90* ... 90* 50 90* FF 15 C0 60 40 00
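A short check shows why the stored bytes escape the signature: XOR with the key 0x01 turns `8D 85 D8 FE` into `8C 84 D9 FF`, exactly the data bytes on the slide. The sketch below XORs every byte of the buffer, which simplifies the loop bounds of the assembly stub; the helper name `xor_bytes` is an assumption for illustration.

```python
KEY = 0x01


def xor_bytes(data, key=KEY):
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


# First instructions of the original code (bytes taken from the slides).
PLAIN = bytes.fromhex("8D85D8FEFFFF" "68788E4000" "50" "E869060000" "59")
SIGNATURE = bytes.fromhex("8D85D8FEFFFF" "68788E4000")

encrypted = xor_bytes(PLAIN)

print(encrypted[:4].hex(" "))                # bytes as stored on disk
print(SIGNATURE in encrypted)                # scanner sees only encrypted bytes
print(SIGNATURE in xor_bytes(encrypted))     # signature reappears after decryption
```

The malicious bytes exist in plain form only at run time, after the stub has decrypted them, which is why a purely syntactic scan of the file misses them.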
Behavior-based Malware Detection
Four problems:
1. Behavior specification: How to describe what is malicious?
2. Behavior detection: How to detect maliciousness?
3. Behavior mining: How to automatically infer behavior?
4. Detector analysis: How to prove resilience of detection?
Recent Approaches to Detection Defining what “malicious” means: – Byte distributions • N-gram distributions [Li et al. , SMC 2005] • Library dependencies and strings [Schultz et al. , Oakland 2001] – System/library call sequences [Mukkamala et al. , ACSAC 2004] [Xu et al. , HIS 2004] – Sequences of instructions [Kinder et al. , DIMVA 2005] University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 77
Specifying Malicious Behavior
[Spectrum: Syntactic to Semantic; byte signatures sit at the syntactic end]
• False dependencies on code layout, register allocation, and opcode and library usage
Specifying Behavior
[Spectrum: Syntactic to Semantic; byte signatures at the syntactic end, purely semantic descriptions at the other]
“Execution of program M causes the system to reach a state where a copy of M has been sent by email.”
• Too slow for real-time detection because of huge state space (program × library × OS)
A Complete Malspec
Operations: socket(A1, A2, A3), connect(B1, B2, B3), write(C1, C2, C3), write(D1, D2, D3), E = process_name(), open(F1, F2), read(G1, G2, G3), write(H1, H2, H3)
Constraints: A1 = PF_INET ...; B1 = A1 ...; C1 = B1, C2 matches “EHLO.*” ...; H1 = B1, H2 = Base64(G2)
X: =socket() connect(X) write(X, “EHLO”) write(X, “DATA”) S: =process_name() Z: =open(S) Y=read(Z) write(X, T) University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 81
Implicit Constraints X: =socket() connect(X) Implicit constraint: X : = socket() connect(X’) where X’=X. Z: =open(N) write(X, “EHLO”) write(X, “DATA”) Y=read(Z) write(X, T) University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 82
Malspec Class of Malware All Programs Email Programs that satisfy the malspec constraints Email Worms University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 83
Matching Malspecs: Constraints
Malspec operations: X := socket(), connect(X), write(X, “EHLO”), write(X, “DATA”), S := process_name(), Z := open(S), Y := read(Z), write(X, T)
Does the code sequence satisfy the (implicit) constraint? Ask a Verification Oracle.
Matching Malspecs: Constraints (II)
Malspec operations: X := socket(), connect(X), write(X, “EHLO”), write(X, “DATA”), S := process_name(), Z := open(S), Y := read(Z), write(X, T)
Does the code sequence satisfy the constraint? Ask a Verification Oracle.
Building A Verification Oracle Program Fragment Verification Oracle Yes/No Constraint φ Program verification University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 86
A Simple Decision Procedure • Match known obfuscation patterns against the control-flow graph E. g. : equality constraints push X pop X University of Wisconsin, Madison inc X dec X φ: X’=X add X, 0. . . Mihai Christodorescu – “Behavior-based Malware Detection” 87
SAFE Implementation [USENIX Security 2003] Program Fragment SAFE Decision Procedure Constraint φ push X pop X University of Wisconsin, Madison inc X dec X Yes/No add X, 0. . . Mihai Christodorescu – “Behavior-based Malware Detection” 88
SAFE-Based Detection Results [USENIX Security 2003]
Detection rates of obfuscated variants:
Malware     Norton Antivirus   McAfee VirusScan   Command Antivirus   SAFE
Chernobyl   0%                 0%                 0%                  100%
Z0mbie      0%                 0%                 0%                  100%
F0sf0r0     0%                 0%                 0%                  100%
Hare        0%                 0%                 0%                  100%
Resilient to simple obfuscations.
Binary Program Analysis Binary File Disassembler CFG constructor University of Wisconsin, Madison Interprocedural Control Flow Graphs (CFGs) Mihai Christodorescu – “Behavior-based Malware Detection” 90
Pattern Matching of Malspecs X: =socket() connect(X) write(X, “EHLO”) write(X, “DATA”) S: =process_nam Z: =open(S) Y: =read(Z) write(X, T) University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 92
Pattern Matching of Constraints X: =socket() connect(X) write(X, “EHLO”) write(X, “DATA”) S: =process_nam Z: =open(S) Y: =read(Z) write(X, T) University of Wisconsin, Madison Mihai Christodorescu – “Behavior-based Malware Detection” 93
Pattern Matching of Constraints (I) Program Fragment Malspec Fragment Bindings: X eax at Malspec constraint: X after = X before X: =socket() connect(X) University of Wisconsin, Madison Program constraint: eax after = eax before Mihai Christodorescu – “Behavior-based Malware Detection” 94
Pattern Matching of Constraints (II) Program Fragment Bindings: X edx at Malspec Fragment Malspec constraint: X after = X before S: =process_name() Z: =open(S) University of Wisconsin, Madison Program constraint: edx after = edx before Mihai Christodorescu – “Behavior-based Malware Detection” 95
Pattern Matching of Constraints (II) Program constraint: edx_after = edx_before. True on some path from … to …? List of obfuscation patterns: push X / pop X; inc X / dec X; add X, 0; ...
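As a rough illustration of checking a fragment against the list of obfuscation patterns, the sketch below tests whether a fragment is a plain concatenation of the three patterns named on the slide. The tuple encoding of instructions and the names `NOP_PATTERNS`, `pattern_matches_at`, and `is_pattern_nop` are assumptions for illustration; processor flags are ignored.

```python
# Each pattern is a sequence of (mnemonic, operands); "X" stands for any single
# operand, which must be the same operand throughout one pattern instance.
NOP_PATTERNS = [
    (("push", ("X",)), ("pop", ("X",))),
    (("inc", ("X",)), ("dec", ("X",))),
    (("add", ("X", "0")),),
]


def pattern_matches_at(pattern, fragment, start):
    """Check whether `pattern` matches `fragment` beginning at index `start`."""
    if start + len(pattern) > len(fragment):
        return False
    bound = None
    for (want_op, want_args), (op, args) in zip(pattern, fragment[start:]):
        if op != want_op or len(args) != len(want_args):
            return False
        for want, actual in zip(want_args, args):
            if want == "X":
                if bound is None:
                    bound = actual
                elif bound != actual:
                    return False
            elif want != actual:
                return False
    return True


def is_pattern_nop(fragment):
    """True if the fragment is a concatenation of known nop patterns."""
    i = 0
    while i < len(fragment):
        for pattern in NOP_PATTERNS:
            if pattern_matches_at(pattern, fragment, i):
                i += len(pattern)
                break
        else:
            return False      # no pattern matches at position i
    return True


fragment = [("push", ("eax",)), ("pop", ("eax",)), ("add", ("ebx", "0"))]
print(is_pattern_nop(fragment))   # True
```

A purely syntactic list like this misses nested or unlisted nops, for example push eax; inc ebx; dec ebx; pop eax, which is one reason a fixed pattern list only goes so far.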
Improving on SAFE Disassembler = IDA Pro. CFG constructor = CodeSurfer. Features: • Flow sensitivity • Binding procedure • Obfuscation patterns. Need a principled approach.
A Better Mouse Trap Program constraint: edx_after = edx_before. Use decision procedures to approximate oracles.
A Semantic-Nop Decision Procedure Malspec fragment: X := socket(); connect(Y); write(Z, “EHLO”). Constraint φ: X = Z, that is, “any code that preserves the value from X to Z.”
A Semantic-Nop Decision Procedure Constraint φ: X = Z, plus bindings X: eax, Z: memory[esp-2], gives constraint φ’: eax = memory[esp-2].
SemNop Implementation Inputs: program fragment, constraint φ’. SemNop decision procedure: random execution (“No, fragment does not satisfy φ’!”), then the Simplify theorem prover (“Yes, fragment does satisfy φ’!”). Output: Yes/No.
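The random-execution stage can only refute a constraint: one run that changes the value is enough to answer "No", while any number of agreeing runs proves nothing and the question passes on to the theorem prover. The sketch below shows this on a toy register machine. The instruction set, the 32-bit register model, and the names `run` and `random_execution` are assumptions for illustration, not the SemNop implementation.

```python
import random

MASK = 0xFFFFFFFF                      # model 32-bit registers
REGISTERS = ("eax", "ebx", "ecx", "edx")


def run(fragment, regs):
    """Execute a toy fragment on a copy of `regs`, using a private stack."""
    regs = dict(regs)
    stack = []

    def value(operand):
        return regs[operand] if operand in regs else int(operand) & MASK

    for op, *args in fragment:
        if op == "mov":
            regs[args[0]] = value(args[1])
        elif op == "add":
            regs[args[0]] = (regs[args[0]] + value(args[1])) & MASK
        elif op == "sub":
            regs[args[0]] = (regs[args[0]] - value(args[1])) & MASK
        elif op == "inc":
            regs[args[0]] = (regs[args[0]] + 1) & MASK
        elif op == "dec":
            regs[args[0]] = (regs[args[0]] - 1) & MASK
        elif op == "push":
            stack.append(value(args[0]))
        elif op == "pop":
            regs[args[0]] = stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs


def random_execution(fragment, register, trials=32, seed=0):
    """Return "No" if some random run changes `register`, else "Unknown"."""
    rng = random.Random(seed)
    for _ in range(trials):
        before = {r: rng.randrange(MASK + 1) for r in REGISTERS}
        after = run(fragment, before)
        if after[register] != before[register]:
            return "No"                # counterexample: value not preserved
    return "Unknown"                   # not a proof; hand over to a prover


preserving = [("push", "edx"), ("mov", "edx", "7"), ("pop", "edx")]
print(random_execution(preserving, "edx"))        # Unknown
print(random_execution([("inc", "edx")], "edx"))  # No
```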
Decision Procedures for Detection [Oakland 2005] Inputs: program fragment, constraint. Decision procedures: Simplify theorem prover [Detlefs et al., “Simplify,” 2004]; UCLID bounded model checker [Lahiri & Seshia, CAV 2004]. Output: Yes/No.
Checking Constraint Satisfaction Malspec fragment: S := process_name(); Z := open(S). Bindings: X → edx. Malspec constraint: X_after = X_before. Program constraint: edx_after = edx_before.
Checking Constraint Satisfaction Program constraint: edx_after = edx_before. True on some path from … to …? The decision procedure answers Yes/No. If yes, the program fragment is a semantic nop.
UCLID-Based Detection Results [Preliminary data] Malspec 1: Kills anti-virus processes. Malspec 2: Performs self-install. Bagle: J K MyDoom: F H L Netsky: B O W M Resilient to cross-family distinctions.
Performance and Precision • Combine decision procedures to balance cost and precision. [Figure: a program fragment and constraint φ pass through SAFE, SemNop, and UCLID; the output is Yes/No.]
Decision Procedure Stack Inputs: program fragment, constraint. Stack: SAFE pattern matching (Yes), random execution (No: “No, code does not satisfy constraint!”), Simplify theorem prover (Yes), UCLID bounded model checker. Average cost, same decision power.
Stack of Decision Procedures Inputs: program fragment, constraint. Cache results. Stack: SAFE pattern matching (Yes), random execution (No: “No, code does not satisfy constraint!”), Simplify theorem prover (Yes), UCLID bounded model checker (Yes). Average cost, same decision power.
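One way to picture the stack: procedures are tried from cheapest to most expensive, the first definite answer wins, and the verdict is cached per (fragment, constraint) pair. The sketch below makes several assumptions that are not on the slide: the class name `DecisionStack`, the convention that a procedure returns None when it cannot decide, the conservative "No" default when nothing decides, and the two toy stand-in procedures.

```python
from typing import Callable, Hashable, Optional, Sequence

Verdict = Optional[bool]   # True = "Yes", False = "No", None = cannot decide
Procedure = Callable[[Hashable, Hashable], Verdict]


class DecisionStack:
    """Try procedures in order of increasing cost and cache the verdicts."""

    def __init__(self, procedures: Sequence[Procedure]):
        self.procedures = list(procedures)
        self.cache = {}
        self.calls = 0     # number of procedure invocations, for illustration

    def decide(self, fragment, constraint) -> bool:
        key = (fragment, constraint)
        if key in self.cache:
            return self.cache[key]
        verdict = False    # assumed default when no procedure can decide
        for procedure in self.procedures:
            self.calls += 1
            answer = procedure(fragment, constraint)
            if answer is not None:
                verdict = answer
                break
        self.cache[key] = verdict
        return verdict


# Toy stand-ins for a cheap pattern matcher and an expensive fallback.
def cheap_pattern(fragment, constraint):
    return True if fragment == ("push edx", "pop edx") else None


def expensive_fallback(fragment, constraint):
    return fragment.count("inc edx") == fragment.count("dec edx")


stack = DecisionStack([cheap_pattern, expensive_fallback])
print(stack.decide(("push edx", "pop edx"), "edx preserved"))  # True
print(stack.calls)                                             # 1
```

Repeating the same query is answered from the cache without invoking any procedure, which is where the saving in average cost comes from in this sketch.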
Detection vs. Obfuscation Goal: Prove a detection technique is resilient to obfuscation. Obstacles: 1. Language gap: detectors and obfuscators are defined using different languages. 2. Quantity: many obfuscation schemes exist.
Detection vs. Obfuscation [POPL 2007] • Bridge the language gap with trace semantics. Detection algorithm D: abstract interpretation α. Obfuscation: trace transformation O. Figure labels: malspec behavior, some program behavior.
Assessing Resilience to Obfuscation [POPL 2007] A detector is resilient to an obfuscation if it can “ignore” the effects of obfuscation. Sound: If detected, then it is malicious. Complete: If malicious, then it is detected.
Resilience of Behavior Detection [POPL 2007] Behavior-based detector depends on 3 oracles: disassembler, CFG constructor, decision procedure. Relative-completeness (-soundness) with respect to several obfuscation classes: – Code reordering – Register renaming – Code substitution – Junk insertion
Detection vs. Obfuscation Goal: Determine if a detection technique is resilient to obfuscation. Obstacles: 1. Language gap: detectors and obfuscators are defined using different languages. 2. Quantity: many obfuscation schemes exist. Solutions: 1. Common language: trace semantics (detection as abstract interpretation, obfuscation as trace transformation). 2. Classes of obfuscation.
Assessing Resilience to Obfuscation [POPL 2007] A detector is resilient to an obfuscation if it can “ignore” the effects of obfuscation. Sound: Detected as malware ⇒ Has malicious behavior. Complete: Has malicious behavior ⇒ Detected as malware.
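The two implications can be checked empirically on a labeled sample: soundness means no false positives, completeness means no false negatives. The sketch below is only such a finite check, not the formal definitions from the cited work; the sample data and the names `is_sound`, `is_complete`, and `signature_detector` are hypothetical.

```python
def is_sound(detector, labeled_programs):
    """Sound on the sample: everything detected is malicious."""
    return all(malicious for program, malicious in labeled_programs if detector(program))


def is_complete(detector, labeled_programs):
    """Complete on the sample: everything malicious is detected."""
    return all(detector(program) for program, malicious in labeled_programs if malicious)


# Hypothetical labeled sample: (program, has malicious behavior).
SAMPLES = [
    ("mass-mailer", True),
    ("obfuscated mass-mailer", True),
    ("text editor", False),
]


def signature_detector(program):
    # Stand-in for a detector that only knows the original variant.
    return program == "mass-mailer"


print(is_sound(signature_detector, SAMPLES))      # True
print(is_complete(signature_detector, SAMPLES))   # False
```

The stand-in detector never flags the benign program, so it is sound on this sample, but it misses the obfuscated variant, so it is not complete.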
Resilience of Behavior Detection [POPL 2007] • Detection as abstract interpretation (oracles: disassembler, CFG constructor, decision procedures) • Obfuscation class as trace transformation. Relative-completeness (-soundness) with respect to code reordering, register renaming, semantic nops, code substitution.
Assessing Resilience to Obfuscation [POPL 2007] A detector is resilient to an obfuscation if it can “abstract away” the obfuscation’s effects. Detector: control-flow graphs. Obfuscation: instruction bytes. Detector and obfuscation: trace semantics.
Trace Semantics Binary File ... Instruction + Program state
Obfuscation as Trace Transformation Malware trace → Obfuscated malware trace
Detection as Abstract Interpretation Figure labels: program trace, malware trace, αR. αR: abstraction for program restriction R.
Detection as Abstract Interpretation Figure labels: obfuscated malware trace, program trace, malware trace, α, αR. αR: abstraction for program restriction R. α: abstraction that discards obfuscation effects and preserves “maliciousness”.
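A heavily simplified picture of these ideas: treat a trace as a list of actions, model one obfuscation (junk insertion) as a function on traces, and let α drop the junk before comparing against the malspec trace. This is a toy sketch under those assumptions, not the trace semantics of the cited work; the names `JUNK`, `insert_junk`, `alpha`, and `detect` are hypothetical.

```python
import random

# Actions assumed to have no effect on behavior (the "obfuscation effects").
JUNK = {"nop", "push eax; pop eax"}


def insert_junk(trace, seed=0):
    """Obfuscation as trace transformation: interleave junk actions."""
    rng = random.Random(seed)
    out = []
    for action in trace:
        if rng.random() < 0.5:
            out.append(rng.choice(sorted(JUNK)))
        out.append(action)
    return out


def alpha(trace):
    """Abstraction that discards junk and keeps every other action."""
    return [action for action in trace if action not in JUNK]


def contains(haystack, needle):
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def detect(program_trace, malspec_trace):
    """Detect by comparing abstracted traces instead of raw traces."""
    return contains(alpha(program_trace), alpha(malspec_trace))


malspec = ["socket", "connect", "write EHLO"]
program = ["open"] + malspec + ["close"]
print(detect(program, malspec))                 # True
print(detect(insert_junk(program), malspec))    # True
print(detect(["open", "close"], malspec))       # False
```

Because α undoes exactly what `insert_junk` adds, the verdict is the same before and after obfuscation, which is the sense in which this toy detector "ignores" junk insertion.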
Assessing Resilience to Obfuscation [POPL 2007] A detector is resilient to an obfuscation if it can “ignore” the effects of obfuscation. Sound: Complete: Behavior-based detector is sound and complete.
Future Research Goals • Short-term: Contextual Detection • Long-term: Self-Defense – Containment – Collaborative defenses
Short Term: Contextual Detection [Figure: detection approaches placed along two axes, Syntax vs. Semantics and Static Analysis vs. Dynamic Analysis; labels: “Byte signatures” (twice) and “Behavior based”.]
Short Term: Contextual Detection Will the rest of this execution be malicious? Static Analysis Dynamic Analysis Is this program path valid? [Madou et al., DRM 2005] [Halfond & Orso, ICSE 2005] [Giffin et al., Oakland 2004]
Long Term: Self-Protection • Adaptive Defenses: Can the interface to the protected system be altered to implement a particular security policy? • Cooperative Defenses: How can an individual system detect and react to network-wide malicious behavior?
Long Term: Self-Defense • Containment (Host) • Collaboration (Host 2, Host 3, Host N)
Long Term: Containment Many mechanisms exist, no good policy! Principle of Least Privilege must be the default mode of operation. How do we automatically construct and check containment policies?
Long Term: Collaborative Defense Attackers know the value of distributed/parallel computing. Different hosts have different views of the network. How do we share attack-relevant data in a privacy-preserving way?
Mining Malspecs from Malware Dynamic differential analysis. Inputs: malware system-call trace, benign program system-call trace. Output: malspec.
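To give a feel for differential analysis of system-call traces, the sketch below keeps the short call sequences that occur in a malware trace but in none of the benign traces. This n-gram difference is a deliberately crude stand-in chosen for illustration, not the mining algorithm behind the slide; the function names and the example traces are hypothetical.

```python
def ngrams(trace, n):
    """All length-n windows of a trace, as a set of tuples."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def mine_candidate_malspec(malware_trace, benign_traces, n=2):
    """Return malware n-grams, in order of first occurrence, unseen in benign traces."""
    benign = set()
    for trace in benign_traces:
        benign |= ngrams(trace, n)
    seen = set()
    result = []
    for i in range(len(malware_trace) - n + 1):
        gram = tuple(malware_trace[i:i + n])
        if gram not in benign and gram not in seen:
            seen.add(gram)
            result.append(gram)
    return result


# Hypothetical system-call traces.
malware = ["open", "read", "socket", "connect", "write", "close"]
benign = [["open", "read", "close"], ["socket", "connect", "close"]]

print(mine_candidate_malspec(malware, benign))
# [('read', 'socket'), ('connect', 'write'), ('write', 'close')]
```

The surviving sequences are only candidates: they show what distinguishes this malware run from these benign runs, and would still need generalizing (for example, replacing concrete arguments with variables) before they resemble the malspecs shown earlier.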