Malgram Malware Analysis: Malware Unpacking, Static Analysis, Code Deobfuscation, Decompilation
Phillip Porras and Hassen Saidi
Computer Science Lab, SRI International

Objectives
• Now that we have various ways of knowing what the malware does when running on an infected system, we aim to answer two fundamental questions:
  1. How does it do it?
  2. What are the full capabilities of the malware, covering both observed behavior and behavior yet to be triggered?

Dynamic vs Static Malware Analysis
• Dynamic Analysis
  – Techniques that profile the actions of a binary at runtime
  – Only provides a partial "effects-oriented profile" of malware potential
• Static Analysis
  – Techniques that apply program analysis to the binary code
  – Can provide complementary insights
  – Potential for a more comprehensive assessment

Malgram Report • …go interactive

From Binary To Semantically Rich C Code Raw Binary Disassembly

From Binary To Semantically Rich C Code Complete Disassembly

From Binary To Semantically Rich C Code Decompiled C code

Challenges in Static Analysis Raw Binary Complete Disassembly Decompiled C code

Malware Obfuscation
• Most malware is obfuscated
• Packing is the most widely used obfuscation technique
• Packing is often combined with other advanced forms of obfuscation:
  – Binary rewriting to create semantically equivalent code with a vastly different structure
  – Call obfuscation in general and API obfuscation in particular
  – Chunking or "code spaghettisation"
  – Dead code (or functionally irrelevant code)

Challenges in Static Analysis
• Stages: Raw Binary, Disassembly
• Challenge: Does the binary represent the full malware binary logic?

Unpacking Result

Packed vs Unpacked • go interactive…

Coarse-grained Execution Monitoring
• Generalized unpacking principle
  – Execute the binary until it has sufficiently revealed itself
  – Dump the process execution image for static analysis
• Monitoring execution progress
  – Eureka employs a Windows driver that hooks the SSDT (System Service Dispatch Table)
  – Callback invoked on each NTDLL system call
  – Filtering based on the malware process pid
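The monitoring loop described on this slide can be pictured with a small user-space simulation. This is only a sketch under stated assumptions: the real driver hooks the SSDT in kernel mode, whereas here the system-call stream is a plain list of (pid, name) tuples. The trigger set `DUMP_TRIGGERS` and the `dump_image` callback are hypothetical names chosen for illustration; the slides do not say which system calls count as "sufficiently revealed".

```python
from typing import Callable, Iterable, Optional, Tuple

# Assumed triggers for "the binary has sufficiently revealed itself".
# The slides do not name them; these are illustrative choices only.
DUMP_TRIGGERS = {"NtTerminateProcess", "NtCreateProcess"}


def monitor(events: Iterable[Tuple[int, str]],
            target_pid: int,
            dump_image: Callable[[int, str], None]) -> Optional[int]:
    """Filter a system-call stream by pid and dump once a trigger is seen.

    Returns the index of the event that caused the dump, or None if the
    monitored process never hit a trigger.
    """
    for index, (pid, syscall) in enumerate(events):
        if pid != target_pid:
            continue  # filtering based on the malware process pid
        if syscall in DUMP_TRIGGERS:
            dump_image(pid, syscall)  # hand the image over to static analysis
            return index
    return None
```

A caller would pass a recorded trace and a callback that writes the process image to disk; events from other processes are ignored even when they match a trigger.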

Statistics-based Unpacking
• Observations
  – Statistical properties of a packed executable differ from those of an unpacked executable
  – As malware executes, the code-to-data ratio increases
• Complications
  – Code and data sections are interleaved in PE executables
  – Data directories (import tables) look similar to data but are often found in code sections
  – Properties of data sections vary with packers

Statistics-based Unpacking (3)
Bigram occurrence counts per executable:

Bigram                Calc      Explorer  Ipconfig  lpr       Mshearts  Notepad   Ping      Shutdown  Taskman
(file size)           117 KB    1010 KB   59 KB     11 KB     131 KB    72 KB     21 KB     23 KB     19 KB
FF 15 (call)          246       3045      184       24        192       415       58        132       126
FF 75 (push)          235       2494      272       33        274       254       41        63        85
E8 _ _ _ 0xff (call)  1583      2201      181       19        369       180       87        49        41
E8 _ _ _ 0x00 (call)  746       1091      152       62        641       108       57        66        50
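Counts like those in the table can be gathered with a single pass over a memory image. The sketch below assumes that "E8 _ _ _ 0xff" means an E8 opcode whose fifth byte (the most significant byte of the 32-bit relative offset) is 0xFF, and likewise for 0x00; that reading of the table header is an interpretation, and `count_bigrams` is a hypothetical helper name.

```python
def count_bigrams(data: bytes) -> dict:
    """Count the four code patterns from the table in a raw byte buffer."""
    counts = {"FF 15": 0, "FF 75": 0, "E8 _ _ _ FF": 0, "E8 _ _ _ 00": 0}
    size = len(data)
    for i in range(size):
        byte = data[i]
        if byte == 0xFF and i + 1 < size:
            if data[i + 1] == 0x15:      # indirect call through memory
                counts["FF 15"] += 1
            elif data[i + 1] == 0x75:    # push of a stack-frame operand
                counts["FF 75"] += 1
        elif byte == 0xE8 and i + 4 < size:
            if data[i + 4] == 0xFF:      # relative call, small negative offset
                counts["E8 _ _ _ FF"] += 1
            elif data[i + 4] == 0x00:    # relative call, small positive offset
                counts["E8 _ _ _ 00"] += 1
    return counts
```

Because the scan does not disassemble, it also counts byte pairs that happen to occur inside data; that is acceptable for a coarse statistic but should not be read as an exact instruction count.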

Evaluation (ASPack)

Evaluation (MoleBox)

API Resolution
• User-level malware programs require system calls to perform malicious actions
• They use the Win32 API to access user-level libraries
• Obfuscations impede malware analysis using disassemblers and decompilers
  – Packers use non-standard linking and loading of DLLs
  – Obfuscated API resolution

Standard API Resolution
• IDA identifies the imports in the IAT by looking at the Import Table

Resolving API Calls Using Dataflow Analysis
• Identify register-based indirect calls
[Figure: def-use chain for GetEnvironmentStringW]
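A toy version of the def-use idea can be written over a list of already-decoded instructions. This is a minimal sketch, not the analysis used in the talk: it handles only straight-line code within one basic block, the tuple-based instruction format is invented for the example, and the IAT is modeled as a hypothetical dictionary from slot address to API name.

```python
def resolve_indirect_calls(instructions, iat):
    """Resolve `call reg` sites whose register was loaded from an IAT slot.

    instructions: list of tuples such as ("mov", "esi", ("mem", 0x40A0B4))
                  or ("call", "esi"); this format is an assumption.
    iat:          dict mapping an IAT slot address to an API name.
    Returns a list of (instruction_index, api_name) pairs.
    """
    reg_defs = {}   # register name -> API name it currently holds
    resolved = []
    for index, (mnemonic, *operands) in enumerate(instructions):
        if mnemonic == "mov":
            dest, src = operands
            if isinstance(src, tuple) and src[0] == "mem" and src[1] in iat:
                reg_defs[dest] = iat[src[1]]        # def: load from IAT slot
            elif isinstance(src, str) and src in reg_defs:
                reg_defs[dest] = reg_defs[src]      # register-to-register copy
            else:
                reg_defs.pop(dest, None)            # overwritten, kill the def
        elif mnemonic == "call":
            target = operands[0]
            if isinstance(target, str) and target in reg_defs:
                resolved.append((index, reg_defs[target]))  # use
            reg_defs.pop("eax", None)               # return value clobbers eax
    return resolved
```

A real implementation would run this over the control-flow graph and merge definitions at join points; the single-block version only shows how a definition is carried to its use.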

Evaluation Metrics
• Measuring analyzability
  – Code-to-data ratio
    • Use a disassembler to separate code and data.
    • Most successfully unpacked malware samples have a code-to-data ratio over 50%.
  – API resolution success
    • Percentage of API calls that have been resolved, out of the set of all call sites.
    • A higher percentage implies that the malware is more amenable to static analysis.
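Both metrics reduce to simple ratios once a disassembler has produced the raw counts. The sketch below assumes that "code-to-data ratio" means the share of code bytes in the whole image (code divided by code plus data), which is one plausible reading of the 50% threshold; the function names and input shapes are hypothetical.

```python
from typing import Optional, Sequence


def code_to_data_ratio(code_bytes: int, data_bytes: int) -> float:
    """Fraction of the image that the disassembler classified as code.

    Assumption: the ratio is code / (code + data), not code / data.
    """
    total = code_bytes + data_bytes
    return code_bytes / total if total else 0.0


def api_resolution_rate(call_sites: Sequence[Optional[str]]) -> float:
    """Fraction of call sites whose target API name was resolved.

    Each entry is the resolved API name, or None when resolution failed.
    """
    if not call_sites:
        return 0.0
    resolved = sum(1 for name in call_sites if name is not None)
    return resolved / len(call_sites)
```

Under this reading, a dump with 600 code bytes and 400 data bytes would clear the 50% threshold mentioned on the slide.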

Challenges in Static Analysis Challenge: Can we isolate subroutines? Disassembly Complete Disassembly

Binary Rewrites • go interactive …

From Raw Binary To Decompiled C Code Raw Binary Complete Disassembly Decompiled C code

Renaissance: Improving C Code Readability

Before:

    void *sub_9AB966(int a1, void *source, unsigned int a3)
    {
      int v3, v4, v5, v6, v8;

      v3 = a1;
      v4 = *(_DWORD *)(a1 + 20) + 8 * a3;
      v5 = (*(_DWORD *)(a1 + 20) >> 3) & 0x3F;
      *(_DWORD *)(a1 + 20) = v4;
      if ( v4 < 8 * a3 )
        ++*(_DWORD *)(a1 + 24);
      *(_DWORD *)(a1 + 24) += a3 >> 29;
      if ( v5 + a3 <= 0x3F )
      {
        v6 = 0;
      }
      else
      {
        v6 = 64 - v5;
        memcpy((void *)(v5 + a1 + 28), source, 64 - v5);
        sub_9A9F13(a1, (void *)(a1 + 28));
        if ( v6 + 63 < a3 )
        {
          v8 = v6 + 63;
          do
          {
            sub_9A9F13(v3, (char *)source + v8 - 63);
            v8 += 64;
            v6 += 64;
          }
          while ( v8 < a3 );
        }
        v5 = 0;
      }
      return memcpy((void *)(v5 + v3 + 28), (char *)source + v6, a3 - v6);
    }

After (Hex-Rays + Renaissance):

    void *sub_9AB966(unsigned int *destination1, unsigned int *source, size_t num1)
    {
      unsigned int *destination2;
      size_t num3, num2, num4, num5;

      destination2 = destination1;
      num3 = destination1[20] + 8 * num1;
      num2 = (destination1[20] >> 3) & 0x3F;
      destination1[20] = num3;
      if ( num3 < 8 * num1 )
        ++destination1[24];
      destination1[24] += num1 >> 29;
      if ( num2 + num1 <= 0x3F )
      {
        num4 = 0;
      }
      else
      {
        num4 = 64 - num2;
        memcpy( &destination1[num2 + 28], source, 64 - num2 );
        sub_9A9F13( destination1, &destination1[28] );
        if ( num4 + 63 < num1 )
        {
          num5 = num4 + 63;
          do
          {
            sub_9A9F13( destination2, &source[num5 - 63] );
            num5 += 64;
            num4 += 64;
          }
          while ( num5 < num1 );
        }
        num2 = 0;
      }
      return memcpy( &destination2[num2 + 28], &source[num4], num1 - num4 );
    }

Renaissance improvements, each illustrated on the same before/after listing of sub_9AB966:
1. Typing and naming variables
2. Highlighting important variables
3. Improvements to decompilation
4. Caller → Callee type information

Evaluation

Challenges in Static Analysis Raw Binary Complete Disassembly Decompiled C code

The Need for Rapid Crypto-Algorithm Isolation
• RC4: Rustock, Zeus, Conficker
• AES: Truecrypt, Waledac
• SSL: Agobot (IRC over SSL)
• Custom Crypto / Encoding: Pushdo, Kraken, Mebroot, Mega-D
• Serpent: Truecrypt
• XOR-Custom: Lethic, Virut, Hydraq, Torpig
• Cascades: Truecrypt
• RSA variants: Nugache, Conficker, Waledac
• Blowfish (448 bit): Clampi
• Twofish: Truecrypt
• HASH Whirlpool: Truecrypt
• HASH MD6: Conficker B/C
• HASH SHA1: Conficker A, Truecrypt

IntraModule isCrypto()
• Intra-module Analyzer
  – Constant Detector
  – Constant Data Loading
  – Padding Analysis
  – Microsoft CryptoAPI, CAPICOM
  – Unknown Computation: cryptoFnDetection()
• isCrypto Score = isConst + isPadded + CryptAPI fn(LargeVar, Loop Detection, Opcodes, BigMath)
• cryptoFnDetection(): at least 2 matches among
  – Large Local Variables
  – Loop Detection
  – Big Number Math
  – Opcode Analysis
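One way to read the score on this slide is as a sum of boolean indicators. The sketch below makes several assumptions that the slide does not confirm: each term contributes 0 or 1, the CryptAPI indicator and the fn(...) term are separate addends, and fn(...) is the "at least 2 matches" rule of cryptoFnDetection(). The `SubroutineFeatures` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubroutineFeatures:
    """Per-subroutine observations; field names are illustrative."""
    has_known_constant: bool = False   # isConst
    has_padding: bool = False          # isPadded
    calls_crypto_api: bool = False     # Microsoft CryptoAPI / CAPICOM use
    large_local_vars: bool = False
    has_loop: bool = False
    crypto_opcodes: bool = False
    big_number_math: bool = False


def crypto_fn_detection(f: SubroutineFeatures) -> bool:
    """Flag an unknown computation when at least 2 heuristics match."""
    matches = sum([f.large_local_vars, f.has_loop,
                   f.crypto_opcodes, f.big_number_math])
    return matches >= 2


def is_crypto_score(f: SubroutineFeatures) -> int:
    """Sum of the indicators (assumed 0/1 weights)."""
    return (int(f.has_known_constant) + int(f.has_padding)
            + int(f.calls_crypto_api) + int(crypto_fn_detection(f)))
```

With these weights a subroutine that only contains a loop scores 0, while one that loops over large local buffers and also references a known constant scores 2.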

Constant detection
• Known constants: Blowfish, Camellia, CAST256, CRC32, DES, GOST, HAVAL, MARS, MD2, PKCS_MD5, PKCS_RIPEMD160, PKCS_SHA256, PKCS_SHA384, PKCS_SHA512, PKCS_Tiger, RawDES, RC2, Rijndael, SAFER, SHA1, SHA256, SHA512, SHARK, SKIPJACK, Square, Tiger, Twofish, WAKE, Whirlpool, zlib, AES, MD6
• Direct Reference
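Detection of known constant arrays amounts to a byte-pattern search over the unpacked image. The sketch below is a minimal illustration rather than the detector from the talk: it matches only a short prefix of each table, it covers just two tables (the first row of the AES S-box and the first four entries of the standard CRC32 table, stored little-endian), and the names `CONST_ARRAYS` and `find_const_arrays` are hypothetical.

```python
import struct

# Prefixes of two well-known tables; a full detector would carry many more
# signatures and compare complete arrays instead of prefixes.
CONST_ARRAYS = {
    "sbox_AES": bytes.fromhex("637c777bf26b6fc53001672bfed7ab76"),
    "CRC32_m_tab": struct.pack("<4I", 0x00000000, 0x77073096,
                               0xEE0E612C, 0x990951BA),
}


def find_const_arrays(image: bytes, base: int = 0):
    """Return sorted (address, table_name) pairs for every prefix match."""
    hits = []
    for name, prefix in CONST_ARRAYS.items():
        offset = image.find(prefix)
        while offset != -1:
            hits.append((base + offset, name))
            offset = image.find(prefix, offset + 1)
    return sorted(hits)
```

Each hit gives the address of a data array; the subroutines that reference that address directly are the first candidates to label as crypto code.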

Constant detection (continued)
• Data array contains known crypto content: Blowfish, Camellia, CAST, CAST256, CRC32, DES, GOST, HAVAL, MARS, MD2, PKCS_MD5, PKCS_RIPEMD160, PKCS_SHA256, PKCS_SHA384, PKCS_SHA512, PKCS_Tiger, RawDES, RC2, Rijndael, SAFER, SHA1, SHA256, SHA512, SHARK, SKIPJACK, Square, Tiger, Twofish, WAKE, Whirlpool, zlib, AES, MD6
• Indirect Load
• Unknown Computation
• Load Array
• This could be encryption or decryption

Inter-module Analyzer

    func ColorNode(subgraph) {
        if (exists uncolored subgraph)
            ColorNode(subgraph)
        foreach leaf in subgraph {
            isCrypto(leaf)
        }
        if (exists green leaf) then color root green
        if (exists orange leaf) then color root orange
        if (exists > 2 red leaves) then color root red
    }

    func cryptoString (per subroutine)
        if node contains known crypto implementation substring,
        label node with corresponding crypto library.

Labels shown: AES, MD6, Vowpal Wabbit
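The ColorNode pseudocode can be sketched as a recursive walk over a call graph. The version below rests on assumptions: the graph is a dictionary from a subroutine to its callees, "leaf" is read as a direct callee, the rules are applied in the order written so that a later rule overrides an earlier one, and `is_crypto` is a hypothetical callback that returns a color or None for a leaf subroutine. The slides do not define what each color means, so the code treats them as opaque labels.

```python
def color_node(graph, node, is_crypto, colors=None):
    """Color `node` and everything reachable from it; returns the color map.

    graph:     dict mapping a subroutine to the list of its callees
    is_crypto: callable giving "green", "orange", "red" or None for a leaf
    """
    if colors is None:
        colors = {}
    if node in colors:
        return colors              # already colored (or currently on the stack)
    colors[node] = None            # placeholder, also guards against cycles
    children = graph.get(node, [])
    if not children:
        colors[node] = is_crypto(node)
        return colors
    for child in children:         # color uncolored subgraphs first
        color_node(graph, child, is_crypto, colors)
    child_colors = [colors[child] for child in children]
    root = None
    if "green" in child_colors:
        root = "green"
    if "orange" in child_colors:
        root = "orange"
    if child_colors.count("red") > 2:
        root = "red"
    colors[node] = root
    return colors
```

Running it from the program entry point yields one label per subroutine, which can then be rendered on the call graph.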

IDA Pro Call Graph w/ Crypto-routine detection

Example Running SRI Crypt Finder

    (c) SRI International
    Finding crypto constants and subroutines in binary files
    automatic discovery of crypto functions as unknown computations
    4BABF1: found sparse constants for SHA-1
    50C254: found const array sbox_AES (used in AES)
    50E354: found const array rsbox_AES (used in AES)
    50F574: found const array Twofish_q (used in Twofish)
    50F7A4: found const array MARS_Sbox (used in MARS)
    510EA4: found const array zinflate_lengthExtraBits (used in zlib)
    510F18: found const array zinflate_distanceExtraBits (used in zlib)
    511918: found const array CRC32_m_tab (used in CRC32)
    514F98: found const array CRC32_m_tab (used in CRC32)
    Found 9 known constant arrays in total.
    Scanning code for crypto subroutines
    found crypto in Function @ 407334
    found crypto in Function @ 40E5B4
    found crypto in Function @ 47D954
    found crypto in Function @ 47ED34
    found crypto in Function @ 4816F4
    found crypto in Function @ 4B6624
    found crypto in Function @ 4B9980
    found crypto in Function @ 4CCBD4
    found crypto in Function @ 4CCD4C
    found crypto in Function @ 4CE208
    found crypto in Function @ 4CE7CC
    found crypto in Function @ 4CEBE8
    found crypto in Function @ 4D9B00
    found crypto in Function @ 4D9EE4
    Done labelling crypto subroutines
    Found 14 subroutine(s) with possible crypto

Running SRI Crypt Finder

Report Generation • go interactive
