Malgram Malware Analysis: Malware Unpacking, Static Analysis, Code Deobfuscation, Decompilation
Phillip Porras and Hassen Saidi
Computer Science Lab, SRI International
Objectives
• Now that we have various ways of knowing what the malware does when running on an infected system, we aim to answer two fundamental questions:
1. How does it do it?
2. What is the full capability of the malware, covering both observed behavior and behavior that has yet to be triggered?
Dynamic vs Static Malware Analysis
• Dynamic Analysis
  – Techniques that profile the actions of a binary at runtime
  – Only provides a partial "effects-oriented profile" of malware potential
• Static Analysis
  – Techniques that apply program analysis to the binary code
  – Can provide complementary insights
  – Potential for a more comprehensive assessment
Malgram Report • …go interactive
From Binary To Semantically Rich C Code Raw Binary Disassembly
From Binary To Semantically Rich C Code Complete Disassembly
From Binary To Semantically Rich C Code Decompiled C code
Challenges in Static Analysis Raw Binary Complete Disassembly Decompiled C code
Malware Obfuscation
• Most malware is obfuscated
• Packing is the most widely used obfuscation technique
• Packing is often combined with other advanced forms of obfuscation:
  – Binary rewriting to create semantically equivalent code with vastly different structure
  – Call obfuscation in general and API obfuscation in particular
  – Chunking or "code spaghettization"
  – Dead code (or functionally irrelevant code)
Challenges in Static Analysis
Raw Binary
Challenge: Does the binary represent the full logic of the malware?
Disassembly
Unpacking Result Unpacking
Packed vs Unpacked • go interactive…
Coarse-grained Execution Monitoring
• Generalized unpacking principle
  – Execute the binary until it has sufficiently revealed itself
  – Dump the process execution image for static analysis
• Monitoring execution progress
  – Eureka employs a Windows driver that hooks the SSDT (System Service Dispatch Table)
  – A callback is invoked on each NTDLL system call
  – Events are filtered by the pid of the malware process
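The monitoring loop described on this slide can be sketched in a few lines. This is a user-level simulation only, not Eureka's driver: `SyscallEvent`, `CoarseMonitor`, `run_until_dump`, and the dump trigger passed in as `should_dump` are hypothetical names introduced for illustration, and the slide does not say which condition Eureka uses to decide that the binary has "sufficiently revealed itself".

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class SyscallEvent:
    pid: int   # process that issued the system call
    name: str  # system call name, e.g. "NtProtectVirtualMemory"


@dataclass
class CoarseMonitor:
    target_pid: int
    # Hypothetical trigger: decides from the calls seen so far whether to dump.
    should_dump: Callable[[List[str]], bool]
    seen: List[str] = field(default_factory=list)

    def on_syscall(self, event: SyscallEvent) -> bool:
        """Callback for every system call; True means a dump is due."""
        if event.pid != self.target_pid:
            return False  # filter out every process except the malware
        self.seen.append(event.name)
        return self.should_dump(self.seen)


def run_until_dump(monitor: CoarseMonitor,
                   events: Iterable[SyscallEvent]) -> Optional[int]:
    """Feed events to the monitor; return the index at which to dump."""
    for index, event in enumerate(events):
        if monitor.on_syscall(event):
            return index
    return None
```

For example, a trigger such as `lambda seen: seen[-1] == "NtTerminateProcess"` would dump the image just before the monitored process exits; that choice is an assumption made here, not something stated in the slides.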
Statistics-based Unpacking
• Observations
  – Statistical properties of a packed executable differ from those of an unpacked executable
  – As malware executes, the code-to-data ratio increases
• Complications
  – Code and data sections are interleaved in PE executables
  – Data directories (import tables) look similar to data but are often found in code sections
  – Properties of data sections vary with packers
Statistics-based Unpacking (3)
Bigram                  Calc      Explorer  Ipconfig  lpr       Mshearts  Notepad   Ping      Shutdown  Taskman
(file size)             117 KB    1010 KB   59 KB     11 KB     131 KB    72 KB     21 KB     23 KB     19 KB
FF 15 (call)            246       3045      184       24        192       415       58        132       126
FF 75 (push)            235       2494      272       33        274       254       41        63        85
E8 _ _ _ 0xFF (call)    1583      2201      181       19        369       180       87        49        41
E8 _ _ _ 0x00 (call)    746       1091      152       62        641       108       57        66        50
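A minimal sketch of how the four patterns in the table could be counted over a byte buffer. `FF 15` is an indirect call through memory, `FF 75` a push of a stack-frame operand, and `E8` a relative call whose fourth displacement byte is `0xFF` for short backward calls and `0x00` for short forward calls. The function name and the dictionary keys are hypothetical; the scan is a plain byte-pattern match with no disassembly, so it will also count patterns that happen to occur inside data.

```python
from typing import Dict


def count_bigram_features(code: bytes) -> Dict[str, int]:
    """Count the call/push byte patterns used as unpacking statistics."""
    counts = {
        "FF 15": 0,           # call dword ptr [addr]
        "FF 75": 0,           # push dword ptr [ebp+disp8]
        "E8 ?? ?? ?? FF": 0,  # call rel32, high displacement byte 0xFF
        "E8 ?? ?? ?? 00": 0,  # call rel32, high displacement byte 0x00
    }
    for i in range(len(code) - 1):
        if code[i] == 0xFF:
            if code[i + 1] == 0x15:
                counts["FF 15"] += 1
            elif code[i + 1] == 0x75:
                counts["FF 75"] += 1
        elif code[i] == 0xE8 and i + 4 < len(code):
            # The rel32 displacement is little-endian, so byte i+4 is its top byte.
            if code[i + 4] == 0xFF:
                counts["E8 ?? ?? ?? FF"] += 1
            elif code[i + 4] == 0x00:
                counts["E8 ?? ?? ?? 00"] += 1
    return counts
```

Running this on successive memory snapshots of a process would give one way to watch the code-like content grow as a packed binary unpacks itself.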
Evaluation (ASPack)
Evaluation (MoleBox)
API Resolution
• User-level malware programs require system calls to perform malicious actions
• They use the Win32 API to access user-level libraries
• Obfuscations impede malware analysis using disassemblers and decompilers
  – Packers use non-standard linking and loading of DLLs
  – Obfuscated API resolution
Standard API Resolution Imports in IAT identified by IDA by looking at Import Table
Resolving API Calls Using Dataflow Analysis
• Identify register-based indirect calls
Figure labels: GetEnvironmentStringW, def, use
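The def-use idea can be illustrated on a toy instruction list. This is a sketch under strong assumptions: a single straight-line basic block, instructions encoded as hypothetical `(op, dst, src)` tuples, and an `iat` dictionary mapping import slot addresses to API names. A real implementation would work on a disassembler's instruction stream and handle control flow.

```python
from typing import Dict, List, Optional, Tuple, Union

# (op, dst, src): src is an int for a memory load from that address,
# a register name for a register-to-register copy, or None for calls.
Instr = Tuple[str, str, Union[int, str, None]]


def resolve_indirect_calls(instrs: List[Instr],
                           iat: Dict[int, str]) -> Dict[int, Optional[str]]:
    """Map the index of each `call reg` to the API reaching that register."""
    reaching: Dict[str, Optional[str]] = {}  # register -> API name it holds
    resolved: Dict[int, Optional[str]] = {}
    for index, (op, dst, src) in enumerate(instrs):
        if op == "mov":
            if isinstance(src, int):
                reaching[dst] = iat.get(src)      # def: load from an import slot
            else:
                reaching[dst] = reaching.get(src)  # copy propagates the def
        elif op == "call":
            resolved[index] = reaching.get(dst)    # use: the call site
            # Assumption: the callee clobbers the caller-saved registers.
            for reg in ("eax", "ecx", "edx"):
                reaching.pop(reg, None)
    return resolved
```

A call site that maps to `None` is one the analysis could not resolve, which feeds directly into an API-resolution success percentage.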
Evaluation Metrics
• Measuring analyzability
  – Code-to-data ratio
    • Use a disassembler to separate code and data.
    • Most successfully unpacked malware has a code-to-data ratio over 50%.
  – API resolution success
    • Percentage of API calls that have been resolved, out of the set of all call sites.
    • A higher percentage implies that the malware is more amenable to static analysis.
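As a worked example of the two metrics, the following sketch assumes that "code-to-data ratio over 50%" means the fraction of bytes the disassembler classifies as code; the slides do not define the ratio precisely, and both function names are hypothetical.

```python
def code_to_data_ratio(code_bytes: int, data_bytes: int) -> float:
    """Fraction of the image classified as code (assumed definition)."""
    total = code_bytes + data_bytes
    return code_bytes / total if total else 0.0


def api_resolution_rate(resolved_calls: int, total_call_sites: int) -> float:
    """Percentage of call sites whose API target was resolved."""
    if total_call_sites == 0:
        return 0.0
    return 100.0 * resolved_calls / total_call_sites
```

With 60 bytes of code and 40 bytes of data the ratio is 0.6, and resolving 45 of 60 call sites gives a resolution rate of 75%.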
Challenges in Static Analysis Challenge: Can we isolate subroutines? Disassembly Complete Disassembly
Binary Rewrites • go interactive …
From Raw Binary To Decompiled C Code Raw Binary Complete Disassembly Decompiled C code
Renaissance: Improving C Code Readability

Hex-Rays:

    void *sub_9AB966(int a1, void *source, unsigned int a3)
    {
      int v3, v4, v5, v6, v8;

      v3 = a1;
      v4 = *(_DWORD *)(a1 + 20) + 8 * a3;
      v5 = (*(_DWORD *)(a1 + 20) >> 3) & 0x3F;
      *(_DWORD *)(a1 + 20) = v4;
      if ( v4 < 8 * a3 )
        ++*(_DWORD *)(a1 + 24);
      *(_DWORD *)(a1 + 24) += a3 >> 29;
      if ( v5 + a3 <= 0x3F )
      {
        v6 = 0;
      }
      else
      {
        v6 = 64 - v5;
        memcpy((void *)(v5 + a1 + 28), source, 64 - v5);
        sub_9A9F13(a1, (void *)(a1 + 28));
        if ( v6 + 63 < a3 )
        {
          v8 = v6 + 63;
          do
          {
            sub_9A9F13(v3, (char *)source + v8 - 63);
            v8 += 64;
            v6 += 64;
          }
          while ( v8 < a3 );
        }
        v5 = 0;
      }
      return memcpy((void *)(v5 + v3 + 28), (char *)source + v6, a3 - v6);
    }

Hex-Rays + Renaissance:

    void *sub_9AB966(unsigned int *destination1, unsigned int *source, size_t num1)
    {
      unsigned int *destination2;
      size_t num3, num2, num4, num5;

      destination2 = destination1;
      num3 = destination1[20] + 8 * num1;
      num2 = (destination1[20] >> 3) & 0x3F;
      destination1[20] = num3;
      if ( num3 < 8 * num1 )
        ++destination1[24];
      destination1[24] += num1 >> 29;
      if ( num2 + num1 <= 0x3F )
      {
        num4 = 0;
      }
      else
      {
        num4 = 64 - num2;
        memcpy(&destination1[num2 + 28], source, 64 - num2);
        sub_9A9F13(destination1, &destination1[28]);
        if ( num4 + 63 < num1 )
        {
          num5 = num4 + 63;
          do
          {
            sub_9A9F13(destination2, &source[num5 - 63]);
            num5 += 64;
            num4 += 64;
          }
          while ( num5 < num1 );
        }
        num2 = 0;
      }
      return memcpy(&destination2[num2 + 28], &source[num4], num1 - num4);
    }
1. Typing and naming variables
2. Highlighting important vars
3. Improvements to decompilation
4. Caller → Callee type info
Evaluation
Challenges in Static Analysis Raw Binary Complete Disassembly Decompiled C code
The Need for Rapid Crypto-Algorithm Isolation
• RC4: Rustock, Zeus, Conficker
• AES: Truecrypt, Waledac
• SSL: Agobot (IRC over SSL)
• Custom Crypto / Encoding: Pushdo, Kraken, mebroot, Mega-D
• Serpent: Truecrypt
• XOR-Custom: Lethic, Virut, Hydraq, Torpig
• Cascades: Truecrypt
• RSA variants: Nugache, Conficker, Waledac
• Blowfish (448 bit): Clampi
• Twofish: Truecrypt
• HASH Whirlpool: Truecrypt
• HASH MD6: Conficker B/C
• HASH SHA1: Conficker A, Truecrypt
IntraModule isCrypto()
• Constant Detector
• Intra-module Analyzer
  – Constant Data Loading
  – Padding Analysis
• isCrypto Score = isConst + isPadded + Crypt API fn (LargeVar, Loop Detection, Opcodes, BigMath)
• Microsoft CryptoAPI, CAPICOM
• Unknown Computation: cryptoFnDetection()
  – Large Local Variables
  – Loop Detection
  – Big Number Math
  – Opcode Analysis
• cryptoFnDetection(): at least 2 matches
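One possible reading of the scoring formula, written out as a sketch. The assumptions are significant: each term is treated as an unweighted 0/1 indicator, the crypto-API term and the function-detection term are treated as separate addends, and all field and function names are hypothetical. The only rule taken directly from the slide is that the function-level detector fires when at least two of its four heuristics match.

```python
from dataclasses import dataclass


@dataclass
class SubroutineFeatures:
    has_crypto_constants: bool  # isConst: known constant arrays referenced
    has_padding: bool           # isPadded: padding logic detected
    calls_crypto_api: bool      # calls into a crypto library API
    large_local_vars: bool      # heuristic 1: large local variables
    has_loop: bool              # heuristic 2: loop detected
    crypto_opcodes: bool        # heuristic 3: opcode analysis match
    big_number_math: bool       # heuristic 4: big-number arithmetic


def crypto_fn_detection(f: SubroutineFeatures) -> bool:
    """Fire when at least 2 of the 4 unknown-computation heuristics match."""
    matches = sum([f.large_local_vars, f.has_loop,
                   f.crypto_opcodes, f.big_number_math])
    return matches >= 2


def is_crypto_score(f: SubroutineFeatures) -> int:
    """Unweighted sum of the indicator terms (assumed weighting)."""
    return (int(f.has_crypto_constants) + int(f.has_padding)
            + int(f.calls_crypto_api) + int(crypto_fn_detection(f)))
```

A subroutine with no known constants can still score above zero through the heuristic term, which is how crypto routines with custom or unrecognized tables would be flagged as "unknown computation".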
Constant detection: Direct Reference
Signatures: Blowfish, Camellia, CAST256, CRC32, DES, GOST, HAVAL, MARS, MD2, PKCS_MD5, PKCS_RIPEMD160, PKCS_SHA256, PKCS_SHA384, PKCS_SHA512, PKCS_Tiger, RawDES, RC2, Rijndael, SAFER, SHA1, SHA256, SHA512, SHARK, SKIPJACK, Square, Tiger, Twofish, WAKE, Whirlpool, zlib, AES, MD6
Constant detection: Indirect Load
• Data array contains known crypto content
• Load Array → Unknown Computation: this could be encryption or decryption
Signatures: Blowfish, Camellia, CAST, CAST256, CRC32, DES, GOST, HAVAL, MARS, MD2, PKCS_MD5, PKCS_RIPEMD160, PKCS_SHA256, PKCS_SHA384, PKCS_SHA512, PKCS_Tiger, RawDES, RC2, Rijndael, SAFER, SHA1, SHA256, SHA512, SHARK, SKIPJACK, Square, Tiger, Twofish, WAKE, Whirlpool, zlib, AES, MD6
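Constant detection of this kind can be sketched as a signature scan over the unpacked image. The byte values below are the published first entries of the AES S-box and of the standard CRC-32 lookup table, plus the five SHA-1 initialization words; the function names and the choice of 16-byte prefixes are assumptions, and a real signature database would hold complete tables for every algorithm listed.

```python
import struct
from typing import List, Sequence, Tuple

# First 16 bytes of the AES forward S-box.
AES_SBOX_PREFIX = bytes([0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5,
                         0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76])
# First four little-endian entries of the standard CRC-32 table.
CRC32_TABLE_PREFIX = struct.pack("<4I", 0x00000000, 0x77073096,
                                 0xEE0E612C, 0x990951BA)
# SHA-1 initial hash values, usually embedded as separate immediates.
SHA1_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

ARRAY_SIGNATURES = (("sbox_AES", AES_SBOX_PREFIX),
                    ("CRC32_m_tab", CRC32_TABLE_PREFIX))


def find_const_arrays(image: bytes, base: int = 0) -> List[Tuple[int, str]]:
    """Return (address, name) for every contiguous signature match."""
    hits = []
    for name, signature in ARRAY_SIGNATURES:
        pos = image.find(signature)
        while pos != -1:
            hits.append((base + pos, name))
            pos = image.find(signature, pos + 1)
    return sorted(hits)


def has_sparse_constants(image: bytes, constants: Sequence[int]) -> bool:
    """True when every 32-bit constant occurs somewhere in the image."""
    return all(struct.pack("<I", c) in image for c in constants)
```

A contiguous array match only locates the table; attributing it to a subroutine still requires finding the code that references or loads from that address, which is the distinction the two slides draw between direct reference and indirect load.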
Inter-module Analyzer

    func ColorNode (Subgraph) {
      if (exists uncolored subgraph) ColorNode (subgraph)
      foreach leaf in subgraph { isCrypto(Leaf) }
      if (exists green leaf) then color root green
      if (exists orange leaf) then color root orange
      if (exist > 2 red leaves) then color root red
    }

    func cryptoString (per subroutine)
      if node contains known crypto implementation substring,
      label node with corresponding crypto library.

Example labels: AES, MD6, Vowpal wabbit
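The ColorNode pseudocode can be turned into a small runnable sketch. Several points are interpretations rather than facts from the slide: the call graph is assumed to be a tree (no cycles), each subgraph's "leaves" are taken to be the already-colored children of its root, the three rules are applied in the order written so that a later rule overrides an earlier one, and `is_crypto` is a hypothetical callback returning a color name or `None` for a leaf subroutine. The slide does not say what each color means.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    color: Optional[str] = None


def color_node(root: Node,
               is_crypto: Callable[[Node], Optional[str]]) -> Optional[str]:
    """Color a call-graph subtree bottom-up and return the root's color."""
    if not root.children:
        root.color = is_crypto(root)  # leaves are scored directly
        return root.color
    # Color every uncolored subgraph first, then inspect the results.
    child_colors = [color_node(child, is_crypto) for child in root.children]
    if "green" in child_colors:
        root.color = "green"
    if "orange" in child_colors:
        root.color = "orange"
    if child_colors.count("red") > 2:
        root.color = "red"
    return root.color
```

Propagating colors upward this way marks not only the routines that contain crypto code but also the callers that orchestrate them, which is what makes the colored call graph useful for navigation.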
IDA Pro Call Graph w/ Crypto-routine detection
Example: Running SRI Crypt Finder

    SRI Crypt Finder (c) SRI International
    Finding crypto constants and subroutines in binary files
    automatic discovery of crypto functions as unknown computations
    4BABF1: found sparse constants for SHA-1
    50C254: found const array sbox_AES (used in AES)
    50E354: found const array rsbox_AES (used in AES)
    50F574: found const array Twofish_q (used in Twofish)
    50F7A4: found const array MARS_Sbox (used in MARS)
    510EA4: found const array zinflate_lengthExtraBits (used in zlib)
    510F18: found const array zinflate_distanceExtraBits (used in zlib)
    511918: found const array CRC32_m_tab (used in CRC32)
    514F98: found const array CRC32_m_tab (used in CRC32)
    Found 9 known constant arrays in total.
    Scanning code for crypto subroutines
    found crypto in Function @ 407334
    found crypto in Function @ 40E5B4
    found crypto in Function @ 47D954
    found crypto in Function @ 47ED34
    found crypto in Function @ 4816F4
    found crypto in Function @ 4B6624
    found crypto in Function @ 4B9980
    found crypto in Function @ 4CCBD4
    found crypto in Function @ 4CCD4C
    found crypto in Function @ 4CE208
    found crypto in Function @ 4CE7CC
    found crypto in Function @ 4CEBE8
    found crypto in Function @ 4D9B00
    found crypto in Function @ 4D9EE4
    Done labelling crypto subroutines
    Found 14 subroutine(s) with possible crypto
Running SRI Crypt Finder
Report Generation • go interactive