Analysis Of Stripped Binary Code
Laune Harris
University of Wisconsin – Madison
lharris@cs.wisc.edu
www.paradyn.org

Binary code

856c: 55
856d: 89 e5
856f: 83 ec 08
8572: e8 ddffffff
857b: c9
857c: c3
857d: 55
857e: 89 e5
8581: 83 ec 18
858b: e8 bfffffff
8591: c9
8592: c3

Binary code (with assembly)

856c: 55             push %ebp
856d: 89 e5          mov %esp, %ebp
856f: 83 ec 08       sub $8, %esp
8572: e8 ddffffff    call 857d
857b: c9             leave
857c: c3             ret
857d: 55             push %ebp
857e: 89 e5          mov %esp, %ebp
8581: 83 ec 18       sub %eax, %ebp
858b: e8 bfffffff    call 866c
8591: c9             leave
8592: c3             ret

Binary code (with symbol info)

main:
856c: 55             push %ebp
856d: 89 e5          mov %esp, %ebp
856f: 83 ec 08       sub $8, %esp
8572: e8 ddffffff    call foo
857b: c9             leave
857c: c3             ret

foo:
857d: 55             push %ebp
857e: 89 e5          mov %esp, %ebp
8581: 83 ec 18       sub %eax, %ebp
858b: e8 bfffffff    call printf
8591: c9             leave
8592: c3             ret

A lot of code is stripped
• Commercial applications (usually)
• Proprietary libraries (often)
• Viruses
• OS libraries and utilities (depends on OS and OS version)

Steps in symbol reconstruction
• Find and name functions
• Find function size

Finding functions
• Build a call graph and traverse it to find function start addresses
• Opportunistic parsing: use existing symbol names and addresses where available
• Works on a spectrum of binaries, from binaries with all symbols to fully stripped binaries

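The traversal described above can be sketched as a worklist algorithm. This is a minimal illustration and not the Paradyn parser: the CODE listing is a hand-written stand-in for a disassembler, its addresses and call targets are borrowed from the earlier example slides, and the func<address> naming scheme is an assumption.

```python
from collections import deque

# Toy pre-decoded listing (hypothetical): address -> (mnemonic, direct call target).
# A real parser would obtain this from an instruction decoder.
CODE = {
    0x856C: ("push", None),
    0x856D: ("mov", None),
    0x856F: ("sub", None),
    0x8572: ("call", 0x857D),
    0x857B: ("leave", None),
    0x857C: ("ret", None),
    0x857D: ("push", None),
    0x857E: ("mov", None),
    0x8581: ("sub", None),
    0x858B: ("call", 0x866C),  # target lies outside the toy listing
    0x8591: ("leave", None),
    0x8592: ("ret", None),
}


def find_function_starts(code, entry, known_symbols=None):
    """Walk the static call graph from `entry`; return (starts, call edges)."""
    addrs = sorted(code)
    # Opportunistic parsing: reuse any names that survived stripping.
    starts = dict(known_symbols or {})
    starts.setdefault(entry, f"func{entry:x}")
    work = deque(starts)
    visited = set()
    edges = set()
    while work:
        func = work.popleft()
        if func in visited or func not in code:
            continue  # already parsed, or no code available to parse
        visited.add(func)
        # Simplification: scan linearly from the function start to the first ret.
        for addr in addrs[addrs.index(func):]:
            mnemonic, target = code[addr]
            if mnemonic == "call" and target is not None:
                edges.add((func, target))
                if target not in starts:
                    starts[target] = f"func{target:x}"  # synthesized name
                    work.append(target)
            if mnemonic == "ret":
                break
    return starts, edges


starts, edges = find_function_starts(CODE, 0x856C, {0x856C: "main"})
for addr in sorted(starts):
    print(f"{addr:x}: {starts[addr]}")
```

Passing an empty symbol dictionary models a fully stripped binary, in which case the entry point also receives a synthesized name.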
Call Graph creation

main:
856c: push %ebp

Call Graph creation

main:
856c: push %ebp
856d: mov %esp, %ebp
856f: sub $8, %esp
8572: call 857d
857b: leave
857c: ret

Call Graph creation

main:
856c: push %ebp
856d: mov %esp, %ebp
856f: sub $8, %esp
8572: call func857d
857b: leave
857c: ret

func857d:
857d: push %ebp

Call Graph creation

main:
856c: push %ebp
856d: mov %esp, %ebp
856f: sub $8, %esp
8572: call func857d
857b: leave
857c: ret

func857d:
857d: push %ebp
857e: mov %esp, %ebp
8581: sub %eax, %ebp
858b: call 865e
8591: call 866d
8596: leave
8597: ret

Parsing Functions
• Disassemble the function's code by traversing its intra-procedural control flow graph
• The highest address reached determines the function size

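A minimal sketch of the size computation, assuming a toy instruction table in place of a real decoder. The addresses, instruction sizes, and the kind labels (seq, jcc, jmp, ret) are all hypothetical. The function follows every intra-procedural edge and reports the distance from the entry to the end of the highest instruction reached.

```python
# Toy instruction table (hypothetical): address -> (size, kind, branch target).
# kind: "seq" falls through, "jcc" is a conditional branch,
#       "jmp" is an unconditional branch, "ret" leaves the function.
INSNS = {
    0x1000: (1, "seq", None),    # push %ebp
    0x1001: (2, "seq", None),    # mov %esp, %ebp
    0x1003: (2, "jcc", 0x1007),  # conditional branch past the first ret
    0x1005: (1, "seq", None),    # leave
    0x1006: (1, "ret", None),    # first exit
    0x1007: (1, "seq", None),    # leave
    0x1008: (1, "ret", None),    # second exit, highest address
}


def function_size(insns, entry):
    """Traverse the CFG from `entry`; size = highest instruction end - entry."""
    seen = set()
    work = [entry]
    highest_end = entry
    while work:
        addr = work.pop()
        if addr in seen or addr not in insns:
            continue
        seen.add(addr)
        size, kind, target = insns[addr]
        highest_end = max(highest_end, addr + size)
        if kind == "ret":
            continue                  # exit point: no successors
        if kind in ("jmp", "jcc"):
            work.append(target)       # taken edge
        if kind != "jmp":
            work.append(addr + size)  # fall-through edge
    return highest_end - entry


print(function_size(INSNS, 0x1000))
```

A scan that stopped at the first ret would report 7 bytes for this toy function. Following the conditional branch reaches the second exit and yields 9.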
Error Detection And Recovery
• CFG exit points are sometimes hard to identify
• Assume branches that are not obvious exits are intra-procedural
• Errors result in overestimation of function size
• Overlapping functions indicate an error

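Because parsing errors inflate function sizes, one simple consistency check is to look for a function whose estimated extent runs into the next function's start. The sketch below assumes functions are given as (start, size) pairs and compares only neighbours in address order. The sizes are illustrative values derived from the example addresses in the earlier slides.

```python
def find_overlaps(functions):
    """Return (start_a, start_b) for address-adjacent functions that overlap.

    `functions` is an iterable of (start, size) pairs.
    """
    ordered = sorted(functions)
    overlaps = []
    for (start_a, size_a), (start_b, _) in zip(ordered, ordered[1:]):
        if start_a + size_a > start_b:
            overlaps.append((start_a, start_b))
    return overlaps


# Consistent estimate: the first function ends exactly where the second begins.
print(find_overlaps([(0x856C, 0x11), (0x857D, 0x16)]))
# Overestimated first function: it now runs into the second one.
print(find_overlaps([(0x856C, 0x20), (0x857D, 0x16)]))
```

A reported pair identifies where to re-examine the exit-point assumptions. The check does not say which of the two functions was parsed incorrectly.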
Problems and Solutions
• Functions that are only called indirectly
    • Problem: static call graph traversal does not discover these functions
    • Solution: examine gaps in text space and use heuristics to find functions

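The slide does not say which heuristics are used. As one plausible example, the sketch below computes the gaps between already-discovered functions and searches them for the x86 prologue bytes 55 89 e5 (push %ebp; mov %esp, %ebp) seen in the earlier listings. The byte layout and the base address 0x1000 are made up for illustration.

```python
# Assumed heuristic: a common x86 prologue, push %ebp; mov %esp, %ebp.
PROLOGUE = bytes.fromhex("5589e5")


def find_gaps(text_start, text_end, functions):
    """Return [lo, hi) address ranges not covered by any known function."""
    gaps = []
    cursor = text_start
    for start, size in sorted(functions):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, start + size)
    if cursor < text_end:
        gaps.append((cursor, text_end))
    return gaps


def candidate_functions(text, text_start, functions):
    """Scan each gap for prologue bytes; return candidate start addresses."""
    found = []
    for lo, hi in find_gaps(text_start, text_start + len(text), functions):
        end = hi - text_start
        offset = text.find(PROLOGUE, lo - text_start, end)
        while offset != -1:
            found.append(text_start + offset)
            offset = text.find(PROLOGUE, offset + 1, end)
    return found


# Three 5-byte functions separated by single nop (0x90) padding bytes.
# Only the first and third were found by call graph traversal.
body = bytes.fromhex("5589e5c9c3")
text = body + b"\x90" + body + b"\x90" + body
known = [(0x1000, 5), (0x100C, 5)]
print([hex(a) for a in candidate_functions(text, 0x1000, known)])
```

Prologue matching can produce false positives, since the same bytes may occur inside data or in the middle of another instruction. Candidates would still need to be confirmed by parsing them.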
Problems and Solutions cont’d
• Indirect Jumps
    • Problem: need to find targets to complete CFG
    • Solution: parse jump tables to find possible targets

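A minimal sketch of jump table parsing under stated assumptions: the table holds absolute 32-bit little-endian addresses, its location is already known, and it ends at the first entry that falls outside caller-supplied address bounds. Real tables may instead hold relative offsets or carry an explicit bound check in the code. The addresses below are illustrative.

```python
import struct


def parse_jump_table(data, table_offset, lo, hi, max_entries=256):
    """Read 32-bit little-endian entries until one falls outside [lo, hi)."""
    targets = []
    for i in range(max_entries):
        offset = table_offset + 4 * i
        if offset + 4 > len(data):
            break  # ran off the end of the section
        (target,) = struct.unpack_from("<I", data, offset)
        if not lo <= target < hi:
            break  # assumed terminator: entry is not a plausible target
        targets.append(target)
    return targets


# Three plausible targets followed by unrelated data.
table = struct.pack("<4I", 0x8600, 0x8610, 0x8620, 0xDEADBEEF)
print([hex(t) for t in parse_jump_table(table, 0, 0x8580, 0x8700)])
```

Each recovered target becomes an additional intra-procedural edge in the CFG traversal.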
Problems and Solutions cont’d
• Exception handling code
    • Problem: creates code blocks that appear unreachable
    • Solution: get block addresses from exception table

Test Programs

program          size unstripped (MB)   size stripped (MB)   number of functions
paradyn                          5.44                 3.51                13,676
condor_starter                  22.60                 2.50                 8,168
gimp                             2.61                 2.20                 4,329
eon                             10.44                 0.51                 1,163
om3                              0.43                 0.30                   732
alara                            3.65                 0.26                   948
bubba                            0.09                 0.02                    66

Evaluation
• Parse time (includes CFG creation)
    • ~1.4x faster than the previous parser (with CFG)
    • ~1.7x slower than the previous parser (without CFG)
• Stripped parse time
    • Varies: 1.2x to 1.9x slower than unstripped
• Symbol recreation
    • Recovers 80% to 98% of the original functions

Related Work
• Binary rewriters/instrumentation tools
    • eel, emil, etch, goblin, leel, plto
• Disassemblers (lots available)
    • IDA Pro, objdump, dumpbin, etc.
• Symbol table reconstructors
    • dress, objdump-output-beautifier

Status
• Implemented on x86
• Ready for measurement and instrumentation
• Good start for security, but needs work

Future Work
• Develop more accurate heuristics to identify code in unlit areas of the binary
• Data flow analyses
• Port to other platforms
• Support unconventional function constructs
• Comprehensive comparison with other tools
• Evaluation on obfuscated code