A Framework for Binary Code Analysis, and Static and Dynamic Patching
Barton P. Miller, University of Wisconsin, bart@cs.wisc.edu
Jeffrey Hollingsworth, University of Maryland, hollings@cs.umd.edu
February 2006
© 2006 Barton P. Miller
Binary Code Analysis and Editing

Motivation
§ Binary code analysis is a basic tool of security analysts, application developers, system designers and tool developers.
§ Existing binary analysis tools have significant limitations.
§ We are designing and building a new foundation to support such analysis:
• Multi-platform
• Open architecture
• Extensible
• Open source
• Testable
• Suitable for batch processing
• Accurate
• Efficient

Why Binary Code?
§ Access to the source code is often not possible:
• Proprietary software packages.
• Stripped executables.
• Proprietary libraries: communication (MPI, PVM), linear algebra (NGA), database query (SQL libraries).
§ Binary code is the only authoritative version of the program.
• Changes occurring in the compile, optimize and link steps can create non-trivial semantic differences between the source and the binary.
§ Worms and viruses are rarely provided with source code.

Binary Analysis and Editing
§ Analysis: processing of the binary code to extract syntactic and symbolic information:
• Symbol tables (if present)
• Decoded (disassembled) instructions
• Control-flow information: basic blocks, loops, functions
• Data-flow information: from basic register information to highly sophisticated (and expensive) analyses.

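To make the control-flow step concrete, here is a minimal C++ sketch of classic leader-based basic-block identification over instructions that have already been decoded. The `Insn` and `Kind` types and the `findBlockStarts` function are hypothetical illustrations, not part of Dyninst; a real decoder must also cope with indirect jumps and data embedded in code, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, already-decoded instruction: its address, what kind of
// control transfer it is, and (for direct branches) its target address.
enum class Kind { Plain, Jump, CondBranch, Return };

struct Insn {
    std::size_t addr;
    Kind kind;
    std::size_t target;  // meaningful only for Jump and CondBranch
};

// Leader-based basic block identification:
//  1. the first instruction is a leader;
//  2. every branch target inside the code is a leader;
//  3. every instruction that follows a branch or return is a leader.
// Precondition: instructions are listed in ascending address order.
// Returns the start addresses of the basic blocks in ascending order.
std::set<std::size_t> findBlockStarts(const std::vector<Insn>& code) {
    std::set<std::size_t> addrs, leaders;
    for (const Insn& in : code) addrs.insert(in.addr);
    if (code.empty()) return leaders;

    leaders.insert(code.front().addr);                  // rule 1
    for (std::size_t i = 0; i < code.size(); ++i) {
        const Insn& in = code[i];
        assert(i == 0 || code[i - 1].addr < in.addr);   // sorted input
        bool isBranch = in.kind == Kind::Jump || in.kind == Kind::CondBranch;
        if (isBranch && addrs.count(in.target))
            leaders.insert(in.target);                  // rule 2
        if (in.kind != Kind::Plain && i + 1 < code.size())
            leaders.insert(code[i + 1].addr);           // rule 3
    }
    return leaders;
}
```

For a six-instruction sequence with a conditional branch at address 4 (to 16) and a jump at 12 (to 20), the sketch reports blocks starting at 0, 8, 16, and 20.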
Binary Analysis and Editing
§ Binary rewriting: static (before execution) modification of a binary program:
• Analyze the program and then insert, remove, or change the binary code, producing a new binary.
§ Dynamic instrumentation: dynamic (during execution) modification of a binary program:
• Analyze the code of the running program and then insert, remove, or change the binary code, changing the execution of the program.
• Can operate on running programs and servers.

Uses of Binary Analysis and Editing
§ Cyber-forensics
• Analysis: understand the nature of malicious code.
• Binary rewriting: produce a new version of the code that might be instrumented, sandboxed, or modified for study.
• Dynamic instrumentation: the same capabilities, applied interactively to an executing program.
• Hybrid static/dynamic: control execution and produce intermediate versions of the binary that can be re-executed (and further instrumented).
§ Program tracing: instructions, memory accesses, function calls, system calls, ...
§ Debugging
§ Testing
§ Performance profiling
§ Performance modeling
§ Reverse engineering

Our Starting Point: Dyninst
§ A machine-independent library for machine-level code patching:
• Functions for binary code analysis
• Functions for binary code patching
§ Clean abstractions to encapsulate the tool complexity.
§ Originally designed as part of the Paradyn performance profiling tool, but now widely used in many areas, including cyber-security.

Dynamic Instrumentation
§ Does not require recompiling or relinking:
• Saves time: compile and link times are significant in real systems.
• Can instrument without the source code (e.g., proprietary libraries).
• Can instrument without linking (relinking is not always possible).
§ Can instrument optimized code.

Dynamic Instrumentation (cont’d)
§ Only instrument what you need, when you need it:
• No hidden cost of latent instrumentation.
• Enables “one pass” tools.
§ Can instrument running programs (such as Web or database servers):
• Production systems.
• Embedded systems.
• Systems with complex start-up procedures.

The Basic Mechanism
[Figure: Application Program with Function foo; Trampoline holding the Instrumentation and the Relocated Instruction(s)]

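The trampoline mechanism can be imitated on a toy machine. The sketch below is hypothetical (the `Machine`, `Op`, and `insertTrampoline` names are invented for illustration): it overwrites one instruction with a jump to a trampoline that runs the instrumentation, executes the relocated original instruction, and jumps back. It assumes fixed-size instructions and a relocated instruction that does not depend on its own address, two complications a real binary patcher must handle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A toy machine, invented for illustration: memory is a vector of
// fixed-size instructions and the program counter is an index into it.
enum class Op { AddAcc, IncCounter, Jump, Halt };

struct Insn {
    Op op;
    int arg;  // addend for AddAcc, target index for Jump, unused otherwise
};

struct Machine {
    std::vector<Insn> mem;
    int acc = 0;      // the application's own state
    int counter = 0;  // state updated only by instrumentation

    void run(std::size_t pc) {
        for (;;) {
            const Insn in = mem.at(pc);
            switch (in.op) {
            case Op::AddAcc:     acc += in.arg; ++pc; break;
            case Op::IncCounter: ++counter;     ++pc; break;
            case Op::Jump:       pc = static_cast<std::size_t>(in.arg); break;
            case Op::Halt:       return;
            }
        }
    }
};

// Instrument the instruction at `point`:
//  1. append a trampoline: instrumentation, then a relocated copy of the
//     original instruction, then a jump back to the instruction after it;
//  2. overwrite the original instruction with a jump to the trampoline.
// Assumes the relocated instruction behaves the same at its new address.
void insertTrampoline(Machine& m, std::size_t point) {
    assert(point < m.mem.size());
    const Insn original = m.mem[point];
    const int tramp = static_cast<int>(m.mem.size());
    m.mem.push_back({Op::IncCounter, 0});                      // instrumentation
    m.mem.push_back(original);                                 // relocated instruction
    m.mem.push_back({Op::Jump, static_cast<int>(point + 1)});  // back to original code
    m.mem[point] = {Op::Jump, tramp};                          // patch the site
}
```

Running the program after patching leaves the application's result (`acc`) unchanged, while `counter` records each execution of the instrumented instruction.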
The DynInst Interface
§ Machine-independent representation
§ Write-once, analyze/instrument-many (portable)
§ Object-based interface to insert new code: Abstract Syntax Trees (ASTs)
§ Hides most of the complexity in the API
• Easy to build tools: e.g., an MPI tracer in 250 lines of C++ code.

Machine Independent Code
[Figure: Abstract Syntax Trees; one AST, incrementing the variable ctr, and the code generated from it on each platform]
SPARC code:
sethi %hi(ctr)
ld [...], %o1
add %o1, 1
st %o1, [...]
Power code:
cau r3, r0, hi%ctr
l r4, lo%ctr(r3)
addi r4, 1(r4)
st r4, lo%ctr(r3)
IA32 code:
incl ctr

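The idea of one AST with several back ends fits in a few lines. In this hypothetical C++ sketch, `RiscGen` emits a generic load/store pseudo-assembly (deliberately not exact SPARC or Power syntax), while `genIA32` uses IA-32's memory operands to produce the single `incl ctr` shown above. The node types and function names are illustrations only and are not Dyninst's snippet classes.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// A hypothetical, minimal snippet AST (not Dyninst's own classes).
struct Node {
    enum Kind { Var, Const, Plus, Assign } kind;
    std::string name;                // Var: variable name
    int value;                       // Const: constant value
    std::shared_ptr<Node> lhs, rhs;  // Plus, Assign: operands
};
using Ast = std::shared_ptr<Node>;

Ast var(const std::string& n) { return std::make_shared<Node>(Node{Node::Var, n, 0, nullptr, nullptr}); }
Ast cnst(int v) { return std::make_shared<Node>(Node{Node::Const, "", v, nullptr, nullptr}); }
Ast op(Node::Kind k, Ast l, Ast r) { return std::make_shared<Node>(Node{k, "", 0, l, r}); }

// Back end 1: generic load/store pseudo-assembly (RISC style). Operands are
// brought into registers, combined, and the result is stored back.
class RiscGen {
public:
    std::vector<std::string> out;

    void stmt(const Ast& a) {
        if (a->kind != Node::Assign || a->lhs->kind != Node::Var)
            throw std::invalid_argument("expected var = expr");
        int r = expr(a->rhs);
        out.push_back("store r" + std::to_string(r) + ", " + a->lhs->name);
    }

private:
    int nextReg = 1;

    int expr(const Ast& a) {
        if (a->kind == Node::Var || a->kind == Node::Const) {
            int r = nextReg++;
            out.push_back(a->kind == Node::Var
                ? "load r" + std::to_string(r) + ", " + a->name
                : "li r" + std::to_string(r) + ", " + std::to_string(a->value));
            return r;
        }
        if (a->kind != Node::Plus) throw std::invalid_argument("unsupported node");
        int l = expr(a->lhs);
        std::string rl = "r" + std::to_string(l);
        if (a->rhs->kind == Node::Const) {  // fold the constant into addi
            out.push_back("addi " + rl + ", " + rl + ", " + std::to_string(a->rhs->value));
        } else {
            int r = expr(a->rhs);
            out.push_back("add " + rl + ", " + rl + ", r" + std::to_string(r));
        }
        return l;
    }
};

// Back end 2: IA-32 (AT&T syntax) can add to memory directly, so the
// pattern `x = x + c` becomes one instruction: incl for c == 1, else addl.
std::vector<std::string> genIA32(const Ast& a) {
    const bool inPlace = a->kind == Node::Assign && a->lhs->kind == Node::Var &&
                         a->rhs->kind == Node::Plus && a->rhs->lhs->kind == Node::Var &&
                         a->rhs->lhs->name == a->lhs->name && a->rhs->rhs->kind == Node::Const;
    if (!inPlace) throw std::invalid_argument("sketch handles only x = x + constant");
    const int c = a->rhs->rhs->value;
    if (c == 1) return {"incl " + a->lhs->name};
    return {"addl $" + std::to_string(c) + ", " + a->lhs->name};
}
```

Building `op(Node::Assign, var("ctr"), op(Node::Plus, var("ctr"), cnst(1)))` once and handing it to both back ends yields three load/add/store lines from `RiscGen` and just `incl ctr` from `genIA32`, which is the portability argument of the slide in miniature.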
Basic DynInst Operations
§ Code query routines:
• Find control-flow elements: modules, procedures, loops, basic blocks, instructions
– For functions, find entry, exit, call sites.
– For loops, find entry, exit, body.
• Find data elements: variables and parameters
• Call graph (parent/child) queries
• Intra-procedural control-flow graph
• Other symbol table information, e.g., line numbers.

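Parent/child call-graph queries reduce to keeping each edge in both directions. The `CallGraph` class below is a hypothetical sketch of such a query interface, not Dyninst's; in a real tool the edges would come from the call sites found while parsing the binary.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call-graph container: one edge per (caller, callee) pair,
// stored in both directions so parent and child queries are both cheap.
class CallGraph {
public:
    void addCall(const std::string& caller, const std::string& callee) {
        children_[caller].insert(callee);
        parents_[callee].insert(caller);
    }

    // Child query: functions called directly by f.
    std::set<std::string> callees(const std::string& f) const { return lookup(children_, f); }

    // Parent query: functions that call f directly.
    std::set<std::string> callers(const std::string& f) const { return lookup(parents_, f); }

    // Transitive child query: every function reachable from f through calls
    // (includes f itself only if f is reachable through recursion).
    std::set<std::string> reachableFrom(const std::string& f) const {
        std::set<std::string> seen;
        std::vector<std::string> work{f};
        while (!work.empty()) {
            std::string cur = work.back();
            work.pop_back();
            for (const auto& c : callees(cur))
                if (seen.insert(c).second) work.push_back(c);
        }
        return seen;
    }

private:
    using Edges = std::map<std::string, std::set<std::string>>;
    static std::set<std::string> lookup(const Edges& e, const std::string& f) {
        auto it = e.find(f);
        return it == e.end() ? std::set<std::string>{} : it->second;
    }
    Edges children_, parents_;
};
```

For example, after recording `main -> foo`, `main -> bar`, `foo -> MPI_Send`, and `bar -> MPI_Send`, `callers("MPI_Send")` returns `{bar, foo}`.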
Basic DynInst Operations
§ Code modification routines:
• Remove Function Call
– Disable an existing function call in the application.
• Replace Function Call
– Redirect a function call to a new function.
• Replace Function
– Redirect all calls (current and future) of a function to a new function.
• Wrap Function
– Allow the new function to call the replaced one (potentially with all its original parameters).

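One way to picture the difference between replacing and wrapping a function is a table through which every call is dispatched. The hypothetical `Program` class below models only Replace Function and Wrap Function; per-site operations such as Remove Function Call or Replace Function Call would instead change one call instruction. Real binary patchers rewrite call instructions or function entry code rather than consulting a lookup table, so treat this purely as a model of the semantics.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical model: every call goes through a table that maps a function
// name to its current implementation.
using Fn = std::function<int(int)>;

class Program {
public:
    void define(const std::string& name, Fn body) { table_[name] = std::move(body); }
    int call(const std::string& name, int arg) const { return table_.at(name)(arg); }

    // Replace Function: all later calls to `name` reach `replacement`.
    void replaceFunction(const std::string& name, Fn replacement) {
        table_.at(name) = std::move(replacement);
    }

    // Wrap Function: the wrapper receives the original implementation and
    // may call it, passing along the original argument.
    void wrapFunction(const std::string& name, std::function<int(const Fn&, int)> wrapper) {
        Fn original = table_.at(name);
        table_.at(name) = [original, wrapper](int x) { return wrapper(original, x); };
    }

private:
    std::map<std::string, Fn> table_;
};
```

The design choice that matters is that the wrapper captures the old implementation before the table entry is overwritten; replacing without that capture would lose the original function entirely.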
Basic DynInst Operations
§ Process control:
• Attach/create process
• Monitor process status changes
• Callbacks for fork/exec/exit
§ Inferior (application process) operations:
• Malloc/free
– Allocate heap space in the application process.
• Inferior RPC
– Asynchronously execute a function in the application.
• Load module
– Cause a new .so/.dll to be loaded into the application.

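Process creation with an exit callback can be approximated with plain POSIX calls. The sketch below assumes a POSIX system; `runAndWatch` is a hypothetical helper that forks a child, waits for it with `waitpid`, and reports its exit status through a callback. Attaching to an existing process, stopping it, touching its memory, and trapping fork/exec events would need `ptrace` or `/proc`, which this sketch does not attempt.

```cpp
#include <cassert>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Minimal POSIX sketch (hypothetical helper): create a process, wait for a
// status change, and invoke an "exit" callback on normal termination.
// Returns 0 if the callback ran, -1 on failure or abnormal termination.
int runAndWatch(const std::function<int()>& childBody,
                const std::function<void(pid_t, int)>& onExit) {
    pid_t pid = fork();
    if (pid < 0) return -1;               // fork failed
    if (pid == 0) _exit(childBody());     // child: run body, exit with its status

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    if (WIFEXITED(status)) {
        onExit(pid, WEXITSTATUS(status)); // the exit callback
        return 0;
    }
    return -1;                            // killed by a signal: not reported here
}
```

Only the low 8 bits of the child's return value survive as its exit status, so the callback sees values in the range 0 to 255.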
Basic DynInst Operations
§ Building AST code sequences:
• Control structures: if and goto
• Arithmetic and Boolean expressions
• Get PID/TID operations
• Read/write registers and global variables
• Read/write parameters and return value
• Function call

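To show what these AST building blocks mean, the hypothetical sketch below evaluates a small snippet tree (constants, global variables, arithmetic and comparison, `if`, and a function call) against a simulated process state. Dyninst compiles snippets to machine code rather than interpreting them; the classes here are illustrations only, and the function call is merely recorded, not executed.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simulated state a snippet can read and write (hypothetical): named
// global variables, plus a log of the functions the snippet called.
struct State {
    std::map<std::string, long> globals;
    std::vector<std::string> calls;
};

struct Expr {
    virtual ~Expr() = default;
    virtual long eval(State& s) const = 0;
};
using P = std::unique_ptr<Expr>;

struct Const : Expr {
    long v;
    explicit Const(long value) : v(value) {}
    long eval(State&) const override { return v; }
};

struct Global : Expr {  // read a global variable
    std::string name;
    explicit Global(std::string n) : name(std::move(n)) {}
    long eval(State& s) const override { return s.globals.at(name); }
};

struct Binary : Expr {  // '+', '-', '<', '&' (logical AND, short-circuit)
    char op;
    P l, r;
    Binary(char o, P a, P b) : op(o), l(std::move(a)), r(std::move(b)) {}
    long eval(State& s) const override {
        long a = l->eval(s);
        switch (op) {
        case '+': return a + r->eval(s);
        case '-': return a - r->eval(s);
        case '<': return a < r->eval(s);
        case '&': return a && r->eval(s);
        }
        throw std::invalid_argument("unknown operator");
    }
};

struct Assign : Expr {  // write a global variable
    std::string name;
    P value;
    Assign(std::string n, P v) : name(std::move(n)), value(std::move(v)) {}
    long eval(State& s) const override {
        long v = value->eval(s);  // evaluate the right-hand side first
        s.globals[name] = v;
        return v;
    }
};

struct If : Expr {  // the else branch may be null
    P cond, thenE, elseE;
    If(P c, P t, P e = nullptr) : cond(std::move(c)), thenE(std::move(t)), elseE(std::move(e)) {}
    long eval(State& s) const override {
        if (cond->eval(s)) return thenE->eval(s);
        return elseE ? elseE->eval(s) : 0;
    }
};

struct Call : Expr {  // a call is only recorded, never executed
    std::string fn;
    explicit Call(std::string f) : fn(std::move(f)) {}
    long eval(State& s) const override { s.calls.push_back(fn); return 0; }
};
```

For instance, a tree for `if (ctr < 3) ctr = ctr + 1; else reportLimit();` (with `reportLimit` a made-up function name) caps `ctr` at 3 and records one `reportLimit` call for each later evaluation.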
Dyninst Automated Testing
§ A test suite of almost 100 operation-specific tests.
§ Runs each night against the nightly build, on each platform.
§ Variations for different compilers, languages (C, C++, Fortran), stripped vs. non-stripped code, etc.
§ Results reported on the web (reachable from the paradyn.org or dyninst.org home pages): http://www.paradyn.org/testresults/dyntable.html

BinInst Design Goals
§ Tool-kit component architecture for binary analysis and editing
§ Open source
§ Open data structure definitions
§ Machine-independent abstract interfaces
§ Batch-enabled analyses
§ Static and dynamic code patching
§ All major analysis products are exportable
§ Enhanced testability and accompanying test suites

Static Editing Scenario (Binary Rewriting)
[Figure: components involved: Binary Code, Binary Decode and Parsing, Symbol Table Dump, Raw Disassembly, Call Graph, Intra-Proc CFG, Idiom Signatures, Code Queries and Instrumentation Requests, AST, Instr Control, Code Gen]

Interactive Editing Scenario (Static or Dynamic)
[Figure: components involved: Binary Code, Binary Decode and Parsing, Symbol Table Dump, Raw Disassembly, Call Graph, Intra-Proc CFG, Idiom Signatures, Instr Control, Code Gen]

Dynamic Editing Scenario (Dynamic Instrumentation)
[Figure: components involved: User Process, Process Control, Stack Walker, Binary Code, Binary Decode and Parsing, Symbol Table Dump, Raw Disassembly, Call Graph, Intra-Proc CFG, Idiom Signatures, Code Queries and Instrumentation Requests, AST, Instr Control, Code Gen]

Analysis Scenario
[Figure: components involved: Binary Code, Binary Decode and Parsing, Symbol Table Dump, Raw Disassembly, Call Graph, Intra-Proc CFG, Idiom Signatures; tools: Buffer Overrun, VSA, Code Surfer, Other Tool; Connector 2]

[Figure: component architecture: Binary Code; Symbol Table Parser (PE, ELF, COFF); Symbol Table Dump; Instruction Decoder (IA32, AMD64, Power); Raw Disassembly; Code Parser; Call Graph; Intra Proc CFG; Idiom Detector; Idiom Signatures; Code Queries and Instrumentation Requests; AST; Code Gen; Instr Control; Process Control; Stack Walker]

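As an illustration of how the Symbol Table Parser might choose among its format-specific back ends, this C++ sketch inspects a file's leading bytes. It checks only two well-known signatures: the ELF magic `0x7F 'E' 'L' 'F'`, and the PE layout of an `MZ` DOS header whose 32-bit little-endian `e_lfanew` field at offset `0x3C` points to the signature `PE\0\0`. COFF variants are not handled, and `detectFormat` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Format { ELF, PE, Unknown };

// Hypothetical format-detection step run before handing a file to a
// format-specific symbol table reader. Recognizes ELF and PE only.
Format detectFormat(const std::vector<std::uint8_t>& bytes) {
    // ELF: the file begins with the four-byte magic 0x7F 'E' 'L' 'F'.
    if (bytes.size() >= 4 && bytes[0] == 0x7F && bytes[1] == 'E' &&
        bytes[2] == 'L' && bytes[3] == 'F')
        return Format::ELF;

    // PE: a DOS header starting with "MZ"; the little-endian 32-bit field
    // at offset 0x3C (e_lfanew) gives the offset of the "PE\0\0" signature.
    if (bytes.size() >= 0x40 && bytes[0] == 'M' && bytes[1] == 'Z') {
        std::uint32_t off = 0;
        for (int i = 3; i >= 0; --i)
            off = (off << 8) | bytes[0x3C + i];
        if (off <= bytes.size() - 4 && bytes[off] == 'P' && bytes[off + 1] == 'E' &&
            bytes[off + 2] == 0 && bytes[off + 3] == 0)
            return Format::PE;
    }
    return Format::Unknown;
}
```

Keeping detection separate from parsing lets each back end (PE, ELF, COFF) stay self-contained, which matches the slide's per-format boxes under a single Symbol Table Parser.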