Correct Relocation: Do You Trust a Mutated Binary?
Drew Bernat
bernat@cs.wisc.edu
March 17, 2005

Binary Manipulation
• We want to:
  – Insert new code
  – Modify or delete code
  – These operations move program code
• Binaries are brittle
  – Code movement may affect program semantics
• We want to move code without breaking the program

Relocation
• Relocation moves code while maintaining its original execution semantics
  – May radically transform the code
• Does not rely on external information
• Binary tools use relocation extensively
  – Execute original + relocated code (Dyninst)
  – Always execute relocated code (PIN, Valgrind, DynamoRIO, VMware, DELI)
Relocation is critical for binary manipulation

Relocation Examples
Original code:
foo:
  0x1000: push ebp
  0x1001: mov esp, ebp
  0x1003: mov 0x8(ebp), eax
  ...
  0x1006: cmp 0x5, eax
  0x1009: ja 0x30
  ...
  0x100b: call ebx_thunk
  0x1011: add ebx, eax
  ...
ebx_thunk:
  0x2000: mov (esp), ebx
  0x2003: ret
Relocated code:
  0x3000: mov 0x8(ebp), eax
  ...
  0x4000: ja -0x2ff7
  ...
  0x5000: push $0x1011
  0x5005: jmp ebx_thunk’
  ...
ebx_thunk’:
  0x6000: mov (esp), ebx
  0x6003: call map_return
  0x6008: ret
Alternative relocation of the call:
  0x5000: mov 0x1011, ebx
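
The call example can be illustrated with a small simulation. This is only a sketch: the stack is modeled as a Python list, the function names are hypothetical, and the 5-byte call length is an assumption. It shows why a moved `call ebx_thunk` hands the thunk the wrong PC value, and why `push $0x1011; jmp ebx_thunk` restores the original one.

```python
def ebx_thunk(stack):
    """Model `mov (esp), ebx ; ret`: return (ebx, address that ret jumps to)."""
    ebx = stack[-1]          # mov (esp), ebx reads the pushed return address
    return ebx, stack.pop()  # ret pops that same address


def naive_relocated_call(new_call_addr, call_len=5):
    """Model moving the call unchanged: it pushes the address after its NEW location."""
    stack = [new_call_addr + call_len]
    return ebx_thunk(stack)


def emulated_call(orig_return_addr):
    """Model `push $orig_return_addr ; jmp ebx_thunk`."""
    stack = [orig_return_addr]
    return ebx_thunk(stack)


naive = naive_relocated_call(0x5000)   # ebx holds an address in the relocated code
emulated = emulated_call(0x1011)       # ebx holds the original value 0x1011
```

In the emulated case the thunk's `ret` also returns to 0x1011, which is in the original code. That is presumably why the slide's ebx_thunk’ adds a call to map_return before `ret`; the exact behavior of map_return is not given here.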

Current Approaches
• Strict Relocation
  – Maintains the semantics of each individual instruction
  – Safe in nearly all cases
  – Can impose severe slowdown
  – Trades speed for strictness
• Ad-Hoc Relocation
  – Emits more efficient code by partially emulating the original code
  – Pattern matching may fail and generate incorrect code
  – Trades strictness for speed

Benefits and Drawbacks
[Table comparing Strict Relocation, Ad-Hoc Relocation, and Partial Relocation on two criteria, Safe and Fast. Surviving cell values: Good, Poor, Good.]

Our Approach
• Develop a formal model of relocation
  – Reason about the relationship of the moved code to:
    • Its new location
    • Surrounding code
  – Based on semantics of code instead of pattern matching against syntax
• Strictness of emulation based on demands of the moved code (and surrounding code)

Effects of Code Movement
• Moving certain instructions will change their semantics
  – Relative branches, loads, stores
  – We call these PC referencing instructions
• Patching tools overwrite program code
  – Other code that references this code will be affected
• Relocation may affect non-relocated code!
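
A worked example of why a relative branch changes meaning when it moves. This sketch assumes the x86 convention that the displacement is added to the address of the next instruction. The function name and the instruction lengths (2 bytes for a short `ja`, 6 bytes for the near form) are illustrative assumptions, not values from the slides.

```python
def retarget_relative_branch(old_addr, old_len, disp, new_addr, new_len):
    """Return the displacement that keeps a moved relative branch on its original target."""
    target = old_addr + old_len + disp      # absolute target before the move
    return target - (new_addr + new_len)    # displacement needed at the new location


# A `ja 0x30` at 0x1006 moved to 0x4000; the new displacement no longer fits
# in 8 bits, so the relocated branch is assumed to use the 6-byte near form.
new_disp = retarget_relative_branch(0x1006, 2, 0x30, 0x4000, 6)
```

Copying the original displacement unchanged would send the branch to 0x4000 + 2 + 0x30 instead of 0x1038, which is the semantic change the slide describes.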

Effects of Moving Code
foo:
  0x1000: push ebp
  0x1001: mov esp, ebp
  0x1003: mov 0x8(ebp), eax
  0x1004: cmp 0x5, eax
  0x1006: ja 0x30
  0x1008: call ebx_thunk
  0x100d: add ebx, eax
  0x100f: mov (eax), edx
  0x1011: jmp edx
• No change
• Relative branch
• Relative load
• Branch to result of relative load

Effects of Overwriting Code
main:
  ...
  0x0050: call foo
  ...
foo (original):
  0x1002: push ebp
  0x1003: mov esp, ebp
  ...
foo (after patching):
  0x1002: jmp 0xf000
  ...
foo (relocated):
  0xf000: push ebp
  0xf001: mov esp, ebp
  ...
bar:
  ...
  0x2010: mov (0x1000), eax
  0x2015: add (0x1004), eax

Approach
• Model
  – Relocated code, surrounding code
  – Properties of code affected by relocation
• Analysis
  – Deriving these properties from the binary
• Transformations
  – How do we modify code to run correctly and efficiently?

Model
• Define properties of code that relocation affects
  – PC referencing
  – Dependence on moved or overwritten code
• A single instruction may have multiple properties
• These combinations of properties determine how to relocate the instruction
  – Or compensate non-relocated instructions
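
One way to represent "a single instruction may have multiple properties" is a set of combinable flags. This is a minimal sketch, not the talk's implementation. The member names mirror the REF, REF*, CF, and DF labels used elsewhere in the deck; the `Prop` class and the helper function are hypothetical.

```python
from enum import Flag, auto


class Prop(Flag):
    NONE = 0
    REFC = auto()        # direct PC reference, control
    REFD = auto()        # direct PC reference, data
    REFP = auto()        # direct PC reference, predicate
    REF_STAR_C = auto()  # indirect PC reference, control
    REF_STAR_D = auto()  # indirect PC reference, data
    REF_STAR_P = auto()  # indirect PC reference, predicate
    CF = auto()          # has a control-flow successor in R
    DF = auto()          # loads from or stores to R


PC_REFERENCING = (Prop.REFC | Prop.REFD | Prop.REFP |
                  Prop.REF_STAR_C | Prop.REF_STAR_D | Prop.REF_STAR_P)


def has_pc_reference(props):
    """True if any direct or indirect PC-referencing property is set."""
    return bool(props & PC_REFERENCING)


# Illustrative combination: a relative branch that also targets code in R.
relative_branch = Prop.CF | Prop.REFC
```

Keeping the properties as independent bits means a transformation can be selected by the whole combination rather than by instruction syntax.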

Program Regions
• R = {i_i, …, i_j}
  – Instructions to relocate
• A = {i_k, …, i_l}
  – Analyzed region
  – Surrounds R
• U = {i_0, …, i_n} - R - A
  – Unanalyzed region
  – Models limits of analysis
• R’ = {i_p, …, i_q}
  – Relocated instructions
[Diagram showing regions R, A, U, and R’]
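
The definition of U can be written directly as set arithmetic. This sketch assumes instructions are identified by address; the function name and the validity check are illustrative additions.

```python
def unanalyzed_region(program, relocate, analyzed):
    """Return U = program - R - A, with every region given as instruction addresses."""
    program, relocate, analyzed = set(program), set(relocate), set(analyzed)
    if not relocate <= program or not analyzed <= program:
        raise ValueError("R and A must be drawn from the program's instructions")
    return program - relocate - analyzed


# Hypothetical choice of regions over a handful of addresses.
program = {0x0050, 0x1004, 0x1006, 0x2010, 0x2015}
U = unanalyzed_region(program, relocate={0x1006}, analyzed={0x0050, 0x1004})
```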

Properties of Moved Code
foo:
  0x1000: push ebp
  0x1001: mov esp, ebp
  0x1003: mov 0x8(ebp), eax
  0x1004: cmp 0x5, eax
  0x1006: ja 0x30
  0x1008: call ebx_thunk
  0x100d: add ebx, eax
  0x100f: mov (eax), edx
  0x1011: jmp edx
• Direct (REF)
  – Control (REFC)
  – Data (REFD)
  – Predicate (REFP)
• Indirect (REF*)
  – Control (REF*C)
  – Data (REF*D)
  – Predicate (REF*P)

Predicate PC References
• Safety check in library load
  – Address of caller passed in
  – Checked against legal callers
• Predicate expressions

  bool dl_open_check(char *name, void *calladdr)
  {
    // Check if the caller is
    // from libdl or libc
    bool safe_open = false;
    if (IN(libc_obj, calladdr) ||
        IN(libdl_obj, calladdr))
      safe_open = true;
    if (!safe_open)
      return false;
    // Perform further checks
    ...
  }

Properties of Overwritten Code
main: (A)
  ...
  0x0050: call foo
  ...
foo: (R)
  ...
  0x1004: cmp 0x5, eax
  0x1006: ja 0x30
  ...
bar: (A)
  ...
  0x2010: mov (0x1000), eax
  0x2015: add (0x1004), eax
• Control (CF)
  – Instructions with successors in R
  – {0x0050, 0x1004}_CF
• Data (DF)
  – Loads from R
  – Stores to R
  – {0x2010, 0x2015}_DF
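
The CF and DF sets can be computed from per-instruction successor and memory-reference information. The sketch below assumes R is a contiguous address range and that successors and absolute memory references are already known. The `Insn` type, its field names, the extent chosen for foo, and the successor addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    addr: int
    successors: frozenset = frozenset()  # addresses control may flow to next
    mem_refs: frozenset = frozenset()    # absolute addresses loaded from or stored to


def overwritten_code_properties(insns, r_lo, r_hi):
    """Return (CF, DF) address sets for R modeled as the range [r_lo, r_hi)."""
    def in_r(addr):
        return r_lo <= addr < r_hi

    cf = {i.addr for i in insns if any(in_r(s) for s in i.successors)}
    df = {i.addr for i in insns if any(in_r(m) for m in i.mem_refs)}
    return cf, df


insns = [
    Insn(0x0050, successors=frozenset({0x1000, 0x0055})),  # call foo
    Insn(0x1004, successors=frozenset({0x1006})),          # cmp, falls through
    Insn(0x2010, mem_refs=frozenset({0x1000})),            # mov (0x1000), eax
    Insn(0x2015, mem_refs=frozenset({0x1004})),            # add (0x1004), eax
    Insn(0x2020, successors=frozenset({0x2025})),          # unrelated instruction
]
cf, df = overwritten_code_properties(insns, 0x1000, 0x1013)
```

With foo assumed to occupy 0x1000 up to 0x1013, this reproduces the two sets shown on the slide. A real analysis would have to be conservative wherever successors or addresses are unknown.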

Properties Summary
• Moved code: REF (C, P, D) and REF* (C, P, D)
• Overwritten code: CF and DF

Analysis Overview
1. Choose R and A
  – R: instruction, basic block, function, …
  – A: how much do we analyze?
2. Identify sources of REF and REF* in R
  – Follow data dependence chains into A and U
3. Determine {...}_CF and {...}_DF
  – Begin with interprocedural CFG and points-to analysis
  – Be conservative and assume incomplete information

REF/REF* Analysis
• Create the Program Dependence Graph
  – Covering R + A
• Identify source instructions
• Follow data dependence edges
  – Into A (or U)
foo:
  ...
  0x1004: cmp 0x5, eax
  0x1006: ja 0x30
  0x1008: call ebx_thunk
  0x100d: add ebx, eax
  0x100f: mov (eax), edx
  0x1011: jmp edx          (REF*C)
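
Following data dependence edges from the source instructions amounts to a graph reachability search. This is a sketch under simplifying assumptions: the dependence graph is given as a plain dictionary from producer address to consumer addresses, and the result is not split into control, data, and predicate kinds. The function name and the edge list for foo are illustrative.

```python
from collections import deque


def propagate_pc_dependence(sources, dep_edges):
    """Return (REF, REF*) address sets.

    sources:   instructions that reference the PC directly (REF)
    dep_edges: producer address -> iterable of consumer addresses
    """
    direct = set(sources)
    indirect = set()
    work = deque(direct)
    while work:
        producer = work.popleft()
        for consumer in dep_edges.get(producer, ()):
            if consumer not in indirect:   # visit each consumer once
                indirect.add(consumer)
                work.append(consumer)
    return direct, indirect


# Assumed edges for foo: call -> add -> mov -> jmp.
edges = {0x1008: [0x100d], 0x100d: [0x100f], 0x100f: [0x1011]}
ref, ref_star = propagate_pc_dependence({0x1006, 0x1008}, edges)
```

An instruction can land in both sets if it references the PC directly and also consumes a PC-derived value, which matches the model's allowance for multiple properties per instruction.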

Transformation Goals
• We want to emulate the smallest set of original code semantics
• Transformations must maintain the properties determined by analysis
  – No other properties need to be preserved
• Our approach: define transformations for each combination of properties

Granularity of Relocation
• Current methods relocate by instruction
  – Maintain equivalence at the instruction boundary
• “Unobserved” results
• Relocate instructions as a group
  – Maintain boundary semantics of the code
  – Reduce complexity and improve efficiency

Partial Relocation Example
[Animated diagram; the frames overlap in the extracted text. Recoverable versions of the relocated code follow.]
Version 1:
  0x5000: push $0x1011
  0x5005: jmp ebx_thunk’
  0x500a: add ebx, eax
  ...
ebx_thunk’:
  0x6000: mov (esp), ebx
  0x6003: call map_return
  0x6008: ret
Version 2:
  0x5000: push $0x1011
  0x5005: mov (esp), ebx
  0x5008: pop ...
  0x5009: add ebx, eax
  ...
Version 3:
  0x5000: mov 0x1011, ebx
  0x5005: add ebx, eax
  ...
Labels on the diagram: REFD, REFC

Research Plan
• This work is preliminary
  – Properties are defined
  – Analysis requirements are defined
• Still a lot to do
  – Determine transformations
  – Implementation in Dyninst
  – Performance analysis

Questions?

Relocating a Jump Table
[Diagram with three overlapping listings; reconstructed as far as the extracted text allows.]
foo1:
  0x1008: call ebx_thunk
  0x100d: add ebx, eax
  0x100f: mov (eax, 4), ebx
  0x1011: jmp ebx
  <jump table data>
  ...
  0x1040: <case 1>
  ...
  0x1060: <case 2>
  ...
  0x1080: <case 3>
  ...
  0x10a0: <case 4>
  ...
foo1 after patching:
  0x1008: jmp <0xf008>
  ...
  0x1040: jmp <0xf040>
  0x1060: jmp <0xf060>
  0x1080: jmp <0xf080>
  0x10a0: jmp <0xf0a0>
foo2:
  0xf008: mov 0x1008, ebx
  0xf00e: add ebx, eax
  0xf010: mov (eax, 4), ebx
  0xf012: jmp ebx
  ...
  0xf040: <case 1>
  0xf060: <case 2>
  0xf080: <case 3>
  0xf0a0: <case 4>
foo3:
  0xf008: call ebx_thunk
  0xf00d: add ebx, eax
  0xf00f: mov (eax, 4), ebx
  0xf011: jmp ebx
  0xf012: <relocated jump table data>
  0xf040: <reloc case 1>
  0xf060: <reloc case 2>
  0xf080: <reloc case 3>
  0xf0a0: <reloc case 4>
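
Producing relocated jump table data can be sketched as mapping each entry through an old-to-new address map. This assumes the table stores absolute case addresses, which the slide does not state; the function name and the map are illustrative.

```python
def relocate_jump_table(entries, addr_map):
    """Map each table entry through addr_map; targets outside the map are kept as-is."""
    return [addr_map.get(target, target) for target in entries]


# Case addresses from the slide: 0x1040..0x10a0 move to 0xf040..0xf0a0.
addr_map = {0x1040: 0xF040, 0x1060: 0xF060, 0x1080: 0xF080, 0x10A0: 0xF0A0}
original_table = [0x1040, 0x1060, 0x1080, 0x10A0]
relocated_table = relocate_jump_table(original_table, addr_map)
```

The alternative shown for foo2 keeps the original table and instead patches each original case address with a jump to its relocated copy.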

Complex Instructions
• Instructions may have multiple properties
  – Example: a relative branch in R may be both CF and REFC
• Some overlap is due to implicit control flow
  – Instructions in R may be tagged as REFC due to fallthrough to next instruction
• We can model instructions as combinations of independent operations if necessary
  – Separate out the “next PC” calculation