unstrip: Restoring Function Information to Stripped Binaries Using Dyninst
Emily Jacobson and Nathan Rosenblum
Paradyn Project
Paradyn / Dyninst Week
Madison, Wisconsin, May 2-4, 2011
Binary Tools Need Symbol Tables
o Debugging tools
  o GDB, IDA Pro, …
o Instrumentation tools
  o PIN, Dyninst, …
o Static analysis tools
  o CodeSurfer/x86, …
o Security analysis tools
  o IDA Pro, …
unstrip = stripped parsing + binary rewriting

Stripped binary:
    push %ebp
    mov %esp, %ebp
    sub $0x8, %esp
    mov 0x8(%ebp), %eax
    add $0xfffffff8, %esp
    push %eax
    call 80c3bd0
    push %eax
    call 8057220
    mov %ebp, %esp
    pop %ebp

After unstrip:
<targ8056f50>:
    push %ebp
    mov %esp, %ebp
    sub $0x8, %esp
    mov 0x8(%ebp), %eax
    add $0xfffffff8, %esp
    push %eax
    call <targ80c3bd0>
    push %eax
    call <targ8057220>
    mov %ebp, %esp
    pop %ebp
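The naming step in the listing above can be illustrated with a toy, text-level sketch: every raw call target found by parsing gets a synthetic `targ<address>` symbol, and the call is rewritten to use it. In unstrip this happens on machine code through Dyninst's parsing and binary rewriting; the helpers `targ_name` and `label_calls` here are hypothetical and only mimic the visible effect.

```python
import re

# Matches a direct call to a raw hexadecimal address, e.g. "call 80c3bd0".
CALL_RE = re.compile(r"^(\s*call\s+)([0-9a-fA-F]+)\s*$")


def targ_name(addr: int) -> str:
    """Synthetic name for a function discovered by parsing (no symbol)."""
    return f"targ{addr:x}"


def label_calls(listing: list[str]) -> tuple[list[str], dict[int, str]]:
    """Rewrite raw call targets as <targADDR> and collect the new symbols.

    Returns the rewritten listing and a symbol table mapping each
    discovered function entry address to its synthetic name.
    """
    symbols: dict[int, str] = {}
    out: list[str] = []
    for line in listing:
        m = CALL_RE.match(line)
        if m:
            addr = int(m.group(2), 16)
            symbols[addr] = targ_name(addr)
            line = f"{m.group(1)}<{symbols[addr]}>"
        out.append(line)
    return out, symbols


listing = ["push %ebp", "call 80c3bd0", "push %eax", "call 8057220", "pop %ebp"]
rewritten, symtab = label_calls(listing)
print(rewritten[1])  # call <targ80c3bd0>
```

Indirect calls such as `call *0x...` are deliberately left alone, since their targets are not known from the instruction text.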
New Semantic Information
o Important semantic information: the program's interaction with the operating system (system calls)
o These calls are encapsulated in wrapper functions
Library fingerprinting: identify functions based on patterns learned from exemplar libraries
unstrip = library fingerprinting + stripped parsing + binary rewriting

Applied to the same stripped code as before, unstrip now also names the call targets:
<targ8056f50>:
    push %ebp
    mov %esp, %ebp
    sub $0x8, %esp
    mov 0x8(%ebp), %eax
    add $0xfffffff8, %esp
    push %eax
    call <getpid>           # previously <targ80c3bd0>
    push %eax
    call <kill>             # previously <targ8057220>
    mov %ebp, %esp
    pop %ebp
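A wrapper function sets up the system call arguments, invokes the system call, and then checks for an error before returning:
<accept>:
    mov %ebx, %edx            # set up system call arguments
    mov $0x66, %eax
    mov $0x5, %ebx
    lea 0x4(%esp), %ecx
    int $0x80                 # invoke a system call
    mov %edx, %ebx            # error check and return
    cmp $0xffffff83, %eax
    jae 8048300
    ret
    mov %esi, %esi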
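The error check is an unsigned comparison: `jae` branches when %eax, read as unsigned, is at least 0xffffff83, which as a signed 32-bit value is -125. Return values from -125 through -1 are therefore treated as negated error codes. A small sketch of that arithmetic (the function names are illustrative, not taken from glibc):

```python
ERROR_THRESHOLD = 0xFFFFFF83  # constant compared against %eax in the wrapper


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def is_syscall_error(eax: int) -> bool:
    """Mirror of `cmp $0xffffff83, %eax; jae ...`: unsigned eax >= threshold."""
    return (eax & 0xFFFFFFFF) >= ERROR_THRESHOLD


print(to_signed32(ERROR_THRESHOLD))  # -125
print(is_syscall_error(5))           # False: e.g. a valid file descriptor
print(is_syscall_error(-11))         # True: -EAGAIN arrives as 0xfffffff5
```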
The same function can be realized in a variety of ways in the binary.

glibc 2.2.4 on RHEL:
<accept>:
    mov %ebx, %edx
    mov $0x66, %eax
    mov $0x5, %ebx
    lea 0x4(%esp), %ecx
    int $0x80
    mov %edx, %ebx
    cmp $0xffffff83, %eax
    jae 8048300
    ret
    mov %esi, %esi

glibc 2.5 on RHEL with GCC 4.1.2:
<accept>:
    cmpl $0x0, %gs:0xc
    jne 80f669c
    mov %ebx, %edx
    mov $0x66, %eax
    mov $0x5, %ebx
    lea 0x4(%esp), %ecx
    int $0x80
    mov %edx, %ebx
    cmp $0xffffff83, %eax
    jae 8048460
    ret
    push %esi
    call __libc_enable_asynccancel
    mov %eax, %esi
    mov %ebx, %edx
    mov $0x66, %eax
    mov $0x5, %ebx
    lea 0x8(%esp), %ecx
    int $0x80
    mov %edx, %ebx
    xchg %eax, %esi
    call __libc_disable_asynccancel
    mov %esi, %eax
    pop %esi
    cmp $0xffffff83, %eax
    jae syscall_error
    ret

In a further variant, the int $0x80 instructions are replaced by indirect calls (call *0x814e93c, call *0x8181578).
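To see why exact instruction patterns are brittle, here is a toy comparison of the accept() fast paths listed above, transcribed by hand as (mnemonic, operands) pairs. `contains` is a hypothetical naive matcher, not a description of how IDA Pro FLIRT works internally, and the third list simply substitutes an indirect call for `int $0x80`.

```python
# Instructions as (mnemonic, operands) pairs.
Insn = tuple[str, str]

GLIBC_224: list[Insn] = [  # fast path only; trailing padding omitted
    ("mov", "%ebx, %edx"), ("mov", "$0x66, %eax"), ("mov", "$0x5, %ebx"),
    ("lea", "0x4(%esp), %ecx"), ("int", "$0x80"), ("mov", "%edx, %ebx"),
    ("cmp", "$0xffffff83, %eax"), ("jae", "8048300"), ("ret", ""),
]
GLIBC_25: list[Insn] = [  # fast path only
    ("cmpl", "$0x0, %gs:0xc"), ("jne", "80f669c"),
    ("mov", "%ebx, %edx"), ("mov", "$0x66, %eax"), ("mov", "$0x5, %ebx"),
    ("lea", "0x4(%esp), %ecx"), ("int", "$0x80"), ("mov", "%edx, %ebx"),
    ("cmp", "$0xffffff83, %eax"), ("jae", "8048460"), ("ret", ""),
]
# The same fast path, entering the kernel through an indirect call instead.
GLIBC_25_CALL: list[Insn] = [
    ("call", "*0x814e93c") if insn == ("int", "$0x80") else insn for insn in GLIBC_25
]


def contains(pattern: list[Insn], function: list[Insn], mnemonics_only: bool = False) -> bool:
    """Naive fingerprint: does `pattern` occur as a contiguous run in `function`?"""
    key = (lambda insn: insn[0]) if mnemonics_only else (lambda insn: insn)
    pat = [key(i) for i in pattern]
    fun = [key(i) for i in function]
    n = len(pat)
    return any(fun[k:k + n] == pat for k in range(len(fun) - n + 1))


print(contains(GLIBC_224, GLIBC_25))                            # False: jae target differs
print(contains(GLIBC_224, GLIBC_25, mnemonics_only=True))       # True
print(contains(GLIBC_224, GLIBC_25_CALL, mnemonics_only=True))  # False: int became call
```

Loosening the pattern rescues one variant but not the next, while the constants loaded into %eax and %ebx before entering the kernel stay the same in all three.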
Semantic Descriptors
o Instead, we'll take a semantic approach
o Record information that is likely to be invariant across multiple versions of the function

<accept>:
    mov %ebx, %edx
    mov $0x66, %eax
    mov $0x5, %ebx
    lea 0x4(%esp), %ecx
    int $0x80
    mov %edx, %ebx
    cmp $0xffffff83, %eax
    jae 8048300
    ret
    mov %esi, %esi

Semantic descriptor: {<socketcall, 5>}
(0x66 is the socketcall system call number on 32-bit x86 Linux; 5 in %ebx selects its accept operation.)
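One way to model a descriptor as a data structure: a set of `<syscall, constant args...>` tuples, hashable so it can key a database and unordered so the order of calls inside the wrapper does not matter. Treating it as a set (rather than a sequence or multiset) is an assumption of this sketch; the syscall table is a small subset of the i386 Linux numbers, and `syscall_entry` and `descriptor` are hypothetical names.

```python
# Subset of i386 Linux system call numbers that appear in these slides.
I386_SYSCALLS = {0x14: "getpid", 0x58: "reboot", 0x66: "socketcall"}


def syscall_entry(eax: int, *const_args: int) -> tuple:
    """One <syscall, constant args...> element of a descriptor."""
    return (I386_SYSCALLS.get(eax, eax), *const_args)


def descriptor(*entries: tuple) -> frozenset:
    """A semantic descriptor, modeled here as an unordered, hashable set."""
    return frozenset(entries)


accept_desc = descriptor(syscall_entry(0x66, 5))
getpid_desc = descriptor(syscall_entry(0x14))
print(accept_desc)  # frozenset({('socketcall', 5)})
```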
Building Semantic Descriptors
We parse an input binary, locate system calls and wrapper function calls, and employ dataflow analysis to recover the values held in the argument registers at each system call.

reboot:
    push %ebp
    mov %esp, %ebp
    sub $0x10, %esp
    push %edi
    push %ebx
    mov 0x8(%ebp), %edx
    mov $0xfee1dead, %edi
    mov $0x28121969, %ecx
    push %ebx
    mov %edi, %ebx
    mov $0x58, %eax
    int $0x80
    …

Dataflow at the system call:
    EAX = 0x58 (system call: reboot)
    EBX = 0xfee1dead (copied from %edi)
    ECX = 0x28121969

Semantic descriptor: {<reboot, 0xfee1dead, 0x28121969>}
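A minimal sketch of the dataflow step, assuming straight-line code in the simplified text syntax of these slides: forward constant propagation tracks which registers hold known constants up to `int $0x80`, then reads the system call number from %eax and arguments from %ebx, %ecx and %edx. Only those three argument registers are considered, trailing unknown arguments are dropped to match the slide's notation, and all names are hypothetical. Dyninst's own dataflow analysis works on decoded machine code and handles control flow, which this toy does not.

```python
import re

# Subset of i386 Linux system call numbers used in these examples.
I386_SYSCALLS = {0x14: "getpid", 0x58: "reboot", 0x66: "socketcall"}
ARG_REGS = ("ebx", "ecx", "edx")  # assumption: only the first three arguments

IMM_MOV = re.compile(r"mov \$(0x[0-9a-f]+), %(\w+)$")
REG_MOV = re.compile(r"mov %(\w+), %(\w+)$")


def written_register(insn: str):
    """Destination register of a two-operand AT&T instruction, if any."""
    mnemonic, _, operands = insn.partition(" ")
    if "," not in operands or mnemonic.startswith(("cmp", "test")):
        return None
    dst = operands.rsplit(",", 1)[1].strip()
    return dst[1:] if dst.startswith("%") else None


def syscall_tuple(block: list[str]):
    """Forward constant propagation over straight-line code up to `int $0x80`."""
    regs: dict[str, int] = {}  # register -> known constant value
    for insn in block:
        if m := IMM_MOV.match(insn):
            regs[m.group(2)] = int(m.group(1), 16)
        elif m := REG_MOV.match(insn):
            src, dst = m.groups()
            if src in regs:
                regs[dst] = regs[src]  # copy a known constant
            else:
                regs.pop(dst, None)  # destination becomes unknown
        elif insn == "int $0x80":
            if "eax" not in regs:
                return None  # system call number not statically known
            args = [regs.get(r) for r in ARG_REGS]
            while args and args[-1] is None:
                args.pop()  # drop trailing unknown arguments
            return (I386_SYSCALLS.get(regs["eax"], regs["eax"]), *args)
        elif (dst := written_register(insn)) is not None:
            regs.pop(dst, None)  # any other two-operand write: value unknown
    return None


reboot = [
    "push %ebp", "mov %esp, %ebp", "sub $0x10, %esp", "push %edi", "push %ebx",
    "mov 0x8(%ebp), %edx", "mov $0xfee1dead, %edi", "mov $0x28121969, %ecx",
    "push %ebx", "mov %edi, %ebx", "mov $0x58, %eax", "int $0x80",
]
print(syscall_tuple(reboot))
```

For the reboot listing this yields `('reboot', 0xfee1dead, 0x28121969)` (printed in decimal): %edx is loaded from memory, so it stays unknown and is dropped, while %ebx inherits the constant copied from %edi.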
Building a Descriptor Database
o Locate wrapper functions in a glibc reference library, e.g.
    <accept>:
        mov %ebx, %edx
        mov $0x66, %eax
        mov $0x5, %ebx
        lea 0x4(%esp), %ecx
        int $0x80
        …
o Build semantic descriptors and store them in the descriptor database:
    {<socketcall, 5>}: accept
    {<socketcall, 4>}: listen
    {<getpid>}: getpid
    …
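The database in the slide above maps descriptors to function names. A sketch of building it by inverting a name-to-descriptor map from one reference library; mapping each descriptor to a set of names (in case several wrappers share a descriptor) is an assumption, and `build_database` is a hypothetical helper.

```python
from collections import defaultdict


def build_database(reference: dict[str, frozenset]) -> dict[frozenset, set[str]]:
    """Invert name -> descriptor into descriptor -> set of function names."""
    db = defaultdict(set)
    for name, desc in reference.items():
        db[desc].add(name)
    return dict(db)


# Descriptors for three wrappers from one glibc reference library.
reference = {
    "accept": frozenset({("socketcall", 5)}),
    "listen": frozenset({("socketcall", 4)}),
    "getpid": frozenset({("getpid",)}),
}
db = build_database(reference)
print(db[frozenset({("socketcall", 5)})])  # {'accept'}
```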
Building a Descriptor Database
o Repeat for several glibc reference libraries: locate wrapper functions and build semantic descriptors in each one
o All descriptors feed a single descriptor database:
    {<socketcall, 5>}: accept
    {<socketcall, 4>}: listen
    {<getpid>}: getpid
    …
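Combining several reference libraries into one database can be sketched as a union of their descriptor-to-names maps. The per-library inputs below are hypothetical placeholders keyed by glibc version, and `merge_databases` is an illustrative helper; the input sets are not mutated.

```python
from collections import defaultdict


def merge_databases(per_library: dict[str, dict[frozenset, set[str]]]) -> dict[frozenset, set[str]]:
    """Union descriptor -> names maps built from several reference libraries."""
    merged = defaultdict(set)
    for db in per_library.values():
        for desc, names in db.items():
            merged[desc] |= names  # fresh set per descriptor; inputs untouched
    return dict(merged)


# Hypothetical per-library databases (real ones would hold many entries).
per_library = {
    "2.2.4": {frozenset({("socketcall", 5)}): {"accept"}},
    "2.5": {
        frozenset({("socketcall", 5)}): {"accept"},
        frozenset({("getpid",)}): {"getpid"},
    },
}
merged = merge_databases(per_library)
```

A descriptor that appears in several versions collapses into one entry, so adding libraries widens coverage without duplicating matches.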
Identifying Functions in a Stripped Binary
Input: stripped binary
For each wrapper function {
  1. Build the semantic descriptor.
  2. Search the descriptor database for a match (two stages).
  3. Add a label to the symbol table.
}
Output: unstripped binary
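The per-wrapper loop above can be sketched as follows. The slide does not spell out the two search stages, so this sketch collapses them into a single exact lookup and, as a further assumption, labels a wrapper only when exactly one name matches. `label_wrappers` is hypothetical; the addresses come from the earlier listing, and the `kill` descriptor is a placeholder since its arguments are not constants.

```python
def label_wrappers(wrappers: dict[int, frozenset],
                   database: dict[frozenset, set[str]]) -> dict[int, str]:
    """Return address -> name for every wrapper with exactly one matching name."""
    symtab: dict[int, str] = {}
    for addr, desc in wrappers.items():
        candidates = database.get(desc, set())  # step 2: search (exact match only)
        if len(candidates) == 1:
            symtab[addr] = next(iter(candidates))  # step 3: add label
    return symtab


database = {
    frozenset({("getpid",)}): {"getpid"},
    frozenset({("kill",)}): {"kill"},  # placeholder descriptor
}
wrappers = {  # step 1: descriptors already built for each wrapper
    0x80c3bd0: frozenset({("getpid",)}),
    0x8057220: frozenset({("kill",)}),
}
print(label_wrappers(wrappers, database))
```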
Evaluation
o To evaluate across three dimensions of variation, we constructed three data sets:
  o compiler version
  o library version
  o distribution vendor
o In each set, we compiled a test binary for each glibc instance, built a descriptor database, and applied both unstrip and IDA Pro FLIRT
o Our evaluation measure is accuracy
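The slides do not define accuracy precisely. Under one plausible reading, assumed here, it is the fraction of ground-truth functions (from the unstripped build) that receive the correct label, with unlabeled functions counted as wrong. A sketch with hypothetical inputs:

```python
def accuracy(predicted: dict[int, str], truth: dict[int, str]) -> float:
    """Fraction of ground-truth functions whose predicted label is correct."""
    if not truth:
        return 0.0
    correct = sum(1 for addr, name in truth.items() if predicted.get(addr) == name)
    return correct / len(truth)


truth = {1: "accept", 2: "listen", 3: "getpid", 4: "kill"}
predicted = {1: "accept", 2: "listen", 3: "reboot"}  # 4 left unlabeled
print(accuracy(predicted, truth))  # 0.5
```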
Evaluation Results: Compiler Version Study
[Figure: "GCC 3.4.4 Patterns Predicting Each Library": accuracy (0 to 1) of unstrip and IDA Pro on binaries built with GCC 3.4.4, 4.0.2, 4.1.2, and 4.2.1]
Evaluation Results: Library Version Study
[Figure: "glibc 2.2.4 Patterns Predicting Each Library": accuracy (0 to 1) of unstrip and IDA Pro for glibc 2.2.4, 2.3.2, 2.3.4, 2.5, and 2.11.1]
Evaluation Results: Distribution Study
[Figure: "Fedora Patterns Predicting Each Library": accuracy (0 to 1) of unstrip and IDA Pro for Fedora, Mandriva, openSUSE, and Ubuntu]
For full details, a tech report is available online.
unstrip is available at: http://www.paradyn.org/html/tools/unstrip.html
Come see the unstrip demo today at 2:00 or 2:30 (in 1260 WID/MIR).