Buffer Overflow Proofing of Code Binaries
By Ramya Reguramalingam, Graduate Student, Computer Science
Advisor: Dr. Gopal Gupta

Contents
• What is a buffer overflow?
• Buffer overflow: Security concern
• BinarySecure: An overview
• BinarySecure: Implementation
  • Metadata Phase
  • Mapping Phase
  • Modification Phase
• Advantages
• Disadvantages
• Results

Memory Organization
• A running program's memory is typically divided into four segments:
  • Stack: for function calls
  • Heap: for dynamic allocation
  • Code: for program code
  • Data: for static and global variables

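Program Execution Stack
• Sample Code
    void function(char *a, char *b, char *c) {
        char buffer1[8];
    }
    int main(void) {
        function("foo", "bar", "ren");
        return 0;
    }
• Stack at the start (figure): memory is drawn from address ff ff at the top down to 00 at the bottom, holding the Stack (ESP marks its top), Heap, Data, and Code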

Stack Organization: Before a Call
• Sample Code
    void function(char *a, char *b, char *c) {
        char buffer1[8];
    }
    int main(void) {
        function("foo", "bar", "ren");
        return 0;
    }
• Stack before a call (top of memory first):
    Stack
      Parameters
        Param 3 = "ren"
        Param 2 = "bar"
        Param 1 = "foo"      <- ESP
    Heap, Data & Code

Stack Organization: After a Call
• Sample Code
    void function(char *a, char *b, char *c) {
        char buffer1[8];
    }
    int main(void) {
        function("foo", "bar", "ren");
        return 0;
    }
• Stack after a function call (top of memory first):
    Stack
      Param 3 = "ren"
      Param 2 = "bar"
      Param 1 = "foo"
      Return address
      ebp                    <- EBP
      Local variables        <- ESP
      . . .
    Heap, Data & Code
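
The push order on this slide can be sketched with a small simulation. This is only an illustration, assuming the usual right-to-left parameter order for C calls on x86; `SimStack`, `push`, `simulate_call`, and `slot_is` are hypothetical names, and the slots hold labels rather than real machine words.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SLOTS 16

/* Simulated downward-growing stack: esp indexes the most recently pushed slot. */
typedef struct {
    const char *slot[STACK_SLOTS];
    size_t esp;                     /* STACK_SLOTS means "empty" */
} SimStack;

static void sim_init(SimStack *s) { s->esp = STACK_SLOTS; }

static void push(SimStack *s, const char *what) {
    assert(s->esp > 0);             /* no room left in the simulation */
    s->slot[--s->esp] = what;
}

/* Mimics what a call to function(a, b, c) leaves on the stack (assumed layout). */
static void simulate_call(SimStack *s, const char *a, const char *b, const char *c) {
    push(s, c);                     /* Param 3 is pushed first */
    push(s, b);                     /* Param 2 */
    push(s, a);                     /* Param 1 ends up just above the return address */
    push(s, "return address");      /* pushed by the call instruction */
    push(s, "saved ebp");           /* pushed at the start of the function */
    push(s, "buffer1[8]");          /* space reserved for local variables */
}

/* True if the slot `depth` entries above ESP holds the given label. */
static int slot_is(const SimStack *s, size_t depth, const char *what) {
    assert(s->esp + depth < STACK_SLOTS);
    return strcmp(s->slot[s->esp + depth], what) == 0;
}
```

After `simulate_call(&s, "foo", "bar", "ren")`, the local buffer sits at ESP and the return address sits two slots above it, which is what makes the overflow on the next slide possible.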

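Buffer Overflow
• Sample Code
    #include <string.h>
    void function(char *str) {
        char buffer1[8];
        strcpy(buffer1, str);      /* no bounds check: 255 bytes go into 8 */
    }
    int main(void) {
        char large_str[256];
        for (int i = 0; i < 255; i++)
            large_str[i] = 'A';
        large_str[255] = '\0';     /* terminate the string so strcpy stops */
        function(large_str);
    Label:
        return 0;
    }
• strcpy writes past the end of buffer1 and fills everything above it with 0x41 ('A')
• New return address = 0x41414141, instead of the address of Label:
• Stack showing buffer overflow (top of memory first; sizes in 4-byte words):
    large_str (size = 64)      41 41 41 ...
    Pointer (str)              41 41
    Return address             41 41
    ebp                        41 41
    buffer1 (size = 2)         41 41
  The overwritten slots above buffer1 now hold garbage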
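
The overwrite can be reproduced safely by modelling the frame as one byte array, so the copy never leaves memory the program owns. This is a sketch under assumptions: `SimFrame`, its 8 + 4 + 4 byte layout, and the starting return address `0x00401000` are illustrative and not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated frame, lowest address first:
   [ buffer1: 8 bytes | saved ebp: 4 bytes | return address: 4 bytes ] */
enum { BUF_SIZE = 8, EBP_OFF = 8, RET_OFF = 12, FRAME_SIZE = 16 };

typedef struct { uint8_t bytes[FRAME_SIZE]; } SimFrame;

static void frame_init(SimFrame *f, uint32_t ret_addr) {
    memset(f->bytes, 0, FRAME_SIZE);
    memcpy(f->bytes + RET_OFF, &ret_addr, sizeof ret_addr);
}

static uint32_t frame_return_address(const SimFrame *f) {
    uint32_t ret;
    memcpy(&ret, f->bytes + RET_OFF, sizeof ret);
    return ret;
}

/* Unchecked copy, like strcpy: it ignores BUF_SIZE and keeps writing toward
   higher addresses. The assert only keeps the simulation inside its array. */
static void unchecked_copy(SimFrame *f, const char *src) {
    size_t n = strlen(src);
    assert(n <= FRAME_SIZE);
    memcpy(f->bytes, src, n);
}
```

Copying four bytes leaves the simulated return address alone; copying sixteen 'A' characters replaces it with 0x41414141.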

Abusing the Buffer Overflow
• Step 1: Overwrite the return address with an address that points back into the buffer area
• Step 2: Insert the code you wish to execute in the buffer area
• Step 3: Pad the start of the inserted code with NOP instructions, so the new return address only has to land somewhere in the padding
• Step 4: Eliminate any null bytes in the inserted code, since strcpy stops copying at the first null byte
• Stack used to abuse the buffer overflow (figure labels, in slide order): Return address, ebp, NOP, mov eax, ebx, add eax, 1
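
Step 4 follows from how strcpy works, which a short check makes concrete. The helper below is a hypothetical illustration, not part of the presented tool: it reports how many bytes of a buffer strcpy would copy before stopping.

```c
#include <assert.h>
#include <string.h>

/* Returns how many bytes of a len-byte buffer strcpy would copy.
   strcpy stops at the first zero byte, so nothing after it reaches the target. */
static size_t bytes_copied_by_strcpy(const unsigned char *data, size_t len) {
    assert(data != NULL);
    const unsigned char *nul = memchr(data, 0, len);
    return nul ? (size_t)(nul - data) : len;
}
```

For a buffer holding `0x90 0x90 0x00 0x41`, only the first two bytes would be copied, so any byte after an embedded zero is lost.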

Buffer Overflow: Security Concern
• The chart shows the percentage of CERT advisories each year that list buffer overflows
• Some examples include Windows Server 2003, sendmail, and the Windows HTML conversion library
• Figure: Percentage of Buffer Overflows Per Year as listed by CERT [1]

Buffer Overflow Solutions
• RAD: stores the return address in the RAR (Return Address Repository) area
  • It is a gcc compiler patch; all code has to be recompiled
• Stackguard: inserts a 'canary' word to protect the return address
  • The 'canary' word can be compromised
• Splint: allows the user to write annotations in the code that define allocated and used sizes
  • The user is required to write annotations
• Richard Wagner's Prevention Method: a static analysis solution
  • Depends on code syntax and hence is not complete

BinarySecure: An Overview
• A buffer overflow attack works by overwriting the return address
• If return addresses are recorded in a separate area, out of the overflow's reach, they cannot be overwritten
• So the memory organization is modified to add a new return address stack, allocated in an area opposite to the direction in which the buffer is written
• When a function returns, it uses the return address from this new stack

BinarySecure: Return Address
• The return address is saved as part of the program execution stack
• The new stack is allocated at the bottom of the program stack
• This stack is uncompromised as memory writes occur in the opposite direction
• Figure label: Overflow Direction

BinarySecure: The Methodology
• Input: Portable Executable (PE) file
• Output: Modified PE file
• The PE file is analyzed to determine all function calls in the file
• Code is added to the start and end of each function
• The modification copies the return address to a new location and, on return, retrieves it from that location
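
The save-and-restore idea can be sketched as a simulation. This is not the code the tool inserts into the binary; `RetAddrStack`, `on_function_entry`, `on_function_exit`, and the fixed depth of 64 are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RA_STACK_DEPTH 64

/* Hypothetical separate return-address stack, kept away from local buffers. */
typedef struct {
    uint32_t addr[RA_STACK_DEPTH];
    int top;                        /* number of saved addresses */
} RetAddrStack;

static void ra_init(RetAddrStack *s) { s->top = 0; }

/* Function start: copy the return address from the frame to the separate stack. */
static void on_function_entry(RetAddrStack *s, uint32_t frame_ret_addr) {
    assert(s->top < RA_STACK_DEPTH);
    s->addr[s->top++] = frame_ret_addr;
}

/* Function end: discard whatever the frame holds now and restore the saved copy. */
static uint32_t on_function_exit(RetAddrStack *s, uint32_t *frame_ret_addr) {
    assert(s->top > 0);
    *frame_ret_addr = s->addr[--s->top];
    return *frame_ret_addr;
}
```

If an overflow changes the frame's return address between entry and exit, the exit step still hands back the address recorded at entry.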

BinarySecure

BinarySecure: Specifications
• These are some of the conditions that must hold:
  • Code must be re-entrant
  • Code should not modify the stack pointer
  • Processor: Intel 80386
  • Compiler: Dev-C++ 4.9.9.1
  • Platform: Windows

The Portable Executable Format
• Figure labels: Validation, DOS, Error Code, e_lfanew, NT Signature, Number of Sections
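
A minimal sketch of the validation path named in the figure, working on a PE image already loaded into memory. The field offsets (`e_lfanew` at 0x3C, the "PE\0\0" signature, `NumberOfSections` six bytes after the signature) follow the published PE layout; the function name and the -1 error code are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers, since PE fields are stored little-endian. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns NumberOfSections, or -1 if the image fails validation. */
static int pe_number_of_sections(const uint8_t *img, size_t len) {
    assert(img != NULL);
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z') return -1;   /* DOS header */
    uint32_t e_lfanew = rd32(img + 0x3C);                          /* offset of NT headers */
    if (e_lfanew > len || len - e_lfanew < 24) return -1;          /* signature + COFF header */
    const uint8_t *nt = img + e_lfanew;
    if (nt[0] != 'P' || nt[1] != 'E' || nt[2] != 0 || nt[3] != 0) return -1;
    return rd16(nt + 6);            /* NumberOfSections follows the 2-byte Machine field */
}
```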

BinarySecure: Metadata Phase
• The metadata phase is so called because it collects the information required for analysis
• This includes entry points, size of code, virtual address, and relative address
• The PE Explorer software is then used to obtain the disassembled form of the PE file
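
The fields listed above live in the PE optional header, and reading them might look like the sketch below. It assumes a 32-bit (PE32) image; `PeMetadata`, `pe_read_metadata`, and `pe_entry_va` are hypothetical names, and the slides do not show how the tool itself reads these fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t size_of_code;       /* SizeOfCode */
    uint32_t entry_point_rva;    /* AddressOfEntryPoint, relative to the image base */
    uint32_t base_of_code_rva;   /* BaseOfCode */
    uint32_t image_base;         /* ImageBase (PE32 layout) */
} PeMetadata;

/* Fills *out and returns 0, or returns -1 if the image is not a valid PE32 file. */
static int pe_read_metadata(const uint8_t *img, size_t len, PeMetadata *out) {
    assert(img != NULL && out != NULL);
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z') return -1;
    uint32_t e_lfanew = rd32(img + 0x3C);
    if (e_lfanew > len || len - e_lfanew < 24 + 32) return -1;
    const uint8_t *nt = img + e_lfanew;
    if (nt[0] != 'P' || nt[1] != 'E' || nt[2] != 0 || nt[3] != 0) return -1;
    const uint8_t *opt = nt + 24;            /* 4-byte signature + 20-byte COFF header */
    if (rd16(opt) != 0x10B) return -1;       /* this sketch handles PE32 only */
    out->size_of_code     = rd32(opt + 4);
    out->entry_point_rva  = rd32(opt + 16);
    out->base_of_code_rva = rd32(opt + 20);
    out->image_base       = rd32(opt + 28);
    return 0;
}

/* Virtual address of the entry point = image base + relative address. */
static uint32_t pe_entry_va(const PeMetadata *m) {
    return m->image_base + m->entry_point_rva;
}
```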

BinarySecure: Mapping Phase
• Analysis starts from the entry point to determine all the calls made by the code
• Each call determines a function location
• For each unique function location, the end address is determined from the start address
• If instructions are added, all calls and jumps need to be changed accordingly

Mapping Phase: Passes
• First Pass: All function calls made
  • Result: StartAddr table
• Second Pass: End of each called function
  • Result: EndAddr table
• Third Pass: 'Call' and 'jump' opcodes
  • Result: Opcodes to modify and the value to increase them by
• Opcode modifications can involve relative or absolute addresses
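
The first two passes might be sketched as follows over a list of already-decoded instructions sorted by address. The `Insn` record and both function names are hypothetical, and treating the first RET at or after the start as the function end is an assumption made for this sketch; the slides do not say how the end address is found.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { INSN_OTHER, INSN_CALL, INSN_JMP, INSN_RET } InsnKind;

/* Hypothetical pre-decoded instruction: address, kind, and branch target. */
typedef struct { int addr; InsnKind kind; int target; } Insn;

/* First pass: record each distinct call target in start_addr[]. Returns the count. */
static size_t collect_function_starts(const Insn *code, size_t n,
                                      int *start_addr, size_t cap) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind != INSN_CALL) continue;
        size_t j = 0;
        while (j < count && start_addr[j] != code[i].target) j++;
        if (j == count) {                   /* target not seen before */
            assert(count < cap);
            start_addr[count++] = code[i].target;
        }
    }
    return count;
}

/* Second pass (simplified): first RET at or after `start`, or -1 if there is none. */
static int find_function_end(const Insn *code, size_t n, int start) {
    for (size_t i = 0; i < n; i++)
        if (code[i].addr >= start && code[i].kind == INSN_RET) return code[i].addr;
    return -1;
}
```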

Mapping Phase: Relative Addresses
• Calling location > Called Location
    Function End / Start Address
    40: Called Location
    Function End / Start Address
    60: Calling Location (offset -20)
    Function End / Start Address

Mapping Phase: Relative Addresses
• Calling location < Called Location
    Function End / Start Address
    40: Calling Location (offset +20)
    Function End / Start Address
    60: Called Location
    Function End / Start Address

BinarySecure: Modification Phase
• Instructions to copy the return address to the new 'BinarySecure' stack are written at the start of every function
• Instructions to retrieve the return address are added at the end of each function
• All affected opcodes are changed
• Changes to the header are committed
• Output: Modified PE file
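
Splicing extra instructions into a code section can be sketched as a copy with insertions. The `Patch` record and `apply_patches` are hypothetical, and the stub contents are left to the caller; the actual instructions the tool writes are not given in the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One planned insertion: stub bytes placed before the original byte at offset `at`. */
typedef struct { size_t at; const uint8_t *stub; size_t stub_len; } Patch;

/* Copies src into dst with every stub spliced in. Patches must be sorted by `at`.
   Returns the length of the new code. */
static size_t apply_patches(const uint8_t *src, size_t src_len,
                            const Patch *p, size_t n_patches,
                            uint8_t *dst, size_t dst_cap) {
    size_t in = 0, out = 0;
    for (size_t i = 0; i < n_patches; i++) {
        assert(p[i].at >= in && p[i].at <= src_len);     /* sorted and in range */
        size_t chunk = p[i].at - in;
        assert(out + chunk + p[i].stub_len <= dst_cap);
        memcpy(dst + out, src + in, chunk);              /* original bytes up to the patch */
        out += chunk;
        memcpy(dst + out, p[i].stub, p[i].stub_len);     /* inserted instructions */
        out += p[i].stub_len;
        in = p[i].at;
    }
    assert(out + (src_len - in) <= dst_cap);
    memcpy(dst + out, src + in, src_len - in);           /* remaining original bytes */
    return out + (src_len - in);
}
```

Because every byte after an insertion moves, the call and jump operands still have to be fixed up separately, which is what the third mapping pass prepares.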

Advantages
• Binary code is analysed, so this can be used on third-party software where one does not have access to the source code
• Run-time checks require modification to the source code (Splint)
• Compiler modifications are costly, and changing all available compilers is not possible (RAD, Stackguard)
• Return addresses are stored on the stack itself, which reduces the overhead incurred when accessing addresses in other areas

Disadvantages
• The stack has to store a list of return addresses, so a storage overhead equal to the depth of the flow graph is incurred
• The code is machine dependent, but it covers machines from the 80x86 upwards, and a large number of machines fall in this category
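
The storage bound can be illustrated by computing the deepest call chain in a small call graph. This is a sketch that assumes the graph has no recursion; the adjacency-matrix representation, `MAX_FUNCS`, and `max_call_depth` are hypothetical.

```c
#include <assert.h>

#define MAX_FUNCS 8

/* calls[i][j] != 0 means function i calls function j.
   Assumes an acyclic call graph; recursion would make the depth unbounded. */
static int max_call_depth(int calls[MAX_FUNCS][MAX_FUNCS], int n, int f) {
    assert(n <= MAX_FUNCS && f >= 0 && f < n);
    int deepest = 0;
    for (int j = 0; j < n; j++) {
        if (calls[f][j]) {
            int d = max_call_depth(calls, n, j);
            if (d > deepest) deepest = d;
        }
    }
    return 1 + deepest;             /* one saved return address for f itself */
}
```

With 4-byte return addresses, a depth of 3 would mean 12 bytes of extra storage at the deepest point of execution.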

Results
• Correctness
  • Code with buffer overflow: buffTest
  • The code has an overflow problem; the modified exe fixes that problem
• Performance
  • BubbleSort, Calendar, Math
  • The modified PE provides the same result as the original

Results - Demo

References
1. Smashing the stack for fun and profit. http://www.cs.ucsb.edu/~jzhou/security/overflow.html
2. Intel manuals. http://www.intel.com/design/Pentium4/documentation.htm
3. 80386 Programmer's reference manual. http://www.scs.nyu.edu/aos/lab/i386/

Thank You! Questions?

PE File Binary Format

PE File Disassembled

Flow Graph Of Code

Mapping Phase: Relative Addresses
• Calling location > Called Location
    Function End / Start Address
    56: Called Location
    76: Calling Location (offset -20)

Mapping Phase: Relative Addresses
• Calling location > Called Location
    40: Function End / Start Address
    56: Called Location
    76: Calling Location (offset -20)
• Calling Location should call 40
• Hence, offset should be -36

Mapping Phase: Relative Addresses
• Calling location > Called Location
    40: Called Location
    Function End / Start Address
    76: Calling Location (offset -20)
• Calling Location should call 40
• Hence, offset should be -36

Mapping Phase: Relative Addresses
• Calling location > Called Location
    40: Called Location
    60: Function End / Start Address
    76: Calling Location (offset -20)
• Calling Location should call 40
• Hence, offset should be -36

Mapping Phase: Relative Addresses
• Calling location > Called Location
    40: Called Location
    60: Calling Location (offset -20)
    Function End / Start Address

Mapping Phase: Relative Addresses
• Calling location < Called Location
    Function End / Start Address
    56: Calling Location (offset +20)
    76: Called Location

Mapping Phase: Relative Addresses
• Calling location < Called Location
    40: Function End / Start Address
    56: Calling Location (offset +20)
    76: Called Location

Mapping Phase: Relative Addresses
• Calling location < Called Location
    40: Calling Location (offset +20)
    Function End / Start Address
    76: Called Location
• Calling Location should call 76
• Hence, offset should be +36

Mapping Phase: Relative Addresses
• Calling location < Called Location
    40: Calling Location (offset +20)
    60: Function End / Start Address
    76: Called Location
• We don't change this because we want it to call the modified function at 60

Mapping Phase: Relative Addresses
• Calling location < Called Location
    40: Calling Location (offset +20)
    60: Called Location
    Function End / Start Address
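
The cases on these slides reduce to one rule, sketched below in terms of original (pre-modification) addresses. The `Insertion` record and both function names are hypothetical, the 16-byte insertion size is read off the slides' shift from 40 to 56, and offsets are measured from the calling location as on the slides, whereas real x86 relative operands are measured from the end of the call instruction.

```c
#include <assert.h>
#include <stddef.h>

/* `size` bytes inserted immediately before the original byte at `addr`. */
typedef struct { int addr; int size; } Insertion;

/* Total bytes inserted before `addr`. An insertion exactly at `addr`
   counts only when `inclusive` is nonzero. */
static int shift_for(const Insertion *ins, size_t n, int addr, int inclusive) {
    int shift = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ins[i].size >= 0);
        if (ins[i].addr < addr || (inclusive && ins[i].addr == addr))
            shift += ins[i].size;
    }
    return shift;
}

/* New relative offset for a call at `site` whose original offset is `rel`. */
static int adjust_relative(const Insertion *ins, size_t n, int site, int rel) {
    int target = site + rel;
    /* The call instruction moves past anything inserted at or before it. */
    int new_site = site + shift_for(ins, n, site, 1);
    /* Code inserted exactly at the target is the new function start,
       so the call should land on it rather than skip it. */
    int new_target = target + shift_for(ins, n, target, 0);
    return new_target - new_site;
}
```

For a call at 60 with offset -20, an insertion of 16 bytes at 40 yields -36, while the same insertion below 40 or above 60 leaves the offset at -20.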