This time: investigating buffer overflows and other memory safety vulnerabilities
• History
• Memory layouts
• Buffer overflow fundamentals
*These slides are available courtesy of Dave Levin

Software security
• Security is a form of dependability
  - Does the code do “what it should”?
• Distinguishing factor: an active, malicious attacker
• Attack model
  - The developer is trusted
  - But the attacker can provide any inputs: malformed strings, malformed packets, etc.
What harm could an attacker possibly cause?

We’re going to focus on C
C is still very popular (http://www.tiobe.com)

We’re going to focus on C
Many mission critical systems are written in C
• Most kernels & OS utilities
  - fingerd
  - X windows server
• Many high-performance servers
  - Microsoft IIS
  - Microsoft SQL server
• Many embedded systems
  - Mars rover
But the techniques apply more broadly
• Wiibrew

The harm can be substantial
1988: Morris worm
• Propagated across machines (too aggressively, thanks to a bug)
• One way it propagated was a buffer overflow attack against a vulnerable version of fingerd on VAXes
  - Sent a special string to the finger daemon, which caused it to execute code that created a new worm copy
  - Didn’t check OS: caused Suns running BSD to crash
• End result: $10-100M in damages, probation, community service
• The author is now a professor at MIT

The harm can be substantial
2001: Code Red
• Exploited an overflow in the MS-IIS server
• 300,000 machines infected in 14 hours

The harm can be substantial
2003: SQL Slammer
• Exploited an overflow in the MS-SQL server
• 75,000 machines infected in 10 minutes

Buffer overflows are prevalent
[Figure: % of vulnerabilities that are buffer overflows. Source: http://web.nvd.nist.gov/view/vuln/statistics]

This class

Our goals
• Understand how these attacks work, and how to defend against them
• These require knowledge about:
  - The compiler
  - The OS
  - The architecture
Analyzing security requires a whole-systems view

Memory layout

Refresher
• How is program data laid out in memory?
• What does the stack look like?
• What effect does calling (and returning from) a function have on memory?

All programs are stored in memory
• The process’s view of memory is that it owns all of it, from 0x00000000 (0) up to 0xffffffff (4G)
• In reality, these are virtual addresses; the OS/CPU map them to physical addresses

The instructions themselves are in memory
0xffffffff (4G)
    ...
    0x4c2  sub  $0x224, %esp
    0x4c1  push %ecx
    0x4bf  mov  %esp, %ebp
    0x4be  push %ebp
    ...                          (Text segment)
0x00000000 (0)

Data’s location depends on how it’s created
0xffffffff (4G)
    cmdline & env                                    (set when process starts)
    Stack           int f() { int x; … }             (runtime)
    Heap            malloc(sizeof(long));            (runtime)
    Uninit’d data   static int x;                    (known at compile time)
    Init’d data     static const int y=10;           (known at compile time)
    Text
0x00000000 (0)
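To see these regions on a real machine, a minimal sketch like the one below prints one address from each. The names `print_layout`, `uninit_data`, and `init_data` are made up for illustration. The exact addresses, and even their relative order, depend on the OS, the compiler, and whether address randomization is enabled, so treat the output as something to inspect rather than a fixed result.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int uninit_data;             /* uninitialized static data */
static const int init_data = 10;    /* initialized data, known at compile time */

/* Prints one address per region. Returns 0 on success, -1 if malloc fails. */
int print_layout(void)
{
    int local = 0;                          /* stack: created at runtime */
    long *heap = malloc(sizeof(long));      /* heap: created at runtime */
    if (heap == NULL)
        return -1;

    printf("stack         %p\n", (void *)&local);
    printf("heap          %p\n", (void *)heap);
    printf("uninit'd data %p\n", (void *)&uninit_data);
    printf("init'd data   %p\n", (void *)&init_data);
    /* Function pointers are converted through an integer type for printing. */
    printf("text          0x%jx\n", (uintmax_t)(uintptr_t)&print_layout);

    free(heap);
    return 0;
}
```

Calling `print_layout()` from `main` and running the program several times shows which regions move between runs on your system.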

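We are going to focus on runtime attacks
• Stack and heap grow in opposite directions
• This lets us avoid declaring their sizes in advance
[Diagram: the heap grows up from 0x00000000 and the stack grows down from 0xffffffff. Sequence shown: push 1, push 2, push 3, return; the stack pointer tracks the most recently pushed value.]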

Stack layout when calling functions
• What do we do when we call a function?
  - What data need to be stored?
  - Where do they go?
• How do we return from a function?
  - What data need to be restored?
  - Where do they come from?
Code examples

The instructions themselves are in memory
0xffffffff (4G)
    ...
    0x5bf  mov  %esp, %ebp
    0x5be  push %ebp
    ...
    0x4a7  mov  $0x0, %eax
    0x4a2  call <func>
    0x49b  movl $0x804.., (%esp)
    0x493  movl $0xa, 0x4(%esp)
    ...                          (Text segment)
0x00000000 (0)
%eip points at the instruction currently being executed

What Needs To Be Stored?
• The old %eip register

Accessing variables
    void func(char *arg1, int arg2, int arg3)
    {
        ...
        loc2++;    // Q: Where is (this) loc2?
        ...        // A: -8(%ebp)
    }
Stack frame (0x00000000 on the left, 0xffffffff on the right):
    … | loc2 | loc1 | ??? | ??? | arg1 | arg2 | arg3 | caller’s data
• %ebp, the frame pointer, points at the first ??? slot, just above the locals
• The absolute address of loc2 (e.g., 0xbffff323) is undecidable at compile time
  - I don’t know where loc2 is,
  - and I don’t know how many args there are,
  - but loc2 is always 8 bytes below the frame pointer (%ebp)

What Needs To Be Stored?
• The old %eip register
• The old %ebp register

Returning from functions
    int main()
    {
        ...
        func("Hey", 10, -3);
        ...        // Q: How do we resume here?
    }
A: Push next %eip before call
Stack frame (0x00000000 on the left, 0xffffffff on the right):
    … | ??? | %eip | arg1 | arg2 | arg3 | caller’s data

Returning from functions
    int main()
    {
        ...
        func("Hey", 10, -3);
        ...        // Q: How do we restore %ebp?
    }
Stack frame (0x00000000 on the left, 0xffffffff on the right; %esp marks the top of the stack):
    … | %ebp | %eip | arg1 | arg2 | arg3 | caller’s data
• Push %ebp before locals
• Set %ebp to the current %esp
• Set %ebp to (%ebp) at return

Stack layout when calling functions
    void func(char *arg1, int arg2, int arg3)
    {
        char loc1[4];
        int loc2;
        int loc3;
    }
Stack frame (0x00000000 on the left, 0xffffffff on the right):
    … | loc2 | loc1 | ??? | ??? | arg1 | arg2 | arg3 | caller’s data
• Local variables pushed in the same order as they appear in the code
• Arguments pushed in reverse order of code

Memory layout summary
• Calling function:
  1. Push arguments onto the stack (in reverse)
  2. Push the address of the instruction you want run after control returns to you
  3. Jump to the function
• Called function:
  4. Push the old frame pointer onto the stack (%ebp)
  5. Set my frame pointer (%ebp) to where the end of the stack is right now (%esp)
  6. Push my local variables onto the stack
• Returning function:
  7. Reset the previous stack frame: %ebp = (%ebp)
  8. Jump back to where they wanted us to: %eip = 4(%ebp)
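The eight steps can be traced with a toy model. The sketch below is not real x86: it assumes a stack made of 32-bit words stored in an array that grows toward index 0, and the names `struct machine`, `call`, and `ret` are invented for illustration. One array slot stands for 4 bytes, so `stack[ebp + 1]` plays the role of 4(%ebp).

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Toy machine: no bounds checks, for illustration only. */
struct machine {
    uint32_t stack[STACK_WORDS];
    uint32_t esp;   /* index of the word on top of the stack */
    uint32_t ebp;   /* index of the current frame's saved-%ebp slot */
    uint32_t eip;   /* pretend instruction address */
};

void machine_init(struct machine *m)
{
    m->esp = STACK_WORDS;   /* empty stack: one past the highest slot */
    m->ebp = STACK_WORDS;   /* placeholder for the caller's frame pointer */
    m->eip = 0;
}

static void push(struct machine *m, uint32_t value)
{
    m->stack[--m->esp] = value;   /* the stack grows toward lower indices */
}

/* Steps 1-6: the caller pushes arguments and a return address,
 * then the callee saves %ebp and reserves its locals. */
void call(struct machine *m, uint32_t func_addr, uint32_t return_addr,
          const uint32_t *args, int nargs, int nlocals)
{
    for (int i = nargs - 1; i >= 0; i--)
        push(m, args[i]);             /* 1. arguments, in reverse */
    push(m, return_addr);             /* 2. where to resume */
    m->eip = func_addr;               /* 3. jump to the function */
    push(m, m->ebp);                  /* 4. old frame pointer */
    m->ebp = m->esp;                  /* 5. %ebp = %esp */
    for (int i = 0; i < nlocals; i++)
        push(m, 0);                   /* 6. local variables */
}

/* Steps 7-8: restore the caller's frame pointer and resume the caller. */
void ret(struct machine *m)
{
    uint32_t frame = m->ebp;
    m->eip = m->stack[frame + 1];     /* 8. %eip = 4(%ebp) */
    m->ebp = m->stack[frame];         /* 7. %ebp = (%ebp) */
    m->esp = frame + 2;               /* arguments stay for the caller to pop */
}
```

The model reads the saved return address before overwriting `ebp`, which is why step 8 appears before step 7 in `ret`; the net effect matches the summary.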

Buffer overflows

Buffer overflows at 10,000 ft
• Buffer =
  - Contiguous set of a given data type
  - Common in C: all strings are buffers of char’s
• Overflow =
  - Put more into the buffer than it can hold
• Where does the extra data go?
• Well now that you’re experts in memory layouts…

A buffer overflow example
    void func(char *arg1)
    {
        char buffer[4];
        strcpy(buffer, arg1);
        ...
    }
    int main()
    {
        char *mystr = "AuthMe!";
        func(mystr);
        ...
    }
Stack after strcpy (low to high addresses):
    buffer: 'A' 'u' 't' 'h' | %ebp: 4d 65 21 00 ('M' 'e' '!' '\0') | %eip | &arg1
Upon return, sets %ebp to 0x0021654d
SEGFAULT (0x00216551)

A buffer overflow example
    void func(char *arg1)
    {
        int authenticated = 0;
        char buffer[4];
        strcpy(buffer, arg1);
        if(authenticated) { ... }
    }
    int main()
    {
        char *mystr = "AuthMe!";
        func(mystr);
        ...
    }
Stack after strcpy (low to high addresses):
    buffer: 'A' 'u' 't' 'h' | authenticated: 4d 65 21 00 ('M' 'e' '!' '\0') | %ebp | %eip | &arg1
Code still runs; user now ‘authenticated’
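Actually overflowing a stack buffer is undefined behavior, so the effect is easier to study in a model. The sketch below assumes a hypothetical `struct frame` that stands in for the slide's stack frame, with the flag placed directly after a 4-byte buffer (checked at compile time). It copies bytes the way an unbounded strcpy would, but clamps the copy to the struct so the demonstration itself stays well defined.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the stack frame on the slide. */
struct frame {
    char buffer[4];
    int  authenticated;
};

_Static_assert(offsetof(struct frame, authenticated) == 4,
               "demo assumes the flag directly follows the buffer");

/* Copies `input` plus its terminating NUL starting at buffer[0] and
 * returns the resulting value of the flag. */
int simulate_unbounded_copy(const char *input)
{
    struct frame f;
    size_t n = strlen(input) + 1;

    memset(&f, 0, sizeof f);          /* authenticated = 0 */
    if (n > sizeof f)
        n = sizeof f;                 /* keep the demo inside the struct */
    memcpy((unsigned char *)&f, input, n);   /* bytes past buffer[3] land in the flag */
    return f.authenticated;
}
```

With `"Aut"` the flag stays 0; with `"AuthMe!"` the bytes 4d 65 21 00 land in the flag, which reads as 0x0021654d on a little-endian machine with 4-byte int.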

User-supplied strings
• In these examples, we were providing our own strings
• But they come from users in myriad ways
  - Text input
  - Packets
  - Environment variables
  - File input…

What’s the worst that could happen?
    void func(char *arg1)
    {
        char buffer[4];
        strcpy(buffer, arg1);
        ...
    }
Stack (low to high addresses):
    buffer | %ebp | %eip | &mystr | …      (all ours!)
strcpy will let you write as much as you want (until a ‘\0’)
What could you write to memory to wreak havoc?

Code injection

High-level idea
    void func(char *arg1)
    {
        char buffer[4];
        sprintf(buffer, arg1);
        ...
    }
Stack (low to high addresses):
    buffer | %ebp | %eip | &arg1 | … Haxx0r c0d3
(1) Load my own code into memory
(2) Somehow get %eip to point to it

This is nontrivial
• Pulling off this attack requires getting a few things really right (and some things sorta right)
• Think about what is tricky about the attack
  - The key to defending against it will be to make the hard parts really hard

Challenge 1: Loading code into memory
• It must be the machine code instructions (i.e., already compiled and ready to run)
• We have to be careful in how we construct it:
  - It can’t contain any all-zero bytes
    - Otherwise, sprintf / gets / scanf / … will stop copying
    - How could you write assembly to never contain a full zero byte?
  - It can’t make use of the loader (we’re injecting)
  - It can’t use the stack (we’re going to smash it)

Challenge 2: Getting our injected code to run
• We can’t insert a “jump into my code” instruction
• We have to use whatever code is already running
Stack (low to high addresses):
    buffer | %ebp | %eip | &arg1 | … \x0f \x3c \x2f ...
Thoughts?

Hijacking the saved %eip
Stack (low to high addresses):
    buffer | %ebp | %eip: 0xbff | &arg1 | … \x0f \x3c \x2f ... (at 0xbff)
But how do we know the address?

Hijacking the saved %eip
What if we are wrong?
Stack (low to high addresses):
    buffer | %ebp | %eip: 0xbdf | &arg1 | … \x0f \x3c \x2f ... (at 0xbff)
The guessed address (0xbdf) most likely holds data, so the CPU will panic (Invalid Instruction)

Challenge 3: Finding the return address
• If we don’t have access to the code, we don’t know how far the buffer is from the saved %ebp
• One approach: just try a lot of different values!
• Worst case scenario: it’s a 32 (or 64) bit memory space, which means 2^32 (2^64) possible answers
• But without address randomization:
  - The stack always starts from the same, fixed address
  - The stack will grow, but usually it doesn’t grow very deeply (unless the code is heavily recursive)

Improving our chances: nop sleds
nop is a single-byte instruction (just moves to the next instruction)
Stack (low to high addresses):
    buffer | %ebp | %eip: 0xbdf | &arg1 | nop nop nop … | \x0f \x3c \x2f ... (at 0xbff)
Jumping anywhere in the sled will work
Now we improve our chances of guessing by a factor of #nops

Putting it all together
Stack (low to high addresses):
    buffer | %ebp | %eip | &arg1 | nop nop nop … | \x0f \x3c \x2f ...
    padding       | good guess (0xbdf) | nop sled | malicious code
• The padding has to be something; we have to start writing wherever the input to gets/etc. begins

Defenses

Recall our challenges
How can we make these even more difficult?
• Putting code into the memory (no zeroes)
• Getting %eip to point to our code (distance from buffer to saved %eip)
• Finding the return address (guess the raw address)

Detecting overflows with canaries
Stack (low to high addresses):
    buffer | canary: 02 8d e2 10 | %ebp | %eip: 0xbdf | &arg1 | nop nop … | \x0f \x3c \x2f ...
After an overflow the canary is not the expected value: abort
What value should the canary have?

Canary values
From StackGuard [Wagle & Cowan]
1. Terminator canaries (CR, LF, NULL, -1)
  - Leverages the fact that scanf etc. don’t allow these
2. Random canaries
  - Write a new random value @ each process start
  - Save the real value somewhere in memory
  - Must write-protect the stored value
3. Random XOR canaries
  - Same as random canaries
  - But store canary XOR some control info, instead
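The check itself is simple, as this sketch suggests. It is a model rather than what a compiler emits: `struct guarded_frame` and `copy_and_check` are hypothetical names, the frame is simulated as a struct so the overflow stays well defined, and the secret is passed in as a parameter where a real implementation would use a per-process random value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated frame: the canary sits between the buffer and the saved %eip. */
struct guarded_frame {
    char     buffer[8];
    uint32_t canary;
    uint32_t saved_eip;
};

_Static_assert(offsetof(struct guarded_frame, canary) == 8,
               "demo assumes the canary directly follows the buffer");

/* Performs an unchecked copy into the frame, then verifies the canary.
 * Returns 1 if the function may return, 0 if the check would abort. */
int copy_and_check(const unsigned char *input, size_t len, uint32_t secret)
{
    struct guarded_frame f;

    memset(&f, 0, sizeof f);
    f.canary = secret;
    f.saved_eip = 0x4a7;              /* pretend return address */

    if (len > sizeof f)
        len = sizeof f;               /* keep the demo inside the struct */
    memcpy((unsigned char *)&f, input, len);   /* copy starts at buffer[0] */

    return f.canary == secret;        /* checked just before returning */
}
```

An 8-byte input leaves the canary intact; a 16-byte input cannot reach `saved_eip` without first overwriting the canary. The check is only as strong as the secrecy of the value, which is why the slide asks what the canary should be.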

Recall our challenges
How can we make these even more difficult?
• Putting code into the memory (no zeroes)
  - Option: Make this detectable with canaries
• Getting %eip to point to our code (distance from buffer to saved %eip)
• Finding the return address (guess the raw address)

Return-to-libc
Stack (low to high addresses):
    buffer | %ebp | %eip | &arg1 | …
    padding       | known location (0x17f, 0x20d) in libc, instead of a good guess (0xbdf) into a nop sled and malicious code
libc (already in memory): ... exec() ... printf() ... “/bin/sh” ...

Recall our challenges
How can we make these even more difficult?
• Putting code into the memory (no zeroes)
  - Option: Make this detectable with canaries
• Getting %eip to point to our code (distance from buffer to saved %eip)
  - Non-executable stack doesn’t work so well
• Finding the return address (guess the raw address)

Address Space Layout Randomization (ASLR)
• Basic idea: change the layout of the stack
• Slow to adopt
  - Linux in 2005
  - Vista in 2007 (off by default for compatibility with older software)
  - OS X in 2007 (for system libraries), 2011 for all apps
  - iOS 4.3 (2011)
  - Android 4.0
  - FreeBSD: no
How would you overcome this as an attacker?

Defenses
• Putting code into the memory (no zeroes)
  - Option: Make this detectable with canaries
• Getting %eip to point to our code (distance from buffer to saved %eip)
  - Non-executable stack doesn’t work so well
• Finding the return address (guess the raw address)
  - Address Space Layout Randomization
• Good coding practices

    void safe()
    {
        char buf[80];
        fgets(buf, 80, stdin);
    }
    void safer()
    {
        char buf[80];
        fgets(buf, sizeof(buf), stdin);
    }
    void vulnerable()
    {
        char buf[80];
        if(fgets(buf, sizeof(buf), stdin)==NULL)
            return;
        printf(buf);
    }

Format string vulnerabilities

printf format strings
    int i = 10;
    printf("%d %p\n", i, &i);
Stack (0x00000000 on the left, 0xffffffff on the right):
    … | %ebp | %eip | &fmt | 10 | &i | …
    printf’s stack frame   | caller’s stack frame
• printf takes variable number of arguments
• printf pays no mind to where the stack frame “ends”
• It presumes that you called it with (at least) as many arguments as specified in the format string

    void vulnerable()
    {
        char buf[80];
        if(fgets(buf, sizeof(buf), stdin)==NULL)
            return;
        printf(buf);        // input: "%d %x"
    }
Stack (0x00000000 on the left, 0xffffffff on the right):
    … | %ebp | %eip | &fmt | caller’s stack frame

Format string vulnerabilities
• printf("100% dml");
  - Prints stack entry 4 bytes above saved %eip
• printf("%s");
  - Prints bytes pointed to by that stack entry
• printf("%d %d %d %d …");
  - Prints a series of stack entries as integers
• printf("%08x %08x %08x …");
  - Same, but nicely formatted hex
• printf("100% no way!")
  - WRITES the number 3 to address pointed to by stack entry
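All of these cases go away when untrusted text is passed as an argument and never as the format itself. A minimal sketch, with `format_untrusted` as a made-up helper name:

```c
#include <stdio.h>

/* Formats untrusted text as data. The format string is a constant, so any
 * conversion specifiers inside `user_input` are copied literally instead of
 * being interpreted. Returns whatever snprintf returns. */
int format_untrusted(char *out, size_t out_size, const char *user_input)
{
    return snprintf(out, out_size, "%s", user_input);
}
```

With this helper, the input `100% no way!` comes out unchanged and nothing is written through `%n`. GCC and Clang can also flag non-constant format strings at compile time with `-Wformat-security`.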

Format string prevalence
[Figure: % of vulnerabilities that involve format string bugs. Source: http://web.nvd.nist.gov/view/vuln/statistics]

What’s wrong with this code?
    #define BUF_SIZE 16
    char buf[BUF_SIZE];
    void vulnerable()
    {
        int len = read_int_from_network();       // Negative
        char *p = read_string_from_network();
        if(len > BUF_SIZE) {                     // Ok
            printf("Too large\n");
            return;
        }
        memcpy(buf, p, len);                     // Implicit cast to unsigned
    }

    void *memcpy(void *dest, const void *src, size_t n);
    typedef unsigned int size_t;
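One way to close this hole is to reject negative lengths before the value ever reaches a `size_t` parameter. The sketch below uses a hypothetical `checked_copy` helper and takes the destination as a parameter so it stands alone; the network-reading functions from the slide are left out.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16

/* Copies `len` bytes from `src` into `dest`, which must hold BUF_SIZE bytes.
 * Returns 0 on success, -1 if `len` is negative or larger than BUF_SIZE. */
int checked_copy(char *dest, const char *src, int len)
{
    if (len < 0 || (size_t)len > BUF_SIZE)
        return -1;                    /* a negative len never reaches memcpy */
    memcpy(dest, src, (size_t)len);
    return 0;
}
```

Reading the length into an unsigned type in the first place is another option, as long as the upper-bound check stays.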

Integer overflow vulnerabilities

What’s wrong with this code?
    void vulnerable()
    {
        size_t len;
        char *buf;
        len = read_int_from_network();    // HUGE
        buf = malloc(len + 5);            // Wrap-around
        read(fd, buf, len);
        ...
    }
Takeaway: You have to know the semantics of your programming language to avoid these errors
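Unsigned arithmetic in C wraps around silently, so the addition has to be checked before it happens. A minimal sketch, with `alloc_with_extra` as an invented helper name:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates len + extra bytes. Returns NULL if the sum would wrap around
 * (or if malloc itself fails). */
void *alloc_with_extra(size_t len, size_t extra)
{
    if (len > SIZE_MAX - extra)
        return NULL;                  /* len + extra would overflow size_t */
    return malloc(len + extra);
}
```

With this check, a huge `len` yields NULL rather than a tiny allocation followed by a large read into it. The caller still has to test the result for NULL.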

Integer overflow prevalence
[Figure: % of vulnerabilities that involve integer overflows. Source: http://web.nvd.nist.gov/view/vuln/statistics]

What’s wrong with this code?
Suppose that it has higher privilege than the user
    int main()
    {
        char buf[1024];
        ...
        if(access(argv[1], R_OK) != 0) {       // checked with uid
            printf("cannot access file\n");
            exit(-1);
        }
        file = open(argv[1], O_RDONLY);        // opened with euid
        read(file, buf, 1023);
        close(file);
        printf("%s\n", buf);
        return 0;
    }
• argv[1] is ~attacker/mystuff.txt
• Between the access() check and the open(), the attacker runs: ln -s /usr/sensitive ~attacker/mystuff.txt
“Time of Check/Time of Use” Problem (TOCTOU)

Avoiding TOCTOU
    int main()
    {
        char buf[1024];
        ...
        if(access(argv[1], R_OK) != 0) {
            printf("cannot access file\n");
            exit(-1);
        }
        euid = geteuid();
        uid = getuid();
        seteuid(uid);                      // Drop privileges
        file = open(argv[1], O_RDONLY);
        read(file, buf, 1023);
        close(file);
        seteuid(euid);                     // Restore privileges
        printf("%s\n", buf);
    }
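The same idea can be packaged as a helper that also checks the return values the slide omits. This is a sketch under assumptions: `open_as_real_user` is a made-up name, it targets POSIX systems, and it presumes a setuid program whose saved set-user-ID allows the effective uid to be restored afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens `path` read-only using the real user's privileges, then restores
 * the original effective uid. Returns a file descriptor, or -1 on failure. */
int open_as_real_user(const char *path)
{
    uid_t saved_euid = geteuid();
    int fd;

    if (seteuid(getuid()) != 0)       /* drop privileges */
        return -1;

    /* The kernel checks permissions at open time, so there is no gap
     * between the check and the use. */
    fd = open(path, O_RDONLY);

    if (seteuid(saved_euid) != 0) {   /* restore privileges */
        if (fd >= 0)
            close(fd);
        return -1;
    }
    return fd;
}
```

Because the permission check and the open are now one system call, a symlink swapped in by the attacker only ever gets opened with the attacker's own rights.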

Memory safety attacks
• Buffer overflows
  - Can be used to read/write data on stack or heap
  - Can be used to inject code (ultimately root shell)
• Format string errors
  - Can be used to read/write stack data
• Integer overflow errors
  - Can be used to change the control flow of a program
• TOCTOU problem
  - Can be used to raise privileges

General defenses against memory-safety

Secure coding practices
    char digit_to_char(int i)
    {
        char convert[] = "0123456789";
        return convert[i];
    }
Think about all potential inputs, no matter how peculiar
    char digit_to_char(int i)
    {
        char convert[] = "0123456789";
        if(i < 0 || i > 9)
            return '?';
        return convert[i];
    }
Check at runtime that your operations are safe

Secure coding practices
    char string[512];
    strcpy(string, src);                     // Dangerous

    char string[512];
    strlcpy(string, src, sizeof string);     // Safer
Use the corresponding bounded-sized versions of functions
    strncpy/strlcpy, strncat/strlcat, snprintf, fgets, ...
Again: know your system’s/language’s semantics
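strlcpy is not available in every C library, and strncpy does not NUL-terminate the destination when the source is too long. Where neither fits, a bounded copy can be sketched with snprintf, which is standard C99. The helper name `copy_bounded` is hypothetical.

```c
#include <stdio.h>

/* Copies `src` into `dst`, always NUL-terminating when dst_size > 0.
 * Returns 0 if the whole string fit, -1 if it was truncated or on error. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);
    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

The return value matters: silent truncation can itself be a bug, so callers should decide what to do when the input does not fit.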

Defensive coding practices
• Think defensive driving
  - Avoid depending on anyone else around you
  - If someone does something unexpected, you won’t crash (or worse)
  - It’s about minimizing trust
• Each module takes responsibility for checking the validity of all inputs sent to it
  - Even if you “know” your callers will never send a NULL pointer…
  - Better to throw an exception (or even exit) than run malicious code

How to program defensively
• Code reviews, real or imagined
  - Organize your code so it is obviously correct
  - Re-write until it would be self-evident to a reviewer
“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
• Remove the opportunity for programmer mistakes with better languages and libraries
  - Java performs automatic bounds checking
  - C++ provides a safe std::string class

How to program defensively
• Code analysis
  - Static: Many of the bugs we’ve shown could be easily detected (but we run into the Halting Problem)
  - Dynamic: Run in a VM and look for bad writes (valgrind)
• Fuzz testing
  - Generate random inputs and see if the program fails
    - Totally random
    - Start with a valid input file and mutate
    - Structure-driven input generation: take into account the intended format of the input (e.g., “string int float string”)
  - Typically involves many inputs (clusterfuzz.google.com)
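The “start with a valid input and mutate” strategy fits in a few lines. The sketch below is a toy: `mutate`, `fuzz`, and the sample target `all_printable` are invented names, the random numbers come from a small xorshift generator so runs are reproducible, and the loop counts rejected inputs, whereas a real fuzzer watches for crashes and memory errors.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* xorshift32: small deterministic PRNG; the state must be nonzero. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Overwrites `nmut` randomly chosen bytes of `data` in place. */
void mutate(unsigned char *data, size_t len, int nmut, uint32_t *state)
{
    if (len == 0)
        return;
    for (int i = 0; i < nmut; i++) {
        size_t pos = next_random(state) % len;
        data[pos] = (unsigned char)(next_random(state) & 0xff);
    }
}

/* Sample target (hypothetical): accepts only printable ASCII. */
int all_printable(const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (data[i] < 0x20 || data[i] > 0x7e)
            return -1;
    return 0;
}

typedef int (*target_fn)(const unsigned char *data, size_t len);

/* Feeds `iterations` mutated copies of `valid` to `target`.
 * Returns how many were rejected, or -1 on bad arguments. */
int fuzz(target_fn target, const unsigned char *valid, size_t len,
         int iterations, uint32_t seed)
{
    unsigned char work[256];
    int rejected = 0;

    if (len > sizeof work || seed == 0)
        return -1;
    for (int i = 0; i < iterations; i++) {
        memcpy(work, valid, len);     /* always restart from the valid input */
        mutate(work, len, 2, &seed);
        if (target(work, len) != 0)
            rejected++;
    }
    return rejected;
}
```

Keeping the seed explicit means any interesting failure can be replayed exactly, which is most of what makes a fuzzing result useful.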

This time
We have finished buffer overflows; we continue by looking at overflow defenses and other memory safety vulnerabilities
• Buffer overflow defenses (and workarounds)
• Format string vulnerabilities
• Integer overflow vulnerabilities