Refining Buffer Overflow Detection via Demand-Driven Path-Sensitive Analysis
Wei Le and Mary Lou Soffa
University of Virginia

Motivation
• Buffer overflow: 20 years since the Morris Worm, still the most common exploit
• Challenge: eliminate exploitable buffer overflows
  – Detect where a buffer overflow can occur
  – Determine its cause and remove it

Problems of Static Approaches
• Detection precision: false positives
• Error reports provide little information for diagnosis
  – they report only an overflow point in the program
• Not fully automatic: manual annotation is required

Our Goals and Approaches
• Goal: automatically identify paths on which a buffer overflow can occur and report the path segment that causes the overflow
• Challenge: huge number of paths
• Approach:
  – interprocedural, path-sensitive analysis for precision and to aid diagnosis
  – demand-driven analysis for scalability

Five Types of Paths
• Infeasible: no input can exercise the path
• Safe: no input can overflow the buffer
• Vulnerable: users can write any content to the buffer
• Overflow-user-independent: the buffer content is statically determinable
• Don’t-know: the buffer status cannot be judged statically

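The five path types can be read as a decision over three facts about a path. The sketch below is only an illustration of that reading, not the tool's implementation; the function `classify_path` and its parameters `feasible`, `fits`, and `user_controlled` are hypothetical names introduced here.

```python
from enum import Enum
from typing import Optional


class PathType(Enum):
    INFEASIBLE = "infeasible"
    SAFE = "safe"
    VULNERABLE = "vulnerable"
    OVERFLOW_USER_INDEPENDENT = "overflow-user-independent"
    DONT_KNOW = "don't-know"


def classify_path(feasible: bool,
                  fits: Optional[bool],
                  user_controlled: bool) -> PathType:
    """Map assumed analysis facts about one path to a path type.

    feasible        -- some input can exercise the path
    fits            -- True if the write provably fits the buffer, False if it
                       provably does not, None if this cannot be decided statically
    user_controlled -- the content written to the buffer comes from user input
    """
    if not feasible:
        return PathType.INFEASIBLE
    if fits is None:
        return PathType.DONT_KNOW
    if fits:
        return PathType.SAFE
    # The write overflows: distinguish user-supplied from fixed content.
    if user_controlled:
        return PathType.VULNERABLE
    return PathType.OVERFLOW_USER_INDEPENDENT
```

For instance, `classify_path(True, False, True)` yields `PathType.VULNERABLE`, while the same overflow with content that does not depend on the user yields `PathType.OVERFLOW_USER_INDEPENDENT`.
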
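An Example
[Figure: control-flow graph of a fragment of realpath.c in wu-ftpd 2.6.2, with paths labeled Infeasible, Safe, and Overflow]
Node labels as they appear in the figure:
  1  resolved, wbuf (branch: y / n)
  2  rootd = 1
  3  rootd = 0
  4  exit
  5  strlen(wbuf)+rootd+1+strlen(resolved) > LEN (branch: y / n)
  6  rootd == 0 (branch: y / n)
  7  strcat(resolved, "/")
  8  strcat(resolved, wbuf)
Annotation: LEN = 6
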
Demand-Driven Analysis
[Figure: the example control-flow graph annotated with the queries attached to its nodes; one branch is marked Infeasible, and Q0 and Q1 are marked Solved]
char resolved[LEN]
Legend: s = strlen(resolved)+strlen(wbuf); l = sizeof(resolved); f = wbuf
Queries as labeled in the figure:
  Q0    (s < l, f)
  Q1    (s+1 < l, f)
  Q05   (LEN-1-rootd < l, f)
  Q15   (LEN-rootd < l, f)
  Q052  (LEN-1 < l, f)
  Q053  (LEN-1 < l, f)
  Q153  (LEN < l, f)

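The arithmetic behind the wu-ftpd example can be checked with a brute-force replay of its four paths. This is a plain enumeration rather than the backward, demand-driven propagation itself, and it rests on assumptions about the figure: that the path sets rootd to 1 or 0 at node 2 or 3, that reaching node 6 means the check at node 5 did not exit, and that node 7 runs only when rootd == 0. The function `classify` and the value LEN = 6 are used only for illustration.

```python
from itertools import product

LEN = 6  # assumed buffer size; l = sizeof(resolved) = LEN


def classify(rootd: int, takes_node7: bool, length: int = LEN) -> str:
    """Classify one path: node 2 or 3, then 5, 6, optionally 7, then 8."""
    # Node 6 branches to node 7 only when rootd == 0, so any other
    # combination of assignment and branch cannot be executed.
    if takes_node7 != (rootd == 0):
        return "infeasible"
    # Passing node 5 without exiting means s + rootd + 1 <= LEN,
    # where s = strlen(resolved) + strlen(wbuf).
    max_s = length - 1 - rootd
    # Node 7 appends "/" (one character) before node 8 appends wbuf.
    max_final_len = max_s + (1 if takes_node7 else 0)
    # The final string fits only if room remains for the terminating NUL.
    return "safe" if max_final_len < length else "overflow"


results = {(rootd, n7): classify(rootd, n7)
           for rootd, n7 in product((1, 0), (False, True))}

for (rootd, n7), label in results.items():
    print(f"rootd={rootd}, via node 7={n7}: {label}")
```

Under these assumptions the path with rootd = 0 through node 7 can produce a string of length LEN, which leaves no room for the terminator; this matches the query (LEN < l, f) failing when l = LEN. The path with rootd = 1 that skips node 7 stays within bounds, and the other two combinations are infeasible.
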
The Demand-Driven Model
• PVS (potentially vulnerable statement): strcpy(a, b)
• Query: sizeof(a) > strlen(b), flag
• Information for updating queries: char a[9]
• Propagation rules: interprocedural, loop, join point, infeasible
• Resolving the query: false, flag = user input

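A minimal sketch of the model on a single straight-line path may help: a query is raised at the strcpy, walked backward, updated by the statements that define the buffer size and the source string, and resolved as soon as both are known. The statement encoding, the `Query` class, and the `resolve` function are hypothetical and cover none of the interprocedural, loop, or join-point rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical encoding of one path, in execution order:
#   ("decl", name, size)     e.g. char a[9]
#   ("const", name, length)  name holds a string of known length
#   ("input", name)          name is filled from user input
#   ("strcpy", dst, src)     the potentially vulnerable statement (PVS)
Stmt = Tuple


@dataclass
class Query:
    dst: str
    src: str
    dst_size: Optional[int] = None  # sizeof(dst), once known
    src_len: Optional[int] = None   # strlen(src), once known
    user_input: bool = False        # the flag: src is user controlled


def resolve(path: List[Stmt]) -> str:
    """Raise the query sizeof(dst) > strlen(src) at the PVS and walk backward."""
    op, dst, src = path[-1]
    assert op == "strcpy"
    q = Query(dst, src)
    src_resolved = False
    for stmt in reversed(path[:-1]):
        kind, name = stmt[0], stmt[1]
        if kind == "decl" and name == q.dst and q.dst_size is None:
            q.dst_size = stmt[2]
        elif kind in ("const", "input") and name == q.src and not src_resolved:
            # The nearest definition of src before the PVS decides its status.
            if kind == "const":
                q.src_len = stmt[2]
            else:
                q.user_input = True
            src_resolved = True
        if q.dst_size is not None and src_resolved:
            break  # query resolved: no need to visit earlier statements
    if q.dst_size is None or not src_resolved:
        return "don't-know"
    if q.user_input:
        return "vulnerable"  # the user can supply a string of any length
    return "safe" if q.dst_size > q.src_len else "overflow-user-independent"


print(resolve([("decl", "a", 9), ("const", "b", 5), ("strcpy", "a", "b")]))
print(resolve([("decl", "a", 9), ("input", "b"), ("strcpy", "a", "b")]))
```

The early `break` is the demand-driven part of the sketch: only statements that can affect the query are examined, and the walk stops once the query is resolved.
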
Approach
[Figure: workflow of the analysis]
Boxes, in the order they appear in the figure: Program; Feasibility Detection; PVS; Infeasible Paths; Node Information; Overflow Properties; Raise Query; Propagate Query; Update Query; Resolve Query (Yes / No); Propagate Results; Label Paths

Experiments
• Purpose
  − Existence of the 5 types of paths
  − Benefit of demand-driven analysis
• Implementation: Microsoft Phoenix APIs [phoenix]
• Benchmarks
  − 9 programs, size 0.4 to 97.3 K LOC
  − BugBench [06lu] and the Buffer Overflow Benchmark [03Zitser]

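Experimental Results
[Table: number of paths of each type per benchmark; the row and column alignment did not survive extraction]
Benchmarks: polymorph-0.4.0, ncompress-4.2.4, man-1.5h1, gzip-1.2.4, bc-1.06, squid-2.3, wu-ftp, sendmail, BIND
Path types: Vul, CNST, UnK, Safe
Values in extraction order: CNST UnK 0 0 | Vul 966 288 | 16 0 1 0 0 >50,000 0 0 4320 0 48 0 0 0 4 0 0 2 | Safe 0 0 24 0 >30,000 2 18,624 648 0
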
Experimental Results
• All defined types of paths exist
• Problematic paths manifest certain complexity
• Memory usage: 9 to 65 MB
• Time cost: 0.24 to 102.6 s

User Scenario
[Figure: paths from the program Entry to the PVS, labeled Overflow-User-Independent and Vulnerable]

User Scenario
[Figure: paths from Entry to the PVS, labeled Overflow-User-Independent and Vulnerable, with the Root Cause marked]

Average Path Size
Benchmark          #P     #B
polymorph-0.4.0    2.5    25.9
ncompress-4.2.4    2.0    27.8
man-1.5h1          1.8    14.3
gzip-1.2.4         3.0    5
squid-2.3          1.0    6.8
wu-ftp             3.8    33.6
sendmail           2.0    35.5
BIND               2.0    23.5

Related Work
• Static Detection for Buffer Overflow
  ARCHER [03xie], BOON [00wagner], ESPx [06hackett], Prefast [ms], Prefix [00bush], Splint [96evans]
• Path-Sensitive Analysis for Defects
  ARCHER [03xie], ESPx [06hackett], ESP [02das], IPSSA [03livshits], MOPS [02check], Prefix [00bush]
• Demand-Driven Approach
  − A general framework [96Duesterwald]
  − Applications: dataflow computation [96Duesterwald], infeasible detection [97bodik], memory leak [06Orlovich], postmortem analysis [04Manevich]

Conclusions
• A categorization of five types of paths for buffer overflow
• An interprocedural, demand-driven, path-sensitive diagnosis tool for identifying the types of paths through a potential overflow
• Experimental results demonstrating that the path types exist in real programs

Thank you and Questions?
- Slides: 19