PROGRAM ANALYSIS AND CYBER SECURITY
Noam Rinetzky
Slides credit: Tom Ball, Dawson Engler, Roman Manevich, Erik Poll, Mooly Sagiv, Jean Souyris, Eran Tromer, Avishai Wool, Eran Yahav
Software is Everywhere
Unreliable Software is Everywhere
December 31, 2008
Zune bug (December 31, 2008)

    while (days > 365) {
        if (IsLeapYear(year)) {
            if (days > 366) {
                days -= 366;
                year += 1;
            }
        } else {
            days -= 365;
            year += 1;
        }
    }
Zune bug (December 31, 2008), with the values of that day substituted: the loop never terminates

    while (366 > 365) {
        if (IsLeapYear(2008)) {
            if (366 > 366) {
                days -= 366;
                year += 1;
            }
        } else {
            days -= 365;
            year += 1;
        }
    }

Suggested solution: wait for tomorrow
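One way to make the loop terminate is to compare `days` against the length of the current year and stop as soon as the remaining days fit in it. The sketch below is an illustration only, not the fix that was actually shipped; the names `is_leap_year` and `year_from_days` are hypothetical, and `days` is assumed to be 1-based (day 1 is January 1 of `start_year`), as in the slide.

```c
#include <stdbool.h>

/* Gregorian leap-year rule. */
static bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Returns the year containing the given 1-based day count,
 * counted from January 1 of start_year. */
int year_from_days(int days, int start_year) {
    int year = start_year;
    for (;;) {
        int len = is_leap_year(year) ? 366 : 365;
        if (days <= len) {
            break;          /* the remaining days fit in this year */
        }
        days -= len;        /* every iteration makes progress */
        year += 1;
    }
    return year;
}
```

With `start_year` 1980, day 10593 is December 31, 2008: the loop stops with 366 days left in leap year 2008 instead of spinning forever.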
Therac-25 leads to 3 deaths and 3 injuries
Software error exposes patients to radiation overdose (100× the intended dose)
1985 to 1987
February 25, 1991
Patriot Bug - Rounding Error
Time measured in 1/10 seconds
Binary expansion of 1/10: 0.0001100110011001100…
24-bit register: 0.00011001100110011001100
Error of 0.0000000000000000000000011001100… binary, or ~0.000000095 decimal
After 100 hours of operation the error is 0.000000095 × 100 × 3600 × 10 = 0.34 seconds
A Scud travels at about 1,676 meters per second, and so travels more than half a kilometer in this time
Suggested solution: reboot every 10 hours
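The arithmetic on this slide can be reproduced in a few lines. The sketch below assumes that 1/10 is chopped to 23 fractional bits, which gives the error quoted above; the function names are hypothetical and the Scud speed is the figure from the slide.

```c
/* 1/10 chopped (not rounded) to 23 fractional binary digits. */
double chopped_tenth(void) {
    const double scale = 8388608.0;            /* 2^23 */
    return (double)(long)(0.1 * scale) / scale;
}

/* Accumulated clock error in seconds after the given uptime:
 * one chopping error per tick, ten ticks per second. */
double clock_drift_seconds(double hours) {
    double error_per_tick = 0.1 - chopped_tenth();   /* about 9.5e-8 */
    double ticks = hours * 3600.0 * 10.0;
    return error_per_tick * ticks;
}

/* Distance a Scud covers during the accumulated clock error. */
double tracking_error_meters(double hours) {
    const double scud_speed_mps = 1676.0;      /* figure from the slide */
    return clock_drift_seconds(hours) * scud_speed_mps;
}
```

For 100 hours of uptime the drift comes out at roughly a third of a second, which at the quoted speed is more than 500 meters.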
Northeast Blackout 14 August, 2003
Toyota recalls 160,000 Prius hybrid vehicles
Programming error can activate all warning lights, causing the car to think its engine has failed
October 2005
Boeing's 787 Vulnerable to Hacker Attack security vulnerability in onboard computer networks could allow passengers to access the plane's control systems January 2008
Unreliable Software is Exploitable
[Collage of overlapping news clippings; legible fragments:]
RSA hacked (March 2011)
The Stuxnet worm, named after initials found in its code, is the most sophisticated cyberweapon ever created (December 2010)
Security advisory for Adobe Flash Player, Adobe Reader and Acrobat: there are reports that the vulnerability is being exploited in the wild via a Flash (.swf) file embedded in a Microsoft Excel (.xls) file delivered as an email attachment (March 2011)
RSA tokens may be behind major network security problems at Lockheed Martin: the Lockheed Martin remote access network, protected by SecurID tokens, has been shut down (May 2011)
Percentage of Remotely Exploitable Vulnerabilities (source: IBM X-Force)
Buffer Overflow Exploits

    void foo (char *x) {
        …
        char buf[2];
        strcpy(buf, x);
    }
    int main (int argc, char *argv[]) {
        foo(argv[1]);
    }

    ./a.out abracadabra
    Segmentation fault

[Stack diagram with arrows labeled "Memory addresses" and "Stack grows this way"; slots from top to bottom, each with the bytes of the input that end up in it:]
    Previous frame    br
    Return address    da
    Saved FP          ca
    char* x           ra
    buf[2]            ab
Buffer Overflow Exploits

    int check_authentication(char *password) {
        int auth_flag = 0;
        char password_buffer[16];
        strcpy(password_buffer, password);
        if(strcmp(password_buffer, "pass1") == 0) auth_flag = 1;
        if(strcmp(password_buffer, "pass2") == 0) auth_flag = 1;
        return auth_flag;
    }
    int main(int argc, char *argv[]) {
        if(check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-\n");
            printf(" Access Granted.\n");
            printf("-=-=-=-=-=-=-=-\n");
        } else
            printf("\nAccess Denied.\n");
    }

(source: "Hacking: The Art of Exploitation, 2nd Ed")
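The overflow comes from `strcpy` writing past the 16-byte `password_buffer`. A minimal sketch of a length-checked variant follows; the name `check_authentication_bounded` is hypothetical, and rejecting over-long input (rather than truncating it) is one design choice among several.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the password is accepted, 0 otherwise.
 * Input that does not fit in the buffer is rejected outright. */
int check_authentication_bounded(const char *password) {
    char password_buffer[16];

    if (password == NULL) {
        return 0;
    }
    size_t len = strlen(password);
    if (len >= sizeof password_buffer) {
        return 0;                       /* would overflow: refuse */
    }
    memcpy(password_buffer, password, len + 1);   /* includes the '\0' */

    if (strcmp(password_buffer, "pass1") == 0) return 1;
    if (strcmp(password_buffer, "pass2") == 0) return 1;
    return 0;
}
```

The return value no longer lives in a local flag that sits next to the buffer, so even a logic slip in the length check could not flip it by overwriting adjacent stack memory.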
Input Validation
[Diagram: evil input enters the application]

    1234567890123456
    -=-=-=-=-=-=-=-
     Access Granted.
    -=-=-=-=-=-=-=-
What can we do about it?
What can we do about it?
"I just want to say LOVE YOU SAN!! billy gates why do you make this possible ? Stop making money and fix your software!!"
(W32.Blaster.Worm / Lovesan worm)
August 13, 2003
What can we do about it?
Monitoring, Testing, Static Analysis, Formal Verification
[Axis: from run time to design time]
Monitoring (runtime defenses)
StackGuard, ProPolice, PointGuard
Security monitors (ptrace)
[Diagram: in user space, a monitor observes the monitored application (Outlook) as it issues open("/etc/passwd", "r") to the OS kernel]
Testing
Build it; try it on some inputs

    printf("x == 0 => should not get that!")
Testing
Valgrind: memory errors, race conditions, taint analysis
Simulated CPU, shadow memory

    Invalid read of size 4
       at 0x40F6BBCC: (within /usr/libpng.so.2.1.0.9)
       by 0x40F6B804: (within /usr/libpng.so.2.1.0.9)
       by 0x40B07FF4: read_png_image(QImageIO *) (kernel/qpngio.cpp:326)
       by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
    Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
Testing
Valgrind: memory errors, race conditions
Parasoft Jtest/Insure++: memory errors + visualizer, race conditions, exceptions …
IBM Rational Purify: memory errors
IBM PureCoverage: detect untested paths
Daikon: dynamic invariant detection
Testing Useful and challenging Random inputs Guided testing (coverage) Bug reproducing But … Observe some program behaviors What can you say about other behaviors?
Formal verification
Mathematical model of software
    σ: Var → Z
    σ = [x ↦ 0, y ↦ 1]
Logical specification
    { 0 < x } = { σ ∈ State | 0 < σ(x) }
Machine checked formal proofs
    { 0 < x } y := x { 0 < x ∧ y = x }
    { 0 < x ∧ y = x } → { 0 < y }
    { 0 < y } y := y+1 { 1 < y }
    ----------------------------------------
    { 0 < x } y := x ; y := y+1 { 1 < y }
Formal verification
Mathematical model of software
    State = Var → Integer
    S = [x ↦ 0, y ↦ 1]
Logical specification
    { 0 < x } = { S ∈ State | 0 < S(x) }
Machine checked formal proofs
    { P } stmt1 { Q’ }
    { Q’ } → { P’ }
    { P’ } stmt2 { Q }
    ----------------------------------------
    { P } stmt1; stmt2 { Q }
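As a worked instance of the sequencing rule, take P to be 0 < x, Q’ to be 0 < x ∧ y = x, P’ to be 0 < y, and Q to be 1 < y. The derivation below is a sketch written in standard Hoare-logic notation; the use of the assignment axiom for the two leaves is an assumption about how the premises are discharged, since the slides do not spell it out.

```latex
% Leaves, by the assignment axiom {Q[e/y]} y := e {Q}:
%   {0 < x \land x = x}  y := x    {0 < x \land y = x}   and  0 < x  implies  0 < x \land x = x
%   {1 < y + 1}          y := y+1  {1 < y}               and  0 < y  iff      1 < y + 1 (over the integers)
\[
\frac{
  \{0 < x\}\; y := x \;\{0 < x \land y = x\}
  \qquad
  (0 < x \land y = x) \Rightarrow (0 < y)
  \qquad
  \{0 < y\}\; y := y + 1 \;\{1 < y\}
}{
  \{0 < x\}\; y := x ;\; y := y + 1 \;\{1 < y\}
}
\]
```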
L4.verified [Klein+, ’09]
Microkernel: IPC, threads, scheduling, memory management
Functional correctness (using Isabelle/HOL), plus:
    No null pointer dereferences
    No memory leaks
    No buffer overflows
    No unchecked user arguments
    …
Kernel/proof co-design
    Implementation: 2.5 person-years (8,700 LOC)
    Proof: 20 person-years (200,000 lines of proof)
Static Analysis Lightweight formal verification Formalize software behavior in a mathematical model (semantics) Prove (selected) properties of the mathematical model Automatically, typically with approximation of the formal semantics
Why static analysis?
Some errors are hard to find by testing
    They arise in unusual circumstances/uncommon execution paths: buffer overruns, unvalidated input, exceptions, ...
    They involve non-determinism: race conditions
Full-blown formal verification is too expensive
What is static analysis?
Develop theory and tools for program correctness and robustness
Reason statically (at compile time) about the possible runtime behaviors of a program
"The algorithmic discovery of properties of a program by inspection of its source text" [1] (Manna, Pnueli)
[1] Does not have to literally be the source text; it just means without running the program
Static Analysis

    x = ?
    if (x > 0) {
        y = 42;
    } else {
        y = 73;
        foo();
    }
    assert (y == 42);

Bad news: problem is generally undecidable
Static Analysis
Central idea: use approximation
[Figure: inside the universe, an under-approximation is contained in the exact set of configurations/behaviors, which is contained in an over-approximation]
Over Approximation

    x = ?
    if (x > 0) {
        y = 42;
    } else {
        y = 73;
        foo();
    }
    assert (y == 42);

Over approximation: assertion may be violated
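To see where the "may be violated" verdict comes from, here is a minimal sketch of an analysis that tracks a single integer as either a known constant or "unknown". This flat constant domain and every name in it (`AbsInt`, `abs_join`, `check_equals`) are assumptions made for illustration; the slides do not fix a domain at this point.

```c
#include <stdbool.h>

/* Abstract integer: a known constant, or "any value" (top). */
typedef struct { bool is_top; int value; } AbsInt;

typedef enum { PROVED, VIOLATED, MAY_BE_VIOLATED } Verdict;

AbsInt abs_const(int v) {
    AbsInt a = { false, v };
    return a;
}

/* Merge the facts from two branches: equal constants survive,
 * anything else is over-approximated by top. */
AbsInt abs_join(AbsInt a, AbsInt b) {
    if (!a.is_top && !b.is_top && a.value == b.value) {
        return a;
    }
    AbsInt top = { true, 0 };
    return top;
}

/* Check assert(y == expected) against the abstract value of y. */
Verdict check_equals(AbsInt y, int expected) {
    if (y.is_top) {
        return MAY_BE_VIOLATED;
    }
    return y.value == expected ? PROVED : VIOLATED;
}

/* The program on the slide: x is unknown, so both branches are kept. */
Verdict analyze_branch_example(void) {
    AbsInt y_then = abs_const(42);
    AbsInt y_else = abs_const(73);
    return check_equals(abs_join(y_then, y_else), 42);
}
```

Here the alarm is a true one, since the else branch really does set y to 73; the same merge step would also raise an alarm if that branch were unreachable for reasons the domain cannot express.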
Precision
A trivially sound analyzer:

    main(…) {
        printf("assertion may be violated\n");
    }

Lose precision only when required
Understand where precision is lost
Example

    main(int i) {
        int x=3, y=1;
        do {
            y = y + 1;
        } while(--i > 0);
        assert(0 < x + y);
    }

Determine what states can arise during any execution
Challenge: set of states is unbounded

Abstract Interpretation [Cousot, ’79]
Same program and goal as above
Challenge: set of states is unbounded
Solution: compute a bounded representation of (a superset of) the program states
Recipe
    1) Abstraction
    2) Transformers
    3) Exploration
1) Abstraction

    main(int i) {
        int x=3, y=1;
        do {
            y = y + 1;
        } while(--i > 0);
        assert(0 < x + y);
    }

concrete state σ: Var → Z
abstract state σ#: Var → {+, 0, -, ?}

    concrete (x, y, i)    abstract (x, y, i)
    (3, 1, 7)             (+, +, +)
    (3, 2, 6)             (+, +, +)
    …
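2) Transformers

    main(int i) {
        int x=3, y=1;
        do {
            y = y + 1;
        } while(--i > 0);
        assert(0 < x + y);
    }

concrete transformer for y = y + 1
    (x, y, i) = (3, 1, 0)  becomes  (3, 2, 0)

abstract transformer for y = y + 1
    (x, y, i) = (+, +, 0)  becomes  (+, +, 0)
    (x, y, i) = (+, 0, 0)  becomes  (+, +, 0)
    (x, y, i) = (+, -, 0)  becomes  (+, ?, 0)
    (x, y, i) = (+, ?, 0)  becomes  (+, ?, 0)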
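The first two steps of the recipe can be sketched for the sign domain as follows. The names `Sign`, `alpha`, `abs_inc` and `describes` are hypothetical, and `TOP` stands for the slide's "?". The soundness condition illustrated is that the abstract transformer must describe every result the concrete one can produce.

```c
/* Abstract values {-, 0, +, ?}; TOP plays the role of "?". */
typedef enum { NEG, ZERO, POS, TOP } Sign;

/* 1) Abstraction: map a concrete integer to its sign. */
Sign alpha(int v) {
    if (v > 0) return POS;
    if (v < 0) return NEG;
    return ZERO;
}

/* 2) Abstract transformer for y = y + 1. */
Sign abs_inc(Sign s) {
    switch (s) {
    case POS:
    case ZERO:
        return POS;     /* a non-negative value plus one is positive */
    default:
        return TOP;     /* NEG + 1 may be negative or zero; TOP stays TOP */
    }
}

/* Does the abstract value a describe the concrete value v? */
int describes(Sign a, int v) {
    return a == TOP || a == alpha(v);
}
```

Soundness can be spot-checked on sample values: for any concrete `v`, `abs_inc(alpha(v))` should describe `v + 1`.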
3) Exploration

    main(int i) {
        int x=3, y=1;
        do {
            y = y + 1;
        } while(--i > 0);
        assert(0 < x + y);
    }

Abstract states (x, y, i) found along the program, in order:
    (?, ?, ?)
    (+, +, ?)
    (+, +, ?)
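3) Exploration’

    main(int i) {
        int x=3, y=1;
        do {
            y = y - 2;
            y = y + 3;
        } while(--i > 0);
        assert(0 < x + y);
    }

Abstract states (x, y, i) found along the program, in order:
    (?, ?, ?)
    (+, +, ?)
    y becomes ? inside the loop, so the assertion cannot be proved
False alarms (false positives)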
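The exploration step for both versions of the loop can be sketched as a small fixpoint computation over sign states. Everything below is an illustrative assumption: the names, the decision to analyze only this one loop shape, and the omission of a bottom element (the entry state seeds the iteration instead).

```c
#include <stdbool.h>

typedef enum { NEG, ZERO, POS, TOP } Sign;          /* TOP is the slide's "?" */
typedef struct { Sign x, y, i; } AbsState;
typedef AbsState (*LoopBody)(AbsState);

static Sign sign_join(Sign a, Sign b) { return a == b ? a : TOP; }

/* Abstract effect of adding the constant c. */
static Sign sign_add_const(Sign s, int c) {
    if (s == TOP || c == 0) return s;
    if (c > 0) return (s == NEG) ? TOP : POS;
    return (s == POS) ? TOP : NEG;
}

/* Abstract effect of adding two values. */
static Sign sign_add(Sign a, Sign b) {
    if (a == ZERO) return b;
    if (b == ZERO) return a;
    return (a == b) ? a : TOP;
}

/* Loop body of the first program: y = y + 1. */
AbsState body_inc(AbsState s) {
    s.y = sign_add_const(s.y, 1);
    return s;
}

/* Loop body of the second program: y = y - 2; y = y + 3. */
AbsState body_dec_inc(AbsState s) {
    s.y = sign_add_const(s.y, -2);
    s.y = sign_add_const(s.y, 3);
    return s;
}

/* Analyze: x=3, y=1; do { body } while (--i > 0); assert(0 < x + y).
 * Returns true if the assertion is proved, false if an alarm is raised. */
bool analyze_loop(LoopBody body) {
    AbsState head = { POS, POS, TOP };               /* i is unknown */
    for (;;) {
        AbsState after = body(head);
        after.i = sign_add_const(after.i, -1);       /* --i */
        AbsState next = head;
        if (after.i == POS || after.i == TOP) {      /* back edge: i > 0 */
            next.x = sign_join(head.x, after.x);
            next.y = sign_join(head.y, after.y);
            next.i = sign_join(head.i, POS);
        }
        if (next.x == head.x && next.y == head.y && next.i == head.i) {
            break;                                   /* fixpoint reached */
        }
        head = next;
    }
    AbsState at_exit = body(head);
    return sign_add(at_exit.x, at_exit.y) == POS;
}
```

`analyze_loop(body_inc)` proves the assertion, while `analyze_loop(body_dec_inc)` raises an alarm even though the two loops compute the same values: subtracting 2 from a positive number loses the sign, and adding 3 cannot recover it.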
Goal: exploring program states
[Figure: initial states, reachable states, bad states]
Technique: explore abstract states
[Figure, built up over several slides: abstract states progressively cover the reachable states; regions labeled initial states, reachable states, bad states]
Sound: cover all reachable states
[Figure: initial states, reachable states, bad states]
Imprecise abstraction
False alarms (false positives)
[Figure: initial states, reachable states, bad states]
Testing is unsound: misses some reachable states
[Figure: initial states, reachable states, bad states]
Testing is unsound: misses some reachable errors
No false positives (complete)
False negatives (unsound)
[Figure: initial states, reachable states, bad states]
How to find "the right" abstraction?
Pick an abstract domain suited for your property
    Numerical domains
    Domains for reasoning about the heap
    …
Combination of abstract domains
Intervals Abstraction
[Figure: points in the x-y plane enclosed by the box x ∈ [1, 4], y ∈ [3, 6]]
Example

    int x = 0;
    if (?) x++;
    if (?) x++;

Abstract states along the control-flow graph:
    after x = 0:            x ∈ [0, 0]
    after the first x++:    x ∈ [1, 1]
    after the first if:     x ∈ [0, 1]
    after the second x++:   x ∈ [1, 2]
    at exit:                x ∈ [0, 2]
Join: [a1, a2] ⊔ [b1, b2] = [min(a1, b1), max(a2, b2)]
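The interval join and the abstract effect of x++ are small enough to write out. The sketch below replays the example with two optional increments; the names `Interval`, `iv_join` and `iv_inc` are hypothetical, and overflow of the bounds is ignored.

```c
/* Closed integer interval [lo, hi]. */
typedef struct { int lo, hi; } Interval;

/* Join: smallest interval containing both arguments. */
Interval iv_join(Interval a, Interval b) {
    Interval r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}

/* Abstract transformer for x++. */
Interval iv_inc(Interval a) {
    Interval r = { a.lo + 1, a.hi + 1 };
    return r;
}

/* int x = 0; if (?) x++; if (?) x++;
 * After each if, join the "taken" and "not taken" branches. */
Interval analyze_two_increments(void) {
    Interval x = { 0, 0 };
    x = iv_join(x, iv_inc(x));   /* [0,0] joined with [1,1] */
    x = iv_join(x, iv_inc(x));   /* [0,1] joined with [1,2] */
    return x;
}
```

The result is [0, 2], which here happens to be exact; in general the join can add values that no execution produces, for example joining [0, 0] with [5, 5] yields [0, 5].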
Polyhedral Abstraction
An abstract state is an intersection of linear inequalities of the form a1x1 + a2x2 + … + anxn ≤ c
It represents a set of points by their convex hull
(image from http://www.cs.sunysb.edu/~algorith/files/convex-hull.shtml)
McCarthy 91 function
MC(n) = if (n>=101) then n-10 else 91

    proc MC (n : int) returns (r : int)
    var t1 : int, t2 : int;
    begin
      if n > 100 then
        r = n - 10;
      else
        t1 = n + 11;
        t2 = MC(t1);
        r = MC(t2);
      endif;
    end

    var a : int, b : int;
    begin
      /* top */
      b = MC(a);
    end
McCarthy 91 function, annotated with the inferred invariants
MC(n) = if (n>=101) then n-10 else 91

    proc MC (n : int) returns (r : int)
    var t1 : int, t2 : int;
    begin
      /* top */
      if n > 100 then
        /* [|n-101>=0|] */
        r = n - 10;
        /* [|-n+r+10=0; n-101>=0|] */
      else
        /* [|-n+100>=0|] */
        t1 = n + 11;
        /* [|-n+t1-11=0; -n+100>=0|] */
        t2 = MC(t1);
        /* [|-n+t1-11=0; -n+100>=0; -n+t2-1>=0; t2-91>=0|] */
        r = MC(t2);
        /* [|-n+t1-11=0; -n+100>=0; -n+t2-1>=0; t2-91>=0; r-t2+10>=0; r-91>=0|] */
      endif;
      /* [|-n+r+10>=0; r-91>=0|] */
    end

    var a : int, b : int;
    begin
      /* top */
      b = MC(a);
      /* [|-a+b+10>=0; b-91>=0|] */
    end
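The invariant inferred at the end of MC, -n+r+10>=0 and r-91>=0, is weaker than the exact specification but still sound. A direct C transcription makes it easy to spot-check both on sample inputs; the function names are hypothetical, and testing a few values is of course no substitute for the analysis, which covers all inputs.

```c
/* McCarthy's 91 function, transcribed from the procedure MC. */
int mc91(int n) {
    if (n > 100) {
        return n - 10;
    }
    return mc91(mc91(n + 11));
}

/* Exact specification: if (n >= 101) then n - 10 else 91. */
int mc91_spec(int n) {
    return n >= 101 ? n - 10 : 91;
}

/* Invariant inferred by the polyhedral analysis at the end of MC:
 * -n + r + 10 >= 0  and  r - 91 >= 0. */
int satisfies_inferred_invariant(int n) {
    int r = mc91(n);
    return (-n + r + 10 >= 0) && (r - 91 >= 0);
}
```

The polyhedral result cannot express the case split of the exact specification, because "r = 91 or r = n - 10" is not a convex set; it keeps only the two linear bounds that hold in both cases.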
Following the recipe (in a nutshell)
1) Abstraction
[Figure: a concrete state (list nodes linked by n, pointed to by t and x) next to its abstract state]
2) Transformers
[Figure: effect of t->n = x on an abstract state]
Example: shape (heap) analysis

    void stack-init(int i) {      // test for i = 4
        Node* x = null;
        do {
            Node t = malloc(…)
            t->n = x;
            x = t;
        } while(--i>0)
        Top = x;
        assert(acyclic(Top))
    }

[Figure: heap states at each program point for i = 4, starting from the empty heap (emp); t and x point into a growing list linked by n, and Top points to its head at the end]
3) Exploration

    void stack-init(int i) {
        Node* x = null;
        do {
            Node t = malloc(…)
            t->n = x;
            x = t;
        } while(--i>0)
        Top = x;
        assert(acyclic(Top))
    }

[Figure: abstract heap states explored at each program point, starting from the empty heap (emp); t, x and Top point into lists linked by n]
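For reference, a runnable C rendering of the stack-init pseudocode is sketched below, together with a dynamic `acyclic` check. The check uses Floyd's two-pointer cycle detection, which is an assumption of this sketch (the slides do not define `acyclic`), and the helper names are hypothetical. A shape analysis proves the assertion for every value of `i`; this code can only test it for the values it is run with.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Node { struct Node *n; } Node;

/* Builds a list of max(i, 1) nodes, as the do-while loop on the slide does,
 * and returns its head (the slide stores it in Top). */
Node *stack_init(int i) {
    Node *x = NULL;
    do {
        Node *t = malloc(sizeof *t);
        if (t == NULL) abort();
        t->n = x;
        x = t;
    } while (--i > 0);
    return x;
}

/* Floyd's cycle detection: true if following n from top reaches NULL. */
bool acyclic(const Node *top) {
    const Node *slow = top;
    const Node *fast = top;
    while (fast != NULL && fast->n != NULL) {
        slow = slow->n;
        fast = fast->n->n;
        if (slow == fast) return false;
    }
    return true;
}

/* Number of nodes in an acyclic list. */
size_t stack_length(const Node *top) {
    size_t len = 0;
    for (; top != NULL; top = top->n) len++;
    return len;
}

/* Frees an acyclic list. */
void stack_free(Node *top) {
    while (top != NULL) {
        Node *next = top->n;
        free(top);
        top = next;
    }
}
```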
Astrée [Cousot+, ’02-05]
Prove absence of runtime errors in safety critical C code
Synchronous, sequential programs (avionics)
Properties: division by 0, floating point overflow, …
Verified primary flight control software of the Airbus A340 fly-by-wire system (132,000 LOC)
Astrée [Cousot+, ’02-05]
Airbus code

    Analysed program    kLOC    False alarms    Analysis time
    Sequential 1          18    3               1h14
    Sequential 2          37    2               10 min
    Synchronous 1        100    3               7h15
    Synchronous 2         76    0               6h
    Synchronous 3        500    2               30h

Metrics (2.6 GHz, 16 GB RAM PC)
SLAM: Static Driver Verifier
[Diagram: the Static Driver Verifier takes as input the rules (precise API usage rules, written in SLIC), the driver's source code in C, and an environment model; it reports defects, with 100% path coverage]
SLAM
State machine for locking
    Unlocked --Acq--> Locked
    Locked   --Rel--> Unlocked
    Unlocked --Rel--> Error
    Locked   --Acq--> Error
Locking rule in SLIC

    state {
        enum {Locked, Unlocked} s = Unlocked;
    }
    KeAcquireSpinLock.entry {
        if (s==Locked) abort;
        else s = Locked;
    }
    KeReleaseSpinLock.entry {
        if (s==Unlocked) abort;
        else s = Unlocked;
    }
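The locking rule is an ordinary finite-state machine, which the following sketch replays over a fixed sequence of acquire/release events. This is only a dynamic check of one trace with hypothetical names; SLAM itself checks the rule statically against all paths through the driver.

```c
#include <stddef.h>

typedef enum { UNLOCKED, LOCKED, LOCK_ERROR } LockState;
typedef enum { ACQ, REL } LockEvent;

/* One transition of the locking state machine. */
LockState lock_step(LockState s, LockEvent e) {
    switch (s) {
    case UNLOCKED:
        return e == ACQ ? LOCKED : LOCK_ERROR;     /* release while unlocked */
    case LOCKED:
        return e == REL ? UNLOCKED : LOCK_ERROR;   /* double acquire */
    default:
        return LOCK_ERROR;                         /* error is absorbing */
    }
}

/* Replays a trace from the initial state Unlocked. */
LockState lock_run(const LockEvent *trace, size_t n) {
    LockState s = UNLOCKED;
    for (size_t k = 0; k < n; k++) {
        s = lock_step(s, trace[k]);
    }
    return s;
}
```

A trace such as Acq, Rel, Acq, Rel ends in Unlocked, while Acq, Acq or a leading Rel ends in the error state, matching the two `abort` cases of the SLIC rule.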
SLAM (now SDV) [Ball+, ’11]
100 drivers and 80 SLIC rules
The largest driver is ~30,000 LOC; total size is ~450,000 LOC
Total runtime for the 8,000 runs (driver × rule): 30 hours on an 8-core machine, with a 20-minute timeout
Useful results (bug / pass) on over 97% of the runs
Caveats: pointers (handled imprecisely) and concurrency (ignored)
Scaling
[Figure: initial states, reachable states, bad states, with regions marked false positives and false negatives; axes labeled Sound and Complete]
Unsound static analysis
[Figure: initial states, reachable states, bad states, with regions marked false positives and false negatives]
Unsound static analysis Static analysis No code execution Trade soundness for scalability Do not cover all execution paths But cover “many”
FindBugs [Pugh+, ’04]
Analyzes Java programs (bytecode)
Looks for "bug patterns"
Bug patterns
    Method() vs method()
    Override equals(…) but not hashCode()
    Unchecked return values
    Null pointer dereference

    App            KLOC    NP bugs    Other bugs    Bad practice    Dodgy
    Sun JDK 1.7     597    68         180           594             654
    Eclipse 3.3    1447    146        259           1079            653
    Netbeans 6     1022    189        305           3010            1112
    glassfish      2176    146        154           964             1222
    jboss           178    30         57            263             214
PREfix [Pincus+, ’00]
Developed by Pincus, purchased by Microsoft
Automatic analysis of C/C++ code
Memory errors, divide by zero
Inter-procedural bottom-up analysis
Heuristic: choose "100" paths
Minimize the effect of false positives

    Program            KLOC    Time
    Mozilla browser    540     11h
    Apache             49      15m

2-5 warnings per KLOC
PREfast
Analyzes Microsoft kernel code + device drivers
Memory errors, races
Part of Microsoft Visual Studio
Intra-procedural analysis
User annotations

    memcpy( __out_bcount( length ) dest, __in_bcount( length ) src, length );

PREfix + PREfast found 1/6 of the bugs fixed in Windows Server ’03
Coverity [Engler+, ’04]
Looks for bug patterns
    Enable/disable interrupts, double locking, buffer overflow, …
Learns patterns from common
Robust & scalable
    150 open source programs: 6,000 bugs
    Unintended acceleration in Toyota
Summary Cyber-security threats are real, and here to stay Software is the new battlefront Automatic program analysis techniques are critical both for defense and offense
Sound SA vs. Testing
Columns on the slide: Sound SA, Unsound SA, Testing
Sound SA
    Can find rare errors
    Can raise false alarms
    Cost ~ program's complexity
    Can handle limited classes of programs and still be useful
Testing
    Can miss errors
    Finds real errors
    Cost ~ program's execution
    No need to efficiently handle rare cases
Sound SA vs. Formal verification
Sound Static Analysis
    Fully automatic
    Applicable to a programming language
    Can be very imprecise
    May yield false alarms
Formal verification
    Requires specification and loop invariants
    Program specific
    Relatively complete
    Provides counter examples
    Provides useful documentation
    Can be mechanized using theorem provers
Bill Gates, WinHEC ’02
"Things like even software verification, this has been the Holy Grail of computer science for many decades but now in some very key areas, for example, driver verification we're building tools that can do actual proof about the software and how it works in order to guarantee the reliability."
Conclusions
Tools help
    Find significant bugs
    Improve reliability
    Improve productivity
Long way to go
    Limited classes of bugs
    Precision: false positives render a tool useless
    Scaling: program analysis is difficult
Part of a larger picture
More
    Verification condition generators
    Verifying the verifiers
    Pointer analysis
    Concurrent + distributed systems
    Learning abstractions
    Symbolic execution
    Model checking
    SMT solvers
Additional security applications of static analysis and formal verification
Automatic Test/Exploit Generation [Avgerinos, Cha, Hao, Brumley ’11]
    Perform static analysis to search for potential vulnerabilities, generate an initial input that triggers the bug, and generate an exploit input
Automatic Patch-based Exploit Generation
    Automatically generate exploits from the patched binary and the original vulnerable program binary, sometimes within minutes
Automatic Malware Dissection and Trigger-based Behavior Analysis
    Automatic exploration of program execution paths in malware to uncover trigger conditions (such as the time used in time bombs and commands in botnet programs) and trigger-based behavior
See the BitBlaze project: http://bitblaze.cs.berkeley.edu
The End