All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask). (Yes, we were trying to overflow the title length field on the submission server.) Edward J. Schwartz, Thanassis Avgerinos, David Brumley. Carnegie Mellon University, 2/26/2021.
A Few Things You Need to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask). Edward J. Schwartz, Thanassis Avgerinos, David Brumley.
The Root of All Evil: humans write programs. This talk: computers analyzing programs dynamically at runtime.
Two Essential Runtime Analyses. Dynamic Taint Analysis: What values are derived from user input? Used to detect exploits [Costa 2005, Crandall 2005, Newsome 2005, Suh 2004] and to detect packing in malware [Bayer 2009, Yin 2007]. Forward Symbolic Execution: What input will make execution reach this line of code? Used for automated test case generation [Cadar 2008, Godefroid 2005, Sen 2005] and input filter generation [Costa 2007, Brumley 2008].
Our Contributions. Computers analyzing programs dynamically at runtime: Dynamic Taint Analysis (Is this value affected by user input?) and Forward Symbolic Execution (What input will make execution reach this line of code?). 1: Turn English descriptions into an algorithm (operational semantics). 2: The algorithm highlights caveats, issues, and unsolved problems that are deceptively hard.
Our Contributions (cont’d). 3: Systematize recurring themes in a wealth of previous work.
Dynamic Taint Analysis: What values are derived from user input? 1. How it works: example. 2. Desired properties. 3. Example issue (the paper has many more).
Taint Introduction. Program: x = get_input( ); y = x + 42; …; goto y. Input is tainted. Rule (Input): $\dfrac{t = \mathit{IsUntrusted}(src)}{\mathit{get\_input}(src) \downarrow t}$. State after x = get_input( ) returns 7: Δ (Var → Val): x = 7; τ (Var → Tainted?): x = T.
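A minimal Python sketch of the Input rule, assuming Δ and τ are plain dictionaries and assuming a hypothetical `is_untrusted` policy that trusts only a made-up `"config"` source. The input value 7 matches the slide.

```python
# Δ (delta) maps variables to concrete values; τ (tau) maps variables to taint bits.
delta = {}
tau = {}

def is_untrusted(src: str) -> bool:
    # Hypothetical policy: every source except "config" is untrusted.
    return src != "config"

def get_input(src: str, value: int) -> tuple[int, bool]:
    # Input rule: the value read from src carries taint t = IsUntrusted(src).
    return value, is_untrusted(src)

# x = get_input(...), where the user supplies 7
delta["x"], tau["x"] = get_input("user", 7)
print(delta, tau)  # {'x': 7} {'x': True}
```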
Taint Propagation. Program: x = get_input( ); y = x + 42; …; goto y. Data derived from user input is tainted. Rule (BinOp): $\dfrac{t_1 = \tau[x_1],\ t_2 = \tau[x_2]}{x_1 + x_2 \downarrow t_1 \lor t_2}$. State after y = x + 42: Δ: x = 7, y = 49; τ: x = T, y = T.
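A minimal sketch of the BinOp rule for addition, assuming each operand is represented as a hypothetical (value, taint) pair. The constant 42 is untainted, so y inherits only x's taint.

```python
def binop_add(x1: tuple[int, bool], x2: tuple[int, bool]) -> tuple[int, bool]:
    # BinOp rule: the result is tainted if either operand is tainted (t1 ∨ t2).
    (v1, t1), (v2, t2) = x1, x2
    return v1 + v2, t1 or t2

delta = {"x": 7}
tau = {"x": True}

# y = x + 42, where the literal 42 is untainted
delta["y"], tau["y"] = binop_add((delta["x"], tau["x"]), (42, False))
print(delta, tau)  # {'x': 7, 'y': 49} {'x': True, 'y': True}
```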
Taint Checking. Program: x = get_input( ); y = x + 42; …; goto y. Policy: $P_{\mathit{goto}}(t_a) = \lnot t_a$ (must be true to execute the jump). State: Δ: x = 7, y = 49; τ: x = T, y = T. Because τ[y] = T, goto y fails the check: policy violation detected.
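The following sketch puts the three steps together for the slide's program. Introduction, propagation, and the $P_{\mathit{goto}}$ check run before the jump. `TaintViolation` and `run` are hypothetical names, and the jump is modeled by simply returning the target.

```python
class TaintViolation(Exception):
    """Raised when a taint-checking policy rejects an operation."""

def p_goto(ta: bool) -> bool:
    # P_goto(t_a) = not t_a: a jump is allowed only if its target is untainted.
    return not ta

def run(user_value: int) -> int:
    delta, tau = {}, {}
    delta["x"], tau["x"] = user_value, True            # x = get_input(): tainted
    delta["y"], tau["y"] = delta["x"] + 42, tau["x"]   # y = x + 42: taint propagates
    if not p_goto(tau["y"]):                           # goto y: check before jumping
        raise TaintViolation(f"goto to tainted target {delta['y']}")
    return delta["y"]                                  # target we would jump to
```

Calling `run(7)` raises `TaintViolation` because y = 49 is derived from the input.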
Real Use: Exploit Detection. The abstract program x = get_input( ); y = …; goto y corresponds to a real overflow: strcpy(buffer, argv[1]); …; return; where return jumps to a return address that was overwritten with tainted input.
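Here is a toy model of that scenario, assuming a hypothetical stack frame of four buffer cells followed by one return-address cell, with hypothetical addresses. An unchecked strcpy-like copy taints every cell it writes. The return then applies the same check as goto before transferring control.

```python
class TaintViolation(Exception):
    """Raised when control flow would transfer to a tainted address."""

BUF_SIZE = 4          # hypothetical buffer size, in memory cells
RET_SLOT = BUF_SIZE   # toy layout: return address stored right after the buffer

def call_with_input(arg: list[int]) -> int:
    mem = [0] * (BUF_SIZE + 1)          # μ: toy stack frame
    taint = [False] * (BUF_SIZE + 1)    # τμ: per-cell taint
    mem[RET_SLOT] = 0x400000            # legitimate, untainted return address
    # strcpy-like copy with no length check against BUF_SIZE; every copied cell is
    # tainted (inputs longer than the whole frame raise IndexError in this toy model)
    for i, byte in enumerate(arg):
        mem[i], taint[i] = byte, True
    # return: load the target from RET_SLOT and check its taint before jumping
    if taint[RET_SLOT]:
        raise TaintViolation(f"return to tainted address {mem[RET_SLOT]:#x}")
    return mem[RET_SLOT]
```

A short input such as `[1, 2, 3]` returns normally. A five-cell input overwrites the return address, and the return is flagged.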
Memory Load. Variables: Δ (Var → Val): x = 7; τ (Var → Tainted?): x = T. Memory: μ (Addr → Val): address 7 holds 42; $\tau_\mu$ (Addr → Tainted?): address 7 = F.
Problem: Memory Addresses. Program: x = get_input( ); y = load(x); …; goto y. State: Δ: x = 7; μ: address 7 holds 42; $\tau_\mu$: address 7 = F. If all values derived from user input are tainted, is y tainted? The address x is tainted, but the memory cell it selects is not.
Policy 1: Taint depends only on the memory cell. Rule (Load): $\dfrac{v = \Delta[x],\ t = \tau_\mu[v]}{\mathit{load}(x) \downarrow t}$. Program: x = get_input( ); y = load(x); …; goto y, with Δ: x = 7; μ: address 7 holds 42; $\tau_\mu$: address 7 = F. Undertainting: failing to identify tainted values (e.g., missing exploits). The jump target could be any untainted memory cell value.
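A minimal sketch of the policy 1 Load rule on the slide's state, assuming μ and $\tau_\mu$ are modeled as hypothetical address-keyed dictionaries. The taint of the address x is never consulted, so y comes out untainted.

```python
def load_policy1(x: str, delta: dict, mem: dict, tau_mem: dict) -> tuple[int, bool]:
    # Load rule, policy 1: v = Δ[x], t = τμ[v]; the address's own taint τ[x] is ignored.
    v = delta[x]
    return mem[v], tau_mem[v]

delta = {"x": 7}        # x = get_input() returned 7 ...
tau = {"x": True}       # ... so x is tainted
mem = {7: 42}           # μ: cell 7 holds 42
tau_mem = {7: False}    # τμ: cell 7 is not tainted

delta["y"], tau["y"] = load_policy1("x", delta, mem, tau_mem)
print(delta["y"], tau["y"])  # 42 False
```

Under this policy, `goto y` passes the $P_{\mathit{goto}}$ check even though the user chose which cell to read. That is the undertainting case.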
Policy 2: If either the address or the memory cell is tainted, then the value is tainted. Rule (Load): $\dfrac{v = \Delta[x],\ t = \tau_\mu[v],\ t_a = \tau[x]}{\mathit{load}(x) \downarrow t \lor t_a}$. Program: x = get_input( ); y = load(jmp_table + x % 2); …; goto y, where jmp_table holds the addresses of printa and printb. Policy violation? The memory address expression is tainted, so y is tainted and goto y is flagged, even though y can only be printa or printb. Overtainting: unaffected values are tainted (e.g., reporting exploits on safe inputs).
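A minimal sketch of the policy 2 Load rule applied to the jump-table program. The table base and the code addresses of printa and printb are hypothetical constants. The loaded address's taint is passed in directly, because it comes from the tainted x via BinOp.

```python
def load_policy2(addr: int, addr_taint: bool, mem: dict, tau_mem: dict) -> tuple[int, bool]:
    # Load rule, policy 2: result taint is t ∨ ta, with t = τμ[addr] and ta the address taint.
    return mem[addr], tau_mem[addr] or addr_taint

JMP_TABLE = 100                    # hypothetical base address of jmp_table
PRINTA, PRINTB = 0x1000, 0x2000    # hypothetical code addresses of printa / printb
mem = {JMP_TABLE: PRINTA, JMP_TABLE + 1: PRINTB}
tau_mem = {JMP_TABLE: False, JMP_TABLE + 1: False}   # table entries are program constants

x, tx = 7, True                      # x = get_input(): tainted
addr, taddr = JMP_TABLE + x % 2, tx  # the address expression inherits x's taint
y, ty = load_policy2(addr, taddr, mem, tau_mem)
print(hex(y), ty)  # 0x2000 True
```

Every possible y is a safe entry of jmp_table, yet `goto y` would be flagged. That is the overtainting case.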
Research Challenge. The state of the art is not perfect for all programs. Overtainting: a policy may wrongly detect taint. Undertainting: a policy may miss taint.
Forward Symbolic Execution: What input will make execution reach this line of code? • How it works: example • Inherent problems of symbolic execution • Proposed solutions
The Challenge. bad_abs has $2^{32}$ possible inputs. bad_abs(x is input): if (x < 0) then return -x; if (x == 0x12345678) then return -x; return x. Forward Symbolic Execution: What input will make execution reach this line of code?
A Simple Example. x is symbolic: it can have any value. An interpreter runs bad_abs(x is input) and forks at each branch. At if (x < 0), the true branch (x < 0) reaches return -x, and the false branch (x ≥ 0) reaches if (x == 0x12345678). There, the true branch (x ≥ 0 ∧ x == 0x12345678) reaches return -x, and the false branch (x ≥ 0 ∧ x != 0x12345678) reaches return x. What input will make execution reach this line of code? Any input that satisfies that path's formula.
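A small Python sketch of forward symbolic execution on bad_abs, assuming a hypothetical tuple-based encoding of the program. At each `if`, the interpreter forks and extends the path predicate with the condition or its negation. Instead of a real SMT solver, a hypothetical `solve` searches boundary values. That shortcut is complete only for this fragment, where one variable is compared against constants.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1
MAGIC = 0x12345678

# bad_abs as a tiny AST: ("if", (op, const), then_stmt) or ("return", expr)
PROGRAM = [
    ("if", ("<", 0), ("return", "-x")),
    ("if", ("==", MAGIC), ("return", "-x")),
    ("return", "x"),
]
NEGATE = {"<": ">=", "==": "!="}
OPS = {
    "<": lambda a, b: a < b,
    ">=": lambda a, b: a >= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
}

def execute(stmts, pc):
    """Symbolically execute stmts under path predicate pc (a list of (op, const) on x).
    Returns one (path predicate, returned expression) pair per path."""
    stmt, *rest = stmts
    if stmt[0] == "return":
        return [(pc, stmt[1])]
    _, (op, c), then = stmt
    # Fork: one interpreter assumes the condition, the other assumes its negation.
    return execute([then], pc + [(op, c)]) + execute(rest, pc + [(NEGATE[op], c)])

def solve(pc):
    """Stand-in for an SMT solver: try boundary values for a 32-bit x satisfying pc."""
    candidates = {INT_MIN, INT_MAX} | {c + d for _, c in pc for d in (-1, 0, 1)}
    for x in sorted(candidates, key=lambda v: (abs(v), v)):
        if INT_MIN <= x <= INT_MAX and all(OPS[op](x, c) for op, c in pc):
            return x
    return None

def bad_abs(x):
    # Concrete version of the program, used to replay each witness input.
    if x < 0:
        return -x
    if x == MAGIC:
        return -x
    return x

for pc, ret in execute(PROGRAM, []):
    x = solve(pc)
    print(pc, "->", ret, "| input:", x, "| bad_abs:", bad_abs(x))
```

The second path's formula has exactly one solution, 0x12345678. Random testing would have to find it among $2^{32}$ inputs.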
One Problem: Exponential Blowup Due to Branches. Each branch (Branch 1, Branch 2, Branch 3, …) forks every live interpreter, giving an exponential number of interpreters/formulas in the number of branches.
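To see the blowup concretely, here is a sketch that assumes a hypothetical program made of `num_branches` sequential `if` statements on independent symbolic booleans b0, b1, …, so every path is feasible. Each branch doubles the number of interpreters.

```python
def fork_all(num_branches: int) -> list[list[str]]:
    """Path predicates after symbolically executing num_branches independent branches."""
    states: list[list[str]] = [[]]        # one interpreter, empty path predicate
    for i in range(num_branches):
        forked = []
        for pc in states:
            forked.append(pc + [f"b{i}"])     # interpreter taking the true edge
            forked.append(pc + [f"!b{i}"])    # interpreter taking the false edge
        states = forked
    return states

for n in range(1, 6):
    print(n, len(fork_all(n)))  # prints: 1 2 / 2 4 / 3 8 / 4 16 / 5 32
```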
Path Selection Heuristics. These heuristics choose which node of the symbolic execution tree to explore next: • Depth-first search (bounded), random search [Cadar 2008] • Concolic testing [Sen 2005, Godefroid 2008]. However, these are heuristics: in the worst case, all of them create a number of formulas exponential in the tree height.
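A sketch of two of these strategies as worklist policies over a complete binary symbolic execution tree, which stands in for a hypothetical program. Bounded DFS pops the most recently forked state and cuts paths at `max_depth`. Random search pops any pending state. Concolic testing is not modeled here. Without a bound, both strategies still finish all $2^{\text{height}}$ paths; they only change the order.

```python
import random
from typing import Optional

def explore(height: int, strategy: str, max_depth: Optional[int] = None, seed: int = 0) -> list[str]:
    """Explore a complete binary symbolic execution tree of the given height.
    A state is the string of branch choices so far ("0" = false, "1" = true).
    Returns explored paths in the order the strategy finishes them."""
    rng = random.Random(seed)
    worklist = [""]          # pending interpreters; "" is the root
    finished = []
    while worklist:
        if strategy == "dfs":
            state = worklist.pop()                               # newest state first
        elif strategy == "random":
            state = worklist.pop(rng.randrange(len(worklist)))   # any pending state
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        if len(state) == height or len(state) == max_depth:
            finished.append(state)    # complete path, or path cut off by the depth bound
        else:
            worklist.extend([state + "1", state + "0"])          # fork at the next branch
    return finished

print(explore(3, "dfs"))               # ['000', '001', '010', '011', '100', '101', '110', '111']
print(explore(4, "dfs", max_depth=2))  # ['00', '01', '10', '11']
print(len(explore(10, "random")))      # 1024
```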
Symbolic Execution is not Easy • Exponential number of interpreters/formulas from branching • Exponentially-sized formulas from substitution (s + s + s + s + s == 42) • Solving a formula is NP-complete!
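The slide's example, s + s + s + s + s == 42, grows only linearly. To make the exponential case visible, this sketch assumes a hypothetical doubling program: x0 = s; xi = x(i-1) + x(i-1); assert xn == 42. Eager substitution copies each subterm twice, so the formula doubles at every step. One common mitigation keeps each assignment as a named equation (let/SSA style), so the formula grows linearly instead.

```python
def substituted_formula(n: int) -> str:
    """Path formula when every variable is eagerly replaced by its definition."""
    expr = "s"
    for _ in range(n):
        expr = f"({expr} + {expr})"   # substitution copies the whole subterm twice
    return f"{expr} == 42"

def ssa_formula(n: int) -> str:
    """Same program, with each assignment kept as its own equation over named variables."""
    eqs = ["x0 == s"] + [f"x{i} == x{i-1} + x{i-1}" for i in range(1, n + 1)]
    return " && ".join(eqs + [f"x{n} == 42"])

for n in (1, 5, 10):
    # occurrences of the symbol s vs. total length of the SSA-style formula
    print(n, substituted_formula(n).count("s"), len(ssa_formula(n)))
```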
Other Important Issues • Formalization • More complex policies. Π = (s + s + s) == 42
Conclusion • Dynamic taint analysis and forward symbolic execution are used extensively in the literature, but the formal algorithm, and what is done at each possible step of execution, is often not emphasized. • We provided a formal definition and summarized critical issues, state-of-the-art solutions, and common tradeoffs.
Thank You! Questions? thanassis@cmu.edu