On Abstraction Refinement for Program Analyses in Datalog
Xin Zhang, Ravi Mangal, Mayur Naik (Georgia Tech); Radu Grigore, Hongseok Yang (Oxford University)
Datalog for program analysis
Programming Language Design and Implementation, 2014, 6/10/2014
What is Datalog?
Input relations: edge(i, j).
Output relations: path(i, j).
Rules:
(1) path(i, i).
(2) path(i, k) :- path(i, j), edge(j, k).
Least fixpoint computation:
Input: edge(0, 1), edge(1, 2).
path(0, 0).
path(1, 1).
path(2, 2).
path(0, 1) :- path(0, 0), edge(0, 1).
path(0, 2) :- path(0, 1), edge(1, 2).
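The least fixpoint computation above can be mimicked with a naive evaluation loop. The following is a minimal Python sketch, not the solver used in the talk; the function name `least_fixpoint` is made up for illustration, and rule (1) is assumed to range over the nodes that occur in `edge`.

```python
def least_fixpoint(edges):
    """Naively evaluate: path(i, i).  path(i, k) :- path(i, j), edge(j, k)."""
    nodes = {n for edge in edges for n in edge}
    # Rule (1): every node reaches itself.
    path = {(n, n) for n in nodes}
    changed = True
    while changed:
        changed = False
        # Rule (2): extend each known path tuple by one edge.
        for (i, j) in list(path):
            for (j2, k) in edges:
                if j == j2 and (i, k) not in path:
                    path.add((i, k))
                    changed = True
    return path


if __name__ == "__main__":
    # Input from the slide: edge(0, 1), edge(1, 2).
    print(sorted(least_fixpoint({(0, 1), (1, 2)})))
```

The loop stops once a full pass adds no new tuple, which is the fixpoint condition.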
Why Datalog? If there exists a path from a to b, and there is an edge from b to c, then there exists a path from a to c: path(a, c) :- path(a, b), edge(b, c).
Why Datalog? k-object-sensitivity, k = 2, ~100 KLOC
Limitation: k-object-sensitivity, k = 2, ~100 KLOC; k-object-sensitivity, k = 10, ~500 KLOC
Program abstraction [Figure: abstraction, precision, scalability]
Parametric program abstraction [Figure: parameter vector 1 1 1; abstraction, precision, scalability]
Parametric program abstraction [Figure: parameter vector 1 0 1 1 0; abstraction, precision, scalability]
Parametric program abstraction: Example 1. Pointer analysis: cloning depth K for each call site and allocation site (parameter vector 1 0 1 1 0).
Parametric program abstraction: Example 2. Shape analysis: predicates to use as abstraction predicates (parameter vector 1 0 1 1 0).
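A parametric abstraction of this kind can be pictured as a vector of 0/1 choices, one per parameter. The sketch below enumerates all such vectors in order of increasing cost. It assumes, purely for illustration, that the cost of an abstraction is the number of 1 bits; the name `abstractions` is hypothetical.

```python
from itertools import product


def abstractions(num_params):
    """Return every 0/1 parameter vector, cheapest (fewest 1 bits) first."""
    return sorted(product((0, 1), repeat=num_params), key=sum)


if __name__ == "__main__":
    for vector in abstractions(3):
        print(vector, "cost", sum(vector))
```

With n binary parameters there are 2^n vectors, which is why exhaustive search over abstractions does not scale and a guided search is needed.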
Program abstraction [Figure: parameter vector 1 0 1 1 0 0; Datalog program; queries alias(p, q)? and alias(m, n)?]
Program abstraction: counterexample-guided refinement (CEGAR) via MAXSAT [Figure: parameter vector 1 0 1 1 0, Datalog program, query alias(p, q)?; parameter vector 0 1 1 0 0, Datalog program, query alias(m, n)?]
Pointer analysis example
f() {
  v1 = new ...;
  v2 = id1(v1);
  v3 = id2(v2);
  q2: assert(v3 != v1);
}
g() {
  v4 = new ...;
  v5 = id1(v4);
  v6 = id2(v5);
  q1: assert(v6 != v1);
}
id1(v) { return v; }
id2(v) { return v; }
Pointer analysis as graph reachability [Figure: graph over nodes 0 to 7, with copies 6’, 6’’, 7’, 7’’; edges labeled a0/a1, b0/b1, c0/c1, d0/d1]
Graph reachability in Datalog
[Figure: reachability graph with edges labeled a0/a1, b0/b1, c0/c1, d0/d1]
Input relations: edge(i, j, n), abs(n)
Output relations: path(i, j)
Rules:
(1) path(i, i).
(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).
Input tuples: edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0), …
Query tuple       Original query
q1: path(0, 5)    assert(v6 != v1)
q2: path(0, 2)    assert(v3 != v1)
16 possible abstractions in total
Desired result
[Figure: reachability graph with edges labeled a0/a1, b0/b1, c0/c1, d0/d1]
Input relations: edge(i, j, n), abs(n)
Output relations: path(i, j)
Rules:
(1) path(i, i).
(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).
Input tuples: edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0), …
Query             Answer
q1: path(0, 5)    a1 b0 c1 d0
q2: path(0, 2)    Impossibility
Iteration 1
[Figure: reachability graph with edges labeled a0/a1, b0/b1, c0/c1, d0/d1]
Query / Eliminated Abstractions: q1: path(0, 5); q2: path(0, 2)
path(0, 0).
path(0, 6) :- path(0, 0), edge(0, 6, a0), abs(a0).
path(0, 1) :- path(0, 6), edge(6, 1, a0), abs(a0).
path(0, 7) :- path(0, 1), edge(1, 7, c0), abs(c0).
path(0, 2) :- path(0, 7), edge(7, 2, c0), abs(c0).
path(0, 4) :- path(0, 6), edge(6, 4, b0), abs(b0).
path(0, 7) :- path(0, 4), edge(4, 7, d0), abs(d0).
path(0, 5) :- path(0, 7), edge(7, 5, d0), abs(d0).
…
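The derivations listed for Iteration 1 can be reproduced by evaluating rule (2) with the abs(n) guard and recording each grounded rule instance that fires. This is a small Python sketch under two assumptions: `EDGES` contains only the edge tuples that appear in the derivations above (the full input relation is elided on the slide), and rule (1) is seeded only for the source node 0. The name `solve` is hypothetical.

```python
# Only the edge tuples that occur in the Iteration 1 derivations;
# the complete edge relation is not given on the slides.
EDGES = {
    (0, 6, "a0"), (6, 1, "a0"),
    (1, 7, "c0"), (7, 2, "c0"),
    (6, 4, "b0"), (4, 7, "d0"), (7, 5, "d0"),
}


def solve(edges, abs_tuples, source=0):
    """Return (path tuples from source, grounded instances of rule (2))."""
    path = {(source, source)}   # rule (1), restricted to the source node
    derivations = set()         # (head, body path tuple, edge tuple)
    changed = True
    while changed:
        changed = False
        for (i, k) in list(path):
            for (k2, j, n) in edges:
                # Rule (2) fires only if the edge's abs tuple is enabled.
                if k == k2 and n in abs_tuples:
                    instance = ((i, j), (i, k), (k2, j, n))
                    if instance not in derivations:
                        derivations.add(instance)
                        path.add((i, j))
                        changed = True
    return path, derivations


if __name__ == "__main__":
    path, derivations = solve(EDGES, {"a0", "b0", "c0", "d0"})
    print(sorted(path))
    print(len(derivations), "grounded rule instances")
```

Under the abstraction a0 b0 c0 d0, both query tuples path(0, 5) and path(0, 2) are derived, so neither query is proven by this abstraction.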
Iteration 1 - derivation graph [Figure: reachability graph with edges labeled a0/a1, b0/b1, c0/c1, d0/d1] Query / Eliminated Abstractions: q1: path(0, 5); q2: path(0, 2)
Iteration 1 - derivation graph [Figure: derivation graph over the tuples path(0, 0), edge(0, 6, a0), abs(a0), edge(6, 1, a0), path(0, 6), edge(6, 4, b0), abs(c0), edge(1, 7, c0), path(0, 1), abs(c0), edge(7, 2, c0), path(0, 7), edge(7, 5, d0), abs(d0), path(0, 2), path(0, 4), edge(4, 7, d0), abs(d0), path(0, 5)]
Iteration 1 - derivation graph [Figure: reachability graph with edges labeled a0/a1, b0/b1, c0/c1, d0/d1] Query / Eliminated Abstractions: q1: path(0, 5); q2: path(0, 2)
Encoded as MAXSAT: Hard Constraints, Soft Constraints
Encoded as MAXSAT. Hard constraints: avoid all the counterexamples. Soft constraints: minimize the abstraction cost.
Encoded as MAXSAT. Query / Eliminated Abstractions: q1: path(0, 5); q2: path(0, 2)
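One way to read the encoding: each grounded rule instance from Iteration 1 becomes a hard implication, the query tuple is required to be false, and each cheap abs tuple is a soft clause that we would like to keep true. The Python sketch below solves this tiny instance by brute force instead of calling a MAXSAT solver. It is an illustration under stated assumptions, not the encoding used in the talk: edge tuples are treated as known facts and dropped, dropping abs(n0) stands in for switching parameter n to its more expensive setting, and all soft clauses have weight 1. The names `DERIVATIONS`, `derived`, and `max_sat` are hypothetical.

```python
from itertools import product

# Grounded rule instances from Iteration 1: (head, body path tuple, abs tuple).
# Edge tuples are input facts, so they are omitted from the clauses.
DERIVATIONS = [
    ("path(0,6)", "path(0,0)", "a0"),
    ("path(0,1)", "path(0,6)", "a0"),
    ("path(0,7)", "path(0,1)", "c0"),
    ("path(0,2)", "path(0,7)", "c0"),
    ("path(0,4)", "path(0,6)", "b0"),
    ("path(0,7)", "path(0,4)", "d0"),
    ("path(0,5)", "path(0,7)", "d0"),
]
PARAMS = ["a0", "b0", "c0", "d0"]


def derived(enabled):
    """Least model of the implications, given the enabled abs tuples."""
    facts = {"path(0,0)"}
    changed = True
    while changed:
        changed = False
        for head, body, n in DERIVATIONS:
            if body in facts and n in enabled and head not in facts:
                facts.add(head)
                changed = True
    return facts


def max_sat(query):
    """Brute force: hard = query not derived, soft = keep each abs(n0)."""
    best = None
    for bits in product((True, False), repeat=len(PARAMS)):
        enabled = {p for p, on in zip(PARAMS, bits) if on}
        if query in derived(enabled):
            continue  # this assignment violates a hard constraint
        if best is None or len(enabled) > len(best):
            best = enabled
    return best


if __name__ == "__main__":
    print(sorted(max_sat("path(0,2)")))
```

A real MAXSAT solver receives the same information as clauses and avoids the exponential enumeration; the brute-force loop is only there to make the objective explicit.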
Iteration 2 and beyond. Iteration 1: Datalog solver, MAXSAT solver. Query / Answer / Eliminated Abstractions: q1: path(0, 5); q2: path(0, 2)
Iteration 2 and beyond. Iteration 2: Datalog solver, MAXSAT solver. Query / Answer / Eliminated Abstractions: q1: path(0, 5); q2: path(0, 2)
Iteration 2 and beyond. Iteration 3: Datalog solver, MAXSAT solver. q1 is proven. Query / Answer / Eliminated Abstractions: q1: path(0, 5); q2: path(0, 2)
Iteration 2 and beyond. Iteration 3: Datalog solver, MAXSAT solver. q1 is proven. q2 is impossible to prove. Query / Answer / Eliminated Abstractions: q1: path(0, 5); q2: path(0, 2), Impossibility
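The alternation between the Datalog solver and the MAXSAT solver can be sketched as a loop. The Python below is a simplified illustration on a made-up input, not the algorithm as implemented in the talk: `EDGES`, the parameters p and q, and all function names are hypothetical; each parameter takes exactly one of its settings n0 (cheap) or n1 (expensive); the MAXSAT step is replaced by brute-force search; and each counterexample is generalized only to the set of abs tuples it used.

```python
from itertools import product

# Hypothetical input: edge(i, j, n) is usable only when abs(n) holds.
EDGES = [(0, 1, "p0"), (1, 2, "p0"), (0, 3, "p1"), (0, 4, "q0"), (0, 4, "q1")]
PARAMS = ["p", "q"]


def solve(abs_tuples):
    """Datalog step: nodes reachable from 0, plus (head, body, abs) instances."""
    path, derivations, changed = {0}, [], True
    while changed:
        changed = False
        for (k, j, n) in EDGES:
            if k in path and n in abs_tuples and (j, k, n) not in derivations:
                derivations.append((j, k, n))
                path.add(j)
                changed = True
    return path, derivations


def support(target, derivations):
    """abs tuples on any recorded derivation that leads to `target`."""
    needed, seen, frontier = set(), set(), {target}
    while frontier:
        node = frontier.pop()
        seen.add(node)
        for (head, body, n) in derivations:
            if head == node:
                needed.add(n)
                if body not in seen:
                    frontier.add(body)
    return needed


def cheapest(blocked):
    """Stand-in for MAXSAT: fewest 1 bits while avoiding every blocked set."""
    for bits in sorted(product((0, 1), repeat=len(PARAMS)), key=sum):
        chosen = {f"{p}{b}" for p, b in zip(PARAMS, bits)}
        if not any(bad <= chosen for bad in blocked):
            return chosen
    return None


def refine(target):
    """Return an abstraction under which `target` is not derived, or 'impossible'."""
    blocked = []
    while True:
        abstraction = cheapest(blocked)
        if abstraction is None:
            return "impossible"
        path, derivations = solve(abstraction)
        if target not in path:
            return abstraction
        # Any abstraction containing these abs tuples derives the target again.
        blocked.append(support(target, derivations))


if __name__ == "__main__":
    print(refine(2))
    print(refine(4))
```

On this toy input, node 2 becomes unreachable once p is switched to its expensive setting, while node 4 is reachable under every setting of q, so the loop reports it as impossible after exhausting the candidates.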
Mixing counterexamples [Figure: eliminated abstractions from Iteration 1 and Iteration 3]
Mixing counterexamples [Figure: eliminated abstractions from Iteration 1 and Iteration 3, mixed]
Experimental setup
- Implemented in JChord using off-the-shelf solvers:
  - Datalog: bddbddb
  - MAXSAT: MiFuMaX
- Applied to two analyses that are challenging to scale:
  - k-object-sensitivity pointer analysis: flow-insensitive, weak updates, cloning-based
  - typestate analysis: flow-sensitive, strong updates, summary-based
- Evaluated on 8 Java programs from DaCapo and Ashes.
Benchmark characteristics
benchmark     classes  methods  bytecode (KB)  KLOC
toba-s        1K       6K       423            258
javasrc-p     1K       6.5K     434            265
weblech       1.2K     8K       504            326
hedc          1K       7K       442            283
antlr         1.1K     7.7K     532            303
luindex       1.3K     7.9K     508            295
lusearch      1.2K     8K       511            314
schroeder-m   1.9K     12K      708            460
Results: pointer analysis
Columns: queries resolved (total; current; baseline, 4-object-sensitivity), abstraction size (final; max), iterations.
Slide callouts: "< 50%" and "< 3% of max".
benchmark     total  current  baseline  final  max   iterations
toba-s        7      7        0         170    18K   10
javasrc-p     46     46       0         470    18K   13
weblech       5      5        2         140    31K   10
hedc          47     47       6         730    29K   18
antlr         143    -        5         970    29K   15
luindex       138    -        67        1K     40K   26
lusearch      322    -        29        1K     39K   17
schroeder-m   51     51       25        450    58K   15
[The "current" values for antlr, luindex, and lusearch are not recoverable from the slide text.]
Performance of Datalog: pointer analysis (lusearch) [Figure: Datalog running time] Baseline k = 4: 3h28m; k = 3: 590s; k = 2: 214s; k = 1: 153s
Performance of MAXSAT: pointer analysis (lusearch) [Figure: MAXSAT running time]
Statistics of MAXSAT formulae: pointer analysis
benchmark     variables  clauses
toba-s        0.7M       1.5M
javasrc-p     0.5M       0.9M
weblech       1.6M       3.3M
hedc          1.2M       2.7M
antlr         3.6M       6.9M
luindex       2.4M       5.6M
lusearch      2.1M       5M
schroeder-m   6.7M       23.7M
Conclusion [Figure: MAXSAT, Datalog, abstraction]
Conclusion: MAXSAT (hard constraints, soft constraints); Datalog: A(x, y) :- B(x, z), C(z, y); abstraction 1 0 1 1 0; soundness; tradeoffs: scalability vs. precision, sound vs. complete, …