
A Polynomial-Time Algorithm for Global Value Numbering
SAS 2004
Sumit Gulwani, George C. Necula

Global Value Numbering
Goal: Discover equivalent expressions in procedures
Applications:
• Compiler optimizations
  – Copy propagation, constant propagation, common subexpression elimination, induction variable elimination, etc.
• Program verification
  – Discover loop invariants, verify program assertions
• Discover equivalent computations across programs
  – Plagiarism detection tools, translation validation

Global Value Numbering
The equivalence problem is undecidable.
Simplifying assumptions:
• Operators are uninterpreted (will not discover x = c)
• Conditionals are non-deterministic (will not discover y = c)
• Will discover z = c
Example (flowchart):
  c := a × b;
  if (b == 3)
  True:  x := b × a; y := a × 3;
  False:
  z := a × b;

Non-trivial Example
* (non-deterministic conditional)
  Branch 1: x := a; y := a; z := F(a);
  Branch 2: x := b; y := b; z := F(b);
After the join:
  assert(x = y);
  assert(z = F(y));

Existing Algorithms
• Algorithms that work on SSA form of the program
  – Alpern, Wegman, Zadeck's (AWZ) algorithm: POPL 1988
    • Polynomial, incomplete
  – Ruthing, Knoop, Steffen's (RKS) algorithm: SAS 1999
    • Polynomial, incomplete, improvement on AWZ
• Dataflow analysis or abstract interpretation based
  – Kildall's algorithm: POPL 1973
    • Exponential, complete
  – Our algorithm: POPL 2004
    • Polynomial, complete, randomized
  – Our algorithm: this paper
    • Polynomial, complete

Why are SSA-based algorithms incomplete?
* (non-deterministic conditional)
  Branch 1: x := a; y := a; z := F(a);
  Branch 2: x := b; y := b; z := F(b);
SSA form at the join:
  x = φ(a, b)
  y = φ(a, b)
  z = φ(F(a), F(b))
  F(y) = F(φ(a, b))
  assert(x = y);
  assert(z = F(y));
• AWZ algorithm: φ functions are uninterpreted
  – Fails to discover the second assertion
• RKS algorithm: uses rewrite rules for normalization
  – Does not discover all assertions in slightly more involved examples
  – Rewrite rules are not applied exhaustively (that would need exponentially many applications)
  – Rules are pessimistic in handling loops

Abstract Interpretation based algorithm
• Join node (inputs G1', G2'): G = Join(G1', G2')
• Assignment node x := e (input G'): G = SP(G', x := e)
• Conditional node * (input G'): G1 = G', G2 = G'

Outline
• Strong equivalence DAG (SED)
• The join operation: Idea #1
• Pruning an SED: Idea #2
• The strongest postcondition operation
• Fixed point computation

Representing Equivalences
Program: a := 1; b := 2; x := F(1, 2);
Equivalence classes: { a, 1 } { b, 2 } { x, F(1, 2) }

Representing Equivalences
Program: a := 1; b := 2; x := F(1, 2);
Equivalence classes: { a, 1 } { b, 2 } { x, F(1, 2), F(a, 2), F(1, b), F(a, b) }
Such an explicit representation can be exponential in size.
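To get a feel for how fast the explicit representation grows, the sketch below counts class sizes for a small chain program. The chain program and the helper name `explicit_class_sizes` are illustrative assumptions, not taken from the talk.

```python
def explicit_class_sizes(depth):
    """Sizes of the explicit equivalence classes of a, x1, ..., x_depth.

    Hypothetical program (an assumption for illustration):
        a := 1; x1 := F(a, a); x2 := F(x1, x1); ...
    """
    sizes = [2]  # the class of a is {a, 1}
    for _ in range(depth):
        prev = sizes[-1]
        # The class of x_i holds x_i itself plus F(e1, e2) for every
        # pair of members e1, e2 of the previous class.
        sizes.append(1 + prev * prev)
    return sizes


print(explicit_class_sizes(4))
```

The same counting gives 1 + 2 × 2 = 5 members for the class of x on the slide above.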

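Strong Equivalence DAG (SED)
A data structure for representing equivalences.
• Nodes $n$: $\langle \text{set of variables}, \text{type} \rangle$
• Type: $c$, $?$, or $F(n_1, n_2)$
• $\mathit{Terms}(n)$: the set of equivalent expressions represented by $n$
  – $\mathit{Terms}(\langle V, ? \rangle) = V$
  – $\mathit{Terms}(\langle V, c \rangle) = V \cup \{ c \}$
  – $\mathit{Terms}(\langle V, F(n_1, n_2) \rangle) = V \cup \{ F(e_1, e_2) \mid e_1 \in \mathit{Terms}(n_1),\ e_2 \in \mathit{Terms}(n_2) \}$
• $\forall$ variables $x$, $\exists$ at most one node $\langle V, t \rangle$ such that $x \in V$; this node is called $\mathit{Node}(x)$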

SED: Example
Nodes:
  n1 = <{a}, 2>
  n2 = <{b}, ?>
  n3 = <{c, d}, F(n1, n2)>
  n4 = <{e}, F(n3, n2)>
This SED represents the following partition:
  Terms(n1) = { a, 2 }
  Terms(n2) = { b }
  Terms(n3) = { c, d, F(a, b), F(2, b) }
  Terms(n4) = { e, F(c, b), F(d, b), F(F(a, b), b), F(F(2, b), b) }
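The Terms definition can be checked mechanically on this example. The sketch below assumes one possible encoding: a dictionary from node names to (variables, type) pairs, with the type being "?", a constant, or a tuple ("F", left, right). The encoding and the function name `terms` are assumptions made for illustration.

```python
from itertools import product

# Assumed encoding of the example SED: name -> (variables, type).
sed = {
    "n1": ({"a"}, 2),
    "n2": ({"b"}, "?"),
    "n3": ({"c", "d"}, ("F", "n1", "n2")),
    "n4": ({"e"}, ("F", "n3", "n2")),
}


def terms(sed, name):
    """Enumerate Terms(n) explicitly, following the three defining cases."""
    variables, typ = sed[name]
    result = set(variables)
    if isinstance(typ, tuple):          # type F(n1, n2)
        _, left, right = typ
        for e1, e2 in product(terms(sed, left), terms(sed, right)):
            result.add(f"F({e1}, {e2})")
    elif typ != "?":                    # type c (a constant)
        result.add(str(typ))
    return result                       # type ?: just the variables


print(sorted(terms(sed, "n4")))
```

Four nodes are enough to represent twelve expressions here; the explicit enumeration is only for checking, since avoiding it is the point of the SED.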


The Join Operation
G = Join(G1, G2)
G is obtained by product construction of G1 and G2:
If $n = \langle V_1, t_1 \rangle \in G_1$ and $m = \langle V_2, t_2 \rangle \in G_2$, then $[n, m] = \langle V_1 \cap V_2,\ t_1 \sqcup t_2 \rangle \in G$
Definition of $t_1 \sqcup t_2$:
• $c \sqcup c = c$
• $F(l_1, r_1) \sqcup F(l_2, r_2) = F([l_1, l_2], [r_1, r_2])$
• $t_1 \sqcup t_2 = ?$, otherwise
Proof of correctness: $\mathit{Terms}([n, m]) = \mathit{Terms}(n) \cap \mathit{Terms}(m)$
(Thus product construction = partition intersection)
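A minimal sketch of the product construction follows, applied to the two branches of the earlier non-trivial example. The dictionary encoding, the node names, and the choice to build only product nodes reachable from some variable's pair are assumptions of this sketch, not details given in the talk.

```python
def join(g1, g2):
    """Join two SEDs by product construction.

    Assumed encoding: name -> (frozenset of variables, type), where type is
    "?", a constant, or a tuple (function symbol, child name, ...).
    """
    node_of1 = {x: n for n, (vs, _) in g1.items() for x in vs}
    node_of2 = {x: m for m, (vs, _) in g2.items() for x in vs}
    result = {}

    def pair(n, m):
        if (n, m) in result:
            return (n, m)
        (v1, t1), (v2, t2) = g1[n], g2[m]
        both_f = isinstance(t1, tuple) and isinstance(t2, tuple)
        if both_f and t1[0] == t2[0] and len(t1) == len(t2):
            # F(l1, r1) joined with F(l2, r2) is F([l1, l2], [r1, r2])
            kids = [pair(a, b) for a, b in zip(t1[1:], t2[1:])]
            typ = (t1[0], *kids)
        elif not both_f and t1 == t2:
            typ = t1                    # c joined with c is c
        else:
            typ = "?"                   # otherwise
        result[(n, m)] = (v1 & v2, typ)
        return (n, m)

    for x in sorted(node_of1.keys() & node_of2.keys()):
        pair(node_of1[x], node_of2[x])
    return result


# Branch 1: x := a; y := a; z := F(a)
g1 = {"p": (frozenset("axy"), "?"), "pb": (frozenset("b"), "?"),
      "pz": (frozenset("z"), ("F", "p"))}
# Branch 2: x := b; y := b; z := F(b)
g2 = {"qa": (frozenset("a"), "?"), "q": (frozenset("bxy"), "?"),
      "qz": (frozenset("z"), ("F", "q"))}
g = join(g1, g2)
```

In the result, x and y share the node ("p", "q"), and the node of z has type F of that node, so both assertions x = y and z = F(y) are represented after the join.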

Example: The Join Operation
[Figure: SEDs G1 and G2 over variables y1, ..., y7, and their join G = Join(G1, G2)]


Motivation: The Prune Operation
• If G = Join(G1, G2), then Size(G) can be as large as Size(G1) × Size(G2).
• There are programs for which the size of the SEDs after n joins is exponential in n.
Discovering equivalences among all expressions vs. among program expressions:
For the latter, it is sufficient to discover equivalences among all terms of size at most t at each program point (where t = #variables × size of program). Thus, SEDs can be pruned to a small size.

The Prune Operation
Prune(G, k)
• For each node <V, t>, check whether a variable x ∈ V is equal to some F-term of size less than k.
• If not, delete all the nodes that are reachable only from <V, t>.

Example: The Prune Operation
[Figure: an SED G and the result of Prune(G, 2)]


The Strongest Postcondition Operation
G = SP(G', x := e)
To obtain G from G':
• Delete label x from Node(x) in G'.
• Let n = <V, t> be the node in G' such that e ∈ Terms(n) (add such a node to G' if it does not already exist).
• Add x to V.
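The three steps can be sketched as follows. This is an illustration under stated assumptions: SEDs are dictionaries from node names to (variables, type) pairs, expressions are variable names (strings), constants (numbers), or tuples such as ("F", e1, e2), and e does not mention x. The helper names `fresh_name` and `find_or_add` are hypothetical.

```python
def fresh_name(g):
    """Pick a node name not yet used in g (hypothetical helper)."""
    i = 0
    while f"n{i}" in g:
        i += 1
    return f"n{i}"


def find_or_add(g, e):
    """Return the name of the node whose Terms contain e, adding it if needed."""
    if isinstance(e, str):                      # e is a variable
        for name, (vs, _) in g.items():
            if e in vs:
                return name
        wanted_vars, wanted_type = {e}, "?"
    else:
        if isinstance(e, tuple):                # e is F(e1, e2, ...)
            wanted_type = (e[0], *[find_or_add(g, arg) for arg in e[1:]])
        else:                                   # e is a constant
            wanted_type = e
        for name, (_, t) in g.items():
            if t == wanted_type:
                return name
        wanted_vars = set()
    name = fresh_name(g)
    g[name] = (wanted_vars, wanted_type)
    return name


def sp(g0, x, e):
    """G = SP(G', x := e); assumes e does not mention x."""
    # Step 1: delete label x from Node(x), working on a copy of G'.
    g = {n: (set(vs) - {x}, t) for n, (vs, t) in g0.items()}
    # Step 2: find (or add) the node n with e in Terms(n).
    target = find_or_add(g, e)
    # Step 3: add x to that node's variable set.
    g[target][0].add(x)
    return g


# x and y are equal to each other, and z = F(x, y).
g3 = {"q": ({"x", "y"}, "?"), "r": ({"z"}, ("F", "q", "q"))}
g4 = sp(g3, "u", ("F", "x", "y"))   # u joins the node of z
g5 = sp(g4, "x", 5)                 # x leaves q; a new constant node is added
```

Nodes left without variables after step 1 are kept in this sketch; removing unreachable ones is a separate cleanup that it does not attempt.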

Example: The Strongest Postcondition Operation
[Figure: an SED G' over variables u, z, x, and the result G = SP(G', u := F(z, x))]


Fixed Point Computation and Complexity
• The lattice of sets of equivalences (among uninterpreted function terms) has height at most $k$.
• Complexity
  – Dominated by the cost of join operations
  – Number of join operations: $O(j \times k)$
  – Each join operation: $O(k^2 \times N)$
    • This requires pruning while computing the join
  – Total cost: $O(j \times k) \cdot O(k^2 \times N) = O(k^3 \times N \times j)$
where $k$ = number of variables, $N$ = size of the program, $j$ = number of join points in the program

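Example
Program (program points are labeled L1, L3, L4 in the figure):
  * (non-deterministic conditional)
    Branch 1: x := 1; y := 1; z := F(1, 1);
    Branch 2: x := 2; y := 2; z := F(2, 2);
  u := F(x, y);
  Assert(u = z);
SEDs (in each, both arguments of F point to the node holding x, y):
  G1: <{z}, F>, <{x, y}, 1>
  G2: <{z}, F>, <{x, y}, 2>
  G3 = Join(G1, G2): <{z}, F>, <{x, y}, ?>
  G4 = Assignment(G3, u := F(x, y)): <{u, z}, F>, <{x, y}, ?>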

Conclusion
• Idea #1: Join of 2 SEDs = product construction
• Idea #2: Prune SEDs (discovering equivalences among program expressions does not require computing equivalences involving large terms)
Future Work
• Inter-procedural value numbering
• Abstract interpretation for the combined theory of linear arithmetic and uninterpreted functions