Global Redundancy Elimination: Computing Available Expressions
COMP 512, Rice University, Houston, Texas, Fall 2003
Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.

Review
So far, we have seen
• Local Value Numbering
  > Finds redundancy, constants, & identities in a block
• Superlocal Value Numbering
  > Extends local value numbering to EBBs
  > Used SSA-like name space to simplify bookkeeping
• Dominator Value Numbering
  > Extends scope to “almost” global (no back edges)
  > Uses dominance information to handle join points in CFG
Today’s Lecture
• Global Common Subexpression Elimination (GCSE)
  > Applying global data-flow analysis to the problem
  > Today’s lecture: computing AVAIL

The Idea
The evaluation of an expression e at point p is redundant if and only if every path from the procedure’s entry to p contains an evaluation of e, and the value(s) of e’s constituent subexpressions do not change between those earlier evaluations and p.
Evaluating e at p then always produces the same value as those earlier evaluations.
From the example in the last lecture:
    D:  u0 ← e + f
    E:  u1 ← e + f
    F:  u2 ← φ(u0, u1)
        x ← e + f          (e + f is redundant)
The trick lies in finding these redundant subexpressions.

Using Available Expressions for GCSE
The goal
  Find common subexpressions whose range spans basic blocks, and eliminate unnecessary re-evaluations
The mechanism
• Pose the problem as a system of simultaneous equations over the CFG of the code
• Solve the equations to produce a set for each CFG node that contains the names of every expression available on entry
• Use these sets, AVAIL(n), as the basis for redundancy elimination

Using Available Expressions for GCSE
The goal
  Find common subexpressions whose range spans basic blocks, and eliminate unnecessary re-evaluations
Safety
• $x+y \in AVAIL(n)$ proves that the earlier value of x+y is the same
• Transformation must provide a name for each such value
• Several schemes for this mapping
Profitability
• Don’t add any evaluations
• Add some copy operations
• Copies are inexpensive
• Many copies coalesce away
• Copies can shrink or stretch live ranges

Computing Available Expressions
For each block b
• Let AVAIL(b) be the set of expressions available on entry to b
• Let EXPRKILL(b) be the set of expressions killed in b
• Let DEEXPR(b) be the set of downward exposed expressions
  > $x \in DEEXPR(b)$ means x is defined in b & not subsequently killed in b
Now, AVAIL(b) can be defined as:
    $AVAIL(b) = \bigcap_{x \in preds(b)} \left( DEEXPR(x) \cup \left( AVAIL(x) \cap \overline{EXPRKILL(x)} \right) \right)$
    $AVAIL(n_0) = \emptyset$
where preds(b) is the set of b’s predecessors in the CFG and $n_0$ is the entry node of the CFG. Intersecting with the complement of EXPRKILL(x) is the same as removing from AVAIL(x) every expression that x kills.
This system of simultaneous equations forms a data-flow problem
  > Solve it with a data-flow algorithm
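To make the equation concrete, here is a minimal Python sketch that evaluates it once at a single join block. The function name `avail_in` and the dictionary encoding are illustrative assumptions, not part of the lecture. The local sets are the ones blocks D and E of the running example would produce, and the sketch assumes that D and E are the two predecessors of the join.

```python
from functools import reduce

def avail_in(preds, deexpr, exprkill, avail):
    """One evaluation of AVAIL(b) over the predecessors of b."""
    # Contribution of predecessor x: DEEXPR(x) | (AVAIL(x) - EXPRKILL(x)).
    outs = [deexpr[x] | (avail[x] - exprkill[x]) for x in preds]
    # No predecessors means the entry node, where AVAIL is empty.
    return reduce(frozenset.intersection, outs) if outs else frozenset()

# Local sets for blocks D and E of the example (encoding is hypothetical).
deexpr = {"D": frozenset({"b+18", "a+b", "e+f"}),
          "E": frozenset({"a+17", "c+d", "e+f"})}
exprkill = {"D": frozenset({"e+f"}), "E": frozenset({"e+f"})}
avail = {"D": frozenset({"a+b", "c+d"}), "E": frozenset({"a+b", "c+d"})}

result = avail_in(["D", "E"], deexpr, exprkill, avail)
print(sorted(result))   # ['a+b', 'c+d', 'e+f']
```

Only expressions that survive on both incoming paths remain, which is why b+18 and a+17 drop out at the join.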

Using Available Expressions for GCSE
The Big Picture
1. For each block b, compute AVAIL(b)
2. Assign unique global names to expressions in AVAIL(b)
3. For each block b, value number b starting with AVAIL(b)
To compute AVAIL(b):
1. For each block b, compute DEEXPR(b) and EXPRKILL(b)
2. For each block b, compute AVAIL(b)

Computing Available Expressions
First step is to compute DEEXPR & EXPRKILL
    assume a block b with operations o1, o2, …, ok
    VARKILL ← Ø
    DEEXPR(b) ← Ø
    for i = k to 1                            (backward through block; O(k) steps)
        assume oi is “x ← y + z”
        add x to VARKILL
        if (y ∉ VARKILL) and (z ∉ VARKILL) then
            add “y + z” to DEEXPR(b)
    EXPRKILL(b) ← Ø
    for each expression e                     (O(N) steps; N is # of operations)
        for each variable v ∈ e
            if v ∈ VARKILL then
                EXPRKILL(b) ← EXPRKILL(b) ∪ { e }
Many data-flow problems have initial information that costs less to compute
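The two loops above can be sketched in Python as follows. The tuple encoding of operations and the `expr_vars` table are assumptions made for this illustration; every operation is taken to be an addition, as on the slide.

```python
def local_sets(ops, expr_vars):
    """Compute (DEEXPR, EXPRKILL) for one block.

    ops: list of (target, left, right) for "target <- left + right",
         in execution order (a hypothetical encoding).
    expr_vars: maps each expression in the procedure to the variables it reads.
    """
    varkill, deexpr = set(), set()
    for target, left, right in reversed(ops):     # backward through the block
        varkill.add(target)
        if left not in varkill and right not in varkill:
            deexpr.add(f"{left}+{right}")
    # An expression is killed if the block defines any variable it reads.
    exprkill = {e for e, used in expr_vars.items() if used & varkill}
    return deexpr, exprkill

expr_vars = {"a+b": {"a", "b"}, "c+d": {"c", "d"}, "e+f": {"e", "f"},
             "a+17": {"a"}, "b+18": {"b"}}
# Block D of the example: e <- b + 18; s <- a + b; u <- e + f
block_D = [("e", "b", "18"), ("s", "a", "b"), ("u", "e", "f")]
deexpr_D, exprkill_D = local_sets(block_D, expr_vars)
print(sorted(deexpr_D), sorted(exprkill_D))
```

Note that e+f lands in both sets for block D: the block redefines e, which kills any incoming value of e+f, and then recomputes e+f after that definition, so the new value is downward exposed.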

Computing Available Expressions
The worklist iterative algorithm
    Worklist ← { all blocks, bi }
    while (Worklist ≠ Ø)
        remove a block b from Worklist
        recompute AVAIL(b) as
            $AVAIL(b) = \bigcap_{x \in preds(b)} \left( DEEXPR(x) \cup \left( AVAIL(x) \cap \overline{EXPRKILL(x)} \right) \right)$
        if AVAIL(b) changed then
            Worklist ← Worklist ∪ successors(b)
• Finds fixed point solution to equation for AVAIL
• That solution is unique
• Identical to “meet over all paths” solution
How do we know these things? Today, trust me
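A minimal Python sketch of the worklist algorithm follows. Two things in it are assumptions rather than content from the slides: non-entry blocks are initialized to the set of all expressions, so that the intersection can only shrink them, and the CFG edges (A→B, A→C, B→G, C→D, C→E, D→F, E→F, F→G) are chosen to be consistent with the AVAIL sets shown for the running example.

```python
from collections import deque

def solve_avail(preds, succs, deexpr, exprkill, entry):
    universe = frozenset().union(*deexpr.values(), *exprkill.values())
    # Assumption: start non-entry blocks at "everything available".
    avail = {b: (frozenset() if b == entry else universe) for b in preds}
    worklist, queued = deque(preds), set(preds)
    while worklist:
        b = worklist.popleft()
        queued.discard(b)
        if b == entry:
            continue                      # AVAIL(entry) stays empty
        new = universe
        for x in preds[b]:
            new &= deexpr[x] | (avail[x] - exprkill[x])
        if new != avail[b]:
            avail[b] = new
            for s in succs[b]:            # successors must be re-examined
                if s not in queued:
                    queued.add(s)
                    worklist.append(s)
    return avail

fs = frozenset
# Assumed edges for the example CFG (not recoverable from the slide text).
succs = {"A": ["B", "C"], "B": ["G"], "C": ["D", "E"], "D": ["F"],
         "E": ["F"], "F": ["G"], "G": []}
preds = {b: [p for p in succs if b in succs[p]] for b in succs}
deexpr = {"A": fs({"a+b"}), "B": fs({"c+d"}), "C": fs({"a+b", "c+d"}),
          "D": fs({"b+18", "a+b", "e+f"}), "E": fs({"a+17", "c+d", "e+f"}),
          "F": fs({"a+b", "c+d", "e+f"}), "G": fs({"a+b", "c+d"})}
exprkill = {b: fs() for b in succs}
exprkill["D"] = exprkill["E"] = fs({"e+f"})

avail = solve_avail(preds, succs, deexpr, exprkill, "A")
for b in sorted(avail):
    print(b, sorted(avail[b]))
```

Under these assumptions the result matches the sets on the example slide, for instance {a+b, c+d, e+f} on entry to F and {a+b, c+d} on entry to G.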

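Back to Our Example
AVAIL sets (shown in blue on the original slide)
    Block   Operations                                  AVAIL on entry
    A       m ← a + b;  n ← a + b                       (none shown)
    B       p ← c + d;  r ← c + d                       { a+b }
    C       q ← a + b;  r ← c + d                       { a+b }
    D       e ← b + 18;  s ← a + b;  u ← e + f          { a+b, c+d }
    E       e ← a + 17;  t ← c + d;  u ← e + f          { a+b, c+d }
    F       v ← a + b;  w ← c + d;  x ← e + f           { a+b, c+d, e+f }
    G       y ← a + b;  z ← c + d                       { a+b, c+d }
[Figure: control-flow graph over blocks A through G; the edges are not preserved in this text.]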

Remember the Big Picture
The Big Picture
1. For each block b, compute AVAIL(b)
2. Assign unique global names to expressions in AVAIL(b)
3. For each block b, value number b starting with AVAIL(b)
We’ve done step 1.

Global CSE (replacement step)
Managing the name space
Need a unique name $\forall e \in AVAIL(b)$
1. Can generate them as replacements are done (Fortran H)
2. Can compute a static mapping
3. Can encode value numbers into names (Common strategy) (Briggs 94)
Strategy
1. Works well, but requires 2 passes (or a lot of walking around IR)
2. Fast, but limits replacement to textually identical expressions
3. Requires more analysis (VN), but yields more CSEs
Assume, w.l.o.g., solution 2

Global CSE (replacement step)
Compute a static mapping from expression to name
• After analysis & before transformation
  > $\forall b, \forall e \in AVAIL(b)$, assign e a global name by hashing on e
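One way to picture the static mapping is a hash table keyed on the text of each expression. The sketch below is an illustration under that assumption; the function name `assign_names` and the t1, t2, … naming scheme are hypothetical, and the AVAIL sets are those of the running example.

```python
def assign_names(avail):
    """Give every expression that appears in some AVAIL set one global name."""
    names = {}                                   # hash table keyed on expression text
    for block in sorted(avail):
        for expr in sorted(avail[block]):
            if expr not in names:
                names[expr] = f"t{len(names) + 1}"
    return names

avail = {"A": set(), "B": {"a+b"}, "C": {"a+b"},
         "D": {"a+b", "c+d"}, "E": {"a+b", "c+d"},
         "F": {"a+b", "c+d", "e+f"}, "G": {"a+b", "c+d"}}
names = assign_names(avail)
print(names)   # {'a+b': 't1', 'c+d': 't2', 'e+f': 't3'}
```

Because the key is the spelling of the expression, two expressions receive the same name only when they are textually identical, which is the limitation noted for strategy 2.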

Back to Our Example
Assigning unique names to global CSEs
    a+b → t1
    c+d → t2
    e+f → t3
    Block   Operations                                  AVAIL on entry
    A       m ← a + b;  n ← a + b                       (none shown)
    B       p ← c + d;  r ← c + d                       { a+b }
    C       q ← a + b;  r ← c + d                       { a+b }
    D       e ← b + 18;  s ← a + b;  u ← e + f          { a+b, c+d }
    E       e ← a + 17;  t ← c + d;  u ← e + f          { a+b, c+d }
    F       v ← a + b;  w ← c + d;  x ← e + f           { a+b, c+d, e+f }
    G       y ← a + b;  z ← c + d                       { a+b, c+d }

Remember the Big Picture
The Big Picture
1. For each block b, compute AVAIL(b)
2. Assign unique global names to expressions in AVAIL(b)
3. For each block b, value number b starting with AVAIL(b)
We’ve done steps 1 & 2.

Global CSE (replacement step)
Compute a static mapping from expression to name
• After analysis & before transformation
  > $\forall b, \forall e \in AVAIL(b)$, assign e a global name by hashing on e
• During transformation step
  > Evaluation of e ⇒ insert copy name(e) ← e
  > Reference to e ⇒ replace e with name(e)
The major problem with this approach
• Inserts extraneous copies
  > At all definitions and uses of any $e \in AVAIL(b)$, $\forall b$
  > Those extra copies are dead and easy to remove
  > The useful ones often coalesce away
Common strategy:
• Insert copies that might be useful
• Let DCE sort them out
Simplifies design & implementation

An Aside on Dead Code Elimination
What does “dead” mean?
• Useless code: result is never used
• Unreachable code: code that cannot execute
• Both are lumped together as “dead”
To perform DCE
• Must have a global mechanism to recognize usefulness
• Must have a global mechanism to eliminate unneeded stores
• Must have a global mechanism to simplify control-flow predicates
All of these will come later in the course

Value Numbering
To perform replacement, we can value number each block b
• Initialize hash table with AVAIL(b)
• Replacing an expression in AVAIL(b) means copying from its name
• At each evaluation of a global name, copy new value to its name
• Otherwise, value number as in last two lectures
Net Result
• Catches local redundancies with value numbering
• Catches nonlocal redundancies because of AVAIL sets
• Not quite same effect, but close
  > Local redundancies found by value
  > Global redundancies found by spelling
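The replacement step can be sketched per block as below. This is a simplified illustration, not the full value-numbering pass from the earlier lectures: it only tracks expressions that have a global name and matches them by spelling. The function name `rewrite_block`, the (target, expression) encoding, and the `expr_vars` table are assumptions made for the sketch.

```python
def rewrite_block(ops, avail_in, names, expr_vars):
    """Rewrite one block; ops and the result are lists of (target, rhs)."""
    available = set(avail_in)                    # table starts from AVAIL(b)
    out = []
    for target, expr in ops:
        if expr in names and expr in available:
            out.append((target, names[expr]))    # reference: copy from the name
        else:
            out.append((target, expr))           # evaluation stays in place
            if expr in names:
                out.append((names[expr], target))  # copy new value to its name
        # Redefining target kills every tracked expression that reads it.
        available -= {e for e in available if target in expr_vars[e]}
        if expr in names and target not in expr_vars[expr]:
            available.add(expr)
    return out

names = {"a+b": "t1", "c+d": "t2", "e+f": "t3"}
expr_vars = {"a+b": {"a", "b"}, "c+d": {"c", "d"}, "e+f": {"e", "f"}}
block_A = rewrite_block([("m", "a+b"), ("n", "a+b")], set(), names, expr_vars)
block_D = rewrite_block([("e", "b+18"), ("s", "a+b"), ("u", "e+f")],
                        {"a+b", "c+d"}, names, expr_vars)
print(block_A)   # [('m', 'a+b'), ('t1', 'm'), ('n', 't1')]
print(block_D)   # [('e', 'b+18'), ('s', 't1'), ('u', 'e+f'), ('t3', 'u')]
```

Some of the inserted copies will turn out to be dead; in keeping with the strategy above, the sketch leaves them for a later dead-code elimination pass.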

Back to Our Example
After replacement & local value numbering
    Block   Operations
    A       m ← a + b;  t1 ← m;  n ← t1
    B       p ← c + d;  t2 ← p;  r ← t2
    C       q ← t1;  r ← c + d;  t2 ← r
    D       e ← b + 18;  s ← t1;  u ← e + f;  t3 ← u
    E       e ← a + 17;  t ← t2;  u ← e + f;  t3 ← u
    F       v ← t1;  w ← t2;  x ← t3
    G       y ← t1;  z ← t2

Back to Our Example
In practice, most of these copies will be folded into subsequent uses…
We leave copy folding to another pass where it can be done with appropriate tools (interference graph)
[Figure: the example CFG after replacement, with uses of t1, t2, and t3 annotated with the variable each copy would fold into, e.g. n ← m in A, r ← p in B, y ← m and z ← r in G.]

Some Copies Serve a Critical Purpose
In the example, all the copies coalesce away. Sometimes, the copies are needed.
                  Before              After
    path 1:       w ← a + b           w ← a + b;  t1 ← w
    path 2:       x ← a + b           x ← a + b;  t1 ← x
    join:         y ← a + b           y ← t1          (cannot write “w or x”)
• Copies into t1 create a common name along two paths
• Makes the replacement possible
  > Later uses of w or x may preclude their sharing storage

Back to Our Example
    Block   Operations
    A       m ← a + b;  n ← a + b
    B       p ← c + d;  r ← c + d
    C       q ← a + b;  r ← c + d
    D       e ← b + 18;  s ← a + b;  u ← e + f
    E       e ← a + 17;  t ← c + d;  u ← e + f
    F       v ← a + b;  w ← c + d;  x ← e + f
    G       y ← a + b;  z ← c + d
[Figure: each redundant operation in the CFG is labeled with the techniques that find it: LVN; GRE, SVN; GRE, DVN; or GRE alone. The placement of individual labels is not recoverable from this text.]
N.B.:
• SVN subsumes LVN
• DVN subsumes SVN
• GRE & xVN are not directly comparable
Example does not highlight value identity versus lexical identity

Next Class
Iterative data-flow analysis:
• Does it halt?
• Does it produce the desired answer?
• How fast does it converge?
• Implementation strategies

And that’s the end of my story ….
Extra Slides

Data-flow Analysis
Definition
  Data-flow analysis is a collection of techniques for compile-time reasoning about the run-time flow of values
• Almost always involves building a graph
  > Problems are trivial on a basic block
  > Global problems ⇒ control-flow graph (or derivative)
  > Whole program problems ⇒ call graph (or derivative)
• Usually formulated as a set of simultaneous equations
  > Sets attached to nodes and edges
  > Lattice (or semilattice) to describe values
  > We solved AVAIL with an iterative fixed-point algorithm
• Desired result is usually meet over all paths solution
  > “What is true on every path from the entry?”
  > “Can this happen on any path from the entry?”
  > Related to the safety of optimization

Data-flow Analysis
Limitations
1. Precision: “up to symbolic execution”
   > Assume all paths are taken
2. Solution: cannot afford to compute MOP solution
   > Large class of problems where MOP = MFP = LFP
   > Not all problems of interest are in this class
3. Arrays: treated naively in classical analysis
   > Represent whole array with a single fact
4. Pointers: difficult (and expensive) to analyze
   > Imprecision rapidly adds up
   > Need to ask the right questions
Good news: Simple problems can carry us pretty far
Summary
  For scalar values, we can quickly solve simple problems

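Data-flow Analysis
Semilattice
A semilattice is a set L and a meet operation $\wedge$ such that, $\forall a, b, c \in L$:
1. $a \wedge a = a$
2. $a \wedge b = b \wedge a$
3. $a \wedge (b \wedge c) = (a \wedge b) \wedge c$
$\wedge$ imposes an order on L, $\forall a, b, c \in L$:
1. $a \geq b \Leftrightarrow a \wedge b = b$
2. $a > b \Leftrightarrow a \geq b$ and $a \neq b$
A semilattice has a bottom element, denoted $\bot$
1. $\forall a \in L, \; \bot \wedge a = \bot$
2. $\forall a \in L, \; a \geq \bot$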
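The axioms can be checked by brute force on a small instance. The sketch below does so for the powerset of a three-expression set with set intersection as the meet; the helper names `powerset`, `is_semilattice`, and `bottom` are hypothetical, introduced only for this illustration.

```python
from itertools import combinations

def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_semilattice(L, meet):
    idempotent = all(meet(a, a) == a for a in L)
    commutative = all(meet(a, b) == meet(b, a) for a in L for b in L)
    associative = all(meet(a, meet(b, c)) == meet(meet(a, b), c)
                      for a in L for b in L for c in L)
    return idempotent and commutative and associative

def bottom(L, meet):
    """Return the element b with meet(a, b) == b for every a, or None."""
    for cand in L:
        if all(meet(a, cand) == cand for a in L):
            return cand
    return None

L = powerset({"a+b", "c+d", "e+f"})
meet = frozenset.intersection
print(len(L), is_semilattice(L, meet), bottom(L, meet) == frozenset())
```

With intersection as the meet, a ≥ b holds exactly when b is a subset of a, so the empty set sits at the bottom.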

Data-flow Analysis
How does this relate to data-flow analysis?
• Choose a semilattice to represent the facts
• Attach a meaning to each $a \in L$
  > Each $a \in L$ is a distinct set of known facts
• With each node n, associate a function $f_n : L \to L$
  > $f_n$ models behavior of code in block corresponding to n
• Let F be the set of all functions that the code might generate
Example: AVAIL
• Semilattice is $(2^E, \wedge)$, where E is the set of all expressions & $\wedge$ is $\cap$
  > Sets are bigger than |variables|, $\bot$ is Ø
• For a node n, $f_n$ has the form $f_n(x) = D_n \cup (x \cap N_n)$
  > Where $D_n$ is DEF(n) and $N_n$ is NKILL(n)
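As an illustration of the transfer-function form, the following Python sketch builds f_n from DEF(n) and NKILL(n) and applies it to block D of the running example. The helper name `make_transfer` is an assumption; the sets are the ones the example's block D would generate.

```python
def make_transfer(defs, nkill):
    """Return f_n with f_n(x) = D_n ∪ (x ∩ N_n)."""
    defs, nkill = frozenset(defs), frozenset(nkill)
    return lambda x: defs | (frozenset(x) & nkill)

E = frozenset({"a+b", "c+d", "e+f", "a+17", "b+18"})
# Block D: DEF = {b+18, a+b, e+f}; it redefines e, so NKILL = E - {e+f}.
f_D = make_transfer({"b+18", "a+b", "e+f"}, E - {"e+f"})
out_D = f_D({"a+b", "c+d"})          # apply to AVAIL(D)
print(sorted(out_D))                 # ['a+b', 'b+18', 'c+d', 'e+f']
```

The output is the set of expressions available on exit from D, which is what a successor of D sees from this path before taking the meet.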

Concrete Example: Available Expressions
[Figure: the example CFG, blocks A through G.]
E = {a+b, c+d, e+f, a+17, b+18}
$2^E$ is the set of all subsets of E
$2^E$ = [
  {a+b, c+d, e+f, a+17, b+18},
  {a+b, c+d, e+f, a+17}, {a+b, c+d, e+f, b+18}, {a+b, c+d, a+17, b+18}, {a+b, e+f, a+17, b+18}, {c+d, e+f, a+17, b+18},
  {a+b, c+d, e+f}, {a+b, c+d, b+18}, {a+b, c+d, a+17}, {a+b, e+f, a+17}, {a+b, e+f, b+18}, {a+b, a+17, b+18}, {c+d, e+f, a+17}, {c+d, e+f, b+18}, {c+d, a+17, b+18}, {e+f, a+17, b+18},
  {a+b, c+d}, {a+b, e+f}, {a+b, a+17}, {a+b, b+18}, {c+d, e+f}, {c+d, a+17}, {c+d, b+18}, {e+f, a+17}, {e+f, b+18}, {a+17, b+18},
  {a+b}, {c+d}, {e+f}, {a+17}, {b+18},
  {} ]

Concrete Example: Available Expressions
The Lattice (each level lists the sets of one size; edges show comparability, which is transitive, and the meet of any two sets lies below both)
  {a+b, c+d, e+f, a+17, b+18}
  {a+b, c+d, e+f, a+17}  {a+b, c+d, e+f, b+18}  {a+b, c+d, a+17, b+18}  {a+b, e+f, a+17, b+18}  {c+d, e+f, a+17, b+18}
  {a+b, c+d, e+f}  {a+b, c+d, a+17}  {a+b, c+d, b+18}  {a+b, e+f, a+17}  {a+b, e+f, b+18}  {a+b, a+17, b+18}  {c+d, e+f, a+17}  {c+d, e+f, b+18}  {c+d, a+17, b+18}  {e+f, a+17, b+18}
  {a+b, c+d}  {a+b, e+f}  {a+b, a+17}  {a+b, b+18}  {c+d, e+f}  {c+d, a+17}  {c+d, b+18}  {e+f, a+17}  {e+f, b+18}  {a+17, b+18}
  {a+b}  {c+d}  {e+f}  {a+17}  {b+18}
  {}

Lattice Theory
This stuff is somewhat dry
Everybody stand up and stretch

Data-flow Analysis
What does this have to do with the iterative algorithm?
    Worklist ← { all blocks, bi }
    while (Worklist ≠ Ø)
        remove a block bi from Worklist
        recompute AVAIL(bi) as
            $AVAIL(b) = \bigcap_{x \in preds(b)} \left( DEF(x) \cup \left( AVAIL(x) \cap NKILL(x) \right) \right)$
        if AVAIL(bi) changed then
            Worklist ← Worklist ∪ successors(bi)
We can use a lattice-theoretic formulation to prove
• Termination: it halts on an instance of AVAIL
• Correctness: it produces the desired result for AVAIL
• Complexity: it runs pretty quickly (d(CFG)+3 passes)

Data-flow Analysis
Termination
• If every $f_n \in F$ is monotone, i.e., $f(x \wedge y) \leq f(x) \wedge f(y)$, and
• If the lattice is bounded, i.e., every descending chain is finite
  > Chain is a sequence $x_1, x_2, \ldots, x_n$ where $x_i \in L$, $1 \leq i \leq n$
  > $x_i > x_{i+1}$, $1 \leq i < n$ ⇒ chain is descending
Then
• The iterative algorithm must halt on an instance of the problem
• Set at each block can only change a finite number of times
• Any finite semilattice is bounded
• Some infinite semilattices are bounded

Data-flow Analysis
Correctness
• Does the iterative algorithm compute the desired answer?
Admissible Function Spaces
1. $\forall f \in F, \forall x, y \in L, \; f(x \wedge y) = f(x) \wedge f(y)$
   > Not distributive ⇒ answer may not be unique
2. $\exists f_i \in F$ such that $\forall x \in L, \; f_i(x) = x$
3. $f, g \in F \Rightarrow \exists h \in F$ such that $h(x) = f(g(x))$
4. $\forall x \in L, \exists$ a finite subset $H \subseteq F$ such that $x = \bigwedge_{f \in H} f(\bot)$
If F meets these four conditions, then the problem $(L, F, \wedge)$ has a unique fixed point solution
  ⇒ LFP = MOP
  ⇒ order of evaluation does not matter
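For AVAIL, the four conditions can be checked exhaustively on a small universe. The sketch below assumes three expressions and takes F to be every function of the form d ∪ (x ∩ n); the helper `transfer` and the variable names are illustrative, and a brute-force check on one small instance is a sanity test, not a proof.

```python
from itertools import combinations

E = ["a+b", "c+d", "e+f"]                     # small universe keeps the check cheap
L = [frozenset(c) for r in range(len(E) + 1) for c in combinations(E, r)]
TOP, BOTTOM = frozenset(E), frozenset()

def transfer(d, n):
    """f(x) = d ∪ (x ∩ n), the AVAIL form with d = DEF and n = NKILL."""
    return lambda x: d | (x & n)

pairs = [(d, n) for d in L for n in L]        # every (DEF, NKILL) choice

# 1. Distributive over the meet (set intersection).
distributive = all(transfer(d, n)(x & y) == transfer(d, n)(x) & transfer(d, n)(y)
                   for d, n in pairs for x in L for y in L)
# 2. DEF = Ø and NKILL = E gives the identity function.
has_identity = all(transfer(BOTTOM, TOP)(x) == x for x in L)
# 3. f(g(x)) is again of the form d ∪ (x ∩ n).
closed = all(transfer(d1 | (d2 & n1), n1 & n2)(x)
             == transfer(d1, n1)(transfer(d2, n2)(x))
             for d1, n1 in pairs for d2, n2 in pairs for x in L)
# 4. Every x is f(⊥) for some f, so a one-element H suffices.
generates = all(transfer(x, TOP)(BOTTOM) == x for x in L)

print(distributive, has_identity, closed, generates)
```

The composition check also shows how h is built from f and g: its DEF set is d1 ∪ (d2 ∩ n1) and its NKILL set is n1 ∩ n2.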

Data-flow Analysis
Complexity
• For a problem with an admissible function space & a bounded semilattice,
• If the functions all meet the rapid condition, i.e.,
    $\forall f, g \in F, \forall x \in L, \; f(g(\bot)) \geq g(\bot) \wedge f(x) \wedge x$
  (sets stabilize in two passes around a loop)
then, a round-robin, reverse-postorder iterative algorithm will halt in d(G)+3 passes over a graph G
  > Each pass does O(E) meets & O(N) other operations
  > d(G) is the loop-connectedness of the graph w.r.t. a DFST
  > Maximal number of back edges in an acyclic path
  > Several studies suggest that, in practice, d(G) is small (<3)
  > For most CFGs, d(G) is independent of the specific DFST
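The rapid condition can likewise be tested by enumeration for AVAIL-style functions. The sketch below assumes a three-expression universe, takes ⊥ to be the empty set, and reads a ≥ b as a ∧ b = b; the names `transfer`, `geq`, and `rapid` are illustrative, and the check covers only this small instance.

```python
from itertools import combinations

E = ["a+b", "c+d", "e+f"]
L = [frozenset(c) for r in range(len(E) + 1) for c in combinations(E, r)]
BOTTOM = frozenset()

def transfer(d, n):
    """f(x) = d ∪ (x ∩ n)."""
    return lambda x: d | (x & n)

def geq(a, b):
    """a ≥ b  ⇔  a ∧ b = b, with ∧ as set intersection."""
    return (a & b) == b

F = [transfer(d, n) for d in L for n in L]

# Rapid condition: f(g(⊥)) ≥ g(⊥) ∧ f(x) ∧ x for all f, g in F and x in L.
rapid = all(geq(f(g(BOTTOM)), g(BOTTOM) & f(x) & x)
            for f in F for g in F for x in L)
print(rapid)
```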

Data-flow Analysis
What does this mean?
• Reverse postorder
  > Number the nodes in a postorder traversal
  > Reverse the order
• Round-robin iterative algorithm
  > Visit all the nodes in a consistent order (RPO)
  > Do it again until the sets stop changing
• So, these conditions are easily met
  > Admissible framework, rapid function space
  > Round-robin, reverse-postorder, iterative algorithm
The analysis runs in (effectively) linear time
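A round-robin, reverse-postorder solver for AVAIL might look like the Python sketch below. The function names, the initialization of non-entry blocks to the full expression set, and the example CFG edges are assumptions for illustration; the local sets are those of the running example.

```python
def reverse_postorder(succs, entry):
    seen, post = set(), []
    def dfs(n):
        seen.add(n)
        for s in succs[n]:
            if s not in seen:
                dfs(s)
        post.append(n)                 # number nodes in postorder
    dfs(entry)
    return post[::-1]                  # then reverse the order

def round_robin_avail(succs, entry, deexpr, exprkill):
    order = reverse_postorder(succs, entry)
    preds = {n: [] for n in order}
    for n in order:
        for s in succs[n]:
            preds[s].append(n)
    universe = frozenset().union(*deexpr.values())
    avail = {n: (frozenset() if n == entry else universe) for n in order}
    passes, changed = 0, True
    while changed:                     # repeat until the sets stop changing
        changed, passes = False, passes + 1
        for n in order:
            if n == entry:
                continue
            new = universe
            for x in preds[n]:
                new &= deexpr[x] | (avail[x] - exprkill[x])
            if new != avail[n]:
                avail[n], changed = new, True
    return avail, passes

fs = frozenset
succs = {"A": ["B", "C"], "B": ["G"], "C": ["D", "E"], "D": ["F"],
         "E": ["F"], "F": ["G"], "G": []}        # assumed edges
deexpr = {"A": fs({"a+b"}), "B": fs({"c+d"}), "C": fs({"a+b", "c+d"}),
          "D": fs({"b+18", "a+b", "e+f"}), "E": fs({"a+17", "c+d", "e+f"}),
          "F": fs({"a+b", "c+d", "e+f"}), "G": fs({"a+b", "c+d"})}
exprkill = {b: fs() for b in succs}
exprkill["D"] = exprkill["E"] = fs({"e+f"})

avail, passes = round_robin_avail(succs, "A", deexpr, exprkill)
print(passes, sorted(avail["F"]), sorted(avail["G"]))
```

On this acyclic example the first pass computes every set and the second pass only confirms that nothing changed, which is within the d(G)+3 bound stated above.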

Data-flow Analysis
How do we use these results?
• Prove that data-flow framework is admissible & rapid
  > It’s just algebra
  > Most (but not all) global data-flow problems are rapid
  > This is a property of F
• Code up the iterative algorithm
  > World’s simplest data-flow algorithm
  > Other versions (worklist) have similar behavior
This lets us ignore most of the other data-flow algorithms in 512

EXTRA SLIDES START HERE

Global CSE (replacement step)
Managing the name space
Need a unique name $\forall e \in AVAIL(b)$
1. Can generate them as replacements are done (Fortran H)
2. Can compute a static mapping
3. Can encode value numbers into names (Briggs 94)
Strategy
1. This works; it is the classic method
2. Fast, but limits replacement to textually identical expressions
3. Requires more analysis (VN), but yields more CSEs
Assume, w.l.o.g., solution 2

Computing Available Expressions
The Big Picture
1. Build a control-flow graph
2. Gather the initial (local) data: DEF(b) & NKILL(b)
3. Propagate information around the graph, evaluating the equation
4. Post-process the information to make it useful (if needed)
All data-flow problems are solved, essentially, this way
Next lecture:
• Iterative computation of AVAIL information
• From Chapter 8 of EaC

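Example
    Block   Operations
    A       m ← a + b;  n ← a + b
    B       p ← c + d;  r ← c + d
    C       q ← a + b;  r ← c + d
    D       e ← b + 18;  s ← a + b;  u ← e + f
    E       e ← a + 17;  t ← c + d;  u ← e + f
    F       v ← a + b;  w ← c + d;  x ← e + f
    G       y ← a + b;  z ← c + d
[Figure: control-flow graph over blocks A through G; the edges are not preserved in this text.]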

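Back to Our Example
AVAIL sets (shown in blue on the original slide)
    Block   Operations                                  AVAIL on entry
    A       m ← a + b;  n ← a + b                       (none shown)
    B       p ← c + d;  r ← c + d                       { a+b }
    C       q ← a + b;  r ← c + d                       { a+b }
    D       e ← b + 18;  s ← a + b;  u ← e + f          { a+b, c+d }
    E       e ← a + 17;  t ← c + d;  u ← e + f          { a+b, c+d }
    F       v ← a + b;  w ← c + d;  x ← e + f           { a+b, c+d, e+f }
    G       y ← a + b;  z ← c + d                       { a+b, c+d }
[Figure: each redundant operation in the CFG is labeled with the techniques that find it: LVN; GRE, SVN; GRE, DVN; or GRE alone. The placement of individual labels is not recoverable from this text.]
Example does not highlight value identity versus lexical identity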