Comp 512 Spring 2011 Building SSA Form I

Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. COMP 512, Rice University

Previous Lectures
• Looked at iterative data-flow analysis
  > Round-robin and iterative algorithms
  > DOM, LIVE, AVAIL, VERYBUSY, CONSTANTS, MOD, REACHES
  > Looked at DEF-USE chains
• Looked at solving constant propagation two ways
  > Sets in a traditional data-flow framework solved on the CFG
  > Adding value attributes to names and treating the DEF-USE chains as a sparse evaluation graph
• Insights behind SSA
  > The right name space can simplify optimization (LVN, SVN)
  > Sparse evaluation graphs can reduce cost of analysis
  > Today: we can discover the spots where the analysis should compute new information

Constant Propagation over DEF-USE Chains
Birth points
[Figure: CFG fragment containing x ← 17 - 4, x ← a + b, x ← y - z, x ← 13, z ← x * q, and s ← w - x. Three join points are marked "Value is born here", where the reaching values meet: 17 - 4 ∧ y - z; then 17 - 4 ∧ y - z ∧ 13; then 17 - 4 ∧ y - z ∧ 13 ∧ a + b. The use in z ← x * q is annotated "Computes ∧ of three values here"; the use in s ← w - x is annotated "Computes ∧ of four values here".]
We should be able to compute the values that we need with fewer meet operations, if only we can find these birth points.
• Need to identify birth points
• Need to insert some artifact to force the evaluation to follow the birth points
Enter Static Single Assignment form

Constant Propagation over DEF-USE Chains
Making Birth Points Explicit
There are three birth points for x
[Figure: the same CFG fragment with subscripted names: x0 ← 17 - 4, x1 ← y - z, x3 ← 13, x5 ← a + b, z ← x4 * q, s ← w - x6]

Constant Propagation over DEF-USE Chains
Making Birth Points Explicit
Each needs a definition to reconcile the values of x
• Insert a ϕ-function at each birth point
• Rename values so each name is defined once
• Now, each use refers to one definition
Static Single Assignment Form
[Figure: x0 ← 17 - 4, x1 ← y - z, x2 ← ϕ(x1, x0), x3 ← 13, x4 ← ϕ(x3, x2), z ← x4 * q, x5 ← a + b, x6 ← ϕ(x5, x4), s ← w - x6]

Constant Propagation over DEF-USE Chains
Making Birth Points Explicit
[Figure: x0 ← 17 - 4, x1 ← y - z, x2 ← ϕ(x1, x0), x3 ← 13, x4 ← ϕ(x3, x2), z ← x4 * q, x5 ← a + b, x6 ← ϕ(x5, x4), s ← w - x6]
DEF-USE chains on SSA form
• One def per use
• Meets occur at ϕ-functions
• For CONSTANTS, 3 binary ∧'s
How do we build SSA form?
Simple algorithm
1. Insert a ϕ at each join point for each name
2. Rename to get single definition & single use
This produces
• Correct SSA form
• More ϕ's than any other known algorithm for SSA construction
The rest is optimization (!)
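The "3 binary ∧'s" remark can be made concrete with a small sketch of the meet operation for constant propagation. This is a minimal illustration, not code from the lecture: the names TOP, BOT, and meet are hypothetical, and treating y - z and a + b as unknown (BOT) is an assumption, since the slide gives no values for them.

```python
# Hypothetical lattice markers for the CONSTANTS problem.
TOP = "TOP"   # no information yet
BOT = "BOT"   # known not to be a single constant

def meet(a, b):
    """Binary meet on the constant-propagation lattice."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    if a == BOT or b == BOT:
        return BOT
    # Two constants: equal constants survive, different ones fall to BOT.
    return a if a == b else BOT

# Values from the slide's example. y - z and a + b are ASSUMED unknown.
x0 = 17 - 4          # the constant 13
x1 = BOT             # y - z (assumed unknown)
x3 = 13
x5 = BOT             # a + b (assumed unknown)

# One binary meet per phi-function: three meets in total.
x2 = meet(x1, x0)    # x2 <- phi(x1, x0)
x4 = meet(x3, x2)    # x4 <- phi(x3, x2)
x6 = meet(x5, x4)    # x6 <- phi(x5, x4)
```

Under these assumptions every ϕ result is BOT, because an unknown value reaches the first join. Had x1 also been 13, the first two meets would have produced 13.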

Building Static Single Assignment Form
SSA-form
• Each name is defined exactly once
• Each use refers to exactly one name
What's hard
• Straight-line code is trivial
• Splits in the CFG are trivial
• Joins in the CFG are hard
[Figure: two blocks defining y1 ← ... and y2 ← ... flow into a join block containing y3 ← ϕ(y1, y2)]
A ϕ-function is a special kind of copy that selects one of its parameters. The choice of parameter is governed by the CFG edge along which control reached the current block. Few machines implement a ϕ-function directly in hardware.
Building SSA Form
• Insert ϕ-functions at birth points of values
• Rename all values for uniqueness

SSA Construction Algorithm (High-level sketch)
1. Insert ϕ-functions
2. Rename values
… that's all …
… of course, there is some bookkeeping to be done …

SSA Construction Algorithm (The simplest algorithm)
1. Insert ϕ-functions at every join for every name
2. Solve reaching definitions
3. Rename each use to the def that reaches it (will be unique)
Builds a version of SSA with the maximal number of ϕ-functions
What's wrong with this approach
• Too many ϕ-functions (precision)
• Too many ϕ-functions (space)
• Too many ϕ-functions (time)
• Need to relate edges to ϕ-function parameters (bookkeeping)
To do better, we need a more complex approach

SSA Construction Algorithm (Detailed sketch)
1. Insert ϕ-functions
   a.) calculate dominance frontiers [Moderately complex]
   b.) find global names; for each name, build a list of blocks that define it [Compute list of blocks where each name is assigned & use as a worklist]
   c.) insert ϕ-functions
       for each global name n
         for each block b in which n is assigned
           for each block d in b's dominance frontier
             insert a ϕ-function for n in d
             add d to n's list of defining blocks [This adds to the worklist!]
       [Creates the iterated dominance frontier]
Use a checklist to avoid putting blocks on the worklist twice; keep another checklist to avoid inserting the same ϕ-function twice.
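Step 1.c can be sketched as a worklist loop with the two checklists. This is an illustrative sketch under stated assumptions: the dominance frontiers are taken as already computed, and the six-block CFG behind the df table (0→1, 1→2, 1→3, 2→4, 3→4, 4→1, 4→5) is hypothetical, not the graph from the slides. The function name insert_phis and the dictionary layout are also assumptions.

```python
def insert_phis(df, def_blocks):
    """Return {block: set of names needing a phi-function}.

    df         -- {block: set of blocks in its dominance frontier}
    def_blocks -- {name: set of blocks that assign the name}
    """
    phis = {b: set() for b in df}
    for name, blocks in def_blocks.items():
        worklist = list(blocks)
        on_worklist = set(blocks)      # checklist 1: never enqueue a block twice
        while worklist:
            b = worklist.pop()
            for d in df[b]:
                if name in phis[d]:    # checklist 2: never insert the same phi twice
                    continue
                phis[d].add(name)
                # The phi is a new definition of `name` in d, so d joins the worklist.
                if d not in on_worklist:
                    on_worklist.add(d)
                    worklist.append(d)
    return phis

# Hypothetical dominance frontiers for the CFG described in the lead-in.
df = {0: set(), 1: {1}, 2: {4}, 3: {4}, 4: {1}, 5: set()}

# x is assigned in blocks 0 and 2; y is assigned only in block 3.
phis = insert_phis(df, {"x": {0, 2}, "y": {3}})
```

In this example the definition of x in block 2 forces a ϕ-function in block 4, and that ϕ-function in turn forces one in block 1, which is the iterated dominance frontier at work.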

SSA Construction Algorithm (Detailed sketch)
2. Rename variables in a pre-order walk over dominator tree (use an array of stacks, one stack per global name) [1 counter per name for subscripts]
Starting with the root block, b
   a.) generate unique names for each ϕ-function and push them on the appropriate stacks
   b.) rewrite each operation in the block
       i. Rewrite uses of global names with the current version (from the stack)
       ii. Rewrite definition by inventing & pushing new name
   c.) fill in ϕ-function parameters of successor blocks [Need the end-of-block name for this path]
   d.) recurse on b's children in the dominator tree
   e.) <on exit from block b> pop names generated in b from stacks [Reset the state]

SSA Construction Algorithm (Low-level detail)
Computing Dominance
• First step in ϕ-function insertion computes dominance
• Recall that n dominates m iff n is on every path from n0 to m
  > Every node dominates itself
  > n's immediate dominator is its closest dominator, IDOM(n)†
DOM(n0) = { n0 }
DOM(n) = { n } ∪ ( ∩ p ∈ preds(n) DOM(p) )
Initially, DOM(n) = N, ∀ n ≠ n0
Computing DOM
• These equations form a rapid data-flow framework
• Iterative algorithm will solve them in d(G) + 3 passes
  > Each pass does N unions & E intersections,
  > E is O(N²), so O(N²) work
†IDOM(n) ≠ n, unless n is n0, by convention.
Original idea: R. T. Prosser. "Applications of Boolean matrices to the analysis of flow diagrams," Proceedings of the Eastern Joint Computer Conference, Spartan Books, New York, pages 133-138.
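The DOM equations can be solved with a straightforward round-robin iteration. The sketch below is a minimal illustration, assuming every node is reachable from the entry; the function names compute_dom and immediate_dominators and the six-block CFG are hypothetical, not the graph from the slides.

```python
def compute_dom(succs, entry):
    """Iteratively solve DOM(n) = {n} | intersection of DOM(p) over preds p."""
    nodes = list(succs)
    preds = {n: [] for n in nodes}
    for n, targets in succs.items():
        for s in targets:
            preds[s].append(n)

    all_nodes = set(nodes)
    dom = {n: set(all_nodes) for n in nodes}   # initially DOM(n) = N
    dom[entry] = {entry}                       # DOM(n0) = {n0}

    changed = True
    while changed:                             # repeat passes until a fixed point
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = set(all_nodes)
            for p in preds[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def immediate_dominators(dom, entry):
    """IDOM(n) is the strict dominator of n with the largest DOM set."""
    idom = {entry: None}
    for n, ds in dom.items():
        if n != entry:
            idom[n] = max(ds - {n}, key=lambda d: len(dom[d]))
    return idom

# Hypothetical CFG: a diamond (1 -> 2, 3 -> 4) inside a loop (4 -> 1).
cfg = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [1, 5], 5: []}
dom = compute_dom(cfg, 0)
idom = immediate_dominators(dom, 0)
```

Deriving IDOM from the largest DOM set works because the dominators of a node form a chain, so the closest strict dominator is the one that is itself dominated by all the others.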

Example
[Figure: flow graph with blocks B0 through B7, with tables showing the progress and the results of the iterative solution for DOM]

Example
[Figure: dominance tree for blocks B0 through B7, with tables showing the progress and the results of the iterative solution for DOM]
There are asymptotically faster algorithms. With the right data structures, the iterative algorithm can be made faster. See Cooper, Harvey, & Kennedy, on the web site.

Dominance Frontiers & Inserting ϕ-functions
Where does an assignment in block n induce ϕ-functions?
• n DOM m ⇒ no need for a ϕ-function in m
  > Definition in n blocks any previous definition from reaching m
• If m has multiple predecessors, and n dominates one of them, but not all of them, then m needs a ϕ-function for each definition in n
More formally, m is in the dominance frontier of n if and only if
1. ∃ p ∈ preds(m) such that n ∈ DOM(p), and
2. n does not strictly dominate m (n ∉ DOM(m) – { m })
   [Using "strict" dominance allows a ϕ-function at the head of a single-block loop.]
• This notion of dominance frontier is precisely what we need to insert ϕ-functions: a def in block n induces a ϕ-function in each block in DF(n).

Example
Dominance Frontiers & ϕ-Function Insertion
• A definition at n forces a ϕ-function at m iff n ∉ DOM(m) but n ∈ DOM(p) for some p ∈ preds(m)
• DF(n) is the fringe just beyond the region n dominates
[Figure: flow graph with blocks B0 through B7; B4 contains x ← ..., and B1, B6, and B7 contain x ← ϕ(...)]
Dominance Frontiers
• DF(4) is {6}, so ← in 4 forces ϕ-function in 6
• ← in 6 forces ϕ-function in DF(6) = {7}
• ← in 7 forces ϕ-function in DF(7) = {1}
• ← in 1 forces ϕ-function in DF(1) = Ø (halt)
For each assignment, we insert the ϕ-functions

Example
Computing Dominance Frontiers
• Only join points are in DF(n) for some n
• Leads to a simple, intuitive algorithm for computing dominance frontiers
    For each join point x (i.e., |preds(x)| > 1)
      For each CFG predecessor p of x
        Run from p to IDOM(x) in the dominator tree, & add x to DF(n) for each n from p up to but not including IDOM(x)
[Figure: flow graph with blocks B0 through B7, with a table of dominance frontiers]
• For some applications, we need post-dominance, the post-dominator tree, and reverse dominance frontiers, RDF(n)
  > Just dominance on the reverse CFG
  > Reverse the edges & add unique exit node
• We will use these in dead code elimination
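The join-point walk described above translates almost directly into code. This is a sketch under assumptions: the predecessor lists and immediate dominators are given as precomputed tables for a hypothetical six-block CFG (0→1, 1→2, 1→3, 2→4, 3→4, 4→1, 4→5), not the graph in the slide's figure, and the function name dominance_frontiers is invented for illustration.

```python
def dominance_frontiers(preds, idom):
    """Compute DF(n) for every node by walking up from each join point's preds.

    preds -- {node: list of CFG predecessors}
    idom  -- {node: immediate dominator, or None for the entry node}
    """
    df = {n: set() for n in preds}
    for x, ps in preds.items():
        if len(ps) < 2:            # only join points appear in any DF set
            continue
        for p in ps:
            runner = p
            # Climb the dominator tree from p, stopping before IDOM(x).
            while runner != idom[x]:
                df[runner].add(x)
                runner = idom[runner]
    return df

# Hypothetical CFG tables (see the lead-in for the edge list).
preds = {0: [], 1: [0, 4], 2: [1], 3: [1], 4: [2, 3], 5: [4]}
idom = {0: None, 1: 0, 2: 1, 3: 1, 4: 1, 5: 4}

df = dominance_frontiers(preds, idom)
```

Block 1 ends up in its own dominance frontier here because it dominates block 4, a predecessor of block 1, without strictly dominating itself. This is the loop-header case that the strict-dominance condition is meant to allow.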

SSA Construction Algorithm (Reminder)
1. Insert ϕ-functions at every join for every name
   a.) calculate dominance frontiers
   b.) find global names; for each name, build a list of blocks that define it [Needs a little more detail]
   c.) insert ϕ-functions
       for each global name n
         for each block b in which n is assigned
           for each block d in b's dominance frontier
             insert a ϕ-function for n in d
             add d to n's list of defining blocks
Step 1.b is not in the original set of algorithms. It produces an SSA form with fewer ϕ-functions.

SSA Construction Algorithm
Finding global names
• Difference between two forms of SSA
• Minimal uses all names
• Semi-pruned uses names that are live on entry to some block [Otherwise, cannot need a ϕ-function]
  > Shrinks name space & number of ϕ-functions
  > Pays for itself in compile-time speed
• For each "global name", need a list of blocks where it is defined
  > Drives ϕ-function insertion
  > b defines x implies a ϕ-function for x in every c ∈ DF(b)
Pruned SSA adds a test to see if x is live at insertion point
Occasionally, building pruned is faster than building semi-pruned. Any algorithm that has non-linear behavior in the number of ϕ-functions will have a size where pruned is the SSA flavor of choice.
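One common way to find the names that are live on entry to some block is a single pass over each block that tracks which names have already been defined locally. The sketch below follows that formulation as an assumption about what step 1.b intends. The function name find_globals and the (dest, operands) operation encoding are hypothetical, and the sample block is modeled on the operations y ← a + b, z ← c + d, i ← i + 1 that appear in the lecture's example.

```python
def find_globals(blocks):
    """Return (global_names, def_blocks) for semi-pruned SSA.

    blocks -- {block: list of (dest, [operands])} in execution order
    A name is "global" if some block uses it before defining it.
    """
    global_names = set()
    def_blocks = {}
    for b, ops in blocks.items():
        killed = set()                      # names already defined in this block
        for dest, operands in ops:
            for v in operands:
                if v not in killed:         # upward-exposed use: live on entry to b
                    global_names.add(v)
            killed.add(dest)
            def_blocks.setdefault(dest, set()).add(b)
    return global_names, def_blocks

# Hypothetical single-block input modeled on the example's last block.
blocks = {
    "B7": [
        ("y", ["a", "b"]),
        ("z", ["c", "d"]),
        ("i", ["i"]),
    ],
}
global_names, def_blocks = find_globals(blocks)
```

Here y and z are defined but never used before their definitions, so they are not global names and need no ϕ-functions, while a, b, c, d, and i are all live on entry to the block.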

Example
With all the ϕ-functions
• Lots of new ops
• Renaming is next
Assume a, b, c, & d defined before B0
Excluding local names avoids ϕ's for y & z
[Figure: flow graph with blocks B0 through B7 containing assignments to a, b, c, d, and i. B0 contains i ← ···. B1 begins with a ← ϕ(a, a), b ← ϕ(b, b), c ← ϕ(c, c), d ← ϕ(d, d), i ← ϕ(i, i). B6 begins with d ← ϕ(d, d), c ← ϕ(c, c). B7 begins with a ← ϕ(a, a), b ← ϕ(b, b), c ← ϕ(c, c), d ← ϕ(d, d), followed by y ← a + b, z ← c + d, i ← i + 1, and the test i > 100.]

SSA Construction Algorithm (Less high-level sketch)
2. Rename variables in a pre-order walk over dominator tree (use an array of stacks, one stack per global name) [1 counter per name for subscripts]
Starting with the root block, b
   a.) generate unique names for each ϕ-function and push them on the appropriate stacks
   b.) rewrite each operation in the block
       i. Rewrite uses of global names with the current version (from the stack)
       ii. Rewrite definition by inventing & pushing new name
   c.) fill in ϕ-function parameters of successor blocks [Need the end-of-block name for this path]
   d.) recurse on b's children in the dominator tree
   e.) <on exit from block b> pop names generated in b from stacks [Reset the state]
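Steps a through e can be sketched as a recursive walk over the dominator tree. Several things here are assumptions made for illustration: the dictionary-based representation of operations and ϕ-functions, the function name rename, the choice to rename every defined name rather than only global names, the rule that a use with no reaching definition keeps its original spelling, and the six-block CFG (0→1, 1→2, 1→3, 2→4, 3→4, 4→1, 4→5), which is not the graph from the slides.

```python
from collections import defaultdict

def rename(entry, succs, dom_children, phis, code):
    """Rewrite `phis` and `code` in place into SSA names."""
    counter = defaultdict(int)    # next subscript for each name
    stack = defaultdict(list)     # current SSA version of each name

    def new_name(n):
        ssa = f"{n}{counter[n]}"
        counter[n] += 1
        stack[n].append(ssa)
        return ssa

    def current(n):
        # ASSUMPTION: a name with no definition on this path is left unchanged.
        return stack[n][-1] if stack[n] else n

    def walk(b):
        pushed = []
        for n, phi in phis[b].items():             # (a) name each phi result
            phi["dest"] = new_name(n)
            pushed.append(n)
        for op in code[b]:                         # (b) rewrite uses, then the def
            op["uses"] = [current(v) for v in op["uses"]]
            n = op["dest"]
            op["dest"] = new_name(n)
            pushed.append(n)
        for s in succs[b]:                         # (c) fill phi args for edge b -> s
            for n, phi in phis[s].items():
                phi["args"][b] = current(n)
        for c in dom_children[b]:                  # (d) recurse in the dominator tree
            walk(c)
        for n in pushed:                           # (e) pop names generated in b
            stack[n].pop()

    walk(entry)

# Hypothetical input: x defined in blocks 0 and 2, used in blocks 2 and 4.
succs = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [1, 5], 5: []}
dom_children = {0: [1], 1: [2, 3, 4], 2: [], 3: [], 4: [5], 5: []}
code = {
    0: [{"dest": "x", "uses": []}],
    1: [], 3: [], 5: [],
    2: [{"dest": "x", "uses": ["x"]}],
    4: [{"dest": "y", "uses": ["x"]}],
}
phis = {b: {} for b in succs}
phis[1]["x"] = {"dest": None, "args": {}}
phis[4]["x"] = {"dest": None, "args": {}}

rename(0, succs, dom_children, phis, code)
```

The ϕ-function arguments are keyed by predecessor block, which is one simple way to handle the bookkeeping that relates CFG edges to ϕ-function parameters. Popping on exit restores the stacks so that sibling subtrees in the dominator tree see the names that were current at the end of their parent.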