CO 444 H Ben Livshits Course Staff Administrivia
Overview of the Material
We are in the Idealized World of CFGs
[Figure: a control-flow graph whose blocks contain statements such as t = x+y, a = t, b = t, and c = t]
Data-Flow Equations
Some Static Analysis Goals
• Compute facts about values in the program. For example:
  • What values can integer x have?
  • What locations can pointer p point to?
  • Can double y be negative? Can it take the value 17?
  • etc.
• This is static reasoning: we are approximating runtime execution here.
Definitions
• We will start this discussion by talking about definitions (statements that may assign a value to a variable).
• A basic block can generate a definition.
• A basic block can also either:
  • Kill a definition of x, if it surely redefines x, or
  • Transmit a definition, if it may not redefine the variable(s) that definition assigns.
IN and OUT
The following sets are defined:
• IN(B) = set of definitions reaching the beginning of block B
• OUT(B) = set of definitions reaching the end of block B
Equations
Two kinds of equations:
• Confluence equations: IN(B) in terms of the OUTs of the predecessors of B
• Transfer equations: OUT(B) in terms of IN(B) and what goes on in block B
Confluence Equations
IN(B) = ∪ { OUT(P) : P a predecessor of B }
Example: if B has predecessors P1 and P2 with OUT(P1) = {d1, d2} and OUT(P2) = {d2, d3}, then IN(B) = {d1, d2, d3}.
Transfer Equations
• Generate a definition in the block if its variable is not definitely rewritten later in the same basic block.
• Kill a definition if its variable is definitely rewritten in the block.
• A definition inside the block may be both killed and generated.
Example: Gen and Kill
Block:
d1: y = 3
d2: x = y+z
d3: *p = 10   (p may point to y or z, so d3 may define either)
d4: y = 5
IN = {d2(x), d3(y), d3(z), d5(y), d6(y), d7(z)}
Kill includes {d1(y), d2(x), d3(y), d5(y), d6(y), …}
Gen = {d2(x), d3(z), …, d4(y)}
OUT = {d2(x), d3(z), …, d4(y), d7(z)}
Transfer Function for a Block
Connecting the IN and OUT sets, for any block B:
OUT(B) = (IN(B) − Kill(B)) ∪ Gen(B)
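A minimal Python sketch of the Gen/Kill rules and the block transfer function for reaching definitions, assuming every statement in the block is a sure assignment (no pointer stores such as *p = 10). The (label, variable) encoding and the names gen_kill and transfer are hypothetical, chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: a definition is a (label, variable) pair, e.g. ("d2", "x").
Defn = Tuple[str, str]


def gen_kill(block: List[Defn],
             all_defs: FrozenSet[Defn]) -> Tuple[FrozenSet[Defn], FrozenSet[Defn]]:
    """Gen(B) and Kill(B) for a block of sure assignments, listed in execution order."""
    defined_vars = {var for _, var in block}
    # Kill: every definition in the program of a variable that B surely redefines
    # (this includes B's own definitions, which Gen may add back).
    kill = frozenset(d for d in all_defs if d[1] in defined_vars)
    # Gen: only the last definition of each variable survives to the end of B.
    last: Dict[str, Defn] = {}
    for d in block:
        last[d[1]] = d
    return frozenset(last.values()), kill


def transfer(in_set: FrozenSet[Defn], gen: FrozenSet[Defn],
             kill: FrozenSet[Defn]) -> FrozenSet[Defn]:
    """OUT(B) = (IN(B) - Kill(B)) | Gen(B)."""
    return (in_set - kill) | gen


# Hypothetical block: d1: y = 3; d2: x = y+z; d3: y = 5
block = [("d1", "y"), ("d2", "x"), ("d3", "y")]
all_defs = frozenset(block) | {("d5", "y"), ("d7", "z")}
gen, kill = gen_kill(block, all_defs)
out = transfer(frozenset({("d5", "y"), ("d7", "z")}), gen, kill)
```

Here the block kills every definition of x and y, including its own, and then generates only d2 and d3, so d2 is both killed and generated; d7(z) passes through untouched.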
Iterative Solution to Equations
• For an n-block flow graph, there are 2n equations in 2n unknowns.
• Alas, the solution is not unique.
  • Standard theory assumes a field of constants; sets are not a field.
• Use an iterative solution to get the least fixedpoint.
  • It identifies any def that might reach a point.
Iterative Solution (2)
IN(entry) = ∅;
for (each block B) OUT(B) = ∅;
while (changes occur)
  for (each block B) {
    IN(B) = ∪ { OUT(P) : P a predecessor of B };
    OUT(B) = (IN(B) − Kill(B)) ∪ Gen(B);
  }
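The round-robin iteration for reaching definitions can be sketched in Python as follows. This is a minimal sketch, assuming blocks are named by strings, Gen and Kill are precomputed per block, and the entry block is the one with no predecessors; reaching_definitions is a hypothetical helper name.

```python
from typing import Dict, FrozenSet, List, Tuple

Facts = FrozenSet[str]


def reaching_definitions(preds: Dict[str, List[str]],
                         gen: Dict[str, Facts],
                         kill: Dict[str, Facts]) -> Tuple[Dict[str, Facts], Dict[str, Facts]]:
    """Round-robin iteration; a block with no predecessors (the entry) gets IN = {}."""
    IN = {b: frozenset() for b in preds}
    OUT = {b: frozenset() for b in preds}      # OUT(B) = {} initially
    changed = True
    while changed:
        changed = False
        for b in preds:
            # Confluence: union of OUT(P) over all predecessors P of B.
            IN[b] = frozenset().union(*(OUT[p] for p in preds[b]))
            # Transfer: OUT(B) = (IN(B) - Kill(B)) | Gen(B).
            new_out = (IN[b] - kill[b]) | gen[b]
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


# Hypothetical loop: B1 (d1: x = 5) -> B2 (if x == 10) -> B3 (d2: x = 15) -> back to B2.
# Kill lists every definition of x, since each of B1 and B3 surely redefines x.
preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
gen = {"B1": frozenset({"d1"}), "B2": frozenset(), "B3": frozenset({"d2"})}
kill = {"B1": frozenset({"d1", "d2"}), "B2": frozenset(), "B3": frozenset({"d1", "d2"})}
IN, OUT = reaching_definitions(preds, gen, kill)
```

For this loop the iteration settles after three passes with IN(B2) = {d1, d2} and OUT(B3) = {d2}: d2 reaches the loop header around the back edge, and B3 kills d1.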
Example: Reaching Definitions
Flow graph: B1 (d1: x = 5) → B2 (if x == 10) → B3 (d2: x = 15), with B3 jumping back to B2.
IN(B1) = {}   OUT(B1) = {d1}
IN(B2) = {d1, d2}   OUT(B2) = {d1, d2}
IN(B3) = {d1, d2}   OUT(B3) = {d2}
Aside: Notice the Conservatism
• We make not only the most conservative assumption about when a def is KILLed or GEN'd,
• but also the conservative assumption that any path in the flow graph can actually be taken.
• This is fine, as long as the optimization is triggered by limitations on the set of RD's, not by the assumption that a def does not reach.
Another Data-Flow Problem: Available Expressions
• An expression x+y is available at a point if, no matter what path has been taken to that point from the entry, x+y has been evaluated, and neither x nor y has even possibly been redefined since the last such evaluation.
• Useful for global common-subexpression elimination.
Equations for AE
• The equations for AE are essentially the same as for RD, with one exception:
  • Confluence of paths involves intersection of sets of expressions rather than union of sets of definitions.
Defining GEN(B) and KILL(B)
• An expression x+y is generated if it is computed in B and afterwards there is no possibility that either x or y is redefined.
• An expression x+y is killed if it is not generated in B and either x or y is possibly redefined in B.
Example of GEN and KILL
x = x+y   (kills x+y, w*x, etc.)
z = a+b   (kills z-w, x+z, etc.; generates a+b)
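The GEN/KILL rules for available expressions can be checked on this two-statement block with a short Python sketch. It assumes ordinary assignments only (no pointers) and a hypothetical encoding of each binary expression as an (operand, operator, operand) tuple; ae_gen_kill and uses are illustrative names, and the expression universe is made up to match the example.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encoding: ("x", "+", "y") stands for the expression x+y.
Expr = Tuple[str, str, str]
Stmt = Tuple[str, Expr]  # (assigned variable, right-hand side)


def uses(e: Expr, var: str) -> bool:
    return var in (e[0], e[2])


def ae_gen_kill(block: List[Stmt],
                universe: FrozenSet[Expr]) -> Tuple[FrozenSet[Expr], FrozenSet[Expr]]:
    gen: Set[Expr] = set()
    defined: Set[str] = set()
    for lhs, rhs in block:
        gen.add(rhs)  # rhs is evaluated before lhs is assigned
        # Assigning lhs invalidates every expression that reads it, possibly rhs itself.
        gen = {e for e in gen if not uses(e, lhs)}
        defined.add(lhs)
    # Killed: reads a variable redefined in B and is not generated in B.
    kill = {e for e in universe
            if e not in gen and any(uses(e, v) for v in defined)}
    return frozenset(gen), frozenset(kill)


# x = x+y; z = a+b
block = [("x", ("x", "+", "y")), ("z", ("a", "+", "b"))]
universe = frozenset({("x", "+", "y"), ("w", "*", "x"), ("z", "-", "w"),
                      ("x", "+", "z"), ("a", "+", "b")})
gen, kill = ae_gen_kill(block, universe)
```

Note that x = x+y does not generate x+y: the expression is computed, but its own assignment to x immediately invalidates it.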
Transfer Equations
• Transfer is the same idea:
  OUT(B) = (IN(B) − Kill(B)) ∪ Gen(B)
Confluence Equations
• Confluence involves intersection, because an expression is available coming into a block if and only if it is available coming out of each predecessor:
  IN(B) = ∩ { OUT(P) : P a predecessor of B }
Iterative Solution
IN(entry) = ∅;
for (each block B) OUT(B) = ALL;   (ALL = the set of all expressions)
while (changes occur)
  for (each block B other than entry) {
    IN(B) = ∩ { OUT(P) : P a predecessor of B };
    OUT(B) = (IN(B) − Kill(B)) ∪ Gen(B);
  }
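A minimal Python sketch of the available-expressions iteration, assuming string-named blocks and precomputed Gen/Kill sets. The helper name available_expressions is hypothetical, and giving a non-entry block with no predecessors IN = {} is an assumption of this sketch, not something the slides specify.

```python
from typing import Dict, FrozenSet, List, Tuple

Facts = FrozenSet[str]


def available_expressions(entry: str,
                          preds: Dict[str, List[str]],
                          gen: Dict[str, Facts],
                          kill: Dict[str, Facts],
                          universe: Facts) -> Tuple[Dict[str, Facts], Dict[str, Facts]]:
    ALL = frozenset(universe)
    IN = {b: ALL for b in preds}
    OUT = {b: ALL for b in preds}           # optimistic start: OUT(B) = ALL
    IN[entry] = frozenset()                 # IN(entry) = {}: nothing computed yet
    changed = True
    while changed:
        changed = False
        for b in preds:
            if b != entry:
                ps = preds[b]
                # Confluence: intersection of OUT(P) over predecessors.
                # Assumption: a non-entry block with no predecessors gets {}.
                IN[b] = frozenset.intersection(*(OUT[p] for p in ps)) if ps else frozenset()
            new_out = (IN[b] - kill[b]) | gen[b]
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


# Hypothetical loop: B1 computes x+y, B2 is the header, B3 is the body (B3 -> B2).
preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
universe = frozenset({"x+y", "a*b"})
gen = {"B1": frozenset({"x+y"}), "B2": frozenset(), "B3": frozenset()}
no_kill = {b: frozenset() for b in preds}
IN, OUT = available_expressions("B1", preds, gen, no_kill, universe)
# If the body redefines x (so B3 kills x+y), x+y is no longer available at the header.
IN_k, _ = available_expressions("B1", preds, gen, {**no_kill, "B3": frozenset({"x+y"})}, universe)
```

The ALL initialization matters here: if OUT(B3) started at {} instead, the first intersection at the header would be {x+y} ∩ {} = {} and the iteration would never recover x+y, even though it is available around the loop.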
Why It Works
• An expression x+y is unavailable at point p iff there is a path from the entry to p that either:
  1. Never evaluates x+y, or
  2. Kills x+y after its last evaluation.
• IN(entry) = ∅ takes care of (1).
• OUT(B) = ALL, plus intersection during iteration, handles (2).
Example
[Figure: flow graph from Entry to point p, with paths labeled “x+y killed” and “x+y never gen’d”]
Subtle Point
• It is conservative to assume an expression isn’t available, even if it is.
• But we don’t have to be “insanely conservative.”
• If, after considering all paths and assuming x+y is killed by any possibility of redefinition, we still can’t find a path explaining its unavailability, then x+y is available.
Live Variable Analysis
• Variable x is live at a point p if on some path from p, x is used before it is redefined.
• Useful in code generation: if x is not live on exit from a block, there is no need to copy x from a register to memory.
Equations for Live Variables
• LV is essentially a “backwards” version of RD.
• In place of Gen(B): Use(B) = set of variables x possibly used in B prior to any certain definition of x.
• In place of Kill(B): Def(B) = set of variables x certainly defined in B before any possible use of x.
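Use(B) and Def(B) can be computed in one forward pass over the block. This Python sketch assumes ordinary assignments only, so "possibly used" and "certainly defined" coincide; the statement encoding and the name use_def are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encoding: (assigned variable, variables read on the right-hand side).
Stmt = Tuple[str, Tuple[str, ...]]


def use_def(block: List[Stmt]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    use: Set[str] = set()
    defs: Set[str] = set()
    for lhs, rhs_vars in block:
        # The right-hand side is read before the left-hand side is written.
        for v in rhs_vars:
            if v not in defs:
                use.add(v)
        # lhs counts as Def only if no earlier statement in B used it.
        if lhs not in use:
            defs.add(lhs)
    return frozenset(use), frozenset(defs)


# a = b + c; b = a + 1; c = c + d
use, defs = use_def([("a", ("b", "c")), ("b", ("a",)), ("c", ("c", "d"))])
```

Here a is certainly defined before its use in the second statement, so it lands in Def; b and c are used before being redefined, so they land in Use and not in Def.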
Transfer Equations
• Transfer equations give INs in terms of OUTs:
  IN(B) = (OUT(B) − Def(B)) ∪ Use(B)
Confluence Equations
• Confluence involves union over successors, so a variable is in OUT(B) if it is live on entry to any of B’s successors:
  OUT(B) = ∪ { IN(S) : S a successor of B }
Iterative Solution
OUT(exit) = ∅;
for (each block B) IN(B) = ∅;
while (changes occur)
  for (each block B) {
    OUT(B) = ∪ { IN(S) : S a successor of B };
    IN(B) = (OUT(B) − Def(B)) ∪ Use(B);
  }
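The backward iteration for live variables, sketched in Python under the assumption that blocks are string-named with precomputed Use/Def sets and that the exit block is the one with no successors; live_variables is a hypothetical helper name.

```python
from typing import Dict, FrozenSet, List, Tuple

Vars = FrozenSet[str]


def live_variables(succs: Dict[str, List[str]],
                   use: Dict[str, Vars],
                   defs: Dict[str, Vars]) -> Tuple[Dict[str, Vars], Dict[str, Vars]]:
    IN = {b: frozenset() for b in succs}    # IN(B) = {} initially
    OUT = {b: frozenset() for b in succs}
    changed = True
    while changed:
        changed = False
        # Reverse order tends to need fewer passes when blocks are listed entry-to-exit;
        # the fixedpoint reached is the same for any visiting order.
        for b in reversed(list(succs)):
            # Confluence: union of IN(S) over successors (empty for the exit block).
            OUT[b] = frozenset().union(*(IN[s] for s in succs[b]))
            new_in = (OUT[b] - defs[b]) | use[b]
            if new_in != IN[b]:
                IN[b] = new_in
                changed = True
    return IN, OUT


# Hypothetical: B1: a = 1; b = 2   B2: c = a + b   B3 (exit): return c
succs = {"B1": ["B2"], "B2": ["B3"], "B3": []}
use = {"B1": frozenset(), "B2": frozenset({"a", "b"}), "B3": frozenset({"c"})}
defs = {"B1": frozenset({"a", "b"}), "B2": frozenset({"c"}), "B3": frozenset()}
IN, OUT = live_variables(succs, use, defs)
```

In this example a and b are live on exit from B1 and c is live on exit from B2, while nothing is live on entry to B1.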
Data-Flow Frameworks
• Lattice-Theoretic Formulation
• Meet-Over-Paths Solution
• Monotonicity/Distributivity
Data-Flow Analysis Frameworks
• Generalizes and unifies each of the DFA examples from the previous lecture.
• Important components:
  • Direction D: forward or backward.
  • Domain V: possible values for IN and OUT.
  • Meet operator ∧: effect of path confluence.
  • Transfer functions F: effect of passing through a basic block.
Gary Kildall
• This theory was Gary Kildall’s thesis at the University of Washington.
• Gary is better known for CP/M, the first real PC operating system.
• There is an interesting story.
  • Google query: kildall cpm
  • www.freeenterpriseland.com/BOOK/KILDALL.html
Semilattices
• V and ∧ form a semilattice if for all x, y, and z in V:
  1. x ∧ x = x (idempotence).
  2. x ∧ y = y ∧ x (commutativity).
  3. x ∧ (y ∧ z) = (x ∧ y) ∧ z (associativity).
  4. There is a top element ⊤ such that for all x, ⊤ ∧ x = x.
  5. There is (optionally) a bottom element ⊥ such that for all x, ⊥ ∧ x = ⊥.
Example: Semilattice
• V = power set of some set.
• ∧ = union.
• Union is idempotent, commutative, and associative.
• What are the top and bottom elements?
Partial Order for a Semilattice
• Say x ≤ y iff x ∧ y = x.
• Also, x < y iff x ≤ y and x ≠ y.
• ≤ is really a partial order:
  1. x ≤ y and y ≤ z imply x ≤ z (proof in text).
  2. x ≤ y and y ≤ x iff x = y.
     Proof: if x ≤ y and y ≤ x, then x ∧ y = x and y ∧ x = y. Thus, x = x ∧ y = y ∧ x = y. Conversely, if x = y, then x ∧ y = x ∧ x = x by idempotence.
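For the union semilattice on a power set, the axioms, the top and bottom elements, and the derived ≤ can all be checked by brute force on a small base set. This Python sketch uses a hypothetical three-definition base set {d1, d2, d3}; it is an exhaustive check of one small instance, not a proof.

```python
from itertools import chain, combinations
from typing import FrozenSet, List


def power_set(s) -> List[FrozenSet[str]]:
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


V = power_set({"d1", "d2", "d3"})
meet = frozenset.union                     # ∧ = union


def leq(x, y) -> bool:
    """x <= y iff x ∧ y = x."""
    return meet(x, y) == x


# Semilattice axioms, checked exhaustively on this small V.
idempotent = all(meet(x, x) == x for x in V)
commutative = all(meet(x, y) == meet(y, x) for x in V for y in V)
associative = all(meet(x, meet(y, z)) == meet(meet(x, y), z)
                  for x in V for y in V for z in V)

# Top: ⊤ ∧ x = x for all x.  Bottom: ⊥ ∧ x = ⊥ for all x.
tops = [t for t in V if all(meet(t, x) == x for x in V)]
bottoms = [b for b in V if all(meet(b, x) == b for x in V)]

# The derived <= is a partial order, and for union it coincides with superset (⊇).
reflexive = all(leq(x, x) for x in V)
antisymmetric = all(x == y for x in V for y in V if leq(x, y) and leq(y, x))
transitive = all(leq(x, z) for x in V for y in V for z in V if leq(x, y) and leq(y, z))
is_superset = all(leq(x, y) == (x >= y) for x in V for y in V)
```

With ∧ = union, the search finds ⊤ = ∅ and ⊥ = the whole base set, and x ≤ y turns out to mean x ⊇ y.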
Axioms for Transfer Functions
1. F includes the identity function.
   • Why needed? Constructions often require introducing an empty block.
2. F is closed under composition.
   • Why needed?
     • The concatenation of two blocks is a block.
     • The transfer function for a block can be constructed from those of its individual statements.
Good News!
• The problems from the last lecture fit the model.
  • RD’s: forward, meet = union, transfer functions based on Gen and Kill.
  • AE’s: forward, meet = intersection, transfer functions based on Gen and Kill.
  • LV’s: backward, meet = union, transfer functions based on Use and Def.
Example: Reaching Definitions
• Direction D = forward.
• Domain V = set of all sets of definitions in the flow graph.
• ∧ = union.
• Functions F = all “gen-kill” functions of the form f(x) = (x − K) ∪ G, where K and G are sets of definitions (members of V).
Example: Satisfies Axioms
• Union on a power set forms a semilattice (idempotent, commutative, associative).
• Identity function: let K = G = ∅.
• Composition: a little algebra.
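The “little algebra” for composition can be written out and checked in Python. The sketch assumes a hypothetical encoding of each gen-kill function as its (K, G) pair; apply, compose, and IDENTITY are illustrative names. The docstring carries the derivation, which uses (A ∪ B) − C = (A − C) ∪ (B − C) and (x − K1) − K2 = x − (K1 ∪ K2).

```python
from typing import FrozenSet, Tuple

# Hypothetical encoding: the pair (K, G) stands for f(x) = (x - K) | G.
GenKill = Tuple[FrozenSet[str], FrozenSet[str]]

IDENTITY: GenKill = (frozenset(), frozenset())    # K = G = {}


def apply(f: GenKill, x: FrozenSet[str]) -> FrozenSet[str]:
    K, G = f
    return (x - K) | G


def compose(f2: GenKill, f1: GenKill) -> GenKill:
    """Gen-kill form of "f1, then f2".

    f2(f1(x)) = (((x - K1) | G1) - K2) | G2
              = ((x - K1 - K2) | (G1 - K2)) | G2
              = (x - (K1 | K2)) | ((G1 - K2) | G2)
    """
    K1, G1 = f1
    K2, G2 = f2
    return (K1 | K2, (G1 - K2) | G2)
```

Because the result is again a (K, G) pair, the composition of two gen-kill functions is itself a gen-kill function, which is exactly the closure axiom.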
Example: Partial Order
• For RD’s, S ≤ T means S ∪ T = S.
• Equivalently, S ⊇ T.
  • Seems “backward,” but that’s what the definitions give you.
• Intuition: ≤ measures “ignorance.”
  • The more definitions we know about, the less ignorance we have.
  • ⊤ = “total ignorance.”
DFA Frameworks
• (D, V, ∧, F).
• A flow graph, with an associated function f_B in F for each block B.
• A boundary value v_ENTRY or v_EXIT if D = forward or backward, respectively.
Iterative Algorithm (Forward)
OUT(entry) = v_ENTRY;
for (other blocks B) OUT(B) = ⊤;
while (changes to any OUT)
  for (each block B other than entry) {
    IN(B) = ∧ { OUT(P) : P a predecessor of B };
    OUT(B) = f_B(IN(B));
  }
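A generic version of the forward iterative algorithm, parameterized by the framework’s meet, ⊤, boundary value, and per-block transfer functions, might look like the Python sketch below. It assumes entry is a synthetic node with no code (so it has no transfer function) and that the framework is monotone with finite height, so the loop terminates; solve_forward is a hypothetical name.

```python
from functools import reduce
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


def solve_forward(entry: str,
                  preds: Dict[str, List[str]],
                  f: Dict[str, Callable[[T], T]],
                  meet: Callable[[T, T], T],
                  top: T,
                  v_entry: T) -> Tuple[Dict[str, T], Dict[str, T]]:
    """Iterative algorithm for a forward framework (D, V, ∧, F)."""
    OUT: Dict[str, T] = {b: top for b in preds}
    OUT[entry] = v_entry
    IN: Dict[str, T] = {}
    changed = True
    while changed:
        changed = False
        for b in preds:
            if b == entry:
                continue
            # Meet over predecessors; starting from ⊤ is safe because ⊤ ∧ x = x.
            IN[b] = reduce(meet, (OUT[p] for p in preds[b]), top)
            new_out = f[b](IN[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


# Reaching definitions as an instance: meet = union, ⊤ = {}, v_ENTRY = {}.
preds = {"ENTRY": [], "B1": ["ENTRY"], "B2": ["B1", "B3"], "B3": ["B2"]}
f = {
    "B1": lambda x: (x - {"d1", "d2"}) | {"d1"},   # d1: x = 5
    "B2": lambda x: x,                             # if x == 10
    "B3": lambda x: (x - {"d1", "d2"}) | {"d2"},   # d2: x = 15
}
IN, OUT = solve_forward("ENTRY", preds, f, frozenset.union, frozenset(), frozenset())
```

Following the backward slide, the same routine could in principle serve a backward problem by passing successor lists in place of predecessor lists and the exit node in place of entry; the values it labels IN and OUT are then the backward problem’s OUT and IN.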
Iterative Algorithm (Backward)
• Same thing, except:
  1. Swap IN and OUT everywhere.
  2. Replace entry by exit.
  3. Take the meet over successors instead of predecessors.
What Does the Iterative Algorithm Do?
• MFP (maximal fixedpoint) = result of the iterative algorithm.
• MOP = meet, over all paths from entry to a given point, of the transfer function along that path applied to v_ENTRY.
• IDEAL = ideal solution = meet over all executable paths from entry to a point.
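Transfer Function of a Path
• For a path from entry through blocks with transfer functions f1, f2, …, f_{n−1} into block B, the value reaching B along that path is
  f_{n−1}(… f2(f1(v_ENTRY)) …)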
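Applying a path’s transfer functions in order is a left fold over the path. A minimal Python sketch, where path_value and the two gen-kill functions are hypothetical illustrations (two definitions d1, d2 of the same variable):

```python
from functools import reduce
from typing import Callable, FrozenSet, Sequence, TypeVar

T = TypeVar("T")


def path_value(fs: Sequence[Callable[[T], T]], v_entry: T) -> T:
    """f_{n-1}(... f2(f1(v_entry)) ...): apply f1 first and f_{n-1} last."""
    return reduce(lambda acc, f: f(acc), fs, v_entry)


def gen_kill(K: FrozenSet[str], G: FrozenSet[str]) -> Callable[[FrozenSet[str]], FrozenSet[str]]:
    return lambda s: (s - K) | G


# Hypothetical blocks that each define the same variable x.
f1 = gen_kill(frozenset({"d1", "d2"}), frozenset({"d1"}))   # d1: x = ...
f2 = gen_kill(frozenset({"d1", "d2"}), frozenset({"d2"}))   # d2: x = ...
```

Order matters: along the path f1 then f2 only d2 survives, while along f2 then f1 only d1 does. An empty path leaves v_ENTRY unchanged.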
Maximum Fixedpoint
• Fixedpoint = solution to the equations used in iteration:
  IN(B) = ∧ { OUT(P) : P a predecessor of B };
  OUT(B) = f_B(IN(B));
• Maximum = any other solution is ≤ the result of the iterative algorithm (MFP).
MOP and IDEAL
• All solutions are really meets of the result of starting with v_ENTRY and following some set of paths to the point in question.
• If we don’t include at least the IDEAL paths, we have an error.
• But try not to include too many more.
• Less “ignorance,” but we “know too much.”
MOP Versus IDEAL (1)
• At each block B, MOP[B] ≤ IDEAL[B].
  • I.e., the meet over many paths is ≤ the meet over a subset.
  • Example: x ∧ y ∧ z ≤ x ∧ y because (x ∧ y ∧ z) ∧ (x ∧ y) = x ∧ y ∧ z.
• Intuition: anything not ≤ IDEAL is not safe, because there is some executable path whose effect is not accounted for.
MOP Versus IDEAL (2)
• Conversely: any solution that is ≤ IDEAL accounts for all executable paths (and maybe more paths), and is therefore conservative (safe), even if not accurate.
MFP Versus MOP (1)
• Is MFP ≤ MOP?
  • If so, then since MOP ≤ IDEAL, we have MFP ≤ IDEAL, and therefore MFP is safe.
• Yes, but it requires two assumptions about the framework:
  1. “Monotonicity.”
  2. Finite height (no infinite chains … < x2 < x1 < x).
MFP Versus MOP (2)
• Intuition: if we computed the MOP directly, we would compose functions along all paths, then take a big meet.
• But the MFP (iterative algorithm) alternates compositions and meets arbitrarily.
Monotonicity
• A framework is monotone if the functions respect ≤. That is:
  • If x ≤ y, then f(x) ≤ f(y).
  • Equivalently: f(x ∧ y) ≤ f(x) ∧ f(y).
• Intuition: it is conservative to take a meet before completing the composition of functions.
Good News!
• The frameworks we’ve studied so far are all monotone.
  • Easy proof for functions in Gen-Kill form.
• And they have finite height.
  • Only a finite number of defs, variables, etc. in any program.
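The “easy proof” can be complemented by a brute-force check on a small instance. This Python sketch, over a hypothetical three-definition universe, tests every gen-kill function under the RD ordering (x ≤ y iff x ∪ y = x); it is an exhaustive check of one instance, not a general proof. The complement function is a hypothetical non-gen-kill contrast.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, List

Facts = FrozenSet[str]


def subsets(universe) -> List[Facts]:
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def gen_kill(K: Facts, G: Facts) -> Callable[[Facts], Facts]:
    return lambda x: (x - K) | G


def leq(x: Facts, y: Facts) -> bool:
    # RD ordering: x <= y iff x ∧ y = x, with ∧ = union (i.e. x ⊇ y).
    return (x | y) == x


def is_monotone(f: Callable[[Facts], Facts], V: List[Facts]) -> bool:
    return all(leq(f(x), f(y)) for x in V for y in V if leq(x, y))


U = frozenset({"d1", "d2", "d3"})
V = subsets(U)
all_gen_kill_monotone = all(is_monotone(gen_kill(K, G), V) for K in V for G in V)
# For contrast, a hypothetical function that is not of gen-kill form: complement.
complement_monotone = is_monotone(lambda x: U - x, V)
```

Every gen-kill function passes, because x ⊇ y implies (x − K) ∪ G ⊇ (y − K) ∪ G; the complement function fails, since it reverses ⊇.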
Two Paths to B That Meet Early
• Two paths from ENTRY reach block B, whose transfer function is f; one path delivers OUT = x and the other OUT = y.
• In MFP, values x and y get combined too soon: IN(B) = x ∧ y, so OUT(B) = f(x ∧ y).
• MOP considers paths independently and combines them at the last possible moment: OUT(B) = f(x) ∧ f(y).
• Since f(x ∧ y) ≤ f(x) ∧ f(y), it is as if we added nonexistent paths.
Distributive Frameworks
• Strictly stronger than monotonicity is the distributivity condition:
  f(x ∧ y) = f(x) ∧ f(y)
Even More Good News!
• All the Gen-Kill frameworks are distributive.
• If a framework is distributive, then combining paths early doesn’t hurt.
  • MOP = MFP.
  • That is, the iterative algorithm computes a solution that takes into account all and only the physical paths.
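Distributivity of gen-kill functions, and its consequence MOP = MFP on the two-path picture, can be checked on a small instance in Python. The universe and the three block functions below are hypothetical; the check is exhaustive for this instance only, with ∧ = union as in RD.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, List

Facts = FrozenSet[str]


def subsets(universe) -> List[Facts]:
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def gen_kill(K: Facts, G: Facts) -> Callable[[Facts], Facts]:
    return lambda x: (x - K) | G


U = frozenset({"d1", "d2", "d3"})
V = subsets(U)
meet = frozenset.union                      # RD: ∧ = union


def is_distributive(f: Callable[[Facts], Facts]) -> bool:
    return all(f(meet(x, y)) == meet(f(x), f(y)) for x in V for y in V)


all_distributive = all(is_distributive(gen_kill(K, G)) for K in V for G in V)

# Two paths ENTRY -> P1 -> B and ENTRY -> P2 -> B (hypothetical block functions).
v_entry = frozenset()
f_p1 = gen_kill(frozenset({"d2"}), frozenset({"d1"}))
f_p2 = gen_kill(frozenset({"d1"}), frozenset({"d2"}))
f_b = gen_kill(frozenset({"d1"}), frozenset({"d3"}))
mop = meet(f_b(f_p1(v_entry)), f_b(f_p2(v_entry)))   # combine at the last moment
mfp = f_b(meet(f_p1(v_entry), f_p2(v_entry)))        # combine before entering B
```

Both orders of combining give {d2, d3} at the end of B, as distributivity predicts.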