Dataflow Analysis: Classical Analysis for Object-oriented Programs, cont.
Announcements
- HW 1, Quiz 1 and 2 graded
- HW 2 out
  - Starter code, class analysis framework and worklist algorithm
  - Soot
- Setup: please do set this up as soon as possible!
  - We've added some posts covering some common setup issues
  - Post questions
Spring 21 CSCI 4450/6450, A Milanova
Outline of Today’s Class
- Will go over HW 1 Problems 2 and 3
- Class analysis framework questions?
  - Rapid Type Analysis (RTA)
  - The XTA analysis family
  - 0-CFA
  - Points-to analysis (PTA)
Your Homework
- A bunch of flow-insensitive, context-insensitive analyses for Java
  - RTA, XTA, and optionally others
- Simple property space
- Simple transfer functions
  - E.g., in fact, RTA gets rid of most CFG nodes and processes just 2 kinds of nodes
- Millions of lines of code in seconds
Class Analysis
- Problem statement: What are the classes of objects that a (Java) reference variable may refer to?
- Applications
  - Call graph construction
    - Nodes are methods
    - Edges represent calling relationships
    - Notion of methods reachable from main
  - Virtual call resolution
Class Hierarchy Analysis (CHA)
- In Java, if a reference variable r has type A, r can refer only to objects that are concrete subclasses of A (A included). Denoted by SubTypes(A)
  - Note: refers to Java subtype, not true subtype
  - Note: SubTypes(A) notation due to Tip and Palsberg (OOPSLA '00)
- At virtual call site r.m(), we can find what methods may be called based on the hierarchy information
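To make SubTypes(A) and call-site resolution concrete, here is a minimal Java sketch over a toy hierarchy. Everything in it is an assumption for illustration: the string encoding of classes, the hypothetical helpers `subTypes`, `resolve`, and `chaTargets`, and the simplification that every class is concrete (no abstract classes or interfaces). It is not Soot's API or the homework starter code.

```java
import java.util.*;

// Minimal CHA sketch over a hypothetical toy hierarchy (not Soot's API).
// Assumes every class is concrete; interfaces and abstract classes are ignored.
class ChaSketch {
    static final Map<String, String> SUPER = new HashMap<>();         // class -> direct superclass
    static final Map<String, Set<String>> DECLARES = new HashMap<>(); // class -> declared methods

    static void addClass(String c, String sup, String... methods) {
        SUPER.put(c, sup);
        DECLARES.put(c, new HashSet<>(Arrays.asList(methods)));
    }

    // SubTypes(a): a itself plus every class whose superclass chain reaches a.
    static Set<String> subTypes(String a) {
        Set<String> result = new TreeSet<>();
        for (String c : SUPER.keySet())
            for (String s = c; s != null; s = SUPER.get(s))
                if (s.equals(a)) { result.add(c); break; }
        return result;
    }

    // resolve(C, n): the first class on C's superclass chain that declares n.
    static String resolve(String c, String n) {
        for (String s = c; s != null; s = SUPER.get(s))
            if (DECLARES.get(s).contains(n)) return s + "." + n;
        return null;
    }

    // Possible targets of r.n() when r has static type a.
    static Set<String> chaTargets(String a, String n) {
        Set<String> targets = new TreeSet<>();
        for (String c : subTypes(a)) {
            String t = resolve(c, n);
            if (t != null) targets.add(t);
        }
        return targets;
    }

    public static void main(String[] args) {
        addClass("A", null, "m");   // A declares m()
        addClass("B", "A", "m");    // B overrides m()
        addClass("C", "A");         // C inherits A.m()
        System.out.println(chaTargets("A", "m")); // [A.m, B.m]
    }
}
```

C contributes no new target because resolve(C, m) walks up to A.m, so CHA reports exactly A.m and B.m for a call through a receiver of static type A.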
Rapid Type Analysis (RTA)
- Due to Bacon and Sweeney
  - David Bacon and Peter Sweeney, “Fast Static Analysis of C++ Virtual Function Calls”, OOPSLA '96
- Improves on CHA
- Expands calls only if it has seen an instantiated object of the appropriate type!
Example
  public class A {
    public static void main() {
      A a;
      D d = new D();
      E e = new E();
      if (…) a = d; else a = e;
      a.m();
    }
  }
  public class B extends A {
    public void foo() {
      G g = new G();
    }
  }
[Figure: class hierarchy of A, B, C, D, E, G and their m() methods]
CHA targets of a.m(): A.m, B.m, C.m, G.m
RTA starts at main and records that D and E are instantiated. At the call a.m() it looks at all CHA targets but expands only into target C.m()! It never reaches B.foo(), so it never records G as instantiated.
RTA
R is the set of reachable methods
I is the set of instantiated types
1. $\{\mathit{main}\} \subseteq R$ // Algo: initialize R with main
2. for each method $m \in R$ and each new site new C in m:
   $\{C\} \subseteq I$ // Algo: add C to I; schedule “successor” constraints
RTA
3. for each method $m \in R$, each virtual call y.n(z) in m, each class C in $\mathit{SubTypes}(\mathit{StaticType}(y))$, and n', where n' = resolve(C, n):
   $C \in I \Rightarrow \{n'\} \subseteq R$ // Algo: add target n' to R, if not already there. Schedule “successors”
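The three RTA rules can be sketched in Java as below. This is a hypothetical toy IR, not Soot's or the starter code's API: a method body is reduced to its `new C` sites and its virtual calls. Instead of the worklist the slides describe, the sketch re-applies all rules round-robin until R and I stop growing, which reaches the same least solution because every rule only adds elements. The demo hierarchy (D and E inheriting m() from C, G as a subclass of B) is an assumption modeled on the earlier example.

```java
import java.util.*;

// Minimal RTA sketch over a hypothetical toy IR (not Soot's API).
class RtaSketch {
    // A virtual call site y.n(...) where y has static type staticType.
    static final class Call {
        final String staticType, name;
        Call(String staticType, String name) { this.staticType = staticType; this.name = name; }
    }

    // A method body reduced to the two statement kinds RTA looks at.
    static final class Body {
        final List<String> news = new ArrayList<>();  // each "new C" site, stored as C
        final List<Call> calls = new ArrayList<>();   // each virtual call site
    }

    final Map<String, String> superOf = new HashMap<>();        // class -> direct superclass
    final Map<String, Set<String>> declares = new HashMap<>();  // class -> declared method names
    final Map<String, Body> bodies = new HashMap<>();           // "C.n" -> body
    final Set<String> reachable = new TreeSet<>();              // R
    final Set<String> instantiated = new TreeSet<>();           // I

    void addClass(String c, String sup, String... methods) {
        superOf.put(c, sup);
        declares.put(c, new HashSet<>(Arrays.asList(methods)));
    }

    Body body(String method) { return bodies.computeIfAbsent(method, k -> new Body()); }

    // True if c is in SubTypes(a), i.e. a appears on c's superclass chain.
    boolean isSubtype(String c, String a) {
        for (String s = c; s != null; s = superOf.get(s)) if (s.equals(a)) return true;
        return false;
    }

    // resolve(C, n): first class on C's superclass chain that declares n.
    String resolve(String c, String n) {
        for (String s = c; s != null; s = superOf.get(s))
            if (declares.get(s).contains(n)) return s + "." + n;
        return null;
    }

    void solve(String main) {
        reachable.add(main);                                              // rule 1
        boolean changed = true;
        while (changed) {                                                 // round-robin until stable
            changed = false;
            for (String m : new ArrayList<>(reachable)) {
                Body b = body(m);
                for (String c : b.news) changed |= instantiated.add(c);  // rule 2
                for (Call call : b.calls)                                 // rule 3
                    for (String c : instantiated)
                        if (isSubtype(c, call.staticType)) {
                            String target = resolve(c, call.name);
                            if (target != null) changed |= reachable.add(target);
                        }
            }
        }
    }

    // Hypothetical hierarchy modeled loosely on the slide's example.
    static RtaSketch demo() {
        RtaSketch rta = new RtaSketch();
        rta.addClass("A", null, "m", "main");
        rta.addClass("B", "A", "m", "foo");
        rta.addClass("C", "A", "m");
        rta.addClass("D", "C");
        rta.addClass("E", "C");
        rta.addClass("G", "B", "m");
        Body main = rta.body("A.main");
        main.news.add("D");
        main.news.add("E");
        main.calls.add(new Call("A", "m"));
        rta.body("B.foo").news.add("G");
        rta.solve("A.main");
        return rta;
    }

    public static void main(String[] args) {
        RtaSketch rta = demo();
        System.out.println("R = " + rta.reachable);      // R = [A.main, C.m]
        System.out.println("I = " + rta.instantiated);   // I = [D, E]
    }
}
```

As on the example slide, B.foo() never becomes reachable, so G never enters I.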
XTA Analysis Family
- Due to Tip and Palsberg
  - Frank Tip and Jens Palsberg, “Scalable Propagation-Based Call Graph Construction Algorithms”, OOPSLA '00
- Generalizes RTA
- Improves on RTA by storing more precise information about the flow of class types: one type set per method and per field, instead of a single global set I
XTA
R is the set of reachable methods
$S_m$ is the set of types that flow to method m
$S_f$ is the set of types that flow to field f
1. $\{\mathit{main}\} \subseteq R$
2. for each method $m \in R$ and each new site new C in m:
   $\{C\} \subseteq S_m$
XTA
3. for each method $m \in R$, each virtual call y.n(z) in m, each class C in $\mathit{SubTypes}(\mathit{StaticType}(y)) \cap S_m$, and n', where n' = resolve(C, n):
   $\{n'\} \subseteq R$ // add n' to R if not already there
   $\{C\} \subseteq S_{n'}$ // add C to $S_{n'}$ if not already there
   $S_m \cap \mathit{SubTypes}(\mathit{StaticType}(p)) \subseteq S_{n'}$
   $S_{n'} \cap \mathit{SubTypes}(\mathit{StaticType}(\mathit{ret})) \subseteq S_m$
   (p denotes the parameter of n', and ret denotes the return of n')
XTA
4. for each method $m \in R$, each field read x = y.f in m:
   $S_f \subseteq S_m$
5. for each method $m \in R$, each field write x.f = y in m:
   $S_m \cap \mathit{SubTypes}(\mathit{StaticType}(f)) \subseteq S_f$
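A similar sketch for XTA, again over a hypothetical toy IR rather than Soot's API. It keeps one type set per method and per field and implements rules 1-5. It also adds one rule that the slides list only as a practical concern: a direct (static) call to n' makes n' reachable and propagates parameter and return types, but adds no receiver class. Further assumptions: at most one parameter per method, and round-robin iteration instead of a worklist. The demo encodes the RTA vs. XTA example, with B and C assumed to override m().

```java
import java.util.*;

// Minimal XTA sketch over a hypothetical toy IR (not Soot's API). Assumptions: at most
// one parameter per method, direct calls handled by an extra rule, round-robin iteration.
class XtaSketch {
    static final class Call {  // virtual call y.n(z): y's static type and the name n
        final String staticType, name;
        Call(String staticType, String name) { this.staticType = staticType; this.name = name; }
    }
    static final class Body {
        final List<String> news = new ArrayList<>();         // new C
        final List<Call> calls = new ArrayList<>();          // y.n(z)
        final List<String> directCalls = new ArrayList<>();  // direct calls, as "C.n"
        final List<String> fieldReads = new ArrayList<>();   // x = y.f, stored as f
        final List<String> fieldWrites = new ArrayList<>();  // x.f = y, stored as f
    }
    final Map<String, String> superOf = new HashMap<>();
    final Map<String, Set<String>> declares = new HashMap<>();
    final Map<String, Body> bodies = new HashMap<>();
    final Map<String, String> paramType = new HashMap<>();       // "C.n" -> StaticType(p)
    final Map<String, String> returnType = new HashMap<>();      // "C.n" -> StaticType(ret)
    final Map<String, String> fieldType = new HashMap<>();       // f -> StaticType(f)
    final Map<String, Set<String>> methodSets = new HashMap<>(); // S_m
    final Map<String, Set<String>> fieldSets = new HashMap<>();  // S_f
    final Set<String> reachable = new TreeSet<>();               // R
    final Set<String> edges = new TreeSet<>();                   // call-graph edges "m -> n'"
    void addClass(String c, String sup, String... ms) { superOf.put(c, sup); declares.put(c, new HashSet<>(Arrays.asList(ms))); }
    Body body(String m) { return bodies.computeIfAbsent(m, k -> new Body()); }
    Set<String> methodSet(String m) { return methodSets.computeIfAbsent(m, k -> new TreeSet<>()); }
    Set<String> fieldSet(String f) { return fieldSets.computeIfAbsent(f, k -> new TreeSet<>()); }
    boolean isSubtype(String c, String a) {
        for (String x = c; x != null; x = superOf.get(x)) if (x.equals(a)) return true;
        return false;
    }
    String resolve(String c, String n) {
        for (String x = c; x != null; x = superOf.get(x))
            if (declares.get(x).contains(n)) return x + "." + n;
        return null;
    }
    // to ⊇ from ∩ SubTypes(type); a null type (no parameter, void return) transfers nothing.
    boolean flow(Set<String> from, String type, Set<String> to) {
        boolean changed = false;
        if (type != null)
            for (String c : new ArrayList<>(from)) if (isSubtype(c, type)) changed |= to.add(c);
        return changed;
    }
    // {n'} ⊆ R, plus parameter and return flow between caller m and callee n'.
    boolean link(String m, String callee) {
        edges.add(m + " -> " + callee);
        boolean changed = reachable.add(callee);
        changed |= flow(methodSet(m), paramType.get(callee), methodSet(callee));
        changed |= flow(methodSet(callee), returnType.get(callee), methodSet(m));
        return changed;
    }
    void solve(String main) {
        reachable.add(main);                                                     // rule 1
        for (boolean changed = true; changed; ) {
            changed = false;
            for (String m : new ArrayList<>(reachable)) {
                Body b = body(m);
                Set<String> sm = methodSet(m);
                for (String c : b.news) changed |= sm.add(c);                    // rule 2
                for (Call call : b.calls)                                        // rule 3
                    for (String c : new ArrayList<>(sm)) {
                        String t = isSubtype(c, call.staticType) ? resolve(c, call.name) : null;
                        if (t == null) continue;
                        changed |= link(m, t);
                        changed |= methodSet(t).add(c);                          // {C} ⊆ S_n'
                    }
                for (String t : b.directCalls) changed |= link(m, t);            // extra rule
                for (String f : b.fieldReads) changed |= sm.addAll(fieldSet(f)); // rule 4
                for (String f : b.fieldWrites)                                   // rule 5
                    changed |= flow(sm, fieldType.get(f), fieldSet(f));
            }
        }
    }
    // Hypothetical encoding of the RTA vs. XTA example; B and C are assumed to override m().
    static XtaSketch demo() {
        XtaSketch x = new XtaSketch();
        x.addClass("A", null, "m", "main", "n1", "n2");
        x.addClass("B", "A", "m");
        x.addClass("C", "A", "m");
        x.body("A.main").directCalls.addAll(List.of("A.n1", "A.n2"));
        x.body("A.n1").news.add("B");
        x.body("A.n1").calls.add(new Call("A", "m"));
        x.body("A.n2").news.add("C");
        x.body("A.n2").calls.add(new Call("A", "m"));
        x.solve("A.main");
        return x;
    }
    public static void main(String[] args) {
        System.out.println(demo().edges); // [A.main -> A.n1, A.main -> A.n2, A.n1 -> B.m, A.n2 -> C.m]
    }
}
```

The printed edges show the per-method precision: n1 reaches only B.m and n2 reaches only C.m, whereas RTA, whose single set I contains both B and C, would resolve both calls to both methods.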
Practical Concerns
- Multiple parameters
- Direct calls
  - either static invoke calls or special invoke calls (e.g., constructor calls and super calls)
- Array reads and writes!
- Static fields
- See Tip and Palsberg for more
Example: RTA vs. XTA
  public class A {
    public static void main() {
      n1();
      n2();
    }
    static void n1() {
      A a1 = new B();
      a1.m();
    }
    static void n2() {
      A a2 = new C();
      a2.m();
    }
  }
[Figure: class hierarchy of A, B, C, D, E, G and their m() methods]
Boolean Expression Hierarchy: RTA vs. XTA vs. “Ground Truth”
  public class AndExp extends BoolExp {
    private BoolExp left;
    private BoolExp right;
    public AndExp(BoolExp left, BoolExp right) {
      this.left = left;
      this.right = right;
    }
    public boolean evaluate(Context c) {
      BoolExp l = this.left;
      BoolExp r = this.right;
      return l.evaluate(c) && r.evaluate(c);
    }
  }
Boolean Expression Hierarchy: RTA vs. XTA vs. “Ground Truth”
  public class OrExp extends BoolExp {
    private BoolExp left;
    private BoolExp right;
    public OrExp(BoolExp left, BoolExp right) {
      this.left = left;
      this.right = right;
    }
    public boolean evaluate(Context c) {
      BoolExp l = this.left;
      BoolExp r = this.right;
      return l.evaluate(c) || r.evaluate(c);
    }
  }
Boolean Expression Hierarchy: RTA vs. XTA vs. “Ground Truth”
  main() {
    Context theContext = new Context();
    BoolExp x = new VarExp("X");
    BoolExp y = new VarExp("Y");
    BoolExp exp = new AndExp(
        new Constant(true),
        new OrExp(x, y)
    );
    theContext.assign(x, true);
    theContext.assign(y, false);
    boolean result = exp.evaluate(theContext);
  }
0-CFA
- Described in Tip and Palsberg's paper
- 0-CFA stands for 0-level Control Flow Analysis, where “0-level” stands for context-insensitive analysis
  - Will see 1-CFA, 2-CFA, … k-CFA next time
- Improves on XTA by storing even more information about the flow of class types: one type set per variable, rather than one per method
0-CFA
R is the set of reachable methods
$S_v$ is the set of types that flow to variable v
$S_f$ is the set of types that flow to field f
1. $\{\mathit{main}\} \subseteq R$
2. for each method $m \in R$ and each new site x = new C in m:
   $\{C\} \subseteq S_x$
0-CFA
3. for each method $m \in R$, each virtual call x = y.n(z) in m, each class C in $S_y$, and n', where n' = resolve(C, n):
   $\{n'\} \subseteq R$
   $\{C\} \subseteq S_{\mathit{this}}$
   $S_z \cap \mathit{SubTypes}(\mathit{StaticType}(p)) \subseteq S_p$
   $S_{\mathit{ret}} \cap \mathit{SubTypes}(\mathit{StaticType}(x)) \subseteq S_x$
   (this is the implicit parameter of n', p is the parameter of n', and ret is the return of n')
0-CFA
4. for each method $m \in R$, each field read x = y.f in m:
   $S_f \cap \mathit{SubTypes}(\mathit{StaticType}(x)) \subseteq S_x$
5. for each method $m \in R$, each field write x.f = y in m:
   $S_y \cap \mathit{SubTypes}(\mathit{StaticType}(f)) \subseteq S_f$
0-CFA
6. for each method $m \in R$, each assignment x = y in m:
   $S_y \cap \mathit{SubTypes}(\mathit{StaticType}(x)) \subseteq S_x$
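The six 0-CFA rules can be sketched the same way, with one type set per variable and one per field. The toy IR, the string-keyed maps, and names such as `Stmt`, `apply`, and `targets` are hypothetical, not Soot's API. Variable names are assumed globally unique, each method has at most one parameter, and iteration is round-robin rather than worklist-driven. The demo encodes the main method of the 0-CFA vs. PTA example under an assumed hierarchy in which B and C override m().

```java
import java.util.*;

// Minimal 0-CFA sketch over a hypothetical toy IR (not Soot's API): one type set per
// variable and per field. Assumes globally unique variable names, at most one parameter
// per method, and round-robin iteration instead of a worklist.
class ZeroCfaSketch {
    // kind: "new" (x = new name), "assign" (x = y), "load" (x = y.name),
    // "store" (x.name = y), "call" (x = y.name(z)); unused slots are null.
    static final class Stmt {
        final String kind, x, y, name, z;
        Stmt(String kind, String x, String y, String name, String z) {
            this.kind = kind; this.x = x; this.y = y; this.name = name; this.z = z;
        }
    }
    final Map<String, String> superOf = new HashMap<>();
    final Map<String, Set<String>> declares = new HashMap<>();
    final Map<String, List<Stmt>> bodies = new HashMap<>();
    final Map<String, String> varType = new HashMap<>(), fieldType = new HashMap<>();
    final Map<String, String> thisVar = new HashMap<>(), paramVar = new HashMap<>(), retVar = new HashMap<>();
    final Map<String, Set<String>> varSets = new HashMap<>(), fieldSets = new HashMap<>(); // S_v, S_f
    final Map<String, Set<String>> targets = new TreeMap<>();  // call site "y.n" -> resolved methods
    final Set<String> reachable = new TreeSet<>();             // R

    void addClass(String c, String sup, String... ms) { superOf.put(c, sup); declares.put(c, new HashSet<>(Arrays.asList(ms))); }
    Set<String> sVar(String v) { return varSets.computeIfAbsent(v, k -> new TreeSet<>()); }
    Set<String> sField(String f) { return fieldSets.computeIfAbsent(f, k -> new TreeSet<>()); }
    boolean isSubtype(String c, String a) {
        for (String s = c; s != null; s = superOf.get(s)) if (s.equals(a)) return true;
        return false;
    }
    String resolve(String c, String n) {
        for (String s = c; s != null; s = superOf.get(s)) if (declares.get(s).contains(n)) return s + "." + n;
        return null;
    }
    // to ⊇ from ∩ SubTypes(type); with no recorded static type nothing flows (toy limitation).
    boolean flow(Set<String> from, String type, Set<String> to) {
        boolean changed = false;
        if (type != null)
            for (String c : new ArrayList<>(from)) if (isSubtype(c, type)) changed |= to.add(c);
        return changed;
    }
    boolean apply(Stmt s) {
        switch (s.kind) {
            case "new":    return sVar(s.x).add(s.name);                                   // rule 2
            case "load":   return flow(sField(s.name), varType.get(s.x), sVar(s.x));       // rule 4
            case "store":  return flow(sVar(s.y), fieldType.get(s.name), sField(s.name));  // rule 5
            case "assign": return flow(sVar(s.y), varType.get(s.x), sVar(s.x));            // rule 6
            case "call": {                                                                  // rule 3
                boolean changed = false;
                for (String c : new ArrayList<>(sVar(s.y))) {
                    String t = resolve(c, s.name);
                    if (t == null) continue;
                    changed |= reachable.add(t);
                    targets.computeIfAbsent(s.y + "." + s.name, k -> new TreeSet<>()).add(t);
                    if (thisVar.containsKey(t)) changed |= sVar(thisVar.get(t)).add(c);
                    String p = paramVar.get(t), ret = retVar.get(t);
                    if (s.z != null && p != null) changed |= flow(sVar(s.z), varType.get(p), sVar(p));
                    if (s.x != null && ret != null) changed |= flow(sVar(ret), varType.get(s.x), sVar(s.x));
                }
                return changed;
            }
            default: throw new IllegalArgumentException(s.kind);
        }
    }
    void solve(String main) {
        reachable.add(main);                                                                // rule 1
        for (boolean changed = true; changed; ) {
            changed = false;
            for (String m : new ArrayList<>(reachable))
                for (Stmt s : bodies.getOrDefault(m, List.of())) changed |= apply(s);
        }
    }
    // Hypothetical encoding of main from the 0-CFA vs. PTA example; B and C assumed to override m().
    static ZeroCfaSketch demo() {
        ZeroCfaSketch a = new ZeroCfaSketch();
        a.addClass("A", null, "m", "main");
        a.addClass("B", "A", "m");
        a.addClass("C", "A", "m");
        a.addClass("X", null);
        a.varType.put("a2", "A");
        a.varType.put("a4", "A");
        a.fieldType.put("f", "A");
        a.bodies.put("A.main", List.of(
            new Stmt("new", "x1", null, "X", null), new Stmt("new", "a1", null, "B", null),
            new Stmt("store", "x1", "a1", "f", null), new Stmt("load", "a2", "x1", "f", null),
            new Stmt("call", null, "a2", "m", null),
            new Stmt("new", "x2", null, "X", null), new Stmt("new", "a3", null, "C", null),
            new Stmt("store", "x2", "a3", "f", null), new Stmt("load", "a4", "x2", "f", null),
            new Stmt("call", null, "a4", "m", null)));
        a.solve("A.main");
        return a;
    }
    public static void main(String[] args) {
        System.out.println(demo().targets); // {a2.m=[B.m, C.m], a4.m=[B.m, C.m]}
    }
}
```

Because 0-CFA keeps a single set $S_f$ for field f, both B and C reach a2 and a4, so each call site resolves to both B.m and C.m.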
Example: XTA vs. 0-CFA
  public class A {
    public static void main() {
      A a1 = new B();
      a1.m();
      A a2 = new C();
      a2.m();
    }
  }
[Figure: class hierarchy of A, B, C, D, E, G and their m() methods]
Boolean Expression Hierarchy: XTA vs. 0-CFA
(Same AndExp, OrExp, and main code as on the RTA vs. XTA vs. “Ground Truth” slides above.)
Andersen’s Points-to Analysis
- Commonly attributed to Lars Andersen [1994]
  - “Andersen’s points-to analysis for C”
- Coarser approximation than our earlier formulation: never “kill”; maintain a single points-to graph for all program points
- Flow-insensitive, context-insensitive analysis
- Formulated in terms of subset constraints
- Solvable by the worklist algorithm
Andersen’s Points-to Analysis
pts(p) denotes the points-to set of p
(1) p = &a: $\{a\} \subseteq \mathit{pts}(p)$
(2) p = q: $\mathit{pts}(q) \subseteq \mathit{pts}(p)$
(3) p = *q: for each $x \in \mathit{pts}(q)$: $\mathit{pts}(x) \subseteq \mathit{pts}(p)$
(4) *p = q: for each $x \in \mathit{pts}(p)$: $\mathit{pts}(q) \subseteq \mathit{pts}(x)$
Use a worklist-like algorithm to compute the least solution of these constraints
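A minimal Java sketch of these four constraint forms. The `Kind`/`Stmt` encoding is hypothetical, and instead of a tuned worklist it re-applies every constraint until no points-to set grows; because each constraint only adds elements and the sets are finite, this reaches the same least solution. The demo runs Example 2 below, ignoring its elided `…` statements.

```java
import java.util.*;

// Minimal sketch of Andersen's four constraint forms (hypothetical encoding).
// Naive iteration: re-apply every constraint until no points-to set grows.
class AndersenSketch {
    enum Kind { ADDR, COPY, LOAD, STORE }  // p = &q, p = q, p = *q, *p = q

    static final class Stmt {
        final Kind kind;
        final String p, q;
        Stmt(Kind kind, String p, String q) { this.kind = kind; this.p = p; this.q = q; }
    }

    final Map<String, Set<String>> pointsTo = new TreeMap<>();

    Set<String> pts(String v) { return pointsTo.computeIfAbsent(v, k -> new TreeSet<>()); }

    Map<String, Set<String>> solve(List<Stmt> stmts) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Stmt s : stmts) {
                switch (s.kind) {
                    case ADDR:   // (1) {q} ⊆ pts(p), q being the variable whose address is taken
                        changed |= pts(s.p).add(s.q);
                        break;
                    case COPY:   // (2) pts(q) ⊆ pts(p)
                        changed |= pts(s.p).addAll(new ArrayList<>(pts(s.q)));
                        break;
                    case LOAD:   // (3) for each x in pts(q): pts(x) ⊆ pts(p)
                        for (String x : new ArrayList<>(pts(s.q)))
                            changed |= pts(s.p).addAll(new ArrayList<>(pts(x)));
                        break;
                    case STORE:  // (4) for each x in pts(p): pts(q) ⊆ pts(x)
                        for (String x : new ArrayList<>(pts(s.p)))
                            changed |= pts(x).addAll(new ArrayList<>(pts(s.q)));
                        break;
                }
            }
        }
        return pointsTo;
    }

    // Example 2 (elided "…" statements ignored): p3 = &p1; p1 = &a; q = p3; r = *q; p1 = &b
    static Map<String, Set<String>> example2() {
        return new AndersenSketch().solve(List.of(
            new Stmt(Kind.ADDR, "p3", "p1"), new Stmt(Kind.ADDR, "p1", "a"),
            new Stmt(Kind.COPY, "q", "p3"), new Stmt(Kind.LOAD, "r", "q"),
            new Stmt(Kind.ADDR, "p1", "b")));
    }

    public static void main(String[] args) {
        System.out.println(example2()); // {p1=[a, b], p3=[p1], q=[p1], r=[a, b]}
    }
}
```

Note how r picks up both a and b: the analysis is flow-insensitive, so p1 = &b counts even though it comes after r = *q.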
Andersen’s Points-to Analysis: Examples
Example 1:
  p1 = &a
  p2 = p1
  *p2 = 1
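Working Example 1 by hand with the four constraint forms gives the expected answer sketched below. The store *p2 = 1 writes an integer rather than an address, so it matches none of the forms and adds no points-to constraint.

```latex
\begin{aligned}
\{a\} &\subseteq \mathit{pts}(p_1) && \text{from } p_1 = \&a \\
\mathit{pts}(p_1) &\subseteq \mathit{pts}(p_2) && \text{from } p_2 = p_1 \\
\Rightarrow\quad \mathit{pts}(p_1) &= \mathit{pts}(p_2) = \{a\}
\end{aligned}
```

Since pts(p2) = {a}, the store *p2 = 1 may write to a.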
Andersen’s Points-to Analysis: Examples
Example 2:
  p3 = &p1
  p1 = &a
  …
  q = p3
  r = *q
  p1 = &b
Andersen’s Points-to Analysis: Quiz 2 Example
  a = &x;
  p = &a;
  if (…) {
    q = &b;
    *p = q;
  } else {
    q = &c;
    *p = q;
  }
PTA
- Widely referred to as Andersen’s points-to analysis for Java
- Improves on 0-CFA by storing information about objects, not classes
    A a1 = new A(); // o1
    A a2 = new A(); // o2
  - Both objects have class A, so 0-CFA cannot tell them apart; PTA tracks the allocation sites o1 and o2 separately
PTA
R is the set of reachable methods
Pt(v) is the set of objects that v may point to
Pt(o.f) is the set of objects that field f of object o may point to
1. $\{\mathit{main}\} \subseteq R$
2. for each method $m \in R$ and each new site i: x = new C in m:
   $\{o_i\} \subseteq Pt(x)$ // instead of C, we have $o_i$
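PTA
class_of(o) returns the class of object o
3. for each method $m \in R$, each virtual call x = y.n(z) in m, each object $o_i$ in $Pt(y)$, and n', where n' = resolve(class_of($o_i$), n):
   $\{n'\} \subseteq R$
   $\{o_i\} \subseteq Pt(\mathit{this})$
   $Pt(z) \cap \mathit{SubTypes}(\mathit{StaticType}(p)) \subseteq Pt(p)$
   $Pt(\mathit{ret}) \cap \mathit{SubTypes}(\mathit{StaticType}(x)) \subseteq Pt(x)$
   (this is the implicit parameter of n', p is the parameter of n', and ret is the return of n'; intersecting a set of objects with SubTypes(T) keeps the objects whose class is in SubTypes(T))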
PTA
4. for each method $m \in R$, each field read x = y.f in m, and each object $o \in Pt(y)$:
   $Pt(o.f) \cap \mathit{SubTypes}(\mathit{StaticType}(x)) \subseteq Pt(x)$
5. for each method $m \in R$, each field write x.f = y in m, and each object $o \in Pt(x)$:
   $Pt(y) \cap \mathit{SubTypes}(\mathit{StaticType}(f)) \subseteq Pt(o.f)$
PTA
6. for each method $m \in R$, each assignment statement x = y in m:
   $Pt(y) \cap \mathit{SubTypes}(\mathit{StaticType}(x)) \subseteq Pt(x)$
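The PTA rules differ from 0-CFA mainly in what the sets hold: allocation sites $o_i$ instead of classes, with fields tracked per object as Pt(o.f). Below is a minimal, hypothetical Java sketch (not Soot's API) under toy assumptions: globally unique variable names, at most one parameter per method, and round-robin iteration instead of a worklist. The factory names `alloc`, `assign`, `load`, `store`, and `call` are illustrative only. The demo encodes the 0-CFA vs. PTA example with an assumed hierarchy in which B and C override m().

```java
import java.util.*;

// Minimal PTA sketch over a hypothetical toy IR (not Soot's API): Pt(v) holds allocation
// sites o_i and fields are tracked per object as Pt(o.f). Assumes globally unique variable
// names, at most one parameter per method, and round-robin iteration instead of a worklist.
class PtaSketch {
    static final class Stmt {  // kinds: new, assign, load, store, call; unused slots are null
        final String kind, x, y, name, z, site;
        Stmt(String kind, String x, String y, String name, String z, String site) {
            this.kind = kind; this.x = x; this.y = y; this.name = name; this.z = z; this.site = site;
        }
    }
    static Stmt alloc(String site, String x, String c) { return new Stmt("new", x, null, c, null, site); }   // site: x = new c
    static Stmt assign(String x, String y) { return new Stmt("assign", x, y, null, null, null); }            // x = y
    static Stmt load(String x, String y, String f) { return new Stmt("load", x, y, f, null, null); }         // x = y.f
    static Stmt store(String x, String f, String y) { return new Stmt("store", x, y, f, null, null); }       // x.f = y
    static Stmt call(String x, String y, String n, String z) { return new Stmt("call", x, y, n, z, null); }  // x = y.n(z)
    final Map<String, String> superOf = new HashMap<>(), classOf = new HashMap<>();  // classOf: class_of(o)
    final Map<String, Set<String>> declares = new HashMap<>();
    final Map<String, List<Stmt>> bodies = new HashMap<>();
    final Map<String, String> varType = new HashMap<>(), fieldType = new HashMap<>();
    final Map<String, String> thisVar = new HashMap<>(), paramVar = new HashMap<>(), retVar = new HashMap<>();
    final Map<String, Set<String>> ptMap = new HashMap<>();     // keys "v" for Pt(v), "o.f" for Pt(o.f)
    final Map<String, Set<String>> targets = new TreeMap<>();   // call site "y.n" -> resolved methods
    final Set<String> reachable = new TreeSet<>();              // R
    void addClass(String c, String sup, String... ms) { superOf.put(c, sup); declares.put(c, new HashSet<>(Arrays.asList(ms))); }
    Set<String> pt(String key) { return ptMap.computeIfAbsent(key, k -> new TreeSet<>()); }
    boolean isSubtype(String c, String a) {
        for (String s = c; s != null; s = superOf.get(s)) if (s.equals(a)) return true;
        return false;
    }
    String resolve(String c, String n) {
        for (String s = c; s != null; s = superOf.get(s)) if (declares.get(s).contains(n)) return s + "." + n;
        return null;
    }
    // to ⊇ { o in from | class_of(o) ∈ SubTypes(type) }; no recorded type means nothing flows.
    boolean flow(Set<String> from, String type, Set<String> to) {
        boolean changed = false;
        if (type != null)
            for (String o : new ArrayList<>(from)) if (isSubtype(classOf.get(o), type)) changed |= to.add(o);
        return changed;
    }
    boolean apply(Stmt s) {
        boolean changed = false;
        switch (s.kind) {
            case "new":                                                                 // rule 2
                classOf.put(s.site, s.name);
                return pt(s.x).add(s.site);
            case "load":                                                                // rule 4
                for (String o : new ArrayList<>(pt(s.y)))
                    changed |= flow(pt(o + "." + s.name), varType.get(s.x), pt(s.x));
                return changed;
            case "store":                                                               // rule 5
                for (String o : new ArrayList<>(pt(s.x)))
                    changed |= flow(pt(s.y), fieldType.get(s.name), pt(o + "." + s.name));
                return changed;
            case "assign":                                                              // rule 6
                return flow(pt(s.y), varType.get(s.x), pt(s.x));
            case "call":                                                                // rule 3
                for (String o : new ArrayList<>(pt(s.y))) {
                    String t = resolve(classOf.get(o), s.name);
                    if (t == null) continue;
                    changed |= reachable.add(t);
                    targets.computeIfAbsent(s.y + "." + s.name, k -> new TreeSet<>()).add(t);
                    if (thisVar.containsKey(t)) changed |= pt(thisVar.get(t)).add(o);
                    String p = paramVar.get(t), ret = retVar.get(t);
                    if (s.z != null && p != null) changed |= flow(pt(s.z), varType.get(p), pt(p));
                    if (s.x != null && ret != null) changed |= flow(pt(ret), varType.get(s.x), pt(s.x));
                }
                return changed;
            default: throw new IllegalArgumentException(s.kind);
        }
    }
    void solve(String main) {
        reachable.add(main);                                                            // rule 1
        for (boolean changed = true; changed; ) {
            changed = false;
            for (String m : new ArrayList<>(reachable))
                for (Stmt s : bodies.getOrDefault(m, List.of())) changed |= apply(s);
        }
    }
    // Hypothetical encoding of the 0-CFA vs. PTA example; B and C are assumed to override m().
    static PtaSketch demo() {
        PtaSketch a = new PtaSketch();
        a.addClass("A", null, "m", "main"); a.addClass("B", "A", "m");
        a.addClass("C", "A", "m"); a.addClass("X", null);
        a.varType.put("a2", "A"); a.varType.put("a4", "A"); a.fieldType.put("f", "A");
        a.bodies.put("A.main", List.of(
            alloc("o1", "x1", "X"), alloc("o2", "a1", "B"), store("x1", "f", "a1"),
            load("a2", "x1", "f"), call(null, "a2", "m", null),
            alloc("o3", "x2", "X"), alloc("o4", "a3", "C"), store("x2", "f", "a3"),
            load("a4", "x2", "f"), call(null, "a4", "m", null)));
        a.solve("A.main");
        return a;
    }
    public static void main(String[] args) {
        PtaSketch a = demo();
        System.out.println(a.pt("a2") + " " + a.pt("a4") + " " + a.targets); // [o2] [o4] {a2.m=[B.m], a4.m=[C.m]}
    }
}
```

Unlike the single field set of 0-CFA, Pt(o1.f) and Pt(o3.f) stay separate here, so each call site gets exactly one target.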
Example: 0-CFA vs. PTA
  public class A {
    public static void main() {
      X x1 = new X();  // o1
      A a1 = new B();  // o2
      x1.f = a1;       // o1.f points to o2
      A a2 = x1.f;     // a2 points to o2
      a2.m();
      X x2 = new X();  // o3
      A a3 = new C();  // o4
      x2.f = a3;       // o3.f points to o4
      A a4 = x2.f;     // a4 points to o4
      a4.m();
    }
  }
[Figure: class hierarchy of A, B, C, D, E, G and their m() methods]
Under 0-CFA there is a single set $S_f$ for field f, which receives both B and C, so a2 and a4 each get {B, C}; PTA keeps o1.f and o3.f apart.
The Big Picture
- All fit into our monotone dataflow framework!
- Flow-insensitive, context-insensitive
  - Least solution of $S = f_j(S) \vee S$, for every statement j
- Algorithms differ mainly in the “size” of S
  - RTA: only 2 kinds of statements; Lattice?
  - XTA: expands to all statements; Lattice?
  - 0-CFA: all statements; Lattice?
  - PTA (Points-to analysis): all statements; lattice elements are points-to graphs
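The Big Picture
[Figure: the shape of S for each analysis]
- RTA: one set I over types A, B, C, D, …
- XTA: sets $S_{m_1}, S_{m_2}, \ldots, S_{m_k}, S_{f_1}, \ldots, S_{f_k}$, each over types A, B, C, D, …
- 0-CFA: one set per variable $v_1, v_2, \ldots, v_n$, each over types A, B, C, D, …
- PTA: one set per variable $v_1, v_2, \ldots, v_n$, each over objects o1: A, o2: A, o3: B, o4: B, o5: C, o6: D, …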
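The “least solution of $S = f_j(S) \vee S$” can be illustrated with a generic Java sketch. It assumes the lattice is a powerset ordered by ⊆ with join as set union, which fits the type sets of RTA, XTA, and 0-CFA, and fits PTA if S is viewed as a set of points-to edges. Starting from the empty set, it applies every transfer function and joins the result into S until nothing changes; this terminates when the functions are monotone and the set of possible facts is finite. The name `leastSolution` and the toy transfer functions are illustrative only.

```java
import java.util.*;
import java.util.function.Function;

// Sketch: least solution of S = f_j(S) ∨ S (for every j) over a powerset lattice,
// where ∨ is set union. Terminates if each f_j is monotone and the fact universe is finite.
class FlowInsensitiveFixpoint {
    static <T> Set<T> leastSolution(List<Function<Set<T>, Set<T>>> transfer) {
        Set<T> s = new HashSet<>();                  // start at bottom: the empty set
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Function<Set<T>, Set<T>> f : transfer)
                changed |= s.addAll(f.apply(new HashSet<>(s))); // S := f_j(S) ∨ S
        }
        return s;
    }

    // Toy transfer functions: f1 adds the constant fact 0; f2 adds x + 1 for each x < 3 in S.
    static Set<Integer> demo() {
        List<Function<Set<Integer>, Set<Integer>>> fs = List.of(
            s -> Set.of(0),
            s -> {
                Set<Integer> out = new HashSet<>();
                for (int x : s) if (x < 3) out.add(x + 1);
                return out;
            });
        return leastSolution(fs);
    }

    public static void main(String[] args) {
        System.out.println(new TreeSet<>(demo())); // [0, 1, 2, 3]
    }
}
```

In the analyses above, a `new` site plays the role of a constant function like f1, while call, field, and assignment rules behave like f2, deriving new facts from facts already in S.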