Program Analysis via Graph Reachability. Thomas Reps, University of Wisconsin. http://www.cs.wisc.edu/~reps/ See http://www.cs.wisc.edu/wpis/papers/tr1386.ps
PLDI 00 Registration Form • PLDI 00: .......... $ ____ • Tutorial (morning): .......... $ ____ • Tutorial (afternoon): .......... $ ____ • Tutorial (evening): .......... $ – 0 –
[Figure: timeline 1987, 1993, 1994, 1995, 1996, 1997, 1998, with topics Slicing & Applications, Dataflow Analysis, Demand Algorithms, CFL Reachability, Structure-Transmitted Dependences, Set Constraints]
Applications • Program optimization • Software engineering – Program understanding – Reengineering – Static bug-finding • Security (information flow)
Collaborators • Susan Horwitz • Mooly Sagiv • Genevieve Rosay • David Melski • David Binkley • Michael Benedikt • Patrice Godefroid
Themes • Harnessing CFL-reachability • Exhaustive alg. Demand alg.
Backward Slice
    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
Backward slice with respect to printf("%d\n", i)
Slice Extraction
    int main() {
        int i = 1;
        while (i < 11) {
            i = i + 1;
        }
        printf("%d\n", i);
    }
Backward slice with respect to printf("%d\n", i)
Forward Slice
    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
Forward slice with respect to sum = 0
What Are Slices Useful For? • Understanding Programs – What is affected by what? • Restructuring Programs – Isolation of separate “computational threads” • Program Specialization and Reuse – Slices = specialized programs – Only reuse needed slices • Program Differencing – Compare slices to identify changes • Testing – What new test cases would improve coverage? – What regression tests must be rerun after a change?
Line-Character-Count Program
    void line_char_count(FILE *f) {
        int lines = 0;
        int chars;
        BOOL eof_flag = FALSE;
        int n;
        extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
        scan_line(f, &eof_flag, &n);
        chars = n;
        while (eof_flag == FALSE) {
            lines = lines + 1;
            scan_line(f, &eof_flag, &n);
            chars = chars + n;
        }
        printf("lines = %d\n", lines);
        printf("chars = %d\n", chars);
    }
Character-Count Program
    void char_count(FILE *f) {
        int lines = 0;
        int chars;
        BOOL eof_flag = FALSE;
        int n;
        extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
        scan_line(f, &eof_flag, &n);
        chars = n;
        while (eof_flag == FALSE) {
            lines = lines + 1;
            scan_line(f, &eof_flag, &n);
            chars = chars + n;
        }
        printf("lines = %d\n", lines);
        printf("chars = %d\n", chars);
    }
Line-Count Program
    void line_count(FILE *f) {
        int lines = 0;
        int chars;
        BOOL eof_flag = FALSE;
        int n;
        extern void scan_line2(FILE *f, BOOL *bptr, int *iptr);
        scan_line2(f, &eof_flag, &n);
        chars = n;
        while (eof_flag == FALSE) {
            lines = lines + 1;
            scan_line2(f, &eof_flag, &n);
            chars = chars + n;
        }
        printf("lines = %d\n", lines);
        printf("chars = %d\n", chars);
    }
Specialization Via Slicing wc -lc wc -l
Control Flow Graph
    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
[Figure: control-flow graph with nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i); the while node has outgoing T and F branches]
Control Dependence Graph
    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
Control dependence p → q labeled T: q is reached from p if condition p is true (T), not otherwise. Similar for false (F).
[Figure: control-dependence edges, all labeled T, over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]
Flow Dependence Graph
    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
Flow dependence p → q: the value of a variable assigned at p may be used at q.
[Figure: flow-dependence edges over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]
Program Dependence Graph (PDG)
    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
[Figure: PDG combining control-dependence edges (labeled T) and flow-dependence edges over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]
Program Dependence Graph (PDG)
    int main() {
        int i = 1;
        int sum = 0;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
The two initializations appear in the opposite order, yet the PDG is the same.
[Figure: the same PDG over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]
Backward Slice (animation frames 1-4)
    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
[Figure: PDG over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i); the highlighting that distinguished the four frames is not recoverable]
Slice Extraction
    int main() {
        int i = 1;
        while (i < 11) {
            i = i + 1;
        }
        printf("%d\n", i);
    }
[Figure: PDG of the extracted slice, with nodes Enter, i = 1, while(i < 11), i = i + 1, printf(i)]
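The slides treat an intraprocedural backward slice as plain backward reachability in the PDG. The following is a minimal sketch of that idea, not the author's implementation. The node names, the `DepGraph` type, and the helper functions are hypothetical, and the edge list was derived by hand from the program text, since the edges drawn on the slide did not survive extraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16
#define MAX_EDGES 64

/* Statement nodes of the sum program, in source order (hypothetical names). */
enum { ENTER, SUM_INIT, I_INIT, WHILE, SUM_ADD, I_INC, PRINT_SUM, PRINT_I, N_STMTS };

/* Dependence graph as a flat edge list; control and flow edges are treated alike. */
typedef struct {
    int n_nodes;
    int n_edges;
    int src[MAX_EDGES];
    int dst[MAX_EDGES];
} DepGraph;

void add_dep(DepGraph *g, int from, int to) {
    assert(g->n_edges < MAX_EDGES);
    assert(from >= 0 && from < g->n_nodes && to >= 0 && to < g->n_nodes);
    g->src[g->n_edges] = from;
    g->dst[g->n_edges] = to;
    g->n_edges++;
}

/* Hand-derived PDG edges for the sum program (an assumption, not read off the slide). */
DepGraph sum_program_pdg(void) {
    static const int edges[][2] = {
        /* control dependences */
        {ENTER, SUM_INIT}, {ENTER, I_INIT}, {ENTER, WHILE},
        {ENTER, PRINT_SUM}, {ENTER, PRINT_I},
        {WHILE, WHILE}, {WHILE, SUM_ADD}, {WHILE, I_INC},
        /* flow dependences on sum */
        {SUM_INIT, SUM_ADD}, {SUM_INIT, PRINT_SUM},
        {SUM_ADD, SUM_ADD}, {SUM_ADD, PRINT_SUM},
        /* flow dependences on i */
        {I_INIT, WHILE}, {I_INIT, SUM_ADD}, {I_INIT, I_INC}, {I_INIT, PRINT_I},
        {I_INC, WHILE}, {I_INC, SUM_ADD}, {I_INC, I_INC}, {I_INC, PRINT_I},
    };
    DepGraph g = { .n_nodes = N_STMTS, .n_edges = 0 };
    for (size_t k = 0; k < sizeof edges / sizeof edges[0]; k++)
        add_dep(&g, edges[k][0], edges[k][1]);
    return g;
}

/* Sets in_slice[v] for every node v that can reach `criterion` along dependence edges. */
void backward_slice(const DepGraph *g, int criterion, bool in_slice[MAX_NODES]) {
    int worklist[MAX_NODES];   /* each node is pushed at most once */
    int top = 0;
    for (int v = 0; v < MAX_NODES; v++) in_slice[v] = false;
    in_slice[criterion] = true;
    worklist[top++] = criterion;
    while (top > 0) {
        int v = worklist[--top];
        for (int e = 0; e < g->n_edges; e++) {
            int u = g->src[e];
            if (g->dst[e] == v && !in_slice[u]) {
                in_slice[u] = true;
                worklist[top++] = u;
            }
        }
    }
}
```

Calling `backward_slice(&g, PRINT_I, in_slice)` on `sum_program_pdg()` marks Enter, i = 1, while(i < 11), i = i + 1 and printf(i), which are the statements kept on the Slice Extraction slide. The edge-list scan keeps the sketch short; adjacency lists would give the linear-time behaviour one would want in practice.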
CodeSurfer
Browsing a Dependence Graph • Pretend this is your favorite browser • What does clicking on a link do? – You get a new page – Or you move to an internal tag
Interprocedural Slice
    int add(int x, int y) {
        return x + y;
    }

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
Backward slice with respect to printf("%d\n", i)
Interprocedural Slice
    int add(int x, int y) {
        return x + y;
    }

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
Superfluous components are included by Weiser’s slicing algorithm [TSE 84]; they are left out by the algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90].
System Dependence Graph (SDG) Enter main Call p Enter p
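SDG for the Sum Program [Figure: system dependence graph. Enter main: sum = 0, i = 1, while(i < 11), printf(sum), printf(i). Call add, with parameter nodes xin = sum, yin = i, sum = xout for one call and xin = i, yin = 1, i = xout for the other. Enter add: x = xin, y = yin, x = x + y, xout = x]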
Interprocedural Backward Slice (1)-(5) [Figure, animation frames 1-5: SDG schematic with nodes Enter main, Call p, Enter p]
Interprocedural Backward Slice (6) Enter main Call p ) ( [ ] Enter p
Matched-Parenthesis Path ( ) [ )
Interprocedural Backward Slice (6) Enter main Call p Enter p
Interprocedural Backward Slice (7) Enter main Call p Enter p
Slice Extraction Enter main Call p Enter p
Slice of the Sum Program [Figure: sliced system dependence graph. Enter main: i = 1, while(i < 11), printf(i). Call add, with parameter nodes xin = i, yin = 1, i = xout. Enter add: x = xin, y = yin, x = x + y, xout = x]
CFL-Reachability [Yannakakis 90] • G: Graph (N nodes, E edges) • L: A context-free language • L-path from s to t iff the edge labels along the path from s to t spell a word in L • Running time: O(N^3)
Interprocedural Slicing via CFL-Reachability • Graph: System dependence graph • L: L(matched) [roughly] • Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n
Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94] • CFL-reachability – System dependence graph: N nodes, E edges – Running time: O(N^3) • System dependence graph: special structure – Running time: O(E + CallSites × MaxParams^3)
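Ordinary Graph Reachability vs. CFL-Reachability [Figure: graph from s to t with edges labeled e, (, ), [ and ]; a grammar for matched whose alternatives include ( matched ), [ matched ] and e selects the paths with properly matched parentheses and brackets]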
Regular-Language Reachability [Yannakakis 90] • G: Graph (N nodes, E edges) • L: A regular language • L-path from s to t iff the edge labels along the path from s to t spell a word in L • Running time: O(N+E) vs. O(N^3) • Ordinary reachability (= transitive closure) – Label each edge with e – L is e*
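CFL-Reachability via Dynamic Programming • Grammar: A → B C • Graph: a B-edge followed by a C-edge induces an A-edge from the source of the B-edge to the target of the C-edge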
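The rule "a B-edge followed by a C-edge induces an A-edge" can be turned into a worklist algorithm. Below is a minimal sketch for a fixed matched-parenthesis grammar over edge labels e, (, ), [ and ], assuming a small dense graph. The type `Cfl`, the functions, and the helper nonterminals `SYM_LP_M` and `SYM_LB_M` (introduced to keep every production at most binary) are hypothetical names, not taken from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_N 16

enum Sym { SYM_E, SYM_LP, SYM_RP, SYM_LB, SYM_RB,   /* terminals: e ( ) [ ] */
           SYM_M, SYM_LP_M, SYM_LB_M,               /* matched, "( matched", "[ matched" */
           N_SYMS };

typedef struct { int lhs, rhs1, rhs2; } Rule;       /* rhs2 < 0 marks a unary rule */

static const Rule RULES[] = {
    { SYM_M,    SYM_E,    -1     },   /* matched -> e                      */
    { SYM_M,    SYM_M,    SYM_M  },   /* matched -> matched matched        */
    { SYM_LP_M, SYM_LP,   SYM_M  },   /* helper  -> ( matched              */
    { SYM_M,    SYM_LP_M, SYM_RP },   /* matched -> ( matched )            */
    { SYM_LB_M, SYM_LB,   SYM_M  },   /* helper  -> [ matched              */
    { SYM_M,    SYM_LB_M, SYM_RB },   /* matched -> [ matched ]            */
};
#define N_RULES ((int)(sizeof RULES / sizeof RULES[0]))

typedef struct {
    int n;                                  /* number of graph nodes */
    bool has[N_SYMS][MAX_N][MAX_N];         /* has[A][u][v]: an A-edge u -> v exists */
    int wl[N_SYMS * MAX_N * MAX_N][3];      /* queue; every edge is enqueued once */
    int wl_len;
} Cfl;

void cfl_init(Cfl *c, int n) {
    assert(n > 0 && n <= MAX_N);
    memset(c, 0, sizeof *c);
    c->n = n;
}

/* Adds the labeled edge u -> v if it is new and schedules it for processing. */
void cfl_add(Cfl *c, int sym, int u, int v) {
    assert(sym >= 0 && sym < N_SYMS && u >= 0 && u < c->n && v >= 0 && v < c->n);
    if (c->has[sym][u][v]) return;
    c->has[sym][u][v] = true;
    c->wl[c->wl_len][0] = sym;
    c->wl[c->wl_len][1] = u;
    c->wl[c->wl_len][2] = v;
    c->wl_len++;
}

/* Closes the edge set under the grammar rules. */
void cfl_solve(Cfl *c) {
    for (int v = 0; v < c->n; v++)
        cfl_add(c, SYM_M, v, v);                     /* matched -> epsilon */
    for (int head = 0; head < c->wl_len; head++) {
        int b = c->wl[head][0], u = c->wl[head][1], v = c->wl[head][2];
        for (int r = 0; r < N_RULES; r++) {
            const Rule *rule = &RULES[r];
            if (rule->rhs2 < 0) {
                if (rule->rhs1 == b) cfl_add(c, rule->lhs, u, v);
                continue;
            }
            if (rule->rhs1 == b)                     /* b: u -> v, then rhs2: v -> w */
                for (int w = 0; w < c->n; w++)
                    if (c->has[rule->rhs2][v][w]) cfl_add(c, rule->lhs, u, w);
            if (rule->rhs2 == b)                     /* rhs1: w -> u, then b: u -> v */
                for (int w = 0; w < c->n; w++)
                    if (c->has[rule->rhs1][w][u]) cfl_add(c, rule->lhs, w, v);
        }
    }
}

bool cfl_matched(const Cfl *c, int s, int t) { return c->has[SYM_M][s][t]; }
```

A caller adds the labeled graph edges with `cfl_add`, runs `cfl_solve`, and then queries `cfl_matched(&c, s, t)`. With a fixed grammar each of the O(N^2) possible edges per symbol is processed once and scans N neighbours, which is consistent with the cubic bound quoted on the slides.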
Degenerate Case: CFL-Recognition • Grammar: exp → id | exp + exp | exp * exp | ( exp ) • Is “(a + b) * c” ∈ L(exp)? [Figure: linear graph from s to t whose successive edges are labeled ( a + b ) * c]
Degenerate Case: CFL-Recognition • Grammar: exp → id | exp + exp | exp * exp | ( exp ) • Is “a + b) * c +” ∈ L(exp)? [Figure: linear graph from s to t whose successive edges are labeled a + b ) * c +]
Program Chopping Given source S and target T, what program points transmit effects from S to T? S T Intersect forward slice from S with backward slice from T, right?
Non-Transitivity and Slicing
    int add(int x, int y) {
        return x + y;
    }

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
Successive frames highlight, on this same program: the forward slice with respect to sum = 0; the backward slice with respect to printf("%d\n", i); both slices together; and the chop with respect to sum = 0 and printf("%d\n", i).
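Non-Transitivity and Slicing [Figure: system dependence graph of the sum program. Enter main: sum = 0, i = 1, while(i < 11), printf(sum), printf(i). Call add, with parameter nodes xin = sum, yin = i, sum = xout and xin = i, yin = 1, i = xout. Enter add: x = xin, y = yin, x = x + y, xout = x. One call edge is labeled ( and one return edge is labeled ], so the path through add is not matched]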
Program Chopping Given source S and target T, what program points transmit effects from S to T? S T “Precise interprocedural chopping” [Reps & Rosay FSE 95]
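For contrast with the precise interprocedural algorithm cited above, here is a sketch of the naive chop: intersect the forward slice from S with the backward slice from T. It is only adequate for a single-procedure dependence graph, where every path is legitimate; it does not filter out mismatched call/return paths, which is the problem the preceding slides illustrate. The `DepGraph` type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16
#define MAX_EDGES 64

/* Single-procedure dependence graph as a flat edge list. */
typedef struct {
    int n_nodes;
    int n_edges;
    int src[MAX_EDGES];
    int dst[MAX_EDGES];
} DepGraph;

void add_dep(DepGraph *g, int from, int to) {
    assert(g->n_edges < MAX_EDGES);
    assert(from >= 0 && from < g->n_nodes && to >= 0 && to < g->n_nodes);
    g->src[g->n_edges] = from;
    g->dst[g->n_edges] = to;
    g->n_edges++;
}

/* Marks nodes reachable from `start`; forward follows src -> dst, otherwise dst -> src. */
void reach(const DepGraph *g, int start, bool forward, bool seen[MAX_NODES]) {
    int worklist[MAX_NODES];   /* each node is pushed at most once */
    int top = 0;
    for (int v = 0; v < MAX_NODES; v++) seen[v] = false;
    seen[start] = true;
    worklist[top++] = start;
    while (top > 0) {
        int v = worklist[--top];
        for (int e = 0; e < g->n_edges; e++) {
            int from = forward ? g->src[e] : g->dst[e];
            int to   = forward ? g->dst[e] : g->src[e];
            if (from == v && !seen[to]) {
                seen[to] = true;
                worklist[top++] = to;
            }
        }
    }
}

/* Naive chop: nodes on some dependence path from `source` to `target`. */
void chop(const DepGraph *g, int source, int target, bool out[MAX_NODES]) {
    bool fwd[MAX_NODES], bwd[MAX_NODES];
    reach(g, source, true, fwd);    /* forward slice from the source */
    reach(g, target, false, bwd);   /* backward slice from the target */
    for (int v = 0; v < MAX_NODES; v++)
        out[v] = fwd[v] && bwd[v];
}
```

On a graph with edges 0→1, 1→2, 2→3, 1→4 and 5→2, `chop(&g, 0, 3, out)` keeps nodes 0 to 3 and drops node 4 (affected by the source but irrelevant to the target) and node 5 (relevant to the target but unaffected by the source).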
Dataflow Analysis • Goal: For each point in the program, determine a superset of the “facts” that could possibly hold during execution • Examples – Constant propagation – Reaching definitions – Live variables – Possibly uninitialized variables
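Possibly Uninitialized Variables [Figure: control-flow graph with nodes Start, x = 3, if ..., y = x, y = w, w = 8, printf(y); each edge is annotated with the set of possibly uninitialized variables, for example {w, x, y} after Start and {w, y} after x = 3]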
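A small iterative solver makes the possibly-uninitialized-variables problem concrete. The sketch below is an assumption-laden illustration: the `Cfg` and `Stmt` types, the function names, and the six-node example graph are hypothetical, and the example only borrows statement shapes (x = 3, y = x, y = w, printf(y)) from the figure, whose exact edges could not be recovered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16
#define MAX_EDGES 32

typedef unsigned VarSet;             /* bit k set: variable k is possibly uninitialized */
enum { VAR_W, VAR_X, VAR_Y };
#define BIT(v) (1u << (v))

typedef struct {
    int def;                         /* index of the assigned variable, or -1 */
    VarSet uses;                     /* variables read by the statement */
} Stmt;

typedef struct {
    int n_nodes, n_edges;
    Stmt stmt[MAX_NODES];
    int src[MAX_EDGES], dst[MAX_EDGES];
} Cfg;

/* The assigned variable stays possibly uninitialized only if some operand is. */
VarSet transfer(const Stmt *s, VarSet in) {
    if (s->def < 0) return in;
    VarSet bit = BIT(s->def);
    VarSet out = in & ~bit;
    if (in & s->uses) out |= bit;
    return out;
}

/* Computes in[n], the facts on entry to each node; node 0 is the start node. */
void possibly_uninit(const Cfg *g, VarSet all_vars, VarSet in[MAX_NODES]) {
    for (int v = 0; v < MAX_NODES; v++) in[v] = 0;
    in[0] = all_vars;                /* nothing is initialized at start */
    bool changed = true;
    while (changed) {                /* sets only grow, so this terminates */
        changed = false;
        for (int e = 0; e < g->n_edges; e++) {
            VarSet out = transfer(&g->stmt[g->src[e]], in[g->src[e]]);
            VarSet merged = in[g->dst[e]] | out;     /* union at join points */
            if (merged != in[g->dst[e]]) {
                in[g->dst[e]] = merged;
                changed = true;
            }
        }
    }
}

/* Hypothetical example: start; x = 3; if ... then y = x else y = w; printf(y). */
Cfg example_cfg(void) {
    static const int edges[][2] = { {0, 1}, {1, 2}, {2, 3}, {2, 4}, {3, 5}, {4, 5} };
    Cfg g = { .n_nodes = 6, .n_edges = 0 };
    g.stmt[0] = (Stmt){ -1, 0 };                 /* start     */
    g.stmt[1] = (Stmt){ VAR_X, 0 };              /* x = 3     */
    g.stmt[2] = (Stmt){ -1, 0 };                 /* if ...    */
    g.stmt[3] = (Stmt){ VAR_Y, BIT(VAR_X) };     /* y = x     */
    g.stmt[4] = (Stmt){ VAR_Y, BIT(VAR_W) };     /* y = w     */
    g.stmt[5] = (Stmt){ -1, BIT(VAR_Y) };        /* printf(y) */
    for (size_t k = 0; k < sizeof edges / sizeof edges[0]; k++) {
        assert(g.n_edges < MAX_EDGES);
        g.src[g.n_edges] = edges[k][0];
        g.dst[g.n_edges] = edges[k][1];
        g.n_edges++;
    }
    return g;
}
```

In the example, the branch through y = x leaves only w possibly uninitialized, while the branch through y = w keeps y in the set, so the union at printf(y) is {w, y}. The result is a superset of the facts that can hold at run time, as the Dataflow Analysis slide requires.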
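Precise Intraprocedural Analysis. For a path p from start to n whose edges carry the dataflow functions f_1, ..., f_k, with initial value C at start: $pf_p = f_k \circ f_{k-1} \circ \cdots \circ f_2 \circ f_1$ and $\mathrm{MOP}[n] = \sqcap_{p \in \mathrm{PathsTo}[n]} pf_p(C)$, where $\sqcap$ is the meet (combining) operator.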
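[Figure: interprocedural control-flow graph. main: start main, x = 3, p(x, y), return from p, printf(y), exit main. p: start p(a, b), if ..., b = a, p(a, b), return from p, printf(b), exit p. Call and return edges carry parenthesis labels such as (, ) and ]]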
Precise Interprocedural Analysis [Sharir & Pnueli 81]. With initial value C at start and call/return edges labeled ( and ): $\mathrm{MOMP}[n] = \sqcap_{p \in \mathrm{MatchedPathsTo}[n]} pf_p(C)$
Representing Dataflow Functions Identity Function Constant Function a b c
Representing Dataflow Functions “Gen/Kill” Function Non-“Gen/Kill” Function a b c
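[Figure: exploded graph of the same program, tracking the facts x, y in main (start main, x = 3, p(x, y), return from p, printf(y), exit main) and a, b in p (start p(a, b), if ..., b = a, p(a, b), return from p, printf(b), exit p)]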
Composing Dataflow Functions f 2 f 1({a, c}) = a b c
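The graph representation of dataflow functions can be sketched as a relation over the facts plus one special "0" node, so that composing two functions becomes composing their relations. The code below is an illustration under assumptions: the `Rel` type, the function names, and the two example functions are hypothetical, since the function bodies drawn on the slides did not survive extraction.

```c
#include <assert.h>

#define N_NODES 4    /* node 0 is the special "0" node; nodes 1..3 stand for a, b, c */
enum { ZERO = 1 << 0, FACT_A = 1 << 1, FACT_B = 1 << 2, FACT_C = 1 << 3 };

/* row[x] is the bit set of nodes y such that the relation has an edge x -> y. */
typedef struct { unsigned row[N_NODES]; } Rel;

/* Identity function: every node maps to itself. */
Rel rel_identity(void) {
    Rel r;
    for (int x = 0; x < N_NODES; x++) r.row[x] = 1u << x;
    return r;
}

/* Relation for "apply first, then second": an edge of first followed by an edge of second. */
Rel rel_compose(Rel first, Rel second) {
    Rel out;
    for (int x = 0; x < N_NODES; x++) {
        out.row[x] = 0;
        for (int y = 0; y < N_NODES; y++)
            if (first.row[x] & (1u << y)) out.row[x] |= second.row[y];
    }
    return out;
}

/* Applies the represented function to a set of facts (the "0" node is always a source). */
unsigned rel_apply(Rel f, unsigned facts) {
    assert(facts < (1u << N_NODES));
    unsigned from = facts | ZERO;
    unsigned result = 0;
    for (int x = 0; x < N_NODES; x++)
        if (from & (1u << x)) result |= f.row[x];
    return result & ~(unsigned)ZERO;
}

/* Hypothetical gen/kill function f1(S) = (S - {a}) + {b}: 0 generates b, a is killed. */
Rel example_f1(void) {
    Rel r = {{ ZERO | FACT_B, 0, 0, FACT_C }};
    return r;
}

/* Hypothetical non-gen/kill function f2 for "b = a": b is in the result iff a was in S. */
Rel example_f2(void) {
    Rel r = {{ ZERO, FACT_A | FACT_B, 0, FACT_C }};
    return r;
}
```

With these two functions, `rel_apply(rel_compose(example_f1(), example_f2()), FACT_A | FACT_C)` yields `FACT_C`, the same as applying `example_f1` and then `example_f2` one after the other. Because composition is just path-following in the pasted-together graphs, whole-program analysis reduces to reachability in the exploded graph.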
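[Figure: exploded graph of the same program (facts x, y in main; a, b in p), with call and return edges labeled (, ) and ]. Two queries are posed at print statements: might y be uninitialized here? Might b be uninitialized here? The answers shown are NO! and YES!]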
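$\mathit{matched} \to \mathit{matched}\ \mathit{matched} \mid (_i\ \mathit{matched}\ )_i \mid \mathit{edge} \mid \varepsilon$, for $1 \le i \le \mathit{CallSites}$ [Figure: call stack; example strings ( )( ) and ( ( ( ) ) ); mismatched paths are marked “Off Limits!”]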
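$\mathit{unbalLeft} \to \mathit{matched}\ \mathit{unbalLeft} \mid (_i\ \mathit{unbalLeft} \mid \varepsilon$, for $1 \le i \le \mathit{CallSites}$ [Figure: call stack; example string ( ) ( ( ( ) ) ( ( ); mismatched paths are marked “Off Limits!”]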
Interprocedural Dataflow Analysis via CFL-Reachability • Graph: Exploded control-flow graph • L: L(unbalLeft) • Fact d holds at n iff there is an L(unbalLeft)-path from
Asymptotic Running Time [Reps, Horwitz, & Sagiv 95] • CFL-reachability – Exploded control-flow graph: ND nodes – Running time: O(N^3 D^3) • Exploded control-flow graph: special structure – Running time: O(ED^3) – Typically E ≈ N, hence O(ED^3) ≈ O(ND^3) – “Gen/kill” problems: O(ED)
Why Bother? “We’re only interested in million-line programs” • Know thy enemy! – “Any” algorithm must do these operations – Avoid pitfalls (e.g., claiming O(N^2) algorithm) • The essence of “context sensitivity” • Special cases – “Gen/kill” problems: O(ED) • Compression techniques – Basic blocks – SSA form, sparse evaluation graphs • Demand algorithms
Unifying Conceptual Model for Dataflow-Analysis Literature • Linear-time gen-kill [Hecht 76], [Kou 77] • Path-constrained DFA [Holley & Rosen 81] • Linear-time GMOD [Cooper & Kennedy 88] • Flow-sensitive MOD [Callahan 88] • Linear-time interprocedural gen-kill [Knoop & Steffen 93] • Linear-time bidirectional gen-kill [Dhamdhere 94] • Relationship to interprocedural DFA [Sharir & Pnueli 81], [Knoop & Steffen 92]
Themes • Harnessing CFL-reachability • Exhaustive alg. Demand alg.
Exhaustive Versus Demand Analysis • Exhaustive analysis: All facts at all points • Optimization: Concentrate on inner loops • Program-understanding tools: Only some facts are of interest
Exhaustive Versus Demand Analysis • Demand analysis: – Does a given fact hold at a given point? – Which facts hold at a given point? – At which points does a given fact hold? • Demand analysis via CFL-reachability – single-source/single-target CFL-reachability – single-source/multi-target CFL-reachability – multi-source/single-target CFL-reachability
“Semi-exhaustive”: All “appropriate” demands [Figure: exploded graph of the same program (facts x, y in main; a, b in p), with the queries: might y be uninitialized here? Might b be uninitialized here? The answers shown are NO! and YES!]
Experimental Results [Horwitz, Reps, & Sagiv 1995] • 53 C programs (200-6,700 lines) • For a single fact of interest: – demand always better than exhaustive • All “appropriate” demands beats exhaustive when percentage of “yes” answers is high – Live variables – Truly live variables – Constant predicates – ...
A Related Result [Sagiv, Reps, & Horwitz 1996] • [Uses a generalized analysis technique] • 38 C programs (300-6,000 lines) – copy-constant propagation – linear-constant propagation • All “appropriate” demands always beats exhaustive – factor of 1.14 to about 6
Exhaustive Versus Demand Analysis • Demand algorithms for – Interprocedural dataflow analysis – Set constraints – Points-to analysis
Most Significant Contributions: 1987-2000 • Asymptotically fastest algorithms – Interprocedural slicing – Interprocedural dataflow analysis • Demand algorithms – Interprocedural dataflow analysis [CC 94, FSE 95] – All “appropriate” demands beats exhaustive • Tool for slicing and browsing ANSI C – Slices programs as large as 75,000 lines – University research distribution – Commercial product: CodeSurfer (GrammaTech, Inc.)
References • Papers by Reps and collaborators: – http://www.cs.wisc.edu/~reps/ • CFL-reachability – Yannakakis, M., Graph-theoretic methods in database theory, PODS 90. – Reps, T., Program analysis via graph reachability, Inf. and Softw. Tech. 98.
References • Slicing, chopping, etc. – Horwitz, Reps, & Binkley, TOPLAS 90 – Reps, Horwitz, Sagiv, & Rosay, FSE 94 – Reps & Rosay, FSE 95 • Dataflow analysis – Reps, Horwitz, & Sagiv, POPL 95 – Horwitz, Reps, & Sagiv, FSE 95, TR-1283