Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis
Stephen Adams, Tom Ball, Manuvir Das (Microsoft Research)
Sorin Lerner, Mark Seigle (University of Washington)
Westley Weimer (UC Berkeley)
Motivation
• Static analysis for program verification
• Complex dataflow analyses are popular
  – SLAM, ESP, BLAST, CQual, …
  – Flow-sensitive
  – Interprocedural
  – Expensive!
• Cut down on “data flow facts”
• Without losing anything important
General Idea
• If complex analysis is worse than O(N)
• And you have a cheap analysis that
  – Is O(N)
  – Reduces N
• Then composing them saves time
Value Flow Graph (VFG)
• Variant of a points-to graph
• Encodes the flow of values in the program
• Conservative approximation
• Lightweight, fast to compute and query
• Early queries can safely reduce
  – data-flow facts considered
  – program points considered
• Like slicing a program wrt. value flow
Computing a VFG
• Use a subtyping-based pointer analysis
  – We used One-Level Flow [Das]
• Process all assignments
  – Not just those involving pointers
• Represent constant values explicitly
  – Put them in the graph
• Label graph with source locations
  – Encodes program slices
Example Points-To Graph
  1: int a, *x;
  2: x = &a;
  3: *x = 7;
[Figure: points-to graph for the program above, with a points-to edge from x to a. Legend: Points-to Edge, Source “Address” Node, Expr Node]
One Level Flow Graph
  1: int a, *x;
  2: x = &a;
  3: *x = 7;
[Figure: one-level flow graph for the program above, over nodes x and a. Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node]
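Value Flow Graph
  1: int a, *x;
  2: x = &a;
  3: *x = 7;
[Figure: value flow graph for the program above, over nodes x, a, and the constant 7, with edges labeled by source lines 2 and 3. Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node]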
VFG Properties
• Computed in almost-linear time
• Get points-to sets from VFG in linear time
  – Backwards reachability via flow edges
  – Gather up all variables
• Get value flow from VFG in linear time
  – Backwards reachability via flow edges
  – Follow points-to edges up one
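VFG Query: Points-To of x
  1: int a, *x;
  2: x = &a;
  3: *x = 7;
[Figure: the value flow graph for the program above, over nodes x, a, and the constant 7, with edges labeled by source lines 2 and 3. Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node]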
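VFG Query: Value Flow into a
  1: int a, *x;
  2: x = &a;
  3: *x = 7;
[Figure: the value flow graph for the program above, over nodes x, a, and the constant 7, with edges labeled by source lines 2 and 3. Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node]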
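The two queries above can be sketched in C as a backward reachability walk over flow edges. This is only an illustration under stated assumptions: the node names (NODE_X, NODE_ADDR_A, and so on), the edge table, and the helper functions are hypothetical, and the encoding of the three-line example is one plausible reading of the figure rather than the authors' data structure. It scans a flat edge list for simplicity, so it does not reach the linear-time bound stated on the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node set for:  1: int a, *x;  2: x = &a;  3: *x = 7; */
enum { NODE_X, NODE_A, NODE_ADDR_A, NODE_CONST_7, NODE_COUNT };

/* A flow edge says "a value may flow from `from` into `to`" at `line`. */
typedef struct { int from; int to; int line; } FlowEdge;

/* Assumed encoding: &a flows into x (line 2); 7 flows into a via *x (line 3). */
static const FlowEdge flow_edges[] = {
    { NODE_ADDR_A,  NODE_X, 2 },
    { NODE_CONST_7, NODE_A, 3 },
};
#define EDGE_COUNT (sizeof flow_edges / sizeof flow_edges[0])

/* Backward reachability: mark every node whose value may flow into target. */
static void flows_into(int target, bool reached[NODE_COUNT])
{
    int worklist[NODE_COUNT];   /* each node is pushed at most once */
    int top = 0;

    assert(target >= 0 && target < NODE_COUNT);
    for (int i = 0; i < NODE_COUNT; i++)
        reached[i] = false;
    reached[target] = true;
    worklist[top++] = target;

    while (top > 0) {
        int node = worklist[--top];
        for (size_t e = 0; e < EDGE_COUNT; e++) {
            int src = flow_edges[e].from;
            if (flow_edges[e].to == node && !reached[src]) {
                reached[src] = true;
                worklist[top++] = src;
            }
        }
    }
}

/* Points-to query: does the address of `a` reach `var` through flow edges? */
static bool may_point_to_a(int var)
{
    bool reached[NODE_COUNT];
    flows_into(var, reached);
    return reached[NODE_ADDR_A];
}

/* Value-flow query: can the constant 7 reach `var` through flow edges? */
static bool const7_may_flow_into(int var)
{
    bool reached[NODE_COUNT];
    flows_into(var, reached);
    return reached[NODE_CONST_7];
}
```

Under this encoding, may_point_to_a(NODE_X) holds and const7_may_flow_into(NODE_A) holds, mirroring the two query slides. A real implementation would index edges by destination node so that each query visits every edge at most once.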
VFG Summary
• Computed in almost-linear time
• Queries complete in linear time
• Approximates flow of values in program
• Show two applications that benefit
  – ESP
  – SLAM
Application 1: ESP
• Verification tool for large C++ programs
• Tracks “typestate” of values
  – Encoded as Finite State Machine
  – Special Error state
• Core: interprocedural data-flow engine
  – Flow sensitive: state at every point
• Performed bottom-up on call graph
• Requires function summaries
ESP Function Summaries
• Consider stateful memory locations
• Summarize function behavior for each loc
  – Reducing number of locs would be good!
  – But C has evil casts, so types cannot be used
• Worst case set of locations:
  – All globals and formal parameters
  – Everything transitively reachable from there
Reduce Location Set
• Location L needs to be considered in F if
  – Some exp E has its state changed in F
  – Value held by L at entry to F can flow into E
• Assuming state-changing ops are known
• Query VFG to find values that flow in
ESP Example
  FILE *e, *f, *g, *h;
  void foo() {
    FILE **p;
    int a = (int)h;
    if (…) p = &e;
    else p = &f;
    *p = fopen(…);
  }
Locations to consider
foo() summary: { e, *e, f, *f, g, *g, h, *h }
ESP Example
  FILE *e, *f, *g, *h;
  void foo() {
    FILE **p;
    int a = (int)h;
    if (…) p = &e;
    else p = &f;
    *p = fopen(…);
  }
(1) Compute VFG
(2) Query value flow on *p
(3) Reduced locations to consider
    foo() summary: { e, f }
(4) Reduce lines to consider for dataflow
ESP Results
• FILE * output in GCC
  – 140 KLOC, 2149 functions, 66 files, 1068 globals
• VFG Queries take 200 seconds
• Reduce average number of locations per function summary from 1100 to <1
  – Median of 15 for functions with >0
• Verification takes 15 minutes
  – Infeasible otherwise
Application 2: SLAM
• Validates temporal safety properties
  – Boolean abstraction
  – Interprocedural dataflow analysis
  – Counterexample-driven refinement
• Convert C program to Boolean program
• Exhaustive dataflow analysis
  – No errors? Program is safe.
  – Real error? Program has a bug.
  – False error? Add predicates, repeat.
Boolean Programs
C Program:
  int x, y;
  x = 5;
  y = 6;
  x = x * 2;
  y = y * 2;
  assert(x<y)
Predicates (important!):
  p means “x == 5”
  q means “x < y”
Boolean Program:
  bool p, q;
  p = 1;
  q = 1;
  p = 0;
  q = 1;
  assert(q)
SLAM Predicates
• Hard to come up with good predicates
• Counterexample-driven refinement
  – Picks good predicates
  – Is very slow
• Taking all possible predicates
  – Is even slower
• Want “all the useful” predicates
Speeding Up SLAM
• For a simple subset of C
  – Similar to “Copy Constants”
  – Use VFG to find a sufficient set of predicates
  – Provably sufficient for this subset
• If this set fails to prove the real program
  – Fall back on counterexample-driven refinement
A Simple Language
  s ::= vi = n               // constants
      | vi = vj              // variable copy
      | if (*) s1 else s2    // condition ignored
      | vi = fun(vj, …)      // function call
      | return(vi)           // function return
      | assert(vi » vj)      // safety property
Predicate Discovery
• High-level idea
  – Each flow edge in the VFG means “values may flow from X to Y”
  – Add predicates to see if they do
• For each assert(vi » vj)
  – Consider the chain of values flowing to vi, vj
  – Add an equality predicate for each link
  – Use constants to resolve scoping
SLAM Example
  int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
  }
  void main() {
    int a, b, c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
  }
[Figure: value flow graph for the program above, over nodes a, 1, f, r, 3, b, c, 2, 4]
Predicates For “b”
(same sel/main program as on the SLAM Example slide)
[Figure: the part of the value flow graph that reaches b, over nodes b, r, 3, f, a, 1]
Step 1, one equality predicate per link:
  b == r
  r == 3
  r == f
  f == a
  a == 1
Step 2, check scoping:
  b == r
  r == 3
  r == f
  f == a   // no scope!
  a == 1
Step 3, use constants to resolve scoping:
  b == r
  r == 3
  r == f
  f == a   // no scope!
  f == 1
  f == 3
  a == 1
  a == 3
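The first step of the discovery for “b” (one equality predicate per flow link) can be sketched in C as below. This is a minimal sketch under stated assumptions, not SLAM's implementation: the Flow table is a hand-written encoding of the sel/main example, and discover_predicates and has_predicate are hypothetical helper names. The scope-resolution step that replaces f == a with comparisons against constants is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One flow edge: the value of `src` may flow into `dst`. */
typedef struct { const char *dst; const char *src; } Flow;

/* Assumed flow edges for the sel/main example (hand-written, hypothetical). */
static const Flow flows[] = {
    { "a", "1" },   /* a = 1;              */
    { "f", "a" },   /* b = sel(a): a -> f  */
    { "r", "f" },   /* r = f;              */
    { "r", "3" },   /* r = 3;              */
    { "b", "r" },   /* return(r): r -> b   */
    { "c", "2" },   /* c = 2;              */
    { "c", "4" },   /* c = 4;              */
};
enum { FLOW_COUNT = sizeof flows / sizeof flows[0], PRED_LEN = 32 };

/* Walk flow edges backward from `var` and emit "dst == src" for each link.
   `preds` must have room for FLOW_COUNT entries. Returns the number emitted. */
static int discover_predicates(const char *var, char preds[][PRED_LEN])
{
    const char *worklist[FLOW_COUNT + 1];  /* one push per edge, plus start */
    bool used[FLOW_COUNT] = { false };     /* each edge is followed once    */
    int top = 0, count = 0;

    worklist[top++] = var;
    while (top > 0) {
        const char *name = worklist[--top];
        for (int e = 0; e < FLOW_COUNT; e++) {
            if (used[e] || strcmp(flows[e].dst, name) != 0)
                continue;
            used[e] = true;
            assert(count < FLOW_COUNT);
            snprintf(preds[count++], PRED_LEN, "%s == %s",
                     flows[e].dst, flows[e].src);
            worklist[top++] = flows[e].src;
        }
    }
    return count;
}

/* True if `text` is among the first `count` predicates. */
static bool has_predicate(char preds[][PRED_LEN], int count, const char *text)
{
    for (int i = 0; i < count; i++)
        if (strcmp(preds[i], text) == 0)
            return true;
    return false;
}
```

Calling discover_predicates("b", preds) with a buffer declared as char preds[FLOW_COUNT][PRED_LEN] yields the same five predicates as the first list for “b”, though possibly in a different order. Marking edges as used keeps the walk finite even if the flow graph contains cycles.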
Why does this work?
• Simple language
  – No arithmetic, etc.
  – Just copying around initial values
• Knowing final values of variables
  – Completely decides safety condition
• Still related to real life
  – Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.
Some SLAM Results
  Program    LOC    Original Runtime   Improved Runtime   Generated Predicates   Missing Predicates
  apmbatt    2207   229                22                 85                     0
  pnpmem     3849   1132               125                143                    4
  floppy     7562   1063               600                154                    33
  iscsiprt   4543   **                 729                146                    42
Generated predicates are between all and two-thirds of the necessary predicates. However, since SLAM must iterate once to generate 3-7 missing predicates, the net performance increase is more than linear.
Predicates can be specialized or simplified if the assert() condition is a common relational operator (e.g., x==y, x<y, x==5).
Conclusions
• Complex interprocedural analyses can benefit from inexpensive value-flow
• VFG encodes value flow
  – Constructed and queried quickly
• Prune the set of dataflow facts and program points considered
• Large net performance increase