Interprocedural Dataflow Analysis in the Presence of Large Libraries

- Slides: 17

Interprocedural Dataflow Analysis in the Presence of Large Libraries
Atanas (Nasko) Rountev and Scott Kagan, Ohio State University
Thomas Marlowe, Seton Hall University
PRESTO Research Group, Ohio State University

Uses of Interprocedural Dataflow Analysis
- Performance optimizations in compilers
- Software understanding and transformation
  - e.g. dependence analysis for program slicing, change impact analysis, refactoring, etc.
- Software testing
  - e.g. dataflow-based testing; testing of object interactions in OO software
- Software checking
  - e.g. object protocols: open (read|write)* close
10/21/2021 CC 2006, Scott Kagan, PRESTO Research Group

Model for Interprocedural Whole-Program Analysis
[Diagram: code for C1, code for C2, …, code for Cn -> Engine for Whole-Program Dataflow Analysis -> dataflow solution for C1 + C2 + … + Cn]
- Components C1, C2, …, Cn form a complete program
- Assumption: it is possible and desirable to analyze the source code of the entire program

A Specific Case: Main + Lib
[Diagram: code for Main, code for Lib -> Engine for Whole-Program Dataflow Analysis -> dataflow solution for Main + Lib]
- Main + Lib form a complete program
- What if we are using large libraries that need to be reanalyzed from scratch?
  - e.g. the standard Java libraries contain about 10,000 classes and 80,000 methods
  - they need to be reanalyzed with every new Main component

Example: Methods in Java Programs
[Figure not preserved in the transcript]

A Specific Case: Main + Lib
[Diagram: Summary Generation Analysis -> summary for Lib; code for Main, summary for Lib -> Engine for Whole-Program Dataflow Analysis -> dataflow solution for Main]
- Goal: the solution for Main should be as good as the solution that would have been computed by a whole-program analysis (no loss of precision)

Functional Approach to Whole-Program Analysis
- Sharir-Pnueli 1981
- Dataflow lattice $L$
- Edge function $f: L \to L$ for the effects of a statement
- Path function: $f = f_n \circ f_{n-1} \circ \dots \circ f_2 \circ f_1$
- Phase 1: summary functions $\phi_n: L \to L$
  - solution at node $n$ as a function of the solution at the entry of $n$'s procedure
- Phase 2: solutions at start nodes of procedures
- Phase 3: solutions at the remaining nodes
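
The edge and path functions above can be sketched for one concrete family of problems. The slides do not fix a lattice, so this sketch assumes a gen/kill problem over a powerset lattice; `GenKill`, `IDENTITY` and `path_function` are hypothetical names chosen for illustration, not part of the original work.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class GenKill:
    """Edge function f(x) = (x - kill) | gen on a powerset lattice (assumed)."""
    gen: frozenset = frozenset()
    kill: frozenset = frozenset()

    def __call__(self, x):
        return (frozenset(x) - self.kill) | self.gen

    def after(self, first):
        """Return self composed with first: apply `first`, then `self`."""
        return GenKill(gen=(first.gen - self.kill) | self.gen,
                       kill=first.kill | self.kill)


IDENTITY = GenKill()


def path_function(edge_functions):
    """Compose f_n o ... o f_1 for edge functions given in path order f_1..f_n."""
    return reduce(lambda acc, f: f.after(acc), edge_functions, IDENTITY)


# Hypothetical two-edge path: f1 generates "a", f2 kills "a" and generates "b".
f1 = GenKill(gen=frozenset({"a"}))
f2 = GenKill(gen=frozenset({"b"}), kill=frozenset({"a"}))
path = path_function([f1, f2])
```

Because the composed function is again a single gen/kill pair, a whole path can be stored as compactly as one edge, which is what makes summary functions practical for this class of problems.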
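
Example: Functional Approach
[Figure: example program graph]
- $\phi_{28} = f_8 \circ f_7$
- $\phi_{21} = (f_4 \circ f_5) \wedge (\phi_{28} \circ f_6)$
- $\phi_{13} = (\phi_{21} \circ f_2) \wedge (\phi_{21} \circ f_3)$
- $\phi_{6} = \phi_{13} \circ f_1 \circ f_0$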

Callbacks
- Callbacks
  - e.g. function pointers in C
  - e.g. virtual dispatch in C++ and Java
- Can no longer determine $\phi_{21}$ and $\phi_{13}$ without the code for ext

Library Summary
- Idea: run "pieces" of phase 1
- Compute functions for sets of library-local paths
  - $\phi^{14}_{16} = \mathrm{id}$
  - $\phi^{14}_{21} = f_8 \circ f_7 \circ f_6$
  - $\phi^{17}_{21} = f_4 \circ f_5$
  - $\phi^{7}_{11} = f_2 \wedge f_3$
  - $\phi^{12}_{13} = \mathrm{id}$
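
A function for a set of library-local paths from $k$ to $n$ is the meet, over those paths, of the composed edge functions. The sketch below assumes an acyclic edge-labelled graph, a gen/kill representation, and set union as the meet (a "may" problem); none of these choices come from the slides. The helper names and the intermediate nodes `"x"` and `"y"` are hypothetical.

```python
from functools import lru_cache

# Edge functions as (gen, kill) pairs: f(x) = (x - kill) | gen.
ID = (frozenset(), frozenset())


def compose(g, f):
    """g o f: apply f first, then g."""
    (g_gen, g_kill), (f_gen, f_kill) = g, f
    return ((f_gen - g_kill) | g_gen, f_kill | g_kill)


def meet(f, g):
    """Function whose result is f(x) | g(x) for every x."""
    return (f[0] | g[0], f[1] & g[1])


def apply(f, x):
    return (frozenset(x) - f[1]) | f[0]


def summary(edges, k, n):
    """Meet over all paths k -> n; edges maps (src, dst) -> (gen, kill).

    Assumes the graph is acyclic; returns None if n is unreachable from k.
    """
    preds = {}
    for (src, dst), f in edges.items():
        preds.setdefault(dst, []).append((src, f))

    @lru_cache(maxsize=None)
    def phi(node):
        if node == k:
            return ID
        result = None
        for src, f in preds.get(node, []):
            upstream = phi(src)
            if upstream is None:  # src is not reachable from k
                continue
            through = compose(f, upstream)
            result = through if result is None else meet(result, through)
        return result

    return phi(n)


# Hypothetical diamond between nodes 7 and 11 with two branch functions.
f2 = (frozenset({"a"}), frozenset())
f3 = (frozenset({"b"}), frozenset({"c"}))
edges = {(7, "x"): f2, ("x", 11): ID, (7, "y"): f3, ("y", 11): ID}
phi_7_11 = summary(edges, 7, 11)
```

For distributive functions such as these, building the summary node by node gives the same function as enumerating every path, so the recursion never has to list paths explicitly.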

Library Summary Generation
- “Fixed” call in the library
  - always invokes the same library procedure, independent of the code for the main component
- “Fixed” procedure in the library
  - makes no calls, or
  - makes only fixed calls, to fixed procedures
  - standard functional approach can be applied
- For any other procedure, compute $\phi^{k}_{n}$, where
  - $k$ is the start node, or
  - $k$ is a return from a non-fixed call, or
  - $k$ is a return from a fixed call to a non-fixed procedure
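
The fixed/non-fixed classification can be sketched as a small fixpoint computation. This sketch assumes each call site is already labelled as fixed or not, and it reads the definition as "a procedure is non-fixed if it contains a non-fixed call or calls a non-fixed procedure", so recursive procedures with only fixed calls count as fixed. The function name, the input encoding, and the placement of the call to p3 inside p2 are assumptions made for illustration.

```python
def non_fixed_procedures(calls):
    """Return the set of non-fixed library procedures.

    calls maps a procedure name to a list of (callee, is_fixed_call) pairs;
    callee is None when the target is unknown (for example, a callback).
    """
    # Seed: any procedure that contains a non-fixed call.
    non_fixed = {p for p, sites in calls.items()
                 if any(not is_fixed for _, is_fixed in sites)}
    # Propagate: calling a non-fixed procedure makes the caller non-fixed.
    changed = True
    while changed:
        changed = False
        for p, sites in calls.items():
            if p not in non_fixed and any(c in non_fixed for c, _ in sites):
                non_fixed.add(p)
                changed = True
    return non_fixed


# Hypothetical encoding of a three-procedure library.
calls = {
    "p1": [("p2", True)],                 # fixed call to p2
    "p2": [(None, False), ("p3", True)],  # callback plus fixed call to p3
    "p3": [],                             # no calls
}
non_fixed = non_fixed_procedures(calls)
fixed = set(calls) - non_fixed
```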

Example: Library Summary Generation
- Fixed calls
  - 11-12 and 23-24
- Non-fixed calls
  - 16-17
- Fixed procedures
  - p3
- Non-fixed procedures
  - p1 and p2
- Contexts $k$ for $\phi^{k}_{n}$
  - 7 and 14: start nodes
  - 17: return from a non-fixed call
  - 12: return from a fixed call to a non-fixed procedure
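
Given the set of non-fixed procedures, the contexts $k$ follow mechanically from the three rules. The sketch below encodes the example's call sites by their return nodes; the function name and the data layout are hypothetical, and which procedure each call site belongs to is not needed for this step.

```python
def summary_contexts(start_node, call_sites, non_fixed):
    """Collect the context nodes k for which summary functions are computed.

    start_node: maps each non-fixed procedure to its start node.
    call_sites: list of (return_node, callee, is_fixed_call); callee is None
                when the target is unknown.
    non_fixed:  set of non-fixed procedure names.
    """
    contexts = {start_node[p] for p in non_fixed}   # rule 1: start nodes
    for return_node, callee, is_fixed in call_sites:
        if not is_fixed:                            # rule 2: non-fixed call
            contexts.add(return_node)
        elif callee in non_fixed:                   # rule 3: fixed call to a
            contexts.add(return_node)               # non-fixed procedure
    return contexts


start_node = {"p1": 7, "p2": 14}
call_sites = [
    (12, "p2", True),   # call 11-12: fixed, to non-fixed p2
    (17, None, False),  # call 16-17: non-fixed
    (24, "p3", True),   # call 23-24: fixed, to fixed p3
]
contexts = summary_contexts(start_node, call_sites, {"p1", "p2"})
```

The return node 24 is not a context because the fixed call to the fixed procedure p3 can be folded into a single summary function.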
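
The Condensed Graph
[Figure: condensed graph; p1 keeps nodes 7, 11, 12, 13 and p2 keeps nodes 14, 16, 17, 21]
- $\phi^{14}_{16} = \mathrm{id}$
- $\phi^{14}_{21} = f_8 \circ f_7 \circ f_6$
- $\phi^{17}_{21} = f_4 \circ f_5$
- $\phi^{7}_{11} = f_2 \wedge f_3$
- $\phi^{12}_{13} = \mathrm{id}$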

Analysis of a Main Component
- Create a “fake” graph for the whole program
- Run a whole-program analysis engine
- Safe solutions for non-library nodes
  - precise for distributive problems
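
The effect of replacing library-local paths by condensed edges can be illustrated with a toy worklist solver. This is only a sketch: it propagates along labelled edges without modelling calls and returns, which a real whole-program engine would handle, and it assumes a forward "may" problem with union as the meet. The solver, the helper names, and the nodes `"main"`, `"a"` and `"b"` are hypothetical.

```python
from collections import deque


def gen_kill(gen=(), kill=()):
    """Build a transfer function x -> (x - kill) | gen."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda x: (x - kill) | gen


def compose(*fs):
    """compose(f3, f2, f1)(x) == f3(f2(f1(x)))."""
    def composed(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return composed


def solve(edges, entry, entry_value):
    """Worklist propagation; edges maps (src, dst) -> transfer function."""
    succs = {}
    for (src, dst), f in edges.items():
        succs.setdefault(src, []).append((dst, f))
    solution = {entry: frozenset(entry_value)}
    worklist = deque([entry])
    while worklist:
        node = worklist.popleft()
        for dst, f in succs.get(node, []):
            new = solution.get(dst, frozenset()) | f(solution[node])
            if new != solution.get(dst):
                solution[dst] = new
                worklist.append(dst)
    return solution


identity = gen_kill()
f6, f7, f8 = gen_kill(gen={"d1"}), gen_kill(kill={"d0"}), gen_kill(gen={"d2"})

# Original library path 14 -> a -> b -> 21 versus one condensed edge 14 -> 21.
original = {("main", 14): identity, (14, "a"): f6, ("a", "b"): f7, ("b", 21): f8}
condensed = {("main", 14): identity, (14, 21): compose(f8, f7, f6)}

full_solution = solve(original, "main", {"d0"})
fake_solution = solve(condensed, "main", {"d0"})
```

In this sketch the condensed graph visits fewer nodes but yields the same value at every node it retains.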

Original vs. Condensed Library CFGs: Number of Nodes
[Figure not preserved in the transcript]

Original vs. Condensed Library CFGs: Number of Edges
[Figure not preserved in the transcript]

Discussion
- Flow and context insensitivity
- Cost reduction: time and memory
- Compact representation of functions
  - IFDS, IDE
- Use assumptions about the callback methods?
  - e.g. assume callback methods are “good”