
Interprocedural Dataflow Analysis in the Presence of Large Libraries
Atanas (Nasko) Rountev, Scott Kagan (Ohio State University)
Thomas Marlowe (Seton Hall University)
PRESTO Research Group, Ohio State University

Uses of Interprocedural Dataflow Analysis
• Performance optimizations in compilers
• Software understanding and transformation
  • e.g. dependence analysis for program slicing, change impact analysis, refactoring, etc.
• Software testing
  • e.g. dataflow-based testing; testing of object interactions in OO software
• Software checking
  • e.g. object protocols: open(read|write)*close
10/21/2021 CC 2006, Scott Kagan, PRESTO Research Group
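The object protocol open(read|write)*close can be read as a regular expression over events. The following is a minimal sketch, not taken from the slides, that checks one concrete event trace against that protocol. The single-letter event encoding and the function name are assumptions made for illustration; a dataflow-based checker would reason about all program paths statically rather than about one recorded trace.

```python
import re

# Hypothetical single-letter encoding of the four protocol events.
EVENT_CODES = {"open": "o", "read": "r", "write": "w", "close": "c"}

# open(read|write)*close, expressed over the encoded events.
PROTOCOL = re.compile(r"o(r|w)*c")


def follows_protocol(trace):
    """Return True if the event sequence matches open(read|write)*close."""
    try:
        encoded = "".join(EVENT_CODES[event] for event in trace)
    except KeyError:
        return False  # an event outside the protocol alphabet
    return PROTOCOL.fullmatch(encoded) is not None
```

For example, `follows_protocol(["open", "read", "write", "close"])` holds, while a trace that never closes, or that reads before opening, is rejected.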

Model for Interprocedural Whole-Program Analysis
code for C1, code for C2, …, code for Cn → Engine for Whole-Program Dataflow Analysis → dataflow solution for C1 + C2 + … + Cn
• Components C1, C2, …, Cn form a complete program
• Assumption: it is possible and desirable to analyze the source code of the entire program

A Specific Case: Main + Lib
code for Main, code for Lib → Engine for Whole-Program Dataflow Analysis → dataflow solution for Main + Lib
• Main + Lib form a complete program
• What if we are using large libraries that need to be reanalyzed from scratch?
  • e.g. the standard Java libraries contain about 10,000 classes and 80,000 methods
  • need to be re-analyzed with every new Main component

Example: Methods in Java Programs (figure)

A Specific Case: Main + Lib
Summary Generation Analysis → summary for Lib
code for Main, summary for Lib → Engine for Whole-Program Dataflow Analysis → dataflow solution for Main
• Goal: the solution for Main should be as good as the solution that would have been computed by a whole-program analysis (no loss of precision)

Functional Approach to Whole-Program Analysis
• Sharir-Pnueli 1981
• Dataflow lattice $L$
• Edge function $f: L \to L$ for effects of a statement
• Path function: $f = f_n \circ f_{n-1} \circ \dots \circ f_2 \circ f_1$
• Phase 1: summary functions $\varphi_n: L \to L$
  • solution at node n as a function of the solution at the entry of n's procedure
• Phase 2: solutions at start nodes of procedures
• Phase 3: solutions at the remaining nodes
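The idea of composing edge functions into path functions can be sketched with gen/kill transfer functions over sets of facts, a family that is closed under composition and pointwise union. This is only an illustration under stated assumptions: the slides do not fix a particular lattice, and the class `GenKill` and the names `then`, `merge`, and `path_function` are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class GenKill:
    """Transfer function f(x) = (x - kill) | gen over sets of facts."""
    gen: frozenset = frozenset()
    kill: frozenset = frozenset()

    def __call__(self, x):
        return (frozenset(x) - self.kill) | self.gen

    def then(self, later):
        """Composition 'later after self': x -> later(self(x))."""
        return GenKill(gen=(self.gen - later.kill) | later.gen,
                       kill=self.kill | later.kill)

    def merge(self, other):
        """Pointwise union, combining two alternative paths (may-analysis)."""
        return GenKill(gen=self.gen | other.gen,
                       kill=self.kill & other.kill)


IDENTITY = GenKill()  # gens nothing, kills nothing


def path_function(edge_functions):
    """Compose f_1, ..., f_n (given in execution order) into f_n o ... o f_1."""
    return reduce(lambda acc, f: acc.then(f), edge_functions, IDENTITY)
```

Because a composed or merged function is again a (gen, kill) pair, a summary function for a whole procedure can be stored in constant space per fact set, independent of how many paths it covers.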

Example: Functional Approach
$\varphi_{28} = f_8 \, f_7$
$\varphi_6 = \varphi_{13} \, f_1 \, f_0$
$\varphi_{21} = f_4 \, f_5 \, (\varphi_{28} \, f_6)$
$\varphi_{13} = (\varphi_{21} \, f_2) \, (\varphi_{21} \, f_3)$

Callbacks
• Callbacks
  • e.g. function pointers in C
  • e.g. virtual dispatch in C++ and Java
• Can no longer determine $\varphi_{21}$ and $\varphi_{13}$ without code for ext

Library Summary
• Idea: run "pieces" of phase 1
• Compute functions for sets of library-local paths
$\varphi_{16}^{14} = id$
$\varphi_{21}^{14} = f_8 \, f_7 \, f_6$
$\varphi_{21}^{17} = f_4 \, f_5$
$\varphi_{11}^{7} = f_2 \, f_3$
$\varphi_{13}^{12} = id$

Library Summary Generation
• "Fixed" call in the library
  • always invokes the same library procedure independent of code for main component
• "Fixed" procedure in the library
  • makes no calls, or
  • makes only fixed calls, to fixed procedures
  • standard functional approach can be applied
• For any other procedure, compute $\varphi_n^k$
  • k is the start node, or
  • k is a return from a non-fixed call, or
  • k is a return from a fixed call to a non-fixed procedure
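The definition of a fixed procedure is recursive, so one way to compute it is by iteration over the library's call structure. The sketch below is an assumption-laden illustration, not the authors' implementation: the input format and the name `fixed_procedures` are hypothetical, and it starts optimistically (every procedure assumed fixed) and removes violators, which means mutually recursive procedures containing only fixed calls are treated as fixed. The slides do not say how recursion is handled.

```python
def fixed_procedures(calls):
    """Classify library procedures as fixed.

    calls: {procedure: [(callee, is_fixed_call), ...]}, where callee is None
    for a non-fixed call whose target depends on the main component.
    Returns the set of fixed procedures.
    """
    fixed = set(calls)  # optimistic start: assume every procedure is fixed
    changed = True
    while changed:
        changed = False
        for proc in list(fixed):
            for callee, is_fixed_call in calls[proc]:
                # A non-fixed call, or a fixed call to a non-fixed procedure,
                # makes the caller non-fixed.
                if not is_fixed_call or callee not in fixed:
                    fixed.discard(proc)
                    changed = True
                    break
    return fixed
```

With a hypothetical library in which `p1` makes a fixed call to `p2`, `p2` makes one non-fixed call and one fixed call to `p3`, and `p3` makes no calls, only `p3` is classified as fixed.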

Example: Library Summary Generation
• Fixed calls
  • 11-12 and 23-24
• Non-fixed calls
  • 16-17
• Fixed procedures
  • p3
• Non-fixed procedures
  • p1 and p2
• Contexts k for $\varphi_n^k$
  • 7 and 14: start nodes
  • 17: return from a non-fixed call
  • 12: return from a fixed call to a non-fixed procedure
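The three rules for choosing contexts k can be sketched as a small function over call sites. The tuple layout and the name `context_nodes` are hypothetical. In the usage data, the assignment of each call to a caller and callee (11-12 in p1 calling p2, 16-17 and 23-24 in p2, the latter calling p3) is an assumption inferred from the example, since the figure itself is not available here.

```python
def context_nodes(start_nodes, call_sites, fixed_procs):
    """Collect the context nodes k for each non-fixed procedure.

    start_nodes: {procedure: start node}
    call_sites:  [(caller, call_node, return_node, callee, is_fixed_call), ...]
    fixed_procs: set of procedures already classified as fixed
    """
    # Rule 1: the start node of every non-fixed procedure.
    contexts = {p: {n} for p, n in start_nodes.items() if p not in fixed_procs}
    for caller, _call_node, return_node, callee, is_fixed_call in call_sites:
        if caller in fixed_procs:
            continue
        # Rule 2: return from a non-fixed call.
        # Rule 3: return from a fixed call to a non-fixed procedure.
        if not is_fixed_call or callee not in fixed_procs:
            contexts[caller].add(return_node)
    return {p: sorted(ks) for p, ks in contexts.items()}


# Assumed encoding of the example (caller/callee assignments are inferred).
example = context_nodes(
    start_nodes={"p1": 7, "p2": 14},
    call_sites=[
        ("p1", 11, 12, "p2", True),
        ("p2", 16, 17, None, False),
        ("p2", 23, 24, "p3", True),
    ],
    fixed_procs={"p3"},
)
```

Under these assumptions the return node 24 is not a context, because 23-24 is a fixed call to the fixed procedure p3 and can be summarized with the standard functional approach.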

The Condensed Graph
• p1: nodes 7, 11, 12, 13
• p2: nodes 14, 16, 17, 21
$\varphi_{16}^{14} = id$
$\varphi_{21}^{14} = f_8 \, f_7 \, f_6$
$\varphi_{21}^{17} = f_4 \, f_5$
$\varphi_{11}^{7} = f_2 \, f_3$
$\varphi_{13}^{12} = id$

Analysis of a Main Component
• Create a "fake" graph for the whole program
• Run a whole-program analysis engine
• Safe solutions for non-library nodes
  • precise for distributive problems
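A transfer function f is distributive when f(x ⊓ y) = f(x) ⊓ f(y) for all lattice elements x and y. The sketch below, which is illustrative and not from the slides, tests that property on sample values for two textbook cases: a gen/kill function with set union as the merge, and a constant-propagation assignment z = a + b. All names are hypothetical, and checking a finite sample can only refute distributivity, not prove it.

```python
def is_distributive_on(f, samples, meet):
    """Check f(meet(x, y)) == meet(f(x), f(y)) on all sample pairs."""
    return all(f(meet(x, y)) == meet(f(x), f(y))
               for x in samples for y in samples)


# Case 1: gen/kill over sets of facts, merging paths with set union.
def gen_kill(gen, kill):
    return lambda x: (x - kill) | gen


# Case 2: constant propagation; None stands for "not a constant".
NAC = None


def cp_meet(env1, env2):
    """Keep a variable's value only where both environments agree."""
    return {v: (env1[v] if env1[v] == env2[v] else NAC) for v in env1}


def assign_z_a_plus_b(env):
    """Transfer function for the statement z = a + b."""
    a, b = env["a"], env["b"]
    out = dict(env)
    out["z"] = NAC if a is NAC or b is NAC else a + b
    return out


set_samples = [frozenset(), frozenset({"d1"}), frozenset({"d2", "d3"})]
gk = gen_kill(frozenset({"d1"}), frozenset({"d2"}))
cp_samples = [{"a": 1, "b": 2, "z": 0}, {"a": 2, "b": 1, "z": 0}]

gen_kill_ok = is_distributive_on(gk, set_samples, lambda x, y: x | y)
const_prop_ok = is_distributive_on(assign_z_a_plus_b, cp_samples, cp_meet)
```

In the constant-propagation case both sample environments give z = 3 after the statement, but merging them first loses the values of a and b, so the two sides differ and the check fails.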

Original vs. Condensed Library CFGs: Number of Nodes (figure)

Original vs. Condensed Library CFGs: Number of Edges (figure)

Discussion
• Flow and context insensitivity
• Cost reduction: time and memory
• Compact representation of functions
  • IFDS, IDE
• Use assumptions about the callback methods?
  • e.g. assume callback methods are "good"