
Interprocedural Analysis
Noam Rinetzky, Mooly Sagiv
http://www.cs.tau.ac.il/~msagiv/courses/pa05.html
Tel Aviv University, 640-6706
Textbook: Chapter 2.5

Outline
  • Challenges in interprocedural analysis
  • The trivial solution
  • Why isn't it adequate
  • Simplifying assumptions
  • A naive solution
  • Join over valid paths
  • The call-string approach
  • The functional approach
    – A case study: linear constant propagation
    – Context-free reachability
  • Modularity issues
  • Other solutions

Challenges in Interprocedural Analysis
  • Respect the call-return mechanism
  • Handling recursion
  • Local variables
  • Parameter passing mechanisms: value, value-result, reference, by name
  • Procedure nesting
  • The called procedure is not always known
  • The source code of the called procedure is not always available
    – separate compilation
    – vendor code
    – ...

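Extended Syntax of While
$P ::= \mathtt{begin}\ D\ S\ \mathtt{end}$
$D ::= \mathtt{proc}\ id(\mathtt{val}\ id^*, \mathtt{res}\ id^*)\ \mathtt{is}^{l}\ S\ \mathtt{end}^{l'} \mid D\ D$
$S ::= [x := a]^{l} \mid [\mathtt{call}\ p(a, z)]^{l}_{l'} \mid [\mathtt{skip}]^{l} \mid S_1 ; S_2 \mid \mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2 \mid \mathtt{while}\ [b]^{l}\ \mathtt{do}\ S$
$b ::= \mathtt{true} \mid \mathtt{false} \mid \mathtt{not}\ b \mid b_1\ op_b\ b_2 \mid a_1\ op_r\ a_2$
$a ::= x \mid n \mid a_1\ op_a\ a_2$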

A Trivial treatment of procedures
  • Analyze a single procedure
  • After every call continue with conservative information
    – Global variables and local variables which "may be modified by the call" are mapped to $\top$
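The conservative rule above can be sketched in a few lines. This is only an illustration under assumptions: the marker `TOP`, the function `after_call`, and the `may_modify` set are hypothetical names, and the abstract state is taken to be a constant-propagation environment stored as a dictionary.

```python
TOP = "TOP"  # hypothetical marker for "not a known constant"

def after_call(env, may_modify):
    """Abstract state after a call under the trivial treatment.

    Every variable the callee may modify loses its constant value;
    all other variables keep the value they had before the call.
    """
    return {var: (TOP if var in may_modify else val) for var, val in env.items()}

# Assumed example: the callee may write x but not a.
before = {"x": 0, "a": 7}
after = after_call(before, may_modify={"x"})
print(after)
```

The quality of the result depends entirely on how small `may_modify` can be made, which is why side-effect analysis helps this approach.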

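A Trivial treatment of procedures
begin
  proc p() is^1
    [x := 1]^2
  end^3
  [call p()]^4_5
  [print x]^6
end
[Figure annotations, in extraction order (their placement on the slide is not recoverable): $[a \mapsto \top, x \mapsto \top]$, $[a \mapsto \top, x \mapsto 1]$, $[x \mapsto 0]$, $[x \mapsto 0]$, $[x \mapsto \top]$, $[x \mapsto \top]$]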

Advantages of the trivial solution
  • Can be easily implemented
  • Procedures can be written in different languages
  • Procedure inlining can help
  • Side-effect analysis can help

Disadvantages of the trivial solution
  • Modular (object-oriented and functional) programming encourages small, frequently called procedures
  • Optimization
    – Modern machines allow the compiler to schedule many instructions in parallel
    – Need to optimize many instructions
    – Inlining can be a bad solution
  • Software engineering
    – Many bugs result from interface misuse
    – Procedures define partial functions

Simplifying Assumptions
  • All the code is available
  • Simple parameter passing
  • The called procedure is syntactically known
  • No nesting
  • Procedure names are syntactically different from variables
  • Procedures are uniquely defined
  • Recursion is supported

Constant Example
begin
  proc p() is^1
    if [b]^2 then (
      [a := a - 1]^3
      [call p()]^4_5
      [a := a + 1]^6
    )
    [x := -2 * a + 5]^7
  end^8
  [a := 7]^9 ;
  [call p()]^10_11 ;
  [print(x)]^12
end

A naive Interprocedural solution
  • Treat procedure calls as gotos
  • Obtain a conservative solution
  • Find the least fixed point of the system:
    $DF_{entry}(s) = \iota$
    $DF_{entry}(v) = \bigsqcup \{ f(e)(DF_{entry}(u)) : (u, v) \in E \}$
  • Use chaotic iterations

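Simple Example
begin
  proc p() is^1
    [x := a + 1]^2
  end^3
  [a := 7]^4
  [call p()]^5_6
  [print x]^7
  [a := 9]^8
  [call p()]^9_10
  [print a]^11
end
[Figure: control-flow graph with main nodes a=7, call p_5, call p_6, print x, a=9, call p_10, print a, and nodes proc p, x=a+1, end for p. Annotations, in extraction order (placement not recoverable): $[x \mapsto 0, a \mapsto 0]$, $[x \mapsto 0, a \mapsto 7]$, $[x \mapsto 8, a \mapsto 7]$, $[x \mapsto 8, a \mapsto 9]$, $[x \mapsto \top, a \mapsto \top]$, $[x \mapsto 8, a \mapsto 9]$, $[x \mapsto 0, a \mapsto 7]$, $[x \mapsto 8, a \mapsto 7]$, $[x \mapsto 0, a \mapsto 7]$, $[x \mapsto 8, a \mapsto 7]$]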

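Simple Example
begin
  proc p() is^1
    [x := a + 1]^2
  end^3
  [a := 7]^4
  [call p()]^5_6
  [print x]^7
  [a := 9]^8
  [call p()]^9_10
  [print a]^11
end
[Figure: the same control-flow graph with the final result of the naive analysis. Annotations, in extraction order (placement not recoverable): $[x \mapsto 0, a \mapsto 0]$, $[x \mapsto 0, a \mapsto 7]$, $[x \mapsto \top, a \mapsto \top]$, $[x \mapsto \top, a \mapsto 9]$, $[x \mapsto \top, a \mapsto \top]$, $[x \mapsto \top, a \mapsto \top]$]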
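The loss of precision in the naive solution can be reproduced with a short chaotic-iteration sketch. It is an illustration under assumptions, not the course's implementation: the node names (`m0` to `m5`, `p_in`, `p_out`), the edge list, and the helper names are all hypothetical, and the graph is a hand-built encoding of the Simple Example (a := 7; call p; print x; a := 9; call p; print a, where p performs x := a + 1). Calls and returns are plain edges, so the exit of p flows to both return points.

```python
TOP = "TOP"  # hypothetical marker for "not a constant"

def join(s1, s2):
    """Join of two abstract states; None stands for 'not reached yet'."""
    if s1 is None:
        return s2
    if s2 is None:
        return s1
    return {v: s1[v] if s1[v] == s2[v] else TOP for v in s1}

def const(var, c):
    return lambda s: {**s, var: c}

def add1(var, src):
    return lambda s: {**s, var: TOP if s[src] == TOP else s[src] + 1}

def ident(s):
    return s

# Calls and returns are treated as gotos (identity edges).
EDGES = [
    ("m0", "m1", const("a", 7)),
    ("m1", "p_in", ident),              # call at label 5
    ("p_in", "p_out", add1("x", "a")),  # x := a + 1
    ("p_out", "m2", ident),             # return to label 6
    ("m2", "m3", ident),                # print x
    ("m3", "m4", const("a", 9)),
    ("m4", "p_in", ident),              # call at label 9
    ("p_out", "m5", ident),             # return to label 10
]

def solve(edges, start, init):
    """Iterate over all edges until no dataflow value changes."""
    df = {n: None for u, v, _ in edges for n in (u, v)}
    df[start] = init
    changed = True
    while changed:
        changed = False
        for u, v, f in edges:
            if df[u] is None:
                continue
            new = join(df[v], f(df[u]))
            if new != df[v]:
                df[v] = new
                changed = True
    return df

df = solve(EDGES, "m0", {"x": 0, "a": 0})
print(df["m2"], df["m4"])
```

Under these assumptions the state after the first call ends up with both variables non-constant, because the value a = 9 from the second call site leaks back to the first return point.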

We want something better…
  • Let paths(v) denote the potentially infinite set of paths from start to v (written as sequences of labels)
  • For a sequence of edges $[e_1, e_2, \ldots, e_n]$ define $f[e_1, e_2, \ldots, e_n]: L \to L$ by composing the effects of basic blocks
    $f[e_1, e_2, \ldots, e_n](l) = f(e_n)(\ldots(f(e_2)(f(e_1)(l)))\ldots)$
  • $JOP[v] = \bigsqcup \{ f[e_1, e_2, \ldots, e_n](\iota) \mid [e_1, e_2, \ldots, e_n] \in paths(v) \}$

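Valid Paths
[Figure: a path with transfer functions $f_1, f_2, f_3, f_4, f_5, \ldots, f_{k-3}, f_{k-2}, f_{k-1}, f_k$ that passes through $call_q$, $enter_q$, $exit_q$ and ret; the call edge is marked "(" and the return edge is marked ")".]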

Invalid Path
int x;
void main() {
    x = 5;
    p();
    return;
}
void p() {
    if (...) {
        x = x + 1;
        p();   // p_calls_p1
        x = x - 1;
    }
    return;
}

A More Precise Solution
  • Only considers matching calls and returns (valid)
  • Can be defined via a context-free grammar
  • Every call is a different letter
  • Matching calls and returns
$Matched \to \varepsilon \mid (_c\ Matched\ )_c$ for all $[\mathtt{call}\ p()]^{l_c}_{l_r}$ in P
$Valid \to Matched \mid l_c\ Valid$ for all $[\mathtt{call}\ p()]^{l_c}_{l_r}$ in P

A More Precise Solution
  • Only considers matching calls and returns (valid)
  • Can be defined via a context-free grammar
  • Every call is a different letter
  • Matching calls and returns
Let $Lab_*$ = all the labels in the program
$Lab_{IP} = \{ l_c, l_r : [\mathtt{call}\ p()]^{l_c}_{l_r} \text{ in the program} \}$
$Intra \to \varepsilon \mid (l_i, l_j)\ Intra$ for all $l_i, l_j$ in $Lab_* \setminus Lab_{IP}$
$Matched \to \varepsilon \mid Intra \mid Matched \mid (l_c, l_n)\ Matched\ (l_x, l_r)$ for all $[\mathtt{call}\ p()]^{l_c}_{l_r}$ and $p\ \mathtt{is}^{l_n}\ S\ \mathtt{end}^{l_x}$
$Valid \to Matched \mid (l_c, l_n)\ Valid$ for all $[\mathtt{call}\ p()]^{l_c}_{l_r}$ and $p\ \mathtt{is}^{l_n}\ S\ \mathtt{end}^{l_x}$
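One way to make the notion of a valid path concrete is a stack-based check. The sketch below follows the usual stack reading of valid paths (each return must match the most recent unmatched call, and calls may remain open at the end) rather than transcribing the grammar literally. The event encoding and the name `is_valid` are assumptions made for illustration.

```python
def is_valid(path):
    """Check that every return matches the most recent unmatched call.

    `path` is a list of (kind, site) events, where kind is "call",
    "ret" or "intra" and site identifies the call site (None for
    intraprocedural edges). Unmatched calls may remain at the end.
    """
    stack = []
    for kind, site in path:
        if kind == "call":
            stack.append(site)
        elif kind == "ret":
            if not stack or stack.pop() != site:
                return False
    return True

# Call from site 5 and return to site 5: valid.
good = [("intra", None), ("call", 5), ("intra", None), ("ret", 5)]
# Call from site 5 but return to the caller at site 9: invalid.
bad = [("call", 5), ("intra", None), ("ret", 9)]
print(is_valid(good), is_valid(bad))
```

A matched path is the special case in which the stack is also empty at the end.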

The Join-Over-Valid-Paths (JVP)
  • For a sequence of edges $[e_1, e_2, \ldots, e_n]$ define $f[e_1, e_2, \ldots, e_n]: L \to L$ by composing the effects of basic statements
    – $f[\,](s) = s$
    – $f[e, p](s) = f[p](f_e(s))$
  • $JVP_l = \bigsqcup \{ f[e_1, e_2, \ldots, e](\iota) \mid [e_1, e_2, \ldots, e] \in vpaths(l),\ e = (*, l) \}$
  • Compute a safe approximation to JVP
  • In some cases the JVP can be computed
    – Distributivity of f
    – Functional representation

The Call String Approach for Approximating JVP
  • No assumptions
  • Record at every node a pair (l, c) where $l \in L$ is the dataflow information and c is a suffix of unmatched calls
  • Use chaotic iterations
  • To guarantee termination, limit the size of c (typically 1 or 2)
  • Emulates inlining (but no code growth)
  • Exponential in the bound C on the call-string length
  • For a finite lattice there exists a C which leads to join over all valid paths

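Simple Example
begin
  proc p() is^1
    [x := a + 1]^2
  end^3
  [a := 7]^4
  [call p()]^5_6
  [print x]^7
  [a := 9]^8
  [call p()]^9_10
  [print a]^11
end
[Figure: the same control-flow graph annotated with call strings. Annotations in main (placement not recoverable): $[x \mapsto 0, a \mapsto 0]$, $[x \mapsto 0, a \mapsto 7]$, $[x \mapsto 8, a \mapsto 7]$, $[x \mapsto 8, a \mapsto 9]$, $[x \mapsto 10, a \mapsto 9]$. Annotations in p, tagged by call string: 5: $[x \mapsto 0, a \mapsto 7]$ and 9: $[x \mapsto 8, a \mapsto 9]$; 5: $[x \mapsto 8, a \mapsto 7]$ and 9: $[x \mapsto 10, a \mapsto 9]$]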
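The call-string values in this example can be reproduced with a small sketch. It is an illustration under assumptions: the node names (`m0` to `m5`, `p_in`, `p_out`), the edge encoding, and the helpers are hypothetical, and call strings are kept as tuples truncated to the last `K` call sites. A return edge also names its call node so that the caller's own call string can be restored.

```python
from collections import defaultdict

TOP = "TOP"  # hypothetical marker for "not a constant"
K = 1        # bound on the call-string length (assumed K >= 1)

def join(old, new):
    if old is None:
        return new
    return {v: old[v] if old[v] == new[v] else TOP for v in old}

def push(ctx, site):
    """Append a call site and keep only the last K sites."""
    return (ctx + (site,))[-K:]

def set_a(c):
    return lambda s: {**s, "a": c}

def x_gets_a_plus_1(s):
    return {**s, "x": TOP if s["a"] == TOP else s["a"] + 1}

def skip(s):
    return s

EDGES = [
    ("m0", "m1", "stmt", set_a(7)),
    ("m1", "p_in", "call", 5),
    ("p_in", "p_out", "stmt", x_gets_a_plus_1),
    ("p_out", "m2", "ret", (5, "m1")),  # (call site, call node)
    ("m2", "m3", "stmt", skip),         # print x
    ("m3", "m4", "stmt", set_a(9)),
    ("m4", "p_in", "call", 9),
    ("p_out", "m5", "ret", (9, "m4")),
]

def solve(edges, start, init):
    df = defaultdict(dict)  # node -> {call string -> abstract state}
    df[start][()] = init

    def update(node, ctx, env):
        old = df[node].get(ctx)
        new = join(old, env)
        if new != old:
            df[node][ctx] = new
            return True
        return False

    changed = True
    while changed:
        changed = False
        for u, v, kind, arg in edges:
            for ctx, env in list(df[u].items()):
                if kind == "stmt":
                    changed |= update(v, ctx, arg(env))
                elif kind == "call":
                    changed |= update(v, push(ctx, arg), env)
                else:  # "ret": only flow back to the matching call site
                    site, call_node = arg
                    for caller_ctx in list(df[call_node]):
                        if push(caller_ctx, site) == ctx:
                            changed |= update(v, caller_ctx, env)
    return df

df = solve(EDGES, "m0", {"x": 0, "a": 0})
print(df["p_in"], df["m5"][()])
```

With `K = 1` the two calls to p are kept apart, so x stays a known constant after each call in this sketch. For recursive programs a bounded call string eventually merges contexts, which is where precision can be lost.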

Recursive Example
begin^0
  proc p() is^1
    if [b]^2 then (
      [a := a - 1]^3
      [call p()]^4_5
      [a := a + 1]^6
    )
    [x := -2 * a + 5]^7
  end^8
  [a := 7]^9 ;
  [call p()]^10_11 ;
  [print(x)]^12
end^13
[Figure: control-flow graph with main nodes a=7, Call p_10, Call p_11, print(x), end and nodes If(…), a=a-1, Call p_4, Call p_5, a=a+1, x=-2a+5, end for p. Annotations in main: $[x \mapsto 0, a \mapsto 0]$, $[x \mapsto 0, a \mapsto 7]$. Annotations in p, tagged by call string (placement not recoverable): 10: $[x \mapsto 0, a \mapsto 7]$, 10: $[x \mapsto 0, a \mapsto 6]$, 10: $[x \mapsto -7, a \mapsto 6]$, 4: $[x \mapsto 0, a \mapsto 6]$, 4: $[x \mapsto -7, a \mapsto 6]$, 4: $[x \mapsto -7, a \mapsto 7]$, 4: $[x \mapsto \top, a \mapsto \top]$]

The Functional Approach
  • The meaning of a function is a mapping from states into states
  • The abstract meaning of a function is a function from abstract states to abstract states

Motivating Example
Summary of p: $\lambda e.[x \mapsto -2e(a)+5, a \mapsto e(a)]$
begin
  proc p() is^1
    if [b]^2 then (
      [a := a - 1]^3
      [call p()]^4_5
      [a := a + 1]^6
    )
    [x := -2 * a + 5]^7
  end^8
  [a := 7]^9 ;
  [call p()]^10_11 ;
  [print(x)]^12
end
[Figure: control-flow graph with main nodes a=7, Call p_10, Call p_11, print(x), end and nodes If(…), a=a-1, Call p_4, Call p_5, a=a+1, x=-2a+5, end for p. Annotations in main, in extraction order: $[x \mapsto 0, a \mapsto 0]$, $[x \mapsto 0, a \mapsto 7]$, $[x \mapsto -9, a \mapsto 7]$, $[x \mapsto -9, a \mapsto 7]$]

Motivating Example
Summary of p: $\lambda e.[x \mapsto -2e(a)+5, a \mapsto e(a)]$
begin
  proc p() is^1
    if [b]^2 then (
      [a := a - 1]^3
      [call p()]^4_5
      [a := a + 1]^6
    )
    [x := -2 * a + 5]^7
  end^8
  [read(a)]^9 ;
  [call p()]^10_11 ;
  [print(x)]^12
end
[Figure: the same control-flow graph with read(a) in place of a=7. Annotations in main, in extraction order: $[x \mapsto 0, a \mapsto 0]$, $[x \mapsto 0, a \mapsto \top]$, $[x \mapsto \top, a \mapsto \top]$, $[x \mapsto \top, a \mapsto \top]$]

The Functional Approach
  • Main idea: iterate on the abstract domain of functions from L to L
  • Two-phase algorithm
    – Compute the dataflow solution at the exit of a procedure as a function of the initial values at the procedure entry (functional values)
    – Compute the dataflow values at every point using the functional values
  • Can compute the JVP

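Example: Constant propagation
  • $L = Var \to N \cup \{\top, \bot\}$
  • Domain: $F: L \to L$
    – $(f_1 \sqcup f_2)(x) = f_1(x) \sqcup f_2(x)$
[Figure: the statements x=7, y=x+1, x=y with the function computed so far. Start: $Id = \lambda env \in L.\ env$. After x=7: $\lambda env.\ env[x \mapsto 7] \circ \lambda env.\ env$. After y=x+1: $\lambda env.\ env[y \mapsto env(x)+1] \circ \lambda env.\ env[x \mapsto 7] \circ \lambda env.\ env$]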

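Example: Constant propagation
  • $L = Var \to N \cup \{\top, \bot\}$
  • Domain: $F: L \to L$
    – $(f_1 \sqcup f_2)(x) = f_1(x) \sqcup f_2(x)$
[Figure: two branches that merge before x=y. One branch starts from $Id = \lambda env.\ env$ and executes x=7, giving $\lambda env.\ env[x \mapsto 7]$. The other starts from $Id = \lambda env.\ env$ and executes y=x+1, giving $\lambda env.\ env[y \mapsto env(x)+1]$. At the merge: $\lambda env.\ env[y \mapsto env(x)+1] \sqcup \lambda env.\ env[x \mapsto 7]$]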
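Composition and pointwise join of abstract transformers can be sketched with ordinary closures. This is a minimal illustration under assumptions: environments are dictionaries, `TOP` stands for a non-constant value, $\bot$ is left out, and all helper names are hypothetical.

```python
TOP = "TOP"  # hypothetical marker for "not a constant"

def join(e1, e2):
    """Pointwise join of two environments over the same variables."""
    return {v: e1[v] if e1[v] == e2[v] else TOP for v in e1}

def identity(env):
    return env

def set_const(var, c):
    """Transformer for var := c."""
    return lambda env: {**env, var: c}

def set_plus(var, src, c):
    """Transformer for var := src + c."""
    return lambda env: {**env, var: TOP if env[src] == TOP else env[src] + c}

def compose(f, g):
    """f after g."""
    return lambda env: f(g(env))

def fjoin(f, g):
    """(f join g)(env) = f(env) join g(env)."""
    return lambda env: join(f(env), g(env))

# Sequential case: x = 7; y = x + 1
seq = compose(set_plus("y", "x", 1), compose(set_const("x", 7), identity))
# Branching case: either x = 7 or y = x + 1
either = fjoin(set_const("x", 7), set_plus("y", "x", 1))

print(seq({"x": TOP, "y": TOP}))
print(either({"x": 7, "y": 0}))
```

Closures are convenient for applying a transformer but cannot be compared for equality, so an actual fixed-point computation over functions needs an explicit representation.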

Running Example 1 (init)
[Figure: control-flow graph with main nodes begin^0, a=7^9, Call p^10, Call p^11, print(x)^12, end^13 and nodes p^1, If(…)^2, a=a-1^3, Call p^4, Call p^5, a=a+1^6, x=-2a+5^7, end^8 for p.]
N       Function
0       $\lambda e.[x \mapsto e(x), a \mapsto e(a)] = id$
1       $\lambda e.[x \mapsto e(x), a \mapsto e(a)] = id$
3-13    $\lambda e.$ (remainder lost in extraction)

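Running Example 1
[Figure: control-flow graph with main nodes begin^0, a=7^9, Call p^10, Call p^11, print(x)^12, end^13 and nodes p^1, If(…)^2, a=a-1^3, Call p^4, Call p^5, a=a+1^6, x=-2a+5^7, end^8 for p.]
N     Function
1     $\lambda e.[x \mapsto e(x), a \mapsto e(a)] = id$
2     $id$
7     $id$
3     $id$
4     $\lambda e.[x \mapsto e(x), a \mapsto e(a)-1]$
5     $f_8 \circ \lambda e.[x \mapsto e(x), a \mapsto e(a)-1] = \lambda e.[x \mapsto -2(e(a)-1)+5, a \mapsto e(a)-1]$
6     $\lambda e.[x \mapsto -2(e(a)-1)+5, a \mapsto e(a)-1]$
7     $\lambda e.[x \mapsto -2(e(a)-1)+5, a \mapsto e(a)] \sqcup \lambda e.[x \mapsto e(x), a \mapsto e(a)]$
8     $\lambda e.[x \mapsto -2e(a)+5, a \mapsto e(a)]$
8     $\lambda a, x.[x \mapsto -2e(a)+5, a \mapsto e(a)]$
0     $\lambda e.[x \mapsto e(x), a \mapsto e(a)] = id$
10    $\lambda e.[x \mapsto e(x), a \mapsto 7]$
11    $\lambda a, x.[x \mapsto -2e(a)+5, a \mapsto e(a)] \circ f_{10}$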

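Running Example 2
[Figure: control-flow graph with main nodes begin^0, a=7^9, Call p^10, Call p^11, print(x)^12, end^13 and nodes p^1, If(…)^2, a=a-1^3, Call p^4, Call p^5, a=a+1^6, x=-2a+5^7, end^8 for p.]
N     Function
1     $[x \mapsto 0, a \mapsto 7]$
2     $[x \mapsto 0, a \mapsto 7]$
7     $[x \mapsto 0, a \mapsto 7]$
3     $[x \mapsto 0, a \mapsto 7]$
4     $[x \mapsto 0, a \mapsto 6]$
1     $[x \mapsto -7, a \mapsto 6]$
6     $[x \mapsto -7, a \mapsto 7]$
7     $[x \mapsto \top, a \mapsto 7]$
8     $[x \mapsto -9, a \mapsto 7]$
1     $[x \mapsto \top, a \mapsto \top]$
8     $[x \mapsto -9, a \mapsto 7]$
0     $[x \mapsto 0, a \mapsto 0]$
10    $[x \mapsto 0, a \mapsto 7]$
11    $[x \mapsto -9, a \mapsto 7]$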

Issues in Functional Approach
  • How to guarantee finite height for the functional lattice?
    – It may happen that L has finite height and yet the lattice of monotonic functions from L to L does not
  • Efficiently represent functions
    – Functional join
    – Functional composition
    – Testing equality
  • Usually non-trivial
  • But can be done for distributive functions

Example: Linear Constant Propagation
  • Consider the constant propagation lattice
  • The value of every variable y at the program exit can be represented by:
    $y = \bigsqcup \{ (a_x x + b_x) \mid x \in Var_* \} \sqcup c$, where $a_x, c \in Z \cup \{\top, \bot\}$ and $b_x \in Z$
  • Supports efficient composition and "functional" join
    – [z := a * y + b]
    – What about [z := x + y]?
  • Computes JVP
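The reason linear functions compose cheaply can be sketched as follows. This is a deliberately simplified illustration, not the representation on the slide: it tracks a single source variable, drops the extra constant c, and uses a coarse join that returns `TOP` whenever the two functions differ (a safe over-approximation). The names `Lin`, `compose`, and `join` are hypothetical.

```python
from dataclasses import dataclass

TOP = "TOP"  # hypothetical marker for "not a constant"

@dataclass(frozen=True)
class Lin:
    """The function v -> a * v + b over integers."""
    a: int
    b: int

    def __call__(self, v):
        if self.a == 0:
            return self.b  # constant function, input is irrelevant
        return TOP if v == TOP else self.a * v + self.b

def compose(f, g):
    """f after g, again in linear form (or TOP)."""
    if f == TOP:
        return TOP
    if f.a == 0:
        return Lin(0, f.b)
    if g == TOP:
        return TOP
    return Lin(f.a * g.a, f.a * g.b + f.b)

def join(f, g):
    """Coarse functional join: keep f only if both sides agree."""
    return f if f == g else TOP

dec = Lin(1, -1)      # a := a - 1
inc = Lin(1, 1)       # a := a + 1
x_of_a = Lin(-2, 5)   # x := -2 * a + 5, as a function of a

print(compose(inc, dec))           # net effect on a of decrement then increment
print(compose(x_of_a, dec)(7))     # x after a := a - 1 when a is 7 on entry
```

Composition stays within two integers per function and equality is a plain comparison, which is the property that makes iterating over functions feasible here. A statement such as [z := x + y] depends on two variables and does not fit this single-variable form.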

Functional Approach via Context Free Reachability
  • The problem of computing reachability in a graph restricted by a context-free grammar can be solved in cubic time
  • Can be used to compute JVP in arbitrary finite distributive dataflow problems (not just bitvector)
  • Nodes in the graph correspond to individual facts
  • Efficient implementations exist (MOPED)
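The idea of reachability restricted by a grammar can be illustrated with a naive fixed-point computation of matched (same-level) reachability. This sketch is not the cubic-time algorithm and not how any particular tool works; the graph encoding, the node names, and the function `matched_reach` are assumptions made for the example. Call edges carry an opening label, return edges the corresponding closing label, and a pair (u, v) is derived only when the labels along some path from u to v are balanced.

```python
def matched_reach(nodes, plain, opens, closes):
    """Pairs (u, v) joined by a path whose call/return labels are balanced.

    plain:  set of (u, v) unlabeled edges
    opens:  set of (u, site, v) call edges
    closes: set of (u, site, v) return edges
    """
    reach = {(n, n) for n in nodes} | set(plain)
    changed = True
    while changed:
        changed = False
        new = set()
        # Concatenating two matched paths gives a matched path.
        for (u, v) in reach:
            for (v2, w) in reach:
                if v == v2:
                    new.add((u, w))
        # Wrapping a matched path in a matching call/return pair.
        for (u, i, u2) in opens:
            for (v2, j, v) in closes:
                if i == j and (u2, v2) in reach:
                    new.add((u, v))
        if not new <= reach:
            reach |= new
            changed = True
    return reach

# Hypothetical encoding of a program that calls p from sites 5 and 9.
NODES = {"m0", "m1", "m2", "m3", "m4", "m5", "p_in", "p_out"}
PLAIN = {("m0", "m1"), ("m2", "m3"), ("m3", "m4"), ("p_in", "p_out")}
OPENS = {("m1", 5, "p_in"), ("m4", 9, "p_in")}
CLOSES = {("p_out", 5, "m2"), ("p_out", 9, "m5")}

reach = matched_reach(NODES, PLAIN, OPENS, CLOSES)
print(("m1", "m2") in reach, ("m4", "m2") in reach)
```

The pair (m4, m2) is rejected because it would enter p from site 9 and leave through the return edge of site 5, which is exactly the kind of path the naive solution wrongly includes.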

Conclusion
  • Handling functions is crucial for abstract interpretation
  • Virtual functions and exceptions complicate things
  • But scalability is an issue
  • Assume-guarantee helps
    – But relies on specifications