Data-Flow Analysis


Approaches
• Dynamic Analysis
    • Assertions
    • Error seeding, mutation testing
    • Coverage criteria
    • Fault-based testing
    • Object-oriented testing
    • Regression testing
• Static Analysis
    • Inspections
    • Dependence analysis
    • Symbolic execution
    • Software verification
    • Data flow analysis
    • Concurrency analysis
    • Interprocedural analysis

Data Flow Analysis (DFA)
• Efficient technique for proving properties about programs
• Not as powerful as automated theorem provers, but requires less human expertise
• Uses an annotated control flow graph (CFG) model of the program
    • Compute facts for each node
    • Use the flow in the graph to compute facts about the whole program
• We’ll focus on single units

Some examples of DFA techniques
• DFA is used extensively in program optimization, e.g., to
    • determine if a definition is dead (and can be removed)
    • determine if a variable always has a constant value
    • determine if an assertion is always true and can be removed
• DFA can also be used to find anomalies in the code
    • Find def/ref anomalies [Osterweil and Fosdick]
    • The Cecil/Cesar system demonstrated the ability to prove general user-specified properties [Olender and Osterweil]
    • FLAVERS demonstrated applicability to concurrent systems [Dwyer and Clarke]
• Why “anomalies” and not faults?
    • An anomaly may not correspond to an actual executable failure

Data flow analysis
• First, determine local information that is true at each node in the CFG
    • e.g., what variables are defined and what variables are referenced
    • Usually stored in sets, e.g., ref(n) is the set of variables referenced at node n
• Second, use this local information and control flow information to compute global information about the whole program
    • Done incrementally by looking at each node’s successors or predecessors
    • Use a fixed-point algorithm: continue to update the global information until a fixed point is reached
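The first step (local def/ref sets per node) can be sketched in a few lines. This is only an illustration, not part of the slides: it assumes each node holds one statement written in Python syntax, and the helper name `def_ref` is hypothetical. Names that are merely called (such as `foo`) are treated as functions, not variables, which is also an assumption.

```python
import ast

def def_ref(stmt):
    """Return (def, ref) sets of variable names for one Python-syntax statement."""
    tree = ast.parse(stmt)
    # Assumption: a name used as the callee of a call (foo in foo()) is not a variable.
    callees = {id(c.func) for c in ast.walk(tree) if isinstance(c, ast.Call)}
    defs, refs = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and id(node) not in callees:
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)      # variable is assigned at this node
            else:
                refs.add(node.id)      # variable is read at this node
    return defs, refs

# Local facts for four nodes, numbered from 1.
statements = ["x = foo()", "y = x + 2", "x > 0", "x = x + y"]
facts = {n: def_ref(s) for n, s in enumerate(statements, start=1)}
for n, (d, r) in facts.items():
    print(n, "def =", sorted(d), "ref =", sorted(r))
```

The resulting sets are the starting point that a fixed-point iteration would then propagate along the CFG edges.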

Reaching Definitions
• A definition reaches a node if there is a def-clear path from the definition to that node
• In the example graph, the definition of x at node 1 reaches nodes 2, 3, 4, and 5, but not 6
• Could be used to determine data dependencies; useful for debugging, data flow testing, etc.
• Start from a definition, and move forward on the graph to see how far it reaches
[Figure: control flow graph with nodes 1 to 6; x is defined at nodes 1 and 5 and referenced at node 4]

Computing global information: reaching definitions
• Start from a definition, and move forward on the graph to see how far it reaches
• Definitions that might reach a node
    [Figure: nodes 1, 2, 3; incoming sets reaching_def = {xi} and reaching_def = {yj} combine to reaching_def = {xi, yj}]
• Definitions that must reach a node
    [Figure: nodes 1, 2, 3; incoming sets reaching_def = {xi, yj} and reaching_def = {yj} combine to reaching_def = {yj}]

Reaching Definitions
• xi means that the definition of variable x at node i {might|must} reach the current node
• Also stated as {possible|definite}, {some|all}, {any|all}, or {may|must}

Computing values for a node
• Forward flow
• Keep track of the definitions into a node that have not been redefined
• Flow out of a node depends on flow in and on what happens in the node
[Figure: node n containing y := …x, with def(n) = {y} and ref(n) = {x…}; In(n) enters the node from above and Out(n) leaves it below]

Example: Possible Reaching Definitions
• Forward flow, any-path problem; the def/ref sets are the initial facts associated with each node
• The definitions are x1, y2, and x4 (each variable is subscripted with the node that defines it)

    int x, y;
    ...
    x := foo();
    y := x + 2;
    if x > 0 then
        x := x + y;
    end if;
    ...

Reaching definitions after each node:
• Node 1, x = foo(): def = {x}, ref = { }, after = {x1}
• Node 2, y = x + 2: def = {y}, ref = {x}, after = {x1, y2}
• Node 3, if (x > 0): def = { }, ref = {x}, after = {x1, y2}
• Node 4, x = x + y: def = {x}, ref = {x, y}, after = {x4, y2}
• Join (end if): def = { }, ref = { }, reaching = {x1, y2, x4}
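The example above can be reproduced with a small fixed-point loop. This is a minimal sketch rather than the slides' own algorithm: the node numbering (5 for the join after `end if`), the `preds` map, and the function name are assumptions, and a definition is represented as a `(variable, node)` pair.

```python
def possible_reaching_defs(nodes, preds, defs):
    """Any-path reaching definitions; a definition is a (variable, node) pair."""
    in_ = {n: set() for n in nodes}
    out = {n: set() for n in nodes}
    changed = True
    while changed:                       # iterate until nothing changes
        changed = False
        for n in nodes:
            # any-path problem: union over the predecessors
            in_[n] = set().union(*(out[k] for k in preds[n]))
            # incoming definitions of variables not redefined at n survive
            survivors = {(v, k) for (v, k) in in_[n] if v not in defs[n]}
            new_out = survivors | {(v, n) for v in defs[n]}
            if new_out != out[n]:
                out[n], changed = new_out, True
    return in_, out

nodes = [1, 2, 3, 4, 5]                  # 5 is the join after "end if"
preds = {1: [], 2: [1], 3: [2], 4: [3], 5: [3, 4]}
defs = {1: {"x"}, 2: {"y"}, 3: set(), 4: {"x"}, 5: set()}
in_, out = possible_reaching_defs(nodes, preds, defs)
print(sorted(in_[5]))                    # [('x', 1), ('x', 4), ('y', 2)]
```

Starting every set empty and only ever adding definitions is what makes union the natural choice for the any-path version.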

Example: Definite Reaching Definitions
• Forward flow, all-paths problem; the def/ref sets are the initial facts associated with each node
• The definitions are x1, y2, and x4

    int x, y;
    ...
    x := foo();
    y := x + 2;
    if x > 0 then
        x := x + y;
    end if;
    ...

Definite reaching definitions:
• Entry: { }
• After x = foo(): {x1}
• After y = x + 2: {x1, y2}
• After if (x > 0): {x1, y2}
• After x = x + y: {x4, y2}
• Join (end if): {y2}

What happens when we add a loop?

Example: Definite Reaching Definitions (with a loop)
• Forward flow, all-paths problem; the def/ref sets are the initial facts associated with each node
• What happens when we add a loop? Assuming a back edge from x = x + y to the test, the test is reached both from y = x + 2 and from x = x + y, so x1 no longer reaches it on all paths and is struck from the sets (marked X on the slide)

Definite reaching definitions with the back edge:
• Entry: { }
• After x = foo(): {x1}
• After y = x + 2: {x1, y2}
• Into if (x > 0): {x1, y2} ∩ {x4, y2} = {y2}
• After if (x > 0): {y2}
• After x = x + y: {x4, y2}

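Computing values for a node
• Forward flow
• Keep track of the definitions into a node that have not been redefined
• Flow out of a node depends on flow in and on what happens in the node
[Figure: node n containing y := …x; In(n) enters the node from above and Out(n) leaves it below]

For definite reaching defs:

$$\mathrm{In}(n) = \bigcap_{k \in \mathrm{pred}(n)} \mathrm{Out}(k)$$

$$\mathrm{Out}(n) = \bigl(\mathrm{In}(n) - \mathrm{def}(n)\bigr) \cup \mathrm{def}(n)_n$$

Here $\mathrm{In}(n) - \mathrm{def}(n)$ removes every incoming definition of a variable defined at $n$, and $\mathrm{def}(n)_n$ is the set of variables defined at $n$, each tagged with node $n$.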
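The two equations can be run directly as a fixed-point iteration. The sketch below is illustrative only: it assumes the entry node has In = { }, starts every Out set at the full set of definitions (a common choice for all-paths problems, not stated on the slide), and uses hypothetical names and node numbers. The loop variant assumes a back edge from node 4 to the test at node 3.

```python
def definite_reaching_defs(nodes, preds, defs):
    """All-paths reaching definitions; a definition is a (variable, node) pair."""
    universe = {(v, n) for n in nodes for v in defs[n]}
    in_ = {n: set() for n in nodes}
    out = {n: set(universe) for n in nodes}   # assumed optimistic start
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if preds[n]:                      # the entry node keeps In = {}
                in_[n] = set.intersection(*(out[k] for k in preds[n]))
            killed = {(v, k) for (v, k) in in_[n] if v in defs[n]}
            new_out = (in_[n] - killed) | {(v, n) for v in defs[n]}
            if new_out != out[n]:
                out[n], changed = new_out, True
    return in_, out

nodes = [1, 2, 3, 4, 5]
defs = {1: {"x"}, 2: {"y"}, 3: set(), 4: {"x"}, 5: set()}

# if/then: node 5 is the join after "end if"
branch_in, branch_out = definite_reaching_defs(
    nodes, {1: [], 2: [1], 3: [2], 4: [3], 5: [3, 4]}, defs)
# loop (assumed shape): node 4 jumps back to the test, node 5 is the loop exit
loop_in, loop_out = definite_reaching_defs(
    nodes, {1: [], 2: [1], 3: [2, 4], 4: [3], 5: [3]}, defs)

print(sorted(branch_in[5]))   # [('y', 2)]
print(sorted(loop_in[3]))     # [('y', 2)]
```

With the back edge, the first pass still sees {x1, y2} at the test; only on the second pass does the intersection with {x4, y2} remove x1, which is why the iteration has to continue until nothing changes.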

Live Variables
• A variable x is live at node p if there exists a def-clear path with respect to x from node p to a use of x
• In the example graph, x is live at nodes 2, 3, and 4, but not at node 5
• Used to determine what variables need to be kept in a register after executing a node
• Start from a use, and move backward on the graph until a def is encountered
[Figure: control flow graph with nodes 1 to 6; x is defined at nodes 1 and 6 and referenced at node 4]

Computing global information: live variables
• Possible live variables
    [Figure: nodes 1 to 4; node 1 has live = {x, y}, the union of the sets live = {x} and live = {y} that flow back to it]
• Definite live variables
    [Figure: nodes 1 to 4; node 1 has live = {x}, the intersection of the sets live = {x} and live = {x, y} that flow back to it]

Live Variables: Computing values for a node
• Backward flow
• Keep track of the (forward) references that have not found their (backward) definition site
• Flow out of a node depends on flow in and on what happens in the node
[Figure: node n containing y := …x; for backward flow, In(n) is below the node and Out(n) is above it]

Example: Possible Live Variables
• Backward flow, any-path problem; the def/ref sets are the initial facts associated with each node

    int x, y;
    ...
    x := foo();
    y := x + 2;
    if x > 0 then
        x := x + y;
    end if;
    ...

Possible live variables:
• x = foo(): live before = { }, live after = {x}
• y = x + 2: live before = {x}, live after = {x, y}
• if (x > 0): live before = {x, y}, live after = {x, y}
• x = x + y: live before = {x, y}, live after = { }
• Join (end if): { }

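Example: Definite Live Variables
• Backward flow, all-paths problem
• def/ref sets are the initial facts associated with each node

$$\mathrm{In}(n) = \bigcap_{k \in \mathrm{succ}(n)} \mathrm{Out}(k)$$

$$\mathrm{Out}(n) = \bigl(\mathrm{In}(n) - \mathrm{def}(n)\bigr) \cup \mathrm{ref}(n)$$

    int x, y;
    ...
    x := foo();
    y := x + 2;
    if x > 0 then
        x := x + y;
    end if;
    ...

Definite live variables:
• x = foo(): live before = { }, live after = {x}
• y = x + 2: live before = {x}, live after = {x}
• if (x > 0): live before = {x}, live after = { }
• x = x + y: live before = {x, y}, live after = { }
• Join (end if): { }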
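Both live-variable examples follow from the same backward iteration; only the merge operator changes. The sketch below is an illustration under assumptions: it follows the slides' backward convention (In(n) is merged from the successors, Out(n) is the set above the node), starts the all-paths version from the set of all variables, and uses hypothetical names and node numbers.

```python
def live_variables(nodes, succs, defs, refs, all_paths=False):
    """Backward flow: In(n) comes from the successors, Out(n) is live before n."""
    universe = set().union(*defs.values(), *refs.values())
    in_ = {n: set() for n in nodes}
    out = {n: set(universe) if all_paths else set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in reversed(nodes):             # visit successors first
            if succs[n]:                      # the exit node keeps In = {}
                sets = [out[k] for k in succs[n]]
                in_[n] = set.intersection(*sets) if all_paths else set.union(*sets)
            new_out = (in_[n] - defs[n]) | refs[n]
            if new_out != out[n]:
                out[n], changed = new_out, True
    return in_, out

nodes = [1, 2, 3, 4, 5]                       # 5 is the join after "end if"
succs = {1: [2], 2: [3], 3: [4, 5], 4: [5], 5: []}
defs = {1: {"x"}, 2: {"y"}, 3: set(), 4: {"x"}, 5: set()}
refs = {1: set(), 2: {"x"}, 3: {"x"}, 4: {"x", "y"}, 5: set()}

may_in, may_out = live_variables(nodes, succs, defs, refs)
must_in, must_out = live_variables(nodes, succs, defs, refs, all_paths=True)
print(sorted(may_out[3]), sorted(must_out[3]))   # ['x', 'y'] ['x']
```

The only difference between the two runs is at the branch: y is used on the then-path but not on the fall-through path, so it is possibly live but not definitely live at the test.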

Data Flow Analysis decisions
• Backward/Forward
• Any/All (union or intersection)
• Facts
• Equations
• Initial values
[Figure: example control flow graph with nodes 1 to 5]

Constant Propagation
• Some variables at a point in a program may only take on one value
• If we know this, the code can be optimized when it is compiled

Constant Propagation

    int x, y;
    ...
    x := 3;
    y := x + 2;
    if x > z then
        x := x + y;
    end if;
    ...

[Figure: control flow graph with nodes x = 3, y = x + 2, if (x > z), and x = x + y]

Constant Propagation
• Each fact is a pair (x, y); U = unknown, N = not a constant
• Forward flow, all-paths problem
• Facts are the computations and assignments made at each node

    int x, y;
    ...
    x := 3;
    y := x + 2;
    if x > z then
        x := x + y;
    end if;
    ...

Values of (x, y):
• Entry: (U, U)
• After x = 3: (3, U)
• After y = x + 2: (3, 5)
• After if (x > z): (3, 5)
• After x = x + y: (8, 5)
• Join (end if), merging (3, 5) and (8, 5): (N, 5)

Constant Propagation with a loop
• Each fact is a pair (x, y); U = unknown, N = not a constant
• Forward flow, all-paths problem
• Facts are the computations and assignments made at each node

Values of (x, y), first pass and then after the back edge has been taken into account:
• Entry: (U, U)
• After x = 3: (3, U)
• After y = x + 2: (3, 5)
• Into if (x > z): (3, 5), then (N, 5)
• After if (x > z): (3, 5), then (N, 5)
• After x = x + y: (8, 5), then (N, 5)
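The sequence of values in the loop example can be reproduced with a three-level value domain and a merge operator. This is a sketch under assumptions, not the slides' implementation: `meet`, `add`, `propagate`, the node numbers, and the CFG shape (a back edge from x = x + y to the test) are all illustrative choices.

```python
U, N = "U", "N"                 # U = unknown, N = not a constant

def meet(a, b):
    """Merge two values arriving on different paths."""
    if a == U:
        return b
    if b == U:
        return a
    return a if a == b else N   # two different constants (or any N) give N

def add(a, b):
    """Abstract addition on U / constant / N."""
    if N in (a, b):
        return N
    if U in (a, b):
        return U
    return a + b

transfer = {                    # one transfer function per node
    1: lambda s: {**s, "x": 3},
    2: lambda s: {**s, "y": add(s["x"], 2)},
    3: lambda s: dict(s),                         # the test changes nothing
    4: lambda s: {**s, "x": add(s["x"], s["y"])},
}
preds = {1: [], 2: [1], 3: [2, 4], 4: [3]}        # 4 -> 3 is the assumed back edge

def propagate(transfer, preds):
    start = {"x": U, "y": U}
    out = {n: dict(start) for n in transfer}
    history = {n: [] for n in transfer}           # values seen after each node
    changed = True
    while changed:
        changed = False
        for n in transfer:
            state = dict(start)
            for k in preds[n]:
                state = {v: meet(state[v], out[k][v]) for v in state}
            new_out = transfer[n](state)
            if new_out != out[n]:
                out[n], changed = new_out, True
                history[n].append((new_out["x"], new_out["y"]))
    return out, history

out, history = propagate(transfer, preds)
print(history[3], history[4])   # [(3, 5), ('N', 5)] [(8, 5), ('N', 5)]
```

Each value can only move from U to a constant to N, never back, which is why the iteration settles after the second pass.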

Fixed point
• The data flow analysis algorithm will eventually terminate if
    • there are only a finite number of possible sets that can be associated with a node, and
    • the function that determines the sets that can be associated with a node is monotonic
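Why the two conditions suffice can be written as a short counting argument. The notation is an assumption introduced here, not taken from the slides: $D$ is the finite set of all facts, $V$ is the set of CFG nodes, and $S_i(n)$ is the set attached to node $n$ after its $i$-th update in an any-path problem that starts from empty sets.

```latex
% Monotonic updates can only grow a node's set, and the set is bounded by D:
S_0(n) \subseteq S_1(n) \subseteq S_2(n) \subseteq \dots \subseteq D
% so each node's set can strictly change at most |D| times:
\#\{\, i : S_i(n) \neq S_{i+1}(n) \,\} \le |D|
% and the whole iteration makes at most this many strict changes before it stops:
\text{total changes} \le |V| \cdot |D|
```

For an all-paths problem the same argument applies with the chain running downward from $D$ toward the empty set.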

DAVE: detects anomalous def/ref behavior
• L. D. Fosdick and Leon J. Osterweil, "Data Flow Analysis in Software Reliability", ACM Computing Surveys, 8(3), September 1976, pp. 306-330.
• Application-independent specification of erroneous behavior
    • use of an uninitialized variable
    • redefinition of a variable that is not referenced

Anomalous pairs of ref/def information
• d = defined, r = referenced, u = undefined
• d..r: defined variable reaches a reference
• u..d: undefined variable reaches a definition
• d..d: definition is redefined before being used (unreferenced definition)
• d..u: definition is undefined before being used (unreferenced definition)
• u..r: undefined variable reaches a reference (undefined reference)
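On a single path, the pairs above amount to checking consecutive events for one variable. The sketch below is only an illustration of that check, not DAVE's algorithm (which works over all paths of the flow graph); the function name and the assumption that a variable starts out undefined are hypothetical.

```python
ANOMALOUS = {
    ("d", "d"): "unreferenced definition",
    ("d", "u"): "unreferenced definition",
    ("u", "r"): "undefined reference",
}

def anomalies(events, initial="u"):
    """Scan one variable's d/r/u events along one path and report anomalous pairs.

    Assumption: the variable is undefined ("u") before the first event.
    """
    found = []
    prev = initial
    for position, event in enumerate(events):
        if (prev, event) in ANOMALOUS:
            found.append((position, f"{prev}..{event}", ANOMALOUS[(prev, event)]))
        prev = event
    return found

print(anomalies(["d", "r", "r", "d"]))   # []
print(anomalies(["d", "d"]))             # [(1, 'd..d', 'unreferenced definition')]
print(anomalies(["r"]))                  # [(0, 'u..r', 'undefined reference')]
```

A data flow formulation replaces the explicit path by sets propagated over the CFG, so that the same pairs are found without enumerating paths.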

Consider an unreferenced definition
• For a definition of a, want to know if that definition is not going to be referenced
• Is there some path where a is redefined or undefined before being used?
    • May be indicative of a problem
    • Usually just a programming convenience and not a problem
• On all paths, is a redefined or undefined before being used?
    • May be indicative of a problem
    • Or could just be wasteful

Some versus All
• For some path, a is redefined without a subsequent reference
• For all paths, a is redefined without a subsequent reference
[Figure: two control flow graphs built from def a and ref a nodes, one for each case]

Unreferenced definition: Computing values for a node
• Forward flow, all-paths problem
• Keep track of the definitions into a node that have not been referenced
• Flow out of a node depends on flow in and on what happens in the node
[Figure: node n containing y := …x; In(n) enters the node from above and Out(n) leaves it below]

$$\mathrm{In}(n) = \bigcap_{k \in \mathrm{pred}(n)} \mathrm{Out}(k)$$

$$\mathrm{Out}(n) = \bigl(\mathrm{In}(n) - \mathrm{ref}(n)\bigr) \cup \mathrm{def}(n)$$

Unreferenced definitions
• Forward flow, all-paths problem
• Need to look at each node where there is a def

    int x, y;
    ...
    x := foo();
    y := x + 2;
    if x > 0 then
        x := x + 1;
    end if;
    y := …

Unreferenced defs:
• Entry: { }
• After x = foo(): {x}
• After y = x + 2: {y}
• After if (x > 0): {y}
• After x = x + 1: {x, y}
• Into y := … (merging {y} and {x, y}): {y}

Continuing with the unreferenced def example
• A definition is redefined without being used if $\mathrm{def}(n) \cap (\mathrm{In}(n) - \mathrm{ref}(n)) \neq \emptyset$
• Must compute this for each node

def/ref sets and the unreferenced defs flowing into and out of each node:
• x = foo(): def = {x}, ref = { }, in = { }, out = {x}
• y = x + 2: def = {y}, ref = {x}, in = {x}, out = {y}
• if (x > 0): def = { }, ref = {x}, in = {y}, out = {y}
• x = x + 1: def = {x}, ref = {x}, in = {y}, out = {x, y}
• y := …: def = {y}, ref = { }, in = {y}

• For this example, the last node would report a redefinition of y on all paths
• Finds the location where the redef occurs
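The per-node check can be run mechanically once the In sets are known. The sketch below is illustrative and rests on assumptions: the report condition is read as "some variable defined at n is in In(n) − ref(n)", the Out sets start from the set of all defined variables, the last statement is given ref = { } as on the slide, and the names and node numbers are hypothetical.

```python
def unreferenced_defs(nodes, preds, defs, refs):
    """Forward, all-paths: variables defined but not yet referenced on every path."""
    universe = set().union(*defs.values())
    in_ = {n: set() for n in nodes}
    out = {n: set(universe) for n in nodes}     # assumed optimistic start
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if preds[n]:                        # the entry node keeps In = {}
                in_[n] = set.intersection(*(out[k] for k in preds[n]))
            new_out = (in_[n] - refs[n]) | defs[n]
            if new_out != out[n]:
                out[n], changed = new_out, True
    # Report nodes that redefine a variable whose earlier definition is still unused.
    reports = {n: defs[n] & (in_[n] - refs[n]) for n in nodes}
    return in_, out, {n: vs for n, vs in reports.items() if vs}

nodes = [1, 2, 3, 4, 5]                         # 5 is the final "y := ..."
preds = {1: [], 2: [1], 3: [2], 4: [3], 5: [3, 4]}
defs = {1: {"x"}, 2: {"y"}, 3: set(), 4: {"x"}, 5: {"y"}}
refs = {1: set(), 2: {"x"}, 3: {"x"}, 4: {"x"}, 5: set()}

in_, out, reports = unreferenced_defs(nodes, preds, defs, refs)
print(reports)                                  # {5: {'y'}}
```

Because this is a forward analysis, the report lands on the second definition of the pair; locating the first definition needs the backward formulation.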

Unreferenced definitions (finds the location of the first def of the pair)
• Backward flow, all-paths problem

    int x, y;
    ...
    x := foo();
    y := x + 2;
    if x > 0 then
        x := x + 1;
    end if;
    y := …

Unreferenced defs (set above each node):
• x = foo(): {x, y}
• y = x + 2: {y}; the set below this node is also {y}, so the double def of y is reported here
• if (x > 0): {y}
• x = x + 1: {y}
• y := …: {y}

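In values depend on direction
• Forward flow: In(n) is above the node (merged from its predecessors) and Out(n) is below it
• Backward flow: In(n) is below the node (merged from its successors) and Out(n) is above it
[Figure: node y := … drawn once for each direction, with In(n) and Out(n) labeled]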

Data Flow Analysis Problem
• Need to determine the information that should be computed at a node
• Need to determine how that information should flow from node to node
    • Backward or Forward
    • Union or Intersection
• Often there is more than one way to solve a problem
    • Can often be solved forward or backward, but usually one way is easier than the other