Dynamically Discovering Likely Program Invariants to Support Program Evolution
Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin
Presented By: Wes Toland, Geoff Gerfin

Outline
  • Introduction
  • Overview
  • Code Instrumentation
  • Data Trace Generation
  • Inferring Invariants
  • Uses of Invariants
  • Evaluation
  • Related Work
  • Limitations & Future Work
  • Discussion

Invariants
  • What are invariants?
    • A constraint over a variable’s values
    • A relationship between multiple variable values
    • Defined as mathematical predicates (Example: n >= 0)

Importance of Invariants
  • In program development:
    • Refining a specification
    • Aid in runtime checking
  • In software evolution:
    • Aid the programmer in understanding the functionality of an undocumented program so that incorrect assumptions are not made.
    • Violating an invariant results in a bug.

Daikon
  • Programmers do not usually explicitly annotate or document code with invariants.
  • Daikon proposes to automatically determine program invariants and report them in a meaningful manner.

Daikon’s Infrastructure

Daikon’s Infrastructure: Original Program
    i, s := 0, 0;
    do i != n ->
        i, s := i + 1, s + b[i]
    od

Daikon’s Infrastructure: Instrumented Program
    print b, n;
    i, s := 0, 0;
    do i != n ->
        print i, s, n, b[i];
        i, s := i + 1, s + b[i]
    od
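As a rough Python rendering of the instrumented loop above, the sketch below records the same variables at the same two points. The function name `instrumented_sum` and the list-of-dicts trace are assumptions made for illustration; they are not Daikon's front end or its trace file format.

```python
def instrumented_sum(b, n):
    """Sum the first n elements of b, recording variable values as a trace."""
    trace = []
    # Corresponds to "print b, n" at program entry.
    trace.append({"point": "entry", "b": list(b), "n": n})
    i, s = 0, 0
    while i != n:
        # Corresponds to "print i, s, n, b[i]" at the loop head.
        trace.append({"point": "loop", "i": i, "s": s, "n": n, "b[i]": b[i]})
        # Simultaneous assignment: the right-hand side uses the old value of i.
        i, s = i + 1, s + b[i]
    return s, trace


total, trace = instrumented_sum([3, 1, 4], 3)
for sample in trace:
    print(sample)
```

Each dictionary plays the role of one record in the trace; an inference step would read these records rather than the program itself.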

Daikon’s Infrastructure: Trace File
    print b, n;
    i, s := 0, 0;
    do i != n ->
        print i, s, n, b[i];
        i, s := i + 1, s + b[i]
    od

Daikon’s Infrastructure: Invariants Determined
Invariants:
  1. n >= 0
  2. s = SUM(B)
  3. i >= 0
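A minimal sketch of how candidate invariants like these could be checked against recorded samples. The sample data, the extra candidate `s >= 0`, and the helper `holds_for_all` are hypothetical; the sketch only shows the idea that a candidate is reported when no sample falsifies it.

```python
def holds_for_all(samples, predicate):
    """Return True if the candidate invariant holds on every recorded sample."""
    return all(predicate(sample) for sample in samples)


# Hypothetical samples recorded at program exit for three runs of the loop.
exit_samples = [
    {"b": [3, 1, 4], "n": 3, "i": 3, "s": 8},
    {"b": [], "n": 0, "i": 0, "s": 0},
    {"b": [-2, -5], "n": 2, "i": 2, "s": -7},
]

# Candidate invariants expressed as predicates over one sample.
candidates = {
    "n >= 0": lambda v: v["n"] >= 0,
    "s = SUM(B)": lambda v: v["s"] == sum(v["b"]),
    "i >= 0": lambda v: v["i"] >= 0,
    "s >= 0": lambda v: v["s"] >= 0,  # falsified by the third sample
}

reported = [
    name for name, pred in candidates.items()
    if holds_for_all(exit_samples, pred)
]
print(reported)
```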

Code Instrumentation (1/6)

Code Instrumentation (2/6)
  • Daikon’s front-end modifies source code to trace specific variables at points of interest:
    • Function entry points (pre-conditions)
    • Function exit points (post-conditions)
    • Loop heads (loop invariants)
  • The trace data is used as input to Daikon’s back-end, which is used to infer invariants
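To illustrate recording values at function entry and exit, here is a small Python sketch. It wraps a function with a decorator instead of rewriting source code as Daikon's front-end does, so it is only a stand-in; `traced`, `TRACE`, and `absolute` are hypothetical names.

```python
import functools

TRACE = []  # in-memory stand-in for the trace file


def traced(func):
    """Record argument values at function entry and the result at exit."""
    @functools.wraps(func)
    def wrapper(*args):
        TRACE.append((func.__name__, "ENTER", args))
        result = func(*args)
        TRACE.append((func.__name__, "EXIT", args, result))
        return result
    return wrapper


@traced
def absolute(x):
    return -x if x < 0 else x


absolute(-5)
absolute(2)
print(TRACE)
```

The ENTER records are the raw material for pre-conditions and the EXIT records for post-conditions; loop heads would need instrumentation inside the function body, which a wrapper cannot provide.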

Code Instrumentation (3/6)
  • Daikon uses an abstract syntax tree for code instrumentation.
  • What is an AST?

Code Instrumentation (4/6)
How could this be useful for code instrumentation?

Code Instrumentation (5/6)
  • The AST is used by Daikon to determine which variables are in scope at each point of interest.
  • Code is inserted at each program point to write the values of all variables in scope to a file in a specific format.
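The idea of using an AST to find the variables in scope can be sketched with Python's standard `ast` module. This is an analogy only: it analyzes Python source rather than the languages Daikon's front-end handles, and `variables_in_scope` and the sample function `total` are hypothetical.

```python
import ast

SOURCE = """
def total(b, n):
    i = 0
    s = 0
    while i != n:
        s = s + b[i]
        i = i + 1
    return s
"""


def variables_in_scope(func_node):
    """Return parameter names followed by the locally assigned names."""
    params = [arg.arg for arg in func_node.args.args]
    assigned = {
        node.id
        for node in ast.walk(func_node)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }
    # Sort the locals so the result does not depend on traversal order.
    return params + sorted(assigned - set(params))


tree = ast.parse(SOURCE)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print(node.name, variables_in_scope(node))
```

An instrumenter would use a list like this to decide which values to write out at the function's entry, exit, and loop head.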

Code Instrumentation (6/6)
  • Status variables are created for each original program variable and are passed along throughout function calls.
  • Status variables:
    • Modification timestamp (used to prevent garbage output)
    • Smallest and largest indices (for arrays and pointers)
    • Linked list flag
  • Status variables are updated when the program manipulates the associated variable.
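One way to picture status variables is as a small record that shadows each program variable and is updated on every write. The sketch below is an assumption-laden illustration: the `Status` and `TrackedArray` classes, the global counter, and the rule "timestamp 0 means never assigned" are all hypothetical, not Daikon's implementation.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

_clock = count(1)  # hypothetical global modification counter


@dataclass
class Status:
    """Hypothetical status record shadowing one program variable."""
    modified_at: int = 0              # 0 means "never assigned"
    min_index: Optional[int] = None   # smallest index written so far
    max_index: Optional[int] = None   # largest index written so far
    is_linked_list: bool = False


class TrackedArray:
    """Array whose writes keep its status record up to date."""

    def __init__(self, size):
        self.data = [0] * size
        self.status = Status()

    def store(self, index, value):
        self.data[index] = value
        st = self.status
        st.modified_at = next(_clock)
        st.min_index = index if st.min_index is None else min(st.min_index, index)
        st.max_index = index if st.max_index is None else max(st.max_index, index)

    def traced_values(self):
        """Return the slice between the index bounds, or None if never written."""
        st = self.status
        if st.modified_at == 0:
            return None  # nothing meaningful to print yet
        return self.data[st.min_index:st.max_index + 1]


buf = TrackedArray(8)
print(buf.traced_values())
buf.store(2, 7)
buf.store(4, 9)
print(buf.traced_values())
```

The timestamp lets the tracer skip a variable that has never been assigned, and the index bounds limit output to the part of the array that has actually been used.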

Data Trace Generation (1/2)

Data Trace Generation (2/2)
Instrumented Code
    print b, n;
    i, s := 0, 0;
    do i != n ->
        print i, s, n, b[i];
        i, s := i + 1, s + b[i]
    od
Data Trace DB

Inferring Invariants

Types of Invariants (1/3)
Single Variables
  • Constant value: x = a
  • Uninitialized value: x = uninit
  • Small value set: x ∈ {a, b, c}
Single Numeric Variables
  • Range limits: x >= a, x <= b, etc.
  • Non-zero: x != 0
  • Modulus: x = a (mod b)
  • Non-modulus: x != a (mod b)
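The single-variable invariants above can be proposed directly from the set of observed values. The following is a simplified sketch: `infer_unary` and its `max_set_size` parameter are hypothetical, it covers only integer values, and it reports every invariant that no sample contradicts without any further filtering.

```python
from functools import reduce
from math import gcd


def infer_unary(values, max_set_size=3):
    """Propose invariants over one integer variable from its observed values."""
    distinct = sorted(set(values))
    if len(distinct) == 1:
        return [f"x = {distinct[0]}"]  # constant value
    invariants = []
    if len(distinct) <= max_set_size:
        invariants.append("x in {" + ", ".join(map(str, distinct)) + "}")
    # Range limits come from the smallest and largest observed values.
    invariants.append(f"x >= {distinct[0]}")
    invariants.append(f"x <= {distinct[-1]}")
    if 0 not in distinct:
        invariants.append("x != 0")
    # Modulus: all values are congruent mod b exactly when b divides every
    # difference from the first value, so take the gcd of those differences.
    b = reduce(gcd, (v - distinct[0] for v in distinct), 0)
    if b >= 2:
        invariants.append(f"x = {distinct[0] % b} (mod {b})")
    return invariants


print(infer_unary([4, 10, 7, 4, 13]))
print(infer_unary([5, 5, 5]))
```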

Types of Invariants (2/3)
Two Numeric Variables
  • Linear relationship: y = ax + b
  • Functional relationship: y = f(x)
  • Comparison: x > y, x = y, etc.
  • Combinations of single numeric values: x + y = a (mod b)
Three Numeric Variables
  • Linear relationship: z = ax + by + c
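For the two-variable linear relationship y = ax + b, one simple approach is to fix a and b from two samples and then check the remaining samples. This is a sketch under that assumption; `infer_linear` is a hypothetical helper, and exact rational arithmetic is used to avoid floating-point comparisons.

```python
from fractions import Fraction


def infer_linear(pairs):
    """Return (a, b) with y = a*x + b on every sample, or None."""
    x0, y0 = pairs[0]
    # Find a second sample with a different x to fix the slope.
    other = next(((x, y) for x, y in pairs if x != x0), None)
    if other is None:
        return None  # x is constant, so no line can be determined
    x1, y1 = other
    a = Fraction(y1 - y0, x1 - x0)
    b = y0 - a * x0
    # Keep the candidate only if every sample lies on the line.
    if all(y == a * x + b for x, y in pairs):
        return a, b
    return None


print(infer_linear([(0, 1), (1, 3), (2, 5), (5, 11)]))
print(infer_linear([(0, 1), (1, 3), (2, 6)]))
```

The first call finds y = 2x + 1; the second returns None because the third sample falsifies the candidate.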

Types of Invariants (3/3)
  • Single-sequence variables:
    • Range (min and max values)
    • Ordering (increasing or decreasing)
    • Invariants over all elements (given array[size], all elements >= c)
  • Two-sequence variables:
    • Linear relationship, elementwise (for arrays x[100] and y[100], y[i] = a*x[i] + b)
    • Comparison (x < y, for example when x[i] = y[i] - 1 for every i)
    • Reversal:
        for (i = 0; i < length(y); i++)
            x[i] = y[length(y) - 1 - i]
  • Sequence and numeric variables:
    • Membership (i ∈ s)
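Several of the sequence invariants above reduce to short predicates that can be evaluated on each sample. The helper names and the sample data below are hypothetical and serve only to make the definitions concrete.

```python
def is_nondecreasing(seq):
    """Ordering invariant: each element is <= its successor."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


def all_at_least(seq, c):
    """Invariant over all elements: every element is >= c."""
    return all(e >= c for e in seq)


def is_reversal(x, y):
    """Two-sequence invariant: x[i] == y[len(y) - 1 - i] for every i."""
    return len(x) == len(y) and all(
        x[i] == y[len(y) - 1 - i] for i in range(len(y))
    )


def is_member(i, s):
    """Sequence-and-scalar invariant: i is an element of s."""
    return i in s


# Hypothetical samples: each tuple is one observation of (x, y, i).
samples = [
    ([1, 2, 3], [3, 2, 1], 2),
    ([0, 5], [5, 0], 5),
]
print(all(is_nondecreasing(x) for x, _, _ in samples))
print(all(is_reversal(x, y) for x, y, _ in samples))
print(all(is_member(i, x) for x, _, i in samples))
```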

Inferring Invariants (1/5)
What invariants should be inferred from this method, regardless of the test suite input?

Inferring Invariants (2/5)

Inferring Invariants (3/5)
  • Daikon can identify from this trace that for all samples, x = orig(x)

Inferring Invariants (4/5)
  • Daikon can identify from this trace that for all samples, y = orig(y) = 1.

Inferring Invariants (5/5)
  • Daikon can identify from this trace that for all samples, *x = orig(*x) + 1.
  • Is this invariant too limited?
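The method and trace shown on these slides are not reproduced in the text, so the sketch below assumes a method along the lines of `inc(x, y)` that adds y to the cell x points to, exercised by a test suite that only ever passes y = 1. Under that assumption it shows why `*x = orig(*x) + 1` is reported and why it may be too narrow. All names and test inputs are hypothetical.

```python
def inc(x, y):
    """Hypothetical method under test: add y to the cell x refers to."""
    x[0] += y


samples = []
for start in [0, 7, -3, 41]:
    cell = [start]   # a one-element list stands in for a C pointer target
    y = 1            # this test suite only ever passes y = 1
    orig_deref, orig_y = cell[0], y   # pre-state values, i.e. orig(...)
    inc(cell, y)
    samples.append(
        {"orig(*x)": orig_deref, "*x": cell[0], "orig(y)": orig_y, "y": y}
    )

# The narrow invariant holds on every sample...
narrow = all(s["*x"] == s["orig(*x)"] + 1 for s in samples)
# ...but so does the more general one, which would survive other values of y.
general = all(s["*x"] == s["orig(*x)"] + s["orig(y)"] for s in samples)
print(narrow, general)
```

Because y is constant across the test suite, the trace alone cannot distinguish the two candidates; a test case with a different y would falsify the narrow one.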

Uses of Invariants (1/2)
  • Explicated data structures
    • Clearly define undocumented data structures without looking through code.
  • Confirmed and contradicted expectations
    • Assert an understanding of code functionality.
    • Example: It may appear that x is always less than y, which Daikon can verify for the programmer (assuming a valid test suite).
  • Bug discovery

Uses of Invariants (2/2)
  • Identify limited use of procedures
    • Identify procedures that have unnecessary functionality based on the input.
  • Demonstrate test suite inadequacy
    • Reveal shortcomings in exercising all branches within a program by analyzing Daikon’s output.
  • Validate program changes
    • After a piece of code has been heavily modified but should still abide by the original specifications, it is a good idea to compare the invariants.
    • If they match, the programmer can be more confident that the modifications did not have adverse effects.
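Comparing invariants before and after a change can be as simple as a set difference over the two reports. The sketch below assumes the invariants are available as plain strings; `compare_invariants` and both reports are hypothetical.

```python
def compare_invariants(before, after):
    """Return invariants lost and gained between two program versions."""
    before, after = set(before), set(after)
    return sorted(before - after), sorted(after - before)


# Hypothetical invariant reports for one program point, before and after an edit.
original = ["n >= 0", "i >= 0", "s = SUM(B)"]
modified = ["n >= 0", "i >= 0", "s >= 0"]

lost, gained = compare_invariants(original, modified)
print("lost:", lost)
print("gained:", gained)
```

A non-empty "lost" list points at behavior that the modification may have changed and that is worth reviewing.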

Evaluation Overview
  • Asserting Daikon’s Invariant Detection
  • Performance Evaluation
  • Stability Evaluation

Asserting Daikon’s Invariant Detection
  • Simple accuracy evaluation of Daikon
  • A sample program was taken from The Science of Programming
    • The “gold standard” of invariant identification
    • The program had documented precondition, postcondition, and loop invariant specifications
  • Daikon reproduced all documented specifications plus some additional invariants:
    • Erroneously omitted (omitted in documentation)
    • Information about the test suite
    • Extraneous (redundant invariants)

Performance Evaluation
  • The Siemens replace program is run with varying numbers of test cases and variables.
  • Most important factor: the number of variables over which invariants are checked
    • This is not the total number of program variables; rather, it is the number of variables in scope at a program point.
    • Invariant detection time grows quadratically with this factor.
  • Additionally, invariant detection time grows linearly with test suite size.
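A plausible intuition for the quadratic growth, offered here as an assumption rather than as the authors' analysis, is that two-variable invariants are checked over every pair of variables in scope, and the number of pairs is n(n-1)/2. The helper `candidate_pairs` is hypothetical.

```python
from itertools import combinations


def candidate_pairs(variables):
    """Every unordered pair of variables that a binary invariant could relate."""
    return list(combinations(variables, 2))


for n in (5, 10, 20):
    pairs = candidate_pairs(range(n))
    print(n, len(pairs))
```

Doubling the number of variables in scope roughly quadruples the number of pairs, while each additional test case adds only one more sample to check against each candidate.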

Performance Evaluation

Stability Evaluation
  • The number of test cases affects different types of invariants in different ways:
    • Note that the number of identical unary invariants does not vary much as the number of test cases is increased.
    • However, the number of differing unary invariants varies considerably.

Related Work (1/2)
  • Static Approaches to Inferring Invariants
    • Operate on program text, not test runs (symbolic execution) [Hoare 69].
    • Advantages
      • Reported invariants are true for any program run (but not necessarily exhaustive).
      • Theoretically, static approaches can detect all sound invariants if the analysis is run to convergence.
    • Limitations
      • Omit properties that are true but uncomputable.
      • Pointer manipulation is impossible to approximate.

Related Work (2/2)
  • Dynamic Approaches to Inferring Invariants
    • Event traces [Blum 93].
      • Uses a state machine instead of an AST.
      • Advantage: lower data storage requirements.
    • Runtime switches based on user-inserted assert statements (Value Profiling) [Hanson 93].

Limitations (1/2)
  • Accuracy of inferred invariants depends on the quality and completeness of the test cases
    • Additional test cases could provide data that leads to additional invariants being inferred.
    • Additionally, invariants may hold only for the cases in the test suite.
  • Daikon produces gigabytes of trace data, even while analyzing trivial programs.
    • The initial prototype implementation ran out of memory when testing 5,542 test cases.

Limitations (2/2)
  • The instrumenter, and therefore Daikon, is currently limited to C, Java, and Lisp.
  • Daikon does not yet follow arbitrary-length paths through recursive structures.
  • Daikon cannot compute invariants such as linear relationships over numerous variables (more than 3).
  • Instrumenting the program by modifying object code (as opposed to source code) would allow for improved precision and portability.
    • Exact memory locations could be traced.
    • This approach has many more obstacles.

Future Work (1/2)
  • Ernst et al. planned on increasing relevance and performance after this work by:
    • Reducing redundant invariants.
    • Removing relations between variables that can be statically proven to be unrelated.
    • Ignoring variables that have not been assigned since their last instrumentation.
    • Converting the implementation of Daikon from Python to C.
    • Checking fewer invariants (useful when the programmer wants to focus on a specific part of the code).

Future Work (2/2)
  • Since paper publication:
    • Additional front-end support:
      • 2002: Perl (dfepl front-end implementation)
      • 2005: C++ (Kvasir front-end implementation)
    • 2003: Various performance improvements:
      • Handle data trace files incrementally
      • Original implementation stored entire trace file in memory
    • 2005: IDE Plug-in support for Visual Studio
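The difference between loading a whole trace and handling it incrementally can be sketched as follows. The `name=value` line format and the `running_range` helper are assumptions for illustration and do not reflect Daikon's actual trace format; the point is that only a small running summary is kept in memory.

```python
import io


def running_range(lines):
    """Update min/max per variable one sample at a time, without storing the trace."""
    ranges = {}
    for line in lines:
        name, value = line.split("=")
        name, value = name.strip(), int(value)
        lo, hi = ranges.get(name, (value, value))
        ranges[name] = (min(lo, value), max(hi, value))
    return ranges


# A file object yields one line at a time, so memory use does not grow with
# the length of the trace.
trace_file = io.StringIO("i=0\ns=0\ni=1\ns=3\ni=2\ns=4\n")
print(running_range(trace_file))
```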

Discussion
  • Questions???