Politecnico di Milano Delta Debugging An advanced debugging technique Authors: Carlo Curino, Alessandro Giusti AAIS 05 Curino, Giusti Delta Debugging
Motivations
• Reducing faults:
  • 50%-80% of total cost
• Debugging:
  • One of the hardest, yet least systematic activities of software engineering
  • most time-consuming
• Locating faults:
  • most difficult

Overview
• Which problems are solved by Delta Debugging
• Four solutions: a common approach
  1. Simplifying failure-inducing input
  2. Isolating failure-inducing thread schedules
  3. Identifying failure-inducing changes in the code
  4. Isolating cause-effect chains

Failure-inducing input
• This HTML input makes Mozilla crash (segmentation fault). Which portion is the failure-inducing one?

Thread scheduling
• The result of a multithreaded program appears to be nondeterministic. Why does this happen?

Code changes
• The old version of GDB works with DDD, the new one doesn’t!
• 178,000 lines of code have been modified between the two versions: where is the bug?

Cause-effect chain
• Which part of the program state is involved in the failure?

Four solutions: a single approach
• The underlying problem is:
  • Find which part of something determines the failure
So a common strategy can be applied:
• Divide et impera applied to deltas between:
  • Working and failing inputs
  • Working and failing code versions
  • Working and failing thread schedules
  • Working and failing program states
This allows:
• An efficient and automatic debugging procedure

Common terminology
• A test case can either:
  • Fail (the failure shows up)
  • Pass (the program runs properly)
  • Be unresolved (different problems arise)
• Delta Debugging algorithms iteratively:
  • Apply changes (to input, code, schedule or state)
  • Run tests

Common terminology (2)
• Concept of difference:
  • A very general delta between something in two test cases
• Examples:
  • Difference in the input: a different character (or bit) in the input stream
  • Difference in the thread schedule: a difference in the time at which a given thread switch is performed
  • Difference in the code: a different statement in two versions of a program
  • Difference in the program state: different values of the internal variables of a program

Simplifying Failure-inducing input

Minimizing vs Isolating
• Minimizing (ddmin algorithm):
  • Slower
  • More human friendly
• Isolating (dd algorithm):
  • Generalization of the ddmin algorithm
  • Faster
  • Good for generating the input of the cause-effect chain DD

Minimizing: Mozilla bug
• Minimizing:
  • 57 tests to simplify the 896-line HTML input to the “<SELECT>” tag that causes the crash
  • Each character is relevant (as shown from line 20 to 26)
  • Only removes deltas from the failing test
  • Returns a 1-minimal input that causes a failure: removing any single delta makes the failure disappear (global minimum is NP)

Minimizing: didactic example

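The minimizing procedure can be sketched in a few lines of Python. This is a minimal sketch, assuming the input is a list of characters and that a test function returns one of three outcomes; the names `ddmin`, `split` and `crashes_on_select`, and the sample HTML string, are illustrative assumptions and not taken from the slides or from any tool.

```python
from typing import Callable, List

PASS, FAIL, UNRESOLVED = "pass", "fail", "unresolved"


def split(items: List, n: int) -> List[List]:
    """Split items into n contiguous chunks of nearly equal size."""
    chunks, start = [], 0
    for i in range(n):
        end = start + (len(items) - start) // (n - i)
        chunks.append(items[start:end])
        start = end
    return chunks


def ddmin(failing: List, test: Callable[[List], str]) -> List:
    """Shrink a failing input until removing any single element stops the failure."""
    assert test(failing) == FAIL, "ddmin needs a failing input to start from"
    n = 2  # granularity: number of chunks tried per round
    while len(failing) >= 2:
        chunks = split(failing, n)
        reduced = False
        # 1) Try each chunk on its own.
        for chunk in chunks:
            if test(chunk) == FAIL:
                failing, n, reduced = chunk, 2, True
                break
        # 2) Otherwise try the input with one chunk removed.
        if not reduced:
            for i in range(len(chunks)):
                rest = [x for j, c in enumerate(chunks) if j != i for x in c]
                if test(rest) == FAIL:
                    failing, n, reduced = rest, max(n - 1, 2), True
                    break
        # 3) Otherwise increase granularity, or stop once chunks are single elements.
        if not reduced:
            if n >= len(failing):
                break
            n = min(len(failing), 2 * n)
    return failing


def crashes_on_select(chars: List[str]) -> str:
    """Hypothetical oracle: the 'browser' crashes whenever <SELECT> is present."""
    return FAIL if "<SELECT>" in "".join(chars) else PASS


html = "<td align=left><SELECT>priority</td>"  # made-up failing input
print("".join(ddmin(list(html), crashes_on_select)))  # prints <SELECT>
```

With this toy oracle every remaining character is needed, so the result is exactly the tag. A real oracle would launch the program under test and could also return `UNRESOLVED`, which the sketch treats as "no progress".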
Isolating: Mozilla bug
• Isolating:
  • Only 7 tests (instead of 26)
  • Removes deltas from the failing test and adds deltas to the passing test
  • Isolates a single delta, “<”, whose removal makes the failure go away
  • Returns the two nearest inputs, one failing and the other passing

General DD Algorithm
(Figure sequence: the set of differences between the initial failing test and the initial passing test.)
• At each step: what if we remove these differences from the current failing test?
• Failure disappears: “Move up”
• Unresolved test: “Increase granularity”
• Still fails: “Move down”

Formally: the Algorithm

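The isolating algorithm can be sketched as follows. This is a hedged sketch of the general dd scheme as described in the literature, not a transcription of the slide's formula; it assumes deltas are integers, configurations are sets of deltas, and the passing configuration is a subset of the failing one. The oracle `crashes` and the sample input are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

PASS, FAIL, UNRESOLVED = "pass", "fail", "unresolved"
Config = FrozenSet[int]


def split(items: List[int], n: int) -> List[Config]:
    """Split the ordered deltas into n chunks of nearly equal size."""
    chunks, start = [], 0
    for i in range(n):
        end = start + (len(items) - start) // (n - i)
        chunks.append(frozenset(items[start:end]))
        start = end
    return chunks


def dd(c_pass: Iterable[int], c_fail: Iterable[int],
       test: Callable[[Config], str]) -> Tuple[Config, Config]:
    """Narrow a passing and a failing configuration towards a minimal difference."""
    c_pass, c_fail = frozenset(c_pass), frozenset(c_fail)
    cache: Dict[Config, str] = {}

    def run(config: Config) -> str:
        if config not in cache:  # never run the same test twice
            cache[config] = test(config)
        return cache[config]

    def first(configs, outcome):
        return next((c for c in configs if run(c) == outcome), None)

    assert c_pass <= c_fail and run(c_pass) == PASS and run(c_fail) == FAIL
    n = 2
    while True:
        delta = sorted(c_fail - c_pass)
        if len(delta) <= 1:
            return c_pass, c_fail
        n = min(n, len(delta))
        chunks = split(delta, n)
        ups = [c_pass | c for c in chunks]    # add deltas to the passing test
        downs = [c_fail - c for c in chunks]  # remove deltas from the failing test
        if (c := first(ups, FAIL)) is not None:
            c_fail, n = c, 2                  # failing test moves down
        elif (c := first(downs, PASS)) is not None:
            c_pass, n = c, 2                  # passing test moves up
        elif (c := first(ups, PASS)) is not None:
            c_pass, n = c, max(n - 1, 2)
        elif (c := first(downs, FAIL)) is not None:
            c_fail, n = c, max(n - 1, 2)
        elif n < len(delta):
            n = min(2 * n, len(delta))        # only unresolved: increase granularity
        else:
            return c_pass, c_fail


failing_input = '<SELECT NAME="priority" MULTIPLE SIZE=7>'  # deltas = character indices


def crashes(config: Config) -> str:
    """Hypothetical oracle: fails when the kept characters contain '<SELECT'."""
    text = "".join(failing_input[i] for i in sorted(config))
    return FAIL if "<SELECT" in text else PASS


c_pass, c_fail = dd(set(), range(len(failing_input)), crashes)
print(sorted(c_fail - c_pass))  # the single isolated delta
```

The two returned configurations differ by one character index here because the toy oracle never returns `UNRESOLVED`; with unresolved outcomes the remaining difference can be larger.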
Efficiency considerations
• The worst case: |k|² + 3|k| tests (k = cardinality of the change set)
  • all test cases are unresolved except the last one
  • very unlikely
• The best case: 2·log|k| tests
• Try to avoid unresolved test outcomes
  • Use lexical and syntactical knowledge about the input

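To make the gap between the two bounds concrete, here is a worked instance. The change-set size of 1024 deltas is a made-up figure, and the logarithm is assumed to be base 2.

```latex
\begin{align*}
\text{worst case: } & |k|^2 + 3|k| = 1024^2 + 3 \cdot 1024 = 1{,}048{,}576 + 3{,}072 = 1{,}051{,}648 \text{ tests} \\
\text{best case: }  & 2 \log_2 |k| = 2 \cdot 10 = 20 \text{ tests}
\end{align*}
```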
DEMO: Eclipse Plugin Live Demo

Thread Scheduling
• The behavior of a multithreaded program may depend on the schedule.

DD applied to Thread Scheduling
• Debugging is even harder here:
  • Thread switches and schedules are nondeterministic
  • It is difficult to reproduce and isolate failures
• Goal:
  • Relate the failure to a small set of relevant differences between passing and failing schedules
• Again a “purely experimental approach”: no need to understand the program

Purely experimental: Pros and Cons
• Pros:
  • The program is treated as a black box: it only needs to be executed
  • Failure: an arbitrary behaviour of the program. It only needs to be distinguishable from success.
• Cons:
  • (w.r.t. static analysis) Test-based: cannot determine properties that hold for all runs of a program, such as the general absence of deadlocks
  • Requires an observable failure

Dejavu tool
• Tool: Dejavu (DEterministic JAVa replay Utility) by IBM
  • Reproduces schedules and the failures they induce
• Exploiting Dejavu:
  • the thread schedule becomes an input
  • We can generate schedules by mixing one passing schedule and one failing schedule

Differences in thread scheduling
• Starting point:
  • Passing run
  • Failing run
• Differences (for t1):
  • t1 occurs in the passing run at time 254
  • t1 occurs in the failing run at time 278
  • ∆1 = |278 − 254| induces a statement interval: the code executed between time 254 and 278

Differences in thread scheduling
• We can build further test cases by mixing the two schedules to isolate the relevant differences

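Mixing two schedules can be illustrated with a small sketch. It assumes a schedule is simply a list of thread-switch times and that each delta moves one switch by one time unit towards the failing schedule; replay through a tool such as Dejavu is not modelled, and `race_shows_up` is a hypothetical oracle standing in for an actual replayed run.

```python
from typing import Callable, List, Sequence, Tuple


def mix_schedules(passing: Sequence[int], failing: Sequence[int],
                  steps: Sequence[int]) -> List[int]:
    """Move each switch steps[i] time units from its passing time towards its failing time."""
    mixed = []
    for p, f, s in zip(passing, failing, steps):
        s = max(0, min(s, abs(f - p)))  # never overshoot the failing time
        mixed.append(p + s if f >= p else p - s)
    return mixed


def isolate_switch(passing: Sequence[int], failing: Sequence[int], index: int,
                   fails: Callable[[List[int]], bool]) -> Tuple[int, int]:
    """Find two adjacent times for one switch: the first passes, the second fails."""

    def schedule(k: int) -> List[int]:
        steps = [0] * len(passing)  # all other switches keep their passing time
        steps[index] = k
        return mix_schedules(passing, failing, steps)

    lo, hi = 0, abs(failing[index] - passing[index])
    assert not fails(schedule(lo)) and fails(schedule(hi))
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(schedule(mid)):
            hi = mid  # failing schedule moves closer to the passing one
        else:
            lo = mid  # passing schedule moves closer to the failing one
    return schedule(lo)[index], schedule(hi)[index]


passing_run = [254, 900]  # switch times; values are illustrative
failing_run = [278, 900]


def race_shows_up(times: List[int]) -> bool:
    """Hypothetical oracle: the race appears once switch 0 happens at time 270 or later."""
    return times[0] >= 270


print(isolate_switch(passing_run, failing_run, 0, race_shows_up))  # (269, 270)
```

The search keeps one passing and one failing schedule at all times and halves the distance between them, which is why billions of unit differences can be handled with a few dozen tests.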
Real life test: setting
• Test #205 of the SPEC JVM 98 Java test suite
  • Modification of the raytracer program to a multi-threaded version
  • Introduction of a simple race condition
  • Implementation of an automated test that checks failure/passing
  • Generation of random schedules to find a passing schedule and a failing schedule
• Differences between the passing and failing schedule:
  • 3,842,577,240 differences
  • Each difference moves a thread switch time by +1 or -1

Real life test: results
• DD isolates a single difference after 50 tests (about 28 min)

Real life test: pinpointing the failure
• The failure occurs if and only if thread switch #33 occurs at yield point 59,772,127 (instead of 59,772,126); a yield point is a safe point, such as a function invocation
  • At 59,772,127, line 91 is the first yield point after the initialization of Old.Scenes.Loaded
  • At 59,772,126, line 82 is the yield point just before the initialization of Old.Scenes.Loaded

Real life test: conclusion
• Delta Debugging is efficient
  • even when applied to very large thread schedules (>3,000,000 differences)
• No analysis is required, as Delta Debugging relies on experiments alone
  • only the schedule was observed and altered
  • the failure-inducing thread switch is easily associated with code
• Alternate runs are obtained automatically
  • by generating random schedules
  • only one initial run (pass or fail) is required

Code changes
• A given revision of a program behaves correctly. The next one does not.
• Find which of the changes in the code causes the problem.
• Inconvenient when the difference amounts to thousands of lines of code

The manual solution
• Binary search through the revision history (regression containment)
• Does not always work:
  • Multiple changes that cause the failure only when combined (interference)
  • A single change can amount to many code lines (granularity)
  • Mixing parallel development branches causes inconsistency problems

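The manual binary search through the revision history can be sketched as below. It assumes a linear history in which the first revision passes and the last one fails; the revision names and the `builds_and_passes` oracle are hypothetical.

```python
from typing import Callable, Sequence


def first_failing_revision(revisions: Sequence[str],
                           passes: Callable[[str], bool]) -> str:
    """Binary search for a revision that fails while its predecessor passes."""
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] passes, revisions[hi] fails
    assert passes(revisions[lo]) and not passes(revisions[hi])
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


history = [f"r{i}" for i in range(1, 9)]  # r1 .. r8, illustrative names


def builds_and_passes(revision: str) -> bool:
    """Hypothetical oracle: everything from r6 onwards is broken."""
    return int(revision[1:]) < 6


print(first_failing_revision(history, builds_and_passes))  # r6
```

The sketch shows why the limits listed above matter: it blames exactly one revision, so it cannot express a failure caused by two changes combined, and the blamed revision may itself contain thousands of changed lines.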
Procedure
• Developed in 1999: it differs in some respects from the current general DD algorithms.
• Consider the differences between the working and failing revisions.
• Ignore any knowledge about the temporal ordering of the changes.
• Goal: find a minimal failure-inducing change set.

Inconsistencies
• Mixing code changes regardless of their ordering produces many tests with an “Unresolved” outcome:
  • Integration failure
  • Construction failure
  • Execution failure
• They increase the complexity of the DD algorithm!

Future work
• Group related changes (partly done) → fewer inconsistent trials.
  • Common change dates/sources
  • Location criteria
  • Lexical criteria
  • Syntactic criteria (common functions/modules)
  • Semantic criteria

Cause-Effect Background
• A bit of background:
  • A program state is represented by variable values and references.

Background (2)
• While the program runs, the state evolves.
• We assume the program is
  • Deterministic
  • Not interactive
  → identical states at identical times have identical evolutions.

Idea: apply DD to program states.
• We need two distinct runs:
  • one failing
  • one passing
• We want the two runs to be (initially) as similar as possible.
  • If we let the two runs evolve in parallel, their initial states will be similar.
  • Isolating failure-inducing input can help.
• Apply DD to different "slices" of the program evolution. (A sort of CT scan for computer routines.)

Procedure
• Iteratively:
  • Build a new state mixing the passing and failing states.
  • Let the program evolve and see whether it passes, fails, or does unrelated weird things (unresolved outcome).
• Isolate the smallest subset of the state relevant for the failure.
No news so far. But:
• This happens at a specific moment of the program evolution. It will be repeated (e.g. at important functions' entry points).

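The state-mixing step can be sketched as follows, assuming a program state is a flat mapping from variable names to values. For brevity the search below tries subsets by increasing size, which is a brute-force stand-in for DD and only practical for a handful of variables. The variable names, their values and the `resume` oracle are all hypothetical.

```python
from itertools import combinations
from typing import Any, Callable, Dict, Iterable, List, Set

State = Dict[str, Any]


def state_deltas(passing: State, failing: State) -> List[str]:
    """Variables whose value differs between the two states."""
    return sorted(v for v in failing if passing.get(v) != failing[v])


def mix_state(passing: State, failing: State, chosen: Iterable[str]) -> State:
    """Start from the passing state and copy the chosen variables from the failing state."""
    mixed = dict(passing)
    for var in chosen:
        mixed[var] = failing[var]
    return mixed


def smallest_failing_subset(passing: State, failing: State,
                            resume: Callable[[State], str]) -> Set[str]:
    """Brute-force stand-in for DD: smallest set of deltas that makes the run fail."""
    deltas = state_deltas(passing, failing)
    for size in range(1, len(deltas) + 1):
        for chosen in combinations(deltas, size):
            if resume(mix_state(passing, failing, chosen)) == "fail":
                return set(chosen)
    return set(deltas)


passing_state = {"argc": 4, "size": 3, "a[1]": 2}  # illustrative values
failing_state = {"argc": 5, "size": 4, "a[1]": 7}


def resume(state: State) -> str:
    """Hypothetical oracle: continuing from this state fails whenever size is 4."""
    return "fail" if state["size"] == 4 else "pass"


print(smallest_failing_subset(passing_state, failing_state, resume))  # {'size'}
```

In a real setting `resume` would write the mixed state into a stopped process through a debugger and let it run to completion, and many mixed states would be inconsistent and yield an unresolved outcome.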
The result
• A cause-effect chain that leads to a failure.

The cause-effect chain
• The initial states are absolutely legitimate: for example, a direct consequence of a specific input that the program should handle. → intended program states.
• The final effects are the failure. → faulty program states.
• The error lies somewhere in the middle, where an intended program state evolves into a faulty one.

Fascinating terminology
• A defect in the code causes an infection in the state.
• The infection usually propagates as the program evolves.

Limits
• No automatic discrimination between intended and faulty (infected) states!
• The human user can increase the resolution of the slices and pinpoint the code that turns an INTENDED state into a FAULTY one.
• Correct the error (== defect in the code) and break the cause-effect chain that leads to the failure.

Cause Transitions
• Sometimes, when an instruction is executed:
  • a given variable ceases to be failure-inducing
  • others begin to be
  → the failure-inducing subset of the state changes (cause transition)
• An algorithm can efficiently find cause transitions in cause-effect chains by means of binary search (again).

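The binary search for a cause transition can be sketched as follows. It assumes a function `cause_at(t)` that returns the failure-inducing set of variables at time `t` (in practice the result of running DD on the two program states at that moment). The function name, the timeline and the variable names are hypothetical, and the sketch locates only one transition between the two endpoints.

```python
from typing import Callable, FrozenSet, Optional, Tuple

Cause = FrozenSet[str]


def find_cause_transition(cause_at: Callable[[int], Cause], lo: int, hi: int
                          ) -> Optional[Tuple[int, int, Cause, Cause]]:
    """Return adjacent times (t, t + 1) at which the failure cause changes, if any."""
    cause_lo, cause_hi = cause_at(lo), cause_at(hi)
    if cause_lo == cause_hi:
        return None  # no transition visible between the endpoints
    # Invariant: the cause at lo differs from the cause at hi.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        cause_mid = cause_at(mid)
        if cause_mid == cause_lo:
            lo = mid
        else:
            hi, cause_hi = mid, cause_mid
    return lo, hi, cause_lo, cause_hi


def toy_cause_at(t: int) -> Cause:
    """Hypothetical timeline: argc causes the failure first, then a[2], then v."""
    if t < 40:
        return frozenset({"argc"})
    if t < 70:
        return frozenset({"a[2]"})
    return frozenset({"v"})


print(find_cause_transition(toy_cause_at, 0, 100))
```

Applying the same search again to the interval after the reported transition would uncover the later transition from `a[2]` to `v`.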
Cause Transitions (2)

Cause Transitions (3)
Why do we bother looking for cause transitions?
• A variable begins to cause a failure:
  • Good location for a fix
• More important:
  • “cause transitions are significantly better locators of defects than any other methods previously known”
• Result: valuable help in the search for the defect: only a handful of cause transitions and nearby code locations need to be analyzed as the source of the infection.

Other approaches to defect localization
• Coverage
• Slicing
• Dynamic invariants: no success with the Siemens test suite
• Explicit specification: good results, but needs a specification of the desired internal behavior
• Nearest neighbor (using coverage): best results, albeit quite naive

Evaluation setup
• Siemens suite
  • 7 C sample programs (hundreds of lines of code each).
  • 132 variations with one realistic defect each.
  • A test suite for each program.
• Apply the different defect locators and compare their performance (only the comparison to NN is presented).

Evaluation results

Clarification
• Two small improvements:
  • relevance of code locations (automatic)
  • sources of infection (programmer-driven): Unfair!

Zoom on the representation of the state
We said: “A program state is represented by variable values, and references”
In general, representing and manipulating the state is not trivial
• One of the problems: C pointers
  → copying their value does not make sense
  → Solution: memory graphs.

Memory graphs
• Systematically unfold all data structures, starting from base variables.

Memory graphs (2)
• Nodes: all values and all variables of a program
• Edges: operations like
  • variable access
  • pointer dereferencing
  • struct member access
  • array element access
→ Abstract from memory addresses.
→ Compare and alter pointers.

Memory graphs (3)
• What if the set of variables differs in the two states we are mixing?
  • Just compute the largest common subgraph.
→ The deltas we apply to a state:
  • Change variable values.
  • Alter data structures.

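The unfolding idea can be illustrated in Python, with nested dicts and lists standing in for C structs, arrays and pointers. This is a simplified sketch: each value is keyed by the access path that reaches it, so memory addresses never appear, and the comparison uses shared access paths as a crude approximation of the largest common subgraph. All names and the sample states are assumptions.

```python
from typing import Any, Dict, List, Optional, Tuple

Graph = Dict[str, Tuple[str, Any]]  # access path -> ("value", v) or ("alias", path)


def unfold(value: Any, path: str = "root",
           graph: Optional[Graph] = None,
           seen: Optional[Dict[int, str]] = None) -> Graph:
    """Unfold a nested structure into access paths, abstracting from addresses."""
    graph = {} if graph is None else graph
    seen = {} if seen is None else seen
    if isinstance(value, (dict, list)):
        if id(value) in seen:  # already visited: record the reference, do not loop
            graph[path] = ("alias", seen[id(value)])
            return graph
        seen[id(value)] = path
        if isinstance(value, dict):  # struct member access
            for key, child in value.items():
                unfold(child, f"{path}.{key}", graph, seen)
        else:                        # array element access
            for index, child in enumerate(value):
                unfold(child, f"{path}[{index}]", graph, seen)
    else:
        graph[path] = ("value", value)
    return graph


def value_deltas(g_pass: Graph, g_fail: Graph) -> List[str]:
    """Paths present in both graphs whose content differs: 'change variable values'."""
    return sorted(p for p in g_pass.keys() & g_fail.keys() if g_pass[p] != g_fail[p])


def structural_deltas(g_pass: Graph, g_fail: Graph) -> List[str]:
    """Paths present in only one graph: 'alter data structures'."""
    return sorted(g_pass.keys() ^ g_fail.keys())


passing = {"count": 2, "list": {"value": 14, "next": {"value": 18, "next": None}}}
failing = {"count": 3, "list": {"value": 14, "next": {"value": 18,
           "next": {"value": 22, "next": None}}}}

g_pass, g_fail = unfold(passing), unfold(failing)
print(value_deltas(g_pass, g_fail))       # ['root.count']
print(structural_deltas(g_pass, g_fail))  # paths of the extra list element
```

A debugger-based implementation would obtain the same paths by following pointers and struct members through the debugger instead of walking Python objects.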
Implementation considerations
• All we need is a way to access and modify the program state.
• GDB is the solution for C programs, but has performance problems (5000% overhead).
• DD applied to states is still a black-box approach (sort of)
• Easily extended to other languages as soon as something provides GDB-like functionality.

Conclusions
Delta Debugging:
• is an extremely interesting technique
• works pretty well, at least in theory
• there are no usable tools
• can be usefully integrated into various IDEs
• the algorithm is now patent-free (expired patent)
SO: LET’S MAKE SOME MONEY ON IT!

Acknowledgements
• Some slides and images adapted from Dr. Andreas Zeller’s presentations and papers (http://www.st.cs.uni-sb.de/~zeller/)

References
• Yesterday, My Program Worked. Today, It Does Not. Why? Andreas Zeller; FSE 1999
• Finding Failure Causes through Automated Testing. Holger Cleve, Andreas Zeller; 4th International Workshop on Automated Debugging, 2000
• Simplifying Failure-Inducing Input. Ralf Hildebrandt, Andreas Zeller; ISSTA 2000
• Automated Debugging: Are We Close? Andreas Zeller; IEEE Computer, November 2001
• Isolating Failure-Inducing Thread Schedules. Jong-Deok Choi, Andreas Zeller; ISSTA 2002
• Isolating Cause-Effect Chains from Computer Programs. Andreas Zeller; FSE 2002
• Locating Causes of Program Failures. Holger Cleve, Andreas Zeller; ICSE 2005