Delta: Heuristically Minimize “Interesting” Files
delta.tigris.org
Daniel S. Wilkerson, work with Scott McPeak

This quarter-million-line file crashes my tool!
• We had a quarter-million-line (preprocessed) C++ file that crashed our C++ front-end (Elsa)
• How long would it take you to minimize that by hand?
• Delta reduced it in a few hours to a page or two of code
• While we did something else!

Delta Debugging Algorithm
• Andreas Zeller’s Delta Debugging Algorithm
• For file minimization, reduces to this:
  for each granularity g from 0 to log2 N (N = number of lines in the file)
  – partition the file into 2^g parts
  – for each part
    • test if the file minus that part is still interesting
    • if so, permanently throw out that part
• Result is “one-minimal”: removing any one line will make the test fail
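The loop on this slide can be sketched in Python. This is a minimal illustration, not delta's actual implementation: the name `minimize`, the `is_interesting` callback, and the choice to re-chunk the surviving lines at each granularity are all assumptions made for the example.

```python
import math


def minimize(lines, is_interesting):
    """Greedy chunk removal at finer and finer granularity (illustrative sketch).

    `lines` is the file as a list of lines; `is_interesting` is a
    caller-supplied predicate that takes a list of lines and returns a bool.
    """
    lines = list(lines)
    # Largest granularity: enough halvings that every chunk is a single line.
    max_g = math.ceil(math.log2(len(lines))) if len(lines) > 1 else 0
    for g in range(max_g + 1):
        parts = 2 ** g
        # Chunk size for this pass, computed from the lines that survive so far.
        size = math.ceil(len(lines) / parts)
        start = 0
        while start < len(lines):
            candidate = lines[:start] + lines[start + size:]
            if is_interesting(candidate):
                # Throw the chunk out for good; the next chunk now begins at `start`.
                lines = candidate
            else:
                start += size
    return lines
```

For a file of lines a through h that is interesting only while it still contains both b and e, this sketch ends with just those two lines. It makes a single pass per granularity; repeating the final single-line pass until nothing more can be removed would be one way to reach the one-minimal result the slide describes.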

Example: both blue lines (b and e) needed
a b c d e f g h

both blue needed: g = 0
[a b c d e f g h]
can’t delete the box since it contains both b and e

both blue needed: g = 1
[a b c d] [e f g h]
can’t delete the first box; contains b
can’t delete the second box; contains e

both blue needed: g = 2
[a b] [c d] [e f] [g h]
can delete [c d] and [g h]

both blue needed: g = 3
[a] [b] [e] [f]
can delete a and f

both blue needed: final
b e

You could do this manually...
• and be much more clever... but delta is often faster
• I find it surprising that, when minimizing a file exhibiting a certain behavior, brute force mostly wins over cleverness
• “Computers are as dumb as hell but they go like 60” -- Richard Feynman

Do a controlled experiment
• An experiment does many things
  – the interesting bit
  – and the boilerplate just to make it go
• A control is another experiment
  – that only does the boilerplate
• Do both and “subtract”; this finds the interesting bit
  gcc -c $F && oink $F | grep 'error: ...'
  – control: $F passes gcc
  – but not oink
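The same control-then-experiment test could be written as a small Python predicate. This is a sketch under assumptions: `is_interesting` and the injectable `run` parameter are hypothetical names, and `error_text` stands in for the grep pattern that the slide elides. Only the two commands shown on the slide (`gcc -c` and `oink`) are invoked.

```python
import subprocess


def is_interesting(path, error_text, run=subprocess.run):
    """Return True if `path` passes the control (gcc) but fails the experiment (oink).

    `error_text` is whatever message the caller is hunting for; the slide
    elides the actual pattern, so it is left as a parameter here.
    """
    # Control: the file must compile cleanly, so plain syntax damage is ruled out.
    # Note: as on the slide, gcc -c writes an object file as a side effect.
    control = run(["gcc", "-c", path], capture_output=True, text=True)
    if control.returncode != 0:
        return False
    # Experiment: the tool under test must still report the error of interest.
    experiment = run(["oink", path], capture_output=True, text=True)
    return error_text in experiment.stdout + experiment.stderr
```

Passing `run` as a parameter is only a convenience for trying the predicate without gcc or oink installed; by default it calls `subprocess.run`.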

topformflat: “explaining hierarchical structure”
• To delta, a file is a sequence of lines
• topformflat “explains” the nesting of C/C++
• Simple flex filter that copies input to output
  – but doesn’t print newlines nested deeper than a nesting-depth argument
• Strategy: repeatedly minimize with increasing nesting depths
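The idea behind the filter can be approximated in a few lines of Python. This is a simplified stand-in, not the real flex program: the name `flatten` is hypothetical, and the sketch ignores string literals, comments, and parentheses (so the semicolons inside a for-loop header are treated like statement ends).

```python
def flatten(source, level):
    """Re-break C-like text so line breaks appear only at brace depth <= level."""
    out = []
    depth = 0
    # Collapse all existing whitespace, including newlines, to single spaces.
    for ch in " ".join(source.split()):
        if ch == " " and out and out[-1] == "\n":
            continue  # drop the blank that would start a fresh line
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        out.append(ch)
        # Break the line after a statement or brace only if it is shallow enough.
        if ch in ";{}" and depth <= level:
            out.append("\n")
    return "".join(out)
```

At level 0 each top-level definition lands on one line, so delta removes whole functions at a time; raising the level exposes statements inside those functions as separate lines.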

topformflat Example
void foo() {
  for(...) {
    x -= 5;
    bar();
  }
  while(...) {
    j++;
  }
}
void bar() {
  z |= 17;
  foo();
}
void baz() {...}

topformflat Example, level=0
void foo() {for(...){x -= 5; bar(); }while(...){j++; }}
void bar() {z |= 17; foo(); }
void baz() {...}

topformflat Example, level=1
void foo() {
  for(...) { x -= 5; bar(); }
  while(...) { j++; }
}
void bar() {
  z |= 17;
  foo();
}
void baz() {...}
deleted

topformflat Example, level=2
void foo() {
  for(...) {
    x -= 5;
    bar();
  }
  while(...) {
    j++;
  }
}
void bar() {
  z |= 17;
  foo();
}
void baz() {...}

Science: Most bugs exhibitable by small inputs
• On any input size, the result is almost always small
  – for C++ input to a compiler, 1-2 pages of code.
• Seems to be a phenomenon of computation
  – there actually is Science in Computer Science!
• but not always
  – delta worked for a week and still had 50 files
  – a buffer had to fill up and then flush

The “Configuration File Trick”
• Delta generalizes to many situations if you
  – parameterize the process with a file
  – minimize the file.
• Simon Goldsmith was instrumenting Java system binaries
  – “during class-loading JVM would seg-fault; nothing really comprehensible would happen”
  – wrote a script to read a config file for which instrumented classes to put into the jar file
  – used delta to minimize the config file

Simulated Annealing
• Simulated Annealing
  – Large, non-convex sub-space
  – Gradient of goodness
  – Random local moves
    • likely to find another point in the sub-space
  – Moves parameterizable by a temperature.
• Some say the ability to sometimes get worse is essential
  – I say: locality, randomness, and temperature

Delta as Simulated Annealing
• space: files that pass your test
• goodness: smaller file is better
• local moves: chop out a chunk of file
  – note that we never “get worse”
  – so delta is greedy
• temperature: chunk size
  – we have an exponential “annealing schedule”, which is not unusual, says Wikipedia anyway.

Delta surprisingly effective
• Especially given how ignorant and general it is
• Most ideas for improvements are how to make the local moves better at staying in the space
  – These ideas generally require knowing what the file means.
• Important point: But note how well delta already does knowing nothing!
  – and topformflat only knows nesting and quotes!

Improvement: use knowledge of dependencies to improve moves
If you know the language semantics, reject moves that would violate them, or only make moves that would produce a legal file
(diagram: decl and use)
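One way to picture “reject moves that would violate it” is a check run before each deletion. The sketch below assumes the dependency information is already available as a mapping from a line index to the set of line indices it uses; how to obtain that mapping is outside the sketch, and the names `move_is_legal`, `kept`, `removed`, and `deps` are hypothetical.

```python
def move_is_legal(kept, removed, deps):
    """Return True if deleting `removed` leaves no surviving line using a deleted one.

    kept:    set of line indices currently in the file
    removed: set of line indices the move would delete
    deps:    dict mapping a line index to the set of indices it uses
             (for example, a use mapped to its declaration)
    """
    remaining = set(kept) - set(removed)
    # Every surviving line must depend only on lines that also survive.
    return all(deps.get(line, set()).isdisjoint(removed) for line in remaining)
```

A move that deletes a declaration while keeping its use is rejected without ever running the (possibly slow) interestingness test; deleting both together, or deleting an unrelated line, is allowed.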

Fan Mail
• From: Flash Sheridan
• This is just a quick thank-you note for Delta... it immediately reduced a... bug file from 16K lines to ten (GCC bug 22604).
• Oddly enough, it initially found a different bug (22603), since I'd only specified "internal compiler error", not "segmentation fault".

Fan Mail, p. 2
• From: Flash Sheridan
• Delta has become even more valuable since my initial thank-you note.
• I'm not sure it's helped with all of the GCC bugs I've been filing... but I couldn't have filed most of them without Delta.
• Delta has always been able to find a radically smaller file, which I have been able to attach to my bug report.

Fan Mail, p. 3
• From: Richard Guenther
• delta is saving a lot of gcc developers life ;) I would guess 1 of 3 bugs submitted to the gcc bugzilla get their testcase reduced using delta.
• ... a little bit more accurate would be to say we're using delta to reduce all testcases from the gcc bugzilla in case they get entered unreduced.

Delta: This simple dumb script is everywhere!
One class devoted to it in both Berkeley and Stanford Software Engineering Courses
– Berkeley: “We've just assigned a delta-related homework to the students today”
– Stanford: “I gave them a homework assignment for CS 295 using delta. Feedback was positive but unquantified.”
Why did it take so long to think of this simple thing?