End-User Shape Analysis
Bor-Yuh Evan Chang 張博聿 (U of Colorado, Boulder), Xavier Rival (INRIA/ENS Paris), George C. Necula (U of California, Berkeley)
National Taiwan University – August 11, 2009

Programming Languages Research at the University of Colorado, Boulder
Software errors cost a lot
~$60 billion annually (~0.5% of US GDP)
– 2002 National Institute of Standards and Technology report
– > total annual revenue of [logo]
– > 10x annual budget of [logo]
Bor-Yuh Evan Chang 張博聿, University of Colorado at Boulder - End-User Shape Analysis

But there’s hope in program analysis
– Microsoft uses and distributes the Static Driver Verifier
– Airbus applies the Astrée Static Analyzer
– Companies, such as Coverity and Fortify, market static source code analysis tools

Because program analysis can eliminate entire classes of bugs
For example,
– Reading from a closed file: read( );
– Reacquiring a locked lock: acquire( );
How?
– Systematically examine the program
– Simulate running program on “all inputs”
– “Automated code review”

Program analysis by example: Checking for double acquires
Simulate running program on “all inputs”
  … code …
  // x now points to an unlocked lock
  acquire(x);
  … code …
  acquire(x);
  … code …
[Figure: the analysis state for x, tracked alongside the code]

Program analysis by example: Checking for double acquires
Simulate running program on “all inputs”
  … code …
  // x now points to an unlocked lock in a linked list
  acquire(x);
  … code …
[Figure: the ideal analysis state is an unbounded disjunction of linked lists (… or … or …) in which x points to an unlocked lock; labeled “undecidability”]

Must abstract
For decidability, must abstract: “model all abstract inputs” (e.g., merge objects)
  … code …
  // x now points to an unlocked lock in a linked list
  acquire(x);
  … code …
Abstraction too coarse or not precise enough (e.g., lost that x is always unlocked): mislabels good code as buggy
[Figure: the ideal analysis state (a disjunction of lists with x unlocked) next to the abstract analysis state, where the state of x is unknown (“?”)]

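The double-acquire check on these slides can be sketched as a tiny abstract simulation over lock states. This is an illustrative sketch only, not the analysis from the talk: the names LockState, LockChecker, and the operation strings are hypothetical, and the UNKNOWN state stands in for an abstraction that has lost the fact that x is unlocked.

```java
import java.util.ArrayList;
import java.util.List;

// Abstract value tracked for one lock variable (hypothetical names).
enum LockState { UNLOCKED, LOCKED, UNKNOWN }

final class LockChecker {
    // Simulates a straight-line sequence of "acquire"/"release" operations
    // on one lock, starting from the given abstract state, and returns the
    // warnings the analysis would report.
    static List<String> check(LockState start, List<String> ops) {
        List<String> warnings = new ArrayList<>();
        LockState state = start;
        for (int i = 0; i < ops.size(); i++) {
            String op = ops.get(i);
            if (op.equals("acquire")) {
                if (state == LockState.LOCKED) {
                    warnings.add("op " + i + ": definite double acquire");
                } else if (state == LockState.UNKNOWN) {
                    // Too-coarse abstraction: the lock may or may not be held,
                    // so a warning is raised even if the code is actually fine.
                    warnings.add("op " + i + ": possible double acquire");
                }
                state = LockState.LOCKED;
            } else if (op.equals("release")) {
                state = LockState.UNLOCKED;
            }
        }
        return warnings;
    }
}
```

Starting from UNLOCKED, a single acquire produces no warning; starting from UNKNOWN, the same code is flagged, which is the "mislabels good code as buggy" effect described above.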
To address the precision challenge
Traditional program analysis mentality:
– “Why can’t developers write more specifications for our analysis? Then, we could verify so much more.”
– “Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that hopefully work most of the time.”
End-user approach:
– “Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use that code as specifications?”

Summary of overview
Challenge in analysis: Finding a good abstraction, precise enough but not more than necessary
– Powerful, generic abstractions: expensive, hard to use and understand
– Built-in, default abstractions: often not precise enough (e.g., data structures)
End-user approach: Must involve the user in abstraction without expecting the user to be a program analysis expert

Overview of contributions
Extensible Inductive Shape Analysis (Xisa)
• Precise inference of data structure properties
  – Able to check, for instance, the locking example
• Targeted to software developers
  – Uses data structure checking code for guidance
  – Turns testing code into a specification for static analysis
• Efficient
  – ~10-100x speed-up over generic approaches
  – Builds abstraction out of developer-supplied checking code

End-user approach Extensible Inductive Shape Analysis Precise inference of data structure properties …
Shape analysis is a fundamental analysis
Data structures are at the core of
– Traditional languages (C, C++, Java)
– Emerging web scripting languages
Improves verifiers that try to
– Eliminate resource usage bugs (locks, file handles)
– Eliminate memory errors (leaks, dangling pointers)
– Eliminate concurrency errors (data races)
– Validate developer assertions
Enables program transformations
– Compile-time garbage collection
– Data structure refactorings

Shape analysis by example: Removing duplicates
  // l is a sorted doubly-linked list
  for each node cur in list l {
    remove cur if duplicate;
  }
  assert l is sorted, doubly-linked with no duplicates;
The intermediate state inside the loop is more complicated and program-specific.
[Figure, Example/Testing view: the concrete list l = 2, 2, 4, 4 before the loop; 2, 4, 4 with cur in the loop; 2, 4 after the loop]
[Figure, Code Review/Static Analysis view: “sorted dl list” before the loop; “segment with no duplicates” up to cur followed by “sorted dl list” in the loop; “no duplicates” after the loop]

Shape analysis is not yet practical
Choosing the heap abstraction is difficult for precision
Some representative approaches:
• Parametric in low-level, analyzer-oriented predicates: TVLA [Sagiv et al.]
  + Very general and expressive
  - Harder for non-experts
• Built-in high-level predicates: Space Invader [Distefano et al.]
  + No additional user effort (if precise enough)
  - Harder to extend
• End-user approach, parametric in high-level, developer-oriented predicates: Xisa
  + Extensible
  + Targeted at developers

Our approach: Executable specifications
Utilize “run-time checking code” as specification for static analysis.
Checker:
  h.dll(p) = if (h = null) then true else h->prev = p and h->next.dll(h)
• p specifies where prev should point
  assert(sorted_dll(l, …));
  for each node cur in list l {
    remove cur if duplicate;
  }
  assert(sorted_dll_nodup(l, …));
Contribution: Build the abstraction for analysis out of developer-specified checking code
Contribution: Automatically generalize checkers for complicated intermediate states

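As a concrete reading of the dll checker above, the same check can be written as ordinary run-time checking code. This is a minimal sketch, assuming a hypothetical Node class with prev and next fields; the class and method names are illustrative and are not taken from Xisa.

```java
// Hypothetical node type for a doubly-linked list.
final class Node {
    int value;
    Node prev;
    Node next;
    Node(int value) { this.value = value; }
}

final class Checkers {
    // Mirrors h.dll(p): either h is null, or h.prev points to p and the
    // rest of the list satisfies dll with h as the expected predecessor.
    // Assumes an acyclic list; a cycle would recurse without terminating.
    static boolean dll(Node h, Node p) {
        if (h == null) {
            return true;
        }
        return h.prev == p && dll(h.next, h);
    }

    // Builds a well-formed doubly-linked list from the given values
    // and returns its head (null for no values).
    static Node fromValues(int... values) {
        Node head = null;
        Node tail = null;
        for (int v : values) {
            Node n = new Node(v);
            n.prev = tail;
            if (tail == null) {
                head = n;
            } else {
                tail.next = n;
            }
            tail = n;
        }
        return head;
    }
}
```

A test harness would call Checkers.dll(l, null) before and after an operation; the approach described on this slide reuses that kind of code as the specification for static analysis.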
Xisa is …
An automated shape analysis with a precise memory abstraction based around invariant checkers
Checkers: h.dll(p) = if (h = null) then true else h->prev = p and h->next.dll(h)
• Extensible and targeted for developers
  – Parametric in developer-supplied checkers, viewed as inductive definitions in separation logic
• Precise yet compact abstraction for efficiency
  – Data structure-specific, based on properties of interest to the developer

Shape analysis is an abstract interpretation on abstract memory descriptions with …
– Splitting of summaries, to reflect updates precisely
– And summarizing, for termination
[Figures: list l with cursor cur, before and after splitting and summarizing]

Roadmap: Components of Xisa
Checkers: h.dll(p) = if (h = null) then true else h->prev = p and h->next.dll(h)
Xisa shape analyzer:
– level-type inference on checker definitions: learn information about the checker to use it as an abstraction
– abstract interpretation: splitting and interpreting update; summarizing
Compare and contrast manual code review and our automated shape analysis

Overview: Split summaries to interpret updates precisely
Want abstract update to be “exact”, that is, to update one “concrete memory cell”.
The example at a high level: iterate using cur, changing the doubly-linked list from purple to red.
Challenge: How does the analysis “split” summaries and know where to “split”?
[Figure: list l with cursor cur; split at cur, then update cur from purple to red]

“Split forward” by unfolding inductive definition
  h.dll(p) = if (h = null) then true else h->prev = p and h->next.dll(h)
get: cur->next
[Figure: before unfolding, l reaches p and cur, with summary dll(cur, p); after unfolding, either cur is null, or cur is a node whose prev is p and whose next is n, followed by summary dll(n, cur)]
Analysis doesn’t forget the empty case

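The forward unfolding step can be sketched symbolically: one summary dll(cur, p) is replaced by two cases, the empty case and a case with one materialized cell plus a smaller summary. This is a simplified illustration under assumed names (DllSummary, UnfoldCase, Unfolder) and a string-based representation; it is not how Xisa represents memory.

```java
import java.util.List;

// A summary "head.dll(prevArg)" over symbolic address names.
final class DllSummary {
    final String head;
    final String prevArg;
    DllSummary(String head, String prevArg) {
        this.head = head;
        this.prevArg = prevArg;
    }
}

// One case produced by unfolding: a condition on the head, plus (for the
// non-empty case) the materialized cell's fields and the remaining summary.
final class UnfoldCase {
    final String condition;
    final String prevField;  // null in the empty case
    final String nextField;  // null in the empty case
    final DllSummary rest;   // null in the empty case
    UnfoldCase(String condition, String prevField, String nextField, DllSummary rest) {
        this.condition = condition;
        this.prevField = prevField;
        this.nextField = nextField;
        this.rest = rest;
    }
}

final class Unfolder {
    // Unfolds h.dll(p) one step, following the checker definition:
    //   if (h = null) then true else h->prev = p and h->next.dll(h)
    // freshNext names the previously unknown value of h->next.
    static List<UnfoldCase> unfold(DllSummary s, String freshNext) {
        UnfoldCase empty = new UnfoldCase(s.head + " == null", null, null, null);
        UnfoldCase nonEmpty = new UnfoldCase(
                s.head + " != null",
                s.prevArg,                            // h->prev = p
                freshNext,                            // h->next = n
                new DllSummary(freshNext, s.head));   // n.dll(h)
        return List.of(empty, nonEmpty);
    }
}
```

Unfolding dll(cur, p) with fresh name n yields the empty case and a cell with prev p, next n, and remaining summary dll(n, cur), matching the two cases on the slide.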
“Split backward” also possible and necessary
  for each node cur in list l {
    remove cur if duplicate;    // cur->prev->next = cur->next;
  }
  assert l is sorted, doubly-linked with no duplicates;
get: cur->prev->next
  h.dll(p) = if (h = null) then true else h->prev = p and h->next.dll(h)
– How does the analysis do this unfolding?
– Why is this unfolding allowed? (Key: Segments are also inductively defined)
– How does the analysis know to do this unfolding?
Technical Details: [POPL’08]
[Figure: a “dll segment” from l up to cur, with the node p before cur materialized by unfolding the segment backward, followed by cur and summary dll(n, cur)]

Roadmap: Components of Xisa
Checkers: h.dll(p) = if (h = null) then true else h->prev = p and h->next.dll(h)
Xisa shape analyzer:
– level-type inference on checker definitions: derives additional information to guide unfolding (How do we decide where to unfold?)
– abstract interpretation: splitting and interpreting update; summarizing
Contribution: Turns testing code into specification for static analysis
… to be discussed this afternoon

Summary of interpreting updates
• Splitting of summaries needed for precision
• Unfolding checkers is a natural way to do splitting
  – When checker traversal matches code traversal
• Checker parameter type analysis
  – Useful for guiding unfolding in difficult cases, for example, “back pointer” traversals

Results: Performance
Times negligible for data structure operations (often in sec or 1/10 sec)
Expressiveness: Different data structures

  Benchmark                                  Max. Num. Graphs at a Program Pt   Analysis Time (ms)
  singly-linked list reverse                 1                                  1.0
  doubly-linked list reverse                 1                                  1.5
  doubly-linked list copy                    2                                  5.4
  doubly-linked list remove                  5                                  17.9
  doubly-linked list remove and back         5                                  18.1
  search tree with parent insert             3                                  16.6
  search tree with parent insert and back    5                                  64.7
  two-level skip list rebalance              1                                  11.7
  Linux scull driver (894 loc)               4                                  3969.6
    (char arrays ignored, functions inlined)

Comparison notes on the slide: TVLA: 290 ms (list benchmarks); TVLA: 850 ms (search tree with parent insert); Space Invader only analyzes lists (built-in).
Verified that the shape invariant as given by the checker is preserved across the operation.

Demo: Doubly-linked list reversal
Body of loop over the elements: Swaps the next and prev fields of curr.
[Figure labels: Already reversed segment; Node whose next and prev fields were swapped; Not yet reversed list]
http://www.cs.colorado.edu/~bec/

Experience with the tool
Checkers are easy to write and try out
– Enlightening (e.g., red-black tree checker in 6 lines)
– Harder to “reverse engineer” for someone else’s code
– Default checkers based on types useful
Future expressiveness and usability improvements
– Pointer arithmetic and arrays (in progress)
– More generic checkers:
  • polymorphic: “element kind unspecified”
  • higher-order: parameterized by other predicates
Future evaluation: user study

Near-term future work: Exploiting common specification framework
Scenario: Code instrumented with lots of checker calls (perhaps automatically with object invariants)
  assert( mychecker(x) );
  // … operation on x …
  assert( mychecker(x) );
• Very slow to execute
• Hard to prove statically (in general)
Can we prove parts statically?
– Static Analysis View: Hybrid checking
– Testing View: Incrementalize invariant checking
Example: Insert in a sorted list
– Preservation of sortedness shown statically
– Emit run-time check for new element: u ≤ v ≤ w
[Figure: list l with adjacent elements u, v, w, where v is the newly inserted element]

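The sorted-list example can be sketched as follows. This is an illustration only, under the assumption that a static analysis has already shown the rest of the list stays sorted, so that only the local check u ≤ v ≤ w remains at run time. The names SNode, SortedInsert, sorted, and insertAfterChecked are hypothetical.

```java
// Hypothetical singly-linked node holding a sort key.
final class SNode {
    int key;
    SNode next;
    SNode(int key, SNode next) {
        this.key = key;
        this.next = next;
    }
}

final class SortedInsert {
    // Full run-time checker: walks the whole list, O(n) per call.
    static boolean sorted(SNode l) {
        for (SNode n = l; n != null && n.next != null; n = n.next) {
            if (n.key > n.next.key) {
                return false;
            }
        }
        return true;
    }

    // Inserts v right after node u (assumed non-null) and returns the
    // result of the incremental check u <= v <= w, where w is u's old
    // successor. The check is O(1) and touches only the new element's
    // neighbors.
    static boolean insertAfterChecked(SNode u, int v) {
        SNode w = u.next;
        boolean localOk = u.key <= v && (w == null || v <= w.key);
        u.next = new SNode(v, w);
        return localOk;
    }
}
```

When the list was sorted before the insert, the local check agrees with the full sorted walk, which is what makes replacing the second assert with the cheaper check plausible in this scenario.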
Conclusion
Extensible Inductive Shape Analysis: a precision-demanding program analysis improved by novel user interaction
– Developer: Gets results corresponding to intuition
– Analysis: Focused on what’s important to the developer
Practical, precise tools for better software with an end-user approach!

Programming Languages Research at the University of Colorado, Boulder
Who we are
Faculty: Amer Diwan, Jeremy Siek, Bor-Yuh Evan Chang, Sriram Sankaranarayanan
Ph.D. Students

Outline
• Gradual Programming
  – A new collaborative project involving Amer Diwan, Jeremy Siek, and myself
• Brief Sketches of Other Activities

Gradual Programming: Bridging the Semantic Gap
Have you noticed a time when your program is not optimized where you expect?
Observation: A disconnect between programmer intent and program meaning
– Intent: “I need a map data structure”
– Meaning: Load class file; Run class initialization; Create hashtable
– The difference is the semantic gap
Problem: Tools (IDEs, checkers, optimizers) have no knowledge of what the programmer cares about … hampering programmer productivity, software reliability, and execution efficiency

Example: Iteration Order
Must specify an iteration order even when it should not matter
  class OpenArray extends Object {
    private Double data[];
    public boolean contains(Object lookFor) {
      for (int i = 0; i < data.length; i++) {
        if (data[i].equals(lookFor)) return true;
      }
      return false;
    }
  }
Compiler cannot choose a different iteration order (e.g., parallel)

Wild and Crazy Idea: Use Non-Determinism
• Programmer starts with a potentially non-deterministic program
• Analysis identifies instances of “under-determinedness”
• Programmer eliminates “under-determinedness”
Starting point (“under-determined”): in OpenArray.contains, the loop ranges over the indices 0 .. data.length-1 without fixing an order
“Over-determined”: for (i = 0; i < data.length; i++) fixes a left-to-right order
Goal: “just right”
Question: What does this mean? Is it “under-determined”?
Response: Depends, is the iteration order important?

Let’s try a few program variants
  public boolean contains(Object lookFor) {
    for (int i = 0; i < data.length; i++) {
      if (data[i].equals(lookFor)) return true;
    }
    return false;
  }

  public boolean contains(Object lookFor) {
    for (int i = data.length-1; i >= 0; i--) {
      if (data[i].equals(lookFor)) return true;
    }
    return false;
  }

  public boolean contains(Object lookFor) {
    parallel_for (0, data.length-1) i => {
      if (data[i].equals(lookFor)) return true;
    }
    return false;
  }
Do they compute the same result?
Approach: Try to verify equivalence of program variants up to a specification
– Yes: Pick any one
– No: Ask user
What about here?

Surprisingly, analysis says no. Why? Exceptions!
  a.data = [ …, null ]
  a.contains( … )
– left-to-right iteration returns true
– right-to-left iteration throws NullPointerException
Need user interaction to refine the specification so that it captures programmer intent

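The difference between the two iteration orders can be reproduced directly. The sketch below assumes an array holding a matching element followed by a null entry, which is one input consistent with the behavior described on this slide; the method names containsLeftToRight and containsRightToLeft and the constructor are added here for illustration.

```java
final class OpenArray {
    private final Double[] data;

    OpenArray(Double[] data) {
        this.data = data;
    }

    // Variant 1: scans indices 0, 1, ..., data.length-1.
    boolean containsLeftToRight(Object lookFor) {
        for (int i = 0; i < data.length; i++) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }

    // Variant 2: scans indices data.length-1, ..., 1, 0.
    boolean containsRightToLeft(Object lookFor) {
        for (int i = data.length - 1; i >= 0; i--) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }
}
```

With data = { 1.0, null } and lookFor = 1.0, the left-to-right variant finds the match at index 0 before reaching the null entry, while the right-to-left variant dereferences the null entry first and throws NullPointerException. The variants are therefore not equivalent unless the specification rules out null entries or treats the exception as irrelevant.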
Proposal Summary
• “Fix semantics per program”: Abstract constructs with many possible concrete implementations
• Apply program analysis to find inconsistent implementations
• Interact with the user to refine the specification
• Language designer role: can enumerate the possible implementations

Bridging the Semantic Gap
– Programmer: “I need a map data structure”
– Tool: “Looks like iterator order matters for your program”
– Programmer: “Yes, I need iteration in sorted order”
– Tool: “Let’s use a balanced binary tree (TreeMap)”

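The final choice in this dialogue relies on a documented property of java.util.TreeMap: it iterates over keys in sorted order, whereas HashMap makes no ordering guarantee. A minimal sketch, with the class and method names (SortedIterationDemo, keysInIterationOrder, sampleTreeMap) chosen here for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SortedIterationDemo {
    // Returns the keys in the order the map's iterator produces them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    // Builds a TreeMap by inserting keys out of order.
    static Map<String, Integer> sampleTreeMap() {
        Map<String, Integer> m = new TreeMap<>();
        m.put("b", 2);
        m.put("a", 1);
        m.put("c", 3);
        return m;
    }
}
```

Even though the keys are inserted as b, a, c, iterating the TreeMap yields a, b, c. A program that depends on that order would be "under-determined" if it only asked for some map.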
Other Activities
Formal Methods
Prof. Sriram Sankaranarayanan (CS)
Cyber-physical systems verification
– hybrid automata theory, control systems verification, analysis of Simulink and Stateflow diagrams
– advanced mathematical techniques:
  • convex optimization: linear and semi-definite
  • differential equations: set-valued analysis
  • SMT solvers over non-linear theories
– applications to automotive software (with NEC labs and GM labs)
Prof. Aaron Bradley (ECEE)
Decision procedures, Model checking
Prof. Fabio Somenzi (ECEE)

Programming Languages and Analysis
Prof. Amer Diwan (CS)
– Performance analysis of computer systems
  • How do we know that we have not perturbed our data?
  • Using machine learning and statistical techniques to reason about data
– Tool-assisted program transformations
  • Algorithmic optimizations for performance
  • Program metamorphosis for improving code quality
Prof. Jeremy Siek (ECEE/CS)
– Gradual type checking: between static (Java) and dynamic (Python)
– Meta-programming: programs that write programs
– Compilers for optimizing scientific codes
Prof. Bor-Yuh Evan Chang (CS)
– End-user program analysis
– Precise analysis (shape, collections)
– Interactive analysis refinement (type checking + symbolic evaluation)

Applying to Colorado
• Computer Science Department information: http://www.cs.colorado.edu/grad/admission/
• Deadlines: Dec 1 for Fall (Sep 1 for Spring)
• Graduate Advisor: Nicholas Vocatura, nicholas.vocatura@colorado.edu
• Talk to me about application fee waiver
http://www.cs.colorado.edu/~bec/