Efficient Checking of Component Specifications in Java Systems

- Slides: 58

Efficient Checking of Component Specifications in Java Systems Steven P. Reiss Brown University October 5, 2005 CHET 1

Our Goal
- To improve programming
  - More reliable
  - More secure
  - More robust
  - More understandable
  - Easier
- To deal with real systems
  - Not yesterday’s
  - Some today’s
  - Worrying about tomorrow’s

Model Checking
- Is the next great thing for programmers
  - Will find all our bugs automatically
  - Will fix all our problems
- But with minor exceptions it is not used
  - Not on an everyday basis
  - Not for everyday programs
  - Not by most programmers
- What is needed here
  - Must be “automatic”: no effort required
  - Must be fast: “compilation speed”
  - Must be helpful: accurate, precise

The Problem of Components
- Java programs are built on class libraries
  - Standard Java libraries
  - Open source libraries
  - Libraries created for an application
- Creators know how they should be used
  - Each has its own pattern of usage
  - Typically fail if not used correctly
- Make sure they are used correctly
  - Throughout the program
  - Each instance
  - Statically
  - With “real” Java programs

The Solution
- Create a component specification language
- Find instances of component usage
- Check each instance for validity

Specification Language
- Define how components should be used
  - In a way that matches their use
  - Once for all potential instances
- So that it can be done by programmers
  - And the specification can be understood
- Solution
  - Use finite automata
  - Over parameterized program events
  - Matches call sequences, variable usage, etc.

Specification Instances
- Components are used multiple times
  - List, Iterator, XmlWriter, …
- Need to handle each use separately
- Uses must be found automatically
  - As specific as possible (statically)
- Solution
  - Using flow analysis over the class files
  - Trigger events define instances
  - Other events used in particular instances

Checking Specifications
- Each instance must be checked
  - Independently
  - To ensure the specification is met
- Solution
  - Create a simple model program per instance
  - Check if model program meets specification
    - Using model checking techniques
  - Do all this efficiently

Keeping this Practical
- Most components are used through calls
  - Control flow determines call sequences
  - Data flow determines which calls
- Most component usage is single threaded
  - Can often ignore thread interactions
- This is simpler than the general problem
  - Need to track fewer variables
  - Need to worry less about variable values
  - Need to worry less about interweaving

CHET Overview
[Figure: block diagram with boxes labeled Specifications, Flow Analysis, Application, Instances, Abstract Program Builder, Program Checker, Report]

Iterator Usage
- E1 (New) [TRIGGER]: RETURN(iterator), C1 = result
- E2 (HasNext): CALL(hasNext), this = C1
- E3 (Next): CALL(next), this = C1
- E4 (HasMore): CALL(hasMoreElements), this = C1
- E5 (NextElt): CALL(nextElement), this = C1
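
A minimal sketch of how an event specification like this one might be checked against a sequence of events. The slide lists only the events; the automaton itself was a figure that is not recoverable here, so the transition rule below (every next or nextElement must follow a hasNext or hasMoreElements) is an assumption, and the class, state, and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class IteratorSpecSketch {
    // Event names follow the slide; everything else is illustrative.
    enum Event { NEW, HAS_NEXT, NEXT, HAS_MORE, NEXT_ELT }

    // Hypothetical automaton states (the real automaton is not in the text).
    enum State { START, UNCHECKED, CHECKED, ERROR }

    // Assumed rule: a "next" event is only legal right after a "has next" event.
    static State step(State s, Event e) {
        if (s == State.ERROR) return State.ERROR; // ERROR is absorbing
        if (s == State.START) return e == Event.NEW ? State.UNCHECKED : State.ERROR;
        if (e == Event.HAS_NEXT || e == Event.HAS_MORE) return State.CHECKED;
        if (e == Event.NEXT || e == Event.NEXT_ELT) {
            return s == State.CHECKED ? State.UNCHECKED : State.ERROR;
        }
        return State.ERROR; // a second trigger would belong to another instance
    }

    // True when the event sequence never drives the automaton into ERROR.
    static boolean accepts(List<Event> events) {
        State s = State.START;
        for (Event e : events) s = step(s, e);
        return s != State.ERROR;
    }

    public static void main(String[] args) {
        System.out.println(accepts(Arrays.asList(Event.NEW, Event.HAS_NEXT, Event.NEXT)));
        System.out.println(accepts(Arrays.asList(Event.NEW, Event.NEXT)));
    }
}
```

Because both hasNext and hasMoreElements map to the same transition, one automaton covers the Iterator and Enumeration styles of use.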

Comodification Checking
- E1 (New) [TRIGGER]: ALLOC(Vector), C1 = result
- E2 (Add): CALL(add), this = C1
- E3 (Add1): CALL(addElement), this = C1
- E4 (Add2): CALL(addAll), this = C1
- E5 (Iter): RETURN(iterator), this = C1, C2 = return
- E6 (Next): CALL(next), this = C2
- E7 (Next1): CALL(nextElement), this = C2

XmlWriter Usage
- E0 (New) [TRIGGER]: ALLOC(XmlWriter), C1 = new
- E1 (Begin): CALL(begin), this = C1
- E2 (End): CALL(end), this = C1
- E3 (Field): CALL(field), this = C1
- E4 (Cdata): CALL(cdata), this = C1
- E5 (Text): CALL(text), this = C1
- E6 (Xml): CALL(writeXml), this = C1
- E7 (Close): CALL(close), this = C1

File Open-Close
- E1 (Create) [TRIGGER]: ALLOC(FileWriter), C1 = new
- E2 (Open): RETURN(<init>), this = C1
- E3 (Close): CALL(close), this = C1
- E4 (Nest): CALL(<init>), arg1 = C1, C2 = this
- E5 (CloseNext): CALL(close), this = C2
- E6 (Nest1): CALL(<init>), arg1 = C2, C3 = this
- E7 (CloseNest1): CALL(close), this = C3
- E8 (Nest2): CALL(<init>), arg1 = C3, C4 = this
- E9 (CloseNest2): CALL(close), this = C4

Catching Errors
- E1 (Create) [TRIGGER]: ALLOC(Error), C1 = new
- E2 (Throw): THROW, throw = C1
- E3 (Catch): CATCH, catch = C1

Web Crawler Library
- E1 (Begin) [TRIGGER]: CALL(beginProcessing), C1 = arg1
- E2 (Open): RETURN(openConnection), this = C1
- E3 (Save): CALL(saveHtml), this = C1
- E4 (File): CALL(getHtmlFile), this = C1
- E5 (Header): CALL(saveHeader), this = C1
- E6 (Links): CALL(saveLinks), this = C1
- E7 (Redirect): CALL(setRedirectHtml), this = C1
- E8 (NoteErr): CALL(setError), this = C1
- E9 (Text): CALL(processText), this = C1
- E10 (TextBrk): CALL(processTextBreak), this = C1
- E11 (Finish): CALL(endProcessing), this = C1

Nested Locks
- E0 (Alloc) [TRIGGER]: ALLOC(Object), C1 = new
- E1 (Lock_X): LOCK, lock = C1
- E2 (Unlock_X): UNLOCK, lock = C1
- E3 (Lock_Y): LOCK, C2 = new
- E4 (Unlock_Y): UNLOCK, lock = C2

Events & Parameters
- CALL (caller this, arg_i, calling this)
- RETURN (this, return value)
- ENTRY (this, arg_i)
- FIELD (this) [set to int, null, nonnull]
- ALLOC (new object)
- CATCH (catch object)
- THROW (throw object)
- LOCK (lock object)
- UNLOCK (lock object)

Why Event-Based Specification
- ESP and others use code patterns
  - These are closer to programs
  - And hence easier to understand
- However they are hard to generalize
  - Iterator can use nextElement or next
  - Nested opens and alternatives
  - XmlWriter alternatives
- Events and automata generalize
  - Easy to define abstract patterns
  - Still understandable by programmers

Finding All Instances
- Done using flow analysis
  - Of the program and its libraries
  - Handling all specifications at once
- Each trigger event yields a source
  - We determine where this source can flow
  - This determines which events are relevant
    - To this particular instance
- But it's not that easy
  - Multiple-parameter events
  - Accurate flow and type analysis required

Example
    Vector<A> v = …
    Iterator<A> it = v.iterator();
    while (it.hasNext()) {
        A x = it.next();
        …
    }
    …
    for (it = v.iterator(); it.hasNext(); ) {
        A y = it.next();
        …
    }
(Slide callouts: Trigger, Source)

Flow Analysis
- Identify sources
  - From trigger events
- Tracking sources and where they flow
  - Through symbolic execution
- Result: determine at each location
  - What sources are used
- This lets us check event parameters
  - Trigger source used on call => event

Flow Analysis Goals
- Complete analysis
  - Ensure we track all possible uses of a source
  - Must include libraries as well as user code
- Accurate analysis
  - Must know types for virtual calls
  - Must understand full Java semantics
  - Must handle all methods (including native, etc.)

Flow Analysis Techniques
- Done at the byte code level
  - Tracking types and values
  - Through symbolic execution
- Full interprocedural flow analysis
  - Using a work queue approach
  - Of user code and libraries
  - Handling all the complexities of Java
- Selectively context sensitive
  - Flow sensitive, not path sensitive
- Trade off accuracy and speed
  - Accuracy where important, speed otherwise
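
The work queue approach can be illustrated with a small sketch. This is not CHET's analysis: it assumes a single routine, integer program points, and string source names, and it only propagates sets of sources along successor edges until nothing changes. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class WorkQueueFlowSketch {
    // succ.get(n): successors of program point n.
    // gen.get(n): sources created at point n (for example by a trigger event).
    // Result: for each point, the set of sources that may reach it.
    static Map<Integer, Set<String>> propagate(Map<Integer, List<Integer>> succ,
                                               Map<Integer, Set<String>> gen) {
        Map<Integer, Set<String>> reach = new HashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        for (Map.Entry<Integer, Set<String>> g : gen.entrySet()) {
            reach.computeIfAbsent(g.getKey(), k -> new TreeSet<>()).addAll(g.getValue());
            queue.add(g.getKey());
        }
        while (!queue.isEmpty()) {
            int n = queue.remove();
            Set<String> out = reach.getOrDefault(n, Collections.<String>emptySet());
            for (int m : succ.getOrDefault(n, Collections.<Integer>emptyList())) {
                Set<String> in = reach.computeIfAbsent(m, k -> new TreeSet<>());
                // Requeue a point only when its set actually grew.
                if (in.addAll(out)) queue.add(m);
            }
        }
        return reach;
    }

    // Small example: a loop 1 -> 2 -> 1, an exit 2 -> 3, and a second entry 4 -> 3.
    static Map<Integer, Set<String>> demo() {
        Map<Integer, List<Integer>> succ = new HashMap<>();
        succ.put(0, Arrays.asList(1));
        succ.put(1, Arrays.asList(2));
        succ.put(2, Arrays.asList(1, 3));
        succ.put(4, Arrays.asList(3));
        Map<Integer, Set<String>> gen = new HashMap<>();
        gen.put(0, Collections.singleton("M"));
        gen.put(4, Collections.singleton("F"));
        return propagate(succ, gen);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sets only grow and are bounded by the number of sources, so the queue eventually empties even when the graph contains loops.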

Flow Analysis Issues
- Speed versus accuracy
  - Start with the minimum possible
  - Add more information to get needed accuracy
- What to track
  - Trigger sources; all other sources
- Java issues that arose
  - Static initializers
  - Constructors
  - Native methods
  - Reflection
  - Callbacks
  - Data structures
  - Exceptions

What to Track: Sources
- Local sources
  - Anything generated via a new operator
  - Track values stored in fields of the source
- Array sources
  - Created by new array operators
  - Track values stored in the array
- Fixed sources
  - Results from native methods, built-in values
  - Can be mutable (changed on a cast)

Sources
- Model sources
  - Generated by trigger events
  - One-to-one association with instances
- Field sources
  - Track the values of fields
  - Only for fields used in specifications
  - Determine where the fields are used
- Others
  - Privacy, …

Values
- Flow analysis deals with values
  - These are sets of sources
  - Associated with each field, local, stack, …
- Value contains additional information
  - Data type (for type analysis)
  - CanBe or MustBe NULL flags
  - Integer value range (or indefinite)
- Operations applied symbolically
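
As an illustration of what such a value might look like, here is a minimal sketch with a set of sources, the two NULL flags, and an integer range, plus the merge that would apply where control-flow paths join. The data type component is left out for brevity, and the field names and merge rules are assumptions rather than CHET's actual representation.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class FlowValueSketch {
    final Set<String> sources;   // sources that may reach this value
    final boolean canBeNull;     // at least one path yields null
    final boolean mustBeNull;    // every path yields null
    final Long min;              // lower bound; null means indefinite
    final Long max;              // upper bound; null means indefinite

    FlowValueSketch(Set<String> sources, boolean canBeNull, boolean mustBeNull,
                    Long min, Long max) {
        this.sources = new TreeSet<>(sources);
        this.canBeNull = canBeNull;
        this.mustBeNull = mustBeNull;
        this.min = min;
        this.max = max;
    }

    // Merge at a join point: union the sources, weaken the flags, widen the range.
    FlowValueSketch merge(FlowValueSketch o) {
        Set<String> s = new TreeSet<>(sources);
        s.addAll(o.sources);
        Long lo = (min == null || o.min == null) ? null : Long.valueOf(Math.min(min, o.min));
        Long hi = (max == null || o.max == null) ? null : Long.valueOf(Math.max(max, o.max));
        return new FlowValueSketch(s, canBeNull || o.canBeNull,
                                   mustBeNull && o.mustBeNull, lo, hi);
    }

    // Example: one path carries source M1 in [0, 5], the other M2 with no upper bound.
    static FlowValueSketch demo() {
        FlowValueSketch a = new FlowValueSketch(
            new TreeSet<>(Arrays.asList("M1")), false, false, 0L, 5L);
        FlowValueSketch b = new FlowValueSketch(
            new TreeSet<>(Arrays.asList("M2")), true, true, 3L, null);
        return a.merge(b);
    }

    public static void main(String[] args) {
        FlowValueSketch v = demo();
        System.out.println(v.sources + " canBeNull=" + v.canBeNull
            + " mustBeNull=" + v.mustBeNull + " min=" + v.min + " max=" + v.max);
    }
}
```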

Static Initializers
- Problem
  - Called implicitly at first use
  - Must return before class can be used
    - Accurate field analysis requires this
    - But it can call methods of the class
  - Some classes initialized by JVM
- Solution
  - Track whether initializer has been started
  - Add some system classes by default
  - Don’t process methods before started

Constructors
- Problem
  - Most methods assume constructor done
  - Accurate field analysis requires this
  - But constructors can be quite complex
- Solution
  - Track current set of constructors we are in
  - Only process method if
    - We have constructed an object of this class, OR
    - We are called from within the constructor

Native & Reflexive Methods
- Problem
  - These are hidden from static analysis
- Solution 1: Default handling
  - Use a fixed source of return type
  - Use mutable sources where appropriate
- Solution 2: Internal special handling
  - arraycopy: copy array values

Native & Reflexive Methods
- Solution 3: Resource-based return
  - User specifies return type in resource file
  - Can be specified as mutable
  - On a function basis
  - On a call-site basis
- Solution 4: Method substitution
  - Resource file can specify alternative method
    - Thread.start => Thread.run
    - AccessController.doPrivileged => run
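
Method substitution can be pictured as a lookup applied when the analysis resolves a call. The sketch below assumes the substitutions have already been read into a map; the format of CHET's resource file is not shown on the slides, so the table, class, and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class MethodSubstitutionSketch {
    // Maps a called method to the method the analysis should follow instead.
    private final Map<String, String> substitutions = new HashMap<>();

    void addSubstitution(String called, String analyzedInstead) {
        substitutions.put(called, analyzedInstead);
    }

    // Returns the substitute if one is registered, otherwise the call itself.
    String resolve(String called) {
        return substitutions.getOrDefault(called, called);
    }

    // The two substitutions named on the slide, written as plain strings.
    static MethodSubstitutionSketch demo() {
        MethodSubstitutionSketch table = new MethodSubstitutionSketch();
        table.addSubstitution("Thread.start", "Thread.run");
        table.addSubstitution("AccessController.doPrivileged", "run");
        return table;
    }

    public static void main(String[] args) {
        MethodSubstitutionSketch table = demo();
        System.out.println(table.resolve("Thread.start"));
        System.out.println(table.resolve("Vector.add"));
    }
}
```

With such a table, a call to Thread.start is analyzed as if it were a call to Thread.run, so the code that runs in the new thread is not lost behind the native start method.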

Native and Reflexive Methods
- Solution 5: Ignore
  - Resource file can specify calls to ignore
  - Most calls to swing, awt, … are black boxes
  - Can be done by method, class or package
    - With exceptions
- Solution 6: User substitution
  - User can provide alternative dummy method
  - Use it as the replacement method
  - Complex uses of reflection

Callbacks
- Problem
  - Some callbacks are hidden in native code
  - Callbacks need to have proper arguments
    - For accurate analysis
  - Lots of user code is through callbacks
- Solution
  - Note callbacks in resource file
    - Associate callback method with registration
    - Provide calling sequence as well
  - Simulate callbacks with proper arguments
    - Automatically during analysis

Data Structures
- Problem
  - Maps, collections are hard to analyze
  - Expensive and inaccurate to look at code
- Solution
  - Introduce prototype sources
    - With procedural models of methods
  - Simulate what the methods do in the source
    - Don’t use the method code per se
  - Extend to iterators, etc. based on prototypes

Prototype Map
- Tracks the contents of the map
  - Can track selective key-value pairs
  - Tracks empty, non-empty, either
- Handles all the map operations
  - Updating internal contents
  - Returning appropriate values
- Returns prototype iterators
  - That are aware of prototype contents
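
A rough sketch of the idea behind a prototype map: instead of analyzing the library's map code, the analysis keeps a small procedural model that records which value sources may be stored and whether the map is empty, non-empty, or either. The operations and their effects below are simplifying assumptions (keys are not tracked, and removal is treated conservatively); the names are hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

class PrototypeMapSketch {
    enum Fill { EMPTY, NON_EMPTY, EITHER }

    private Fill state = Fill.EMPTY;
    private final Set<String> valueSources = new TreeSet<>();

    // Model of put: the value source may now be stored, and the map is non-empty.
    void put(String valueSource) {
        valueSources.add(valueSource);
        state = Fill.NON_EMPTY;
    }

    // Model of get: any recorded value source may come back.
    Set<String> get() {
        return Collections.unmodifiableSet(valueSources);
    }

    // Model of remove: a non-empty map may or may not become empty.
    // Recorded sources are kept, which over-approximates the contents.
    void remove() {
        if (state == Fill.NON_EMPTY) state = Fill.EITHER;
    }

    // Model of clear: the map is known to be empty again.
    void clear() {
        valueSources.clear();
        state = Fill.EMPTY;
    }

    Fill state() {
        return state;
    }

    public static void main(String[] args) {
        PrototypeMapSketch m = new PrototypeMapSketch();
        m.put("V1");
        m.put("V2");
        System.out.println(m.state() + " " + m.get());
    }
}
```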

Prototypes
- Provide more accurate analysis
  - Know the type of items stored in table
  - Avoid merging of multiple tables
  - Know when tables are null and not
- Provide more efficient analysis
  - Speed up of 30%
- Are relatively easy to implement
  - Collections: < 900 lines of source
  - Maps: < 500 lines of source

Exceptions
- Problem
  - Normal exceptions are easy to handle
  - What to do with hidden exceptions
    - catch (Throwable …)
    - Synchronized regions
- Solution
  - Restrict analysis to explicit exceptions
  - Unless explicitly told not to

Finding One Instance
- Trigger event => model source
  - This determines the basic instance
- Where model source flows
  - Determines event locations
  - Based on event type
  - Based on event parameters

Example
- E1 is the trigger
  - Provides a model source M
- E2 occurs whenever
  - M flows to a call to Iterator.hasNext
- E3, E4, E5 similarly
Specification (Iterator Usage):
- E1 (New) [TRIGGER]: RETURN(iterator), C1 = result
- E2 (HasNext): CALL(hasNext), this = C1
- E3 (Next): CALL(next), this = C1
- E4 (HasMore): CALL(hasMoreElements), this = C1
- E5 (NextElt): CALL(nextElement), this = C1

Multi-Parameter Specifications
- Find all possible instances (statically)
- Start with model source for trigger
- Find all locations for next NEW event
  - Based on flow of the model source
  - Build a new instance for the source pair
- Continue to handle additional NEW sources
- Note that we have to consider all sets
  - And not just complete sets

Example
- E1 is the trigger
  - Model source M
- Writer constructor call
  - With M as arg1
  - Yields new source M1
  - Instance <M, M1>
- If additional call
  - With M1 as arg1
  - Build new instance
Specification (File Open-Close):
- E1 (Create): ALLOC(FileWriter), C1 = new
- E2 (Open): RETURN(<init>), this = C1
- E3 (Close): CALL(close), this = C1
- E4 (Nest): CALL(<init>), arg1 = C1, C2 = this
- E5 (CloseNext): CALL(close), this = C2
- E6 (Nest1): CALL(<init>), arg1 = C2, C3 = this
- E7 (CloseNest1): CALL(close), this = C3
- E8 (Nest2): CALL(<init>), arg1 = C3, C4 = this
- E9 (CloseNest2): CALL(close), this = C4

Where Are We
- We have
  - Specified how components should be used
    - Using parameterized automata
  - Found all instances of each specification
    - Using detailed flow analysis
- Next we need to
  - Check each instance
    - By creating a model program
    - And looking at all its possible executions

Checking an Instance
- Build an abstract program for each instance
  - Using flow-sensitive analysis
  - Abstract program organized into routines
- Abstract program generates event sequences
  - Some nodes output events
- Determine all event sequences that can be generated
- Ensure that they are all valid with respect to the specification

Abstract Programs
- Methods represented by automata
  - Each defined as a directed graph
  - Nodes of the graph represent actions
  - Arcs represent nondeterministic traversal
- Control flow embedded in nodes
  - Calls, asynchronous calls
  - Actions can do tests (on variables, returns)
  - Actions can dead-end
  - An if is represented as two test nodes

Sample Program
[Figure: sample abstract program; not recoverable from the slide text]

Sample Conditional
[Figure: sample conditional; not recoverable from the slide text]

Abstract Program Actions
- Enter a routine
- Exit a routine
- Call a routine
- Generate a particular event
- Set a variable to a given value
  - Correspond to program variables
- Set the return value of the routine
- Test a variable or return value for a value
- Exit (call to System.exit)
- Asynchronous call of a routine
- Begin synchronized region
- End synchronized region
- (Wait, Notify) events

Abstract Program Variables
- Which variables are used in the program
  - Can be given as part of the specification
  - Otherwise determined automatically
    - Using a separate cursory flow analysis
    - Determine which fields directly affect event generation in the abstract program
      - Conditional using field branches around event
- This is done before building the program

Simplifying Abstract Programs
- Simplification essential for fast checking
- Eliminate routines obviously not used
  - Through a quick transitive closure check
- Then apply FSA minimization techniques
  - Throw away nodes with no effects
  - Combine nodes where possible
- No effects
  - If no thread starts, then all thread operations
  - If no conditionals for a variable (return), no sets
  - Conditional without internal nodes
  - Enter-exit only for a routine
  - Call of empty routine

Checking the Abstract Program
- Find all possible event sequences
- Compute all possible runs of the program
  - Determining what state each run leaves the specification automaton in
  - State = automaton state + variable values
- For each routine and start state
  - Determine the possible states at each node
  - Determining the set of final states
- This can then be applied recursively
  - Context-free model checking
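
The per-routine step can be sketched as a reachability computation over pairs of (program node, specification state). The version below is a deliberate simplification and not CHET's checker: it handles one routine with no calls and no variables, so the state is just the automaton state. The node encoding, the string-keyed transition table, and the "error" state are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AbstractProgramCheckSketch {
    // One node of the abstract program: an optional event plus nondeterministic successors.
    static final class Node {
        final String event; // null when the node generates no event
        final int[] next;
        Node(String event, int... next) { this.event = event; this.next = next; }
    }

    // Returns the specification states in which runs starting at node 0 can leave exitNode.
    // spec maps "state:event" to the next state; a missing entry leads to "error".
    static Set<String> finalStates(Node[] prog, int exitNode,
                                   Map<String, String> spec, String start) {
        Set<String> seen = new HashSet<>();
        Set<String> finals = new TreeSet<>();
        Deque<Object[]> work = new ArrayDeque<>();
        work.add(new Object[] { 0, start });
        while (!work.isEmpty()) {
            Object[] item = work.remove();
            int n = (Integer) item[0];
            String s = (String) item[1];
            if (!seen.add(n + "@" + s)) continue; // this (node, state) pair was already explored
            if (prog[n].event != null) s = spec.getOrDefault(s + ":" + prog[n].event, "error");
            if (n == exitNode) finals.add(s);
            for (int m : prog[n].next) work.add(new Object[] { m, s });
        }
        return finals;
    }

    // Assumed iterator automaton: next is legal only after hasNext.
    static Map<String, String> iteratorSpec() {
        Map<String, String> spec = new HashMap<>();
        spec.put("START:NEW", "UNCHECKED");
        spec.put("UNCHECKED:HAS_NEXT", "CHECKED");
        spec.put("CHECKED:HAS_NEXT", "CHECKED");
        spec.put("CHECKED:NEXT", "UNCHECKED");
        return spec;
    }

    // Models: it = v.iterator(); while (it.hasNext()) it.next();
    static Set<String> goodLoop() {
        Node[] prog = { new Node("NEW", 1), new Node("HAS_NEXT", 2, 3),
                        new Node("NEXT", 1), new Node(null) };
        return finalStates(prog, 3, iteratorSpec(), "START");
    }

    // Models: it = v.iterator(); it.next();
    static Set<String> uncheckedNext() {
        Node[] prog = { new Node("NEW", 1), new Node("NEXT", 2), new Node(null) };
        return finalStates(prog, 2, iteratorSpec(), "START");
    }

    public static void main(String[] args) {
        System.out.println(goodLoop());
        System.out.println(uncheckedNext());
    }
}
```

Because each (node, state) pair is explored once, the loop in the first example terminates. The set of final states per routine and start state is the summary that the slide describes applying recursively across calls.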

Checking Threaded Programs
- Identify each potential thread (asynch)
  - Build a single automaton for that thread
  - By inlining the calls of the various routines
  - By merging nested calls beyond a level
- Then extend the state for checking
  - To include all thread states
  - But limit the number of threads of each type

Finding an Example Run
- Once we know a final state
  - We want a concrete execution sequence
- Done as a breadth-first search
  - Over abstract program execution
  - Find a shortest execution
  - May give up if too costly
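
A minimal sketch of the breadth-first search, assuming the abstract program is given as a successor map over integer nodes. A real search would run over program node plus specification state and variable values; that is omitted here. The visit budget stands in for "may give up if too costly", and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class ExampleRunSketch {
    // Breadth-first search from start to target. Returns the nodes of a shortest
    // run, or null when the target is unreachable or the visit budget is exceeded.
    static List<Integer> shortestRun(Map<Integer, List<Integer>> succ,
                                     int start, int target, int maxVisited) {
        Map<Integer, Integer> parent = new HashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        parent.put(start, start);
        queue.add(start);
        int visited = 0;
        while (!queue.isEmpty()) {
            if (++visited > maxVisited) return null; // give up: too costly
            int n = queue.remove();
            if (n == target) {
                // Walk the parent links back to the start to rebuild the run.
                LinkedList<Integer> path = new LinkedList<>();
                for (int p = n; ; p = parent.get(p)) {
                    path.addFirst(p);
                    if (p == start) break;
                }
                return path;
            }
            for (int m : succ.getOrDefault(n, Collections.<Integer>emptyList())) {
                if (!parent.containsKey(m)) {
                    parent.put(m, n);
                    queue.add(m);
                }
            }
        }
        return null;
    }

    // Example graph with two routes from 0 to 4: 0-1-3-4 and the shorter 0-2-4.
    static List<Integer> demo(int maxVisited) {
        Map<Integer, List<Integer>> succ = new HashMap<>();
        succ.put(0, Arrays.asList(1, 2));
        succ.put(1, Arrays.asList(3));
        succ.put(2, Arrays.asList(4));
        succ.put(3, Arrays.asList(4));
        return shortestRun(succ, 0, 4, maxVisited);
    }

    public static void main(String[] args) {
        System.out.println(demo(100));
    }
}
```

Breadth-first order guarantees that the first time the target is dequeued, the recorded parent chain is a run with the fewest steps.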

Interactive Result Viewer
[Figure: screenshot of the result viewer; not recoverable from the slide text]

Experience
- Checking CHET on CLIME
  - 521 tests, 21 errors detected (789/29)
- Coverage
  - Manually verify all instances identified
- Correctness
  - Check all errors, ensure valid
  - Spot check others
  - Check known failing cases
- False positives
  - About 1/3 of errors flagged are false positives
    - System.exit
    - Generally due to erroneous control flow (exceptions)
    - Much less frequently due to overestimation of flows

Experience: Timings
[Figure or table: timing results; not recoverable from the slide text]

For More Information
- Steven Reiss
  - spr@cs.brown.edu
- Software, papers, etc.
  - http://www.cs.brown.edu/people/spr

Questions / Comments