Efficient Checking of Component Specifications in Java Systems


Steven P. Reiss
Brown University
October 5, 2005
CHET

Our Goal
• To improve programming
  • More reliable
  • More secure
  • More robust
  • More understandable
  • Easier
• To deal with real systems
  • Not yesterday’s
  • Some today’s
  • Worrying about tomorrow’s

Model Checking
• Is the next great thing for programmers
  • Will find all our bugs automatically
  • Will fix all our problems
• But with minor exceptions it is not used
  • Not on an everyday basis
  • Not for everyday programs
  • Not by most programmers
• What is needed here
  • Must be “automatic”: no effort required
  • Must be fast: “compilation speed”
  • Must be helpful: accurate, precise

The Problem of Components
• Java programs are built on class libraries
  • Standard Java libraries
  • Open source libraries
  • Libraries created for an application
• Creators know how they should be used
  • Each has its own pattern of usage
  • Typically fail if not used correctly
• Make sure they are used correctly
  • Throughout the program
  • Each instance
  • Statically
  • With “real” Java programs

The Solution
• Create a component specification language
• Find instances of component usage
• Check each instance for validity

Specification Language
• Define how components should be used
  • In a way that matches their use
  • Once for all potential instances
• So that it can be done by programmers
  • And the specification can be understood
• Solution
  • Use finite automata
  • Over parameterized program events
  • Matches call sequences, variable usage, etc.

Specification Instances
• Components are used multiple times
  • List, Iterator, XmlWriter, …
  • Need to handle each use separately
• Uses must be found automatically
  • As specific as possible (statically)
• Solution
  • Using flow analysis over the class files
  • Trigger events define instances
  • Other events used in particular instances

Checking Specifications
• Each instance must be checked
  • Independently
  • To ensure the specification is met
• Solution
  • Create a simple model program per instance
  • Check if model program meets specification
    • Using model checking techniques
  • Do all this efficiently

Keeping this Practical
• Most components are used through calls
  • Control flow determines call sequences
  • Data flow determines which calls
• Most component usage is single threaded
  • Can often ignore thread interactions
• This is simpler than the general problem
  • Need to track fewer variables
  • Need to worry less about variable values
  • Need to worry less about interweaving

CHET Overview
[Diagram; labels: Specifications, Flow Analysis, Application, Instances, Abstract Program Builder, Program Checker, Report]

Iterator Usage
E1 (New) -- TRIGGER   RETURN(iterator)        C1 = result
E2 (HasNext)          CALL(hasNext)           this = C1
E3 (Next)             CALL(next)              this = C1
E4 (HasMore)          CALL(hasMoreElements)   this = C1
E5 (NextElt)          CALL(nextElement)       this = C1
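The slide's automaton diagram did not survive in the text, so the following is only a minimal sketch of how events such as E1 (New), E2 (HasNext), and E3 (Next) could drive a finite automaton. The class name, the state names, and the rule that every next() must be preceded by hasNext() are assumptions made for illustration, not CHET's actual specification.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: a small automaton over iterator events (not CHET's real spec). */
final class IteratorSpecSketch {
    enum Event { NEW, HAS_NEXT, NEXT }
    enum State { START, UNCHECKED, CHECKED, ERROR }

    /** Transition function; "next() needs a preceding hasNext()" is an assumed policy. */
    static State step(State s, Event e) {
        switch (s) {
            case START:
                return e == Event.NEW ? State.UNCHECKED : State.ERROR;
            case UNCHECKED:
                return e == Event.HAS_NEXT ? State.CHECKED : State.ERROR;
            case CHECKED:
                if (e == Event.HAS_NEXT) return State.CHECKED;
                if (e == Event.NEXT) return State.UNCHECKED;
                return State.ERROR;
            default:
                return State.ERROR; // ERROR is absorbing
        }
    }

    /** A trace is accepted when it never drives the automaton into ERROR. */
    static boolean accepts(List<Event> trace) {
        State s = State.START;
        for (Event e : trace) {
            s = step(s, e);
        }
        return s != State.ERROR;
    }

    public static void main(String[] args) {
        // Trigger, then hasNext(), then next(): allowed by the assumed policy.
        System.out.println(accepts(Arrays.asList(Event.NEW, Event.HAS_NEXT, Event.NEXT))); // true
        // next() without hasNext(): rejected by the assumed policy.
        System.out.println(accepts(Arrays.asList(Event.NEW, Event.NEXT))); // false
    }
}
```

Each trace here corresponds to the events of one instance, that is, one iterator created by a trigger event.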

Comodification Checking
E1 (New) -- TRIGGER   ALLOC(Vector)       C1 = result
E2 (Add)              CALL(add)           this = C1
E3 (Add1)             CALL(addElement)    this = C1
E4 (Add2)             CALL(addAll)        this = C1
E5 (Iter)             RETURN(iterator)    this = C1, C2 = return
E6 (Next)             CALL(next)          this = C2
E7 (Next1)            CALL(nextElement)   this = C2

Xml Writer Usage
E0 (New) -- TRIGGER   ALLOC(XmlWriter)   C1 = new
E1 (Begin)            CALL(begin)        this = C1
E2 (End)              CALL(end)          this = C1
E3 (Field)            CALL(field)        this = C1
E4 (Cdata)            CALL(cdata)        this = C1
E5 (Text)             CALL(text)         this = C1
E6 (Xml)              CALL(writeXml)     this = C1
E7 (Close)            CALL(close)        this = C1

File Open-Close
E1 (Create) -- TRIGGER   ALLOC(FileWriter)   C1 = new
E2 (Open)                RETURN(<init>)      this = C1
E3 (Close)               CALL(close)         this = C1
E4 (Nest)                CALL(<init>)        arg1 = C1, C2 = this
E5 (CloseNext)           CALL(close)         this = C2
E6 (Nest1)               CALL(<init>)        arg1 = C2, C3 = this
E7 (CloseNest1)          CALL(close)         this = C3
E8 (Nest2)               CALL(<init>)        arg1 = C3, C4 = this
E9 (CloseNest2)          CALL(close)         this = C4

Catching Errors
E1 (Create) -- TRIGGER   ALLOC(Error)   C1 = new
E2 (Throw)               THROW          throw = C1
E3 (Catch)               CATCH          catch = C1

Web Crawler Library
E1 (Begin) -- TRIGGER   CALL(beginProcessing)     C1 = arg1
E2 (Open)               RETURN(openConnection)    this = C1
E3 (Save)               CALL(saveHtml)            this = C1
E4 (File)               CALL(getHtmlFile)         this = C1
E5 (Header)             CALL(saveHeader)          this = C1
E6 (Links)              CALL(saveLinks)           this = C1
E7 (Redirect)           CALL(setRedirectHtml)     this = C1
E8 (NoteErr)            CALL(setError)            this = C1
E9 (Text)               CALL(processText)         this = C1
E10 (TextBrk)           CALL(processTextBreak)    this = C1
E11 (Finish)            CALL(endProcessing)       this = C1

Nested Locks
E0 (Alloc) -- TRIGGER   ALLOC(Object)   C1 = new
E1 (Lock_X)             LOCK            lock = C1
E2 (Unlock_X)           UNLOCK          lock = C1
E3 (Lock_Y)             LOCK            C2 = new
E4 (Unlock_Y)           UNLOCK          lock = C2

Events & Parameters
• CALL (caller this, argi, calling this)
• RETURN (this, return value)
• ENTRY (this, argi)
• FIELD (this) [set to int, null, nonnull]
• ALLOC (new object)
• CATCH (catch object)
• THROW (throw object)
• LOCK (lock object)
• UNLOCK (lock object)

Why Event-Based Specification
• ESP and others use code patterns
  • These are closer to programs
  • And hence easier to understand
• However they are hard to generalize
  • Iterator can use nextElement or next
  • Nested opens and alternatives
  • Xml writer alternatives
• Events and automata generalize
  • Easy to define abstract patterns
  • Still understandable by programmers

Finding All Instances
• Done using flow analysis
  • Of the program and its libraries
  • Handling all specifications at once
• Each trigger event yields a source
  • We determine where this source can flow
  • This determines which events are relevant
    • To this particular instance
• But it's not that easy
  • Multiple-parameter events
  • Accurate flow and type analysis required

Example

    Vector<A> v = …
    Iterator<A> it = v.iterator();
    while (it.hasNext()) {
        A x = it.next();
        …
    }
    …
    for (it = v.iterator(); it.hasNext(); ) {
        A y = it.next();
        …
    }

Slide callouts: Trigger, Source

Flow Analysis
• Identify sources
  • From trigger events
• Tracking sources and where they flow
  • Through symbolic execution
• Result: Determine at each location
  • What sources are used
• This lets us check event parameters
  • Trigger source used on call => event

Flow Analysis Goals
• Complete analysis
  • Ensure we track all possible uses of a source
  • Must include libraries as well as user code
• Accurate analysis
  • Must know types for virtual calls
  • Must understand full Java semantics
  • Must handle all methods (including native, etc.)

Flow Analysis Techniques
• Done at the byte code level
  • Using a work queue approach
• Full interprocedural flow analysis
  • Of user code and libraries
  • Handling all the complexities of Java
• Selectively context sensitive
  • Tracking types and values
  • Through symbolic execution
  • Flow sensitive, not path sensitive
• Tradeoff accuracy and speed
  • Accuracy where important, speed otherwise
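As a rough illustration of the work queue approach, the sketch below propagates sets of sources along the edges of a small flow graph until nothing changes. Everything in it is hypothetical: the node names, the string-labelled sources, and the single-graph setting. The analysis described on the slide works on byte code, is interprocedural, and also tracks types and values, none of which is modelled here.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of work-queue propagation of sources over a flow graph. */
final class WorkQueueFlowSketch {
    /**
     * edges: location -> successor locations; generated: location -> sources created there.
     * Returns, for each reached location, the set of sources that may flow to it.
     */
    static Map<String, Set<String>> propagate(Map<String, List<String>> edges,
                                              Map<String, Set<String>> generated) {
        Map<String, Set<String>> reaching = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (Map.Entry<String, Set<String>> g : generated.entrySet()) {
            reaching.put(g.getKey(), new TreeSet<>(g.getValue()));
            queue.add(g.getKey());
        }
        while (!queue.isEmpty()) {
            String node = queue.remove();
            Set<String> out = reaching.get(node);
            for (String succ : edges.getOrDefault(node, Collections.emptyList())) {
                Set<String> in = reaching.computeIfAbsent(succ, k -> new TreeSet<>());
                if (in.addAll(out)) {
                    queue.add(succ); // re-queue only when the successor's set grew
                }
            }
        }
        return reaching;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("iteratorReturn", Arrays.asList("loopHead"));
        edges.put("loopHead", Arrays.asList("nextCall", "afterLoop"));
        edges.put("nextCall", Arrays.asList("loopHead"));
        Map<String, Set<String>> generated = new HashMap<>();
        generated.put("iteratorReturn", Collections.singleton("M")); // trigger yields source M
        System.out.println(propagate(edges, generated).get("nextCall"));
    }
}
```

Because a node is re-queued only when its set grows, and the sets can only grow up to the finite set of sources, the loop reaches a fixed point even when the graph has cycles.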

Flow Analysis Issues
• Speed versus accuracy
  • Start with the minimum possible
  • Add more information to get needed accuracy
• What to track
  • Trigger sources; all other sources
• Java issues that arose
  • Static initializers
  • Constructors
  • Native methods
  • Reflection
  • Callbacks
  • Data structures
  • Exceptions

What to Track: Sources
• Local Sources
  • Anything generated via a new operator
  • Track values stored in fields of the source
• Array Sources
  • Created by new array operators
  • Track values stored in the array
• Fixed Sources
  • Results from native methods, built-in values
  • Can be mutable (changed on a cast)

Sources
• Model Sources
  • Generated by trigger events
  • One-to-one association with instances
• Field Sources
  • Track the values of fields
  • Only for fields used in specifications
  • Determine where the fields are used
• Others
  • Privacy, …

Values
• Flow analysis deals with values
  • These are sets of sources
  • Associated with each field, local, stack, …
• Value contains additional information
  • Data type (for type analysis)
  • CanBe or MustBe NULL flags
  • Integer value range (or indefinite)
• Operations applied symbolically
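One way to picture such a value is as a small immutable record that is joined wherever two control-flow paths meet. The sketch below is an assumption-laden illustration: the class name, the string-labelled sources, and the merge rules are hypothetical, and the data type component is left out for brevity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a flow-analysis value: sources, null flags, integer range. */
final class FlowValueSketch {
    final Set<String> sources;   // sources that may reach this value
    final boolean canBeNull;
    final boolean mustBeNull;
    final long min;              // Long.MIN_VALUE..Long.MAX_VALUE stands for "indefinite"
    final long max;

    FlowValueSketch(Set<String> sources, boolean canBeNull, boolean mustBeNull,
                    long min, long max) {
        this.sources = Collections.unmodifiableSet(new TreeSet<>(sources));
        this.canBeNull = canBeNull;
        this.mustBeNull = mustBeNull;
        this.min = min;
        this.max = max;
    }

    /** A non-null value that may come from the given sources, with an indefinite range. */
    static FlowValueSketch nonNull(String... sources) {
        return new FlowValueSketch(new TreeSet<>(Arrays.asList(sources)),
                false, false, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    /** The value of the null constant. */
    static FlowValueSketch nullValue() {
        return new FlowValueSketch(new TreeSet<String>(),
                true, true, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    /** Join at a control-flow merge: keep everything that either path allows. */
    FlowValueSketch merge(FlowValueSketch other) {
        Set<String> all = new TreeSet<>(sources);
        all.addAll(other.sources);
        return new FlowValueSketch(all,
                canBeNull || other.canBeNull,    // null on either path means it can be null
                mustBeNull && other.mustBeNull,  // definitely null only if null on both paths
                Math.min(min, other.min),
                Math.max(max, other.max));
    }
}
```

For example, merging `FlowValueSketch.nonNull("M1")` with `FlowValueSketch.nullValue()` yields a value whose sources are {M1}, that can be null, and that is no longer known to be null.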

Static Initializers
• Problem
  • Called implicitly at first use
  • Must return before class can be used
    • Accurate field analysis requires this
    • But it can call methods of the class
  • Some classes initialized by JVM
• Solution
  • Track whether initializer has been started
  • Add some system classes by default
  • Don’t process methods before started

Constructors
• Problem
  • Most methods assume constructor done
  • Accurate field analysis requires this
  • But constructors can be quite complex
• Solution
  • Track current set of constructors we are in
  • Only process method if
    • We have constructed an object of this class, OR
    • We are called from within the constructor

Native & Reflexive Methods
• Problem
  • These are hidden from static analysis
• Solution 1: Default handling
  • Use a fixed source of return type
  • Use mutable sources where appropriate
• Solution 2: Internal special handling
  • arraycopy: copy array values

Native & Reflexive Methods
• Solution 3: Resource-based return
  • User specifies return type in resource file
  • Can be specified as mutable
  • On a function basis
  • On a call-site basis
• Solution 4: Method substitution
  • Resource file can specify alternative method
    • Thread.start => Thread.run
    • AccessController.doPrivileged => run

Native and Reflexive Methods
• Solution 5: Ignore
  • Resource file can specify calls to ignore
  • Most calls to swing, awt, … are black boxes
  • Can be done by method, class or package
    • With exceptions
• Solution 6: User substitution
  • User can provide alternative dummy method
  • Use it as the replacement method
  • Complex uses of reflection

Callbacks
• Problem
  • Some callbacks are hidden in native code
  • Callbacks need to have proper arguments
    • For accurate analysis
  • Lots of user code is through callbacks
• Solution
  • Note callbacks in resource file
    • Associate callback method with registration
    • Provide calling sequence as well
  • Simulate callbacks with proper arguments
    • Automatically during analysis

Data Structures
• Problem
  • Maps, collections are hard to analyze
  • Expensive and inaccurate to look at code
• Solution
  • Introduce prototype sources
  • With procedural models of methods
    • Simulate what the methods do in the source
    • Don’t use the method code per se
  • Extend to iterators, etc. based on prototypes

Prototype Map
• Tracks the contents of the map
  • Can track selective key-value pairs
  • Tracks empty, non-empty, either
• Handles all the map operations
  • Updating internal contents
  • Returning appropriate values
• Returns prototype iterators
  • That are aware of prototype contents
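To make the idea of a procedural model concrete, here is a minimal sketch of what a prototype map might track: an emptiness flag plus the sources that may have been stored as keys and values. The class, its method names, and the update rules are assumptions for illustration only. CHET's prototypes also track selected key-value pairs and hand out prototype iterators, which this sketch omits.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a prototype source that models a map instead of analyzing its code. */
final class PrototypeMapSketch {
    enum Emptiness { EMPTY, NON_EMPTY, EITHER }

    private Emptiness emptiness = Emptiness.EMPTY;            // a new map starts out empty
    private final Set<String> keySources = new TreeSet<>();   // sources that may be keys
    private final Set<String> valueSources = new TreeSet<>(); // sources that may be values

    /** Models put(key, value): remember what may be stored; the map is now non-empty. */
    void put(Set<String> key, Set<String> value) {
        keySources.addAll(key);
        valueSources.addAll(value);
        emptiness = Emptiness.NON_EMPTY;
    }

    /** Models remove(key): the map may or may not be empty afterwards. */
    void remove() {
        if (emptiness == Emptiness.NON_EMPTY) {
            emptiness = Emptiness.EITHER;
        }
    }

    /** Models get(key): any stored value source may come back; nothing if known empty. */
    Set<String> get() {
        if (emptiness == Emptiness.EMPTY) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(new TreeSet<>(valueSources));
    }

    /** Models keySet(): any stored key source may appear; nothing if known empty. */
    Set<String> keys() {
        if (emptiness == Emptiness.EMPTY) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(new TreeSet<>(keySources));
    }

    Emptiness emptiness() {
        return emptiness;
    }
}
```

The analysis would call these model methods when it sees the corresponding map calls, rather than symbolically executing the library's own implementation.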

Prototypes
• Provide more accurate analysis
  • Know the type of items stored in table
  • Avoid merging of multiple tables
  • Know when tables are null and not
• Provide more efficient analysis
  • Speed up of 30%
• Are relatively easy to implement
  • Collections: < 900 lines of source
  • Maps: < 500 lines of source

Exceptions
• Problem
  • Normal exceptions are easy to handle
  • What to do with hidden exceptions
    • catch (Throwable …)
    • Synchronized regions
• Solution
  • Restrict analysis to explicit exceptions
    • Unless explicitly told not to

Finding One Instance
• Trigger event => Model Source
  • This determines the basic instance
• Where model source flows
  • Determines event locations
    • Based on event type
    • Based on event parameters

Example
• E1 is the trigger
  • Provides a model source M
• E2 occurs whenever
  • M flows to a call to Iterator.hasNext
• E3, E4, E5 similarly

E1 (New) -- TRIGGER   RETURN(iterator)        C1 = result
E2 (HasNext)          CALL(hasNext)           this = C1
E3 (Next)             CALL(next)              this = C1
E4 (HasMore)          CALL(hasMoreElements)   this = C1
E5 (NextElt)          CALL(nextElement)       this = C1

Multi-Parameter Specifications
• Find all possible instances (statically)
• Start with model source for trigger
• Find all locations for next NEW event
  • Based on flow of the model source
  • Build a new instance for the source pair
• Continue to handle additional NEW sources
• Note that we have to consider all sets
  • And not just complete sets

Example
• E1 is the trigger
  • Model source M
• Writer constructor call
  • With M as arg1
  • Yields new source M1
  • Instance <M, M1>
• If additional call
  • With M1 as arg1
  • Build new instance

E1 (Create)       ALLOC(FileWriter)   C1 = new
E2 (Open)         RETURN(<init>)      this = C1
E3 (Close)        CALL(close)         this = C1
E4 (Nest)         CALL(<init>)        arg1 = C1, C2 = this
E5 (CloseNext)    CALL(close)         this = C2
E6 (Nest1)        CALL(<init>)        arg1 = C2, C3 = this
E7 (CloseNest1)   CALL(close)         this = C3
E8 (Nest2)        CALL(<init>)        arg1 = C3, C4 = this
E9 (CloseNest2)   CALL(close)         this = C4

Where Are We
• We have
  • Specified how components should be used
    • Using parameterized automata
  • Found all instances of each specification
    • Using detailed flow analysis
• Next we need to
  • Check each instance
    • By creating a model program
    • And looking at all its possible executions

Checking an Instance
• Build an abstract program for each instance
  • Using flow-sensitive analysis
  • Abstract program organized into routines
• Abstract program generates event sequences
  • Some nodes output events
  • Determine all event sequences that can be generated
  • Ensure that they are all valid with respect to the specification

Abstract Programs
• Methods represented by automata
  • Each defined as a directed graph
  • Nodes of the graph represent actions
  • Arcs represent nondeterministic traversal
• Control flow embedded in nodes
  • Calls, asynchronous calls
  • Actions can do tests (on variables, returns)
  • Actions can dead-end
• An if is represented as two test nodes

Sample Program
[Figure: sample abstract program; not recoverable from the slide text]

Sample Conditional
[Figure: sample conditional; not recoverable from the slide text]

Abstract Program Actions
• Enter a routine
• Exit a routine
• Call a routine
• Generate a particular event
• Set a variable to a given value
  • Correspond to program variables
• Set the return value of the routine
• Test a variable or return value for a value
• Exit (call to System.exit)
• Asynchronous call of a routine
• Begin synchronized region
• End synchronized region
• (Wait, Notify) events

Abstract Program Variables
• Which variables are used in the program
  • Can be given as part of the specification
  • Otherwise determined automatically
    • Using a separate cursory flow analysis
    • Determine which fields directly affect event generation in the abstract program
      • Conditional using field branches around event
• This is done before building the program

Simplifying Abstract Programs
• Simplification essential for fast checking
• Eliminate routines obviously not used
  • Through a quick transitive closure check
• Then apply FSA minimization techniques
  • Throw away nodes with no effects
  • Combine nodes where possible
• No effects
  • If no thread starts, then all thread operations
  • If no conditionals for a variable (return), no sets
  • Conditional without internal nodes
  • Enter-exit only for a routine
  • Call of empty routine

Checking the Abstract Program
• Find all possible event sequences
• Compute all possible runs of the program
  • Determining what state each run leaves the specification automaton in
  • State = automaton state + variable values
• For each routine and start state
  • Determine the possible states at each node
  • Determine the set of final states
• This can then be applied recursively
  • Context-free model checking
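The per-routine step can be sketched as a reachability computation over pairs of (program node, specification state). The code below is a simplified, hypothetical illustration: it handles one routine, ignores variable values and calls (so it is not the recursive, context-free version), and encodes an assumed iterator specification as a transition map. All names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: which specification states can one routine end in? */
final class AbstractProgramCheckSketch {
    static final int ERROR = -1; // reached when the spec has no transition for an event

    /** A node of the abstract program: an optional event and nondeterministic successors. */
    static final class Node {
        final String event;        // null when the node generates no event
        final List<Integer> next;  // indices of successor nodes
        Node(String event, Integer... next) {
            this.event = event;
            this.next = Arrays.asList(next);
        }
    }

    /** Explores every (node, state) pair reachable from node 0 in the given start state. */
    static Set<Integer> finalStates(List<Node> routine, int exitNode,
                                    Map<Integer, Map<String, Integer>> spec, int startState) {
        Set<Integer> finals = new TreeSet<>();
        Set<List<Integer>> seen = new HashSet<>();
        Deque<List<Integer>> work = new ArrayDeque<>();
        work.add(Arrays.asList(0, startState));
        while (!work.isEmpty()) {
            List<Integer> config = work.remove();
            if (!seen.add(config)) continue;      // already explored this pair
            int node = config.get(0);
            int state = config.get(1);
            Node n = routine.get(node);
            if (n.event != null && state != ERROR) {
                Map<String, Integer> row = spec.getOrDefault(state, Collections.emptyMap());
                state = row.getOrDefault(n.event, ERROR);
            }
            if (node == exitNode) finals.add(state);
            for (int succ : n.next) work.add(Arrays.asList(succ, state));
        }
        return finals;
    }

    /** Assumed spec: 0 -new-> 1, 1 -hasNext-> 2, 2 -hasNext-> 2, 2 -next-> 1. */
    static Map<Integer, Map<String, Integer>> iteratorSpec() {
        Map<Integer, Map<String, Integer>> spec = new HashMap<>();
        spec.put(0, Collections.singletonMap("new", 1));
        spec.put(1, Collections.singletonMap("hasNext", 2));
        Map<String, Integer> checked = new HashMap<>();
        checked.put("hasNext", 2);
        checked.put("next", 1);
        spec.put(2, checked);
        return spec;
    }

    public static void main(String[] args) {
        // while (it.hasNext()) { it.next(); } as an abstract program; node 3 is the exit.
        List<Node> loop = Arrays.asList(new Node("new", 1), new Node("hasNext", 2, 3),
                new Node("next", 1), new Node(null));
        System.out.println(finalStates(loop, 3, iteratorSpec(), 0)); // [2]
    }
}
```

A final-state set that contains ERROR would mean that some run of the abstract program violates the assumed specification. Since the set of (node, state) pairs is finite, the exploration terminates even for loops.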

Checking Threaded Programs
• Identify each potential thread (asynch)
• Build a single automaton for that thread
  • By inlining the calls of the various routines
  • By merging nested calls beyond a level
• Then extend the state for checking
  • To include all thread states
  • But limit the number of threads of each type

Finding an Example Run
• Once we know a final state
  • We want a concrete execution sequence
• Done as a breadth-first search
  • Over abstract program execution
  • Find a shortest execution
  • May give up if too costly
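A breadth-first search with a cost cutoff might look like the sketch below. It is only an illustration under stated assumptions: execution states are reduced to integer ids, the successor map is given up front, and the `maxVisited` budget is a hypothetical stand-in for whatever "too costly" means in the real tool.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: shortest example run found by breadth-first search. */
final class ExampleRunSketch {
    /**
     * Returns a shortest path of execution states from start to target,
     * or an empty list if none is found before the budget is used up.
     */
    static List<Integer> shortestRun(Map<Integer, List<Integer>> successors,
                                     int start, int target, int maxVisited) {
        Map<Integer, Integer> parent = new HashMap<>(); // also serves as the visited set
        Deque<Integer> queue = new ArrayDeque<>();
        parent.put(start, start);
        queue.add(start);
        while (!queue.isEmpty() && parent.size() <= maxVisited) {
            int current = queue.remove();
            if (current == target) {
                LinkedList<Integer> path = new LinkedList<>();
                int n = current;
                while (n != start) {      // walk parent links back to the start
                    path.addFirst(n);
                    n = parent.get(n);
                }
                path.addFirst(start);
                return path;
            }
            for (int next : successors.getOrDefault(current, Collections.emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();   // unreachable, or gave up as too costly
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = new HashMap<>();
        g.put(0, Arrays.asList(1, 2));
        g.put(1, Arrays.asList(3));
        g.put(2, Arrays.asList(3));
        g.put(3, Arrays.asList(4));
        System.out.println(shortestRun(g, 0, 4, 100)); // [0, 1, 3, 4]
    }
}
```

Because breadth-first search visits states in order of distance from the start, the first time the target is dequeued the recovered path is a shortest one.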

Interactive Result Viewer
[Figure: screenshot of the interactive result viewer; not recoverable from the slide text]

Experience
• Checking CHET on CLIME
  • 521 tests, 21 errors detected (789/29)
• Coverage
  • Manually verify all instances identified
• Correctness
  • Check all errors, ensure valid
  • Spot check others
  • Check known failing cases
• False positives
  • About 1/3 of errors flagged are false positives
    • Generally due to erroneous control flow (exceptions)
      • System.exit
    • Much less frequently due to overestimation of flows

Experience -- Timings
[Figure or table: timing results; not recoverable from the slide text]

For More Information
• Steven Reiss
  • spr@cs.brown.edu
• Software, papers, etc.
  • http://www.cs.brown.edu/people/spr

Questions / Comments