Bandera: Extracting Finite-state Models from Java Source Code
Faculty: James Corbett, Matthew Dwyer, John Hatcliff
Students and post-docs: Shawn Laubach, Corina Pasareanu, Robby, Hongjun Zheng, Roby Joehanes, Ritesh Desai, Venkatesh Ranganath, Oksana Tkachuk
Goal: Increase Software Reliability
Trends: size, complexity, concurrency, distribution
Cost of software engineer: …
Cost of CPU cycle: …
Future: Automated Fault Detection
The Dream
Inputs: a program and its requirements (Property 1: …, Property 2: …, …).
    void add(Object o) { buffer[head] = o; head = (head+1)%size; }
    Object take() { … tail = (tail+1)%size; return buffer[tail]; }
Output of the checker: either "OK" or an error trace.
Model Checking
Inputs: a finite-state model and a temporal logic formula.
Output of the model checker: either "OK" or an error trace (Line 5: …, Line 12: …, Line 15: …, Line 21: …, Line 25: …, Line 27: …, Line 41: …, Line 47: …).
Why use Model Checking?
- Automatically check, e.g.,
  - invariants, simple safety and liveness properties,
  - absence of deadlock and livelock,
  - complex event sequencing properties, such as "Between the window open and the window close, button X can be pushed at most twice."
- In contrast to testing, it gives complete coverage by exhaustively exploring all paths in the system.
- It has been used for years with good success in hardware and protocol design.
This suggests that model checking can complement existing software quality assurance techniques.
What makes model-checking software difficult?
[Diagram: a finite-state model and a temporal logic formula go into the model checker, which reports "OK" or an error trace.]
Problems using existing checkers:
- Model construction
- Property specification
- State explosion
- Output interpretation
Model Construction Problem
[Diagram: a gap separates the program (the add/take buffer code) from the model description accepted by the model checker.]
- Semantic gap:
  - Programming languages: methods, inheritance, dynamic creation, exceptions, etc.
  - Model description languages: automata
Property Specification Problem
- It is difficult to formalize a requirement in temporal logic
"Between the window open and the window close, button X can be pushed at most twice."
…is rendered in LTL as...
    []((open /\ <>close) -> ((!pushX /\ !close) U (close \/ ((pushX /\ !close) U (close \/ (!pushX U close)))))
Property Specification Problem
Forced to state the property in terms of the model rather than the source:
- We want to write source-level specifications...
    Heap.b.head == Heap.b.tail
- We are forced to write model-level specifications...
    (((_collect(heap_b) == 1) && (BoundedBuffer_col.instance[_index(heap_b)].head == BoundedBuffer_col.instance[_index(heap_b)].tail))
    || ((_collect(heap_b) == 3) && (BoundedBuffer_col_0.instance[_index(heap_b)].head == BoundedBuffer_col_0.instance[_index(heap_b)].tail))
    || ((_collect(heap_b) == 0) && TRAP))
State Explosion Problem
- Cost is exponential in the number of components: N bits x1, …, xN give 2^N states
- Moore's law and algorithm advances can help
  - Holzmann: 7 days (1980) ==> 7 seconds (2000)
- Explosive state growth in software limits scalability
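To make the 2^N figure concrete, here is a minimal Java sketch of explicit-state exploration over N independent bits. The class and method names (`StateExplosionDemo`, `reachableStates`) are invented for illustration and are not part of Bandera; the toy transition relation ("any one bit may be set in a step") is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

final class StateExplosionDemo {
    /**
     * Breadth-first search from the all-zero state, where one step may set
     * any single bit. Returns the number of distinct states visited.
     * Intended for small n only (memory grows as 2^n).
     */
    static long reachableStates(int n) {
        Set<Long> seen = new HashSet<>();
        ArrayDeque<Long> frontier = new ArrayDeque<>();
        seen.add(0L);
        frontier.add(0L);
        while (!frontier.isEmpty()) {
            long state = frontier.poll();
            for (int bit = 0; bit < n; bit++) {
                long next = state | (1L << bit);
                if (seen.add(next)) {
                    frontier.add(next); // first visit: explore later
                }
            }
        }
        return seen.size();
    }
}
```

Each added bit doubles the size of the visited set, which is why reducing the number of components before checking matters more than raw checker speed.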
Output Interpretation Problem
[Diagram: a gap separates the program (the add/take buffer code) from the model description; the error trace lists lines 5, 12, 15, 21, 25, 27, 41, 47, ….]
- Raw error trace may be thousands of steps long
- Must map line listing onto model description
- Mapping to source is made difficult by
  - semantic gap and clever encodings of complex features
  - multiple optimizations and transformations
Bandera: An open tool set for model-checking Java source code
[Diagram components: Graphical User Interface, Optimization Control, Checker Inputs, Java Source (the add/take buffer code), Bandera Temporal Specification, Transformation & Abstraction Tools, Model Checkers, Error Trace Mapping, Checker Outputs.]
Addressing the Model Construction Problem
[Diagram components: Java Source (the add/take buffer code), Static Analyses, Abstract Interpretation, Slicing, Optimizations, Model Compiler, Model Description.]
Model extraction: compiling to model checker inputs
- Numerous analyses, optimizations, two intermediate languages, multiple back-ends
- Slicing, abstract interpretation, specialization
- Variety of usage modes: simple ... highly tuned
Addressing the Property Specification Problem
An extensible language based on field-tested temporal property specification patterns
    []((open /\ <>close) -> ((!pushX /\ !close) U (close \/ ((pushX /\ !close) U (close \/ (!pushX U close)))))
Using the pattern system: 2-bounded existence
    Between {open} and {close} {pushX} exists atMost {2} times;
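One way to read the bounded-existence pattern operationally is as a monitor over a finite event trace. The sketch below is only an illustration in plain Java, not Bandera's pattern implementation: the class name `BoundedExistenceMonitor`, the event strings, and the finite-trace reading are all assumptions. It counts individual `pushX` events, whereas the LTL rendering constrains runs of states, and it treats a scope that is never closed as imposing no obligation (mirroring the `<>close` premise).

```java
import java.util.List;

final class BoundedExistenceMonitor {
    /**
     * Returns true if no open..close segment of the trace contains more
     * than maxPushes "pushX" events. Events outside a segment are ignored.
     */
    static boolean holds(List<String> trace, int maxPushes) {
        boolean inScope = false;
        int pushes = 0;
        for (String event : trace) {
            switch (event) {
                case "open":
                    if (!inScope) { // the earliest open starts the scope
                        inScope = true;
                        pushes = 0;
                    }
                    break;
                case "pushX":
                    if (inScope) pushes++;
                    break;
                case "close":
                    if (inScope && pushes > maxPushes) return false;
                    inScope = false;
                    break;
                default:
                    break; // unrelated event
            }
        }
        return true; // an unclosed scope imposes no obligation
    }
}
```

Calling `holds(trace, 2)` corresponds to the `atMost {2}` instance of the pattern shown above.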
Addressing the State Explosion Problem
[Diagram: the property and the Java source feed the model compiler, which emits model descriptions.]
- Generate models customized with respect to the property!
- Result: multiple models, even as many as one per property
- Aggressive customization via slicing and abstract interpretation
Addressing the Output Interpretation Problem
[Diagram components: Java Source (the add/take buffer code), Model Compiler + simulator, Intermediate Representations, Model Description, Model Checker, Error trace (Line 5: …, 12: …, 15: …, 21: …).]
Like a debugger: error traces are mapped back to source
- Run error traces forwards and backwards
- Program state queried
- Heap structures navigated
- Locks, wait sets, blocked sets displayed
Bandera Architecture
[Diagram components: Java, Jimple Parser, Property Tool, Slicer, Analyses, Abstraction Engine, BIRC, BIR, Translators (SPIN, dSPIN, SMV, JPF), Simulator, Error Trace Display.]
Property Specification
    /**
     * @observable
     *   EXP Full: (head == tail);
     */
    class BoundedBuffer {
      Object[] buffer;
      int head, tail, bound;
      public synchronized void add(Object o) {…}
      public synchronized Object take() {…}
    }
Requirement: If a buffer becomes full, it will eventually become non-full.
Bandera Specification:
    FullToNonFull: forall[b:BoundedBuffer]. {!Full(b)} responds to {Full(b)} globally;
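The "responds to ... globally" pattern can be illustrated on a finite sequence of observed states. This is a hedged sketch, not how Bandera or its back-end checkers evaluate the property: `ResponseCheck`, `holds`, and `fullToNonFull` are hypothetical names, and a finite trace only approximates the infinite-run semantics of the temporal pattern.

```java
final class ResponseCheck {
    /**
     * stimulus[i] and response[i] give the truth of the two predicates in
     * state i (arrays assumed to have equal length). Returns true if every
     * stimulus state is matched by a response in the same or a later state.
     */
    static boolean holds(boolean[] stimulus, boolean[] response) {
        boolean pending = false;
        for (int i = 0; i < stimulus.length; i++) {
            if (stimulus[i]) pending = true;   // obligation opened
            if (response[i]) pending = false;  // obligation discharged
        }
        return !pending;
    }

    /** FullToNonFull for one buffer: stimulus is Full, response is !Full. */
    static boolean fullToNonFull(boolean[] full) {
        boolean[] nonFull = new boolean[full.length];
        for (int i = 0; i < full.length; i++) nonFull[i] = !full[i];
        return holds(full, nonFull);
    }
}
```

For this particular property the check reduces to "the trace does not end in a full state", which is a reminder of how little a single finite trace says about a liveness requirement.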
Property-directed Slicing
[Diagram: source program and resulting slice; the slice keeps statements mentioned in the property and statements indirectly relevant to them.]
- Slicing criterion generated automatically from observables mentioned in the property
- Backwards slicing automatically finds all components that might influence the observables
Property-directed Slicing
    /**
     * @observable EXP Full: (head == tail)
     */
    class BoundedBuffer {
      Object[] buffer_;
      int bound;
      int head, tail;

      public synchronized void add(Object o) {
        while (tail == head)
          try { wait(); } catch (InterruptedException ex) {}
        buffer_[head] = o;
        head = (head+1) % bound;
        notifyAll();
        ...
      }
    }
Slicing criterion: all statements that assign to head, tail.
[Slide annotations mark statements as: included in slicing criterion, indirectly relevant, or removed by slicing.]
Property-directed Slicing
Dependencies for concurrent Java [SAS'99]:
- Data dependence
- Control dependence
- Interference dependence
- Synchronization dependence
- Ready dependence
[Diagram: Thread 1 and Thread 2 with statements z<0; x := 3; y = x + 1; z := 4; x := z; enter monitor(o); wait(o); notify(o); enter monitor(o), linked by the dependence kinds above.]
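Backward slicing over such dependences can be sketched as a worklist closure on a dependence graph. The Java below is a minimal illustration under stated assumptions: statements are plain string ids, all five dependence kinds are merged into a single `dependsOn` map, and the names `BackwardSlicer` and `slice` are invented here rather than taken from Bandera's slicer.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BackwardSlicer {
    /**
     * dependsOn maps a statement id to the statements it depends on (data,
     * control, interference, synchronization, or ready dependence alike).
     * Returns the criterion plus everything that may transitively
     * influence it.
     */
    static Set<String> slice(Map<String, List<String>> dependsOn, Set<String> criterion) {
        Set<String> inSlice = new LinkedHashSet<>(criterion);
        ArrayDeque<String> worklist = new ArrayDeque<>(criterion);
        while (!worklist.isEmpty()) {
            String stmt = worklist.poll();
            for (String dep : dependsOn.getOrDefault(stmt, Collections.emptyList())) {
                if (inSlice.add(dep)) {
                    worklist.add(dep); // newly relevant: follow its dependences too
                }
            }
        }
        return inSlice;
    }
}
```

Statements that never enter the returned set, such as a write to `buffer_` that nothing in the criterion depends on, are the ones a slicer is free to drop.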
Abstraction Engine
Collapses data domains via abstract interpretation:
Data domains: int is abstracted to Signs, with tokens neg (n<0), zero (n==0), pos (n>0)
Code:
    int x = 0;
    if (x == 0)
      x = x + 1;
Abstracted code:
    Signs x = zero;
    if (x == zero)
      x = pos;
Abstraction Component Functionality
[Diagram: a table of variables (x, y, done, count, …, o, b) with their concrete types (int, bool, Object, Buffer), abstract types (Signs) and inferred types (Signs, Bool, intAbs, …, Point, Buffer). Other components shown: Jimple, BASL (Bandera Abstraction Specification Language) Compiler, PVS, Abstraction Library, Abstraction Engine, Abstracted Jimple.]
Abstraction Specification
BASL specification:
    abstraction Signs abstracts int
    begin
      TOKENS = { NEG, ZERO, POS };
      abstract(n)
      begin
        n < 0  -> {NEG};
        n == 0 -> {ZERO};
        n > 0  -> {POS};
      end
      operator + add
      begin
        (NEG , NEG)  -> {NEG} ;
        (NEG , ZERO) -> {NEG} ;
        (ZERO, NEG)  -> {NEG} ;
        (ZERO, ZERO) -> {ZERO} ;
        (ZERO, POS)  -> {POS} ;
        (POS , ZERO) -> {POS} ;
        (POS , POS)  -> {POS} ;
        (_, _) -> {NEG, ZERO, POS}; /* case (POS, NEG), (NEG, POS) */
      end
Compiled:
    public class Signs {
      public static final int NEG = 0;  // mask 1
      public static final int ZERO = 1; // mask 2
      public static final int POS = 2;  // mask 4
      public static int abstract(int n) {
        if (n < 0) return NEG;
        if (n == 0) return ZERO;
        if (n > 0) return POS;
      }
      public static int add(int arg1, int arg2) {
        if (arg1==NEG && arg2==NEG) return NEG;
        if (arg1==NEG && arg2==ZERO) return NEG;
        if (arg1==ZERO && arg2==NEG) return NEG;
        if (arg1==ZERO && arg2==ZERO) return ZERO;
        if (arg1==ZERO && arg2==POS) return POS;
        if (arg1==POS && arg2==ZERO) return POS;
        if (arg1==POS && arg2==POS) return POS;
        return Bandera.choose(7); /* case (POS, NEG), (NEG, POS) */
      }
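A self-contained variant of the Signs abstraction that compiles with a stock Java compiler might look like the sketch below. It is an illustration only, not Bandera's generated code: the abstraction function is named `abs` because `abstract` is a reserved word in Java, and `addMask` and `tokens` are hypothetical helpers that return the set of possible result tokens as a bit mask (bit t set means token t is possible, matching the "mask 1/2/4" comments) instead of calling `Bandera.choose`.

```java
import java.util.ArrayList;
import java.util.List;

final class SignsSketch {
    static final int NEG = 0, ZERO = 1, POS = 2;

    /** Abstraction function: maps a concrete int to its sign token. */
    static int abs(int n) {
        if (n < 0) return NEG;
        if (n == 0) return ZERO;
        return POS;
    }

    /** Abstract "+": bit mask of the tokens the sum may have. */
    static int addMask(int a, int b) {
        if (a == ZERO) return 1 << b; // zero + b has the sign of b
        if (b == ZERO) return 1 << a; // a + zero has the sign of a
        if (a == b) return 1 << a;    // same sign is preserved
        return 0b111;                 // (POS, NEG) or (NEG, POS): any sign
    }

    /** Expands a mask into the tokens a model checker would branch over. */
    static List<Integer> tokens(int mask) {
        List<Integer> out = new ArrayList<>();
        for (int t = NEG; t <= POS; t++) {
            if ((mask & (1 << t)) != 0) out.add(t);
        }
        return out;
    }
}
```

For example, `tokens(addMask(abs(0), abs(1)))` yields the single token `POS`, while `addMask(POS, NEG)` yields the mask 7, the case where the checker must explore all three outcomes.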
Specification Creation Tools
BASL specification:
    abstraction Signs abstracts int
    begin
      TOKENS = { NEG, ZERO, POS };
      abstract(n)
      begin
        n < 0  -> {NEG};
        n == 0 -> {ZERO};
        n > 0  -> {POS};
      end
      operator + add
      begin
        (NEG , NEG)  -> {NEG} ;
        (NEG , ZERO) -> {NEG} ;
        (ZERO, NEG)  -> {NEG} ;
        (ZERO, ZERO) -> {ZERO} ;
        (ZERO, POS)  -> {POS} ;
        (POS , ZERO) -> {POS} ;
        (POS , POS)  -> {POS} ;
        (_, _) -> {NEG, ZERO, POS};
      end
Automatic generation
Example: start safe, then refine: +(NEG, NEG) = {NEG, ZERO, POS}
Proof obligations submitted to PVS...
    Forall n1, n2: neg?(n1) and neg?(n2) implies not pos?(n1+n2)
    Forall n1, n2: neg?(n1) and neg?(n2) implies not zero?(n1+n2)
    Forall n1, n2: neg?(n1) and neg?(n2) implies not neg?(n1+n2)
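The refinement idea can be illustrated by bounded enumeration: a token stays in the result set for +(NEG, NEG) only if some pair of negative inputs actually produces it. The sketch below is a hedged stand-in for the PVS step, not a reimplementation of it. Enumerating a finite range can find witnesses but proves nothing about all integers, and the names `RefineByTesting` and `observedForNegPlusNeg` are invented for this example. It uses `long` arithmetic on small values so that Java's integer wraparound does not interfere.

```java
final class RefineByTesting {
    static final int NEG = 0, ZERO = 1, POS = 2;

    /** Sign token of a concrete value. */
    static int sign(long n) {
        return n < 0 ? NEG : (n == 0 ? ZERO : POS);
    }

    /**
     * Bit mask of the sign tokens observed for n1 + n2 when both operands
     * range over [-bound, -1]. Bit t set means token t was witnessed.
     */
    static int observedForNegPlusNeg(int bound) {
        int mask = 0;
        for (long n1 = -bound; n1 < 0; n1++) {
            for (long n2 = -bound; n2 < 0; n2++) {
                mask |= 1 << sign(n1 + n2);
            }
        }
        return mask;
    }
}
```

On a small range only the NEG bit is ever set, which agrees with the refined rule (NEG, NEG) -> {NEG}: the first two obligations above hold, and the third ("not neg?") is the one that cannot be discharged.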
Abstraction Library
Current library contains:
- Range(i, j): i..j modeled precisely, e.g.,
  - Range(0, 0) is the signs abstraction
  - Range(2, 4) has tokens {lt2, 2, 3, 4, gt4}
- Modulo(k), e.g.,
  - Modulo(2) is the even-odd abstraction
- Specific(v, …): identifies values of interest, e.g.,
  - Specific(10) has tokens {eq10, not10}
- User extendable for base type predicates
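The three library abstractions listed above can each be read as a function from a concrete int to a token. A minimal Java sketch follows, assuming string tokens named as on the slide; `LibraryAbstractions` and its method signatures are hypothetical and say nothing about how Bandera's library is actually organized.

```java
final class LibraryAbstractions {
    /** Range(lo, hi): lo..hi are kept exactly; the rest collapse to two tokens. */
    static String range(int lo, int hi, int n) {
        if (n < lo) return "lt" + lo;
        if (n > hi) return "gt" + hi;
        return Integer.toString(n);
    }

    /** Modulo(k): the token is the non-negative remainder class of n. */
    static int modulo(int k, int n) {
        return Math.floorMod(n, k);
    }

    /** Specific(v): distinguishes the value of interest from everything else. */
    static String specific(int v, int n) {
        return n == v ? "eq" + v : "not" + v;
    }
}
```

With these definitions `range(0, 0, n)` yields `lt0`, `0`, or `gt0`, which is the signs abstraction under different token names.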
Back End
- Bandera Intermediate Representation (BIR)
  - guarded command language
  - includes: locks, threads, references, heap
  - info to help translators (live vars, invisible)
Jimple:
    entermonitor r0
    r1.count = 0;
    …
BIR:
    loc s5: live { r0, r1 }
      when lockAvail(r0.lock) do { lock(r0.lock); } goto s6;
    loc s6: live { r1 }
      when true do invisible { r1.count = 0; } goto s7;
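The "when guard do action goto target" shape of a BIR location can be mimicked in a few lines of Java. This is a rough sketch for intuition only: `GuardedCommandSketch`, `Lock`, `Transition`, and `tryFire` are made-up names, the lock is not reentrant, and nothing here reflects BIR's real data structures or scheduling.

```java
import java.util.function.BooleanSupplier;

final class GuardedCommandSketch {
    /** A lock that records its holder; it is free when holder is null. */
    static final class Lock {
        String holder;
        boolean available() { return holder == null; }
    }

    /** One guarded transition: "when guard do action goto target". */
    static final class Transition {
        final BooleanSupplier guard;
        final Runnable action;
        final String target;

        Transition(BooleanSupplier guard, Runnable action, String target) {
            this.guard = guard;
            this.action = action;
            this.target = target;
        }

        /** Runs the action and returns the target if enabled; null if blocked. */
        String tryFire() {
            if (!guard.getAsBoolean()) return null;
            action.run();
            return target;
        }
    }
}
```

A transition built as `new Transition(lock::available, () -> lock.holder = "T1", "s6")` plays the role of `loc s5`: it fires once, moving to `s6`, and is blocked on any later attempt while the lock is held.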
Bounded Buffer BIR
    process BoundedB()
    BoundedBuffer_ref = ref { BoundedBuffer_col, BoundedBuffer_col_0 };
    BoundedBuffer_rec = record {
      bound_ : range -1..4;
      head_ : range -1..4;
      tail_ : range -1..4;
      BIRLock : lock wait reentrant;
    };
    BoundedBuffer_col : collection [3] of BoundedBuffer_rec;
    BoundedBuffer_col_0 : collection [3] of BoundedBuffer_rec;
    …
    loc s34: live { b2, b1, add_JJJCTEMP_0, add_JJJCTEMP_6, add_JJJCTEMP_8 }
      when true do invisible { add_JJJCTEMP_8 := (add_JJJCTEMP_6 % add_JJJCTEMP_8); } goto s35;
    loc s35: live { b2, b1, add_JJJCTEMP_0, add_JJJCTEMP_8 }
      when true do { add_JJJCTEMP_0.head_ := add_JJJCTEMP_8; } goto s36;
    loc s36: live { b2, b1, add_JJJCTEMP_0 }
      when true do { notifyAll(add_JJJCTEMP_0.BIRLock); } goto s37;
    loc s37: live { b2, b1, add_JJJCTEMP_0 }
      when true do { unlock(add_JJJCTEMP_0.BIRLock); } goto s38;
Bounded Buffer Promela
    typedef BoundedBuffer_rec {
      type_8 bound_;
      type_8 head_;
      type_8 tail_;
      type_18 BIRLock;
    }
    …
    loc_25:
    atomic {
      printf("BIR: 25 0 1 OK\n");
      if
      :: (_collect(add_JJJCTEMP_0) == 1) -> add_JJJCTEMP_8 = BoundedBuffer_col.instance[_index(add_JJJCTEMP_0)].tail_;
      :: (_collect(add_JJJCTEMP_0) == 2) -> add_JJJCTEMP_8 = BoundedBuffer_col_0.instance[_index(add_JJJCTEMP_0)].tail_;
      :: else -> printf("BIR: 25 0 1 NullPointerException\n"); assert(0);
      fi;
      goto loc_26;
    }
Translators
- Plug-in component that interfaces to a specific model checker
  - Translates BIR to checker input language
  - Parses output of checker for error trace
- Currently
  - SPIN, dSPIN, SMV translators complete
  - JPF (from NASA Ames) integrated
  - XMC, FDR translators in progress
Case Studies
- Small examples thus far (< 2000 loc)
  - illustrating use of property-pattern system and other components
- Scheduler from DEOS real-time OS kernel
  - (1600, 22 classes, seven tasks)
- Now trying systems up to 20,000 loc
  - collection of 15 open-source 100% pure Java
  - Jigsaw web-server from W3C
  - Tomcat, James (from Apache/Jakarta)
- In general, 1-2 minutes for model extraction on (~2000 k systems)
- State space reductions can dramatically reduce cost
Summary
- Bandera provides an open platform for experimentation
- Separates model checking from extraction
  - uses existing model checkers
  - supports multiple model checkers
- Specialize models for specific properties using automated support for slicing, abstraction, etc.
- Designed for extensibility
  - well-defined internal representations and interfaces
- We hope this will contribute to the definition of APIs for software model-checkers
Other Work on Software Model-checking
- Java
  - JPF (NASA Ames)
  - JCAT (Torino)
  - Java to SAL (Stanford)
- C
  - SLAM (Microsoft Research)
  - AX, FeaVer (Lucent)
Current Status
- A reasonable subset of concurrent Java
  - not handled: recursive methods, exceptions, inner classes, native methods, libraries(*)
- Public release: October 2000
- http://www.cis.ksu.edu/santos/bandera
Demo tomorrow morning