CS 5103 Software Engineering Lecture 15 Static Bug Detection and Verification
Static bug detection
- Static bug detection is a minor approach to software quality assurance compared with testing
- Compared to testing, static bug detection:
  - Works only for specific kinds of bugs
  - Is sometimes not scalable
  - Generates false positives
  - Is easy to start (no build, no setup, no install, …)
  - Can sometimes guarantee that the software is free of certain kinds of bugs
  - Needs no debugging

State of the art: static bug detection
- Type-specific detection (a fixed specification and improvements are provided)
  - Targets major or important types of bugs: null pointer, memory leak, unsafe cast, injection, buffer overflow, dynamic SQL error, race condition, deadlock, dead loop, HTML error, UI inconsistency, i18n bugs, …
  - A large number of techniques exist for each kind of bug
  - Most of them have severe limitations that prevent practical use
- Specification-based detection
  - Model checking, symbolic execution, theorem proving

Specification
- A description of the correct behavior of software
- We must have a formal specification to do static bug detection
- Three main types of specifications
  - Value
  - Temporal
  - Data flow

Value Specification
- The value(s) of one or several variable(s) must satisfy a certain constraint
- Examples:
  - Final_Exam_Score <= 100
  - sortedlist(0) >= sortedlist(1)
  - http_url.startsWith("http")
  - Sql_query belongs to Language_SQL

Temporal Specification
- Two events (or a series of events) must happen in a certain order
- Examples:
  - lock() -> unlock()
  - file.open() -> file.close() and file.open() -> file.read()
  - They are different, right?
- Temporal logic
  - lock() -> F(unlock())
  - (!read()) U (open())

Data Flow Specification
- Data from a certain source must / must not flow to a certain sink
- Examples:
  - ! Contact Info -> Internet (contact info must not reach the Internet)
  - Password -> encryption -> Internet (a password must be encrypted before it reaches the Internet)
- Data flow specifications are mainly used for security

General Specifications
- Common behaviors of all software

  Specification                                     Bug prevented
  a/b -> b != 0                                     Divide by 0
  a.field -> a != null                              Null pointer reference
  a[x] -> x < a.length()                            Buffer overflow
  p.malloc() -> p.free()                            Memory leak
  lock(s) -> unlock(s)                              Deadlock
  while(Condition) -> F(!Condition)                 Infinite loop
  <script> xxx </script> -> ! User_input -> xxx     XSS
  ! Hard-coded string -> User Interface             I18n error

Checking Specifications: basic ways
- Value specifications
  - Symbolic execution
- Temporal specifications
  - Model checking
- Data flow specifications
  - Graph traversal (data dependence graph)

Static symbolic execution
- Basic example: prove y > 0 at the end of the program

    y = read();        T (y = s), s is a symbolic variable for input
    y = 2 * y;         T (y = 2*s)
    if (y <= 12)
        y = 3;         T ^ y<=12 (y = 3)
    else
        y = y + 1;     T ^ !(y<=12) (y = 2*s + 1)
    print("OK");       T ^ 2*s<=12 (y = 3) | T ^ !(2*s<=12) (y = 2*s + 1)

- Here T is the condition for the statement to be executed, and (y = s) is the relationship of all variables to the inputs after the statement is executed
- Check each path against the negated property y <= 0:
  - (2*s <= 12 & y = 3) & y <= 0: not satisfiable
  - !(2*s <= 12) & (y = 2*s + 1) & y <= 0: not satisfiable

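The two satisfiability checks on this slide can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not how a real symbolic executor works: the class and method names are hypothetical, integers are treated as unbounded (no overflow), each path condition is written by hand as a non-empty interval on s, and the "solver" only handles a linear value y = a*s + b. A real tool would hand the constraints to an SMT solver.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: checks "y > 0" on each path of the slide's example using interval reasoning. */
class SymbolicPathsSketch {

    /** One path: lo <= s <= hi (null means unbounded) with final value y = a*s + b. */
    static final class Path {
        final Long lo, hi;
        final long a, b;

        Path(Long lo, Long hi, long a, long b) {
            this.lo = lo; this.hi = hi; this.a = a; this.b = b;
        }

        /** True if "path condition AND y <= 0" is satisfiable, i.e. y > 0 can fail on this path. */
        boolean canViolate() {
            if (a == 0) return b <= 0;           // y is a constant on this path
            Long argMin = (a > 0) ? lo : hi;     // a linear y is smallest at one end of the interval
            if (argMin == null) return true;     // no bound on that side: y can drop to <= 0
            return a * argMin + b <= 0;
        }
    }

    /** The two paths of the slide, with 2*s <= 12 rewritten as s <= 6 (integer inputs). */
    static List<Path> pathsOfExample() {
        return Arrays.asList(
            new Path(null, 6L, 0, 3),    // 2*s <= 12   ->  y = 3
            new Path(7L, null, 2, 1));   // !(2*s <= 12) ->  y = 2*s + 1
    }

    /** The property holds if no path admits y <= 0. */
    static boolean propertyHolds(List<Path> paths) {
        for (Path p : paths) {
            if (p.canViolate()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("y > 0 holds on all paths: " + propertyHolds(pathsOfExample()));
    }
}
```

Each path is checked separately, which mirrors the slide: the property is proved only when the negated property is unsatisfiable on every path.
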
Static symbolic execution
- Complex example: prove p > 0 at the end of the program

    y = read();          T (y = s), s is a symbolic variable for input
    p = 1;               T (p = 1, y = s)
    while (y < 10) {     T (p = 1, y = s)
        y = y + 1;       1st iteration: T ^ s<10 (y = s + 1, p = 1)
                         2nd iteration: T ^ 2<s+1<10 (y = s + 2, p = 2) | T ^ s+1<=2 (y = s + 2, p = 3)
        if (y > 2)
            p = p + 1;   T ^ 2<s+1 ^ s<10 (y = s + 1, p = 2)
        else
            p = p + 2;   T ^ s+1<=2 (y = s + 1, p = 3)
    }
    print(p);

Model Checking
- Basic idea
  - Transform the program into an automaton
  - Program states are the states of the automaton, and statements are its transitions / edges
  - Check temporal properties on the automaton by traversing it

Model Checking: Model Building
- Basic approaches:
  - Use the control flow graph
    - View all program states after a statement as ONE state
  - Use abstract states
    - View all program states after a statement with the same abstract values as ONE state
  - Use concrete values
    - View all program states after a statement with the same concrete values as ONE state: usually impossible

An example with CFG-model
- Checking whether a file is closed in all cases

    boolean load() {
        f.open();
        line = f.read();
        while (line != null) {
            if (line.contains("key")) {
                f.close();
                return true;
            } else if (line.contains("value")) {
                f.close();
            }
            line = f.read();
        }
        return false;
    }

[Figure: CFG model of load(). States: Start (f is not open), opened, read, new line, closed, ret. Edge labels: !=null, ==null, key, value, none.]

An example with CFG-model
- Traverse the model to find counterexamples

[Figure: the same CFG model of load(), with an additional "closed" state before ret.]

An example with CFG-model
- Read must happen before close

[Figure: the same CFG model of load(), used to check that no read follows a close.]

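The traversal of the CFG model can be sketched as an exhaustive search over (CFG node, abstract file state) pairs. The node numbering, the `Action` enum and the two checked properties below are assumptions made for illustration; they are a hand-built encoding of the load() example, not the output of a real model checker.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: explores a hand-built CFG of load() with the abstract state "file open / closed". */
class CfgModelCheckSketch {
    enum Action { OPEN, READ, CLOSE, RETURN, NONE }

    // Node i performs ACTIONS[i] and may continue to any node in NEXT[i].
    static final Action[] ACTIONS = {
        Action.OPEN,    // 0: f.open()
        Action.READ,    // 1: line = f.read() before the loop
        Action.NONE,    // 2: while (line != null)
        Action.NONE,    // 3: if (line.contains("key"))
        Action.CLOSE,   // 4: f.close() in the "key" branch
        Action.RETURN,  // 5: return true
        Action.NONE,    // 6: else if (line.contains("value"))
        Action.CLOSE,   // 7: f.close() in the "value" branch
        Action.READ,    // 8: line = f.read() at the end of the loop body
        Action.RETURN   // 9: return false
    };
    static final int[][] NEXT = {
        {1}, {2}, {3, 9}, {4, 6}, {5}, {}, {7, 8}, {8}, {2}, {}
    };

    /** Visits every reachable (node, open?) pair once and collects property violations. */
    static List<String> check() {
        List<String> violations = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<int[]> work = new ArrayDeque<>();
        work.push(new int[] {0, 0});                 // start at node 0 with the file closed
        while (!work.isEmpty()) {
            int[] state = work.pop();
            int node = state[0];
            boolean open = state[1] == 1;
            if (!seen.add(node * 2 + state[1])) continue;   // already explored this abstract state
            switch (ACTIONS[node]) {
                case OPEN:
                    open = true;
                    break;
                case READ:
                    if (!open) violations.add("read on a closed file at node " + node);
                    break;
                case CLOSE:
                    open = false;
                    break;
                case RETURN:
                    if (open) violations.add("return with the file still open at node " + node);
                    break;
                default:
                    break;
            }
            for (int next : NEXT[node]) {
                work.push(new int[] {next, open ? 1 : 0});
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        for (String v : check()) System.out.println(v);
    }
}
```

Because the abstract state has only two values, the search space is at most twice the number of CFG nodes, so the traversal terminates even though the program contains a loop.
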
Temporal Logic
- The basic idea of model checking is to find a path in the model that violates the specification
- Temporal logic describes the sequential relationship among a number of events: the specification
  - So any specification can simply be read by a path-finding tool
  - There is no need to write a path-finding tool for each proof

Usage of Temporal Logic
- Describe the sequential relationship among a number of events
- U: until
  - P U Q means that P has to be true until Q is true
  - !read(f) U open(f)
  - !close(f) U open(f)
- F: future
  - F P means that P will be true some time in the future
  - -> F close(f)
  - -> !F read(f)

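To make the two operators concrete, here is a minimal sketch that evaluates U and F over a finite list of events. It assumes a "strong until" reading (Q must actually occur) and finite traces, whereas temporal logic in model checking is normally defined over infinite paths of the model; the class and method names and the event strings are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Sketch only: U and F evaluated on a finite event trace. */
class FiniteTraceLtlSketch {

    /** P U Q from position 'from': Q occurs at some position, and P holds at every position before it. */
    static <E> boolean until(List<E> trace, int from, Predicate<E> p, Predicate<E> q) {
        for (int j = from; j < trace.size(); j++) {
            if (q.test(trace.get(j))) return true;    // Q reached, P held so far
            if (!p.test(trace.get(j))) return false;  // P broken before Q
        }
        return false;                                 // Q never occurred (strong until)
    }

    /** F P from position 'from': P holds at some position at or after 'from'. */
    static <E> boolean eventually(List<E> trace, int from, Predicate<E> p) {
        for (int j = from; j < trace.size(); j++) {
            if (p.test(trace.get(j))) return true;
        }
        return false;
    }

    /** !read U open: no read happens before the first open. */
    static boolean noReadBeforeOpen(List<String> trace) {
        return until(trace, 0, e -> !e.equals("read"), e -> e.equals("open"));
    }

    /** Every open is followed by a close some time later. */
    static boolean everyOpenIsEventuallyClosed(List<String> trace) {
        for (int i = 0; i < trace.size(); i++) {
            if (trace.get(i).equals("open")
                    && !eventually(trace, i + 1, e -> e.equals("close"))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList("open", "read", "close");
        System.out.println(noReadBeforeOpen(trace) && everyOpenIsEventuallyClosed(trace));
    }
}
```

Under the strong reading, a trace that never opens the file fails `noReadBeforeOpen`; a weak-until variant would accept it.
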
Checking Specifications: basic ways
- Value specifications
  - Symbolic execution
  - Abstract interpretation
- Temporal specifications
  - Model checking
- Data flow specifications
  - Graph traversal (data dependence graph)

Some simple checks with graph traversal
- Check that x flows to w
- Check (!z used as divisor) U (z is written)

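A check such as "x flows to w" reduces to reachability in the data dependence graph. The sketch below assumes the graph is already available as an adjacency map; the edges in `exampleGraph()` are made up for illustration, since the slide's own figure is not reproduced in the text.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch only: breadth-first reachability on a data dependence graph. */
class DataFlowSketch {

    /** True if data can flow from 'source' to 'sink' along dependence edges. */
    static boolean flowsTo(Map<String, List<String>> graph, String source, String sink) {
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(sink)) return true;
            if (!visited.add(node)) continue;          // skip nodes already expanded
            for (String next : graph.getOrDefault(node, Collections.<String>emptyList())) {
                queue.add(next);
            }
        }
        return false;
    }

    /** Hypothetical dependences: y is computed from x, w from y, and v from z. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("x", Arrays.asList("y"));
        graph.put("y", Arrays.asList("w"));
        graph.put("z", Arrays.asList("v"));
        return graph;
    }

    public static void main(String[] args) {
        System.out.println("x flows to w: " + flowsTo(exampleGraph(), "x", "w"));
    }
}
```

A "must not flow" specification is checked with the same traversal by reporting a violation when the sink is reachable.
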
Problems of static bug detection
- Lack of specifications
  - Project-specific formal specifications are very rare
  - Solutions:
    - General specifications (for typical bugs)
    - Mining specifications (for API-specific and project-specific specifications)
- False positives vs. efficiency
  - More sensitivities -> higher cost
  - Path sensitivity is rarely achieved
  - Combination of all sensitivities -> incomputable problems

State of practice: static bug detection
- Findbugs
  - A tool developed by researchers from UMD
  - Widely used in industry for checking code before commit
- The idea actually comes from Lint
  - A code style enforcing tool for the C language
  - Finds bad coding styles and raises warnings
    - Bad naming
    - Hard-coded strings
    - …

Idea: do it in reverse
- Most static bug detection tools
  - Set up a specification (either from users or a well-defined one)
    - E.g., a divisor should not be 0, a null pointer should not be dereferenced, the salary of a person cannot be negative
  - Check all possible cases to guarantee that the specification holds
  - Otherwise provide counterexamples
- Findbugs
  - Detects code patterns for bugs
    - E.g., a = null; b = a.field;
    - str.replace(" ", "");

Characteristics of Findbugs
- Based on existing concrete code patterns
- Checks code patterns locally: only intra-procedural analysis
  - What are the advantages and disadvantages of doing so?
- Performs bug ranking according to the probability and potential severity of bugs
  - Probability: the reported bug is likely to be a true bug
  - Severity: the bug may cause severe consequences if not fixed

Application of Findbugs-like tools
- Findbugs is adopted by a number of large companies such as Google
  - Usually only the issues with the highest confidence/severity are reported as issues
- Statistics from Google in 2009:
  - More than 4000 issues were identified, of which 1700 were confirmed as bugs and 1100 were fixed
- The software department of USAA uses PMD, an alternative to Findbugs

Patterns to be checked
- 404 bug patterns in 6 major categories
  - Bad practice / Dodgy code
  - Correctness
  - Internationalization
  - Vulnerability / Security
  - Multithread correctness
  - Performance

Bad Practice / Dodgy code
- Hackish code that is not stable and may harm future maintenance
- Examples:
  - equals method should not assume the type of its object argument

    boolean equals(Object o) {
        Myclass my = (Myclass) o;   // unchecked cast: fails for any other type
        return my.id == this.id;
    }

  - Abstract class defines a covariant compareTo() method

    int compareTo(Myclass obj) { … }

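For contrast, a version that avoids both warnings might look like the sketch below. The class name `MyClassSketch` and its single `id` field are assumptions made for illustration; the point is the `instanceof` guard before the cast and the use of `Comparable<MyClassSketch>` so that `compareTo` overrides the interface method instead of merely overloading it.

```java
/** Sketch only: equals() that checks the argument type, plus a properly typed compareTo(). */
class MyClassSketch implements Comparable<MyClassSketch> {
    private final int id;

    MyClassSketch(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MyClassSketch)) return false;   // also rejects null
        MyClassSketch other = (MyClassSketch) o;           // safe after the instanceof check
        return this.id == other.id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);                       // keep hashCode consistent with equals
    }

    @Override
    public int compareTo(MyClassSketch other) {
        return Integer.compare(this.id, other.id);
    }

    public static void main(String[] args) {
        System.out.println(new MyClassSketch(1).equals("not a MyClassSketch"));
    }
}
```
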
Correctness
- The code pattern may result in incorrect behavior of the software
- Examples:
  - DMI: Collections should not contain themselves

    List s = new …;
    …
    if (s.contains(s)) { … }

  - DMI: Invocation of hashCode on an array

    int[] x = new int[10];
    …
    x.hashCode();

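The array pattern is flagged because an array inherits `hashCode()` from `Object`, so the result does not depend on the contents. A minimal sketch of the usual alternative, `java.util.Arrays.hashCode` together with `Arrays.equals`, follows; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Sketch only: content-based hashing and comparison for arrays. */
class ArrayHashSketch {

    /** Hash computed from the elements, so equal contents give equal hashes. */
    static int contentHash(int[] values) {
        return Arrays.hashCode(values);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        System.out.println(Arrays.equals(a, b));                  // compares contents
        System.out.println(contentHash(a) == contentHash(b));     // content-based hashes agree
    }
}
```
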
Internationalization
- A code pattern that will hinder future i18n of the software
- Examples:
  - Use of toUpperCase, toLowerCase on localized strings

    String s = getLocale(key);
    s.toUpperCase();

  - Performing getBytes() on localized strings

    String s = getLocale(key);
    s.getBytes();

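Both calls are flagged because they silently depend on the platform's default locale or default charset. A hedged sketch of the explicit variants is shown below; the class and method names are hypothetical, and which locale or charset is appropriate depends on the application.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Sketch only: case conversion and encoding with an explicit locale and charset. */
class LocaleSketch {

    /** Locale-independent upper-casing, e.g. for keys or protocol tokens. */
    static String upperForProtocol(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    /** Upper-casing for text shown to a user, following that user's locale rules. */
    static String upperForDisplay(String s, Locale userLocale) {
        return s.toUpperCase(userLocale);
    }

    /** Encoding with a fixed charset instead of the platform default. */
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(upperForProtocol("title"));
        // In Turkish, 'i' upper-cases to a dotted capital I (U+0130), not to 'I'.
        System.out.println(upperForDisplay("title", turkish).equals("TITLE"));
    }
}
```
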
Multi-thread correctness
- A code pattern that may cause incorrect behavior in multi-threaded execution
- Example:
  - Synchronization on boxed primitive

    private static Boolean inited = Boolean.FALSE;
    ...
    synchronized (inited) {
        if (!inited) {
            init();
            inited = Boolean.TRUE;
        }
    }
    ...

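The pattern is a problem because `Boolean.FALSE` and `Boolean.TRUE` are shared objects, and the field is reassigned while it is used as the lock, so different threads can end up synchronizing on different objects. One common fix, sketched here with hypothetical names, is to lock on a dedicated private object that never changes.

```java
/** Sketch only: one-time initialization guarded by a dedicated lock object. */
class LazyInitSketch {
    private static final Object LOCK = new Object();   // never reassigned, never shared
    private static boolean inited = false;
    private static int initCalls = 0;                  // counts how often init work ran

    static void ensureInit() {
        synchronized (LOCK) {
            if (!inited) {
                initCalls++;                           // stands in for the real init() work
                inited = true;
            }
        }
    }

    static int initCalls() {
        synchronized (LOCK) {
            return initCalls;
        }
    }

    /** Calls ensureInit() from several threads and reports how often init ran. */
    static int runConcurrently(int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(LazyInitSketch::ensureInit);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // preserve the interrupt and stop waiting
                break;
            }
        }
        return initCalls();
    }

    public static void main(String[] args) {
        System.out.println("init ran " + runConcurrently(8) + " time(s)");
    }
}
```
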
Vulnerability / Security
- The code pattern may result in vulnerability or security issues
- Examples:
  - SQL: a SQL query is generated from a non-constant String

    String str = "select" + bb + " ddd" + …
    server.execute(str);

  - This code directly writes an HTTP parameter to JSP output, which allows for a cross-site scripting vulnerability

    Para = request.getParameter(key);
    out.print(Para);

Performance
- The code pattern may harm the performance of the software
- Example:
  - SBSC: Method concatenates strings using + in a loop

    // Flagged: each iteration copies the whole string built so far
    String s = "";
    for (int i = 0; i < field.length; ++i) {
        s = s + field[i];
    }

    // Better: append to a single buffer
    StringBuffer buf = new StringBuffer();
    for (int i = 0; i < field.length; ++i) {
        buf.append(field[i]);
    }
    String s = buf.toString();

Major problem: False positives
- Overall precision
  - 5% to 10% on open source and industry projects
- Developers want to make sure they do not waste effort on a false positive
- Usually more bugs than developers can fix

Solution: Bug ranking
- Ranking bug categories
  - Some categories are more likely to be bugs than others
- How to give scores to each category?
  - Check a large number of issues in the history of the software
  - How large a proportion was fixed?
- Raises precision to about 30% among the top-ranked 25% of bugs

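The scoring idea can be sketched as "rank categories by the fraction of their past warnings that developers actually fixed". The sketch below is an assumption about how such a score could be computed, not the ranking used by Findbugs itself; the class name and the counts in `main` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: rank bug categories by their historical fix ratio. */
class BugRankingSketch {

    /** counts = {reported, fixed}; a category with no reports scores 0. */
    static double fixRatio(int[] counts) {
        return counts[0] == 0 ? 0.0 : (double) counts[1] / counts[0];
    }

    /** Returns category names, highest fix ratio first. */
    static List<String> rank(Map<String, int[]> history) {
        List<String> categories = new ArrayList<>(history.keySet());
        categories.sort(
            Comparator.comparingDouble((String c) -> fixRatio(history.get(c))).reversed());
        return categories;
    }

    public static void main(String[] args) {
        Map<String, int[]> history = new HashMap<>();      // made-up counts
        history.put("Correctness", new int[] {100, 60});
        history.put("Performance", new int[] {200, 20});
        history.put("Bad practice", new int[] {50, 10});
        System.out.println(rank(history));
    }
}
```

Warnings from the top of the ranked list would then be shown to developers first, which is how ranking trades recall for precision.
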
Findbugs
- Disadvantages
  - Cannot guarantee that the software is free of certain bugs
  - Still involves many false positives
- Advantages
  - Easy to start
  - Scalable
  - Relatively fewer false positives
- Somewhat like testing
  - Has become the most popular and practical static bug detection technique

Review of Static Bug Detection
- Specification-based static bug detection
  - Value specifications: symbolic execution, abstract interpretation
  - Temporal specifications: model checking
  - Data flow specifications: dependence graph traversal
- Pattern-based static bug detection
  - Findbugs
  - Bug ranking