Towards Building Scalable Static Analysis Infrastructures Qirun Zhang
Program Analysis • Static vs. Dynamic – Over- vs. under-approximation – Compile-time vs. run-time • Dynamic analysis • Program analysis and static analysis
• The Research Trend • Money Issues • Software errors cost the U.S. economy an estimated $59.5 billion annually; $22.2 billion of that could be eliminated by an improved testing infrastructure that enables earlier and more effective identification and removal of software defects. • – National Institute of Standards and Technology (NIST)
Evolution of Commercial Static Analyzers • 1st Generation (flagging suspicious code) – Lint, 1979 (Stephen C. Johnson) • 2nd Generation (path-sensitive, interprocedural) – Stanford checker (Coverity) – http://lwn.net/Articles/23273/ • 3rd Generation (SAT, SMT) – Source: http://www.coverity.com/library/pdf/Coverity_White_Paper.SAT-Next_Generation_Static_Analysis.pdf
Static Analysis is HARD • Commercial tools are expensive • Two components: parser and analyzer • Parser – engineering efforts • Analyzer – theoretical barriers
Parser
Parser • At first glance – Recall how we process HTML files. • Do regular expressions work for you? – Automata theory (http://en.wikipedia.org/wiki/Context-free_language) • An illustrative example – Removing the comments in C • /**; **/printf/*/; /*/("/*//; //*/")/*/; //*/printf("*/"); //; /* • Example 2.c
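As a sketch of why the comment example is tricky, here is a minimal comment stripper written as a finite-state scanner. The function name strip_comments and its state names are hypothetical, not taken from the slides. Comment removal on its own does not need a full parser, since C comments do not nest, but the scanner must track string and character literals, which is exactly where a naive regular expression goes wrong. Following the C standard, each comment is replaced by one space. The sketch deliberately ignores backslash-newline splicing and trigraphs, which a real preprocessor handles first.

```c
#include <assert.h>
#include <string.h>

/* Scanner states (hypothetical names). */
enum scan_state {
    CODE,        /* ordinary program text */
    SLASH,       /* saw '/', may start a comment */
    BLOCK,       /* inside a block comment */
    BLOCK_STAR,  /* inside a block comment, just saw '*' */
    LINE,        /* inside a // comment */
    STR,         /* inside a string literal */
    STR_ESC,     /* just saw a backslash inside a string literal */
    CHR,         /* inside a character literal */
    CHR_ESC      /* just saw a backslash inside a character literal */
};

/* Copy src to dst, replacing every comment with one space.
   dst must have room for strlen(src) + 1 bytes.
   Not handled: backslash-newline splicing and trigraphs. */
void strip_comments(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    enum scan_state st = CODE;
    char *out = dst;

    for (const char *p = src; *p != '\0'; p++) {
        char c = *p;
        switch (st) {
        case CODE:
            if (c == '/') { st = SLASH; break; }
            *out++ = c;
            if (c == '"') st = STR;
            else if (c == '\'') st = CHR;
            break;
        case SLASH:
            if (c == '*')      { *out++ = ' '; st = BLOCK; }
            else if (c == '/') { *out++ = ' '; st = LINE; }
            else {             /* just a division operator */
                *out++ = '/';
                st = CODE;
                p--;           /* re-scan c in the CODE state */
            }
            break;
        case BLOCK:
            if (c == '*') st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') st = CODE;
            else if (c != '*') st = BLOCK;
            break;
        case LINE:
            if (c == '\n') { *out++ = '\n'; st = CODE; }
            break;
        case STR:
            *out++ = c;
            if (c == '\\') st = STR_ESC;
            else if (c == '"') st = CODE;
            break;
        case STR_ESC:
            *out++ = c;
            st = STR;
            break;
        case CHR:
            *out++ = c;
            if (c == '\\') st = CHR_ESC;
            else if (c == '\'') st = CODE;
            break;
        case CHR_ESC:
            *out++ = c;
            st = CHR;
            break;
        }
    }
    if (st == SLASH) *out++ = '/';   /* input ended with a lone '/' */
    *out = '\0';
}
```

Note that a "/*" inside a string literal such as "s = \"/*\";" survives untouched, while the same two characters in ordinary code open a comment; only the scanner state tells them apart.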
Parser • Plan to implement your own parser? – Preprocessor? – Language extensions? (c89, gnu89, c94, c99, gnu99, etc.) • Is “asm” a keyword? • Have you ever tried “int a; int bb[a];”? – Who says C is simple? • Quoted from CIL: http://www.cs.berkeley.edu/~necula/cil/ • Example (1.c)
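The “int a; int bb[a];” question is about dialects. Moved inside a function, with a given a value (at file scope the declaration is rejected by every C standard), it declares a C99 variable-length array; strict C89 requires a constant array bound, though GCC accepts VLAs in its gnu89 mode as an extension, and C11 made VLAs optional. A front end therefore has to know which dialect it is parsing before it can decide whether the code is even legal. The function name vla_bytes is hypothetical, and the sketch assumes a C99 or later compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Valid C99 (and later): bb is a variable-length array whose size is
   only known at run time. Strict C89 rejects the declaration because
   an array bound must be a constant expression. */
size_t vla_bytes(int a)
{
    assert(a > 0);        /* a VLA size must be positive */
    int bb[a];
    return sizeof bb;     /* for a VLA, sizeof is evaluated at run time */
}
```

Unlike ordinary arrays, sizeof here is not a compile-time constant, so even an analyzer's notion of "constant expression" depends on the dialect.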
Parser • Regular expressions (no) • Hand-written parser (hard) • Finally – via compilers – LLVM/Clang vs. GCC? – What do we want from GCC? Or do you want to play with a 1000+ KLOC monster? • http://www.cse.iitb.ac.in/~uday/gcc-workshop/downloads/workshopslides/gcc-compilation-intro.pdf • Example 1.c • My approach – Demo (apache, example 3.c)
Analyzer
• How hard is static analysis? – Very hard, in fact. – Plenty of evidence: • W. Landi. Undecidability of static analysis. ACM Letters on Programming Languages and Systems, 1:323–337, 1992. • G. Ramalingam. The undecidability of aliasing. ACM Transactions on Programming Languages and Systems, 16(5):1467–1471, 1994. • T. W. Reps. Undecidability of context-sensitive data-independence analysis. ACM Transactions on Programming Languages and Systems, 22(1):162–186, 2000. • A. Charlesworth. The undecidability of associativity and commutativity analysis. ACM Transactions on Programming Languages and Systems, 24(5):554–565, 2002. – What does “undecidable” mean? (recall the earlier CFG example)
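For intuition about what undecidability means for an analyzer, consider the hypothetical function pick below. Whether the assignment p = p2 is reached for every n ≥ 1 is exactly the Collatz conjecture, an open problem. More generally, Rice's theorem shows that no algorithm can decide a non-trivial semantic property like statement reachability exactly for all programs, so a sound analyzer simply assumes the loop may exit and reports that p may point to p2. This is only an illustration; it is not the construction used in any of the papers cited above.

```c
#include <assert.h>

/* Number of Collatz steps for n to reach 1. Whether this loop
   terminates for every n >= 1 is the (open) Collatz conjecture. */
unsigned collatz_steps(unsigned long long n)
{
    assert(n >= 1);   /* n == 0 would loop forever */
    unsigned steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}

/* Is "p = p2" reachable for every n? Only if the loop above always ends. */
int *pick(int *p1, int *p2, unsigned long long n)
{
    int *p = p1;
    (void)collatz_steps(n);
    p = p2;           /* reached only if collatz_steps(n) returns */
    return p;
}
```

Running the code for any particular n answers the question for that n only; a static analyzer has to answer for all n at once, which is where the difficulty lies.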
• What have previous researchers done? – Data flow analysis frameworks • Traditional problems: reaching definitions, live variables, constant propagation… • Interprocedural analysis, pointer analysis – Abstract interpretation, constraint-based analysis, type-based analysis.
Data flow analysis: a taste • Liveness analysis – Definition – Motivation
Data flow analysis: a taste • * Ref. http://www.cis.upenn.edu/~cis570/slides/lecture04.pdf
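The following is a sketch of the standard iterative liveness analysis under stated assumptions. A variable is live at a point if its current value may be read along some path before it is overwritten. Liveness is a backward problem: in[B] = use[B] ∪ (out[B] − def[B]) and out[B] = ⋃ in[S] over the successors S of B. Sets are bit vectors with one bit per variable (so at most as many variables as bits in an unsigned). The three-block CFG example_cfg, and the names struct block and liveness, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { VAR_A = 1u << 0, VAR_B = 1u << 1, VAR_C = 1u << 2 };
#define MAX_SUCC 2

struct block {
    unsigned use, def;      /* bit vectors over the variables */
    int nsucc;
    int succ[MAX_SUCC];
};

/* Hypothetical CFG:
     B0: a = 0
     B1: b = a + 1; c = c + b; a = b * 2; if (a < N) goto B1
     B2: return c                                            */
const struct block example_cfg[3] = {
    { 0,             VAR_A,                 1, { 1 } },     /* B0 -> B1 */
    { VAR_A | VAR_C, VAR_A | VAR_B | VAR_C, 2, { 1, 2 } },  /* B1 -> B1, B2 */
    { VAR_C,         0,                     0, { 0 } },     /* B2: exit */
};

/* Round-robin iterative solver for backward liveness:
     out[B] = union of in[S] over successors S
     in[B]  = use[B] | (out[B] & ~def[B])                 */
void liveness(const struct block *bb, int n, unsigned *in, unsigned *out)
{
    for (int i = 0; i < n; i++) in[i] = out[i] = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        /* visit blocks last-to-first; for a backward problem this
           usually reaches the fixed point in fewer passes */
        for (int i = n - 1; i >= 0; i--) {
            assert(bb[i].nsucc <= MAX_SUCC);
            unsigned o = 0;
            for (int k = 0; k < bb[i].nsucc; k++)
                o |= in[bb[i].succ[k]];
            unsigned ni = bb[i].use | (o & ~bb[i].def);
            if (o != out[i] || ni != in[i]) changed = true;
            out[i] = o;
            in[i] = ni;
        }
    }
}
```

Running liveness(example_cfg, 3, in, out) converges after three passes to in[B0] = {c}, in[B1] = {a, c} and in[B2] = {c}. Because c is live on entry to the function, a liveness pass can flag that c may be read before it is initialized. Since the sets only grow and are bounded, the loop always terminates.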
Data flow analysis: a taste • Big events in DFA – Restriction • Monotone (Ullman, 1976) – Representation • Bit vectors (generalized framework) (Uday, TOPLAS 1994) – Space? • BDDs (John Whaley, PLDI 2004 best paper) – Iterations? • …
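The monotone restriction matters because some classic problems are monotone but not distributive, so they do not fit the bit-vector mold. Constant propagation is the textbook case. With the three-level lattice UNDEF (top) > constant > NAC (not a constant, bottom), applying the meet at a join before the transfer function can lose information that a path-by-path (meet-over-paths) answer keeps. The sketch uses hypothetical names (cp_val, cp_meet, cp_add, cp_is_const) and models a single statement z = x + y.

```c
#include <assert.h>
#include <stdbool.h>

/* Constant-propagation lattice for one variable:
   UNDEF (top, no information yet) > CONST k > NAC (bottom, not a constant). */
enum cp_kind { CP_UNDEF, CP_CONST, CP_NAC };

struct cp_val {
    enum cp_kind kind;
    long k;               /* meaningful only when kind == CP_CONST */
};

static const struct cp_val CP_NAC_VAL = { CP_NAC, 0 };

/* Meet at a join point: a constant survives only if all inputs agree. */
struct cp_val cp_meet(struct cp_val x, struct cp_val y)
{
    if (x.kind == CP_UNDEF) return y;
    if (y.kind == CP_UNDEF) return x;
    if (x.kind == CP_CONST && y.kind == CP_CONST && x.k == y.k) return x;
    return CP_NAC_VAL;
}

/* Transfer function for z = x + y: monotone, but not distributive over meet. */
struct cp_val cp_add(struct cp_val x, struct cp_val y)
{
    if (x.kind == CP_NAC || y.kind == CP_NAC) return CP_NAC_VAL;
    if (x.kind == CP_CONST && y.kind == CP_CONST)
        return (struct cp_val){ CP_CONST, x.k + y.k };
    return (struct cp_val){ CP_UNDEF, 0 };
}

bool cp_is_const(struct cp_val v, long k)
{
    return v.kind == CP_CONST && v.k == k;
}
```

Suppose x = 2, y = 3 on one incoming path and x = 3, y = 2 on the other. Evaluating z = x + y on each path and then meeting gives the constant 5. Meeting x and y first, as the iterative algorithm does, makes both NAC, so z becomes NAC. The iterative result is still safe, only less precise.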
What we have got so far… • Parser: How to get information from the source code • Analysis framework: How to handle the gathered information • And next…?
Building a Scalable Static Analysis: A Roadmap • Three phases – Parser (done) – Analyzer (in progress) – Application (?) • Analyzer: the research trend? – How to handle loops? – Using SAT/SMT solvers • Example: ISSTA '08 • Application: a potential topic – Path-Sensitive Inference of Function Precedence Protocols (ICSE '07)
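To make loop handling concrete, bounded model checking unrolls each loop a fixed number of times k. It then checks the property only on executions that fit within that bound, plus an "unwinding assertion" that reports when k was too small. Real tools hand the unrolled program to a SAT/SMT solver. The sketch below swaps the solver for brute-force enumeration of all 256 values of an 8-bit input, which is only feasible for toy input spaces. The loop, the property (the loop body runs at most max_iters times) and the name bmc_check are hypothetical.

```c
#include <assert.h>

enum bmc_result { BMC_SAFE, BMC_VIOLATION, BMC_BOUND_TOO_SMALL };

/* Toy bounded model checker for the hypothetical loop
       unsigned char x = input; unsigned i = 0;
       while (x > 0) { x /= 2; i++; }
       assert(i <= max_iters);
   The loop is unrolled at most k times; every 8-bit input is tried
   by enumeration instead of by a SAT/SMT query. */
enum bmc_result bmc_check(unsigned k, unsigned max_iters)
{
    for (unsigned input = 0; input <= 255; input++) {
        unsigned char x = (unsigned char)input;
        unsigned i = 0;
        for (unsigned step = 0; step < k && x > 0; step++) {  /* k unrolled copies */
            x /= 2;
            i++;
        }
        if (i > max_iters) return BMC_VIOLATION;   /* genuine counterexample */
        if (x > 0) return BMC_BOUND_TOO_SMALL;     /* unwinding assertion fails */
    }
    return BMC_SAFE;
}
```

With max_iters = 8, the bound k = 7 is too small (input 128 still needs an eighth iteration), while k = 8 proves the property for every input. Tightening the property to max_iters = 7 makes k = 8 report a violation at input 128. For loops whose iteration count is not bounded by the input size, no fixed k suffices, which is one of the barriers bounded model checking runs into.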
A concrete example • Objective – Giving a feel for undecidability – Barriers in bounded model checking • Q&A?