Unleashing the Power of Static Analysis Manuvir Das

Unleashing the Power of Static Analysis
Manuvir Das
Principal Researcher, Center for Software Excellence
Microsoft Corporation
8/17/06, Unleashing Static Analysis, Manuvir Das, SAS '06

Talking the talk …
• Static analysis tools can make a huge impact on how software is engineered
• The trick is to properly balance research with a focus on deployment
• The Center for Software Excellence (CSE) at Microsoft is doing this (well?) today

… walking the walk
• CSE impact on Windows Vista
  – Found 100,000+ fixed bugs
  – Added 500,000+ specifications
  – Answered thousands of emails
• We are program analysis researchers
  – But we measure our success in adoption
  – And we feel the pain of the customer

Context
• The Nail (Windows)
  – Manual processes do not scale to real software
• The Hammer (Static Analysis)
  – Automated methods for searching programs
• The Carpenter (CSE)
  – A systematic, heavily automated approach to improving the quality of software

What is static analysis?
• grep == static analysis
• static analysis == grep
• syntax trees, CFGs, alias analysis, dataflow analysis, dependency analysis, binary analysis, symbolic evaluation, model checking, specifications, …

Roadmap
• Engineering process
• Static analysis tools
• Lessons

Engineering process

Build Architecture
[Diagram: Main Branch, Team Branches, Desktops]

Quality Gates
[Diagram: Main Branch, Team Branches, Desktops]
• Lightweight tools
  – run on developer desktop & feature branches
  – issues tracked within the program artifacts
• Enforced by rejection at gate

Central Bug Filing
[Diagram: Main Branch, Team Branches, Desktops]
• Heavyweight tools
  – run on main branch
  – issues tracked through a central bug database
• Enforced by bug cap

Static analysis tools

1. Code correctness
• Reject code with null pointer dereferences, uninitialized memory, resource leaks, …
• Inter-procedural simulation – PREfix
  – Process the call graph bottom-up
  – Perform symbolic evaluation on a fixed number of paths through every function
  – Build incomplete symbolic function models
  – Use symbolic state to avoid infeasible paths
  – Report defects when bad states arise
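
"Process the call graph bottom-up" means that a function is analyzed only after the functions it calls, so their models are already available. A minimal sketch of that ordering follows. It assumes a small, acyclic call graph stored as an adjacency matrix; the type and function names are hypothetical, and the slides do not describe how PREfix itself represents the call graph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 8

/* Hypothetical call graph: calls[f][g] is true when function f calls g.
 * Assumed acyclic here; recursion would need extra handling. */
struct call_graph {
    int nfuncs;
    bool calls[MAX_FUNCS][MAX_FUNCS];
};

/* Depth-first visit that emits f only after all of its callees. */
static void visit(const struct call_graph *cg, int f,
                  bool *seen, int *order, int *count)
{
    seen[f] = true;
    for (int g = 0; g < cg->nfuncs; g++)
        if (cg->calls[f][g] && !seen[g])
            visit(cg, g, seen, order, count);
    order[(*count)++] = f;
}

/* Fill order[] so that every callee precedes its callers.
 * Returns the number of functions written. */
int bottom_up_order(const struct call_graph *cg, int *order)
{
    assert(cg != NULL && order != NULL);
    assert(cg->nfuncs >= 0 && cg->nfuncs <= MAX_FUNCS);

    bool seen[MAX_FUNCS] = { false };
    int count = 0;
    for (int f = 0; f < cg->nfuncs; f++)
        if (!seen[f])
            visit(cg, f, seen, order, &count);
    return count;
}
```

For a chain in which function 0 calls 1 and 1 calls 2, the resulting order is 2, 1, 0: the leaf is summarized first and its caller can reuse that summary.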

2. Integer overflow
• Reject code with potential security holes due to unchecked integer arithmetic

    size1 = …
    size2 = …
    data = MyAlloc(size1+size2);
    for (i = 0; i < size1; i++)
        data[i] = …

• Construct an expression tree for every interesting expression in the code
• Ensure that every operation is checked
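
If size1+size2 wraps around, the allocation above is smaller than size1 and the loop writes past its end. A minimal sketch of what a "checked" operation could look like, assuming the sizes are unsigned size_t values. The names checked_add and alloc_pair are hypothetical; they stand in for the slide's MyAlloc call and are not part of any CSE tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: store a + b in *sum and return true,
 * or return false if the addition would wrap past SIZE_MAX. */
bool checked_add(size_t a, size_t b, size_t *sum)
{
    assert(sum != NULL);
    if (a > SIZE_MAX - b)
        return false;
    *sum = a + b;
    return true;
}

/* Hypothetical stand-in for the slide's allocation: returns NULL
 * instead of a too-small buffer when size1 + size2 overflows. */
void *alloc_pair(size_t size1, size_t size2)
{
    size_t total;
    if (!checked_add(size1, size2, &total))
        return NULL;
    return malloc(total);
}
```

The caller then treats a NULL result as a failed allocation, which it has to handle anyway, so the overflow case needs no separate error path.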

3. Architecture layering
• Reject code that breaks the component architecture of the product
  – No dependencies from lower layers of the system to higher layers of the system
• Dependency analysis tool – MaX
  – Construct a graph of dependencies between binaries (DLLs) in the system
    • Obvious: call graph
    • Subtle: registry, RPC, …
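
Once the dependency graph exists, the layering rule reduces to a check on each edge. The sketch below assumes every binary has been assigned a numeric layer, with a larger number meaning a higher layer. The struct layout, the layer numbers, and count_layer_violations are hypothetical illustrations; the slides do not describe MaX's internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each binary carries a layer number,
 * where a larger number means a higher layer. */
struct binary {
    const char *name;
    int layer;
};

/* A dependency edge: binaries[from] depends on binaries[to]. */
struct dependency {
    size_t from;
    size_t to;
};

/* Count edges that point from a lower layer to a higher layer. */
size_t count_layer_violations(const struct binary *binaries, size_t nbinaries,
                              const struct dependency *deps, size_t ndeps)
{
    size_t violations = 0;
    for (size_t i = 0; i < ndeps; i++) {
        assert(deps[i].from < nbinaries && deps[i].to < nbinaries);
        if (binaries[deps[i].from].layer < binaries[deps[i].to].layer)
            violations++;
    }
    return violations;
}
```

The check itself is trivial; as the slide notes, the hard part is discovering the subtle edges (registry, RPC) that never show up in a call graph.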

4. Security
• Problem
  – A security issue is discovered through internal testing, or in the field (MSRC, Watson)
• Diagnosis
  – Identify the code pattern that caused the bug
• Detection (defect by example)
  – Specify the code pattern formally in OPAL
  – Use checkers to find instances of the pattern

RegKey leak defect

    status = RegOpenKeyExW(
        HKEY_LOCAL_MACHINE,
        L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Perflib",
        0L,
        KEY_READ,
        &hLocalKey);
    if (status == ERROR_SUCCESS)
        bLocalKey = TRUE;
    … block of code that uses hLocalKey …
    if (bLocalKey)
        CloseHandle(hLocalKey);

• Bug: registry key is closed by calling the generic CloseHandle API
  – May fail to clean up some data that is specific to registry key data structures

RegKey leak code pattern
• Search for code paths along which a registry key is opened, and then closed using the generic CloseHandle API
• Specification:
  – define a sequence of relevant actions, e.g. A(k)…B(h)
  – define the actions (e.g. A, B, k and h)

RegKey leak specification

    defect RegKeyCloseHandle
    {
        // A(x)…B(x)
        sequence OpenKey(key); CloseHandle(handle)
        message "Registry key closed using generic CloseHandle API!"

        // A(x)
        pattern OpenKey(key)
            /RegOpenKeyEx[AW](@d+)?$/ (_, _, &key)
            where (return == 0)

        // B(x)
        pattern CloseHandle(handle)
            /CloseHandle(@d+)?$/ (handle)
    }

This is the entire specification effort for the codebase

Safety properties

    void main ()
    {
        if (dump)
            fil = fopen(dumpFile, "w");    [Open]
        if (p)
            x = 0;
        else
            x = 1;
        if (dump)
            fclose(fil);                   [Close]
    }

[Diagram: finite-state automaton with states Closed, Opened and Error; transition labels Open, Close, Print/Close and *]

ESP
• Symbolic state: FSA + execution state
• Branch points: Does execution state uniquely determine branch direction?
  – Yes: process appropriate branch
  – No: split & update state, and process both branches
• Merge points: Do states agree on FSA?
  – Yes: merge states
  – No: process states separately

ESP example
[Diagram: control-flow graph of the safety-properties example (entry, branch on dump, Open, branch on p, x = 0, x = 1, branch on dump, Close, exit), annotated with the symbolic states [Closed], [Opened|dump=T], [Opened|dump=T, x=0], [Opened|dump=T, p=F, x=1], [Closed|dump=F] and [Closed|dump=T]]
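
The merge rule is what keeps the number of symbolic states small: facts that do not affect the property (such as the value of x) are dropped at merge points, while facts correlated with the FSA state (such as dump) survive. A minimal sketch of the rule follows. The transition table is an assumption read off the diagram labels (in particular, Print in the Opened state is assumed to be a self-loop), and the three-valued encoding and all names are hypothetical rather than ESP's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fsa_state { ST_CLOSED, ST_OPENED, ST_ERROR };
enum fsa_event { EV_OPEN, EV_CLOSE, EV_PRINT };

/* Assumed transitions of the file-handle property; anything not
 * listed leads to the error state, and the error state is sticky. */
enum fsa_state fsa_step(enum fsa_state s, enum fsa_event e)
{
    if (s == ST_CLOSED && e == EV_OPEN)  return ST_OPENED;
    if (s == ST_OPENED && e == EV_CLOSE) return ST_CLOSED;
    if (s == ST_OPENED && e == EV_PRINT) return ST_OPENED;
    return ST_ERROR;
}

/* What the analysis knows about one boolean fact. */
enum tri { TRI_FALSE, TRI_TRUE, TRI_UNKNOWN };

/* Symbolic state: FSA state plus a tiny execution state. */
struct sym_state {
    enum fsa_state fsa;
    enum tri dump;       /* value of `dump` */
    enum tri x_is_zero;  /* whether x == 0 */
};

static enum tri join(enum tri a, enum tri b)
{
    return (a == b) ? a : TRI_UNKNOWN;
}

/* Merge two states only if they agree on the FSA state; facts on
 * which they disagree become unknown. Returns false when the states
 * must be processed separately. */
bool try_merge(const struct sym_state *a, const struct sym_state *b,
               struct sym_state *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    if (a->fsa != b->fsa)
        return false;
    out->fsa = a->fsa;
    out->dump = join(a->dump, b->dump);
    out->x_is_zero = join(a->x_is_zero, b->x_is_zero);
    return true;
}
```

In the example, the two arms of the branch on p both carry [Opened|dump=T], so they merge and the value of x is forgotten. The [Closed|dump=F] state does not merge with them, which is how the analysis can still tell at the second branch on dump that Close is reached only when the file is open.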

5. Concurrency
• Deadlocks, data races, orphan locks, …
• Sequential analysis of lock sequences at every program point of every thread
  – Cycle in lock ordering: deadlock
  – Access without consistent locking: data race
  – Exit while holding critical section: orphan lock!
• Inter-procedural dataflow analysis – ESPC
  – Instance of ESP – lock sequences control merge
  – Understands Win32 locking semantics
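
The "cycle in lock ordering" check can be illustrated with a small graph search. The sketch assumes the lock sequences have already been reduced to ordered pairs ("lock b was acquired while lock a was held") and that locks are numbered 0 to MAX_LOCKS-1. All names are hypothetical; this is a plain depth-first cycle search, not a description of how ESPC is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

/* Hypothetical lock-order graph: held_before[a][b] is true when some
 * thread acquires lock b while already holding lock a. */
struct lock_graph {
    bool held_before[MAX_LOCKS][MAX_LOCKS];
};

void lock_graph_init(struct lock_graph *g)
{
    assert(g != NULL);
    memset(g, 0, sizeof *g);
}

void record_order(struct lock_graph *g, int held, int acquired)
{
    assert(g != NULL);
    assert(held >= 0 && held < MAX_LOCKS);
    assert(acquired >= 0 && acquired < MAX_LOCKS);
    g->held_before[held][acquired] = true;
}

/* Depth-first search; color 0 = unvisited, 1 = on the current path,
 * 2 = fully explored. */
static bool dfs_finds_cycle(const struct lock_graph *g, int node, int *color)
{
    color[node] = 1;
    for (int next = 0; next < MAX_LOCKS; next++) {
        if (!g->held_before[node][next])
            continue;
        if (color[next] == 1)
            return true;   /* back edge: cyclic lock order */
        if (color[next] == 0 && dfs_finds_cycle(g, next, color))
            return true;
    }
    color[node] = 2;
    return false;
}

/* True when the recorded acquisition orders contain a cycle. */
bool has_potential_deadlock(const struct lock_graph *g)
{
    assert(g != NULL);
    int color[MAX_LOCKS] = { 0 };
    for (int n = 0; n < MAX_LOCKS; n++)
        if (color[n] == 0 && dfs_finds_cycle(g, n, color))
            return true;
    return false;
}
```

Recording the orders 0 before 1 and 1 before 2 is consistent; adding 2 before 0 closes a cycle, which is the pattern reported as a potential deadlock.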

6. Buffer overruns
• Defect: a buffer access index is out of bounds
• Detection: check that index is within bounds
• Problem: where are the buffer bounds stored?
  – Tools must track buffer size from allocation to access
  – Exhaustive global analysis is infeasible
• Solution: turn global analysis into local analysis
  – Standard Annotation Language (SAL)
  – Specify buffer sizes at function interfaces
  – Perform modular (one function at a time) analysis

SAL example 1
• wcsncpy [precondition] destination buffer must have enough allocated space

    wchar_t *wcsncpy (
        wchar_t *dest, wchar_t *src, size_t num );

    wchar_t *wcsncpy (
        __pre __notnull __pre __writableTo(elementCount(num)) wchar_t *dest,
        wchar_t *src, size_t num );

    wchar_t *wcsncpy (
        __out_ecount(num) wchar_t *dest,
        wchar_t *src, size_t num );

SAL example 2
• memcpy

    void * memcpy (
        void * dest, void * src, size_t num );

    void * memcpy (
        __pre __notnull
        __pre __writableTo(byteCount(num))
        __post __readableTo(byteCount(num)) void * dest,
        __pre __notnull
        __pre __deref __readonly
        __pre __readableTo(byteCount(num)) void * src,
        size_t num );

    void * memcpy (
        __out_bcount_full(num) void * dest,
        __in_bcount(num) void * src,
        size_t num );

SAL primer
• Usage example: a0 RT func(a1 … an T par), where each ai is a SAL annotation
• Interface contracts
  – pre, post, object invariants
• Basic properties
  – null, readonly, valid, range, …
• Buffer extents
  – writableTo(size), readableTo(size)
• Buffer size formats
  – (byte|element)Count, endPointer, sentinel, …

SAL ecosystem
• espX/PREfast/…: Use annotations to find defects
• SALstats: Identify parameters that should be annotated
• MIDL Compiler: Translate MIDL directives to annotations
• SALinfer: Infer annotations using global static analysis

SALinfer example

    void work()
    {
        int tmp[200];
        wrap(tmp, 200);
    }

    void wrap(int *buf, int len)
    {
        int *buf2 = buf;
        int len2 = len;
        zero(buf2, len2);
    }

    void zero(int *buf, int len)
    {
        int i;
        for(i = 0; i <= len; i++)
            buf[i] = 0;
    }

[Facts shown alongside the code: size(tmp, 200), size(buf, len), size(buf2, len2), write(buf2), write(buf)]

SALinfer example (with inferred annotations)

    void work()
    {
        int tmp[200];
        wrap(tmp, 200);
    }

    void wrap(__out_ecount(len) int *buf, int len)
    {
        int *buf2 = buf;
        int len2 = len;
        zero(buf2, len2);
    }

    void zero(__out_ecount(len) int *buf, int len)
    {
        int i;
        for(i = 0; i <= len; i++)
            buf[i] = 0;
    }

espX example

    void zero(__out_ecount(len) int *buf, int len)
    {
        int i;
        for(i = 0; i <= len; i++)
            buf[i] = 0;
    }

Instrumented form:

    assume(sizeOf(buf) == len)
    for(i = 0; i <= len; i++)
        inv (i >= 0 && i <= len)
        assert(i >= 0 && i < sizeOf(buf))
        buf[i] = 0;

Constraints:
  (C1) i >= 0
  (C2) i <= len
  (C3) sizeOf(buf) == len

Goal: i >= 0 && i < sizeOf(buf)
  Subgoal 1: i >= 0 by (C1)
  Subgoal 2: i < sizeOf(buf), i.e. i < len by (C3): FAIL

Warning: Cannot validate buffer access. Overflow occurs when i == len
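
The warning points at the loop bound: with i <= len the last iteration writes buf[len], one element past the annotated size. A sketch of the corrected function follows. The empty definition of __out_ecount is an assumption made here so the code compiles without a SAL-aware compiler; with such a compiler the annotation would come from its headers instead.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: without a SAL-aware toolchain the annotation
 * expands to nothing and serves only as documentation. */
#ifndef __out_ecount
#define __out_ecount(n)
#endif

/* Corrected version of the slide's zero(): the bound is i < len,
 * so every access satisfies i >= 0 && i < sizeOf(buf). */
void zero(__out_ecount(len) int *buf, int len)
{
    assert(buf != NULL && len >= 0);
    for (int i = 0; i < len; i++)
        buf[i] = 0;
}
```

With the bound changed, constraint (C2) becomes i < len, and Subgoal 2 follows directly from it together with (C3).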

SAL impact
• Windows Vista
  – Mandate: Annotate 100,000 mutable buffers
  – Developers annotated 500,000+ parameters
  – Developers fixed 20,000+ bugs
• Office 12
  – Developers fixed 6,500+ bugs
• Visual Studio, SQL, Exchange, …
• External customers
  – CRT + Windows headers SAL annotated
  – SAL-aware compiler shipped with VS 2005

SAL evaluation
Vista – mutable string buffer parameters
• Annotation cost:
  [–] 100,000 parameters required annotations
  [+] 4 out of 10 automatic
• Defect detection value:
  [+] 1 buffer overrun exposed per 20 annotations
• Locked in progress:
  [+] 9.4 out of 10 buffer accesses validated

Lessons

Forcing functions for change
• Gen 1: Manual Review
  – Too many code paths to think about
• Gen 2: Massive Testing
  – Inefficient detection of simple errors
• Gen 3: Global Program Analysis
  – Delayed results
• Gen 4: Local Program Analysis
  – Lack of calling context limits accuracy
• Gen 5: Specifications

Acceptance of specifications
• Developers like incremental specs
  – No specifications, no bugs
• Developers like useful specs
  – More specifications, more real bugs
• Developers like informative specs
  – Make implicit information explicit
  – Avoid repeating what the code says

Defect detection myths
• Soundness matters
  – sound == find only real bugs
  – The real measure is Fix Rate
• Completeness matters
  – complete == find all the bugs
  – There will never be a complete analysis
• Developers only fix real bugs
  – Developers fix bugs that are easy to fix, and
  – Unlikely to introduce a regression

Theory is important
• Fundamental ideas have been crucial
  – Hoare logic
  – Dataflow analysis
  – Abstract interpretation
  – Graph algorithms
  – Context-sensitive analysis
  – Alias analysis

Summary
• Static analysis tools can make a huge impact on how software is engineered
• The trick is to properly balance research with a focus on deployment
• The Center for Software Excellence (CSE) at Microsoft is doing this (well?) today

http://www.microsoft.com/cse
http://research.microsoft.com/manuvir
© 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.