Aggressive Program Analysis Framework for Static Error Checking

Aggressive Program Analysis Framework for Static Error Checking in Open64
Hongtao Yu, Wei Huo, ZhaoQing Zhang, XiaoBing Feng
Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences
{htyu, huowei, zqzhang, fxb}@ict.ac.cn

Outline
  • Introduction
  • Framework
  • Field-sensitive pointer analysis
  • Flow- and context-sensitive dataflow analysis
  • Conclusion
INSTITUTE OF COMPUTING TECHNOLOGY

Introduction
  • Open64 is a high-performance compiler.
  • However, the existing scalar analysis of Open64 is not precise enough for static error checking.
    – The original interprocedural framework is:
      • Flow-insensitive
      • Field-insensitive
      • Context-insensitive for some problems

Improvement
  • Our aim is to improve the interprocedural phase to gain more precision in analysis
    – For flow-sensitivity, we integrate the intraprocedural analysis phase into the interprocedural phase
    – For context-sensitivity, transfer functions modeling procedure effects are computed for each procedure
    – For field-sensitivity, fields are distinguished by <base, offset, size>
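The <base, offset, size> triple suggests a simple way to decide whether two field references can touch the same memory. The following is a minimal sketch, not the Open64 implementation: the `FieldRef` type, the integer encoding of `base`, and the function name `fields_overlap` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field descriptor mirroring the <base, offset, size> triple.
   The integer id used for `base` is an assumption of this sketch. */
typedef struct {
    int    base;    /* id of the enclosing (base) object */
    size_t offset;  /* byte offset of the field inside the base object */
    size_t size;    /* field size in bytes, must be > 0 */
} FieldRef;

/* Two references may touch the same memory only if they share a base and
   their byte ranges [offset, offset + size) intersect. */
bool fields_overlap(FieldRef a, FieldRef b)
{
    assert(a.size > 0 && b.size > 0);
    if (a.base != b.base)
        return false;
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

Under this encoding, two adjacent 4-byte fields at offsets 0 and 4 of the same object are kept apart, while a 4-byte access at offset 2 overlaps both.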

Error checking
  • Several error checking problems can be abstracted as a common dataflow problem and solved using fixed-point theory on semi-lattices
    – Uninitialized reference
    – Null pointer dereference
    – File I/O behavior

Contributions
  • We have designed and implemented:
    – A flow- and context-sensitive interprocedural framework, under which several kinds of error checking are performed.
    – Two efficient field-sensitive pointer analyses.
      • One is a direct implementation of Steensgaard, B. Points-to Analysis by Type Inference of Programs with Structures and Unions. In Proceedings of the 6th International Conference on Compiler Construction, 1996.
      • The other is an improvement on it.

Framework
[Figure: framework pipeline. Components shown: IPL summary phase, IPA_LINK, and the Interprocedural Analyzer with Build Call Graph, FICI pointer analysis, interprocedural control flow optimization, Construct SSA Form for each procedure, FICS pointer analysis, FSCS pointer analysis, and the static error checker.]
  • FICI = flow- and context-insensitive
  • FICS = flow-insensitive but context-sensitive
  • FSCS = flow- and context-sensitive
  • FICS and FSCS have not been implemented yet.

Pointer analysis architecture
  • Employs several field-sensitive algorithms that differ in precision and efficiency.
  • Pointer analyses are performed in increasing order of precision.
    – The first is a field-sensitive unification-based pointer analysis.
    – Each analysis builds on the result of the previous one, which is more efficient than running the analyses separately.
    – So far, only the field-sensitive unification-based pointer analysis has been implemented.

Interprocedural control flow optimization
  • Dead Function Elimination (DFE)
    – Deletes functions that are never invoked
  • Fake Control Flow Elimination (FCFE)
    – Recognizes the program points where control flow must terminate
    – A flow- and context-sensitive problem
    – Based on Gated Single Assignment (GSA) form
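Dead function elimination can be pictured as a reachability walk over the call graph. The sketch below is only an illustration under stated assumptions: the adjacency-matrix `CallGraph` type, the `MAX_FUNCS` bound and both function names are hypothetical, and calls through function pointers are assumed to be already resolved into edges.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FUNCS 16  /* illustrative bound, not from the slides */

/* Hypothetical call graph: calls[i][j] is true when function i contains
   a call to function j. */
typedef struct {
    int  count;
    bool calls[MAX_FUNCS][MAX_FUNCS];
} CallGraph;

/* Depth-first marking of every function reachable from `f`. */
static void mark_live(const CallGraph *g, int f, bool live[])
{
    if (live[f])
        return;
    live[f] = true;
    for (int callee = 0; callee < g->count; callee++)
        if (g->calls[f][callee])
            mark_live(g, callee, live);
}

/* Sets dead[i] to true when function i is never invoked, directly or
   transitively, from `entry`. Returns the number of dead functions. */
int find_dead_functions(const CallGraph *g, int entry, bool dead[])
{
    bool live[MAX_FUNCS] = { false };
    int  n_dead = 0;

    assert(g->count > 0 && g->count <= MAX_FUNCS);
    assert(entry >= 0 && entry < g->count);

    mark_live(g, entry, live);
    for (int i = 0; i < g->count; i++) {
        dead[i] = !live[i];
        n_dead += dead[i];
    }
    return n_dead;
}
```

A function that calls live code but is itself never called is still reported as dead, which is the case DFE is meant to catch.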

Fake Control Flow Elimination
Taken from HyperSAT-1.7

    #define error(str) report(__FILE__, __LINE__, str, ERROR)

    void parseClusterFile(preproc_t *p, char *name)
    {
        …
        tmp_fsize = ftell(fp);
        if (tmp_fsize > INT_MAX) {
    L1:     error("File too large!");
        } else {
            fsize = (size_t)tmp_fsize;
        }
        rewind(fp);
        buf = xmalloc(fsize);
        …
    }

    void report(char *file, unsigned line, char *str, char sev)
    {
        if (sev == ERROR) {
            …
    L2:     exit(EXIT_UNKNOWN);
        } else if (sev == WARRNING) {
            printf(…);
        } else if (sev == NOTE) {
            printf(…);
        }
    }

[Figure: control flow graph of parseClusterFile, blocks B1 to B5. After tmp_fsize = ftell(fp), the test tmp_fsize > INT_MAX branches to fsize = (size_t)tmp_fsize on N and to error("File too large!") on Y; both branches continue to rewind(fp); buf = xmalloc(fsize).]

Fake Control Flow Elimination
Same HyperSAT-1.7 code as on the previous slide.
[Figure: control flow graph of parseClusterFile in which the Y branch of tmp_fsize > INT_MAX ends in exit; only the N branch, fsize = (size_t)tmp_fsize, continues to rewind(fp); buf = xmalloc(fsize).]

Unification-based pointer analysis
  • We have implemented two efficient field-sensitive pointer analyses.
    – One is a direct implementation of Steensgaard, B. Points-to Analysis by Type Inference of Programs with Structures and Unions. In Proceedings of the 6th International Conference on Compiler Construction, 1996.
    – The other is an improvement on it.
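For readers unfamiliar with unification-based analysis, the core idea can be sketched with a union-find structure in which every alias class has at most one pointee class. This is a deliberately simplified, field-insensitive sketch and not the analysis implemented in the framework; all names (`new_node`, `unify`, `deref`, `assign_addr`, `assign_copy`, `may_alias`) and the `MAX_NODES` bound are assumptions.

```c
#include <assert.h>

#define MAX_NODES 32   /* illustrative bound */
#define NONE (-1)

static int parent[MAX_NODES];   /* union-find parent links */
static int pointee[MAX_NODES];  /* class a representative points to, or NONE */
static int n_nodes;

/* Create a fresh alias class for a variable or memory object. */
int new_node(void)
{
    assert(n_nodes < MAX_NODES);
    parent[n_nodes] = n_nodes;
    pointee[n_nodes] = NONE;
    return n_nodes++;
}

/* Representative of the class containing x (with path halving). */
int find(int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Unify two alias classes and, recursively, whatever they point to. */
void unify(int a, int b)
{
    a = find(a);
    b = find(b);
    if (a == b)
        return;
    int pa = pointee[a], pb = pointee[b];
    parent[b] = a;
    if (pa == NONE)
        pointee[a] = pb;
    else if (pb != NONE)
        unify(pa, pb);
}

/* Class that p points to, created on demand. */
int deref(int p)
{
    int r = find(p);
    if (pointee[r] == NONE)
        pointee[r] = new_node();
    return find(pointee[r]);
}

/* p = &x : x joins the class p points to. */
void assign_addr(int p, int x)
{
    unify(deref(p), x);
}

/* p = q : both pointers must point to the same class. */
void assign_copy(int p, int q)
{
    int tp = deref(p);
    int tq = deref(q);
    unify(tp, tq);
}

int may_alias(int x, int y)
{
    return find(x) == find(y);
}
```

With this sketch, the assignments `s2 = &s1; s4 = &s3; i2 = (int*) s2; i2 = (int*) s4;` merge `s1` and `s3` into one alias class, because a pointer may only point to a single class. Field-sensitive variants refine the classes per field instead of per object.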

Field-insensitive vs. field-sensitive Steensgaard classification

                          Field-insensitive Steensgaard               Field-sensitive Steensgaard
Example  KLOC  Field Ops  Classes    Max   Min  Average  Time (secs)  Classes    Max   Min  Average  Time (secs)
mcf       0.9        562        7    527     1    80.28         0.02       34    376     1    16.52         0.02
bzip2     2.4         87        -      -     -        -            -        2     69    18     43.5         0.03
gzip      3.6        244        4    222     4       61         0.05       37     72     1     6.59         0.05
parser    7.4       2375       18   2115     3   131.94         0.13       61   2003     1    38.93         0.11
vpr       8.7       1649       23   1408     2    71.69         0.17      159    955     1    10.37         0.14
crafty    9.7        830        6    268    30   138.33          0.5       74    268     1    11.21         0.22
twolf      15       6457        1   6457  6457     6457         0.48       80   4495     1    80.71         0.23
vortex     25       8062       90   7631     1    89.57         1.17      274   7616     1    29.42         0.60
gap        28      11480        9  11459     1   1275.5         1.04      134  11023     1    85.67         0.60

Experiment explanation
  • Example: the benchmark name
  • KLOC: the size of the benchmark in thousands of lines of code
  • Field Ops: the number of indirect memory accesses to fields of structure objects
  • Classes: the number of alias classes over the Field Ops above; memory accesses in the same alias class are regarded as aliased
  • Max: the maximum number of Field Ops in one alias class
  • Min: the minimum number of Field Ops in one alias class
  • Average: the average number of Field Ops per alias class
  • Time: the time taken to analyze the benchmark

Field-sensitive Steensgaard vs. aggressively field-sensitive classification

                          Field-sensitive Steensgaard                 Aggressively field-sensitive
Example  KLOC  Field Ops  Classes    Max   Min  Average  Time (secs)  Classes    Max   Min  Average  Time (secs)
mcf       0.9        562       34    376     1    16.52         0.02       69     39     1     8.14         0.02
bzip2     2.4         87        2     69    18     43.5         0.03        7     26     6    12.42         0.03
gzip      3.6        244       37     72     1     6.59         0.05       44     49     1     5.54         0.05
parser    7.4       2375       61   2003     1    38.93         0.11       88   1989     1    26.98         0.13

  • The main idea of the improvement is to take memory layout into account in the high-level analysis, so that fields of structure objects are distinguished precisely.

Field-sensitive Steensgaard Classification

    int i1, *i2, **i3, **i4;
    float f1, **f2;
    struct { int a, *b, *c; } s1, *s2;
    struct { int d, *e; float f, *g; } s3, *s4;

    s2 = &s1;
    s4 = &s3;
    f2 = &s4->g;
    *f2 = &f1;
    i3 = &s2->b;
    i4 = &s2->c;
    *i4 = &i1;
    i2 = (int*) s2;
    i2 = (int*) s4;

[Figure: resulting type graph over τ1 to τ11, with a table assigning a type to each of i2, s4, i3, i4, f2, s1, s3, s1.a, s3.d, s1.b, s3.e, s1.c, s3.f, s3.g, i1 and f1.]

Aggressive Field-sensitive Classification
[Figure: type graph for the same example over τ1 to τ13, with a table assigning a type to each of i2, s4, i3, i4, f2, s1, s3, s1.a, s3.d, s1.b, s3.e, s1.c, s3.f, s3.g, i1 and f1.]

Another Improvement
  • Improving the type hierarchy
[Figure: (a) the original type hierarchy and (b) the improved type hierarchy, built from the types object, struct, simple and blank.]
  • We change the type hierarchy, and the type system accordingly, so that joining a simple type with a struct type yields another struct type containing only the fields that may overlap the joined scalar.

Flow- and context-sensitive dataflow analysis
  • A transfer function evaluator
    – Computes transfer functions for each procedure
    – Traverses the procedure call graph bottom-up, from callees to callers
    – To handle recursion, the call graph is reduced to a DAG of strongly connected components (SCC-DAG), with iteration inside each SCC
  • A dataflow value propagator
    – A dataflow value is an element of the semi-lattice
    – Propagates dataflow values from the entry to the exit of each procedure’s local CFG in a top-down manner
    – Traverses the procedure call graph top-down

Transfer functions
  • A single transfer function has the form
        y = f(x1, …, xn)
    – x1…xn is the list of formal-in parameters. Each is either a declared formal parameter or a location whose value at procedure entry may be accessed by the procedure or by the procedures it invokes.
    – y is a formal-out parameter. Formal-outs include not only the return value of the procedure but also every location whose value at procedure exit may be accessed outside the procedure.
    – The body of f gives the mapping between the inputs x1…xn and the output y.
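One way to picture a transfer function y = f(x1, …, xn) in code is as a small record holding the number of formal-ins and a function pointer for the body. The sketch below is an assumption-laden illustration rather than the framework's representation: `Value`, `TransferFn`, `apply_transfer` and the example body `copy_first` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dataflow value: an element of some semi-lattice,
   encoded here as a plain int. */
typedef int Value;

/* A transfer function y = f(x1, ..., xn) for one formal-out parameter. */
typedef struct {
    int   n_inputs;                             /* number of formal-ins */
    Value (*body)(const Value *inputs, int n);  /* mapping from inputs to y */
} TransferFn;

/* Apply a procedure summary at a call site: the actual-in values are bound
   to the formal-ins and the result is the value of the actual-out. */
Value apply_transfer(const TransferFn *f, const Value *actual_in, int n)
{
    assert(f != NULL && f->body != NULL);
    assert(n == f->n_inputs);
    return f->body(actual_in, n);
}

/* Example body: the output depends only on the first formal-in, as for a
   procedure that simply copies one input location to its output. */
Value copy_first(const Value *inputs, int n)
{
    assert(n > 0);
    return inputs[0];
}
```

Because the summary is a function of the inputs rather than a fixed value, the same `TransferFn` can be applied at every call site with that site's own actual-in values, which is what gives context-sensitivity.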

Dataflow value propagator
  • At a procedure entry
    – Perform the “meet” operation on the dataflow values from different call sites
  • At each call site
    – Propagate the value of each actual-in parameter to the corresponding formal-in parameter of the callee
    – Obtain the value of each formal-out parameter by applying the transfer function, and propagate it to the corresponding actual-out parameter
    – Iteration is needed for loops and recursion

Checking uninitialized reference
  • Abstract the task as solving a dataflow problem
    – Every memory object at every reference site has one of the dataflow values “define”, “may define” or “undefine”. The value is determined as follows:
      I. If the memory object is initialized on all paths from the program entry to the current reference site, it has the value “define” at this site.
      II. If the memory object is not initialized on some path, the value is “undefine”.
      III. Initialization through indirect memory operations (e.g. dereferencing a pointer) results in the value “may define”.
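The three values can be read as a small chain lattice whose meet combines the values arriving along different paths. The slides do not spell out the ordering, so the sketch below assumes “undefine” < “may define” < “define” and takes the meet to be the minimum; the names `DefState`, `meet` and `meet_paths` are hypothetical.

```c
#include <assert.h>

/* The three dataflow values from the slide. The ordering
   UNDEFINE < MAY_DEFINE < DEFINE is an assumption of this sketch. */
typedef enum { UNDEFINE = 0, MAY_DEFINE = 1, DEFINE = 2 } DefState;

/* Meet of two values arriving along different paths: the weaker one wins. */
DefState meet(DefState a, DefState b)
{
    return a < b ? a : b;
}

/* Value at a reference site given the value contributed by each path. */
DefState meet_paths(const DefState *paths, int n)
{
    assert(n > 0);
    DefState result = paths[0];
    for (int i = 1; i < n; i++)
        result = meet(result, paths[i]);
    return result;
}
```

Under this reading, one uninitialized path is enough to make the reference site “undefine”, which matches rule II, and the result is “define” only when every path contributes “define”, which matches rule I.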

Checking uninitialized reference (2)
  • First, compute a transfer function for each procedure. The transfer function states which global variables the procedure modifies and how.
      I. If a global variable is modified by direct assignment on every path of the procedure from entry to exit, the variable is “must mod”.
      II. If the variable is modified on every path by either a direct or an indirect assignment, the variable is “may mod”.
      III. Otherwise the variable is left unmodified on some path, so it is “may not mod”.
  • The transfer function evaluator performs an interprocedural modification side-effect analysis on the whole program.
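The three rules can be modeled by two operations on a per-variable summary: one for statements executed in sequence along a path and one for merging alternative paths. This is a sketch under an assumed ordering “may not mod” < “may mod” < “must mod”; the enum and function names are hypothetical and not taken from the framework.

```c
#include <assert.h>

/* Modification property of one global variable, as named on the slide.
   The numeric ordering is an assumption used to combine values. */
typedef enum { MAY_NOT_MOD = 0, MAY_MOD = 1, MUST_MOD = 2 } ModKind;

/* Effect of a single statement on the variable. */
typedef enum { NO_ASSIGN, INDIRECT_ASSIGN, DIRECT_ASSIGN } StmtEffect;

ModKind effect_of(StmtEffect e)
{
    switch (e) {
    case DIRECT_ASSIGN:   return MUST_MOD;
    case INDIRECT_ASSIGN: return MAY_MOD;
    default:              return MAY_NOT_MOD;
    }
}

/* Sequencing along one path: the strongest modification seen so far wins. */
ModKind seq(ModKind first, ModKind second)
{
    return first > second ? first : second;
}

/* Merging alternative paths: the weakest path decides the summary. */
ModKind merge(ModKind left, ModKind right)
{
    return left < right ? left : right;
}

/* Summary of one path given the effects of its statements in order. */
ModKind path_summary(const StmtEffect *stmts, int n)
{
    assert(n >= 0);
    ModKind k = MAY_NOT_MOD;
    for (int i = 0; i < n; i++)
        k = seq(k, effect_of(stmts[i]));
    return k;
}
```

Merging a path that assigns the variable directly with a path that only assigns it through a pointer gives “may mod” (rule II), and merging with a path that never assigns it gives “may not mod” (rule III).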

Checking uninitialized reference (3)
  • Then propagate the modification property from procedure entry to exit.
    – The “must mod” and “may mod” values of a transfer function are both regarded as “define”
    – The “may not mod” value is regarded as “undefine”
    – Currently, values are propagated only for scalar variables, local or global

Results of checking uninitialized references

Example        Kloc  Reports  Bugs  FP Rate
mcf             0.9        3     3       0%
bzip2           2.4        0     0       0%
bftpd-2.3       2.8        2     1      50%
gzip            3.6        1     1       0%
HyperSAT-1.7      6        1     0     100%
parser          7.4        0     0       0%
TOTAL          23.1        7     5    28.6%

Time (sec): 0.05, 0.08, 0.11, 0.7, 1.06 (only five values are listed for the seven rows)

  – Time: the time taken to check the benchmark
  – Reports: the total number of warnings produced on the benchmark
  – Bugs: the number of true bugs found
  – FP Rate: the false positive rate
  • Comparison with GCC using the warning option -Wuninitialized: GCC reports no warnings, because it can only check uninitialized references for auto variables intraprocedurally.
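The FP Rate column is consistent with the usual definition: warnings that are not true bugs, divided by all warnings. The helper below is a small sketch of that arithmetic; the function name and the convention of returning 0 when there are no reports (as the bzip2 and parser rows suggest) are assumptions.

```c
#include <assert.h>

/* False positive rate in percent: warnings that are not true bugs,
   divided by all warnings. Returns 0 when nothing was reported. */
double fp_rate_percent(int reports, int bugs)
{
    assert(reports >= 0 && bugs >= 0 && bugs <= reports);
    if (reports == 0)
        return 0.0;
    return 100.0 * (reports - bugs) / reports;
}
```

For the TOTAL row, 7 reports with 5 true bugs give 2/7, which rounds to the 28.6% shown in the table.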

Conclusion
  • We have presented our work on building an aggressive program analysis framework for error checking in Open64.
  • We
    – integrate intraprocedural analysis into the interprocedural phase to obtain flow- and context-sensitive whole-program analysis;
    – improve the original alias analysis to be field-sensitive, and compare the three unification-based methods.
  • One error checking instance, checking uninitialized references, is presented.
