CIL Infrastructure for C Program Analysis and Transformation

CIL: Infrastructure for C Program Analysis and Transformation
George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer
http://www.cs.berkeley.edu/~necula/cil
ETAPS – CC ’02, Friday, April 12

What is CIL?
  • Distills C language
      - into a few key forms
      - with precise semantics
  • Parser + IR + Program Merger for C
  • Maintains types, close ties to source
  • Highly structured, clean subset of C
  • Handles ANSI/GCC/MSVC

Why CIL?
  • Analyses and Transformations
  • Easy to use
      - impersonates compiler & linker
      - $ make project CC=cil
  • Easy to work with
      - converts away tricky syntax
      - leaves just the heart of the language
      - separates concepts

C Feature Separation
  • CIL separates language components
      - pure expressions
      - statements with side-effects
      - control-flow, embedded CFG
  • Keeps all programmer names
      - temps serialize side-effects
      - simplified scoping

Example: C Lvalues
  • An expression referring to a region of storage
  • Example: rec[1].fld[2]
  • May involve 1, 2, or 3 memory accesses
      - 1 if rec and fld are both arrays
      - 2 if exactly one of them is a pointer
      - 3 if rec and fld are both pointers
  • Syntax (AST) is insufficient
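The access counts above can be seen by writing the same expression against three different declarations. This is a minimal sketch, not code from the talk: the names rec_aa, rec_ap, rec_pp and the setup helper are hypothetical, chosen only so that all three layouts fit in one file.

```c
#include <assert.h>

/* Case 1: rec and fld are both arrays. The lvalue is one block of
   storage, so [1].fld[2] is a single access at a constant offset. */
struct arr_elem { int fld[3]; };
struct arr_elem rec_aa[2];

/* Case 2: exactly one pointer (here fld). Load the pointer stored in
   the element, then load the int it points at: two accesses. */
struct ptr_elem { int *fld; };
struct ptr_elem rec_ap[2];

/* Case 3: rec and fld are both pointers. Load rec, load fld, then load
   the int: three accesses. */
struct ptr_elem *rec_pp;

/* The source text of each body is the same shape: X[1].fld[2] */
int read_aa(void) { return rec_aa[1].fld[2]; }
int read_ap(void) { return rec_ap[1].fld[2]; }
int read_pp(void) { return rec_pp[1].fld[2]; }

/* Hypothetical helper: make element [1].fld[2] hold v in every layout */
void setup(int v) {
    static int backing[3];
    backing[2] = v;
    rec_aa[1].fld[2] = v;
    rec_ap[1].fld = backing;
    rec_pp = rec_ap;
}
```

All three readers return the same value after setup, yet a compiler has to emit a different number of loads for each, which is the information the bare syntax tree does not carry.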

CIL Lvalues
  • An expression referring to a region of storage

    lval   ::= <base × offset>
    base   ::= Var(varinfo) | Mem(exp)
    offset ::= None | Field(f × offset) | Index(exp × offset)

CIL Lvalues
  • Example: rec[1].fld[2] becomes either:

      <Var(rec), Index(1, Field(fld, Index(2, None)))>

    or:

      <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None>

  • Full static and operational semantics
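To make the two forms concrete, here is a small C model of the lvalue grammar with a function that counts the memory accesses needed to read an lvalue. It is a sketch under assumptions: CIL itself is not written in C, and the struct names, the three expression kinds, and the accesses_* functions are all hypothetical, covering only what this example needs.

```c
#include <assert.h>
#include <stddef.h>

struct exp;

/* offset ::= None | Field(f, offset) | Index(exp, offset); None is NULL */
enum offset_kind { OFF_FIELD, OFF_INDEX };
struct offset {
    enum offset_kind kind;
    const char *field;          /* OFF_FIELD */
    const struct exp *index;    /* OFF_INDEX */
    const struct offset *rest;
};

/* lval ::= <base, offset>, base ::= Var(varinfo) | Mem(exp) */
enum base_kind { BASE_VAR, BASE_MEM };
struct lval {
    enum base_kind kind;
    const char *var;            /* BASE_VAR */
    const struct exp *addr;     /* BASE_MEM */
    const struct offset *off;
};

/* Only the expression forms used by the example */
enum exp_kind { EXP_CONST, EXP_LVALUE, EXP_ADD };
struct exp {
    enum exp_kind kind;
    int value;                  /* EXP_CONST */
    const struct lval *lv;      /* EXP_LVALUE */
    const struct exp *lhs, *rhs;/* EXP_ADD */
};

int accesses_lval(const struct lval *lv);

/* Accesses needed to evaluate a pure expression */
int accesses_exp(const struct exp *e) {
    switch (e->kind) {
    case EXP_CONST:  return 0;
    case EXP_LVALUE: return accesses_lval(e->lv);
    case EXP_ADD:    return accesses_exp(e->lhs) + accesses_exp(e->rhs);
    }
    return 0;
}

/* One access for the lvalue itself, plus those needed to compute its address */
int accesses_lval(const struct lval *lv) {
    int n = 1;
    if (lv->kind == BASE_MEM)
        n += accesses_exp(lv->addr);
    for (const struct offset *o = lv->off; o != NULL; o = o->rest)
        if (o->kind == OFF_INDEX)
            n += accesses_exp(o->index);
    return n;
}

const struct exp c1 = { .kind = EXP_CONST, .value = 1 };
const struct exp c2 = { .kind = EXP_CONST, .value = 2 };

/* <Var(rec), Index(1, Field(fld, Index(2, None)))> */
const struct offset a_idx2 = { .kind = OFF_INDEX, .index = &c2 };
const struct offset a_fld  = { .kind = OFF_FIELD, .field = "fld", .rest = &a_idx2 };
const struct offset a_idx1 = { .kind = OFF_INDEX, .index = &c1, .rest = &a_fld };
const struct lval form_arrays = { .kind = BASE_VAR, .var = "rec", .off = &a_idx1 };

/* <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None> */
const struct lval p_rec     = { .kind = BASE_VAR, .var = "rec" };
const struct exp  p_rec_val = { .kind = EXP_LVALUE, .lv = &p_rec };
const struct exp  p_plus1   = { .kind = EXP_ADD, .lhs = &c1, .rhs = &p_rec_val };
const struct offset p_fld   = { .kind = OFF_FIELD, .field = "fld" };
const struct lval p_inner   = { .kind = BASE_MEM, .addr = &p_plus1, .off = &p_fld };
const struct exp  p_inner_val = { .kind = EXP_LVALUE, .lv = &p_inner };
const struct exp  p_plus2   = { .kind = EXP_ADD, .lhs = &c2, .rhs = &p_inner_val };
const struct lval form_pointers = { .kind = BASE_MEM, .addr = &p_plus2 };
```

In this model, each nested Lvalue(...) inside a Mem base is one extra load, so the count falls out of the structure of the term rather than from separate type lookups.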

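Semantics
  • CIL gives syntax-directed semantics
  • Example judgment:

    \[
    \frac{\Gamma(x) = t}{\Gamma \vdash \mathrm{Var}(x) \Downarrow (\&x, t)}
    \]

    Here Γ is the environment, Var(x) is the lvalue form, and (&x, t) is its meaning.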

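CIL Lvalue Semantics

\[
\frac{\Gamma \vdash o@b \Downarrow (a, t)}
     {\Gamma \vdash \langle b, o \rangle \Downarrow (a, t)}
\qquad
\frac{\Gamma(x) = t}
     {\Gamma \vdash \mathrm{Var}(x) \Downarrow (\&x, t)}
\qquad
\frac{\Gamma \vdash e : \mathrm{Ptr}(t)}
     {\Gamma \vdash \mathrm{Mem}(e) \Downarrow (e, t)}
\]

\[
\frac{\Gamma \vdash b \Downarrow (a, t)}
     {\Gamma \vdash \mathrm{None}@b \Downarrow (a, t)}
\qquad
\frac{\Gamma \vdash b \Downarrow (a_1, \mathrm{Arr}(t_1))
      \quad
      \Gamma \vdash o@(a_1 + e * |t_1|,\ t_1) \Downarrow (a_2, t_2)}
     {\Gamma \vdash \mathrm{Index}(e, o)@b \Downarrow (a_2, t_2)}
\]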

CIL Source Fidelity

    typedef struct { int fld[3]; } * Myptr;
    Myptr rec;
    rec[2].fld[1] = 'h';

CIL output:

    struct __anonstruct1 { int fld[3]; };
    typedef struct __anonstruct1 * Myptr;
    Myptr rec;
    (rec + 2)->fld[1] = (int)'h';

SUIF 2.2.0-4 output:

    typedef int __ar_1[3];
    struct type_1 { __ar_1 fld; };
    struct type_1 * rec;
    (((((int *)(((char *)&((((struct type_1 *) (rec))))[2])+0U))))[1]) =(104);

Corner Cases
  • Your analysis will not have to handle:
      - return ({goto L; p; }) && ({L: 5; });
      - return &(--x ? : z) - & (x++, x);
  • Full handling of
      - GNU-isms, MSVC-isms
      - attributes
      - initializers

Corner Cases
  • Your analysis will not have to handle:
      - return ({goto L; p; }) && ({L: 5; });
  • Instead, it sees:

    int tmp;
    goto L;
    if (p) {
      L: tmp = 1;
    } else {
      tmp = 0;
    }
    return tmp;
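The same idea applies to ordinary short-circuit operators: the expression is split into explicit control flow and a temporary, so side effects occur in a fixed, visible order. The sketch below is hand-written to illustrate that shape and is not actual CIL output. The function names and the calls counter are hypothetical, and slide_output simply wraps the code shown above in a function.

```c
#include <assert.h>

int calls;  /* counts how often the right operand is evaluated */

int right_operand(int v) { calls++; return v; }

/* Source form: && evaluates its right operand only if the left is nonzero */
int and_source(int p, int q) {
    return p && right_operand(q);
}

/* Lowered form: the short circuit is an explicit if, the result a temporary */
int and_lowered(int p, int q) {
    int tmp;
    if (p) {
        int t = right_operand(q);
        if (t) {
            tmp = 1;
        } else {
            tmp = 0;
        }
    } else {
        tmp = 0;
    }
    return tmp;
}

/* The simplified code from the slide, wrapped in a function. The goto
   jumps straight into the then-branch, so p is never consulted. */
int slide_output(int p) {
    int tmp;
    goto L;
    if (p) {
    L:
        tmp = 1;
    } else {
        tmp = 0;
    }
    return tmp;
}
```

An analysis that walks and_lowered only has to understand assignments, calls, and if statements; it never has to model the evaluation rules of && itself.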

StackGuard Transform
  • Cowan et al., USENIX ’98
  • Buffer overrun defense
      - push return address on private stack
      - pop before returning
      - only change functions with local arrays
  • 40 lines of commented code with CIL
  • Quite easy: uses visitors for tree replacement, explicit returns, etc.
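What the instrumented output might look like is sketched below, written by hand rather than produced by CIL. It assumes GCC or Clang, since it relies on the __builtin_return_address builtin. The shadow_* names, the fixed stack depth, and the example functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SHADOW_DEPTH 64

/* Private stack of return addresses, kept apart from the call stack */
void *shadow_stack[SHADOW_DEPTH];
int shadow_top;

void shadow_push(void *ret) {
    assert(shadow_top < SHADOW_DEPTH);
    shadow_stack[shadow_top++] = ret;
}

/* Pop the saved address and abort if the live one no longer matches */
void shadow_pop_check(void *ret) {
    assert(shadow_top > 0);
    if (shadow_stack[--shadow_top] != ret)
        abort();
}

/* Has a local array, so it is instrumented at entry and before return */
size_t copy_len(const char *src) {
    char buf[16];
    size_t n;
    shadow_push(__builtin_return_address(0));
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    n = strlen(buf);
    shadow_pop_check(__builtin_return_address(0));
    return n;
}

/* No local array, so it is left unchanged */
int add_one(int x) { return x + 1; }
```

Having a single explicit return per function makes the pop easy to place, which is one reason the slide lists explicit returns among the things that keep the transform short.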

Other Transforms
  • Instrument and log all calls: 150 lines
  • Eliminate break, continue, switch: 110
  • 1 memory access per assignment: 100
  • Make each function have a single return statement: 90
  • Make all stack arrays heap-allocated: 75
  • Log all value/addr memory writes: 45
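As an illustration of one item in this list, the single-return transform can be pictured as follows. This is a hand-written before/after pair, not CIL output, and the function names are hypothetical.

```c
#include <assert.h>

/* Before: three return statements */
int sign_source(int x) {
    if (x < 0)
        return -1;
    if (x == 0)
        return 0;
    return 1;
}

/* After: every exit path stores into one temporary and jumps to a
   single return at the end of the function */
int sign_single_return(int x) {
    int result;
    if (x < 0) {
        result = -1;
        goto done;
    }
    if (x == 0) {
        result = 0;
        goto done;
    }
    result = 1;
done:
    return result;
}
```

With one exit point, later instrumentation (logging, cleanup, a StackGuard-style check) only needs to be inserted in one place per function.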

Whole-Program Merger
  • C has incremental linking, compilation
      - coupled with a weak module system!
  • Example (vortex / gcc / c++2c):

    /* foo.c */                      /* bar.c */
    struct list {                    struct chain {
      int head;                        int head;
      struct list * tail;              struct chain * tail;
    };                               };
    struct list * mylist;            extern struct chain * mylist;

Merging a Project
  • Determine what files to merge
  • Merge the files
      - handle file-scoped identifiers
      - C uses name equivalence for types
      - but modules need structural equivalence
  • Key: Each global identifier has 1 type!
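The difference between name and structural equivalence can be sketched with a toy type representation. The code below is an assumption-laden illustration, not CIL's merger: the struct type encoding, the field limit, and structurally_equal are hypothetical. It compares two recursive struct types by shape, ignoring their tags, and handles the self-reference by assuming the pair under comparison is equal while checking its fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { T_INT, T_PTR, T_STRUCT };

#define MAX_FIELDS 4
struct type {
    enum kind kind;
    const char *tag;                      /* T_STRUCT: tag, ignored when comparing */
    const struct type *target;            /* T_PTR: pointee */
    int nfields;                          /* T_STRUCT */
    const char *fname[MAX_FIELDS];
    const struct type *ftype[MAX_FIELDS];
};

#define MAX_ASSUMED 16
struct pair { const struct type *a, *b; };

/* Compare a and b by shape; assumed[0..n) holds struct pairs currently
   being compared further up the recursion, which are taken as equal. */
static int equiv_under(const struct type *a, const struct type *b,
                       struct pair *assumed, int n) {
    if (a->kind != b->kind)
        return 0;
    switch (a->kind) {
    case T_INT:
        return 1;
    case T_PTR:
        return equiv_under(a->target, b->target, assumed, n);
    case T_STRUCT:
        for (int i = 0; i < n; i++)
            if (assumed[i].a == a && assumed[i].b == b)
                return 1;
        if (a->nfields != b->nfields || n == MAX_ASSUMED)
            return 0;
        assumed[n].a = a;
        assumed[n].b = b;
        for (int i = 0; i < a->nfields; i++) {
            if (strcmp(a->fname[i], b->fname[i]) != 0)
                return 0;
            if (!equiv_under(a->ftype[i], b->ftype[i], assumed, n + 1))
                return 0;
        }
        return 1;
    }
    return 0;
}

int structurally_equal(const struct type *a, const struct type *b) {
    struct pair assumed[MAX_ASSUMED];
    return equiv_under(a, b, assumed, 0);
}

const struct type int_t   = { .kind = T_INT };
const struct type int_ptr = { .kind = T_PTR, .target = &int_t };

/* struct list { int head; struct list *tail; } as in foo.c */
extern const struct type list_t;
const struct type list_ptr = { .kind = T_PTR, .target = &list_t };
const struct type list_t = { .kind = T_STRUCT, .tag = "list", .nfields = 2,
    .fname = { "head", "tail" }, .ftype = { &int_t, &list_ptr } };

/* struct chain { int head; struct chain *tail; } as in bar.c */
extern const struct type chain_t;
const struct type chain_ptr = { .kind = T_PTR, .target = &chain_t };
const struct type chain_t = { .kind = T_STRUCT, .tag = "chain", .nfields = 2,
    .fname = { "head", "tail" }, .ftype = { &int_t, &chain_ptr } };

/* A struct with the same field names but a different shape */
const struct type other_t = { .kind = T_STRUCT, .tag = "other", .nfields = 2,
    .fname = { "head", "tail" }, .ftype = { &int_t, &int_ptr } };
```

Under name equivalence list and chain are different types, so mylist would have two types after merging; under the structural check they coincide, and one of them can serve as the representative for both.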

Other Merger Details
  • Remove duplicate declarations
      - every file includes <stdio.h>
  • Match struct pointer with no defined body in file A to defined body in file B
  • Be careful when picking representatives

How Does it Work?
  • Make project, pass all files through CIL
  • Run your transform and analysis
  • Emit simplified C
  • Compile simplified C with GCC/MSVC
  • … and it works!

Large Programs

    Program        #LOC *.[ch]   Notes
    SPECINT95      360K
    GIMP-1.2.2     800K          large libraries
    linux-2.4.5    2.5M          132% compile time
    ACE (in C)     2M            2000 files

  • Used in the CCured and BLAST projects

Merged Kernel Stats
  • Stock monolithic Linux 2.4.5 kernel
  • http://manju.cs.berkeley.edu/cil/vmlinux.c
  • Statistics:

    Before                      After
    324 files                   One 12.5 MB file
    11.3 M-words                1.5 M-words
    7.3 M-LOC (post-process)    470 K-LOC

    $ make CC="cil --merge" HOSTCC="cil --merge" LD="cil --merge" AR="cil --mode=AR --merge"

Conclusion
  • CIL distills C to a precise, simple subset
      - easy to analyze
      - well-defined semantics
      - close to the original source
  • Well-suited to complex analyses and source-to-source transforms
  • Parses ANSI/GCC/MSVC C
  • Rapidly merges large programs

Questions?
  • Try CIL out: http://www.cs.berkeley.edu/~necula/cil
  • Complete source, documentation and test cases freely available