CIL Infrastructure for C Program Analysis and Transformation


















- Slides: 22

CIL: Infrastructure for C Program Analysis and Transformation
George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer
http://www.cs.berkeley.edu/~necula/cil
ETAPS – CC ’02, Friday, April 12

What is CIL?
- Distills the C language
  - into a few key forms
  - with precise semantics
- Parser + IR + Program Merger for C
- Maintains types, close ties to source
- Highly structured, clean subset of C
- Handles ANSI/GCC/MSVC

Why CIL?
- Analyses and Transformations
- Easy to use
  - impersonates compiler & linker
  - $ make project CC=cil
- Easy to work with
  - converts away tricky syntax
  - leaves just the heart of the language
  - separates concepts

C Feature Separation
- CIL separates language components
  - pure expressions
  - statements with side-effects
  - control-flow
  - embedded CFG
- Keeps all programmer names
  - temps serialize side-effects
  - simplified scoping

Example: C Lvalues
- An lvalue is an expression referring to a region of storage
- Example: rec[1].fld[2]
- May involve 1, 2, or 3 memory accesses
  - 1 if rec and fld are both arrays
  - 2 if exactly one of them is a pointer
  - 3 if rec and fld are both pointers
- Syntax (AST) alone is insufficient
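
The three cases can be seen by writing the same expression against different declarations. The following is a minimal sketch in plain C; all type and variable names (`arr_rec`, `rec_aa`, and so on) are made up for illustration, and the access counts in the comments follow the slide's counting.

```c
/* Case 1: rec and fld are both arrays. One memory access. */
struct arr_rec { int fld[3]; };
static struct arr_rec rec_aa[2] = { { {0, 1, 2} }, { {10, 11, 12} } };

/* Case 2: rec is a pointer, fld is an array.
 * Two accesses: load rec, then read the element. */
static struct arr_rec *rec_pa = rec_aa;

/* Case 3: rec and fld are both pointers.
 * Three accesses: load rec, load fld, then read the element. */
struct ptr_rec { int *fld; };
static int backing[3] = { 20, 21, 22 };
static struct ptr_rec cells[2] = { { backing }, { backing } };
static struct ptr_rec *rec_pp = cells;

/* The source text of the lvalue is identical in all three functions. */
int read_aa(void) { return rec_aa[1].fld[2]; }
int read_pa(void) { return rec_pa[1].fld[2]; }
int read_pp(void) { return rec_pp[1].fld[2]; }
```

Only the declarations differ between the three readers, which is why the syntax tree of the expression does not by itself tell an analysis how many loads happen.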

CIL Lvalues
- An exp referring to a region of storage

      lval   ::= <base × offset>
      base   ::= Var(varinfo) | Mem(exp)
      offset ::= None | Field(f × offset) | Index(exp × offset)

CIL Lvalues
- Example: rec[1].fld[2] becomes either (rec and fld both arrays):

      <Var(rec), Index(1, Field(fld, Index(2, None)))>

- or (rec and fld both pointers):

      <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None>

- Full static and operational semantics
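
To make the base/offset shape concrete, here is a small sketch that encodes the grammar as C structs and prints the first form of the example. This is an illustration only: CIL itself is written in OCaml, and every name below (`struct offset`, `struct lval`, `show_lval`, `example_lval`) is hypothetical. Only the `Var` base and constant indices are modelled, to keep the sketch short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical C encoding of the slide's grammar (not CIL's own API). */
enum offset_kind { OFF_NONE, OFF_FIELD, OFF_INDEX };

struct offset {
    enum offset_kind kind;
    const char *field;          /* used by OFF_FIELD */
    int index;                  /* used by OFF_INDEX (constant index only) */
    const struct offset *rest;  /* remaining offset; NULL for OFF_NONE */
};

struct lval {
    const char *var;            /* base: only the Var(...) form is modelled */
    const struct offset *off;
};

/* Append the textual form of an offset chain to the string in buf. */
static void show_offset(const struct offset *o, char *buf, size_t n)
{
    size_t used = strlen(buf);
    if (used >= n)
        return;
    switch (o->kind) {
    case OFF_NONE:
        snprintf(buf + used, n - used, "None");
        return;
    case OFF_FIELD:
        snprintf(buf + used, n - used, "Field(%s, ", o->field);
        break;
    case OFF_INDEX:
        snprintf(buf + used, n - used, "Index(%d, ", o->index);
        break;
    }
    show_offset(o->rest, buf, n);
    used = strlen(buf);
    snprintf(buf + used, n - used, ")");
}

/* Write "<Var(name), offset>" into buf (n must be at least 1). */
void show_lval(const struct lval *lv, char *buf, size_t n)
{
    size_t used;
    snprintf(buf, n, "<Var(%s), ", lv->var);
    show_offset(lv->off, buf, n);
    used = strlen(buf);
    snprintf(buf + used, n - used, ">");
}

/* Builds rec[1].fld[2] for the case where rec and fld are both arrays. */
const char *example_lval(void)
{
    static const struct offset none = { OFF_NONE, NULL, 0, NULL };
    static const struct offset idx2 = { OFF_INDEX, NULL, 2, &none };
    static const struct offset fld  = { OFF_FIELD, "fld", 0, &idx2 };
    static const struct offset idx1 = { OFF_INDEX, NULL, 1, &fld };
    static const struct lval lv = { "rec", &idx1 };
    static char buf[128];
    show_lval(&lv, buf, sizeof buf);
    return buf;
}
```

Calling `example_lval()` yields the string `<Var(rec), Index(1, Field(fld, Index(2, None)))>`, matching the all-arrays form on the slide.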
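
Semantics
- CIL gives syntax-directed semantics
- Example judgment, where $\Gamma$ is the environment, $\mathrm{Var}(x)$ is the lvalue form, and $(\&x, \tau)$ is its meaning:

$$\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \Downarrow (\&x, \tau)}$$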
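
CIL Lvalue Semantics

$$\frac{\Gamma \vdash o@b \Downarrow (a, \tau)}{\Gamma \vdash \langle b, o \rangle \Downarrow (a, \tau)}$$

$$\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \Downarrow (\&x, \tau)}$$

$$\frac{\Gamma \vdash e : \mathrm{Ptr}(\tau)}{\Gamma \vdash \mathrm{Mem}(e) \Downarrow (e, \tau)}$$

$$\frac{\Gamma \vdash b \Downarrow (a, \tau)}{\Gamma \vdash \mathrm{None}@b \Downarrow (a, \tau)}$$

$$\frac{\Gamma \vdash b \Downarrow (a_1, \mathrm{Arr}(\tau_1)) \qquad \Gamma \vdash o@(a_1 + e \cdot |\tau_1|, \tau_1) \Downarrow (a_2, \tau_2)}{\Gamma \vdash \mathrm{Index}(e, o)@b \Downarrow (a_2, \tau_2)}$$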
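
The address arithmetic in the Index rule, a1 + e * |t1|, is the same computation a C compiler performs for array indexing. A minimal sketch that checks this for one array follows; the function names `index_addr` and `index_rule_matches_c` are invented for the example, and |t1| is taken to be `sizeof` of the element type.

```c
#include <stddef.h>

/* Index rule: given the array's address a1, an index e, and the element
 * size |t1|, the element's address is a1 + e * |t1|. */
const void *index_addr(const void *a1, size_t e, size_t elem_size)
{
    return (const char *)a1 + e * elem_size;
}

/* Returns 1 if the rule's address agrees with C's own &arr[2]. */
int index_rule_matches_c(void)
{
    static int arr[4] = { 10, 20, 30, 40 };
    const int *p = index_addr(arr, 2, sizeof arr[0]);
    return p == &arr[2] && *p == 30;
}
```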

CIL Source Fidelity
- Input:

      typedef struct { int fld[3]; } * Myptr;
      Myptr rec;
      rec[2].fld[1] = 'h';

- CIL output:

      struct __anonstruct1 { int fld[3] ; };
      typedef struct __anonstruct1 * Myptr;
      Myptr rec;
      (rec + 2)->fld[1] = (int)'h';

- SUIF 2.2.0-4 output:

      typedef int __ar_1[3];
      struct type_1 { __ar_1 fld; };
      struct type_1 * rec;
      (((((int *)(((char *)&((((struct type_1 *) (rec))))[2])+0U))))[1]) =(104);

Corner Cases
- Your analysis will not have to handle:
  - return ({goto L; p; }) && ({L: 5; });
  - return &(--x ? : z) - & (x++, x);
- Full handling of
  - GNU-isms, MSVC-isms
  - attributes
  - initializers

Corner Cases
- Your analysis will not have to handle:
  - return ({goto L; p; }) && ({L: 5; });
- It sees this instead:

      int tmp;
      goto L;
      if (p) {
        L: tmp = 1;
      } else {
        tmp = 0;
      }
      return tmp;

StackGuard Transform
- Cowan et al., USENIX ’98
- Buffer overrun defense
  - push return address on private stack
  - pop before returning
  - only change functions with local arrays
- 40 lines of commented code with CIL
- Quite easy: uses visitors for tree replacement, explicit returns, etc.
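
As a rough picture of what a function might look like after such a transform, here is a hand-written sketch. It is not output from CIL or from the StackGuard tool: the helper names (`shadow_push`, `shadow_check`, `copy_len`) are hypothetical, `__builtin_return_address` is a GCC/Clang builtin rather than standard C, and the slides do not say whether the popped address is compared or restored, so this version compares and aborts on mismatch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical private stack of return addresses. */
#define SHADOW_MAX 1024
static void *shadow_stack[SHADOW_MAX];
static int shadow_top = 0;

static void shadow_push(void *ret)
{
    if (shadow_top >= SHADOW_MAX)
        abort();
    shadow_stack[shadow_top++] = ret;
}

/* Pop the saved address and abort if it no longer matches. */
static void shadow_check(void *ret)
{
    if (shadow_top <= 0 || shadow_stack[--shadow_top] != ret)
        abort();
}

/* A function with a local array, written as the transform might leave it:
 * push on entry, check just before the single explicit return. */
size_t copy_len(const char *src)
{
    char buf[16];
    size_t len;

    shadow_push(__builtin_return_address(0));

    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    len = strlen(buf);

    shadow_check(__builtin_return_address(0));
    return len;
}
```

Functions without local arrays would be left untouched, which is the third bullet on the slide.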

Other Transforms
- Instrument and log all calls: 150 lines
- Eliminate break, continue, switch: 110
- 1 memory access per assignment: 100
- Make each function have a single return statement: 90
- Make all stack arrays heap-allocated: 75
- Log all value/addr memory writes: 45
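
To illustrate one entry in this list, the single-return transform, here is a before/after pair written by hand. The shape of the "after" function is an assumption about what such a transform could produce, not actual CIL output, and the names `sign_before`, `sign_after`, and `retval` are invented.

```c
/* Before: two return statements. */
int sign_before(int x)
{
    if (x < 0)
        return -1;
    return x > 0;
}

/* After: every exit path stores its result and jumps to one return. */
int sign_after(int x)
{
    int retval;

    if (x < 0) {
        retval = -1;
        goto done;
    }
    retval = x > 0;
done:
    return retval;
}
```

A single exit point gives later instrumentation, such as a check before returning, exactly one place to hook in.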

Whole-Program Merger
- C has incremental linking, compilation
  - coupled with a weak module system!
- Example (vortex / gcc / c++2c):

      /* foo.c */
      struct list { int head; struct list * tail; };
      struct list * mylist;

      /* bar.c */
      struct chain { int head; struct chain * tail; };
      extern struct chain * mylist;

Merging a Project
- Determine what files to merge
- Merge the files
  - handle file-scoped identifiers
  - C uses name equivalence for types
  - but modules need structural equivalence
- Key: Each global identifier has 1 type!
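
The gap between name equivalence and structural equivalence can be seen by putting two differently named but identically shaped structs into one file. The sketch below only compares layouts; it says nothing about how CIL's merger actually decides that two types match, and the function name `same_layout` is invented.

```c
#include <stddef.h>

/* Two struct types with different tags but the same shape, as might be
 * declared in two separately compiled files. */
struct list  { int head; struct list  *tail; };
struct chain { int head; struct chain *tail; };

/* Returns 1 when both structs have the same size and member offsets.
 * Within one file C still treats them as distinct, incompatible types. */
int same_layout(void)
{
    return sizeof(struct list) == sizeof(struct chain)
        && offsetof(struct list, head) == offsetof(struct chain, head)
        && offsetof(struct list, tail) == offsetof(struct chain, tail);
}
```

Separately compiled files can get away with such a mismatch at link time, but once they are merged into one file the shared global needs a single type, so one of the two structs has to be chosen as the representative.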

Other Merger Details
- Remove duplicate declarations
  - every file includes <stdio.h>
- Match struct pointer with no defined body in file A to defined body in file B
- Be careful when picking representatives

How Does it Work?
- Make project, pass all files through CIL
- Run your transform and analysis
- Emit simplified C
- Compile simplified C with GCC/MSVC
- … and it works!

Large Programs

| Program | #LOC *.[ch] | Notes |
|---|---|---|
| SPECINT95 | 360K | |
| GIMP-1.2.2 | 800K | large libraries |
| linux-2.4.5 | 2.5M | 132% compile time |
| ACE (in C) | 2M | 2000 files |

- Used in the CCured and BLAST projects

Merged Kernel Stats
- Stock monolithic Linux 2.4.5 kernel
- http://manju.cs.berkeley.edu/cil/vmlinux.c
- Statistics:

| Before | After |
|---|---|
| 324 files | One 12.5 MB file |
| 11.3 M-words | 1.5 M-words |
| 7.3 M-LOC (post-process) | 470 K-LOC |

- $ make CC="cil --merge" HOSTCC="cil --merge" LD="cil --merge" AR="cil --mode=AR --merge"

Conclusion
- CIL distills C to a precise, simple subset
  - easy to analyze
  - well-defined semantics
  - close to the original source
- Well-suited to complex analyses and source-to-source transforms
- Parses ANSI/GCC/MSVC C
- Rapidly merges large programs

Questions?
- Try CIL out: http://www.cs.berkeley.edu/~necula/cil
- Complete source, documentation and test cases freely available