Finding and Understanding Bugs in C Compilers
Xuejun Yang, Yang Chen, Eric Eide, John Regehr
University of Utah

• C compilers should be correct
  – Part of the trusted computing base
  – Used to compile OSes and safety-critical applications
• But sometimes compilers are incorrect
  – Fail to compile a valid program
  – Generate wrong code

Contributions
• Developed Csmith, a random C program generator that is expressive and generates unambiguous code
• Used Csmith to find 382 bugs in widely used C compilers
  – Most of the bugs have been fixed

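[Diagram: the random generator, Csmith, emits a C program; the program is compiled with gcc -O0, gcc -O2, clang -Os, …; the results are put to a vote that separates the majority from the minority.]
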
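The voting step in the diagram can be sketched in C as follows. This is a minimal illustration, not Csmith's own test harness: the function name `vote`, the use of one 32-bit checksum per compiler configuration, and the strict-majority rule are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] is the checksum printed by the test program
 * when built with the i-th compiler configuration.
 * Marks minority[i] = 1 for every result that disagrees with the strict
 * majority and returns how many such results there are.
 * Returns -1 when no value is produced by more than half of the builds. */
int vote(const uint32_t *out, size_t n, int *minority)
{
    assert(out != NULL && minority != NULL);

    /* Boyer-Moore pass: find the only value that could be a majority. */
    uint32_t candidate = 0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (count == 0) {
            candidate = out[i];
            count = 1;
        } else if (out[i] == candidate) {
            count++;
        } else {
            count--;
        }
    }

    /* Second pass: confirm the candidate really is a strict majority. */
    size_t agree = 0;
    for (size_t i = 0; i < n; i++) {
        if (out[i] == candidate)
            agree++;
    }
    if (agree * 2 <= n)
        return -1;

    /* Flag the builds whose output differs from the majority. */
    int disagreeing = 0;
    for (size_t i = 0; i < n; i++) {
        minority[i] = (out[i] != candidate);
        disagreeing += minority[i];
    }
    return disagreeing;
}
```

A nonzero return value points at the builds worth investigating; it does not by itself prove which compiler is wrong.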
Why Csmith Works
• Unambiguous: avoids undefined or unspecified behaviors that make the meaning of a program ambiguous
  – Integer undefined behavior
  – Use without initialization
  – Unspecified evaluation order
  – Use of dangling pointer
  – Null pointer dereference
  – OOB array access
• Expressive: supports most commonly used C features
  – Integer operations
  – Loops (with break/continue)
  – Conditionals
  – Function calls
  – Const and volatile
  – Structs and bitfields
  – Pointers and arrays
  – Goto

Avoiding Undefined/Unspecified Behaviors
• Integer undefined behaviors
  – Generation-time solution: constant folding/propagation, algebraic simplification
  – Run-time solution: safe math wrappers
• Use without initialization
  – Generation-time solution: explicit initializers
• OOB array access
  – Generation-time solution: force index within range
  – Run-time solution: take modulus
• Null pointer dereference
  – Generation-time solution: inter-procedural points-to analysis
• Use of dangling pointers
  – Generation-time solution: inter-procedural points-to analysis
• Unspecified evaluation order
  – Generation-time solution: inter-procedural effect analysis

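The two run-time solutions named on this slide, safe math wrappers and taking a modulus, can be illustrated with a short C sketch. The function names, the choice of `int`, and the decision to fall back to the first operand when an operation would be undefined are assumptions for this example, not a copy of the wrappers Csmith emits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical safe wrapper: signed addition that never overflows.
 * When a + b is not representable, the first operand is returned. */
int safe_add_int(int a, int b)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return a;
    return a + b;
}

/* Hypothetical safe wrapper: signed division that avoids both
 * division by zero and the INT_MIN / -1 overflow. */
int safe_div_int(int a, int b)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return a;
    return a / b;
}

/* Run-time guard for array reads: the index is reduced modulo the
 * array length, so the access always stays in bounds. */
int read_in_range(const int *arr, size_t len, unsigned idx)
{
    assert(arr != NULL && len > 0);
    return arr[idx % len];
}
```

Because every compiler under test sees the same wrapper, the fallback value does not matter for differential testing as long as it is well defined.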
[Diagram, three animation steps: the Code Generator builds an assignment whose RHS is a call to func_2 and asks the Generation Time Analyzer to validate each candidate. With LHS *q the check "ok?" answers no. With LHS *p the check answers yes, and the analyzer updates its facts.]

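The validate-then-update loop in the diagram can be imitated with a toy C model. Everything here is hypothetical: the three-state pointer fact, the `facts` structure, and `try_emit_store` stand in for Csmith's inter-procedural points-to analysis, which is far richer than this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy dataflow fact about one pointer at the current program point. */
enum ptr_state { PTR_VALID, PTR_MAY_BE_NULL, PTR_MAY_DANGLE };

/* Toy fact set for a program with two pointers, p and q. */
struct facts {
    enum ptr_state p;
    enum ptr_state q;
    bool p_target_written;   /* has *p been assigned so far? */
    bool q_target_written;   /* has *q been assigned so far? */
};

/* Candidate statement: `*<lhs> = func_2();` with lhs being 'p' or 'q'.
 * Returns true when the candidate is accepted. The facts are updated
 * only on acceptance; a rejected candidate leaves them untouched. */
bool try_emit_store(struct facts *f, char lhs)
{
    assert(f != NULL && (lhs == 'p' || lhs == 'q'));

    /* validate: dereferencing is allowed only for a known-valid pointer */
    enum ptr_state target = (lhs == 'p') ? f->p : f->q;
    if (target != PTR_VALID)
        return false;            /* ok? no: the generator must try again */

    /* ok? yes: update facts to reflect the emitted store */
    if (lhs == 'p')
        f->p_target_written = true;
    else
        f->q_target_written = true;
    return true;
}
```

In this model, a generator would call `try_emit_store` with `'q'` first, see it rejected, and then succeed with `'p'`, mirroring the two outcomes shown in the diagram.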
• From March 2008 to present:

Compiler                                                        Bugs reported (fixed)
GCC                                                             104 (86)
LLVM                                                            228 (221)
Others (CompCert, icc, armcc, tcc, cil, suncc, Open64, etc.)    50
Total                                                           382

  – The GCC bugs account for 1% of all valid GCC bugs reported in the same period
  – The LLVM bugs account for 3.5% of all valid LLVM bugs reported in the same period
• Do they matter?
  – 25 priority 1 bugs for GCC
  – 8 of our bugs were re-reported by others

Bug Distribution Across Compiler Stages

Stage           GCC    LLVM
Front end         1      11
Middle end       71      93
Back end         28      78
Unclassified      4      46
Total           104     228

[Figure: two bar charts, "Coverage of GCC" and "Coverage of LLVM/Clang", showing line, function, and branch coverage on a 0% to 100% axis. Legends: "Check-C" versus "Check-C + 10,000 random programs", and "test suite" versus "test suite + 10,000 random programs". Labeled gains: +0.45%, +0.18%, +0.15%, +0.05%, +0.26%, +0.85%.]

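Common Compiler Bug Pattern
• A compiler optimization runs an analysis, then a safety check, then a transformation
  – Safety check: if (condition1 && condition2)
  – Y: apply the transformation
  – N: skip the transformation
• The bug: a missing safety condition
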
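As a concrete instance of this pattern, consider a toy rewrite that simplifies `(x * m) / d` to `x` on unsigned values. The rewrite, the function names, and the `x_max` range fact are invented for illustration and do not come from any of the compilers tested. Condition 1 is that the multiplier equals the divisor; condition 2 is that the multiplication cannot wrap around.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Reference semantics of the original expression. Unsigned arithmetic
 * wraps on overflow, so this is well defined whenever d != 0. */
unsigned eval_mul_div(unsigned x, unsigned m, unsigned d)
{
    assert(d != 0);
    return (x * m) / d;
}

/* Buggy safety check: tests condition 1 only (same nonzero factor).
 * x_max is the largest value the analysis says x can take. */
bool can_simplify_buggy(unsigned x_max, unsigned m, unsigned d)
{
    (void)x_max;                 /* the range fact is ignored: the bug */
    return m == d && m != 0;
}

/* Fixed safety check: adds condition 2, that x * m cannot wrap. */
bool can_simplify_fixed(unsigned x_max, unsigned m, unsigned d)
{
    return m == d && m != 0 && x_max <= UINT_MAX / m;
}
```

With `x = UINT_MAX` and `m = d = 2`, the original expression no longer evaluates to `x`, yet the buggy check still approves the rewrite; the fixed check rejects it.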
CompCert Bugs
• Certified C compiler
• 11 bugs reported
  – All in the unproved front end or back end
  – No bugs in the proved part
• Developing compiler optimizations within a proof framework is helpful for compiler correctness

Conclusion
• By randomly generating expressive and unambiguous test cases, we have found, and continue to find, compiler bugs effectively
• Csmith is open source: http://embed.cs.utah.edu/csmith