
Secure Programming Brice Dobry brice.dobry@futurewei.com

Related Previous Work C∀ (Cforall) - https://cforall.uwaterloo.ca • Multi-year collaboration with the University of Waterloo on this language project • Goals • Design and implement a language based on C • Modern safety and productivity features • Backwards compatibility with C (and familiarity for C programmers) • Key feature is the addition of polymorphism into the type system

Related Previous Work C∀ (Cforall) - https://cforall.uwaterloo.ca • Same language design principles as C • C syntax (looks familiar) • Statically typed • Procedural (non-object-oriented) • Low-overhead runtime (no global GC) ⇒ true system language • 98% backwards compatible to leverage legacy code, tools, OS • Continuing language (has long-term future, 10+ years) • New features lead to increased safety and productivity • Build on C as not mature; C++, C♯, Java, Go, ... are mature • C++ did not fix problems in C, it only added new ones (Excerpted from Peter Buhr’s STW slides)

Related Previous Work C∀ (Cforall) - https://cforall.uwaterloo.ca • “Describe not Prescribe” • Do not force programmers to “do the right thing” • No notion or requirement for rewriting legacy C programs in C∀ • Provide pathway from low-level C to high-level C∀ • Write new programs as exploratory combination of C and C∀ • Programmer is free to reach back to C for any reason • Evolutionary step forward from C (Excerpted from Peter Buhr’s STW slides)

Related Previous Work μC++ - https://plg.uwaterloo.ca/~usystem/uC++.html • Another language from Waterloo with a focus on adding high-level concurrency to C++ • Language-level support for coroutines, monitors, mutexes, and signaling mechanisms • Language and compiler integration allows for more flexible and consistent primitives and static checking that would not be possible with library solutions • Integrated with a high-performance runtime

Related Previous Work • Syrup Language - https://openarkcompiler-incubator.gitee.io/syrup/language-spec/ • Internally designed prototype language • Goals • Productivity – better abstractions • Safety – automatic memory management, non-null references, no pointer arithmetic, strict mutability checking • Performance – static and deterministic results, high-performance reference counting • Pervasive – support all platforms (small footprint, small runtime), heterogeneous and distributed support • Balance of performance, footprint, complexity, and safety customized for our needs

Secure Programming • Bugs often show up because of missing information • The programmer didn’t realize something that can show up at runtime • The compiler doesn’t know what assumptions the programmer makes • The language has undefined behaviors, which are exploited to maximize performance, but can also be exploited by bad actors • The combination of the programming language, compiler, and tooling can avoid large classes of issues, and force or incentivize safer programming

The Information Problem • Information is known (or assumed) by the programmer • The language (e.g., C) does not provide a way to include this information in the program • Other programmers (or the same programmer at a later time) are unaware of the assumptions • The tools (e.g., compiler, static analysis) cannot reliably check for common mistakes because there is not enough information

Examples
    x = *p;                          // developer is assuming that p is non-null
    (int *)malloc(n * sizeof(int));  // developer is assuming that n is within some reasonable range
    y = a[3];                        // developer is assuming that a has valid data at this offset
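The three assumptions above can be made visible in plain C. The following is only a sketch: the helper names `load_checked`, `alloc_ints`, `read_at` and the bound `MAX_ELEMS` are hypothetical and are not part of Secure-C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_ELEMS 1024u /* hypothetical "reasonable range" for this sketch */

/* x = *p, with the non-null assumption turned into a visible check.
   Returns 0 on success, -1 if p is NULL. */
int load_checked(const int *p, int *out) {
    assert(out != NULL); /* contract on our own output parameter */
    if (p == NULL) return -1;
    *out = *p;
    return 0;
}

/* malloc(n * sizeof(int)), with the range assumption on n made explicit.
   Returns NULL if n is 0 or larger than MAX_ELEMS. */
int *alloc_ints(size_t n) {
    if (n == 0 || n > MAX_ELEMS) return NULL;
    return malloc(n * sizeof(int));
}

/* y = a[idx], with the "valid data at this offset" assumption checked
   against an explicit length. Returns 0 on success, -1 otherwise. */
int read_at(const int *a, size_t len, size_t idx, int *out) {
    assert(out != NULL);
    if (a == NULL || idx >= len) return -1;
    *out = a[idx];
    return 0;
}
```

Each check states in code what the original one-liners left in the developer's head, which is the information the later annotations aim to hand to the compiler.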

Types of Missing Information • Nullability • Is this pointer allowed to be null or must it be a valid pointer? • Value range • What is the valid range of values that are expected in an integer? • Buffer size • How big is the buffer passed into a function? • Taint • Is this data trusted?

Danger of Tainted Data
    void foo(int tainted) {
        int arr[100]; arr[tainted] = 0;              // out-of-bounds (OOB) access
        char num[8]; itoa(tainted, num);             // buffer overflow
        unsigned x = 0; for (; x < tainted; x++) …   // hanging loop
        x = y / tainted;                             // divide-by-0
        if (x < tainted - y) process(x);             // integer overflow
    }
    void bar(char *tainted) {
        char dest[8]; int len = strlen(tainted);
        strncpy(dest, tainted, len);                 // buffer overflow
        system(tainted);                             // privilege escalation, e.g. tainted = "rm -rf /*"
    }
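A minimal sketch of how three of these hazards can be guarded in plain C. The function names and the array length `ARR_LEN` are assumptions made for illustration; none of this is Secure-C API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define ARR_LEN 100

static int arr[ARR_LEN];

/* OOB: write through a tainted index only after a range check.
   Returns 0 on success, -1 if the index is out of range. */
int store_zero(int tainted) {
    if (tainted < 0 || tainted >= ARR_LEN) return -1;
    arr[tainted] = 0;
    return 0;
}

/* Divide-by-0: reject a zero divisor and the INT_MIN / -1 overflow case. */
int safe_div(int y, int tainted, int *out) {
    assert(out != NULL);
    if (tainted == 0 || (y == INT_MIN && tainted == -1)) return -1;
    *out = y / tainted;
    return 0;
}

/* Buffer overflow: copy a tainted string only if it fits, NUL included. */
int copy_bounded(char *dest, size_t cap, const char *tainted) {
    assert(dest != NULL && tainted != NULL);
    size_t len = strlen(tainted);
    if (len >= cap) return -1;
    memcpy(dest, tainted, len + 1);
    return 0;
}
```

In every case the destination's limit, not the tainted value, decides whether the operation proceeds.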

Other Common Mistakes • Misuse of safe functions • Invalid STL iterator • Not handling error conditions • Resource leak • Race condition • Integer overflow • Using a signed integer where an unsigned one is expected • Information leak

Secure Coding Guidelines • Guidelines and static checking tools attempt to solve some of these problems • Cannot detect many types of problems due to a lack of information • For the problems they can detect, there are often too many false positives to be useful

How can we fix this? Option 1: Utilize source languages that provide more built-in safety • Requires learning a new language and rewriting code • Can be useful for new projects and a worthwhile effort in some cases • We have worked on multiple such languages in the past (Syrup, Cforall) • Many existing languages improve on safety over C, each with their own tradeoffs: • Go - https://golang.org/ • Swift - https://swift.org/ • Cforall - https://cforall.uwaterloo.ca/features • Cyclone - https://cyclone.thelanguage.org/ • Rust - https://www.rust-lang.org/ • SCC: The Safe C Compiler - http://pages.cs.wisc.edu/~austin/scc.html • CCured - https://web.eecs.umich.edu/~weimerw/p/p477-necula.pdf • MemSafe - https://dl.acm.org/citation.cfm?id=2422147 • Safe Arrays and Pointers for C - http://www.animats.com/papers/languages/safea • Checked C - https://www.microsoft.com/en-us/research/project/checked-c/ • Safe-C - http://www.safe-c.org/start-en.html

How can we fix this? Option 2: Develop the infrastructure to add information into existing programs • Involves adding annotations into existing C code to provide more information to the compiler and analysis tools • This is fully backwards compatible with old code and can be implemented incrementally • We have proposed the Secure-C extension to solve this problem

How can we fix this? Common Foundation: Maple / OpenArkCompiler and Runtime • Provides a flexible framework for managing new metadata and performing related analyses based on the new information from the source code • Enables development of performance optimizations, specifically targeting the new security features, to ensure that performance is not sacrificed for security • Provides crucial security features like automatic null-pointer checking, array-bounds checking, and automatic memory management

Secure-C • The Secure-C proposal uses standard attributes to pass in the additional information to the compiler and tools • Enhanced safety with guarantees • Minimize development overhead • Minimize runtime overhead

Nullability • Null-pointer dereferences are a common bug • Excessive checks are a source of runtime overhead • Developer always knows whether they want to accept NULL • Call-site has more information about the arguments


Nullability
    int foo(int * _Nonnull x) { return *x; }
• Checking is done at call site
• If nullability cannot be determined statically, compiler will require (or insert) a run-time check


Nullability
    #include <stddef.h>
    void foo(int * _Nonnull ptr);
    void bar() {
        int *x;
        ...
        foo(x);
    }
    call_error.c:12:7: error: implicit conversion from nullable pointer type 'int *' to non-nullable pointer type 'int * _Nonnull'
        foo(x);
            ^


Nullability
    #include <stddef.h>
    void foo(int * _Nonnull ptr);
    void bar() {
        int *x;
        ...
        SECURE_C_ASSERT(x != NULL);
        foo(x);
    }
• Assertion automatically inserted by tool


Nullability
    #include <stddef.h>
    void foo(int * _Nonnull ptr);
    void bar() {
        int *x;
        ...
        if (x != NULL) {
            foo(x);   // OK
        }
    }
• If developer includes a null-check, the compiler can statically determine safety
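The caller-side pattern can be tried with stock Clang, which already accepts the `_Nonnull` and `_Nullable` qualifiers; the Secure-C checking itself is not reproduced here. In this sketch the macros `SC_NONNULL` / `SC_NULLABLE` and the function names are hypothetical, and the macros expand to nothing on compilers without the qualifiers.

```c
#include <assert.h>
#include <stddef.h>

/* _Nonnull/_Nullable are Clang qualifiers; fall back to nothing elsewhere. */
#if defined(__clang__)
#define SC_NONNULL _Nonnull
#define SC_NULLABLE _Nullable
#else
#define SC_NONNULL
#define SC_NULLABLE
#endif

/* Callee: the contract says x is non-null, so no check is needed here. */
int deref(int *SC_NONNULL x) {
    return *x;
}

/* Caller: holds a possibly-null pointer and proves non-nullness with an
   explicit check before the call, so the check lives at the call site. */
int deref_or(int *SC_NULLABLE x, int fallback) {
    if (x != NULL) {
        return deref(x); /* safe: guarded by the null check above */
    }
    return fallback;
}
```

The design point is that the check moves to the place that has the information: the callee states its requirement once and every caller either proves it or tests for it.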

Value Range • Programmer often knows information about a variable that is not captured by its type • Enable prevention of overflow, guaranteed input checking, and additional optimization opportunities • Value-range annotated types can be used to enforce sanitization of tainted data

Value Range
    const char cipher[26] = {'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's',
                             't', 'u', 'v', 'w', 'x', 'y', 'z', 'a', 'b', 'c', 'd', 'e', 'f'};
    void print_letter(int digit)
        __attribute__((value_range(digit >= 0 && digit < 26)))
    {
        printf("%c", cipher[digit]);
    }
• Same idea as nullability check
• Compiler statically checks
• Requires (or inserts) run-time checks when necessary
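When the range cannot be proven statically, the required run-time check has roughly the following shape. This is a hand-written sketch in standard C: `in_range` and `encode_letter` are hypothetical names, and returning '?' for an out-of-range digit is a choice made only for this example.

```c
#include <assert.h>

static const char cipher[26] = {'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
                                'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
                                'y', 'z', 'a', 'b', 'c', 'd', 'e', 'f'};

/* Hypothetical stand-in for the check the tool would require or insert:
   true when lo <= v < hi_exclusive. */
static int in_range(int v, int lo, int hi_exclusive) {
    return v >= lo && v < hi_exclusive;
}

/* Returns the substituted letter, or '?' when digit is outside [0, 26). */
char encode_letter(int digit) {
    if (!in_range(digit, 0, 26)) return '?';
    return cipher[digit];
}
```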

Safe Arrays • Another source of errors is accessing outside the bounds of an array • The developer should always know the size of an array they are accessing • Need some way to give that information to the compiler so that it can perform analysis

Safe Arrays
    void copybyref(int *a, int *b __attribute__((safe_array(n))), size_t n)
        __attribute__((safe_array(a, n), safe_array(b, n)))
    {
        for (int i = 0; i < lengthof(a); i++) {
            a[i] = b[i];
        }
    }
• Compiler can detect and report some issues statically at compile time
• Compiler can require (or insert) runtime checks
• Disallow subscript operator on pointers not annotated as safe_array
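Without the annotation, the same size information has to travel as explicit parameters and be checked by hand. The sketch below shows that manual equivalent in standard C; `copy_checked` and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copies n ints from b to a. a_len and b_len are the capacities the caller
   vouches for; the copy is refused if n exceeds either one.
   Returns 0 on success, -1 otherwise. */
int copy_checked(int *a, size_t a_len, const int *b, size_t b_len, size_t n) {
    if (a == NULL || b == NULL || n > a_len || n > b_len) return -1;
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i];
    }
    return 0;
}
```

An annotation such as safe_array lets the compiler derive and place this comparison itself, or prove it unnecessary.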

Taint
• Reading data from external sources and using it without proper checking may lead to problems
• Information leak, buffer over/under-flow, null dereference, out-of-bounds array access, integer overflow, infinite/long loops, invalid control flow, …
    void test() {
        char x = getchar();
        system(&x);          // warn: untrusted data is passed to a system call
    }
    void test() {
        char s[10], buf[10];
        fscanf(stdin, "%s", s);
        sprintf(buf, s);     // warn: untrusted data as a format string
    }
    void test() {
        size_t ts;
        scanf("%zu", &ts);
        int *p = (int *)malloc(ts * sizeof(int));   // warn: untrusted data as buffer size
    }
    void test() {
        char s[] = "abc";
        int x = getchar();
        char c = s[x];       // warn: index is untrusted
    }
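Two of these warnings have well-known manual fixes in standard C, sketched below. The function names are hypothetical and the caller-supplied `limit` is an assumption of this example, not something the slides specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format string: untrusted text is passed as data for "%s", never as the
   format itself. Returns 0 on success, -1 on error or truncation. */
int format_untrusted(char *buf, size_t cap, const char *untrusted) {
    int n = snprintf(buf, cap, "%s", untrusted);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Buffer size: compute count * elem_size only if it neither overflows nor
   exceeds a caller-chosen limit. Returns 0 and stores the size on success. */
int checked_size(size_t count, size_t elem_size, size_t limit, size_t *out) {
    assert(out != NULL);
    if (count == 0 || elem_size == 0) return -1;
    if (count > SIZE_MAX / elem_size) return -1; /* multiplication would wrap */
    size_t total = count * elem_size;
    if (total > limit) return -1;
    *out = total;
    return 0;
}
```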

Annotations • These annotations can be added into existing C code incrementally • No need to make changes everywhere before it works • Compile unmodified code together with annotated code • We have developed a prototype tool to automatically insert annotations to assist developers

Audited Regions • Users can annotate code with regions that should be checked and regions that should be unchecked • Some code has to do inherently unsafe things (e.g., an OS) • Makes migrating legacy code easier / possible

Libraries • Many common data structures and algorithms are implemented over and over again in C • Wasteful re-implementation • Exposes more opportunities for errors • Secure-C will offer a suite of standard libraries • These libraries will implement all Secure-C features • Tested and optimized for our use cases

Libraries: Containers • Kinds: • Maps/Dictionaries • Lists • Sets • Heaps • Trees • Strings • Etc. • Challenges • Securely supporting polymorphism (void *s or macro functions both introduce more buggy code)

Libraries: Memory Management • Wrappers for malloc/free • Ensure proper typing and size (integrate with safe arrays) • Prevent double free (set pointer to NULL after freeing) • Memory pool library • Reference counting memory manager • Library + compiler support can provide automatic reference counting • Garbage collecting memory manager
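The first bullet group (typed allocation, pointer reset after free) can be sketched with two small macros. `SC_ALLOC` and `SC_FREE` are hypothetical names invented for this example; they are not the planned Secure-C library interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Typed allocation: the element type fixes both the size and the result
   type. calloc zeroes the memory and is handed count and size separately,
   so the multiplication is left to the library. */
#define SC_ALLOC(T, count) ((T *)calloc((count), sizeof(T)))

/* Free and reset: after this, the pointer is NULL, so a repeated SC_FREE
   becomes free(NULL), which is a no-op. */
#define SC_FREE(p)  \
    do {            \
        free(p);    \
        (p) = NULL; \
    } while (0)
```

A macro is used for `SC_FREE` because a function would receive only a copy of the pointer and could not reset the caller's variable without a type-unsafe `void **` parameter.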

Libraries: Concurrency • Just libraries: • Coroutine/channels (like Go) • Task/dispatcher (like GCD) • Library + compiler support • Monitors (structures with implicit lock/unlock) • Actors (maximum safety guarantees) • Do we want to be opinionated and force one concurrency model or provide support for optimized and secured building blocks?

Basic Safety Checks • C allows many operations that are unsafe • Tools should detect these unsafe operations and report errors • Examples: • Pointer casting (cast from one pointer type to another) • Pointer arithmetic • Various statically checkable rules/recommendations from the Secure Coding Guidelines

Secure-C Annotation Details

Type Annotations
    int __attribute__((value_range(0, 9))) idx = 0;   // the attribute is the Secure-C annotation
• Constraint must hold true throughout the lifetime of the object
• Enforced via the type system

Type Annotations
• Checked at each assignment to an object with a type annotation
• Expressions must be constant
• Can be used in typedefs
    typedef int * __attribute__((non_null)) NonNullIntPtr;

Temporal Annotation
    int get_val(S s)
        __attribute__((
            secure_c_in(s.idx, value_range(0, 15)),
            secure_c_out(get_val, value_range(0, 99))
        ))
    { ... }
• Each entry names a program point (in / out), a target expression, and an annotation
• A property that is true of an expression at a specific program point
• Function entry: secure_c_in
• Function exit: secure_c_out

Temporal Annotation
• Target expression can refer to:
• Parameters of the function (and data accessed through those parameters)
• Global variables
• The function name itself (indicates return value)
• Keyword _all_ can be used to refer to all elements of an array
    secure_c_in(arr[_all_], value_range(0, 15))

Temporal Annotation
• Serves two purposes:
• Provides information to help with analysis: safely assume s->p is non-null inside the function
• Defines a requirement to be checked: assert (verify) s->p is non-null at call-sites
    int increment(struct foo *s)
        __attribute__((secure_c_in(s->p, non_null)))
    {
        return s->p->c++;
    }
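One way to read secure_c_in and secure_c_out is as a precondition and a postcondition. The sketch below emulates that reading with standard `assert` calls; it is not how Secure-C implements the annotations, and the struct layout and the body of `get_val` are invented for illustration.

```c
#include <assert.h>

struct S {
    int idx;
};

/* Emulates:
     secure_c_in(s.idx, value_range(0, 15))
     secure_c_out(get_val, value_range(0, 99))
   with one assertion at function entry and one at function exit. */
int get_val(struct S s) {
    assert(s.idx >= 0 && s.idx <= 15);   /* entry: property of s.idx */
    int result = s.idx * 6;              /* hypothetical body: 0..90 */
    assert(result >= 0 && result <= 99); /* exit: property of the return value */
    return result;
}
```

The entry assertion is what a caller must establish; the exit assertion is what a caller may rely on afterwards.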

Transient Annotation
    List *l __attribute__((transient));
• Allows a structure to temporarily violate its requirements
• Useful during initialization


Transient Annotation
    struct foo {
        int * __attribute__((non_null)) buf;
        unsigned length;
    };
    struct foo *createFoo(unsigned length) {
        struct foo *f __attribute__((transient));
        f = (struct foo *)malloc(sizeof(struct foo));
        if (f == NULL) {
            return NULL;
        }
        f->buf = (int *)malloc(sizeof(int) * length);
        if (f->buf == NULL) {
            free(f);   // avoid leaking the partially built object
            return NULL;
        }
        return f;
    }
• With the transient attribute on f: OK!
• Without it (declaring f as plain struct foo *f;), the same code is rejected. Error: Structure violates non-null attribute!

Transient Annotation • Requirements must be satisfied when the object “escapes” • “Escaping” means: • Assignment to another variable • Passing the object to a function call • Returning the object from the function
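The escape rule corresponds to a discipline that can already be followed in plain C: keep a half-built object local, and only return it once its invariant holds. A minimal sketch, with `create_foo` and `destroy_foo` as hypothetical names and no Secure-C attributes involved:

```c
#include <assert.h>
#include <stdlib.h>

struct foo {
    int *buf;        /* invariant once the object escapes: never NULL */
    unsigned length;
};

struct foo *create_foo(unsigned length) {
    /* Between this malloc and the assignment to f->buf, the object
       violates its invariant; this is the "transient" window. */
    struct foo *f = malloc(sizeof *f);
    if (f == NULL) return NULL;

    /* Allocate at least one element so a NULL result always means failure. */
    f->buf = calloc(length ? length : 1, sizeof *f->buf);
    if (f->buf == NULL) {
        free(f);     /* the half-built object never escapes */
        return NULL;
    }
    f->length = length;
    return f;        /* escape point: the invariant holds here */
}

void destroy_foo(struct foo *f) {
    if (f != NULL) {
        free(f->buf);
        free(f);
    }
}
```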

Nullability
• Indicate whether a pointer is allowed to be null
    int foo(int * __attribute__((non_null)) a,    // a is known to be non-null
            int * __attribute__((nullable)) b)    // b may be null
    { ... }


Nullability
    int increment(struct foo * __attribute__((non_null)) s)
        __attribute__((secure_c_in(s->p, non_null)))
    {
        return s->p->c++;   // s->p is known to be non-null on function entry
    }
    void foo(struct X *a) { // a may be null on function entry
        if (a != NULL) {
            a->x = 1;       // implicit promotion: a is known to be non-null in this scope
        }
    }

Value Range
• Indicate a range of valid values for an integer
    // 0 <= month <= 11
    const char *printMonth(int __attribute__((value_range(0, 11))) month) {
        const char *months[12] = {"January", ..., "December"};
        printf("%s", months[month]);
        return months[month];
    }


Value Range
    // May use constant fields from the structure
    struct Number {
        const int min;
        const int max;
        int *__attribute__((value_range(min, max))) value;
    };
    // May use other parameters
    int get(int *arr, unsigned int length, unsigned int idx)
        __attribute__((secure_c_in(idx, value_range(0, length))))
    {
        return arr[idx];
    }

Secure Buffer
• Indicate the size of a buffer passed into a function
    // buf points to a buffer of size 10
    int getElement(int * __attribute__((secure_buffer(10))) buf, int idx) {
        if (idx >= 0 && idx < 10) {
            return buf[idx];
        }
        return 0;
    }


Secure Buffer
    // May use constant fields from the structure
    struct S {
        const int length;
        int * __attribute__((secure_buffer(length))) buf;
    };
    // May use other parameters
    int sum(foo *f, int len)
        __attribute__((secure_c_in(f->buf, secure_buffer(len))))
    {
        int i, sum = 0;
        for (i = 0; i < len; i++) {
            sum += f->buf[i];
        }
        return sum;
    }

Taint
    // buf is untrusted at function entry
    void foo(char *buf)
        __attribute__(( secure_c_in(buf, untrusted) ));
    // str is untrusted at function exit
    char *fgets(char *str, int count, FILE *stream)
        __attribute__(( secure_c_out(str, untrusted) ));
    // return value of read_int function is untrusted
    int read_int()
        __attribute__(( secure_c_out(read_int, untrusted) ));

Taint
• (un)trusted for pointers vs pointer targets
    // The pointer, buf, is untrusted
    void foo(char *buf, ...)
        __attribute__(( secure_c_out(buf, untrusted) ));
    // The value pointed to by buf is untrusted
    void foo(char *buf, ...)
        __attribute__(( secure_c_out(*buf, untrusted) ));

Taint
    int puts(const char *str)
        __attribute__(( secure_c_in(str, trusted) ));
• Requires that only trusted values may be passed to str
    // requires parameter 'n' trusted on function entry, and ensures parameter 'dst' trusted on function exit
    void *memcpy(void *dst, void *src, size_t n)
        __attribute__(( secure_c_in(n, trusted), secure_c_out(dst, trusted) ));
• Requires that only trusted values may be passed to n and ensures that dst is trusted upon exit

Sanitization
• May happen in two ways
• Validation: check and return proper status code
• Transformation: modify parameter in-place (passed by reference, not const) or return modified value
    // Sanitization function returning status code
    int mySanitizeFunc(char *buf, int len)
        __attribute__((
            secure_c_in(buf, untrusted),
            secure_c_in(len, untrusted),
            secure_c_out(mySanitizeFunc, trusted)
        ))
    {
        if (len > 0 && len < BUFFER_SIZE) {
            return STATUS_OK;
        }
        return STATUS_NOT_OK;
    }

Sanitization
    // Sanitization function returning checked len
    int mySanitizeFunc(char *buf, int len)
        __attribute__(( secure_c_in(len, untrusted), secure_c_out(mySanitizeFunc, trusted) ))
    {
        if (len > 0 && len < BUFFER_SIZE) {
            return len;
        }
        return INVALID_LENGTH;
    }
    // Sanitization function validating buffer
    void mySanitizeFunc(char *buf, int len)
        __attribute__(( secure_c_in(buf, untrusted), secure_c_out(buf, trusted) ))
    {
        if (len > 0 && len < BUFFER_SIZE) {
            return;
        }
        assert(0); // invalid buffer
    }
    // Sanitization function returning valid buf
    char *mySanitizeFunc(const char *buf, int len)
        __attribute__(( secure_c_in(buf, untrusted), secure_c_out(mySanitizeFunc, trusted) ))
    {
        char *lbuf;
        if (len > 0 && len < BUFFER_SIZE) {
            // copy valid characters from buf to lbuf
            return lbuf;
        }
        return NULL;
    }
• Requirement: sanitization is performed by user-defined functions that either mark untrusted parameters (pointer type) as trusted or return trusted values
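The two sanitization styles can be written out in standard C without the attributes. In this sketch `validate_len` and `sanitize_copy` are hypothetical names, `BUFFER_SIZE` is given an arbitrary value, and "valid characters" is taken to mean alphanumeric purely for the sake of the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define BUFFER_SIZE 64        /* arbitrary value for this sketch */
#define STATUS_OK 0
#define STATUS_NOT_OK (-1)

/* Validation: inspects the untrusted length and reports a status code. */
int validate_len(int len) {
    return (len > 0 && len < BUFFER_SIZE) ? STATUS_OK : STATUS_NOT_OK;
}

/* Transformation: copies only alphanumeric characters from the untrusted
   src into dst (capacity cap, always NUL-terminated on success).
   Returns the number of characters written, or -1 on bad arguments. */
int sanitize_copy(char *dst, size_t cap, const char *src) {
    if (dst == NULL || src == NULL || cap == 0) return -1;
    size_t w = 0;
    for (size_t i = 0; src[i] != '\0' && w + 1 < cap; i++) {
        if (isalnum((unsigned char)src[i])) {
            dst[w++] = src[i];
        }
    }
    dst[w] = '\0';
    return (int)w;
}
```

For instance, passing the earlier "rm -rf /*" string through `sanitize_copy` leaves only its letters, so nothing shell-significant reaches the output buffer.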

Case Study

Outline • Statistics • Case study • User experience • Current and future work • Future demos / case studies

Definitions: Pointer Uses • Subscript: p[3] • Dereference: *p • Function call: p(x) • Member access: p->a • Cast: • Explicit cast: (T * _Nonnull)p • Implicit (assignment): T * _Nonnull q = p • Implicit (parameter): int foo(T * _Nonnull); foo(p);

Definitions: Analysis • Errors reported: potentially unsafe pointer uses as determined by secure-c (default mode) • Checks inserted: checks automatically inserted by secure-c to resolve errors (debug mode) • Safe by analysis: a pointer use is determined to be safe by secure-c’s analysis • Safe by annotation: a pointer use is determined to be safe because it is annotated as non-null • Redundant checks: a pointer that is known to be non-null is checked for null

Collecting Statistics -dump-stats flag

Nullability Case Study • bzip2 • Real code (file compression) • ~8,000 lines of code • Implemented as a library (bzlib) and a command line tool (bzip2)

Unmodified Code • Pointer uses (chart; stats collected after macro expansion)


Unmodified Code • Nullability Analysis • These 2,781 errors would need to be handled manually (either proven to be false positives or fixed by inserting a null check) in order to achieve 100% recall.


Securified Code • Use securify to insert annotations

Securified Code • Nullability Analysis • Automatic Annotations (chart)


Explanation • Securify inserts annotations by inferring the assumptions that were already made by the developer • A single annotation can prevent multiple errors when the pointer is used multiple times • Moves the checking from the callee to the caller, where safety can often be proven by analysis • Without the annotation: many errors, since ‘s’ is not known to be non-null • With the annotation: ‘s’ is known to be non-null at the call site

Inserting Checks
    secure-c -i -mode=debug -dump-stats -default-nullable -secure-buffer=false bzlib.c -- \
        -Wno-nullability-completeness -ferror-limit=0 \
        -include $PROJECT/clang/tools/securify/known_symbols.h \
        -I $PROJECT/clang/tools/secure-c
• 234 checks inserted by tool

Inserting Checks
    static void *_Nonnull _CheckNonNull(
        const char *_Nonnull file, int line,
        const char *_Nonnull func, void *_Nullable p)
    {
        if (__builtin_expect(p == ((void *)0), 1)) {
            fprintf(stderr, "%s: %d: %s: illegal use of NULL pointer\n", file, line, func);
            abort();
        }
        return p;
    }


Inserting Checks
• Nullability Error:
    int get_count(foo * _Nullable f) {
        return f->count;
    }
• Fixed with automatic check:
    int get_count(foo * _Nullable f) {
        return ((foo * _Nonnull)(_CheckNonNull(
            __FILE__, __LINE__, __extension__ __PRETTY_FUNCTION__, f)))->count;
    }

Manual Analysis • We expect that a developer doing manual analysis can still do better than the tools • We can use information from manual analysis to further improve the tools

Manual Analysis • Common pattern seen (code example shown on the slide) • This is essentially common-subexpression elimination and could be automated.

Manual Analysis
• Conservative handling of struct fields
• Will be improved in future iterations
    if (strm->bzalloc == NULL) strm->bzalloc = default_bzalloc;
    void * _Nullable (* _Nonnull bzalloc)(void * _Nullable, int) = strm->bzalloc;   // tool inserts redundant check here

Manual Analysis • Inter-procedural analysis • Neither securify nor secure-c performs inter-procedural analysis • If all uses of a function can be analyzed, a nullable parameter could be changed to non-null and checks removed • This could be automated and included in securify

Comparison • Existing checks + inserted checks (chart)

Effect on Performance • Compressed a tarball of Clang (2.9 GB unzipped, 430 MB zipped)

User Experience • On old code • securify + secure-c (debug mode) does a pretty good job • securify-cc is a drop-in replacement for the compiler to do this process in one step: CC=securify-cc (may require additional Makefile changes) • secure-cc is a drop-in replacement for the compiler to just do the checking • Performance effect of automated modifications is negligible • May need to manually replace runtime checks with if/else statements for production

User Experience • On new code • Annotations are simple and already in the developer’s mind • secure-c checking can easily be integrated into IDEs

User Experience Overview • Tools: securify, secure-c • Secure-C code: manual annotation, manual fixes • IDE integration: integrate tools into IDE for simple use and interactivity • Web playground: try out tools/language without installing • Drop-in compiler replacements (securify-cc, secure-cc): simple integration with existing build systems

Thank you! Questions?