
Software Security
- many vulnerabilities result from poor programming practices
  - cf. Open Web Application Security Project (OWASP) Top Ten, which includes 5 software-related flaws
- often from insufficient checking / validation of program input
- awareness of issues is critical

Software Quality vs Security
- software quality and reliability
  - accidental failure of program
  - from theoretically random unanticipated input
  - improve using structured design and testing
  - not how many bugs, but how often triggered
- software security is related
  - but attacker chooses input distribution, specifically targeting buggy code to exploit
  - triggered by often very unlikely inputs
  - which common tests don’t identify

Defensive Programming
- a form of defensive design to ensure continued function of software despite unforeseen usage
- requires attention to all aspects of program execution, environment, data processed
- also called secure programming
- assume nothing, check all potential errors
- rather than just focusing on solving task
- must validate all assumptions

Abstract Program Model

Security by Design
- security and reliability common design goals in most engineering disciplines
  - society not tolerant of bridge/plane etc. failures
- software development not as mature
  - much higher failure levels tolerated
- despite having a number of software development and quality standards
  - main focus is general development lifecycle
  - increasingly identify security as a key goal

Handling Program Input
- incorrect handling a very common failing
- input is any source of data from outside
  - data read from keyboard, file, network
  - also execution environment, config data
- must identify all data sources
- and explicitly validate assumptions on size and type of values before use

Input Size & Buffer Overflow
- often have assumptions about buffer size
  - e.g. that user input is only a line of text
  - size buffer accordingly but fail to verify size
  - resulting in buffer overflow
- testing may not identify vulnerability
  - since focus on “normal, expected” inputs
- safe coding treats all input as dangerous
  - hence must process so as to protect program
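
The "verify size" step can be sketched as follows. This is a minimal illustration, not a definitive implementation: `read_line_bounded` and the limit `MAX_LINE` are hypothetical names and values chosen for this example, and Python itself is memory-safe, so the sketch only shows the explicit length check that a C program would need before copying into a fixed buffer.

```python
import io

MAX_LINE = 80  # assumed limit for this sketch; choose one that suits the application


def read_line_bounded(stream, max_len=MAX_LINE):
    """Read one line, refusing input longer than max_len characters."""
    # Ask for one character more than allowed so an over-long line is detectable.
    line = stream.readline(max_len + 1)
    if line.endswith("\n"):
        line = line[:-1]
    if len(line) > max_len:
        raise ValueError("input line too long")
    return line


print(read_line_bounded(io.StringIO("hello\n")))  # prints "hello"
```

The function rejects over-long input rather than silently truncating it, so the caller always knows whether its size assumption held.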

Interpretation of Input
- program input may be binary or text
  - binary interpretation depends on encoding and is usually application specific
  - text encoded in a character set, e.g. ASCII
  - internationalization has increased variety
  - also need to validate interpretation before use
    - e.g. filename, URL, email address, identifier
- failure to validate may result in an exploitable vulnerability

Injection Attacks
- flaws relating to invalid input handling which then influences program execution
  - often when passed as a parameter to a helper program or other utility or subsystem
- most often occurs in scripting languages
  - encourage reuse of other programs / modules
  - often seen in web CGI scripts

Unsafe Perl Script

Safer Script
- counter attack by validating input
  - compare to a pattern that accepts only valid input and reject anything else
  - the example additions to the script are not reproduced in this transcript
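
Since the original script is not reproduced here, the following is a hedged sketch of the same idea in Python rather than the slide's own code. The helper program `finger`, the function names, and the allowed character set are all assumptions made for illustration. It combines two defenses: validating the input against a pattern, and passing it as a separate argument so no shell ever interprets it.

```python
import re
import subprocess

# Allowlist: ASCII letters, digits and underscore only (an assumption for this sketch).
USER_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def build_command(user):
    """Return an argument list for the helper, or raise ValueError on bad input."""
    if not USER_PATTERN.fullmatch(user):
        raise ValueError("invalid user name")
    # "finger" is a hypothetical helper program used only for illustration.
    return ["finger", user]


def lookup_user(user):
    """Run the helper without a shell, so metacharacters are never interpreted."""
    result = subprocess.run(
        build_command(user), capture_output=True, text=True, check=False
    )
    return result.stdout
```

An input such as `xxx; echo attack` fails the pattern check and never reaches the helper program.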

SQL Injection
- another widely exploited injection attack
- when input used in SQL query to database
  - similar to command injection
  - SQL meta-characters are the concern
  - must check and validate input for these
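
Besides validating input, a common complementary defense is to keep input out of the SQL text altogether by using parameterized queries. The sketch below uses Python's standard `sqlite3` module; the `users` table and the `find_user` function are hypothetical, created only for this example.

```python
import sqlite3


def find_user(conn, name):
    """Look up a user with a parameterized query; name is passed as data, not SQL."""
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Hypothetical in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))        # prints [(1, 'alice')]
print(find_user(conn, "' OR '1'='1"))  # prints []
```

Because the driver binds the value separately from the statement, the quote characters in the second call are compared as literal text and match no row.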

Code Injection
- further variant
- input includes code that is then executed
  - this type of attack is widely exploited

Cross Site Scripting Attacks
- attacks where input from one user is later output to another user
- XSS commonly seen in scripted web apps
  - with script code included in output to browser
  - any supported script, e.g. JavaScript, ActiveX
  - assumed to come from application on site
- XSS reflection
  - malicious code supplied to site
  - subsequently displayed to other users

XSS Example
- cf. guestbooks, wikis, blogs etc.
- where comment includes script code
  - e.g. to collect cookie details of viewing users
- need to validate data supplied
  - including handling various possible encodings
- attack exploits both input and output handling
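
On the output side, one widely used measure is to escape user-supplied text before it is placed into a page. A minimal sketch with Python's standard `html.escape` follows; `render_comment` and the markup it produces are hypothetical, and real applications usually rely on a templating system that escapes by default.

```python
import html


def render_comment(author, text):
    """Build an HTML fragment, escaping user-supplied values before output."""
    return "<p><b>{}</b>: {}</p>".format(html.escape(author), html.escape(text))


print(render_comment("eve", "<script>alert(1)</script>"))
# prints <p><b>eve</b>: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The browser then displays the comment as text instead of running it as script.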

Validating Input Syntax
- to ensure input data meets assumptions
  - e.g. is printable, HTML, email, userid etc.
- compare to what is known acceptable
- not to known dangerous
  - as can miss new problems, bypass methods
- commonly use regular expressions
  - pattern of characters describe allowable input
  - details vary between languages
- bad input either rejected or altered
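
Comparing against "known acceptable" can be sketched as a table of allowlist patterns. Both patterns below are simplified assumptions for illustration (the email pattern in particular is far from a complete grammar), and `is_valid` is a hypothetical helper name.

```python
import re

# Allowlist patterns; simplified assumptions, not complete grammars.
PATTERNS = {
    "userid": re.compile(r"[a-z][a-z0-9_]{0,31}"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+"),
}


def is_valid(kind, value):
    """Accept value only if the whole string matches the pattern for kind."""
    return PATTERNS[kind].fullmatch(value) is not None
```

Using `fullmatch` matters: a pattern that only has to match somewhere in the input would accept a valid prefix followed by attacker-chosen text.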

Alternate Encodings
- may have multiple means of encoding text
  - due to structured form of data, e.g. HTML
  - or via use of some large character sets
- Unicode used for internationalization
  - originally used 16-bit values for characters
  - UTF-8 encodes as 1-4 byte sequences
  - have redundant variants
    - e.g. / is 2F, C0 AF, E0 80 AF (the longer forms are overlong encodings that a strict UTF-8 decoder rejects)
    - hence if blocking absolute filenames check all!
- must canonicalize input before checking
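
A hedged sketch of "canonicalize, then check" for a filename is shown below. It assumes POSIX-style paths and a policy of accepting only relative paths that stay inside a base directory; `is_safe_relative_path` is a hypothetical name. Python's strict UTF-8 decoder rejects overlong forms such as C0 AF, so the decode step doubles as the redundant-encoding check.

```python
import posixpath


def is_safe_relative_path(raw):
    """Return True only for byte strings that canonicalize to a path inside the base."""
    try:
        # Strict decoding rejects invalid and overlong sequences such as C0 AF.
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    if "\x00" in text:
        return False
    # Collapse "." and ".." components before applying any checks.
    normalized = posixpath.normpath(text)
    if posixpath.isabs(normalized):
        return False
    return normalized.split("/")[0] != ".."
```

The checks run on the normalized form, not on the raw bytes, which is the point of canonicalizing first.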

Validating Numeric Input
- may have data representing numeric values
- internally stored in fixed sized value
  - e.g. 8, 16, 32, 64-bit integers or 32, 64, 96-bit floats
  - signed or unsigned
- must correctly interpret text form
- and then process consistently
  - have issues comparing signed to unsigned
  - e.g. large positive unsigned is negative signed
  - could be used to bypass a buffer overflow check
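
The signed/unsigned pitfall can be reproduced with Python's `struct` module, which packs values into fixed-size fields. The sketch assumes a 32-bit length field and a 1024-byte buffer; `as_signed32` and `parse_length` are hypothetical helper names.

```python
import struct


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def parse_length(text, max_len):
    """Parse a decimal length field and check it against an explicit range."""
    if not text.isascii() or not text.isdigit():
        raise ValueError("not an unsigned decimal number")
    value = int(text)
    if value > max_len:
        raise ValueError("length out of range")
    return value


# A huge unsigned length looks negative when treated as signed,
# so a naive "length < buffer size" test passes.
print(as_signed32(0xFFFFFFFF))         # prints -1
print(as_signed32(0xFFFFFFFF) < 1024)  # prints True
```

`parse_length` avoids the problem by checking the range on the parsed value before it is ever stored in a fixed-size field.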

Input Fuzzing
- powerful testing method using a large range of randomly generated inputs
  - to test whether program/function correctly handles abnormal inputs
  - simple, free of assumptions, cheap
  - assists with reliability as well as security
- can also use templates to generate classes of known problem inputs
  - but template assumptions mean other bugs could then be missed
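
A very small random fuzzer can be sketched in a few lines. Everything here is illustrative: `parse_record` is a deliberately buggy hypothetical target, and `fuzz` treats `ValueError` as the target's documented way of rejecting bad input, so only other exceptions count as failures.

```python
import random


def parse_record(data):
    """Deliberately buggy target: the first byte is a length, the rest is payload."""
    length = data[0]  # bug: raises IndexError on empty input
    if len(data) - 1 < length:
        raise ValueError("truncated record")
    return data[1:1 + length]


def fuzz(target, runs=1000, seed=0):
    """Feed random byte strings to target; collect inputs causing unexpected errors."""
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    failures = []
    for _ in range(runs):
        data = bytes(rng.randrange(256) for _ in range(rng.randint(0, 8)))
        try:
            target(data)
        except ValueError:
            pass  # documented rejection of bad input
        except Exception:
            failures.append(data)
    return failures
```

Calling `fuzz(parse_record)` reports the empty input, a case that tests written around "normal, expected" records tend to omit.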

Writing Safe Program Code
- next concern is processing of data by some algorithm to solve required problem
- compiled to machine code or interpreted
  - have execution of machine instructions
  - manipulate data in memory and registers
- security issues:
  - correct algorithm implementation
  - correct machine instructions for algorithm
  - valid manipulation of data

Correct Algorithm Implementation
- issue of good program development
- to correctly handle all problem variants
  - cf. Netscape random number bug
  - supposed to be unpredictable, but wasn’t
- when debug/test code left in production
  - used to access data or bypass checks
  - cf. Morris Worm exploit of sendmail
- interpreter incorrectly handles semantics
- hence care needed in design/implementation

Correct Machine Language
- ensure machine instructions correctly implement high-level language code
  - often ignored by programmers
  - assume compiler/interpreter is correct
  - cf. Ken Thompson’s paper ("Reflections on Trusting Trust")
- requires comparing machine code with original source
  - slow and difficult
  - is required for higher Common Criteria EALs

Correct Data Interpretation
- data stored as bits/bytes in computer
  - grouped as words, longwords etc.
  - interpretation depends on machine instruction
- languages provide different capabilities for restricting/validating data use
  - strongly typed languages more limited, safer
  - others more liberal, flexible, less safe, e.g. C
- strongly typed languages are safer

Correct Use of Memory
- issue of dynamic memory allocation
  - used to manipulate unknown amounts of data
  - allocated when needed, released when done
- memory leak occurs if incorrectly released
- many older languages have no explicit support for dynamic memory allocation
  - rather use standard library functions
  - programmer ensures correct allocation/release
- modern languages handle automatically

Race Conditions in Shared Memory
- when multiple threads/processes access shared data / memory
- unless access synchronized can get corruption or loss of changes due to overlapping accesses
- so use suitable synchronization primitives
  - correct choice & sequence may not be obvious
- have issue of access deadlock
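
The use of a synchronization primitive can be sketched with a shared counter protected by a mutex. `Counter` and `run_workers` are hypothetical names for this example; the read-modify-write in `increment` is the overlapping access that the lock serializes.

```python
import threading


class Counter:
    """A shared counter whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value
        # and one of the two updates would be lost.
        with self._lock:
            self.value += 1


def run_workers(counter, workers=4, per_worker=10000):
    """Increment the counter from several threads and return the final value."""
    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run_workers(Counter()))  # prints 40000
```

With a single lock there is no deadlock risk; that issue arises once code holds more than one lock and different threads acquire them in different orders.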

Interacting with O/S
- programs execute on systems under O/S
  - mediates and shares access to resources
  - constructs execution environment
  - with environment variables and arguments
- systems have multiple users
  - with access permissions on resources / data
- programs may access shared resources
  - e.g. files

Environment Variables
- set of string values inherited from parent
  - can affect process behavior
  - e.g. PATH, LD_LIBRARY_PATH
- process can alter for its children
- another source of untrusted program input
- attackers use to try to escalate privileges
- privileged shell scripts targeted
  - very difficult to write safely and correctly

Example Vulnerable Scripts
- using PATH environment variables
- cause script to execute attacker’s program
- with privileges granted to script
- almost impossible to prevent in some form
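
One common mitigation is to stop inheriting the caller's environment and to build a minimal one for child processes. The sketch below is an illustration under assumptions: the directories in `SAFE_PATH` are assumed to be trusted on the target system, and `safe_env` and `run_helper` are hypothetical names.

```python
import subprocess

SAFE_PATH = "/usr/bin:/bin"  # assumed trusted directories; adjust per system


def safe_env(extra=None):
    """Build a minimal environment rather than inheriting the caller's."""
    env = {"PATH": SAFE_PATH, "LANG": "C"}
    if extra:
        env.update(extra)
    return env


def run_helper(argv):
    """Run a helper with the minimal environment; argv[0] should be an absolute path."""
    return subprocess.run(
        argv, env=safe_env(), capture_output=True, text=True, check=False
    )
```

Variables such as `LD_LIBRARY_PATH` are simply never passed on, so the child process cannot be influenced through them.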

Use of Least Privilege
- exploit of flaws may give attacker greater privileges (privilege escalation)
- hence run programs with least privilege needed to complete their function
  - determine suitable user and group to use
  - whether grant extra user or group privileges
    - latter preferred and safer, may not be sufficient
  - ensure can only modify files/dirs needed
    - otherwise compromise results in greater damage
    - recheck these when moved or upgraded

Root/Admin Programs
- programs with root / administrator privileges a major target of attackers
  - since provide highest levels of system access
  - are needed to manage access to protected system resources, e.g. network server ports
- often privilege only needed at start
  - can then run as normal user
- good design partitions complex programs in smaller modules with needed privileges

System Calls and Standard Library Functions
- programs use system calls and standard library functions for common operations
  - and make assumptions about their operation
  - if incorrect, behavior is not what is expected
  - may be a result of system optimizing access to shared resources
    - by buffering, re-sequencing, modifying requests
  - can conflict with program goals

Secure File Shredder
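
The slide's own example is not reproduced in this transcript. As a hedged stand-in, the sketch below shows how buffering by the library and the O/S can defeat a naive overwrite, and how explicit `flush` and `fsync` calls address that. `overwrite_file` is a hypothetical name, and the sketch makes no claim about journaling file systems, snapshots, or flash storage, where overwritten data may survive elsewhere.

```python
import os


def overwrite_file(path, passes=3):
    """Overwrite a file's contents in place, forcing each pass out to storage."""
    size = os.path.getsize(path)
    # Open for update ("r+b") so the existing blocks are rewritten, not truncated.
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()             # push the library buffer to the O/S
            os.fsync(f.fileno())  # ask the O/S to write through to the device
```

Without the `flush` and `fsync` calls, successive passes may be merged in memory so that only the last one ever reaches the disk.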

Race Conditions
- programs may access shared resources
  - e.g. mailbox file, CGI data file
- need suitable synchronization mechanisms
  - e.g. lock on shared file
- alternatives
  - lockfile: create/check, advisory, must be atomic
  - advisory file lock, e.g. flock
  - mandatory file lock, e.g. via fcntl on systems that support it; lock must be released
    - latter mechanisms vary between O/S
    - have subtle complexities in use
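
The lockfile alternative depends on creating the file atomically, so that checking and creating are a single step. A minimal sketch using the `O_CREAT | O_EXCL` flags follows; `acquire_lockfile` and `release_lockfile` are hypothetical names, and the lock is advisory: it only works if every cooperating program uses the same convention.

```python
import os


def acquire_lockfile(path):
    """Try to create the lockfile atomically; return True if we now hold the lock."""
    try:
        # O_EXCL makes the call fail if the file already exists, so the
        # existence check and the creation cannot be separated by a rival.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner to help diagnose stale locks
    return True


def release_lockfile(path):
    """Release the lock by removing the lockfile."""
    os.remove(path)
```

A program that crashes while holding the lock leaves a stale lockfile behind, which is one of the subtle complexities of this approach.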

Safe Temporary Files
- many programs use temporary files
- often in common, shared system area
- must be unique, not accessed by others
- commonly create name using process ID
  - unique, but predictable
  - attacker might guess and attempt to create own between program checking and creating
- secure temp files need random names
  - some older functions unsafe
  - also need correct permissions on file/dir
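
In Python, the standard `tempfile.mkstemp` function creates and opens the file in one step with an unpredictable name, which closes the gap between checking and creating. The wrapper `write_temp` and the `app-` prefix below are hypothetical, chosen for this sketch.

```python
import os
import tempfile


def write_temp(data):
    """Create a uniquely named temporary file, write data to it, return its path."""
    # mkstemp returns an already-open descriptor, so no other process can
    # slip in between choosing the name and creating the file.
    fd, path = tempfile.mkstemp(prefix="app-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

The caller is responsible for removing the file when it is no longer needed.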

Other Program Interaction
- may use services of other programs
- must identify/verify assumptions on data
- esp. older user programs
  - now used within web interfaces
  - must ensure safe usage of these programs
- issue of data confidentiality / integrity
  - within same system use pipe / temp file
  - across net use IPSec, TLS/SSL, SSH etc.
- also detect / handle exceptions / errors

Handling Program Output
- final concern is program output
  - stored for future use, sent over net, displayed
  - may be binary or text
- conforms to expected form / interpretation
  - assumption of common origin
  - cf. XSS, VT100 escape seqs, X terminal hijack
- uses expected character set
- target not program but output display device
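
For text that will be shown on a terminal, one hedged approach is to drop control characters so that untrusted data cannot carry escape sequences to the display device. `sanitize_for_terminal` is a hypothetical name, and keeping newline and tab is a policy choice made for this sketch.

```python
import unicodedata


def sanitize_for_terminal(text):
    """Drop control characters (such as ESC) but keep newline and tab."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )


print(sanitize_for_terminal("ok\x1b[2Jdone"))  # prints ok[2Jdone
```

With the ESC character removed, the rest of the sequence is displayed as ordinary characters instead of being interpreted as a command to clear the screen.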

Summary
- discussed software security issues
- handling program input safely
  - size, interpretation, injection, XSS, fuzzing
- writing safe program code
  - algorithm, machine language, data, memory
- interacting with O/S and other programs
  - ENV, least privilege, syscalls / std libs, file lock, temp files, other programs
- handling program output