FINDING SECURITY VULNERABILITIES IN JAVA APPLICATIONS WITH STATIC ANALYSIS V. Benjamin Livshits and Monica S. Lam Stanford University

Introduction • Detect application-level vulnerabilities in Java Web applications using static analysis • SQL injection, cross-site scripting, HTTP splitting attacks • Attacks can result in data disclosure, data modification, and denial of service (DoS) • Firewalls do not prevent web application attacks that use the HTTP protocol (unless HTTP is blocked) • Web applications accept user input and generally utilize a back-end database

Unchecked Input Can Cause Vulnerabilities • Parameter tampering • URL manipulation • Hidden field manipulation • HTTP header tampering • Cookie poisoning • SQL injection • Cross-site scripting • HTTP response splitting • Path traversal • Command injection

Code Auditing • The attacks may be mitigated through code auditing • The auditing should occur prior to deployment • Auditing is generally costly since it may be done by an external organization • Occasionally, a second audit is undertaken to ensure that the new patches did not create additional vulnerabilities • The authors propose using security tools during development

Static Analysis • Examine code without executing it • Vulnerability patterns are written in Program Query Language (PQL), whose syntax is similar to Java's • The tool checks Java bytecode for vulnerabilities matching user-created patterns • Source code is not required, since the tool works on Java bytecode • Libraries can also be examined • Created an Eclipse plugin to be used during development • Utilizes a context-sensitive pointer analysis

SQL Injection Vulnerability • SQL injection occurs when unvalidated user input is passed directly to a back-end database for execution • Can result in data disclosure, data modification, and command execution
    HttpServletRequest request = ...;
    String userName = request.getParameter("name");
    Connection con = ...;
    String query = "SELECT * FROM Users " + " WHERE name = '" + userName + "'";
    con.execute(query);
• The attacker can set name to ' OR 1 = 1; -- to get access to all records in the database • Can be avoided by using the PreparedStatement API class • Prepared statements are precompiled, and their parameters are not part of the executable SQL statement
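
A minimal sketch of the contrast drawn on this slide, assuming the same Users table and name column. The class UserLookup and both method names are hypothetical and do not come from the paper. The countUsers method only shows the shape of a parameterized query; it needs a live JDBC connection, so it is not exercised here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class used only for illustration.
final class UserLookup {
    // Vulnerable pattern: user input is spliced into the SQL text itself.
    static String buildUnsafeQuery(String userName) {
        return "SELECT * FROM Users WHERE name = '" + userName + "'";
    }

    // Safer pattern: the SQL text is fixed and the input is bound as a parameter,
    // so it is never parsed as part of the statement.
    static int countUsers(Connection con, String userName) throws SQLException {
        String sql = "SELECT * FROM Users WHERE name = ?";
        try (PreparedStatement stmt = con.prepareStatement(sql)) {
            stmt.setString(1, userName);
            int rows = 0;
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    rows++;
                }
            }
            return rows;
        }
    }
}
```

Passing the attack string from the slide to buildUnsafeQuery yields a WHERE clause that is always true, whereas countUsers would treat the same string as a literal name to match.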

Data Injection • Anything that the user can modify needs to be validated such as form parameters, HTTP headers, and cookies • Client-side filtering is not effective since the attacker can save the HTML, modify it, and resubmit it • Sanitization should also occur on the server-side • Parameters are generally sent via HTML forms and can be modified by the attacker

URL Data Injection Example • URL tampering is where the attacker modifies parameters • GET requests use form parameters that are part of the query string of the URL • http://www.mybank.com/myaccount?accountnumber=341948&debit_amount=100 • http://www.mybank.com/myaccount?accountnumber=341948&debit_amount=-5000 • Server-side protection should examine the input to apply countermeasures
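
One possible server-side check for the debit_amount parameter, sketched under stated assumptions: the class DebitValidator, the accepted number format, and the upper bound of 1000 are all hypothetical choices made for illustration, not rules from the paper.

```java
import java.math.BigDecimal;

// Hypothetical validator for the debit_amount query parameter.
final class DebitValidator {
    // Assumed business limit, chosen only for this example.
    private static final BigDecimal MAX_DEBIT = new BigDecimal("1000");

    // Returns the parsed amount, or throws if the raw parameter is unacceptable.
    static BigDecimal parseDebitAmount(String raw) {
        // Accept only unsigned decimals with at most two fractional digits.
        if (raw == null || !raw.matches("\\d{1,7}(\\.\\d{1,2})?")) {
            throw new IllegalArgumentException("malformed amount");
        }
        BigDecimal amount = new BigDecimal(raw);
        if (amount.signum() <= 0 || amount.compareTo(MAX_DEBIT) > 0) {
            throw new IllegalArgumentException("amount out of range");
        }
        return amount;
    }
}
```

With this check the tampered value -5000 is rejected before it reaches any account logic, while 100 is accepted.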

Hidden Field Modification • HTTP is stateless, so sometimes hidden values are embedded into HTML pages • <input type="hidden" name="total_price" value="25.00"> • The hidden fields can be modified by the attacker and the page can be reloaded to resubmit it to the server

HTTP Header Modification • HTTP header values are usually abstracted from the user and handled by the web browser and the server • Tools can be used to manipulate the header fields • The referer field in an error page can be modified for cross-site scripting or HTTP response splitting attacks

Cookie Poisoning and Non-Web Input Sources • A cookie is stored on the user's computer and sent to the web server; it carries state that can identify the user and provide tokens • The data is usually stored in name-value pairs • These values can be modified to inject data • Command-line utilities are used for tasks such as initializing, cleaning, validating, or migrating the database • Malicious parameters can reach these utilities through the command line

Cross-Site Scripting • Cross-site scripting involves dynamically generated web pages that include input that has not been validated • Embedded scripts in web pages can steal account credentials and cookies, modify settings, and insert content
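
A common countermeasure is to escape user input before writing it into generated HTML. The sketch below is a hand-rolled illustration with a hypothetical class name, HtmlEscaper; it covers only the five characters that matter in HTML text and quoted attribute values, and a production application would more likely rely on a vetted encoding library.

```java
// Hypothetical escaper for text placed in HTML element content or quoted attributes.
final class HtmlEscaper {
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

After escaping, a submitted script tag is displayed as text instead of being executed by the browser.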

HTTP Response Splitting • HTTP response splitting can lead to web cache poisoning, cross-user defacement, sensitive page hijacking, and cross-site scripting • Using CR and LF line breaks, an attacker can cause two HTTP responses to be generated from a single HTTP request • The second HTTP response may be incorrectly associated with the next HTTP request • Since the second response can be controlled, an attacker may be able to forge or poison web pages in a caching proxy server • The caching proxy server is generally shared by users, so the effects can be amplified • For HTTP splitting to be applicable, the app has to include unchecked input in the response headers sent to the client • Location and Set-Cookie headers are likely targets
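
Because the attack depends on CR and LF characters reaching a response header, one simple guard is to refuse any header value that contains them. The class HeaderValues and the method requireSingleLine are hypothetical names for this sketch; rejecting the value outright (rather than stripping the characters) is a design choice made here, not something prescribed by the paper.

```java
// Hypothetical guard applied to values before they are placed in a response header.
final class HeaderValues {
    // Rejects values containing CR or LF, which could otherwise end the current
    // header and begin a new header or a second response.
    static String requireSingleLine(String value) {
        if (value.indexOf('\r') >= 0 || value.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("header value contains a line break");
        }
        return value;
    }
}
```

A redirect target such as /home passes unchanged, while a value carrying an embedded Set-Cookie line is refused.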

Path Traversal Attack • Path traversal enables an attacker to traverse directories and access files outside the intended file access path • The attack vector is usually unchecked URL input parameters, cookies, and HTTP request headers • The attacker may be able to read or delete files on the server or create a DoS attack by attempting to write to read-only files • In Java, security policies can be used to restrict access to certain directories (like chroot in Linux)
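
In application code, a typical defense is to resolve the user-supplied name against a base directory and confirm that the normalized result is still inside it. This is a sketch with hypothetical names (SafePaths, resolveInside); it does not follow symbolic links, so a real deployment might additionally compare real paths.

```java
import java.nio.file.Path;

// Hypothetical helper that confines user-supplied file names to a base directory.
final class SafePaths {
    static Path resolveInside(Path baseDir, String userSupplied) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments so they cannot hide an escape.
        Path candidate = base.resolve(userSupplied).normalize();
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException("path escapes base directory");
        }
        return candidate;
    }
}
```

A request for reports/q1.txt resolves normally, while ../secret.txt is rejected because its normalized form falls outside the base directory.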

Command Injection • Command injection occurs when an unchecked input is used as all or part of a command that is executed on the command line • CVE-2016-10138: command injection as system user in Adups software on Android • https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2016-10138

Static Analysis • A tainted object propagation problem consists of source descriptors, sink descriptors, and derivation descriptors • Access path: a sequence of operations on an object, such as field accesses, array index operations, and method calls (e.g., v.f.g) • E (written ε in the paper) denotes an empty access path • Array index operations are denoted by [] • A source descriptor (a method where input enters the program) is a 3-tuple (m, n, p), where m is the method, n is the argument number, and p is the access path to the argument

Static Analysis • A sink descriptor (an unsafe way the program can use the data) is a 3-tuple (m, n, p), where m is the method, n is the argument number, and p is the access path to the argument • A derivation descriptor (how data propagates between objects) has the form (m, ns, ps, nd, pd), where m is the derivation method, ns and ps are the argument number and access path of the source object, and nd and pd are the argument number and access path of the destination object

Taint Analysis • Trace data from sources to sinks to see whether a tainted object reaches a sink • Derivation descriptors are introduced to propagate taint across String operations, since Strings are immutable and each operation creates a new object from its input(s) • Native methods and character-level manipulation also propagate the taint • The taint does not propagate through sanitization routines

Tainted Object Propagation • Source (HttpServletRequest.getParameter(String), −1, E) • Sink (Connection.executeQuery(String), 1, E) • Derivation • (StringBuffer.append(String), 1, E, −1, E) • (StringBuffer.toString(), 0, E, −1, E)
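
The descriptors on this slide can be exercised with a toy propagation pass. The sketch below makes several simplifying assumptions that are not in the slides: it reads argument number −1 as the return value and 0 as the receiver (my understanding of the paper's convention), it handles only empty access paths, it walks straight-line code in order, and it assumes object names have already been resolved (so append returns the same object as its receiver). The class TaintSketch and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of tainted object propagation over a list of method calls.
final class TaintSketch {
    // One call: args[0] is the receiver; result names the returned object.
    static final class Call {
        final String method;
        final String result;
        final String[] args;
        Call(String method, String result, String... args) {
            this.method = method;
            this.result = result;
            this.args = args;
        }
    }

    // Derivation descriptor restricted to empty access paths.
    static final class Derivation {
        final String method;
        final int srcArg;
        final int dstArg;
        Derivation(String method, int srcArg, int dstArg) {
            this.method = method;
            this.srcArg = srcArg;
            this.dstArg = dstArg;
        }
    }

    static final String SOURCE_METHOD = "HttpServletRequest.getParameter";
    static final String SINK_METHOD = "Connection.executeQuery";
    static final int SINK_ARG = 1;
    static final List<Derivation> DERIVATIONS = Arrays.asList(
            new Derivation("StringBuffer.append", 1, -1),
            new Derivation("StringBuffer.toString", 0, -1));

    // Argument number -1 is taken to mean the return value (assumption).
    static String objectAt(Call c, int arg) {
        return arg == -1 ? c.result : c.args[arg];
    }

    static List<String> findViolations(List<Call> program) {
        Set<String> tainted = new HashSet<>();
        List<String> violations = new ArrayList<>();
        for (Call c : program) {
            if (c.method.equals(SOURCE_METHOD)) {
                tainted.add(c.result); // source: the returned object is tainted
            }
            for (Derivation d : DERIVATIONS) {
                if (c.method.equals(d.method) && tainted.contains(objectAt(c, d.srcArg))) {
                    tainted.add(objectAt(c, d.dstArg)); // derivation: taint flows on
                }
            }
            if (c.method.equals(SINK_METHOD) && tainted.contains(objectAt(c, SINK_ARG))) {
                violations.add(c.method + " receives tainted " + objectAt(c, SINK_ARG));
            }
        }
        return violations;
    }
}
```

Feeding it the sequence getParameter, append, toString, executeQuery reports one violation; replacing the query argument with an untainted object reports none.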

Formal Definitions

Source and Sink Identification • They identified sources and sinks by examining the documentation of J2EE APIs • They also instrumented applications to identify locations where application code is invoked by the server • They also identified additional derivation methods using static analysis

Points-to Analysis
    String param = req.getParameter("user");
    StringBuffer buf1;
    StringBuffer buf2;
    ...
    buf1.append(param);
    String query = buf2.toString();
    con.executeQuery(query);
• A conservative approach may decide that the con.executeQuery(query) call is a vulnerability • They try to ascertain whether buf1 and buf2 can ever refer to the same object using a points-to analysis

Points-to Analysis • The points-to analysis approximates the unbounded set of runtime objects with a finite set of object names • The relation pointsto(v, h) holds when variable v may point to an object named by allocation site h • A security violation exists if an object obtained from a source descriptor can reach a sink descriptor, possibly through a chain of derivation descriptors
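
The question of whether two variables such as buf1 and buf2 can refer to the same object reduces to checking whether their points-to sets overlap. The sketch below stores pointsto(v, h) facts in a map and tests for overlap; the class PointsToSketch and the site names h1 and h2 are hypothetical, and the facts are entered by hand rather than computed from a program.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy store of pointsto(v, h) facts with a may-alias query.
final class PointsToSketch {
    private final Map<String, Set<String>> pointsTo = new HashMap<>();

    // Records that variable may point to an object named by allocationSite.
    void add(String variable, String allocationSite) {
        pointsTo.computeIfAbsent(variable, k -> new HashSet<>()).add(allocationSite);
    }

    // Two variables may alias if they share at least one allocation site.
    boolean mayAlias(String v1, String v2) {
        Set<String> a = pointsTo.getOrDefault(v1, Collections.<String>emptySet());
        Set<String> b = pointsTo.getOrDefault(v2, Collections.<String>emptySet());
        return !Collections.disjoint(a, b);
    }
}
```

If buf1 points only to h1 and buf2 only to h2, the query returns false and the executeQuery call need not be flagged; adding the fact pointsto(buf2, h1) makes it return true.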

Security Violations

Points-to Analysis • Their context-sensitive Java points-to analysis uses bddbddb (BDD-Based Deductive DataBase), which evaluates queries written in the Datalog language • Both precise and scalable • Cannot handle dynamic class loading; there is some support for resolving the targets of reflective calls • Determining exactly which heap objects a variable may point to at runtime is undecidable

Points-to Analysis • A sound tool takes a conservative approach and may overestimate the taint among objects, resulting in false positives • Practical tools may take an unsound approach and underestimate the aliasing among objects, which can result in false negatives • Context sensitivity takes the invocation context of a method into account, which adds precision

PQL for SQL Injection

PQL for Source-Sink Pairs

Context-Sensitivity
    class DataSource {
        String url;
        DataSource(String url) {
            this.url = url;
        }
        String getUrl() {
            return this.url;
        }
        ...
    }
    String passedUrl = request.getParameter("...");
    DataSource ds1 = new DataSource(passedUrl);
    String localUrl = "http://localhost/";
    DataSource ds2 = new DataSource(localUrl);
    String s1 = ds1.getUrl();
    String s2 = ds2.getUrl();
• A context-insensitive analysis would deem both s1 and s2 tainted, even though only s1 is actually tainted
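
The difference can be reduced to a toy model: a context-insensitive analysis keeps one summary of the DataSource constructor shared by every call site, while a context-sensitive one keeps a separate copy per call site. The sketch below is only that reduction, with hypothetical names (ContextSketch and its two methods); it is not how the tool itself represents contexts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: input maps each constructor call site to whether its url argument is tainted;
// output maps each call site to whether the resulting object's url field is considered tainted.
final class ContextSketch {
    // One shared summary: the formal parameter is tainted if any caller passes tainted data.
    static Map<String, Boolean> contextInsensitive(Map<String, Boolean> argTainted) {
        boolean merged = argTainted.containsValue(Boolean.TRUE);
        Map<String, Boolean> fieldTainted = new LinkedHashMap<>();
        for (String site : argTainted.keySet()) {
            fieldTainted.put(site, merged);
        }
        return fieldTainted;
    }

    // One summary per call site: taint stays with the call that introduced it.
    static Map<String, Boolean> contextSensitive(Map<String, Boolean> argTainted) {
        return new LinkedHashMap<>(argTainted);
    }
}
```

With ds1 constructed from tainted input and ds2 from a constant, the insensitive version marks both as tainted and the sensitive version marks only ds1.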

Context-Sensitivity • The tool builds a call graph recording which methods can be invoked at each call site • It can support up to 10^14 contexts • A PQL query represents a pattern of events involving variables belonging to dynamic object instances • The user queries for source-sink pairs that represent a tainted object propagation • Transitively derive the tainted objects

Improving Precision • They improved the points-to analysis using an object-naming approach to reduce false positives • Containers (HashMap, Vector, *List, etc.) cause imprecision since the data is often stored in an instance variable of the object • They create a new object name for an internal data structure at each site of allocation for the container • They give unique names to the returned String object from String manipulation methods in the Java API

Eclipse Plugin

Results • At the time, there was no benchmark for assessing vulnerabilities in web applications • They used open-source J2EE Web applications, mostly from SourceForge, selected by size and popularity • jboard, blueblog, blojsom, personalblog, snipsnap, pebble, webgoat, roller, etc. • They used 28 source, 18 sink, and 29 derivation methods in the experiments • The programs are pre-processed to create the relations for the pointer analysis

Results • They found 41 potential security violations, of which 29 were security errors and the remaining 12 were false positives • Each of the benchmark apps had a vulnerability except one

Validating Errors • The errors found may not actually be exploitable • It is unknown whether the path will be taken at runtime or whether suitable input can be generated • Server configuration needs to be taken into account • They reported errors to the application maintainers when the errors occurred in application code rather than library code • Almost all reported issues (more than a dozen) were confirmed by the maintainers • The analysis ignores control flow, so it cannot tell whether input has been checked, which can manifest as false positives

Found Vulnerabilities • Most vulnerabilities were in application code rather than library code • They found 2 vulnerabilities in libraries (1 in Hibernate and another in J2EE) • The Hibernate library persists objects to a database and loads them as needed

False Positives • The tool reported 0 false positives for all apps except snipsnap • Context sensitivity and better object naming helped reduce false positives • The 12 false positives in snipsnap stem from assuming that the output of StringWriter.toString() is tainted if the StringWriter is initialized with a tainted string

Related Work • Penetration testing involves providing malformed and malicious inputs to a web application and identifying vulnerabilities • It is an incomplete approach and may only yield a subset of all possible vulnerabilities if every part is not tested • Runtime monitoring uses a proxy to examine and modify the traffic between the client and server • Application-level firewalls use signatures to flag attacks and whitelists for appropriate inputs • Static analysis can be used to identify dangerous predefined patterns

Conclusion • Utilized tainted object propagation to statically identify web application vulnerabilities • The analysis is precise, scalable, and integrated into an Eclipse plugin • They found 29 security vulnerabilities in open-source web applications • Most were confirmed by the software maintainers • False positives only for snipsnap program

Questions?