Salvatore Guarnieri, Marco Pistoia, Omer Tripp, Julian Dolby, Stephen Teilhet, Ryan Berg
IBM Software Group and IBM T. J. Watson Research Center
sguarni@us.ibm.com, pistoia@us.ibm.com, omert@il.ibm.com, dolby@us.ibm.com, steilhet@us.ibm.com, ryan.berg@us.ibm.com
www.research.ibm.com/labasec

JavaScript is present on many popular Web sites

Consequences of Taint Violations
• Read and write access to saved data in cookies and local data stores
• Read and write access to data in the web page
• Key loggers
• Impersonation
• Phishing via page modifications or redirects

    var el1 = document.getElementById("d1");       // Getting data from the DOM
    function foo() {
      var el2 = document.getElementById("d2");
      function bar() {
        var el3 = new Element();
        var s = encodeURIComponent(el2.innerText); // Sanitizing some, but not all, of the data
        document.write(s);
        el1.innerHTML = el2.innerText;             // Writing untrusted data into web page
        document.location = el3.innerText;
      }
      bar();
    }
    foo();
    function baz(a, b) {
      a.f = document.URL;
      document.write(b.f);                         // Writing unchecked data to the web page
    }
    var x = new Object();
    baz(x, x);

• Motivation
• Sources, Sinks, and Sanitizers
• Taint Analysis
• Results

    var el1 = document.getElementById("d1");
    function foo() {
      var el2 = document.getElementById("d2");
      function bar() {
        var el3 = new Element();
        var s = encodeURIComponent(el2.innerText);
        document.write(s);
        el1.innerHTML = el2.innerText;
        document.location = el3.innerText;
      }
      bar();
    }
    foo();
    function baz(a, b) {
      a.f = document.URL;
      document.write(b.f);
    }
    var x = new Object();
    baz(x, x);

Rules
• A rule is a triple <Sources, Sinks, Sanitizers>
• Not all sources are valid for all sinks, and not all sanitizers are valid for all sinks
• Sources
  – Seeds of untrusted data
  – Field gets or returns of function calls
  – Ex: document.URL
• Sinks
  – Security-critical operations
  – Field puts or parameters to function calls
  – Ex: element.innerHTML
• Sanitizers
  – Mark a flow as non-dangerous
  – Function calls
  – Ex: encodeURIComponent(str)

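A rule triple could be written down as plain data, for instance as below. This is a minimal sketch and not Actarus's actual rule format: the object layout, the name `xssRule`, and the helper `isViolation` are hypothetical. Only the example source, sink, and sanitizer names come from the slide.

```javascript
// Hypothetical encoding of one rule as a <Sources, Sinks, Sanitizers> triple.
const xssRule = {
  sources: ["document.URL"],          // seeds of untrusted data
  sinks: ["element.innerHTML"],       // security-critical operations
  sanitizers: ["encodeURIComponent"], // calls that mark a flow as non-dangerous
};

// A flow records where data starts, where it ends, and which functions
// it passed through on the way (an illustrative shape, not a real API).
function isViolation(rule, flow) {
  const fromSource = rule.sources.includes(flow.source);
  const toSink = rule.sinks.includes(flow.sink);
  const sanitized = flow.through.some((fn) => rule.sanitizers.includes(fn));
  return fromSource && toSink && !sanitized;
}
```

Keeping sources, sinks, and sanitizers together in one rule is what lets a sanitizer count only for the sinks it is actually valid for: a different sink would get its own rule with its own sanitizer list.
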
Complexities of JavaScript
• Reflective property access, e.g. obj[a] with a = "foo" + "bar"
• Prototype chain property lookup, e.g. G.prototype = new F();
• Lexical scoping
• Function pointers, e.g. var k = function(f) { f(); }; k(m);
• eval and its relatives, e.g. eval("document.write('evil')");

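Each of the language features listed on this slide can be shown in a few lines. The following is an illustrative sketch rather than the code from the original slide: the `write` helper is a stand-in for `document.write` so the example runs outside a browser, and all values are made up.

```javascript
// Stand-in for document.write (an assumption, so this runs under Node.js).
const written = [];
function write(value) { written.push(value); }

// Reflective property access: the property name is computed at run time.
const obj = { foobar: "reflective" };
const name = "foo" + "bar";
write(obj[name]);

// Prototype chain lookup: `bar` lives on G.prototype, not on `g` itself.
function F() { this.bar = "inherited"; }
function G() {}
G.prototype = new F();
const g = new G();
write(g.bar);

// Lexical scoping: the inner function reads `secret` from the enclosing scope.
function outer() {
  const secret = "captured";
  return function inner() { return secret; };
}
write(outer()());

// Function pointers: the callee is known only through the value passed in.
const k = function (f) { return f(); };
write(k(function () { return "called"; }));

// eval: the code arrives as a string that is built at run time.
write(eval("'ev' + 'il'"));
```

In every case the value that reaches `write` cannot be determined by looking at the call alone, which is why a static analysis needs a model of each feature.
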
Demand Driven Taint Analysis
• The seeds are the assignments to sources or return values from sources
• The analysis proceeds by tainting variables
• Variables consist of triplets:
  – Static Single Assignment (SSA) variable ID
  – Method where SSA variable is defined
  – Access path
  – Ex: (v7, m, <f, g>)

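The triple and the demand-driven propagation from seeds might be sketched as follows. This is a simplified illustration under stated assumptions, not the Actarus implementation: `taintVar`, `key`, and `propagate` are hypothetical names, and `successors` is an assumed callback that returns the variables tainted by a given one.

```javascript
// Hypothetical record for a tainted variable:
// (SSA variable ID, defining method, access path).
function taintVar(ssaId, method, accessPath) {
  return { ssaId, method, accessPath };
}

// Printable key, also used to avoid revisiting the same triple.
function key(t) {
  return `(${t.ssaId}, ${t.method}, <${t.accessPath.join(", ")}>)`;
}

// Demand-driven loop: start at the seeds and visit only what they reach.
function propagate(seeds, successors) {
  const seen = new Map();
  const worklist = [...seeds];
  while (worklist.length > 0) {
    const t = worklist.pop();
    if (seen.has(key(t))) continue; // already tainted, nothing new to do
    seen.set(key(t), t);
    worklist.push(...successors(t));
  }
  return [...seen.keys()].sort();
}
```

Because work starts only from seeds, code that never touches tainted data is never visited.
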
Context Sensitive Taint Analysis
• Start from taint sources
• Propagate taint intraprocedurally through def-use
• Propagate taint forward interprocedurally
• Resolve aliasing by using Andersen alias analysis
• Record constraints on call sites, recursively
• In the final constraint-propagation graph, detect paths between sources and sinks that are not intercepted by sanitizers
[Figure: methods m1(), m2(p1, p2, p3), and m3(q1, q2)]

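The last step, finding source-to-sink paths that no sanitizer intercepts, is a graph reachability problem. The sketch below assumes a much simpler graph than the constraint-propagation graph described on the slide: nodes are plain strings, edges mean "value flows to", and `findViolations` is a hypothetical helper.

```javascript
// graph: node -> list of nodes that receive its value (illustrative shape).
function findViolations(graph, sources, sinks, sanitizers) {
  const violations = [];
  for (const source of sources) {
    const visited = new Set();
    const stack = [[source]]; // each entry is the path taken so far
    while (stack.length > 0) {
      const path = stack.pop();
      const node = path[path.length - 1];
      if (visited.has(node)) continue;
      visited.add(node);
      if (sanitizers.has(node)) continue; // flow is intercepted here
      if (sinks.has(node)) { violations.push(path); continue; }
      for (const next of graph[node] || []) stack.push([...path, next]);
    }
  }
  return violations;
}

// Flows loosely modeled on bar() from the running example.
const graph = {
  "el2.innerText": ["encodeURIComponent", "el1.innerHTML"],
  "encodeURIComponent": ["document.write"],
};
const found = findViolations(
  graph,
  ["el2.innerText"],
  new Set(["document.write", "el1.innerHTML"]),
  new Set(["encodeURIComponent"])
);
```

Here `found` holds the single path from `el2.innerText` to `el1.innerHTML`; the flow into `document.write` is dropped because it passes through the sanitizer.
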
Analysis Example

    // Taint variable: (v2, foo, <f, *>)
    function foo(p1, p2) {
      p1.f = p2.f;
    }
    var a = new Object();
    var b = new Object();
    b.f = window.location.toString();
    var c = new Object();
    var d = new Object();
    d.f = "safe";
    foo(a, b); // Install taint summary for foo: p2.f -> p1.f
    foo(c, d); // Since d.f is not tainted, c.f will not be tainted
    document.write(a.f); // This is a taint violation
    document.write(c.f); // This is NOT a taint violation

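One way to picture how the summary for `foo` is reused at each call site is the sketch below. The data layout and the helper `applySummary` are hypothetical and chosen for brevity; the slide does not say how Actarus stores summaries.

```javascript
// Hypothetical summary for foo(p1, p2): taint on p2.f flows to p1.f.
// Parameters are referred to by zero-based position.
const fooSummary = [
  { from: { param: 1, field: "f" }, to: { param: 0, field: "f" } },
];

// Apply a summary at one call site. `taintedSet` holds names such as "b.f";
// `args` are the names of the actual arguments at the call.
function applySummary(summary, args, taintedSet) {
  for (const { from, to } of summary) {
    if (taintedSet.has(`${args[from.param]}.${from.field}`)) {
      taintedSet.add(`${args[to.param]}.${to.field}`);
    }
  }
  return taintedSet;
}

const tainted = new Set(["b.f"]);              // b.f = window.location.toString()
applySummary(fooSummary, ["a", "b"], tainted); // foo(a, b): a.f becomes tainted
applySummary(fooSummary, ["c", "d"], tainted); // foo(c, d): d.f is clean, no change
```

The summary is computed once for `foo` and instantiated per call, so the two calls get different answers without the body of `foo` being analyzed twice.
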
Data Sets
• Developed a micro-benchmark suite of about 150 test scripts
• Downloaded Web pages and ran Actarus on them

Real World Data Set
• Crawled portions of top Alexa Web sites and downloaded pages to disk
• Ran Actarus on a sample of the saved pages
• Ran on over 12,000 pages
• Successfully analyzed over 9,000 pages
• About 22% of pages failed because of a 4-minute timeout

Findings
• Several vulnerable Web sites were found
• Duplicates of vulnerabilities were found on many pages from the same site
• Some exploits were found in third-party code that was shared among several Web sites
• 40% true positive rate
• Vulnerabilities can be fixed with common sanitization routines

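As an example of what a common sanitization routine can look like, the sketch below escapes HTML metacharacters before text is written into a page. It is an illustration only: `escapeHtml` is a hypothetical helper, the slides do not name the routines used, and the right sanitizer always depends on the sink (HTML escaping suits an HTML-content sink but not, for example, a URL sink).

```javascript
// Characters with special meaning in HTML and their entity replacements.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace every metacharacter in a single pass, so "&" is never escaped twice.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}
```

In a browser, an assignment such as `el1.innerHTML = el2.innerText` from the running example could then be written as `el1.innerHTML = escapeHtml(el2.innerText)`.
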
Findings
Site   Unique True Positives   Total True Positives
A      7                       80
B      4                       12
C      4                       91
D      7                       13
E      2                       4
F      1                       200
G      1                       1
H      1                       114
I      3                       7
J      1                       3
K      1                       1

User Friendly Output
• Flows are highlighted and numbered in the source code
• JavaScript was pretty-printed to improve readability and usefulness of line numbers

Future Work
• Use string analysis to reduce false positives
• Make the analysis modular so that library code does not have to be reanalyzed

Thank You
E-mail: sguarni@us.ibm.com