Static Analysis of Java and Soot Mooly Sagiv

Main Java Features
• Class based object oriented
• Type safe
• No explicit free
• Portable with bytecode
  – Interpreted by the Java Virtual Machines
• Clean and rich library
• Verbose
• Carefully designed

Java Bytecode
[Diagram: Source Code → Java Compile → bytecode, which is executed by a Linux JVM on a Linux machine, a Win JVM on a Windows machine, and a Mac JVM on a Mac machine]

Types of Java Bugs
• Null dereferences
• "Memory" and resource leaks
• Data races
• Concurrent modification via Iterators
• Incorrect API usage

Program Slicing
• Program Slice [Mark Weiser]
  – the statements of a program that may affect the values of some variables in a set V at some point of interest p

Original program:
    read(n);
    i := 1;
    sum := 0;
    prod := 1;
    while (i <= n) do {
        sum := sum + i;
        prod := prod * i;
        i := i + 1;
    }
    print sum;
    print prod;

Slice with respect to prod at "print prod":
    read(n);
    i := 1;
    prod := 1;
    while (i <= n) do {
        prod := prod * i;
        i := i + 1;
    }
    print prod;

Applications of Slicing
• Debugging
• Program Comprehension
• Reverse Engineering
• Program Testing
• Measuring Program Metrics
  – Coverage, Overlap, Clustering
• Refactoring
• Program integration

Program Dependence Graph (PDG)
• A directed graph
• Nodes are basic instructions (statements/conditions)
• Two types of edges between u and v
  – Flow-Dependence
    • The value assigned at u is directly used at v
  – Control-Dependence
    • The value of the condition u "controls" the execution of v

PDG Example
    read(n);
    i := 1;
    sum := 0;
    prod := 1;
    while (i <= n) do {
        sum := sum + i;
        prod := prod * i;
        i := i + 1;
    }
    print sum;
    print prod;
[Figure: PDG for this program, with nodes i := 1, sum := 0, prod := 1, i <= n?, sum := sum + i, prod := prod * i, i := i + 1, print sum, and print prod; the edges leaving the loop condition are labeled T and F]
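A slice can be read off a PDG by reverse reachability from the point of interest. The following is a minimal sketch of that idea, not the slides' implementation: the class name PdgSlicer, the statement numbering (1 = read(n) through 10 = print prod), and the hand-derived dependence edges are all assumptions made for illustration.

```java
import java.util.*;

class PdgSlicer {
    // Hypothetical numbering: 1 read(n), 2 i:=1, 3 sum:=0, 4 prod:=1, 5 while,
    // 6 sum:=sum+i, 7 prod:=prod*i, 8 i:=i+1, 9 print sum, 10 print prod.
    // deps.get(v) lists the nodes v depends on (flow or control), derived by hand.
    static Map<Integer, List<Integer>> examplePdg() {
        Map<Integer, List<Integer>> deps = new HashMap<>();
        deps.put(5, List.of(1, 2, 8));
        deps.put(6, List.of(2, 3, 5, 6, 8));
        deps.put(7, List.of(2, 4, 5, 7, 8));
        deps.put(8, List.of(2, 5, 8));
        deps.put(9, List.of(3, 6));
        deps.put(10, List.of(4, 7));
        return deps;
    }

    // Backward slice: every node from which the criterion is reachable via dependence edges.
    static SortedSet<Integer> backwardSlice(Map<Integer, List<Integer>> deps, int criterion) {
        SortedSet<Integer> slice = new TreeSet<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            int node = worklist.pop();
            if (slice.add(node)) {
                for (int d : deps.getOrDefault(node, List.of())) {
                    worklist.push(d);
                }
            }
        }
        return slice;
    }

    public static void main(String[] args) {
        // Slice on "print prod" (node 10).
        System.out.println(backwardSlice(examplePdg(), 10));
    }
}
```

Under these assumed edges, slicing on node 10 drops exactly the statements that mention sum, which agrees with the slice shown on the Program Slicing slide.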

Flow Dependences with Pointers

Variant 1:
    List p, q, y;
    q = (List *) malloc();
    p = q;
    l1:   p->d = 5;
    l1.5:
    l2:   printf(q->d);

Variant 2:
    List p, q, y;
    q = (List *) malloc();
    p = q;
    l1:   p->d = 5;
    l1.5: t->d = 7;
    l2:   printf(q->d);

Variant 3:
    List p, q, y;
    q = (List *) malloc();
    p = q;
    l1:   p->d = 5;
    l1.5: p = (List *) malloc();
    l2:   printf(q->d);

Constructing Flow Dependences
• Instrument the program with statements that record, for each memory location, the statement that last wrote to it
• v depends on u iff v reads a location last written at u
• Can be approximated statically
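The instrumentation idea can be sketched dynamically in a few lines of Java. This is an illustrative sketch only: the class LastWriterTracker, its write/read hooks, and the Cell type are hypothetical names, and the demo replays the pointer example with the statement p = new Cell() inserted between l1 and l2.

```java
import java.util.*;

class LastWriterTracker {
    // For each object (by identity) and field name, the label of the last writing statement.
    private final Map<Object, Map<String, String>> lastWriter = new IdentityHashMap<>();
    // Observed flow dependences, rendered as "reader <- writer".
    final Set<String> dependences = new TreeSet<>();

    // Instrumentation hook placed next to every field write.
    void write(String label, Object obj, String field) {
        lastWriter.computeIfAbsent(obj, k -> new HashMap<>()).put(field, label);
    }

    // Instrumentation hook placed next to every field read.
    void read(String label, Object obj, String field) {
        Map<String, String> fields = lastWriter.get(obj);
        String writer = (fields == null) ? null : fields.get(field);
        if (writer != null) {
            dependences.add(label + " <- " + writer);
        }
    }

    static class Cell { int d; }

    static Set<String> demo() {
        LastWriterTracker t = new LastWriterTracker();
        Cell q = new Cell();
        Cell p = q;
        p.d = 5;
        t.write("l1", p, "d");
        p = new Cell();            // l1.5: redirects p, leaves the cell q points to untouched
        t.read("l2", q, "d");      // l2 reads q.d, last written at l1
        return t.dependences;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A static analysis has to approximate the same "last writer" relation without running the program, which is where pointer aliasing makes the problem hard.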

Simple Example
    Append() {
        List head, tail, temp;
        l1:  head = (List) malloc();
        l2:  scanf("%c", &head->data);
        l3:  head->n = NULL;
        l4:  tail = head;
        l5:  if (tail->data == 'x') goto l12;
        l6:  temp = (List) malloc();
        l7:  scanf("%c", &temp->data);
        l8:  temp->n = NULL;
        l9:  tail->n = temp;
        l10: tail = tail->n;
        l11: goto l5;
        l12: printf("%c", head->data);
        l13: printf("%c", tail->data);
    }

Project 1: Java Slicer
• Develop Slicer for Java Programs with Shallow Pointers
• Develop abstract domain and transformers
• Implement with Soot for intraprocedural programs
• Evaluate the project on real and artificial benchmarks

Taint Checking
• Enforce code security by tracking propagated information
• Prevent bad behaviors (e.g., SQL injection)

    HttpServletRequest request = ...;
    String userName = request.getParameter("name");
    Connection con = ...;
    String query = "SELECT * FROM Users " + " WHERE name = '" + userName + "'";
    con.execute(query);
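As a rough illustration of how tracking propagated information catches the query above, here is a minimal sketch of forward taint propagation over straight-line three-address statements. The Stmt encoding, the operation names ("source", "const", "concat", "sink"), and the class TinyTaint are assumptions for this example; real taint analyses such as the ones discussed later handle branches, calls, and the heap.

```java
import java.util.*;

class TinyTaint {
    // A statement "target = op(args)"; op is "source", "const", "concat", or "sink".
    static final class Stmt {
        final String target;
        final String op;
        final List<String> args;
        Stmt(String target, String op, String... args) {
            this.target = target;
            this.op = op;
            this.args = Arrays.asList(args);
        }
    }

    // Returns the tainted variables that reach a sink (straight-line code only).
    static List<String> findLeaks(List<Stmt> program) {
        Set<String> tainted = new HashSet<>();
        List<String> leaks = new ArrayList<>();
        for (Stmt s : program) {
            switch (s.op) {
                case "source":
                    tainted.add(s.target);
                    break;
                case "sink":
                    for (String a : s.args) {
                        if (tainted.contains(a)) leaks.add(a);
                    }
                    break;
                default:
                    // The result is tainted iff some argument is tainted.
                    if (s.args.stream().anyMatch(tainted::contains)) {
                        tainted.add(s.target);
                    } else {
                        tainted.remove(s.target);
                    }
            }
        }
        return leaks;
    }

    // Encodes the servlet example: getParameter is the source, con.execute is the sink.
    static List<Stmt> servletExample() {
        return Arrays.asList(
            new Stmt("userName", "source"),
            new Stmt("prefix", "const"),
            new Stmt("query", "concat", "prefix", "userName"),
            new Stmt(null, "sink", "query"));
    }

    public static void main(String[] args) {
        System.out.println(findLeaks(servletExample()));
    }
}
```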

FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps

FlowTwist: Efficient Context-Sensitive Inside-Out Taint Analysis for Large Codebases
Johannes Lerch, Ben Hermann, Eric Bodden, and Mira Mezini
{lastname}@cs.tu-darmstadt.de
https://github.com/johanneslerch/FlowTwist
@stg_darmstadt
19.10.2021 | Technische Universität Darmstadt | Software Technology Group

[Figure: collage of news headlines dated Aug 2012 to Mar 2013 about Java security holes, including new vulnerabilities found in the latest Oracle patches, emergency Java updates released outside Oracle's quarterly schedule, a patch update with fixes for 50 different security flaws, and a vulnerability already being exploited "in the wild"]

Theory: Stack-based Access Control
Call stack (top first) and the permissions of each frame:
    SecurityManager.checkPermission    FilePermission, SocketPermission, ...
    SecurityManager.checkWrite         FilePermission, SocketPermission, ...
    FileOutputStream.<init>            FilePermission, SocketPermission, ...
    Attacker.doEvil                    Ø
    MyApplet.init                      Ø
    ∩ = Ø
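The intersection rule on this slide can be sketched as follows. This is a simplified model and not the JDK's actual AccessController logic: permissions are plain strings, each frame carries an explicit permission set, and the names StackInspection, effectivePermissions, and checkPermission are hypothetical.

```java
import java.util.*;

class StackInspection {
    // Effective permissions: the intersection of the permission sets of all frames.
    static Set<String> effectivePermissions(List<Set<String>> frames) {
        if (frames.isEmpty()) {
            return new TreeSet<>();   // assumption of this sketch: an empty stack grants nothing
        }
        Set<String> result = new TreeSet<>(frames.get(0));
        for (Set<String> permissions : frames) {
            result.retainAll(permissions);
        }
        return result;
    }

    // A check succeeds only if every frame on the stack holds the permission.
    static boolean checkPermission(List<Set<String>> frames, String needed) {
        return effectivePermissions(frames).contains(needed);
    }

    public static void main(String[] args) {
        Set<String> system = Set.of("FilePermission", "SocketPermission");
        Set<String> applet = Set.of();
        // checkPermission, checkWrite, FileOutputStream.<init>, Attacker.doEvil, MyApplet.init
        List<Set<String>> stack = List.of(system, system, system, applet, applet);
        System.out.println(checkPermission(stack, "FilePermission"));
    }
}
```

With the applet frames on the stack the intersection is empty, so the write is denied, matching the "∩ = Ø" line of the slide.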

Theory: Stack-based Access Control
Reality:
    public static Class<?> forName(String className) throws ClassNotFoundException {
        return forName0(className, true, ClassLoader.getCallerClassLoader());
    }
Implicit Permission Check
Call stack (top first):
    Class.forName      Privileged ClassLoader
    Attacker.doEvil    Unprivileged ClassLoader
    MyApplet.init      Unprivileged ClassLoader

Leak in sun.beans.finder.ClassFinder (CVE-2012-4681)
    public static Class<?> findClass(String className) {
        try {
            ClassLoader cl = ...
            return Class.forName(className, false, cl);   // throws SecurityException
        } catch (ClassNotFoundException e) {
        } catch (SecurityException e) {
            // "handled" here
        }
        return Class.forName(className);
    }

Leak in sun.beans.finder.ClassFinder (CVE-2012-4681)
Call stack (top first) when findClass falls through to Class.forName(className):
    Class.forName            Privileged ClassLoader
    ClassFinder.findClass    Privileged ClassLoader
    Attacker.doEvil          Unprivileged ClassLoader
    MyApplet.init            Unprivileged ClassLoader

Deriving the Static Program Analysis Problem
Call stack (top first):
    Class.forName            Privileged ClassLoader      (Caller Sensitive)
    ClassFinder.findClass    Privileged ClassLoader
    Attacker.doEvil          Unprivileged ClassLoader
    MyApplet.init            Unprivileged ClassLoader
• Track the return value
• Track the parameter
[Figure labels: Private Method, Private Method, Public Method]

Two Independent Analyses
[Diagram: Caller Sensitive method, Private Method, Public Method]
• Track the return value: Source at the Caller Sensitive method, Sink at the Public Method
• Track the parameter: Source at the Public Method, Sink at the Caller Sensitive method

Two Independent Analyses: Not Context-Sensitive
[Diagram: Caller Sensitive method, Private Method, Public Method]
• Track the return value
• Track the parameter

Pure Forward Context-Sensitive Approach
[Diagram: Caller Sensitive method, Private Method, Public Method, with Source and Sink markers]

Results – only Class.forName
[Chart: Runtime [min] (0 to 40) against Maximum Heap Size [GB] (10 down to 3)]

Results – all Caller Sensitive Methods
[Chart: Runtime [min] (0 to 40) against Maximum Heap Size [GB] (10 down to 3); annotated "Did not Terminate"]

Scale of the Problem
[Diagram: Caller Sensitive method, Private Method, Public Method]
• ~45,000 methods

Scale of the Problem
[Diagram: Caller Sensitive methods at the top, several Private Methods in between, Public Methods at the bottom]
• 64 methods (Caller Sensitive)
• 3,656 call sites
• ~45,000 methods (Public Method)

Exploit Imbalance to Improve Scalability
[Diagram: Caller Sensitive methods at the top, several Private Methods in between, Public Methods at the bottom; labeled "Reverse Analysis"]
• 64 methods (Caller Sensitive)
• 3,656 call sites
• ~45,000 methods (Public Method)

IFDS Algorithm [17, 19]
[Diagram labels: Reports Leaks, Caller Sensitive, ?, Public Method]

IFDS Algorithm: Computing Summaries
    foo(a) {
                         a → {a}
      1: b = a
                         a → {a, b}
      2: c = b
      3: return c
    }

IFDS Algorithm: Computing Summaries
    foo(a) {
      1: b = a
                         a → {a, b}
      2: c = b
                         a → {a}, b → {b, c}
                         a → {a, b, c}
      3: return c
    }

IFDS Algorithm: Computing Summaries
    foo(a) {
      1: b = a
      2: c = b
                         a → {a, b, c}
      3: return c
    }

Path Construction
    foo(a) {
      1: b = a
                         a → {a, b}            pred(b) = a, stmt(b) = #1
      2: c = b
                         a → {a}, b → {b, c}   pred(c) = b, stmt(c) = #2
                         a → {a, b, c}
      3: return c
    }
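The pred/stmt bookkeeping on this slide can be illustrated with a small Java sketch. It is a heavily simplified stand-in for IFDS, restricted to straight-line copy assignments with no kills, branches, or calls; the class SummarySketch and its methods are hypothetical names, not part of FlowTwist or any IFDS solver.

```java
import java.util.*;

class SummarySketch {
    final Set<String> reachable = new LinkedHashSet<>();  // facts reachable from the entry fact
    final Map<String, String> pred = new HashMap<>();     // fact -> fact it was derived from
    final Map<String, Integer> stmt = new HashMap<>();    // fact -> statement that derived it

    SummarySketch(String entryFact) {
        reachable.add(entryFact);
    }

    // Flow function for "lhs = rhs" at statement n: lhs becomes reachable if rhs is.
    void assign(int n, String lhs, String rhs) {
        if (reachable.contains(rhs) && reachable.add(lhs)) {
            pred.put(lhs, rhs);
            stmt.put(lhs, n);
        }
    }

    // Walks pred/stmt backwards to explain how a fact was reached.
    List<String> pathTo(String fact) {
        LinkedList<String> path = new LinkedList<>();
        for (String f = fact; pred.containsKey(f); f = pred.get(f)) {
            path.addFirst("#" + stmt.get(f) + ": " + f + " <- " + pred.get(f));
        }
        return path;
    }

    // The foo(a) example: 1: b = a; 2: c = b; 3: return c.
    static SummarySketch foo() {
        SummarySketch s = new SummarySketch("a");
        s.assign(1, "b", "a");
        s.assign(2, "c", "b");
        return s;
    }

    public static void main(String[] args) {
        SummarySketch s = foo();
        System.out.println("a -> " + s.reachable);
        System.out.println(s.pathTo("c"));
    }
}
```

Storing only one predecessor per fact keeps the summary compact while still allowing a witness path to be rebuilt when a leak is reported.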

Path Construction: Merge at Branches
    bar(a) {
      if (...) {
        1: b = a
                         a → {a, b1}       pred(b1) = a, stmt(b1) = #1
      } else {
        2: b = a
                         a → {a, b2}       pred(b2) = a, stmt(b2) = #2
      }
                         a → {a, b1, b2}
      3: return b
    }

Results – only Class.forName
[Chart: Runtime [min] (0 to 40) against Maximum Heap Size [GB] (10 down to 3); series: Pure Forward Baseline, Independent Inside-Out]

Results – all Caller Sensitive Methods
[Chart: Runtime [min] (0 to 40) against Maximum Heap Size [GB] (10 down to 3); series: Pure Forward Baseline, Independent Inside-Out; both annotated "Did not Terminate"]

Two Synchronized/Dependent Analyses
[Diagram: Caller Sensitive method, several Private Methods, and Public Method, with paths labeled "Balanced Return" and "Unbalanced Return"]

Results – only Class.forName
[Chart: Runtime [min] (0 to 40) against Maximum Heap Size [GB] (10 down to 3); series: Pure Forward Baseline, Independent Inside-Out, Dependent Inside-Out]

Results – all Caller Sensitive Methods
[Chart: Runtime [min] (0 to 40) against Maximum Heap Size [GB] (10 down to 3); series: Pure Forward Baseline, Independent Inside-Out, Dependent Inside-Out; two "Did not Terminate" annotations]

Summary

Android Taint Flow Analysis for App Sets
Will Klieber*, Lori Flynn, Amar Bhosale, Limin Jia, and Lujo Bauer
Carnegie Mellon University
*presenting

Motivation
• Detect malicious apps that leak sensitive data.
  – E.g., leak contacts list to marketing company.
  – "All or nothing" permission model.
• Apps can collude to leak data.
  – Evades precise detection if only analyzed individually.
• We build upon FlowDroid.
  – FlowDroid alone handles only intra-component flows.
  – We extend it to handle inter-app flows.

Introduction: Android
• Android apps have four types of components:
  – Activities (our focus)
  – Services
  – Content providers
  – Broadcast receivers
• Intents are messages to components.
  – Explicit or implicit designation of recipient
• Components declare intent filters to receive implicit intents.
• Matched based on properties of intents, e.g.:
  – Action string (e.g., "android.intent.action.VIEW")
  – Data MIME type (e.g., "image/png")

Introduction
• Taint Analysis tracks the flow of sensitive data.
  – Can be static analysis or dynamic analysis.
  – Our analysis is static.
• We build upon existing Android static analyses:
  – FlowDroid [1]: finds intra-component information flow
  – Epicc [2]: identifies intent specifications

[1] S. Arzt et al., "FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps". PLDI, 2014.
[2] D. Octeau et al., "Effective inter-component communication mapping in Android with Epicc: An essential step towards holistic security analysis". USENIX Security, 2013.

Our Contribution
• We developed a static analyzer called "DidFail" ("Droid Intent Data Flow Analysis for Information Leakage").
  – Finds flows of sensitive data across app boundaries.
  – Source code and binaries available at http://www.cert.org/secure-coding/tools/didfail.cfm (or google "DidFail SOAP")
• Two-phase analysis:
  1. Analyze each app in isolation.
  2. Use the result of Phase-1 analysis to determine inter-app flows.
• We tested our analyzer on two sets of apps.

Terminology
Definition. A source is an external resource (external to the app, not necessarily external to the phone) from which data is read.
Definition. A sink is an external resource to which data is written.
For example,
• Sources: Device ID, contacts, photos, current location, etc.
• Sinks: Internet, outbound text messages, file system, etc.

Motivating Example
• App SendSMS.apk sends an intent (a message) to Echoer.apk, which sends a result back.
[Diagram: Device ID (Source) → SendSMS.apk; SendSMS.apk calls startActivityForResult() to send the intent to Echoer.apk, which reads it with getIntent() and answers with setResult(); SendSMS.apk receives the result in onActivityResult() and passes it to Text Message (Sink)]
• SendSMS.apk tries to launder the taint through Echoer.apk.
• Existing static analysis tools cannot precisely detect such inter-app data flows.

Analysis Design
• Phase 1: Each app analyzed once, in isolation.
  – FlowDroid: Finds tainted dataflow from sources to sinks.
    • Received intents are considered sources.
    • Sent intents are considered sinks.
  – Epicc: Determines properties of intents.
  – Each intent-sending call site is labelled with a unique intent ID.
• Phase 2: Analyze a set of apps:
  – For each intent sent by a component, determine which components can receive the intent.
  – Generate & solve taint flow equations.

Running Example
[Diagram: components C1, C2, and C3; src1 and sink1 attach to C1, src3 and sink3 attach to C3; intents I1 and I3 connect C1 and C3 to C2]
• Three components: C1, C2, C3.
  – C1 = SendSMS
  – C2 = Echoer
  – C3 is similar to C1
• sink1 is tainted with only src1.
• sink3 is tainted with only src3.

Running Example
[Diagram: components C1, C2, and C3 with src1, sink1, src3, sink3 and intents I1 and I3]
Notation: [shown as formulas on the slide]

Running Example
[Diagram: components C1, C2, and C3 with src1, sink1, src3, sink3 and intents I1 and I3]
Notation: [shown as formulas on the slide]
Final Sink Taints:
• T(sink1) = {src1}
• T(sink3) = {src3}

Phase-1 Flow Equations
Analyze each component separately.
Phase 1 Flow Equations: [shown as formulas on the slide, next to the running-example diagram]
Notation
• An asterisk ("*") indicates an unknown component.

Phase-2 Flow Equations
Instantiate Phase-1 equations for all possible sender/receiver pairs.
Phase 1 Flow Equations: [shown as formulas on the slide]
Phase 2 Flow Equations: [shown as formulas on the slide, next to the running-example diagram]

Phase-2 Taint Equations
For each flow equation "src → sink", generate taint equation "T(src) ⊆ T(sink)".
If s is a non-intent source, then T(s) = {s}.
Phase 2 Flow Equations: [shown as formulas on the slide]
Phase 2 Taint Equations: [shown as formulas on the slide, next to the running-example diagram]
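The two rules above (T(src) ⊆ T(sink) for every flow, and T(s) = {s} for non-intent sources) can be solved by simple fixed-point iteration. The sketch below is an assumption-laden illustration rather than DidFail's code: the class TaintSolver is hypothetical, and the six flow edges are stand-ins written by hand for the running example (with "R(I1)" denoting the result of intent I1), since the slide's own equations are not reproduced here.

```java
import java.util.*;

class TaintSolver {
    // Least solution of: T(s) = {s} for non-intent sources, T(from) ⊆ T(to) for each flow.
    static Map<String, Set<String>> solve(List<String[]> flows, Set<String> nonIntentSources) {
        Map<String, Set<String>> taint = new TreeMap<>();
        for (String s : nonIntentSources) {
            taint.computeIfAbsent(s, k -> new TreeSet<>()).add(s);
        }
        boolean changed = true;
        while (changed) {                      // iterate until no set grows any further
            changed = false;
            for (String[] flow : flows) {
                Set<String> from = taint.getOrDefault(flow[0], Collections.emptySet());
                Set<String> to = taint.computeIfAbsent(flow[1], k -> new TreeSet<>());
                if (to.addAll(from)) {
                    changed = true;
                }
            }
        }
        return taint;
    }

    // Hypothetical phase-2 flows for the running example (C1 = SendSMS, C2 = Echoer, C3 like C1).
    static Map<String, Set<String>> runningExample() {
        List<String[]> flows = List.of(
            new String[] {"src1", "I1"},        // C1 puts src1 into intent I1
            new String[] {"I1", "R(I1)"},       // C2 echoes I1 back as its result
            new String[] {"R(I1)", "sink1"},    // C1 writes the result to sink1
            new String[] {"src3", "I3"},
            new String[] {"I3", "R(I3)"},
            new String[] {"R(I3)", "sink3"});
        return solve(flows, Set.of("src1", "src3"));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> t = runningExample();
        System.out.println("T(sink1) = " + t.get("sink1"));
        System.out.println("T(sink3) = " + t.get("sink3"));
    }
}
```

Because intent results are tracked per intent in these assumed edges, sink1 picks up only src1 and sink3 only src3, which agrees with the final sink taints stated for the running example.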

Phase 1
[Diagram: Original APK → TransformAPK → Epicc and FlowDroid (modified); Extract manifest]

Implementation: Phase 1
• APK Transformer
  – Assigns unique Intent ID to each call site of intent-sending methods.
  – Enables matching intents from the output of FlowDroid and Epicc
  – Uses Soot to read APK, modify code (in Jimple), and write new APK.
  – Problem: Epicc is closed-source. How to make it emit Intent IDs?
  – Solution (hack): Add putExtra call with Intent ID.

Implementation: Phase 1
• FlowDroid Modifications:
  – Extract intent IDs inserted by APK Transformer, and include in output.
  – When sink is an intent, identify the sending component.
    • In base.startActivity, assume base is the sending component. (Soundness?)
  – For deterministic output: Sort the final list of flows.

Implementation: Phase 2
• Take the Phase 1 output.
• Generate and solve the data-flow equations.
• Output:
  1. Directed graph indicating information flow between sources, intent results, and sinks.
  2. Taintedness of each sink.

Testing DidFail analyzer: App Set 1
• SendSMS.apk
  – Reads device ID, passes through Echoer, and leaks it via SMS
• Echoer.apk
  – Echoes the data received via an intent
• WriteFile.apk
  – Reads physical location (from GPS), passes through Echoer, and writes it to a file

Testing DidFail analyzer: App Set 2 (DroidBench)
[Figure: taint flow graph, generated using GraphViz]
• Int3 = I(IntentSink2.apk, IntentSource1.apk, id3)
• Int4 = I(IntentSource1.apk, IntentSink1.apk, id4)
• Res8 = R(Int4)
• Src15 = getDeviceId
• Snk13 = Log.i
Some taint flows: [shown as formulas on the slide]

Limitations
• Unsoundness
  – Inherited from FlowDroid/Epicc
    • Native code, reflection, etc.
  – Shared static fields
  – Implicit flows
  – Currently, only activity intents
  – Bugs
• Imprecision
  – Inherited from FlowDroid/Epicc
  – DidFail doesn't consider permissions when matching intents
  – All intents received by a component are conflated together as a single source

Use of Two-Phase Approach in App Stores
• We envision that the two-phase analysis can be used as follows:
  – An app store runs the phase-1 analysis for each app it has.
  – When the user wants to download a new app, the store runs the phase-2 analysis and indicates new flows.
  – Fast response to user.

DidFail vs IccTA
• IccTA was developed (at roughly the same time as DidFail) by:
  – Li Li, Alexandre Bartel, Jacques Klein, Yves Le Traon (Luxembourg);
  – Steven Arzt, Siegfried Rasthofer, Eric Bodden (EC SPRIDE);
  – Damien Octeau, Patrick McDaniel (Penn State).
• IccTA uses a one-phase analysis
  – IccTA is more precise than DidFail's two-phase analysis.
  – Two-phase DidFail analysis allows fast 2nd-phase computation.
• Future collaboration between IccTA and DidFail teams?

Conclusion
• We introduced a new analysis that integrates and enhances existing Android app static analyses.
• Demonstrated feasibility by implementing a prototype and testing it.
• Two-phase analysis can be used by app store to provide fast response.
• Future work:
  – Implicit flows
  – Static fields
  – Distinguish different received intents
  – Other data channels (file system, non-activity intents)
  – Etc.

Concurrent Modification
    class Make {
        private Worklist worklist;

        public static void main(String[] args) {
            Make m = new Make();
            m.initializeWorklist(args);
            m.processWorklist();
        }

        void initializeWorklist(String[] args) {
            ...; worklist = new Worklist(); ...
        }

        void processWorklist() {
            HashSet s = worklist.unprocessedItems();
            for (Iterator i = s.iterator(); i.hasNext(); ) {
                Object item = i.next(); // CME may occur here
                if (...) processItem(item);
            }
        }

        void processItem(Object i) { ...; doSubproblem(...); }

        void doSubproblem(...) { ... worklist.addItem(newitem); ... }
    }

    public class Worklist {
        HashSet s;
        public Worklist() { s = new HashSet(); ... }
        public void addItem(Object item) { s.add(item); }
        public HashSet unprocessedItems() { return s; }
    }

An Illustrating Example
    /* 0 */ Set v = new Set();
    /* 1 */ Iterator i1 = v.iterator();
    /* 2 */ Iterator i2 = v.iterator();
    /* 3 */ Iterator i3 = i1;
    /* 4 */ i1.next();
    // The following update via i1 invalidates the
    // iterator referred to by i2.
    /* 5 */ i1.remove();
    /* 6 */ if (...) { i2.next(); /* CME thrown */ }
    // i3 refers to the same, valid, iterator as i1
    /* 7 */ if (...) { i3.next(); /* CME not thrown */ }
    // The following invalidates all iterators over v
    /* 8 */ v.add("...");
    /* 9 */ if (...) { i1.next(); /* CME thrown */ }
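The slide's example is schematic (Set is an interface and the conditions are elided). A runnable variant, assuming a java.util.HashSet with a few string elements and with the if-conditions removed, might look like the sketch below; the class CmeDemo and the helper throwsCme are names invented for this illustration.

```java
import java.util.*;

class CmeDemo {
    // True if calling next() on the iterator throws ConcurrentModificationException.
    static boolean throwsCme(Iterator<String> it) {
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Replays lines 0 to 9 of the example; returns the outcomes of lines 6, 7, and 9.
    static List<Boolean> run() {
        Set<String> v = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        Iterator<String> i1 = v.iterator();
        Iterator<String> i2 = v.iterator();
        Iterator<String> i3 = i1;            // alias of i1
        i1.next();
        i1.remove();                         // update via i1 invalidates i2
        boolean line6 = throwsCme(i2);
        boolean line7 = throwsCme(i3);       // i3 is the same, still valid, iterator as i1
        v.add("e");                          // invalidates all iterators over v
        boolean line9 = throwsCme(i1);
        return Arrays.asList(line6, line7, line9);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The fail-fast check is performed at run time by the collection; the static-analysis question is whether a call to next() on an invalidated iterator can ever be reached.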

Java Project 3
• Read the PLDI'02 paper "Deriving Specialized Program Analyses for Certifying Component-Client Conformance"
• Design a simple abstract domain for CME
• Implement in Soot

SOOT
By Joe Palmer
Information taken from http://www.sable.mcgill.ca/soot/tutorial/pldi03/tutorial.pdf

General Overview
• Developed by Sable Research Group out of McGill University in 1996-1997
• Used to optimize Java Bytecode
• 4 source languages
• 4 intermediate representations used

Source Languages
• Primarily takes Java Source as its input
• Can also take:
  – SML
  – Scheme
  – Eiffel
  – Scala

I.R.'s
• Baf:
  – Streamlined, stack-based representation of bytecode
  – Abstracts type dependent variations of expressions into a single expression
• Jimple (main IR used!!):
  – Stack-less, typed, 3-Address representation of bytecode
  – Mix between Java source and Java bytecode
  – Linearization of a single expression into 3 separate statements
    • Only refers to 3 local vars or consts at once
  – Only 15 Jimple instructions are used
    • Compared to 200 possible instructions in Java bytecode!
• Shimple:
  – SSA-form version of Jimple
  – Each local var has a single static point of definition (never reassigned)
  – Uses Phi-Nodes for control flow
• Grimp:
  – Similar to Jimple but allows trees of expressions together with a representation of a "new" operator
  – Expressions are "aggregated"
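To make the idea of a 3-address representation concrete, the sketch below linearizes a nested expression into statements that each mention at most three names. It is a generic illustration and does not use Soot or reproduce Jimple's actual syntax; the Expr class, the linearize method, and the "$t" temporary naming are assumptions of this example.

```java
import java.util.*;

class ThreeAddress {
    // Expression tree: a leaf has a name, an inner node has an operator and two children.
    static final class Expr {
        final String name;
        final char op;
        final Expr left;
        final Expr right;

        Expr(String name) {
            this.name = name;
            this.op = 0;
            this.left = null;
            this.right = null;
        }

        Expr(Expr left, char op, Expr right) {
            this.name = null;
            this.op = op;
            this.left = left;
            this.right = right;
        }
    }

    // Emits one statement per operator and returns the name that holds the result.
    static String linearize(Expr e, List<String> out) {
        if (e.name != null) {
            return e.name;
        }
        String l = linearize(e.left, out);
        String r = linearize(e.right, out);
        String tmp = "$t" + out.size();       // fresh temporary for this operator
        out.add(tmp + " = " + l + " " + e.op + " " + r);
        return tmp;
    }

    public static void main(String[] args) {
        // (a + b) * c
        Expr e = new Expr(new Expr(new Expr("a"), '+', new Expr("b")), '*', new Expr("c"));
        List<String> code = new ArrayList<>();
        linearize(e, code);
        code.forEach(System.out::println);
    }
}
```

Grimp goes in the opposite direction: it re-aggregates such temporaries back into expression trees.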

Phases of the Optimization