Object Naming Analysis for Reverse-Engineered Sequence Diagrams

Atanas (Nasko) Rountev, Beth Harkness Connell
Ohio State University

Example of a UML Sequence Diagram
[Sequence diagram: objects start:X, p:A, n:A; messages m1(), m2(), m3(), create(), m4()]
Nasko Rountev - ICSE'05

UML Sequence Diagrams
- Popular UML artifacts for modeling of object interactions
- Design-time sequence diagrams
- Reverse-engineered sequence diagrams
  - Based on existing code
  - Iterative development; design recovery for software maintenance; software testing
  - Implemented in some commercial UML tools
    - Together ControlCenter (Borland)
    - EclipseUML (Omondo)

Reverse-Engineering Analyses
- Dynamic analysis: tracks a set of representative run-time executions
  - Several research tools
- Static analysis: examines only the code
  - Commercial tools (deficiencies)
  - Some research work (not comprehensive)
- RED tool for Java: PRESTO group at OSU
  - URL: presto.cse.ohio-state.edu/red
  - Call chain analysis; control-flow analysis; object naming analysis; visualization and navigation; test coverage measurements

Object Naming

    class X {
      void m1(A p) {
        A a = p.m2(this);
        a.m4();
      }
      void m3() { … }
    }
    class A {
      A m2(X q) {
        q.m3();
        return new A();
      }
      void m4() { … }
    }

[Sequence diagram: objects start:X, p:A, q:X, a:A, n:A; messages m1, m2, m3, create, m4]

Object Naming Schemes
- Based on variable names
  - Same run-time object could be represented by several diagram objects
  - Different run-time objects could be represented by the same diagram object
  - Handling of instance fields
- Based on points-to analysis
  - Does not work either
- Based on a new object naming analysis
  - "Inspired" by constant propagation analysis
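A runnable variant of the example classes X and A makes the first problem with variable-based naming concrete: the formal q and the start object are the same run-time object, and the local a is the object created by new A(), yet a name-per-variable scheme would draw them as separate diagram objects. The static fields log, seenAsQ, seenAsA and createdInM2 are additions for illustration only and do not appear in the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: the slide's classes X and A, instrumented with
// hypothetical static fields so that object identity can be observed.
final class NamingDemo {
    static final List<String> log = new ArrayList<>();
    static X seenAsQ;      // object bound to formal q inside m2
    static A seenAsA;      // object bound to local a inside m1
    static A createdInM2;  // object created by `new A()` inside m2

    static class X {
        void m1(A p) {
            A a = p.m2(this);
            seenAsA = a;
            a.m4();
        }
        void m3() { log.add("m3"); }
    }

    static class A {
        A m2(X q) {
            seenAsQ = q;
            q.m3();
            createdInM2 = new A();
            return createdInM2;
        }
        void m4() { log.add("m4"); }
    }

    public static void main(String[] args) {
        X start = new X();
        start.m1(new A());
        System.out.println(seenAsQ == start);        // true: q and start are one object
        System.out.println(seenAsA == createdInM2);  // true: a is the new A()
        System.out.println(log);                     // [m3, m4]
    }
}
```

A naming scheme that draws q:X and a:A as their own lifelines would therefore duplicate start:X and n:A.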

Flow of Seed Values

    class X {
      void m1(A p) {
        A a = p.m2(this);
        a.m4();
      }
      void m3() { … }
    }
    class A {
      A m2(X q) {
        q.m3();
        return new A();
      }
      void m4() { … }
    }

[Sequence diagram: objects start:X, p:A, n:A; messages m1, m2, m3, create, m4]

Singleton Call Sites
- Singleton call site
  - Only one possible run-time receiver object
  - Receiver comes from a specific seed value:
    - Formal of the start method (incl. this)
    - A new X() expression that is provably executed at most once
- How about non-singleton call sites?

      A a = p.m2(this);
      if (…) a = new A();
      a.m4();

Naming Analysis for Singleton Call Sites
- Goal: static analysis that identifies singleton call sites and their seed values
- Version 1 of the analysis
  - Interprocedural dataflow analysis, similar to interprocedural copy constant propagation
  - CFGs, dataflow lattice, dataflow functions
  - Three-phase algorithm; flow- and context-sensitive; an IDE analysis; MVP-precise
- Version 2 of the analysis
  - Various enhancements

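Lattice elements: Lthis, Lp, Ln1, Ln2,
[Figure: control-flow graphs of m1 and m2 annotated with dataflow facts. Nodes for m1: start_m1, p.m2(this), a=m2_ret_val, if(…), a=new A()1, a.m4(), m4_ret, end_m1. Nodes for m2: start_m2, q.m3(), m3_ret, m2_ret_val=new A()2, end_m2. Annotations pair this and q with Lthis, m2_ret_val and a with Ln2, and a with Ln1 after a=new A()1.]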
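The merge behavior behind the lattice can be sketched as a flat lattice in the style of copy constant propagation. This is a minimal sketch, not the authors' implementation: the class SeedValue, the TOP and BOTTOM elements, and the methods meet and isSingleton are all assumptions made for illustration. Only the seed names Ln1 and Ln2 come from the slide.

```java
import java.util.Objects;

// Hypothetical flat lattice: TOP (no information yet), one element per
// seed value, and BOTTOM (more than one possible seed). Names are assumed.
final class SeedValue {
    enum Kind { TOP, SEED, BOTTOM }

    static final SeedValue TOP = new SeedValue(Kind.TOP, null);
    static final SeedValue BOTTOM = new SeedValue(Kind.BOTTOM, null);

    final Kind kind;
    final String seed; // non-null only when kind == SEED

    private SeedValue(Kind kind, String seed) {
        this.kind = kind;
        this.seed = seed;
    }

    static SeedValue seed(String name) {
        return new SeedValue(Kind.SEED, Objects.requireNonNull(name));
    }

    // Meet applied where control-flow paths merge.
    SeedValue meet(SeedValue other) {
        if (kind == Kind.TOP) return other;
        if (other.kind == Kind.TOP) return this;
        if (kind == Kind.SEED && other.kind == Kind.SEED && seed.equals(other.seed)) {
            return this;
        }
        return BOTTOM; // two different seeds, or BOTTOM on either side
    }

    // A call site is singleton when its receiver maps to exactly one seed.
    boolean isSingleton() { return kind == Kind.SEED; }

    @Override public String toString() {
        return kind == Kind.SEED ? seed : kind.name();
    }

    public static void main(String[] args) {
        SeedValue fromCall = seed("Ln2");   // a = p.m2(this)
        SeedValue fromBranch = seed("Ln1"); // if (…) a = new A()
        System.out.println(fromCall.meet(fromCall));   // Ln2
        System.out.println(fromCall.meet(fromBranch)); // BOTTOM
    }
}
```

Under these assumptions, the receiver of a.m4() keeps the single seed Ln2 when no branch reassigns a, and drops to BOTTOM when the if(…) branch may assign a second object, which matches the non-singleton example.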

Handling of Fields
- Version 1: conservative treatment
  - a=b.f or a=C.sf results in value for a
- Version 2: more precise
  - Static field C.sf that is not modified by any method reachable from the start method
    - Treated as a seed value
  - Instance field f: not modified by any method reachable from the start method
    - a=b.f, b.f and the algorithm computes seed value x for b: introduce new seed value x.f
    - Iterative: e.g. could have x.f1.f2.f3.f4
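The Version 2 rule for instance fields can be sketched as a small transfer function for a=b.f. This is a hedged illustration only: the class FieldSeeds, the method load, the string encoding of seeds, and the precomputed set of unmodified fields are assumptions, not details taken from the slides.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: derive the seed for `a` after `a = b.f`.
final class FieldSeeds {
    // seedOfBase: seed computed for b, or empty if b has no single seed.
    // unmodified: fields not written by any method reachable from the
    //             start method (assumed to be computed elsewhere).
    static Optional<String> load(Optional<String> seedOfBase,
                                 String field,
                                 Set<String> unmodified) {
        if (!unmodified.contains(field)) {
            return Optional.empty(); // conservative: no seed for a
        }
        return seedOfBase.map(x -> x + "." + field); // new seed value x.f
    }

    public static void main(String[] args) {
        Set<String> unmodified = new HashSet<>(Arrays.asList("f1", "f2"));
        Optional<String> x = Optional.of("x");

        // Iterative application yields chained seeds such as x.f1.f2.
        Optional<String> viaF1 = load(x, "f1", unmodified);
        Optional<String> viaF2 = load(viaF1, "f2", unmodified);
        System.out.println(viaF2.orElse("no seed")); // x.f1.f2

        // f3 may be modified, so the chain stops here.
        System.out.println(load(viaF2, "f3", unmodified).orElse("no seed")); // no seed
    }
}
```

Applying load repeatedly is what produces chained seed values of the form x.f1.f2.f3.f4.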

Experiments
- 21 subject components from Java libraries and applications
  - For each component, the analysis was executed multiple times, once for each (non-trivial) potential start method
- Implementation
  - Uses Soot (Sable group, McGill)
  - Uses several optimization techniques
- Experiments: Sun Fire 280-R, 900 MHz

[Figure: Number of Start Methods]
[Figure: Analysis Running Time [sec]]
![Analysis Running Time [sec]](http://slidetodoc.com/presentation_image_h2/928f0b41c394dc58c40b8f8eab26ffa7/image-14.jpg)

[Figure: % Singleton Call Sites]

Singleton Call Sites
- Considered call chain depth up to 5
- For 18 of the 21 components: > 75% of the call sites were singleton sites
- For 7 of the 21 components: > 90%
- Examined component bigdecimal (55%)
  - 143 call sites that could not be resolved
  - 125 of the 143 were legitimately non-singleton: multiple possible run-time objects
- Conclusion: typically, the majority of call sites can be represented precisely in the reverse-engineered sequence diagrams

Conclusions and Future Work
- Low-cost, high-precision analysis
  - Relatively simple to implement
  - There is some room for improvement
- For non-singleton call sites: need careful investigation of trade-offs between different approaches
  - Some preliminary work under way
- Re-implement in Eclipse and make public, together with the other analyses in RED

Questions?