OORPT CHAPTER OORPT Object-Oriented Reengineering Patterns and Techniques
OORPT — CHAPTER OORPT Object-Oriented Reengineering Patterns and Techniques X. CHAPTER Prof. O. Nierstrasz © Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz
Roadmap
Program Comprehension & Software Evolution: [Lightweight] Principles and [real] Practice
Michele Lanza, Faculty of Informatics, University of Lugano, Switzerland
Prologue: Once upon a time…
> Reverse engineer 1’200’000 lines of C++ code in ca. 2300 classes
> Reading every line, at 2 seconds per line:
  - 1’200’000 lines * 2 = 2’400’000 seconds
  - 2’400’000 seconds / 3600 = 667 hours
  - 667 hours / 8 = 83 working days
  - 83 days / 5 = 16 working weeks and 3 days, i.e. ~ 4 months
> Questions:
  - What is the size and the overall structure of the system?
  - What is the internal structure of the system and its elements?
  - How did the software system become like that?
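The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch: the 2 seconds per line is read off the slide's "* 2", and the function and constant names are made up for illustration.

```python
# Rough effort estimate for reading a code base line by line.
# All names are hypothetical; the rates come from the slide's arithmetic.
SECONDS_PER_LINE = 2   # assumption: the "* 2" on the slide
HOURS_PER_DAY = 8      # one working day
DAYS_PER_WEEK = 5      # one working week


def reading_effort(lines, seconds_per_line=SECONDS_PER_LINE):
    """Return the reading effort broken down as on the slide."""
    seconds = lines * seconds_per_line
    hours = round(seconds / 3600)           # 666.67 rounds to 667
    days = hours // HOURS_PER_DAY           # whole working days
    weeks, extra_days = divmod(days, DAYS_PER_WEEK)
    return {
        "seconds": seconds,
        "hours": hours,
        "days": days,
        "weeks": weeks,
        "extra_days": extra_days,
    }


if __name__ == "__main__":
    print(reading_effort(1_200_000))
```

Changing `seconds_per_line` shows how sensitive the "~ 4 months" figure is to the assumed reading speed.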
The Life Cycle of Software Systems
[Figure: Requirements, Analysis, Design, Implementation, and "?" along a Time axis]
Issues
• Tool support
• Scalability
• Flexibility
Object-Oriented Reverse Engineering
> Goal: take a (large legacy) software system and “understand” it, i.e., construct a mental model of the system
> Problem: the software system in question is
  - Unknown, very large, and complex
  - Domain- and language-specific
  - Seldom documented or commented
  - “In bad shape”
Object-Oriented Reverse Engineering (II)
> Constructing a mental model requires information about the system:
  - Top-down approaches
  - Bottom-up approaches
  - Mixed approaches
> There is no “silver bullet” methodology
> Every reverse engineering situation is unique
> Need for flexibility, customizability, scalability, and simplicity
Reverse Engineering Approaches
> Reading (source code, documentation, UML diagrams, comments)
> Running the software and analyzing its execution trace
> Interviewing users and developers (if available)
> Clustering
> Concept Analysis
> Software Visualization
> Software Metrics
> Slicing and Dicing
> Querying (Database)
> Data Mining
> Logic Reasoning
> …
The “Information Crystallization” Problem
> Many approaches generate too much or not enough information
> The reverse engineer must make sense of this information unaided
> We need the right information at the right time
… take a step back … block the ground … think about it …
> The information needed to reverse engineer a legacy software system resides at various levels
> We need to obtain and combine
  - Coarse-grained information about the whole system
  - Fine-grained information about specific parts
  - Evolutionary information about the past of the system
Contents
> Polymetric Views
> Software Visualization vs. Reverse Engineering
  - Coarse-grained
  - Fine-grained
  - Evolutionary
  - Dynamic Information
> Discussion
> Demos
A Solution - The Polymetric View
> A lightweight combination of two approaches:
  - Software visualization (reduction of complexity, intuitive)
  - Software metrics (scalability, assessment)
> Interactivity (iterative process, silver bullet impossible)
> Does not replace other techniques, it complements them:
  - “Opportunistic code reading”
The Polymetric View - Principles
> Visualize software:
  - entities as rectangles
  - relationships as edges
> Enrich these visualizations:
  - Map up to 5 software metrics on a 2D figure: width metric, height metric, color metric, and 2 position metrics
  - Map other kinds of semantic information on nominal colors
[Figure: entities and relationships annotated with the width, height, color, and position metrics]
The Polymetric View - Example: System Complexity View
Nodes = Classes
Edges = Inheritance Relationships
Width = Number of Attributes
Height = Number of Methods
Color = Number of Lines of Code
The Polymetric View - Example (II): System Complexity View
Nodes = Classes
Edges = Inheritance Relationships
Width = # attributes
Height = # methods
Color = # lines of code
Reverse engineering goals
• Get an impression (build a first raw mental model) of the system, know the size, structure, and complexity of the system in terms of classes and inheritance hierarchies
• Locate important (domain model) hierarchies, see if there are any deep, nested hierarchies
• Locate large classes (standalone, within inheritance hierarchy), locate stateful classes and classes with behaviour
View-supported tasks
• Count the classes, look at the displayed nodes, count the hierarchies
• Search for node hierarchies, look at the size and shape of hierarchies, examine the structure of hierarchies
• Search big nodes, note their position, look for tall nodes, look for wide nodes, look for dark nodes, compare their size and shape, “read” their name => opportunistic code reading
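To make the metric-to-shape mapping of the System Complexity View concrete, here is a minimal Python sketch. It is not CodeCrawler's implementation. The type names, the `min_size` padding, and the linear gray scale (more lines of code = darker) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassMetrics:
    """Hypothetical input record: one class with three metrics."""
    name: str
    attributes: int  # number of attributes -> node width
    methods: int     # number of methods    -> node height
    lines: int       # lines of code        -> node color


@dataclass(frozen=True)
class Node:
    """Hypothetical drawable rectangle."""
    name: str
    width: int
    height: int
    gray: int  # 0 = black (most lines of code), 255 = white


def system_complexity_nodes(classes, min_size=4):
    """Map each class to a rectangle; min_size keeps empty classes visible."""
    max_lines = max((c.lines for c in classes), default=0)
    nodes = []
    for c in classes:
        share = c.lines / max_lines if max_lines else 0.0
        nodes.append(
            Node(
                name=c.name,
                width=min_size + c.attributes,
                height=min_size + c.methods,
                gray=round(255 * (1 - share)),  # assumed linear scale
            )
        )
    return nodes
```

The two position metrics are not used here because the view places nodes with a tree layout along the inheritance edges.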
The Polymetric View - Description: System Complexity View
> Every polymetric view is described according to a common pattern
> Every view targets specific reverse engineering goals
> The polymetric views are implemented in CodeCrawler
[Figure: description template with the fields Structural Specification, Target, Scope, Metrics, Layout, Description, Goals, Symptoms, Scenario, Case Study]
Coarse-grained Software Visualization
> Reverse engineering question:
  - What is the size and the overall structure of the system?
> Coarse-grained reverse engineering goals:
  - Gain an overview in terms of size, complexity, and structure
  - Assess the overall quality of the system
  - Locate and understand important (domain model) hierarchies
  - Identify large classes, exceptional methods, dead code, etc.
  - …
Coarse-grained Polymetric Views - Example: Method Efficiency Correlation View
[Figure: scatter plot of methods, LOC on the x axis, NOS on the y axis]
Nodes: Methods
Edges: (none)
Size: Number of method parameters
Position X: Number of lines of code
Position Y: Number of statements
Goals:
• Detect overly long methods
• Detect “dead” code
• Detect badly formatted methods
• Get an impression of the system in terms of coding style
• Know the size of the system in # methods
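The goals of the Method Efficiency Correlation View can be approximated without a picture by comparing lines of code (LOC) with number of statements (NOS) per method. The Python sketch below is an illustration under stated assumptions, not the view's definition. The thresholds `long_loc` and `max_ratio` are invented, and treating "lines but no statements" as possibly dead code is only one interpretation of the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodMetrics:
    """Hypothetical record for one method."""
    name: str
    loc: int  # number of lines of code (x position in the view)
    nos: int  # number of statements   (y position in the view)


def flag_method(m, long_loc=50, max_ratio=3.0):
    """Return the symptoms a method shows; thresholds are assumptions."""
    flags = []
    if m.loc > long_loc:
        flags.append("overly long")
    if m.nos == 0 and m.loc > 0:
        # Lines without statements, e.g. a commented-out body.
        flags.append("possibly dead")
    elif m.nos > 0:
        ratio = m.loc / m.nos
        # Far from one statement per line in either direction.
        if ratio > max_ratio or ratio < 1 / max_ratio:
            flags.append("badly formatted")
    return flags
```

In the view itself these cases show up as nodes far to the right, nodes lying on the x axis, and nodes far from the diagonal.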
CodeCrawler Demo
Clustering the Polymetric Views
> First Contact
  - System Hotspots
  - System Complexity
  - Root Class Detection
  - Implementation Weight Distribution
> Candidate Detection
  - Data Storage Class Detection
  - Method Efficiency Correlation
  - Direct Attribute Access View
  - Method Length Distribution
> Inheritance Assessment
  - Inheritance Classification
  - Inheritance Carrier
  - Intermediate Abstract
> Class Internal
  - The Class Blueprint
Coarse-grained SV - Conclusions
> Benefits
  - Views are customizable (context…) and easily modifiable
  - Simple approach, yet powerful
  - Scalability
> Limits
  - Visual language must be learned
Fine-grained Software Visualization
> Reverse engineering question:
  - What is the internal structure of the system and its elements?
> Fine-grained reverse engineering goals:
  - Understand the internal implementation of classes and class hierarchies
  - Detect coding patterns and inconsistencies
  - Understand class/subclass roles
  - Identify key methods in a class
  - …
The Class Blueprint - Principles
• The class is divided into 5 layers: Initialization, External Interface, Internal Implementation, Accessor, Attribute
• Nodes
  • Methods, Attributes, Classes
• Edges
  • Invocation, Access, Inheritance
• The method nodes are positioned according to
  • Layer
  • Invocation sequence
The Class Blueprint - Principles (II)
[Figure: legend of the class blueprint]
Method node metrics: # invocations, # lines
Attribute node metrics: # external accesses, # internal accesses
Node kinds: Abstract Method, Constant Method, Overriding Method, Delegating Method, Extending Method, Read Accessor, Write Accessor, Attribute
Edge kinds: Method Invocation, Direct Attribute Access
The Class Blueprint - Example
> Delegate:
  - Delegates functionality to other classes
  - May act as a “Façade” (DP)
> Large Implementation:
  - Deep invocation structure
  - Several methods
  - High decomposition
> Wide Interface
> Direct Access
> Sharing Entries
The Class Blueprint - A Pattern Language?
> The patterns reveal information about
  - Coding style
  - Coding policies
  - Particularities
> We grouped them according to
  - Size
  - Layer distribution
  - Semantics
  - Call-flow
  - State usage
> Moreover…
  - Inheritance Context
  - Frequent pattern combinations
  - Rare pattern combinations
> They are all part of a pattern language
The Class Blueprint - Example (II)
> Call-flow
  - Double Single Entry (=> split class?)
> Inheritance
  - Adder
  - Interface overriders
> Semantics
  - Direct Access
> State Usage
  - Sharing Entries
The Class Blueprint - What do we see?
CodeCrawler Demo
Fine-grained SV - Conclusions
> Benefits
  - Complexity reduction
  - Visual code inspection technique
  - Complements the coarse-grained views
> Limits
  - Visual language must be learned
  - Good object-oriented knowledge required
  - No information about actual functionality => opportunistic code reading necessary
Evolutionary Software Visualization
> Reverse engineering question:
  - How did the software system become like that?
> Evolutionary reverse engineering goals:
  - Understand the evolution of OO systems in terms of size and growth rate
  - Understand at which time an element, e.g., a class, has been added or removed from the system
  - Understand the evolution of single classes
  - Detect patterns in the evolution of classes
  - …
The Evolution Matrix - Principles
[Figure: one column per version, from First Version through Version 2 … Version (n - 1) to Last Version, along a Time (Versions) axis; annotations mark Removed Classes, Added Classes, a Growth Phase, and a Stagnation Phase]
The Evolution Matrix - Principles (II)
> The Evolution Matrix reveals patterns
  - The evolution of the whole system (versions, growth and stagnation phases, growth rate, initial and final size)
  - The life-time of classes (addition, removal)
> Moreover, we enrich the evolution matrix view with metric information (class nodes sized by # methods and # attributes)
> This allows us to see patterns in the evolution of classes
The Evolution Matrix - Pattern Language
Pulsar
• Repeated modifications make it grow and shrink.
• System Hotspot: nearly every new system version requires changes.
• No “cheap class”
Supernova
• Suddenly increases in size. Possible reasons:
  • Massive shift of functionality towards a class.
  • Data storage class
  • Developers knew what to fill in.
The Evolution Matrix - Pattern Language (II)
White Dwarf
• Lost the functionality it had and now trundles along without real meaning.
• Possibly dead code.
Red Giant
• A permanent god class which is always very large.
Idle
• Keeps size over several versions.
• Possibly dead code, possibly good code.
The Evolution Matrix - Pattern Language (III)
Dayfly
• Exists during only one or two versions.
• Perhaps an idea which was tried out and then dropped.
Persistent
• Has the same lifespan as the whole system.
• Part of the original design.
• Perhaps holy dead code which no one dares to remove.
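The evolution patterns are described visually, but a rough textual approximation can be computed from a class's size history. The Python sketch below is only that: the function name, the size measure, and every threshold (`big`, `small`, `jump`, `idle_run`) are assumptions, and it treats a class's lifetime as contiguous. `None` marks a version in which the class does not exist.

```python
def classify_history(sizes, big=50, small=10, jump=3.0, idle_run=3):
    """Label one class's size history with evolution-matrix patterns.

    sizes: one entry per system version (e.g. number of methods),
    None where the class is absent. All thresholds are hypothetical.
    """
    present = [s for s in sizes if s is not None]
    labels = set()
    if not present:
        return labels

    if len(present) <= 2:
        labels.add("Dayfly")        # exists for one or two versions
    if len(present) == len(sizes):
        labels.add("Persistent")    # same lifespan as the system

    pairs = list(zip(present, present[1:]))
    grows = sum(b > a for a, b in pairs)
    shrinks = sum(b < a for a, b in pairs)
    if grows >= 2 and shrinks >= 2:
        labels.add("Pulsar")        # repeatedly grows and shrinks
    if any(a > 0 and b >= jump * a for a, b in pairs):
        labels.add("Supernova")     # sudden increase in size
    if min(present) >= big:
        labels.add("Red Giant")     # always very large
    if max(present) >= big and present[-1] <= small:
        labels.add("White Dwarf")   # was large, now tiny

    # Longest run of consecutive versions with unchanged size.
    run = longest = 1
    for a, b in pairs:
        run = run + 1 if a == b else 1
        longest = max(longest, run)
    if longest >= idle_run:
        labels.add("Idle")
    return labels
```

A class can carry several labels at once, for example a persistent class that ends as an idle white dwarf.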
The Evolution Matrix - Example
Evolutionary Software Visualization Demo
Evolutionary SV - Conclusions
> Benefits
  - Complexity reduction
> Limits
  - Scalability (can be solved)
  - Rename problem (can be solved)
  - Relative changes hard to see (can be solved)
Run-Time Analysis (RTA): Problems and Challenges
> RTA and Reverse Engineering: useful (in combination with static information)?
> Procedural RTA vs. Object-Oriented RTA
> OO RTA - Conceptual problems
  - Polymorphism and late-binding
  - Inheritance and incremental class definition
  - Functionality (features) spread over the system
  - Which trace to generate? How?
> Technical challenges and constraints
  - Instrumentation problem (logging, VM patching, wrapping, …)
  - Amount, density, and noise of generated information (thousands of events in a few seconds)
  - Granularity of information (object instantiations, message sends, attribute accesses, …)
  - How much can we automate?
RTA - Questions
> Can we merge the dynamic information with static information?
> Can we use a “successful” static technique like polymetric views in RTA?
  - What are the most instantiated classes?
  - Are there any singletons?
  - Which classes are object factories?
  - What is the percentage of actually used methods in classes?
  - Memory consumption? Speed bottlenecks?
  - …
Case Study and Experiment Setup
> Case Study: Moose, our reengineering environment
  - Implementation language: Smalltalk
  - Age: 6 years
  - Size: >250 classes and >3500 methods and a test suite of more than 280 unit tests (a veritable legacy system ;-)
> Setup
  - Code instrumentation using MethodWrappers
  - Trace scenario(s) given by the unit test suite
  - Wrapping down to method body level
> During trace-time we record events and increase counters
> Afterwards we map the counter values as metrics
Run-time Measurements
> NCM, the number of called methods
> NMI, the number of method invocations
> NCI, the number of created instances, that is the number of times a class has been instantiated
> NCO, the number of created objects, that is the number of ‘foreign’ objects that a class’s objects instantiated
> Condensed information leads to greater scalability
> Tradeoff with granularity and sequence of a trace
> Interval of the values can be great (logarithmic scaling useful)
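The counters above can be collected by wrapping methods and incrementing counts while a scenario runs. The case study does this in Smalltalk with MethodWrappers. The sketch below swaps in a plain Python class decorator as a stand-in, so every name in it (`Trace`, `instrument`, `log_scale`) is hypothetical. It covers NCM, NMI, and NCI only. NCO is left out because it needs to know who the caller is. Instances are counted only for classes that define their own `__init__`.

```python
import functools
import inspect
import math
from collections import Counter


class Trace:
    """Collects run-time counters while instrumented code executes."""

    def __init__(self):
        self.invocations = Counter()  # (class name, method name) -> calls
        self.instances = Counter()    # class name -> created instances

    def instrument(self, cls):
        """Class decorator: wrap every plain function defined on cls."""
        for name, attr in list(vars(cls).items()):
            if inspect.isfunction(attr):
                setattr(cls, name, self._wrap(cls, name, attr))
        return cls

    def _wrap(self, cls, name, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if name == "__init__":
                # Count only direct instances, not subclass instances
                # that reach this __init__ through super().
                if args and type(args[0]) is cls:
                    self.instances[cls.__name__] += 1
            else:
                self.invocations[(cls.__name__, name)] += 1
            return func(*args, **kwargs)
        return wrapper

    def ncm(self, cls_name):
        """Number of called methods: distinct methods invoked at least once."""
        return sum(1 for (c, _m) in self.invocations if c == cls_name)

    def nmi(self, cls_name):
        """Number of method invocations, summed over all methods."""
        return sum(n for (c, _m), n in self.invocations.items() if c == cls_name)

    def nci(self, cls_name):
        """Number of created instances."""
        return self.instances[cls_name]


def log_scale(value):
    """Compress a wide value range before mapping it to a node size."""
    return math.log10(value + 1)
```

Running a unit test suite against instrumented classes and then reading `ncm`, `nmi`, and `nci` gives the condensed values that the dynamic views map onto node width, height, and color.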
Instance Usage Overview
Nodes: Classes
Edges: Inheritance
Metric Scale: Logarithmic
Layout: Tree
Node Width: # of Created Instances
Node Height: # of Called Methods
Node Color: # of Method Invocations
Symptoms
• Small, light: unused
• Narrow, tall: few instances, but used
• Flat, pale: heavily instantiated, seldom used
• Flat, dark: heavily instantiated, functionality partially but heavily used
Classes
• A: CDIFScanner
• B: AttributeDescription (3500 instances, 350’000 calls!)
• C: FAMIX metamodel root
• G: Uninstantiated FAMIX classes (!)
• I: Smalltalk AST Visitor hierarchy
Creation Interaction View
Nodes: Classes
Edges: Instantiation
Metric Scale: Logarithmic
Layout: Embedded Spring
Node Width: # of Created Objects
Node Height: # of Created Instances
Node Color: # of Created Instances
Edge Width: # of Instantiations
Symptoms
• Unconnected: uninstantiated
• Connected, small: classes with few instances
• Flat, light: instance creators, seldom instantiated, possibly factories
• Narrow, dark: heavily instantiated, but do not create many other instances
• Wide, dark: heavily instantiated and used
Class Examples
• A: AttributeDescription
• C: VWImporter (high-level import)
• D: VWParseTreeEnumerator (low-level import)
• E: FAMIXClass, ..Method, ..Attribute, etc.
• F: FAMIXAccess, FAMIXInvocation
• G: MSEMeasurement (short-lived objects)
Dynamic Information SV - Conclusions
> Pros
  - Some new views on software systems
  - Intuitive and compact way of presenting very large amounts of information
  - Insights into implementation issues
  - Side result: assessment of test suite
> Cons
  - Loss of granularity and order
  - Suitability for optimization domain unclear
  - Probably does not really scale up for very large systems (but this depends on the viewer and his/her will to interact)
  - The current approach is intrinsically interactive (automation would be possible using advanced metrics-based techniques like detection strategies)
What about reality?
> Most IDEs have no or limited visualization support
> Not an industry “standard”, most developers still have a vi & emacs mentality
> Still poor usability
  - May be used as a “stand-alone” browsing tool, but not as part of a development methodology
  - Needs much more effort (and people) to be “sexy”
> Ongoing work must cope with the “present hypes”, such as distributed development, eXtreme programming, etc.
Epilogue: The End
> Did we succeed after all?
> Not completely, but…
  - System Hotspots View on 1’200’000 LOC of C++
  - System Complexity View on ca. 200 classes of C++
Industrial Validation - The Acid Test
> Several large, industrial case studies (NDA)
> Different implementation languages
> Severe time constraints

System    Language    Lines of Code    Classes
Z         C++         1’200’000        ~2300
Y         C++/Java    120’000          ~400
X         Smalltalk   600’000          ~2500
W         COBOL       40’000           -
Sortie    C/C++       28’000           ~70
Duploc    Smalltalk   32’000           ~230
Jun       Smalltalk   135’000          ~700
ArgoUML   Java        220’000          ~1400
Questions and Comments
Let’s do it…
License
> http://creativecommons.org/licenses/by-sa/2.5/
Attribution-ShareAlike 2.5
You are free:
• to copy, distribute, display, and perform the work
• to make derivative works
• to make commercial use of the work
Under the following conditions:
• Attribution. You must attribute the work in the manner specified by the author or licensor.
• Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
• For any reuse or distribution, you must make clear to others the license terms of this work.
• Any of these conditions can be waived if you get permission from the copyright holder.
Your fair use and other rights are in no way affected by the above.