Revealing Class Structure With Zoomable Concept Lattices
Uri Dekel
Department of Computer Science, Technion, Haifa, Israel
M.Sc. research supervised by Dr. Yossi Gil

Outline
- Introduction
- Formal Concept Analysis
- Stage I – Interface Analysis
- Stage II – Implementation Analysis
- Stage III – Code Inspection
- Version Comparison
- Conclusions, Related & Future Research
10/2/2020

Domain
- Understanding and analyzing individual Java classes
  - Interface (black-box) analysis
    - Reducing the learning curve
    - Discovering interface problems
  - Implementation (white-box) analysis
    - Understanding class structure and role of fields
    - Discovering implementation problems
  - Code review and inspection
    - Understanding the purpose of each method from its code
    - Ensuring style, quality, and correctness
    - Discovering code reuse opportunities
  - Version comparison

Problems
- Classes can be very large and complex
  - OOP practices promote use of many methods
  - Meyer's "shopping list approach" advocates completing the interface with "syntactic-sugar" methods
  - "Rules of software evolution": the entropy of software artifacts increases with time
- Delocalisation
- Definition order not meaningful
- Fact: a quarter of all public methods are found in classes with more than 100 methods!

Research Question
- Can Formal Concept Analysis (FCA) help alleviate some of these problems?
- FCA is a mathematical classification technique
  - Helps discover meaningful data in binary relations
  - Can be visualized with concept lattices
- FCA has been applied to many CS and SW problems
  - Automatic modularization
  - Automatic construction and refinement of class hierarchies
  - Reverse engineering complex systems
  - Smart component repositories

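Formal Concept Analysis
- Input: a context <O, A, R>
  - O is a set of objects
  - A is a set of attributes
  - R is a binary relation between O and A
- Mapping: Galois connection
  - Common attributes of a set of objects $X \subseteq O$: $\mathrm{attr}(X) = \{a \in A \mid \forall o \in X : (o, a) \in R\}$
  - Common objects of a set of attributes $Y \subseteq A$: $\mathrm{obj}(Y) = \{o \in O \mid \forall a \in Y : (o, a) \in R\}$
- Output: concepts, i.e. pairs $(X, Y)$ s.t. $\mathrm{attr}(X) = Y$ and $\mathrm{obj}(Y) = X$
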
FCA Example
- Field-accesses context of a class
  - Objects are fields, attributes are methods; the relation specifies which methods access each field
[Figure: the context table and the concepts derived from it]

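To make the field-accesses context concrete, here is a minimal sketch that enumerates the concepts of a small context by brute force. The class, its fields (`x`, `y`, `name`) and its methods are hypothetical, and closing every subset of objects is only one simple way to compute concepts; it is not necessarily the algorithm used in this research.

```python
from itertools import combinations

def common_attributes(objs, context, all_attrs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    result = set(all_attrs)
    for o in objs:
        result &= context[o]
    return frozenset(result)

def common_objects(attrs, context):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of objects.

    Brute force: exponential in the number of objects, so small contexts only.
    """
    all_attrs = set().union(*context.values()) if context else set()
    found = set()
    objs = list(context)
    for k in range(len(objs) + 1):
        for subset in combinations(objs, k):
            intent = common_attributes(subset, context, all_attrs)
            extent = common_objects(intent, context)
            found.add((extent, intent))
    return found

# Hypothetical class: each field maps to the methods that access it.
context = {
    "x": {"getX", "move", "toString"},
    "y": {"getY", "move", "toString"},
    "name": {"getName", "toString"},
}
result = concepts(context)
for extent, intent in sorted(result, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

In this toy context the fields `x` and `y` share the concept whose methods are `move` and `toString`, which is the kind of grouping the lattice is meant to expose.
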
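Concept Lattices
- Partial order: for concepts $(X_1, Y_1)$ and $(X_2, Y_2)$ with object sets $X_i$ and attribute sets $Y_i$, $(X_1, Y_1) \le (X_2, Y_2) \iff X_1 \subseteq X_2$ (equivalently, $Y_2 \subseteq Y_1$)
  - Defines domination between concepts
  - Visualized as a concept lattice
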
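As a rough illustration of how the partial order becomes a drawable lattice, the sketch below derives the edges of the diagram (pairs where one concept lies directly below another) from the object sets alone. The extents listed are those of a hypothetical class with fields `x`, `y` and `name`; the function names are illustrative.

```python
def dominated_by(lower, upper):
    """True if `lower` lies strictly below `upper` (its extent is a proper subset)."""
    return lower < upper

def cover_edges(extents):
    """Pairs (lower, upper) with no third concept strictly between them."""
    edges = []
    for lower in extents:
        for upper in extents:
            if not dominated_by(lower, upper):
                continue
            between = any(
                dominated_by(lower, mid) and dominated_by(mid, upper)
                for mid in extents
            )
            if not between:
                edges.append((lower, upper))
    return edges

# Extents of a hypothetical six-concept lattice.
extents = [frozenset(s) for s in
           ([], ["x"], ["y"], ["name"], ["x", "y"], ["x", "y", "name"])]
edges = cover_edges(extents)
for lower, upper in edges:
    print(sorted(lower), "->", sorted(upper))
```
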
Interpreting Class Lattices
- We use only sparse lattices
  - Economical but equivalent representation
  - Each object introduced in lowest concept
  - Each attribute introduced in highest concept
- Interpretation:
  - Each method uses all fields introduced in the same concept or below
- Reveals:
  - Possible restructuring
  - Asymmetry between coordinates

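The sparse labeling can be sketched as follows, assuming the concepts are already available as (fields, methods) pairs. Each field is attached to the smallest concept that contains it and each method to the largest concept that contains it. The concept list is a hypothetical example and `sparse_labels` is an illustrative name.

```python
def sparse_labels(concepts):
    """Map each concept to the fields and methods it introduces."""
    labels = {c: {"fields": set(), "methods": set()} for c in concepts}
    all_fields = set().union(*(extent for extent, _ in concepts))
    all_methods = set().union(*(intent for _, intent in concepts))
    for field in all_fields:
        # Lowest concept containing the field: the one with the smallest extent.
        lowest = min((c for c in concepts if field in c[0]), key=lambda c: len(c[0]))
        labels[lowest]["fields"].add(field)
    for method in all_methods:
        # Highest concept containing the method: the one with the largest extent.
        highest = max((c for c in concepts if method in c[1]), key=lambda c: len(c[0]))
        labels[highest]["methods"].add(method)
    return labels

# Hypothetical concepts of a class with fields x, y, name.
bottom = (frozenset(), frozenset({"getX", "getY", "getName", "move", "toString"}))
cx = (frozenset({"x"}), frozenset({"getX", "move", "toString"}))
cy = (frozenset({"y"}), frozenset({"getY", "move", "toString"}))
cname = (frozenset({"name"}), frozenset({"getName", "toString"}))
cxy = (frozenset({"x", "y"}), frozenset({"move", "toString"}))
top = (frozenset({"x", "y", "name"}), frozenset({"toString"}))

labels = sparse_labels([bottom, cx, cy, cname, cxy, top])
for (extent, _), label in labels.items():
    print(sorted(extent), sorted(label["fields"]), sorted(label["methods"]))
```

Read this way, `move` is introduced in the concept above `x` and `y`, so it uses both fields, while `toString` sits in the top concept and uses the whole state.
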
Field-Accesses Context
- Field usage is critical for understanding a class
  - All implementations of an operation use the same fields
  - Representation changes are rare
  - Methods that use the same combination are related
- Can be calculated directly from the .class file
  - Allows some reverse engineering without source code
  - Calculated using standard static analysis
  - Currently restricted to accesses inside the class

Zoom-in Zoom-out approach
- Problems:
  - Concept lattices can be very large
    - The number of concepts can be exponential in the size of the context in the worst case
    - Polynomial for most real-life contexts
    - Linear for 99.5% of classes!
  - Elaborate member details are cumbersome
- Solution:
  - Provide (semi-)automatic zoom in/out tools

Running Example
- The Molecule class from CDK
- CDK: Chemistry Development Kit
  - Open source library of chemistry related classes
  - Developed at the Max Planck Institute in Germany
  - Used in chemistry visualization applications
- Why the Molecule class?
  - Has a large interface (nearly 75 public members)
  - The represented entity is familiar to most people
- Our methodology revealed several new bugs and issues!
  - Methodology was successfully applied to other classes as well

Stage I: Interface Analysis
“Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to produce bigger and better idiots. So far, the universe is winning…” -- Rich Cook
“There are only two industries that refer to their customers as ‘users’…” -- Edward Tufte

Interface Analysis
- Purpose:
  - Understand the functionality provided by the class
  - Map expectations into interface members
    - The "concept assignment" or "feature mapping" problems
  - Discover problems
    - e.g. missing or superfluous functionality, exposed implementation details, inconsistent naming
- Methodology:
  - Methods are partitioned into concepts
    - Heuristic for automatic feature categorization
  - Zoom-out and reason about overall structure
  - Zoom-in and examine specific functionalities

Preliminaries
- Mapping features to interface members requires knowing what the features are
- Tasks:
  - Surmising abstraction, purpose and role
  - Determining vocabulary
  - Predicting mandatory and non-mandatory functionality
- Information sources:
  - Domain-specific knowledge
  - Class environment
    - e.g. hierarchy, dependencies, etc.
- This step is not unique to concept analysis

Context Selection
- Only client-visible methods should be used
  - Public methods by default, protected if client is subclass, default if client is in the same package
- All fields are kept to ensure a correct partitioning
  - Will be removed after the lattice is constructed
[Table: context parameters; bold indicates our selection, Φ represents "don't care"]

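The selection rule above can be sketched as a filter over the field-accesses relation. The client kinds, the `VISIBLE_TO` table and the member names are assumptions made for illustration; the table follows ordinary Java access rules (a same-package client also sees protected members).

```python
# Which Java access modifiers each kind of client can see (assumed encoding).
VISIBLE_TO = {
    "any": {"public"},
    "subclass": {"public", "protected"},
    "same_package": {"public", "protected", "default"},
}

def select_context(accesses, visibility, client="any"):
    """Keep every field, but only the methods visible to the given client.

    accesses:   field -> set of methods that access it
    visibility: method -> "public" | "protected" | "default" | "private"
    """
    allowed = VISIBLE_TO[client]
    return {
        field: {m for m in methods if visibility[m] in allowed}
        for field, methods in accesses.items()
    }

# Hypothetical members.
accesses = {
    "atoms": {"getAtomCount", "growArray"},
    "title": {"getTitle", "setTitleInternal"},
    "cache": {"rebuild"},
}
visibility = {
    "getAtomCount": "public",
    "growArray": "private",
    "getTitle": "public",
    "setTitleInternal": "protected",
    "rebuild": "default",
}
print(select_context(accesses, visibility))
print(select_context(accesses, visibility, client="subclass"))
```

Fields whose methods are all filtered out, such as `cache` for an arbitrary client, stay in the context with an empty method set, in line with keeping all fields until the lattice is built.
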
Constructing the Lattice
- The lattice is too cluttered to grasp immediately
  - We start zooming-out
- Layers correspond to levels of abstraction

Simplifying concepts
- We summarize the responsibilities of each concept in a quick skim over method signatures
  - This process cannot be fully automated at present
- Still too cluttered!

Naming Concepts
- Name concepts based on summary
- Use symbolic representations for common responsibilities

Horizontal Decomposition
- Remove top and bottom concepts
- Connected components are orthogonal
  - Problem with title (on the right) becomes obvious
  - Abundance of trivial components implies record-like behavior
  - Cohesive component requires further analysis

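A minimal sketch of the decomposition, assuming each concept is represented by its set of fields: drop the top and bottom concepts, then group the remaining concepts that are linked by the domination order. Linking every comparable pair yields the same components as following the lattice edges. The extents are hypothetical and a small union-find is used for the grouping.

```python
def horizontal_decomposition(extents):
    """Connected components of the lattice after removing top and bottom."""
    top = max(extents, key=len)      # the concept containing every field
    bottom = min(extents, key=len)   # the concept with the fewest fields
    inner = [e for e in extents if e not in (top, bottom)]

    parent = {e: e for e in inner}

    def find(e):
        # Union-find lookup with path halving.
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for a in inner:
        for b in inner:
            if a < b:                # a is dominated by b: same component
                parent[find(a)] = find(b)

    groups = {}
    for e in inner:
        groups.setdefault(find(e), []).append(e)
    return list(groups.values())

# Hypothetical lattice: x and y are used together, name stands alone.
extents = [frozenset(s) for s in
           ([], ["x"], ["y"], ["name"], ["x", "y"], ["x", "y", "name"])]
components = horizontal_decomposition(extents)
for component in components:
    print([sorted(e) for e in component])
```

Here the `name` concept forms a trivial component of its own, the kind that hints at record-like behavior, while the concepts over `x` and `y` form one cohesive component.
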
Abstraction Lattice
- Heuristic for clustering concepts
  - Concepts dominated by the same top-layer concepts belong in the same cluster

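One way to read this heuristic is sketched below. It assumes that the "top layer" consists of the maximal concepts left once the top and bottom concepts are removed, which is an interpretation rather than something the slide states, and it treats a top-layer concept as dominating itself. Extents and names are hypothetical.

```python
def cluster_concepts(extents):
    """Group concepts by the set of top-layer concepts that dominate them."""
    top = max(extents, key=len)
    bottom = min(extents, key=len)
    inner = [e for e in extents if e not in (top, bottom)]
    # Assumed definition: top-layer concepts have no other inner concept above them.
    top_layer = [e for e in inner if not any(e < other for other in inner)]
    clusters = {}
    for e in inner:
        key = frozenset(t for t in top_layer if e <= t)
        clusters.setdefault(key, []).append(e)
    return clusters

# Hypothetical lattice over fields a, b, c.
extents = [frozenset(s) for s in
           ([], ["a"], ["b"], ["c"], ["a", "b"], ["b", "c"], ["a", "b", "c"])]
clusters = cluster_concepts(extents)
for key, members in clusters.items():
    print([sorted(t) for t in key], "<-", [sorted(m) for m in members])
```

The concept over `b` is dominated by both top-layer concepts and therefore lands in a cluster of its own, separate from the concepts that belong to only one of them.
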
Match services against expectations
- Functionality search order:
  - Expected mandatory features
  - Expected non-mandatory features
  - Unexpected features
- For each functionality:
  - Mark relevant clusters
  - Mark relevant concepts
  - Examine each concept
- Example:
  - Bond management

Stage II – Implementation Analysis
“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.” -- C. A. R. Hoare

Implementation Analysis
- Purpose:
  - Understand implementation and structure
  - Discover problems
    - e.g. redundant fields, bad naming conventions, wrongly implemented operations
- Methodology:
  - Code is not inspected at this stage!
    - All information derived from lattice
  - Zoom-in:
    - Including private fields and methods
    - Listing full signatures and introducing classes
    - Embedded call graph

Embedded Call Graph
- Superposition of call graph (CG) on concept lattice
- A semantics-based CG layout heuristic
  - Keeps related methods together while reducing crossings
- Helps investigate relations between methods
  - e.g. surmise level of abstraction or discover wrappers
  - Used later for selecting an order for code inspection
- Example: ECG of Pnt3D

Investigate Fields
- Examine unused fields
  - Might indicate unimplemented stubs or dead structure
- Discover the roles of fields
  - Easy for trivial components
  - Harder for the cohesive one
- Investigate interdependency
- Naming quality

Investigate Special Methods
- Methods that (should) use the entire state should be in the top concept
  - Exceptions can indicate problems
  - Zoom-in by adding declaring class details
- Examine methods that do not use fields
  - e.g. discover undeclared statics

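The checks on fields and special methods can be approximated directly on the field-accesses relation, without drawing the lattice. The sketch below assumes the relation is given as a mapping from each field to the methods that access it; the member names are hypothetical, and `whole_state_methods` deliberately ignores unused fields, which is a simplifying choice.

```python
def unused_fields(accesses):
    """Fields that no method accesses (possible stubs or dead structure)."""
    return sorted(f for f, methods in accesses.items() if not methods)

def fieldless_methods(accesses, all_methods):
    """Methods that access no field at all (candidates for undeclared statics)."""
    touching = set().union(*accesses.values())
    return sorted(set(all_methods) - touching)

def whole_state_methods(accesses):
    """Methods that access every field that is used at least once."""
    used = [set(methods) for methods in accesses.values() if methods]
    return sorted(set.intersection(*used)) if used else []

# Hypothetical class members.
accesses = {
    "atoms": {"addAtom", "getAtomCount", "toString"},
    "bonds": {"addBond", "getBondCount", "toString"},
    "cache": set(),
}
all_methods = {"addAtom", "getAtomCount", "addBond", "getBondCount",
               "toString", "getVersion"}

print(unused_fields(accesses))
print(fieldless_methods(accesses, all_methods))
print(whole_state_methods(accesses))
```

A method such as `clone` or `equals` that is expected to touch the whole state but is missing from the last list would be one of the exceptions worth examining.
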
Investigate Other Methods
- Ensure symmetry where expected
  - e.g. C11 and C13, C10 and C14, C16 and C17
- Ensure methods use expected access patterns
- Add non-public methods to lattice

Stage III – Code Inspection
“Real programmers don't document. If it was hard to write, it should be hard to understand…” -- Anonymous
“Real programmers can write assembly code in any language…” -- Larry Wall

Code Inspection
- Purpose:
  - Understand functionality which is unclear after the previous stages
  - Ensure quality of code and style
- Methodology:
  - Select an order for effective reading
    - Maximizing reading throughput
    - Maximizing discovered defects
    - Minimizing repetitions

Code Inspection Problem
- Original source code order not effective
  - Co-definitions
    - No incremental order
    - All class members are defined simultaneously
  - Perturbations to intended order
    - Evolution and maintenance
    - Language issues (e.g. inheritance)
    - Style issues (e.g. public before private)

Reading Strategy
- Organize methods into groups of related functionality and order these groups (global order)
- Order the methods inside each group (local order)
- Each concept is a group
  - Same-concept methods are similar in purpose, semantics and implementation
    - Increased prospects of understanding differences between methods and discovering redundancies and replications
    - Less infrastructure (e.g. external libraries) to memorize

Reading Strategy (cont.)
- Global order (by importance)
  - Read each HD component separately
    - Each represents an independent functionality
  - Read concepts in ascending order of layers
    - Exploit similar level of abstraction
  - Read concepts of the same cluster together
- Local order (by importance)
  - Read methods in topological order
    - Use restricted ECG
  - Read methods in same ECG component together
  - Resolve equivalencies with "simplest-first" rule

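The local order can be sketched as a topological sort of the call graph restricted to one concept. Two assumptions are made here that the slides do not spell out: callees are read before their callers, and "simplest-first" is modeled by a numeric `complexity` score (for instance a parameter count). The call relations in the example are invented for illustration and do not describe the real Molecule class.

```python
import heapq

def reading_order(calls, complexity):
    """Order methods so that callees come before callers, simplest first on ties.

    calls:      method -> set of methods it calls inside the same group
    complexity: method -> number used to break ties (assumed metric)
    """
    methods = set(calls) | {c for callees in calls.values() for c in callees}
    pending = {m: set(calls.get(m, ())) - {m} for m in methods}
    callers = {m: set() for m in methods}
    for m, callees in pending.items():
        for callee in callees:
            callers[callee].add(m)

    ready = [(complexity.get(m, 0), m) for m, callees in pending.items() if not callees]
    heapq.heapify(ready)
    order = []
    while ready:
        _, m = heapq.heappop(ready)
        order.append(m)
        for caller in callers[m]:
            pending[caller].discard(m)
            if not pending[caller]:
                heapq.heappush(ready, (complexity.get(caller, 0), caller))

    # Methods on call cycles never become ready; append them simplest-first.
    rest = sorted((m for m in methods if m not in order),
                  key=lambda m: (complexity.get(m, 0), m))
    return order + rest

# Hypothetical call relations and complexity scores.
calls = {
    "addBond(Atom,Atom)": {"addBond(Bond)"},
    "addBond(Bond)": set(),
    "getDegree": {"getBondCount"},
    "getBondCount": set(),
}
complexity = {"addBond(Bond)": 1, "addBond(Atom,Atom)": 2,
              "getBondCount": 0, "getDegree": 1}
order = reading_order(calls, complexity)
print(order)
```

Reading callees first means that a wrapper such as the two-argument `addBond` is met only after the method it delegates to has been understood.
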
Inspection Tasks
- Inspection tasks customized for our reading order
  - Finding duplicate services inside a concept
    - e.g. getDegree and getBondCount
  - Identifying code-sharing opportunities
    - e.g. overloads of addBond
  - Verify that low-level methods are not bypassed
    - e.g. getBondCount, getBondAt
- An addition to "standard" inspection tasks

Version Comparison
“Zero defects: The result of shutting down a production line…” -- Kelvin Throop III, "The Management Dictionary"

Version Comparison
- Examine an outline of the differences before the actual details
- Example: differences between the original version of the "Graph" class of VGJ (Visualizing Graphs with Java) and the Technion adaptation of that class
  - Originals appear in bold font, modifications appear in plain font
- Also useful for subclass/superclass comparisons

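The slide compares versions on the lattice itself. As a much simpler stand-in, the sketch below outlines differences with plain set operations on the two field-accesses relations: which fields and methods appear or disappear, and which surviving methods changed the fields they touch. The function names and the two sample versions are hypothetical, and methods that access no field are invisible to this comparison.

```python
def method_view(accesses):
    """Invert field -> methods into method -> fields."""
    view = {}
    for field, methods in accesses.items():
        for m in methods:
            view.setdefault(m, set()).add(field)
    return view

def compare_versions(old, new):
    """Outline of differences between two field-accesses contexts."""
    old_m, new_m = method_view(old), method_view(new)
    return {
        "removed_fields": sorted(set(old) - set(new)),
        "added_fields": sorted(set(new) - set(old)),
        "removed_methods": sorted(set(old_m) - set(new_m)),
        "added_methods": sorted(set(new_m) - set(old_m)),
        "changed_methods": sorted(
            m for m in set(old_m) & set(new_m) if old_m[m] != new_m[m]
        ),
    }

# Hypothetical original and adapted versions of a graph class.
old = {"nodes": {"addNode", "size"}, "edges": {"addEdge", "size"}}
new = {"nodes": {"addNode", "size", "clear"},
       "edges": {"addEdge", "clear"},
       "labels": {"setLabel"}}
diff = compare_versions(old, new)
for kind, names in diff.items():
    print(kind, names)
```
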
Related and Future Research

Related Research
- Formal Concept Analysis
  - Many applications for
    - Automatic class hierarchy construction
    - Automatic modularization
    - Reverse engineering and program understanding
    - Management of component repositories
- Understanding individual classes
  - Class blueprints (M. Lanza and S. Ducasse)
  - Not much else at the class level

Research Directions
- Extensions to current methodology
  - Conducting user studies
    - Validating the methodology
    - Discovering new tools
  - Integration with development or browsing tools
    - e.g. Eclipse or IBM's documentation enhancer
    - We currently have a non-interactive prototype
  - New zoom-in and zoom-out tools
  - Using other classification criteria
    - e.g. use of types, name-based classification

Research Directions (cont.)
- Common programming practices
  - Defining a lattice-based suite of class metrics
  - "Lattice Patterns"
- Other directions of research
  - Using nano-patterns to annotate methods
    - Marking functionality directly on lattice
    - Automatic assignment of nano-patterns
  - Applicability to class design in CASE tools
    - Interactive class diagram editor based on concept lattice
    - Methods are connected to fields and hence assigned some semantics
  - Dealing with multiple classes

The End
“Theory is when you know something, but it doesn't work. Practice is when something works, but you don't know why. Programming combines theory and practice: Nothing works and you don't know why…” -- Anonymous