Comparing the Decompositions Produced by Software Clustering Algorithms using Similarity Measurements
2001 IEEE International Conference on Software Maintenance (ICSM'01)
Brian S. Mitchell & Spiros Mancoridis
Math & Computer Science, Drexel University
Drexel University Software Engineering Research Group (SERG), http://serg.mcs.drexel.edu
Motivation
Module dependencies should be taken into account when determining the similarity between two decompositions.
Clustering the Structure of a System (1)
Given the structure of a system…
Clustering the Structure of a System (2)
The goal is to partition the system structure graph into clusters that represent the system's subsystems.
Clustering the Structure of a System (3)
But how do we know that the clustering result is good?
Ways to Evaluate Software Clustering Results
Given a software clustering result, we can:
- assess it against a mental model
- assess it against a benchmark standard
Techniques:
- subjective opinions
- similarity measurements
Example: How “Similar” are these Decompositions?
[Figure: two decompositions, PA and PB, of modules M1 to M8, with blue, green, and red edges added in turn]
Blue edges: similarity still the same…
Green edges: similarity still the same…
Red edges: not as similar…
Conclusion: once we add the red edges, the similarity between PA and PB decreases.
Observations
Edges are important for determining the similarity between decompositions.
Existing measurements don’t consider edges:
- Precision/Recall (similarity)
- MoJo (distance)
Our idea: use the edges to determine similarity.
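The slides do not spell out how Precision/Recall is computed. A common pairwise formulation, assumed here, treats every pair of modules placed in the same cluster as an intra-pair: precision is the fraction of the candidate's intra-pairs that are also intra-pairs in the reference, and recall is the fraction of the reference's intra-pairs that the candidate recovers. The minimal sketch below (function names are hypothetical) makes the observation above concrete: the function never receives the dependency edges, so adding edges cannot change its result.

```python
from itertools import combinations


def intra_pairs(partition):
    """Unordered module pairs that share a cluster in `partition` (a list of sets)."""
    return {pair for cluster in partition for pair in combinations(sorted(cluster), 2)}


def precision_recall(candidate, reference):
    """Pairwise precision and recall of `candidate` measured against `reference`.

    Only cluster membership is compared; no edges are involved.
    """
    cand, ref = intra_pairs(candidate), intra_pairs(reference)
    common = cand & ref
    precision = len(common) / len(cand) if cand else 1.0
    recall = len(common) / len(ref) if ref else 1.0
    return precision, recall


a = [{"m1", "m2", "m3"}, {"m4", "m5"}]
b = [{"m1", "m2"}, {"m3", "m4", "m5"}]
print(precision_recall(a, b))  # (0.5, 0.5)
```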
Research Objectives
- Create new similarity measurements that use dependencies (edges):
  - EdgeSim (similarity)
  - MeCl (distance)
- Evaluate the new similarity measurements against MoJo and Precision/Recall
- Use similarity measurements to support the evaluation of software clustering results (see our WCRE'01 paper)
Example: How “Similar” are these Decompositions?
[Figure: decompositions PA and PB of modules M1 to M8]
Add blue edges: PR, MoJo, MeCl, and EdgeSim unchanged.
Add green edges: PR, MoJo, MeCl, and EdgeSim unchanged.
Add red edges: PR and MoJo unchanged; EdgeSim and MeCl reduced.
Definitions
[Figure: modules M1 to M4 grouped into clusters]
Internal/Intra-Edge: an edge within a cluster
External/Inter-Edge: an edge between two clusters
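To make the two edge types concrete, here is a minimal Python sketch. It assumes a decomposition is a list of sets of module names and a dependency is a (source, target) pair; the cluster layout and edges are hypothetical, not read from the figure.

```python
def cluster_index(partition):
    """Map each module to the index of the cluster (list of sets) containing it."""
    return {module: i for i, cluster in enumerate(partition) for module in cluster}


def classify_edges(edges, partition):
    """Split directed (src, dst) edges into intra-edges and inter-edges."""
    where = cluster_index(partition)
    intra, inter = [], []
    for src, dst in edges:
        # same cluster -> intra-edge, different clusters -> inter-edge
        (intra if where[src] == where[dst] else inter).append((src, dst))
    return intra, inter


partition = [{"M1", "M2"}, {"M3", "M4"}]
edges = [("M1", "M2"), ("M2", "M3"), ("M3", "M4")]
intra, inter = classify_edges(edges, partition)
print(intra)  # [('M1', 'M2'), ('M3', 'M4')]
print(inter)  # [('M2', 'M3')]
```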
EdgeSim Example
[Figure: a module dependency graph (MDG) over modules a to l, and two decompositions of it, PA and PB]
EdgeSim Example
[Figure: the MDG, PA, and PB]
Step 1: Find the common inter- and intra-edges.
EdgeSim Example
[Figure: the MDG, PA, and PB]
$\mathrm{EdgeSim} = \frac{\text{Common Edge Weight}}{\text{Total Edge Weight}} = \frac{10}{19} \approx 53\%$
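The EdgeSim ratio can be sketched in a few lines of Python. This is an illustration, not the authors' tool. It assumes decompositions are lists of sets and the MDG is a dict mapping (source, target) to an edge weight (weight 1 for an unweighted graph). An edge counts as common when its type (intra or inter) is the same in both decompositions. For the slide's example, 10/19 ≈ 0.526, which the slide rounds to 53%. The data below is hypothetical.

```python
def cluster_index(partition):
    """Map each module to the index of the cluster containing it."""
    return {module: i for i, cluster in enumerate(partition) for module in cluster}


def edge_sim(edges, part_a, part_b):
    """EdgeSim: weight of edges whose type (intra or inter) is the same in both
    decompositions, divided by the total edge weight.

    `edges` maps (src, dst) -> weight.
    """
    ia, ib = cluster_index(part_a), cluster_index(part_b)
    total = sum(edges.values())
    if total == 0:
        return 1.0  # assumption: two decompositions of an edgeless graph agree
    common = sum(
        w for (u, v), w in edges.items()
        if (ia[u] == ia[v]) == (ib[u] == ib[v])  # same edge type in A and B
    )
    return common / total


part_a = [{"a", "b"}, {"c", "d"}]
part_b = [{"a", "b", "c"}, {"d"}]
edges = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("a", "d"): 1}
print(edge_sim(edges, part_a, part_b))  # 0.5
```

Here (a, b) stays an intra-edge and (a, d) stays an inter-edge, while (b, c) and (c, d) change type, so half of the edge weight is common.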
MeCl Example
[Figure: the MDG over modules a to l, with decompositions PA and PB]
MeCl Example (A → B)
[Figure: PA with clusters A1, A2, A3 and PB with clusters B1, B2, over modules a to l]
MeCl Example (A1, B1)
[Figure: PA and PB, with subcluster A1,1 built from clusters A1 and B1]
MeCl Example (A2, B1)
[Figure: PA and PB, with subcluster A2,1 built from clusters A2 and B1]
MeCl Example (A1, B2)
[Figure: PA and PB, with subcluster A1,2 built from clusters A1 and B2]
MeCl Example (A2, B2)
[Figure: PA and PB, with subcluster A2,2 built from clusters A2 and B2]
MeCl Example (A3, B2)
[Figure: PA and PB, with subcluster A3,2 built from clusters A3 and B2]
MeCl Example (A → B)
[Figure: subclusters A1,1, A2,1, A1,2, A2,2, A3,2 merged to form clusters B1 and B2]
MeCl Example (A → B): Newly Introduced Inter-Edges
[Figure: the merged clusters B1 and B2, with the newly introduced inter-edges highlighted]
MeCl Example (B → A)
[Figure: PA and PB, with PB split into subclusters B1,1, B1,2, B2,1, B2,2, B2,3]
MeCl Example (B → A)
[Figure: subclusters B1,1 to B2,3 merged to form clusters A1, A2, A3]
MeCl Example (B → A): Newly Introduced Inter-Edges
[Figure: the merged clusters A1, A2, A3, with the newly introduced inter-edges highlighted]
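The subcluster walkthrough can be sketched in Python under our reading of these slides, which is an assumption. Subcluster A_{i,j} holds the modules that cluster A_i of PA and cluster B_j of PB have in common, and empty subclusters are dropped. The inter-edges newly introduced in the A → B direction are edges inside one cluster of PA whose endpoints end up in different clusters of PB. Decompositions are lists of sets, indices are 0-based (the slides count from 1), and all names and data are hypothetical. Swapping the two decompositions gives the B → A direction.

```python
def subclusters(part_a, part_b):
    """Non-empty intersections A_i ∩ B_j of two decompositions (lists of sets),
    keyed by 0-based (i, j)."""
    return {
        (i, j): a & b
        for i, a in enumerate(part_a)
        for j, b in enumerate(part_b)
        if a & b
    }


def introduced_inter_edges(edges, part_a, part_b):
    """Edges inside one cluster of A whose endpoints fall into different
    clusters of B, i.e. inter-edges introduced when going from A to B."""
    where = {m: key for key, members in subclusters(part_a, part_b).items() for m in members}
    return [
        (u, v) for u, v in edges
        if where[u][0] == where[v][0]  # same cluster in A
        and where[u][1] != where[v][1]  # different clusters in B
    ]


part_a = [{"a", "b", "c"}, {"d", "e"}]
part_b = [{"a", "b"}, {"c", "d", "e"}]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(subclusters(part_a, part_b))
print(introduced_inter_edges(edges, part_a, part_b))  # [('b', 'c')]
print(introduced_inter_edges(edges, part_b, part_a))  # [('c', 'd')]
```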
MeCl Calculation
[Figure: PA (clusters A1, A2, A3) and PB (clusters B1, B2) over modules a to l]
Inter-edges introduced:
MeCl(A → B): {b, e}, {e, c}, {g, h}, {f, h}
MeCl(B → A): {e, i}, {h, j}, {b, f}, {c, f}, {h, e}
$\mathrm{MeCl} = 1 - \frac{\max\left(W(M_{A \to B}),\, W(M_{B \to A})\right)}{\text{Total Edge Weight}} = 1 - \frac{5}{19} \approx 73.7\%$
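The formula can be sketched in Python. This rests on our reading of the example, which is an assumption: W(M_{A→B}) is the weight of the edges that are intra-edges in PA but inter-edges in PB, and W(M_{B→A}) is the reverse. Under this reading, the two introduced sets together contain exactly the edges whose type differs between PA and PB. That agrees with the EdgeSim example: 19 - (4 + 5) = 10 common edges. With unit weights, the slide's numbers give 1 - 5/19 ≈ 0.737. Names and data below are hypothetical.

```python
def cluster_index(partition):
    """Map each module to the index of the cluster containing it."""
    return {module: i for i, cluster in enumerate(partition) for module in cluster}


def mecl(edges, part_a, part_b):
    """MeCl = 1 - max(W(A->B), W(B->A)) / total edge weight.

    W(X->Y): weight of edges that are intra-edges in X but inter-edges in Y
    (the inter-edges introduced when X is converted into Y).
    `edges` maps (src, dst) -> weight.
    """
    ia, ib = cluster_index(part_a), cluster_index(part_b)
    total = sum(edges.values())
    if total == 0:
        return 1.0  # assumption: an edgeless graph gives full similarity
    w_ab = sum(w for (u, v), w in edges.items() if ia[u] == ia[v] and ib[u] != ib[v])
    w_ba = sum(w for (u, v), w in edges.items() if ib[u] == ib[v] and ia[u] != ia[v])
    return 1 - max(w_ab, w_ba) / total


part_a = [{"a", "b", "c"}, {"d", "e"}]
part_b = [{"a", "b"}, {"c", "d", "e"}]
edges = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "e"): 1}
print(mecl(edges, part_a, part_b))  # 0.75
```

Taking the maximum of the two directions makes the result symmetric in PA and PB.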
Similarity Measurement Recap
[Figure: pair P1 compares decompositions A1 and B1; pair P2 compares A2 and B2; all over modules M1 to M8]
MoJo(P1) = MoJo(P2) = 87.5%
PR(P1) = PR(P2): precision 84.6%, recall 68.7%, average 76.7%
Conclusion… by MoJo and Precision/Recall, P1 and P2 are equally similar.
Similarity Measurement Recap
[Figure: the same pairs P1 (A1, B1) and P2 (A2, B2)]
EdgeSim(P1) = 77.8%, MeCl(P1) = 88.9%
EdgeSim(P2) = 58.3%, MeCl(P2) = 66.7%
Conclusion… by EdgeSim and MeCl, P1 is more similar than P2.
Summary: EdgeSim & MeCl
EdgeSim:
- rewards clustering algorithms for preserving the edge types
- penalizes clustering algorithms for changing the edge types
MeCl:
- rewards the clustering algorithm for creating cohesive “subclusters”
Special Modules
[Figure: PA and PB over modules a to l]
Omnipresent modules: “strong” connections to other modules
Library modules: always used by other modules, never use other modules
Isomorphic modules: modules equally connected to other subsystems
Special Modules
[Figure: PA and PB after special modules are handled]
Special treatment of special modules helps to determine the similarity:
Omnipresent modules: removed
Library modules: removed
Isomorphic modules: replicated
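The library-module rule translates directly into degree counts. The slides do not say how strong a connection must be for a module to count as omnipresent, so the `factor` threshold below (a multiple of the mean degree) is a hypothetical choice. Replication of isomorphic modules is not sketched. All names are illustrative.

```python
from collections import Counter


def special_modules(edges, factor=3.0):
    """Classify modules of a directed (src, dst) edge list.

    library: used by other modules (in-degree > 0) but uses none (out-degree 0).
    omnipresent: total degree above `factor` times the mean degree
    (hypothetical threshold; the slides only say "strong" connection).
    """
    out_deg, in_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    modules = set(out_deg) | set(in_deg)
    if not modules:
        return set(), set()
    degree = {m: in_deg[m] + out_deg[m] for m in modules}
    mean = sum(degree.values()) / len(modules)
    omnipresent = {m for m in modules if degree[m] > factor * mean}
    library = {m for m in modules if in_deg[m] > 0 and out_deg[m] == 0} - omnipresent
    return omnipresent, library


def remove_modules(edges, removed):
    """Drop every edge that touches a removed module."""
    return [(u, v) for u, v in edges if u not in removed and v not in removed]


edges = [("a", "lib"), ("b", "lib"), ("a", "b")]
omni, lib = special_modules(edges)
print(omni, lib)  # set() {'lib'}
print(remove_modules(edges, omni | lib))  # [('a', 'b')]
```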
Case Study Overview
[Figure: source code is processed by clustering algorithms into clustered results (module graph M1 to M8); a similarity evaluation tool compares the results using Precision/Recall, EdgeSim, MeCl, and MoJo]
Similarity analysis: average, variance, etc., based on 100 clustering runs (4950 pairwise evaluations)
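The 4950 figure is the number of unordered pairs among 100 runs: 100 × 99 / 2 = 4950. Below is a minimal sketch of that all-pairs evaluation. It assumes each run's result is compared with every other run by some similarity function. The `measure` parameter, the toy data, and the use of population variance are illustrative assumptions.

```python
from itertools import combinations
from math import comb
from statistics import mean, pvariance


def pairwise_stats(results, measure):
    """Score every unordered pair of clustering results with `measure` and
    return (number of evaluations, mean score, population variance)."""
    scores = [measure(a, b) for a, b in combinations(results, 2)]
    return len(scores), mean(scores), pvariance(scores)


assert comb(100, 2) == 4950  # 100 runs -> 4950 pairwise evaluations

# Hypothetical stand-in for 100 clustering runs and a toy similarity measure.
runs = list(range(100))
count, avg, var = pairwise_stats(runs, lambda a, b: 1.0 if a % 2 == b % 2 else 0.0)
print(count)  # 4950
```

In practice `measure` would be one of the similarity measurements, applied to two clustering results of the same system.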
Case Study Observations
- All similarity measurements exhibit consistent behavior for the systems studied.
- For all systems examined: if MeCl(SA) < MeCl(SB), then MoJo(SA) < MoJo(SB), PR(SA) < PR(SB), and EdgeSim(SA) < EdgeSim(SB).
- Removal of “special” modules improved all similarity measurements.
- Treating isomorphic modules specially improved similarity only slightly.
- EdgeSim and MeCl produced higher and less variable similarity values than Precision/Recall and MoJo.
Questions
Special thanks to: AT&T Research, Sun Microsystems, DARPA, NSF, US Army