Component Rank: Relative Significance Rank for Software Component Search
Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, and Shinji Kusumoto
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

SourceForge
• Large open source software development web site
• Version control, communication support, ...
• Hosted Projects: 60,888
• Registered Users: 613,792

Motivation
• Numerous software systems are being developed day by day
• Similar components (libraries, portions of code, abstracted algorithms, ...) might be developed independently in different projects
• Key factor for high productivity and reliability in today's software development
  – Reuse
• Exploring large software libraries is not easy
  – Little support for searching components
  – Consistent management by hand is difficult

Automated Component Library
• Collect software components eagerly without preserving their inherent structures
• Analyze relations among components by using various analysis techniques
• Rank the components based on their significance (Component Rank Model)
• Answer user's queries according to the rank

Component Graph
[Figure: component graph spanning System X and System Y with nodes A to I; each node is a component and each edge is a use relation]

Weight of Nodes
[Figure: the component graph of System X and System Y with a weight on each of the nodes A to I; the weights shown range from 0.05 to 0.2]
• Sum of all node weights = 1 ... (1)
• The weight of a node represents the significance of the node

Weights of Edges
[Figure: nodes A and B with weighted edges; edges with distribution ratio d = 1/4 carry weight 0.05; other values shown: 0.2, 0.4, 0.15]
• d: distribution ratio
• Node weight is distributed to each outgoing edge
• Edge weights are collected at the destination node
• Sum of all outgoing edge weights = origin node weight ... (2)
• Sum of all incoming edge weights = destination node weight ... (3)
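
Constraints (2) and (3) can be illustrated with a minimal Python sketch. The node names P, Q, R, S and the helper names `edge_weights` and `collect` are hypothetical; only the weight 0.2 and the ratio d = 1/4 echo values visible on the slide.

```python
def edge_weights(node_weight, ratios):
    """Distribute each node's weight to its outgoing edges.

    ratios[src][dst] is the distribution ratio d of edge src -> dst.
    Returns {(src, dst): edge weight}.
    """
    return {
        (src, dst): node_weight[src] * d
        for src, out in ratios.items()
        for dst, d in out.items()
    }


def collect(edges, nodes):
    """Sum the incoming edge weights at every destination node."""
    incoming = {n: 0.0 for n in nodes}
    for (_, dst), w in edges.items():
        incoming[dst] += w
    return incoming


# Hypothetical example: A has weight 0.2 and four outgoing edges, d = 1/4 each.
node_weight = {"A": 0.2}
ratios = {"A": {"P": 0.25, "Q": 0.25, "R": 0.25, "S": 0.25}}

edges = edge_weights(node_weight, ratios)
incoming = collect(edges, ["A", "P", "Q", "R", "S"])
```

Each of the four edges receives 0.2 × 1/4 = 0.05, and the outgoing edge weights add back up to the weight of A, which is constraint (2).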

Definition of Weights
• Under constraints (1)~(3), we have the simultaneous equation $D^{t} W = W$
  – $W$: node weight vector
  – $D^{t}$: transposed matrix of distribution ratios
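
The step from the constraints to the matrix equation can be written out as follows. The symbols $w_i$ (weight of node $i$), $d_{ij}$ (distribution ratio of the edge from $i$ to $j$) and $e_{ij}$ (weight of that edge) are notation assumed here for illustration; the slides name only $W$ and $D^{t}$.

```latex
% Edge weight: the origin node's weight times the distribution ratio.
e_{ij} = d_{ij}\, w_i , \qquad \sum_{j} d_{ij} = 1
  \quad\Rightarrow\quad \sum_{j} e_{ij} = w_i \quad \text{(constraint 2)}

% Constraint (3): a node's weight is the sum of its incoming edge weights.
w_j = \sum_{i} e_{ij} = \sum_{i} d_{ij}\, w_i

% In matrix form, with D = (d_{ij}) and W = (w_1, \dots, w_n)^{t}:
W = D^{t} W , \qquad \sum_{i} w_i = 1 \quad \text{(constraint 1)}
```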

Propagating Weights
[Figure: weight propagation on a three-node graph A, B, C, first step; values shown: 0.34, 0.17, 0.33, 0.17, 0.33]

Propagating Weights
[Figure: weight propagation on the three-node graph A, B, C, next step; values shown: 0.33, 0.175, 0.17, 0.175, 0.17, 0.5]

Propagating Weights
[Figure: weight propagation on the three-node graph A, B, C, following step; values shown: 0.5, 0.25, 0.175, 0.25, 0.345, 0.175, 0.345]

Propagating Weights
[Figure: weight propagation on the three-node graph A, B, C, final step; values shown: 0.4, 0.2, 0.2, 0.4]
• Stable weight assignment: next-step weights are the same as the previous ones
• Component Rank: order of nodes sorted by the weight
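
The propagation can be sketched in a few lines of Python. The edge structure below (A uses B and C, B uses C, C uses A, with equal distribution ratios) is an assumption inferred from the values on the slides, not something the slides state; the function names are hypothetical.

```python
def propagate(weights, ratios):
    """One propagation step: push every node weight along its outgoing edges."""
    nxt = {n: 0.0 for n in weights}
    for src, out in ratios.items():
        for dst, d in out.items():
            nxt[dst] += weights[src] * d
    return nxt


def component_rank(ratios, start, tol=1e-12, max_steps=10_000):
    """Repeat propagation until the weights stop changing (within tol)."""
    w = dict(start)
    for _ in range(max_steps):
        nxt = propagate(w, ratios)
        if max(abs(nxt[n] - w[n]) for n in w) < tol:
            return nxt
        w = nxt
    raise RuntimeError("weights did not stabilize")


# Assumed graph: A -> B, A -> C, B -> C, C -> A, equal ratios per node.
RATIOS = {"A": {"B": 0.5, "C": 0.5}, "B": {"C": 1.0}, "C": {"A": 1.0}}
START = {"A": 0.34, "B": 0.33, "C": 0.33}

stable = component_rank(RATIOS, START)
ranking = sorted(stable, key=stable.get, reverse=True)
```

Under this assumed graph the weights settle at A = 0.4, B = 0.2, C = 0.4, which agrees with the stable assignment on the last slide; B therefore ranks below A and C.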

Markov Model
[Figure: component graph annotated with existence probabilities; values shown: 0.02, 0.01, 0.05, 0.03, 0.001, 0.1]
• The Component Rank model can be considered as a Markov chain of the user's focus
• The user's focus moves from one component to another along a use relation at a fixed time interval
• Node weight represents the existence probability of the user's focus in the infinite future
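
The Markov-chain reading can be checked by simulation: let a focus wander along use relations and count how often it lands on each node. This is a minimal sketch on an assumed three-node graph (A uses B and C, B uses C, C uses A); the function name, the seed and the step count are arbitrary choices for illustration.

```python
import random


def simulate_focus(ratios, start, steps, seed=0):
    """Random walk of the user's focus; returns visit frequency per node."""
    rng = random.Random(seed)
    visits = {n: 0 for n in ratios}
    node = start
    for _ in range(steps):
        targets = list(ratios[node])
        probs = [ratios[node][t] for t in targets]
        # Move along one outgoing use relation, chosen by distribution ratio.
        node = rng.choices(targets, weights=probs)[0]
        visits[node] += 1
    return {n: count / steps for n, count in visits.items()}


# Assumed graph; every node has at least one outgoing edge.
RATIOS = {"A": {"B": 0.5, "C": 0.5}, "B": {"C": 1.0}, "C": {"A": 1.0}}

freq = simulate_focus(RATIOS, "A", steps=200_000)
```

For long walks the visit frequencies approach the stable node weights of this graph (0.4, 0.2, 0.4), which is the sense in which a node weight is the probability of finding the focus on that node.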

Adjustment to Software Products (1): Pseudo Use Relation
[Figure: nodes A, B, C with real and pseudo edges]
• Weight computation does not always converge
• Add a pseudo edge from a node to another if there is no 'real' edge
• Distribution ratios: pseudo edges << real edges
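
The slides do not give the formula for the adjusted distribution ratios. The sketch below assumes one common choice, a PageRank-style mix in which a share p of a node's weight follows its real edges and the remaining 1 - p is spread evenly over all nodes (including the node itself, for simplicity); a node without real edges spreads its weight evenly. The function name is hypothetical.

```python
def add_pseudo_edges(ratios, nodes, p=0.85):
    """Return distribution ratios with pseudo edges mixed in.

    Assumption (not from the slides): real edges keep a share p of the
    node's weight; the rest, 1 - p, is divided equally among all n nodes.
    """
    n = len(nodes)
    adjusted = {}
    for src in nodes:
        real = ratios.get(src, {})
        if real:
            adjusted[src] = {
                dst: p * real.get(dst, 0.0) + (1 - p) / n for dst in nodes
            }
        else:
            # No real outgoing edge: only pseudo edges, all equal.
            adjusted[src] = {dst: 1.0 / n for dst in nodes}
    return adjusted


# Hypothetical graph in which C uses nothing, so plain propagation
# would drain all weight into C.
nodes = ["A", "B", "C"]
ratios = {"A": {"B": 0.5, "C": 0.5}, "B": {"C": 1.0}}

adjusted = add_pseudo_edges(ratios, nodes, p=0.85)
```

Every row of the adjusted ratios still sums to 1, so constraints (1) to (3) keep holding, while every node can now reach every other node.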

Adjustment to Software Products (2): Clustering Components
[Figure: a component graph with nodes A, B, C, D, E, F, G is converted into a clustered component graph with nodes C, G, BF, AD, E, in which B and F form one node and A and D form another]
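
Building the clustered graph can be sketched as relabelling edge endpoints with their cluster names. The edge list below is hypothetical (the edges of the slide's figure are not recoverable); only the clusters BF and AD come from the slide. Dropping edges that end up inside a single cluster is an assumption of this sketch.

```python
def cluster_graph(edges, cluster_of):
    """Map each edge endpoint to its cluster and merge duplicate edges."""
    clustered = set()
    for src, dst in edges:
        a = cluster_of.get(src, src)  # unclustered nodes keep their name
        b = cluster_of.get(dst, dst)
        if a != b:  # assumption: ignore use relations inside one cluster
            clustered.add((a, b))
    return clustered


# Hypothetical use relations among components A to G.
edges = [("C", "B"), ("G", "F"), ("B", "A"), ("F", "D"), ("A", "E"), ("D", "E")]
cluster_of = {"B": "BF", "F": "BF", "A": "AD", "D": "AD"}

clustered_edges = cluster_graph(edges, cluster_of)
```

The six original edges collapse into four: C to BF, G to BF, BF to AD, and AD to E.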

Prototype System
[Figure: processing flow of the prototype system, from input to output]
• Input: .java file = component
• Measure similarity by SMMT (SMMT measures similarity by a clone detection technique)
  – Similarity criterion t = 0.8 (80% of statements are the same)
• Cluster similar components
• Extract use relation
  – inheritance
  – method call
  – attribute access
  – abstract class implementation
• Construct clustered component graph
• Compute node weights
  – Weight ratio p between real and pseudo edges: 0.85
  – Equal distribution ratios d to outgoing edges
• De-cluster to original components
• Output: component ranks

Experiment 1: JDK 1.3.0
• 575,000 lines, 1877 components
• 7 minutes on PC (Pentium IV, 2 GHz, 2 GB)
• Very general and core classes: ranked high
• Specific and independent classes: ranked low

rank   class name                      weight    note
1      java.lang.Object                0.16126   superclass of all classes
2      java.lang.Class                 0.08712
3      java.lang.Throwable             0.05510   superclass of any error or exception handler
4      java.lang.Exception             0.03103
5      java.io.IOException             0.01343
6      java.lang.StringBuffer          0.01214
7      java.lang.SecurityManager       0.01169
8      java.io.InputStream             0.01027
9      java.lang.reflect.Field         0.00948
10     java.lang.reflect.Constructor   0.00936
...
1256   sunw.util.EventListener         0.00011   these 622 classes are not used by any other classes
...

Experiment 2: Collection of SE Tools and Libraries
  – CK metrics measurement tools, component rank system
  – ANTLR, JAMA, Caffe Cappuccino
  – 582 components

rank   class name                                                        weight
1      antlr.Token                                                       0.10727
2      antlr.debug.Event                                                 0.06189
2      antlr.debug.NewLineEvent                                          0.06189
4      antlr.collections.impl.Vector                                     0.05434
5      jp.gr.java_conf.keisuken.text.html.HtmlParameter                  0.05246
6      jp.gr.java_conf.keisuken.net.server.ServerProperties              0.03699
7      Jama.Matrix                                                       0.01564
8      jp.gr.java_conf.keisuken.util.IntegerArray                        0.01390
8      jp.gr.java_conf.keisuken.util.LongArray                           0.01390
10     jp.ac.osaka_u.es.ics.iip_lab.metrics.parser.IdentifierInfo        0.01365
...
418    cktool_new.examples.Main                                          0.00050

• Indicator of generality and specialty w.r.t. usage from other classes

Experiment 3: Application to Industry
• Daiwa computer: a mid-sized software company in Osaka
• Shared Java application framework for web-based data management
• Framework + 5 applications on the framework
  – 1538 components, 339 clustered nodes
• Classes in the framework and definitions of data structures are ranked high

Experiment 4: Document Processing Tools and Libraries
• JEDIT, jext, Enhydra, saxon, phex, JDK, etc. (7171 components)
• Perform string search by grep command with keyword getNodetype

order (rank)   class name                            weight
1 (67)         enhydra 3.1 ... dom.Node              0.029110
2 (169)        saxon 7_0 ... saxon.om.NodeInfo       0.000969
3 (275)        saxon 7_0 ... saxon.pattern.NodeTest  0.000437
4 (316)        enhydra 3.1 ... dom.DocumentImpl      0.000368
5 (355)        saxon 7_0 ... saxon.pattern.Pattern   0.000324
6 (382)        saxon 7_0 ... saxon.Controller        0.000296
7 (437)        enhydra 3.1 ... xslt.XSLTEngineImpl   0.000241
8 (446)        enhydra 3.1 ... dom.ElementImpl       0.000235
9 (500)        saxon 7_0 ... saxon.style.StyleElement 0.000202
10 (506)       saxon 7_0 ... saxon.tree.NodeImpl     0.000198
...
125 (4441)     enhydra 3.1 ... FuncID                0.000029
...

• The top entries are method definitions for obtaining node kinds in a DOM tree
• We can easily find the core definitions of classes

Discussion 1: Weight Computation
[Figure: the same graph of nodes A to E weighted under the Reference Count Model and under the Component Rank Model; values shown: 0.2, 0.31, 0.6, 0.33, 0, 0, 0.2, 0.03, 0.30]
• Reference Count Model: fragile to locally-made references, which may not be important globally
• Component Rank Model: more stable to local references

Discussion 2: Clustering Policy (1)
• Eliminate the effect of simply duplicated components
[Figure: an original pair A, B, its copy A, B, and other components X, Y; after clustering the graph has nodes A, B, X, Y; values shown: 0.25, 0.25]
• Same weight arrangement as the case with no duplicated components

Discussion 2: Clustering Policy (2)
• Count only reused components that are not simply duplicated
[Figure: an original pair A, B, a modified pair A, C, and other components X, Y; after clustering the graph has nodes A, B, C, X, Y; values shown: 0.3, 0.2, 0.15, 0.2]
• A's weight is higher than the others

Discussion 3: Similarity Criterion and Pseudo Use Relation
• Similarity criterion t: 0.8
  – Resulting ranks are fairly insensitive to t
  – Some inherently different components are in the same cluster if t is less than 0.8
• Pseudo use relation ratio p: 0.85
  – Resulting ranks are stable between 0.75 and 0.95

Related Works
• Markov models of documentation traversal
  – Influence Weight: impact factor of journal publications through incoming references
  – Page Rank: weight of HTML pages on the Internet through incoming web links
  – Explicit use relations; no clustering (important for software products)
• Measuring reusability of components or interfaces
  – Use various characteristic metrics
  – Indirect indicator of reusability
  – Our approach directly reflects usage of components

SPARS-J: Software Product Archiving, Analyzing and Retrieving System for Java
[Figure: SPARS-J architecture; labels shown: Internet / Corporate Repositories, Component Collector, Analyzer and Evaluator, Component Archive, Query Handler, Software Component Searcher]

Conclusion & Future Work
• Component Rank: a novel model for software component
• Prototype system for Java
• Application to various collections of Java programs: promising results
• Developing SPARS-J
• Statistical evaluation (recall & precision)
• Practical evaluation using SPARS-J
• Other models (weight distribution, similarity, ...)

END

Global Analysis of Software Data
[Figure: collection, analysis and feedback cycle; data sources shown: Data on the Internet, Company-Wide Project Data, Subsidiary Company Data]

Weight Computation by Eigenvector
• $W$ is the eigenvector of $D^{t}$ for eigenvalue 1: $D^{t} W = W$
  – A math package for eigenvector computation can be used, but it is generally slower than the propagation computation
  – $W$: node weight vector
  – $D^{t}$: transposed matrix of distribution ratios
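
As an illustration of the direct route, the sketch below solves $(D^{t} - I) W = 0$ together with the normalization "sum of weights = 1" by exact Gauss-Jordan elimination. It uses only the Python standard library instead of a math package, and the three-node graph (A uses B and C, B uses C, C uses A) is an assumed example; the function name is hypothetical.

```python
from fractions import Fraction


def stationary_weights(nodes, ratios):
    """Solve (D^t - I) W = 0 with sum(W) = 1 by exact Gauss-Jordan elimination.

    Assumes each node's ratios sum to 1 and the solution is unique;
    otherwise the pivot search fails with StopIteration.
    """
    n = len(nodes)
    idx = {name: i for i, name in enumerate(nodes)}
    # Augmented matrix for (D^t - I) W = 0; the last column is the right side.
    a = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for i in range(n):
        a[i][i] = Fraction(-1)
    for src, out in ratios.items():
        for dst, d in out.items():
            a[idx[dst]][idx[src]] += Fraction(d)
    # One equation is redundant; replace it with the normalization sum(W) = 1.
    a[n - 1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [x / a[col][col] for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return {name: a[idx[name]][n] for name in nodes}


half = Fraction(1, 2)
RATIOS = {"A": {"B": half, "C": half}, "B": {"C": 1}, "C": {"A": 1}}
weights = stationary_weights(["A", "B", "C"], RATIOS)
```

For this graph the exact solution is A = 2/5, B = 1/5, C = 2/5. Elimination costs on the order of n³ operations, which is consistent with the slide's remark that propagation is generally the faster option on large graphs.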