Component Rank: Relative Significance Rank for Software Component Search
- Slides: 35
Component Rank: Relative Significance Rank for Software Component Search
Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, and Shinji Kusumoto
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
SourceForge
• Large open source software development web site
• Version control, communication support, ...
• Hosted projects: 60,888
• Registered users: 613,792
Motivation
• Numerous software systems are being developed day by day
• Similar components (libraries, portions of code, abstracted algorithms, ...) might be developed independently in different projects
• Key factor for high productivity and reliability in today's software development
– Reuse
• Exploring large software libraries is not easy
– Little support for searching components
– Consistent management by hand is difficult
Automated Component Library
• Collect software components eagerly, without preserving their inherent structures
• Analyze relations among components by using various analysis techniques
• Rank the components based on their significance (Component Rank Model)
• Answer the user's queries according to the rank
Component Graph
[Figure: components A to I of two systems, X and Y, drawn as nodes; each edge is a use relation between components.]
Weight of Nodes
[Figure: the component graph of systems X and Y with a weight on each node (values shown: 0.1, 0.1, 0.1, 0.2, 0.1, 0.05, 0.1, 0.2, 0.05).]
• Sum of all node weights = 1 ... (1)
• The weight of a node represents the significance of the node
Weights of Edges
[Figure: example nodes A and B; A's outgoing edges are each labelled d = 1/4, and the edge weights arriving at B add up to B's weight.]
• d: distribution ratio
• Node weight is distributed to each outgoing edge
• Edge weights are collected at the destination node
• Sum of all outgoing edge weights = origin node weight ... (2)
• Sum of all incoming edge weights = destination node weight ... (3)
Definition of Weights
• Under constraints (1)~(3), we have a simultaneous equation: $W = D^{t} W$
– $W$: node weight vector
– $D^{t}$: transposed matrix of distribution ratios
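
Written out per node, constraints (1) to (3) lead to the simultaneous equation as follows. The symbols $w_i$ (weight of node $i$), $e_{ij}$ (weight of the edge from $i$ to $j$), $d_{ij}$ (distribution ratio on that edge) and $n$ (number of nodes) are notation assumed here for illustration; they do not appear on the slide.

```latex
\begin{align*}
\sum_{i=1}^{n} w_i &= 1 && \text{(1)} \\
\sum_{j=1}^{n} e_{ij} &= w_i, \quad e_{ij} = d_{ij}\, w_i && \text{(2)} \\
\sum_{i=1}^{n} e_{ij} &= w_j && \text{(3)} \\
\Rightarrow\quad w_j &= \sum_{i=1}^{n} d_{ij}\, w_i \quad (j = 1, \dots, n)
\quad\Longleftrightarrow\quad W = D^{t} W
\end{align*}
```

Substituting the edge weights from (2) into (3) removes $e_{ij}$, which is why only the node weight vector and the distribution ratios remain.
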
Propagating Weights
[Figure: three-node graph A, B, C at the start of propagation; values shown: 0.34, 0.17, 0.33, 0.17, 0.33.]
Propagating Weights
[Figure: the same graph one step later; values shown: 0.33, 0.175, 0.17, 0.175, 0.17, 0.5.]
Propagating Weights
[Figure: the same graph after a further step; values shown: 0.5, 0.25, 0.175, 0.25, 0.345, 0.175, 0.345.]
Propagating Weights
[Figure: the same graph of nodes A, B, C in its final state; values shown: 0.4, 0.2, 0.2, 0.4.]
• Stable weight assignment: next-step weights are the same as the previous ones
• Component Rank: order of nodes sorted by weight
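
The propagation on these slides can be sketched as repeated redistribution of node weights until they stop changing. This is a minimal sketch, not the authors' implementation. The edge set (A uses B and C, B uses C, C uses A), the function names and the tolerance are all assumptions; this topology was picked because, with equal distribution ratios, it settles at the 0.4 / 0.2 / 0.4 values that appear on the last propagation slide.

```python
# Hypothetical example graph: A uses B and C, B uses C, C uses A.
NODES = ["A", "B", "C"]

# RATIOS[i][j] is the distribution ratio d from node i to node j.
# Each row sums to 1 (equal ratios over the outgoing edges).
RATIOS = [
    [0.0, 0.5, 0.5],  # A -> B, A -> C
    [0.0, 0.0, 1.0],  # B -> C
    [1.0, 0.0, 0.0],  # C -> A
]


def propagate(weights, ratios):
    """One step: split each node weight over its outgoing edges,
    then collect the edge weights at the destination nodes."""
    n = len(weights)
    collected = [0.0] * n
    for i in range(n):
        for j in range(n):
            collected[j] += ratios[i][j] * weights[i]
    return collected


def stable_weights(ratios, tol=1e-12, max_steps=10_000):
    """Repeat propagation from uniform weights until nothing changes."""
    n = len(ratios)
    weights = [1.0 / n] * n
    for _ in range(max_steps):
        following = propagate(weights, ratios)
        if max(abs(a - b) for a, b in zip(weights, following)) < tol:
            return following
        weights = following
    raise RuntimeError("weights did not stabilise")


def component_rank(nodes, weights):
    """Order nodes by descending weight."""
    return sorted(zip(nodes, weights), key=lambda pair: -pair[1])


if __name__ == "__main__":
    for name, weight in component_rank(NODES, stable_weights(RATIOS)):
        print(f"{name}: {weight:.3f}")
```

Starting from uniform weights is a convenience; for a graph like this one, any starting vector that sums to 1 reaches the same stable assignment.
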
Markov Model
[Figure: a component graph with small probability values on its nodes.]
• The component rank model can be considered a Markov chain of the user's focus
• The user's focus moves from one component to another along a use relation at fixed time intervals
• The node weight represents the probability that the user's focus is on that node in the infinite future
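
The Markov chain reading can be illustrated by simulating a single "focus" that walks along use relations and counting how often it lands on each component. This is an illustrative sketch only: the three-node graph, the function name and the step count are assumptions made here, not part of the slides.

```python
import random

# Hypothetical graph: each node maps to {used component: distribution ratio}.
USES = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}


def visit_frequencies(uses, start, steps, seed=0):
    """Move the focus along use relations and return the fraction of
    steps spent on each node."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    counts = {node: 0 for node in uses}
    focus = start
    for _ in range(steps):
        targets = list(uses[focus])
        ratios = [uses[focus][t] for t in targets]
        # Pick the next component with probability equal to the ratio.
        focus = rng.choices(targets, weights=ratios)[0]
        counts[focus] += 1
    return {node: count / steps for node, count in counts.items()}


if __name__ == "__main__":
    for node, share in visit_frequencies(USES, "A", 200_000).items():
        print(f"{node}: {share:.3f}")
```

For this graph the long-run shares approach the stable weights of the propagation view (about 0.4 for A and C, 0.2 for B), which is the sense in which a node weight is a probability of the user's focus.
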
Adjustment to Software Products (1): Pseudo Use Relation
[Figure: nodes A, B, C with pseudo edges added alongside the real use relations.]
• Weight computation does not always converge
• Add a pseudo edge from a node to another node if there is no 'real' edge between them
• Distribution ratios: pseudo edges << real edges
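
One way to build distribution ratios with pseudo edges is sketched below. The slide fixes only two things: pseudo edges go where no real edge exists, and they receive far smaller ratios than real edges. Everything else here is an assumption for illustration: the function name, the parameter `p` as the share given to real edges (0.85 is the value listed for the prototype), the equal split within each group, and the handling of nodes with no real edges. Self-references are ignored, and at least two nodes are assumed.

```python
def distribution_ratios(real_edges, n, p=0.85):
    """Return an n x n matrix of distribution ratios whose rows sum to 1.

    real_edges maps a node index to the set of node indices it uses.
    Assumed scheme: real edges of a node share p equally, pseudo edges
    to all remaining other nodes share 1 - p equally.
    """
    ratios = [[0.0] * n for _ in range(n)]
    for i in range(n):
        real = set(real_edges.get(i, ())) - {i}
        pseudo = set(range(n)) - real - {i}
        if real and pseudo:
            real_share, pseudo_share = p, 1.0 - p
        elif real:
            # Node already uses every other node: no pseudo edges needed.
            real_share, pseudo_share = 1.0, 0.0
        else:
            # Node uses nothing: all of its weight leaves via pseudo edges.
            real_share, pseudo_share = 0.0, 1.0
        for j in real:
            ratios[i][j] = real_share / len(real)
        for j in pseudo:
            ratios[i][j] = pseudo_share / len(pseudo)
    return ratios


if __name__ == "__main__":
    # Hypothetical graph: 0 uses 1 and 2, 1 uses 2, 2 uses nothing, 3 uses 0.
    example = {0: {1, 2}, 1: {2}, 2: set(), 3: {0}}
    for row in distribution_ratios(example, 4):
        print([round(x, 3) for x in row])
```

With pseudo edges every node has somewhere to send its weight and every node is reachable from every other, which is the property that lets the propagation settle.
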
Adjustment to Software Products (2): Clustering Components
[Figure: a component graph with nodes A to G, and the clustered component graph in which B and F are merged into cluster BF and A and D into cluster AD.]
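
Collapsing a component graph into a clustered component graph can be sketched as relabelling every edge by the clusters of its endpoints. The cluster names BF and AD follow the figure; the edges themselves, the function name, and the choice to drop edges inside a cluster and merge parallel edges are assumptions for illustration.

```python
def cluster_graph(edges, cluster_of):
    """Map component-level use relations to cluster-level use relations.

    edges is an iterable of (user, used) component pairs and cluster_of
    maps each component to its cluster name. Edges that end up inside a
    single cluster are dropped and duplicates collapse into one edge.
    """
    clustered = set()
    for user, used in edges:
        a, b = cluster_of[user], cluster_of[used]
        if a != b:
            clustered.add((a, b))
    return clustered


if __name__ == "__main__":
    # Hypothetical use relations; only the clusters come from the figure.
    cluster_of = {"A": "AD", "D": "AD", "B": "BF", "F": "BF",
                  "C": "C", "E": "E", "G": "G"}
    edges = [("A", "B"), ("D", "F"), ("B", "F"), ("C", "A"), ("G", "E")]
    print(sorted(cluster_graph(edges, cluster_of)))
```

In this example the relations A to B and D to F become a single AD to BF edge, so a duplicated pair of components does not count twice.
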
Prototype System
• Input: .java files (one .java file = one component)
• Measure similarity by SMMT (SMMT measures similarity by a clone detection technique)
• Cluster similar components
– Similarity criterion t = 0.8 (80% of statements are the same)
• Extract use relations
– inheritance
– method call
– attribute access
– abstract class implementation
• Construct the clustered component graph
• Compute node weights
– Weight ratio p between real and pseudo edges: 0.85
– Equal distribution ratios d to outgoing edges
• De-cluster to original components
• Output: component ranks
Experiment 1: JDK 1.3.0
• 575,000 lines, 1877 components
• 7 minutes on PC (Pentium IV, 2 GHz, 2 GB)
• Very general and core classes: ranked high
• Specific and independent classes: ranked low
rank   class name                      weight    note
1      java.lang.Object                0.16126   superclass of all classes
2      java.lang.Class                 0.08712
3      java.lang.Throwable             0.05510   superclass of any error or exception handler
4      java.lang.Exception             0.03103
5      java.io.IOException             0.01343
6      java.lang.StringBuffer          0.01214
7      java.lang.SecurityManager       0.01169
8      java.io.InputStream             0.01027
9      java.lang.reflect.Field         0.00948
10     java.lang.reflect.Constructor   0.00936
...
1256   sunw.util.EventListener         0.00011   these 622 classes (all at rank 1256) are not used by any other classes
...
Experiment 2: Collection of SE Tools and Libraries
– CK metrics measurement tools, component rank system
– ANTLR, JAMA, Caffe Cappuccino
– 582 components
rank   class name                                                  weight
1      antlr.Token                                                 0.10727
2      antlr.debug.Event                                           0.06189
2      antlr.debug.NewLineEvent                                    0.06189
4      antlr.collections.impl.Vector                               0.05434
5      jp.gr.java_conf.keisuken.text.html.HtmlParameter            0.05246
6      jp.gr.java_conf.keisuken.net.server.ServerProperties        0.03699
7      Jama.Matrix                                                 0.01564
8      jp.gr.java_conf.keisuken.util.IntegerArray                  0.01390
8      jp.gr.java_conf.keisuken.util.LongArray                     0.01390
10     jp.ac.osaka_u.es.ics.iip_lab.metrics.parser.IdentifierInfo  0.01365
...
418    cktool_new.examples.Main                                    0.00050
• The rank is an indicator of generality and specialty with respect to usage from other classes
Experiment 3: Application to Industry
• Daiwa Computer: a mid-size software company in Osaka
• Shared Java application framework for web-based data management
• Framework plus 5 applications built on the framework
– 1538 components, 339 clustered nodes
• Classes in the framework and definitions of data structures are ranked high
Experiment 4: Document Processing Tools and Libraries
• JEDIT, jext, Enhydra, saxon, phex, JDK, etc. (7171 components)
• Perform a string search by the grep command with keyword getNodetype, then sort the hits by rank
order(rank)   class name                               weight
1(67)         enhydra 3.1...dom.Node                   0.029110
2(169)        saxon 7_0...saxon.om.NodeInfo            0.000969
3(275)        saxon 7_0...saxon.pattern.NodeTest       0.000437
4(316)        enhydra 3.1...dom.DocumentImpl           0.000368
5(355)        saxon 7_0...saxon.pattern.Pattern        0.000324
6(382)        saxon 7_0...saxon.Controller             0.000296
7(437)        enhydra 3.1...xslt.XSLTEngineImpl        0.000241
8(446)        enhydra 3.1...dom.ElementImpl            0.000235
9(500)        saxon 7_0...saxon.style.StyleElement     0.000202
10(506)       saxon 7_0...saxon.tree.NodeImpl          0.000198
...
125(4441)     enhydra 3.1...FuncID                     0.000029
...
• Top-ranked hits: method definitions for obtaining node kinds in a DOM tree
• We can easily find the core definitions of classes
Discussion 1: Weight Computation
[Figure: the same graph of nodes A to E weighted under the Reference Count Model and under the Component Rank Model.]
• Reference Count Model: fragile to locally-made references, which may not be important globally
• Component Rank Model: more stable to local references
Discussion 2: Clustering Policy (1)
• Eliminate the effect of simply duplicated components
[Figure: original components A and B, a copy of them, and other components X and Y, before and after clustering.]
• Result: same weight arrangement as the case with no duplicated components
Discussion 2: Clustering Policy (2)
• Count only reused components that are not simple duplicates
[Figure: original components, a modified version of them, and other components, before and after clustering; the components shown are A, B, C, X and Y.]
• Result: A's weight is higher than the others
Discussion 3: Similarity Criterion and Pseudo Use Relation
• Similarity criterion t: 0.8
– Resulting ranks are fairly insensitive to t
– Some inherently different components fall into the same cluster if t is less than 0.8
• Pseudo use relation ratio p: 0.85
– Resulting ranks are stable for p between 0.75 and 0.95
Related Works
• Markov models of documentation traversal
– Influence Weight: impact factor of journal publications through incoming references
– PageRank: weight of HTML pages on the Internet through incoming web links
– Explicit use relations; no clustering (important for software products)
• Measurement of reusability of components or interfaces
– Use various characteristic metrics
– Indirect indicator of reusability
– Our approach directly reflects usage of components
SPARS-J: Software Product Archiving, Analyzing and Retrieving System for Java
[Figure: SPARS-J architecture with these parts: Internet / Corporate Repositories, Component Collector, Analyzer and Evaluator, Component Archive, Software Component Searcher, Query Handler.]
Conclusion & Future Work
• Component Rank: a novel ranking model for software components
• Prototype system for Java
• Application to various collections of Java programs: promising results
• Developing SPARS-J
• Statistical evaluation (recall & precision)
• Practical evaluation using SPARS-J
• Other models (weight distribution, similarity, ...)
END
Global Analysis of Software Data
[Figure: data collection, analysis and feedback across three sources: data on the Internet, subsidiary company data, and company-wide project data.]
Weight Computation by Eigenvector
• W is the eigenvector of eigenvalue 1: $W = D^{t} W$
– $W$: node weight vector
– $D^{t}$: transposed matrix of distribution ratios
– A math package for eigenvector computation can be used, but it is generally slower than the propagation computation