Code Clone Analysis and Its Application
Katsuro Inoue

Software Engineering Lab, Osaka University

Clone Detection

What is a Code Clone?
- A code fragment that has identical or similar code fragments elsewhere in the source code
- Introduced into source code for various reasons
  - Code reuse by copy-and-paste
  - Stereotyped functions
    - e.g., file open, DB connect, …
  - Intentional iteration
    - Performance enhancement
- Makes software maintenance more difficult
  - If we modify a code clone that has many similar code fragments, we must consider whether each of them needs the same modification
    - It is easy to overlook some of them

Simple Example
```cpp
AFG::AFG(JaObject* obj) {
    objname = "afg";
    object = obj;
}

AFG::~AFG() {
    // Free every non-null child
    for (unsigned int i = 0; i < children.size(); i++)
        if (children[i] != NULL)
            delete children[i];
    // . . .
    // Clone of the loop above: only the container (nodes) differs
    for (unsigned int i = 0; i < nodes.size(); i++)
        if (nodes[i] != NULL)
            delete nodes[i];
}
```

Definition of Code Clone
- There is no single or generic definition of a code clone
  - Each researcher has their own definition, but there is a common understanding:
    - Type 1 clone: syntactic equivalence
    - Type 2 clone: parameterized syntactic equivalence
    - Type 3 clone: others (semantic equivalence, deleted/added, …)
- Various detection methods
  1. Line-based comparison (type 1)
  2. AST (Abstract Syntax Tree) based comparison (type 2, 3)
  3. PDG (Program Dependency Graph) based comparison (type 3)
  4. Metrics comparison (type 1, 2)
  5. Token-based comparison (type 2)
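As a minimal sketch of the type 1 / type 2 distinction, assuming a simple regex tokenizer for C-like code and a small illustrative keyword set (the names `tokenize`, `parameterize`, and `clone_type` are hypothetical, not part of any tool in these slides), two fragments can be compared token by token, first as-is and then with every user-defined identifier replaced by one placeholder. Type 3 clones are out of scope for this check.

```python
import re

# Hypothetical tokenizer for C-like code: string literals, identifiers,
# integer literals, and single non-space characters (operators, punctuation).
# Comments are not stripped in this sketch.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S')
IDENT_RE = re.compile(r"[A-Za-z_]\w*")

# Illustrative keyword subset (assumption); a real tool uses each language's full list.
KEYWORDS = {"for", "if", "else", "while", "return", "delete",
            "unsigned", "int", "NULL"}


def tokenize(code):
    """Split code into tokens, discarding whitespace."""
    return TOKEN_RE.findall(code)


def parameterize(tokens):
    """Replace every user-defined identifier with the same placeholder '$p'."""
    return ["$p" if IDENT_RE.fullmatch(t) and t not in KEYWORDS else t
            for t in tokens]


def clone_type(a, b):
    """Classify two fragments as a type 1 clone, a type 2 clone, or neither."""
    ta, tb = tokenize(a), tokenize(b)
    if ta == tb:
        return "type 1"        # same tokens; only layout differs
    if parameterize(ta) == parameterize(tb):
        return "type 2"        # same tokens up to identifier names
    return "not type 1/2"      # may still be a type 3 clone


children_loop = "for(unsigned int i = 0; i < children.size(); i++) if(children[i] != NULL) delete children[i];"
nodes_loop = "for(unsigned int i = 0; i < nodes.size(); i++) if(nodes[i] != NULL) delete nodes[i];"
print(clone_type(children_loop, nodes_loop))  # type 2
```

Applied to the two destructor loops from the simple example, the check reports a type 2 clone, because only the container name differs.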

Detection Method: Token-Based Comparison
- Compare the token sequences of source code and identify similar subsequences as code clones*
  - Before comparison, identifier tokens (type names, variable names, method names, …) are replaced by the same special token (parameterization)
- Scalability is very high
  - MLOC-scale source code in 5-20 min.

* T. Kamiya, S. Kusumoto, and K. Inoue, "CCFinder: A multi-linguistic token-based code clone detection system for large scale source code," IEEE Transactions on Software Engineering, vol. 28, no. 7, pp. 654-670, Jul. 2002.
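The comparison step can be sketched under stated assumptions: a hypothetical tokenizer and keyword subset, and a longest-common-substring dynamic program swapped in for clarity. CCFinder itself uses a suffix-tree algorithm, which scales far better than this O(n·m) table; the function names here are illustrative only.

```python
import re

# Hypothetical tokenizer: identifiers, integer literals, single punctuation chars.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"int", "for", "if", "while", "return"}  # illustrative subset


def parameterized_tokens(code):
    """Tokenize code and replace each user-defined identifier with '$p'."""
    out = []
    for tok in TOKEN_RE.findall(code):
        is_ident = tok[0].isalpha() or tok[0] == "_"
        out.append("$p" if is_ident and tok not in KEYWORDS else tok)
    return out


def longest_common_run(a, b):
    """Longest run of tokens shared by sequences a and b (dynamic programming).

    Returns (start_in_a, start_in_b, length); length 0 means no common token.
    Time is O(len(a) * len(b)); memory keeps only two rows.
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at (i-1, j-1)
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


a = parameterized_tokens("int total = 0; for (int k = 0; k < n; k++) total += v[k];")
b = parameterized_tokens("x = 5; int sum = 0; for (int i = 0; i < m; i++) sum += w[i];")
print(longest_common_run(a, b))  # (0, 4, 28): all of a matches b from token 4 on
```

Without parameterization the two loops would share only short runs of punctuation; after it, the whole 28-token fragment matches.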

CCFinder and Associated Tools

Clone Pair and Clone Set
- Clone pair: a pair of identical or similar code fragments
- Clone set: a set of identical or similar fragments
- Example with fragments C1-C5:
  - Clone pairs: (C1, C2), (C1, C4), (C2, C4), (C3, C5)
  - Clone sets: {C1, C2, C4}, {C3, C5}
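Clone sets can be derived from clone pairs by grouping fragments that are connected through pairs. A minimal union-find sketch, assuming each fragment is identified by a hashable label such as "C1" (the function name `clone_sets` is hypothetical). Treating the pair relation as transitive is itself an assumption: for similar but non-identical clones, two fragments in one set need not form a pair directly.

```python
def clone_sets(pairs):
    """Group clone pairs into clone sets (connected components) via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    for a, b in pairs:
        union(a, b)

    sets = {}
    for fragment in parent:
        sets.setdefault(find(fragment), set()).add(fragment)
    return list(sets.values())


pairs = [("C1", "C2"), ("C1", "C4"), ("C2", "C4"), ("C3", "C5")]
print(sorted(sorted(s) for s in clone_sets(pairs)))
# [['C1', 'C2', 'C4'], ['C3', 'C5']]
```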

Our Code Clone Research
- Develop tools
  - Detection tool: CCFinder
  - Visualization tool: Gemini
  - Refactoring support tool: Aries
  - Change support tool: Libra
  - CCFinderX
- Deliver our tools to domestic and overseas organizations/individuals
  - More than 5000 organizations use our tools!
- Promote academic-industrial collaboration
  - Organize code clone seminars
  - Manage mailing lists

Detection Tool: Development of CCFinder
- Developed in response to an industry requirement
  - Maintenance of a huge system
    - More than 10 MLOC, more than 20 years old
    - Code clones had been maintained by hand, but…
- CCFinder: a token-based clone detection tool
  - Normalization of name space
  - Parameterization of user-defined names
  - Removal of table initialization
  - Identification of module delimiters
  - Suffix-tree algorithm
- CCFinder can analyze a system of millions of lines in 5-30 min.
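The "removal of table initialization" step might look like the following heuristic sketch. The rule used here (drop a brace-enclosed list that directly follows `=` or `]`) and the function name are assumptions for illustration, not CCFinder's actual transformation rules; the presumed motivation is that long constant tables would otherwise show up as large but uninteresting clones.

```python
def remove_table_initializers(tokens):
    """Drop brace-enclosed initializer lists that directly follow '=' or ']'.

    Heuristic sketch: nested braces are skipped as one unit, and a '{' after
    ')' (a function or block body) is kept.
    """
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "{" and out and out[-1] in ("=", "]"):
            depth = 0
            while i < len(tokens):  # skip up to the matching '}'
                if tokens[i] == "{":
                    depth += 1
                elif tokens[i] == "}":
                    depth -= 1
                i += 1
                if depth == 0:
                    break
            continue
        out.append(tokens[i])
        i += 1
    return out


tokens = "int t [ ] = { 1 , 2 , { 3 , 4 } } ; x = y ;".split()
print(" ".join(remove_table_initializers(tokens)))  # int t [ ] = ; x = y ;
```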

Detection Tool: CCFinder Detection Process
Source files → Lexical analysis → Token sequence → Transformation → Transformed token sequence → Match detection → Clones on transformed sequence → Formatting → Clone pairs

Example source files:
```java
static void foo() throws RESyntaxException {
    String a[] = new String[] { "123,400", "abc", "orange 100" };
    org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
    int sum = 0;
    for (int i = 0; i < a.length; ++i)
        if (pat.match(a[i]))
            sum += Sample.parseNumber(pat.getParen(0));
    System.out.println("sum = " + sum);
}

// Same logic as foo(), with unqualified names and a different variable name
static void goo(String[] a) throws RESyntaxException {
    RE exp = new RE("[0-9,]+");
    int sum = 0;
    for (int i = 0; i < a.length; ++i)
        if (exp.match(a[i]))
            sum += parseNumber(exp.getParen(0));
    System.out.println("sum = " + sum);
}
```
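The four stages can be strung together in a small sketch. Assumptions: a regex tokenizer, a tiny Java keyword subset, and fixed-length window hashing in place of CCFinder's suffix tree for match detection; all function names are hypothetical. Name-space normalization is omitted, so in the example above `org.apache.regexp.RE` and `RE` would not match.

```python
import re
from collections import defaultdict

# Assumptions for illustration: regex tokenizer, tiny Java keyword subset.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"static", "void", "int", "for", "if", "new", "throws", "return"}


def lex(source):
    """Lexical analysis: (token, line number) for every token."""
    return [(tok, lineno)
            for lineno, line in enumerate(source.splitlines(), start=1)
            for tok in TOKEN_RE.findall(line)]


def transform(tokens):
    """Transformation: parameterize user-defined identifiers as '$p'."""
    return [("$p" if (t[0].isalpha() or t[0] == "_") and t not in KEYWORDS else t, ln)
            for t, ln in tokens]


def detect(tokens, min_tokens):
    """Match detection: maximal runs of >= min_tokens equal tokens.

    Every window of min_tokens tokens is hashed (as a tuple key); matching
    windows at consecutive offsets are merged into one run.
    """
    seen = defaultdict(list)
    matched = set()
    for i in range(len(tokens) - min_tokens + 1):
        key = tuple(t for t, _ in tokens[i:i + min_tokens])
        for j in seen[key]:
            matched.add((j, i))
        seen[key].append(i)
    runs = []
    for j, i in sorted(matched):
        if (j - 1, i - 1) in matched:
            continue  # already covered by an earlier run
        k = 0
        while (j + k + 1, i + k + 1) in matched:
            k += 1
        runs.append((j, i, k + min_tokens))  # (start_a, start_b, length)
    return runs


def format_pairs(tokens, runs):
    """Formatting: map token runs back to (first line, last line) ranges."""
    return [((tokens[j][1], tokens[j + n - 1][1]), (tokens[i][1], tokens[i + n - 1][1]))
            for j, i, n in runs]


def find_clones(source, min_tokens=20):
    tokens = transform(lex(source))
    return format_pairs(tokens, detect(tokens, min_tokens))


SOURCE = """int f(int a) {
  int s = 0;
  for (int i = 0; i < a; i++) s += i;
  return s;
}
int g(int b) {
  int t = 0;
  for (int j = 0; j < b; j++) t += j;
  return t;
}"""
print(find_clones(SOURCE))  # [((1, 5), (6, 10))]
```

Overlapping self-matches in highly repetitive code are not filtered in this sketch.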

Suffix Tree
- A suffix tree is a tree that satisfies the following conditions:
  1. Each leaf node represents the starting position of a suffix.
  2. Each path from the root node to a leaf node spells out that suffix.
  3. The labels of the edges leaving any one node all start with different characters.
- → A path shared by several suffixes marks a token subsequence that occurs more than once, i.e., a clone
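An uncompressed suffix trie makes the "common path" idea concrete: every branching node at depth of at least `min_len` is a token subsequence that occurs at least twice, and the leaves below it give its start positions. This is a naive O(n²) sketch with hypothetical names, useful only for small inputs; a real suffix tree compresses unary paths and can be built in linear time (e.g., with Ukkonen's algorithm).

```python
END = object()  # unique terminator, so every suffix ends at its own leaf


def build_suffix_trie(seq):
    """Naive, uncompressed suffix trie of a token sequence (O(n^2) nodes)."""
    root = {"children": {}, "leaves": []}
    seq = list(seq) + [END]
    for start in range(len(seq)):
        node = root
        for tok in seq[start:]:
            node = node["children"].setdefault(tok, {"children": {}, "leaves": []})
        node["leaves"].append(start)  # leaf = starting position of this suffix
    return root


def repeated_runs(root, min_len):
    """(length, start positions) for each branching node at depth >= min_len.

    A branching node is a path shared by two or more suffixes, i.e. a token
    subsequence that occurs at least twice: a clone candidate.
    """
    found = []

    def walk(node, depth):
        positions = list(node["leaves"])
        for child in node["children"].values():
            positions += walk(child, depth + 1)
        if depth >= min_len and len(node["children"]) >= 2:
            found.append((depth, sorted(positions)))
        return positions

    walk(root, 0)
    return found


tokens = "a b c x a b c y".split()
print(repeated_runs(build_suffix_trie(tokens), 3))  # [(3, [0, 4])]
```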

Visualization Tool: Gemini
- Visualizes code clones detected by CCFinder
  - CCFinder outputs the detection result as text
- Provides interactive analyses of code clones
  - Scatter plot
  - Clone metrics
  - File metrics
- Filters out unimportant code clones
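A text-mode sketch of the scatter-plot idea, assuming clone runs are given as (start_a, start_b, length) token offsets within one token sequence (a hypothetical format, not CCFinder's output format). Both axes are token positions, and each clone pair appears as a diagonal segment mirrored across the main diagonal. Gemini's actual interactive plot is not reproduced here.

```python
def scatter_plot(n_tokens, runs):
    """Render clone runs over one token sequence as an ASCII dot plot.

    runs: (start_a, start_b, length) tuples of token offsets (hypothetical format).
    Row and column indices are token positions; '*' marks token pairs inside a
    clone pair, and '\\' marks the trivial main diagonal.
    """
    grid = [["."] * n_tokens for _ in range(n_tokens)]
    for i in range(n_tokens):
        grid[i][i] = "\\"
    for a, b, length in runs:
        for k in range(length):
            grid[a + k][b + k] = "*"
            grid[b + k][a + k] = "*"  # mirror: the plot is symmetric
    return ["".join(row) for row in grid]


for row in scatter_plot(6, [(0, 3, 3)]):
    print(row)
```

For a clone pair between tokens 0-2 and 3-5, the output shows two short diagonal segments of `*` on either side of the main diagonal.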

Applications

Case Studies
- Open source software
  - FreeBSD, NetBSD, Linux (C, 7 MLOC)
  - JDK libraries (Java, 1.8 MLOC)
  - Qt (C++, 240 KLOC)
- Commercial software (more than 100 companies)
  - IPA/SEC, NTT Data Corp., Hitachi Ltd., Hitachi GP, Hitachi SAS, NEC Soft Ltd., ASTEC Inc., SRA Inc., JAXA, Daiwa Computer, etc.
- Student exercises at Osaka University
- Court evidence in a software copyright lawsuit
- …

Case Study 1: Similarity between FreeBSD, NetBSD, and Linux
- Result
  - There are many code clones between FreeBSD and NetBSD
  - There are only a few code clones between Linux and FreeBSD/NetBSD
- Their histories explain the result
  - FreeBSD and NetBSD share the same ancestors
  - Linux was written from scratch

History of BSD Unix OS

Cluster Analysis Using Clone Ratio as Similarity Measure
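The slides do not give the clone-ratio formula, so the following sketch assumes one: the share of two systems' tokens that lie in clones shared between them. It then groups systems by single-linkage agglomerative clustering with a similarity threshold. Both the formula and the function names are hypothetical, and the numbers in the usage example are toy values, not measured results.

```python
def clone_ratio(covered_x, covered_y, size_x, size_y):
    """Assumed clone ratio: share of X's and Y's tokens lying in clones shared by X and Y."""
    return (covered_x + covered_y) / (size_x + size_y)


def single_linkage(names, sim, threshold):
    """Agglomerative clustering: merge the most similar clusters while sim >= threshold.

    sim maps frozenset({a, b}) -> similarity; a cluster pair's similarity is the
    maximum over its member pairs (single linkage).
    """
    clusters = [{n} for n in names]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = max(sim[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] |= clusters.pop(j)  # j > i, so index i is unaffected
    return clusters


# Toy values for illustration only, not measured results.
sim = {frozenset(("A", "B")): clone_ratio(30, 20, 50, 50),  # 0.5
       frozenset(("A", "C")): 0.05,
       frozenset(("B", "C")): 0.04}
print(single_linkage(["A", "B", "C"], sim, threshold=0.3))
```

With these toy values, a threshold of 0.3 yields the clusters {A, B} and {C}.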

Case Study 2: Student Exercise
- Target
  - Programs developed in a programming exercise at Osaka University
    - A simple compiler for Pascal, written in C
    - The exercise consists of 3 steps
      - STEP 1: develop a syntax checker
      - STEP 2: develop a semantic checker by extending his/her syntax checker
      - STEP 3: develop a complete compiler by extending his/her semantic checker
- Purpose
  - Check the stepwise development
  - Check for plagiarism

Result
- There were a lot of code clones between S2 and S5
- We did not use the detection result for evaluating the students' exercises

[Figure: clone analysis results for students S1, S2, S3, S4, S5]

Case Study 3: IPA/SEC Advanced Project
- Target
  - A car-traffic information system using heterogeneous sensors, developed by 5 Japanese companies
  - The project manager had little knowledge of the source code, since each company developed its components independently
- Purpose
  - Grasp the features of black-boxed source code
- Approach
  - Analyzed twice: after the unit test (280,000 LOC) and after the combined test (300,000 LOC)
  - The minimum size of a detected code clone is 30 tokens

Case Study 3: Scatter Plot Analysis
- Scatter plot of company X
- In part A, there are many uninteresting code clones
  - Debug output code (consecutive printf statements)
  - Data validity checks
  - Consecutive if statements
- In part B, there are many code clones across directories
  - This part handles vehicle position information
  - Each directory covers a single kind of vehicle, e.g., taxi, bus, or truck
  - The logical structures are mostly the same

Handling Huge Targets

1. Distributed Code-Clone Analysis
- An embarrassingly parallel problem
- D-CCFinder (Distributed CCFinder)
- Virtual PC cluster with 80 lab machines
- Each tile is a task handled by a single CCFinder run
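A minimal sketch of the tiling scheme, assuming the file list is split round-robin into k chunks and tile (i, j), i ≤ j, covers chunks i and j together, which gives k(k+1)/2 independent tasks. `detect` stands in for one CCFinder run and is a hypothetical callable; a local thread pool stands in for the 80-machine cluster. Tiles with i ≠ j also rediscover clones inside each of their chunks, so merged results need deduplication.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations_with_replacement


def make_tiles(files, k):
    """Split files round-robin into k chunks; tile (i, j), i <= j, covers chunks i and j."""
    chunks = [files[c::k] for c in range(k)]
    return [(i, j, chunks[i] + (chunks[j] if i != j else []))
            for i, j in combinations_with_replacement(range(k), 2)]


def run_tiles(tiles, detect, workers=4):
    """Run detect(file_list) once per tile; tiles are independent, so they run in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda tile: detect(tile[2]), tiles))


files = [f"src/file{n}.c" for n in range(10)]
tiles = make_tiles(files, k=4)
print(len(tiles))                         # 10 tiles = 4 * 5 / 2
print(run_tiles(tiles, detect=len)[:3])   # stand-in detector (file count): [3, 6, 5]
```

Every pair of files lands together in at least one tile, so no cross-file clone is missed.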

Result of FreeBSD Ports Collection (10.8 GB / 403 MLOC in C)
Livieri, S., Higo, Y., Matsushita, M., Inoue, K., "Very-Large Scale Code Clone Analysis and Visualization of Open Source Programs Using Distributed CCFinder: D-CCFinder", International Conference on Software Engineering, Minneapolis, MN (May 2007, to appear)

Result of 136 Linux Kernels (7.4 GB / 260 MLOC in C)

2. File Clone Finder: FCFinder
- Efficiently finds only file copies
  - Ignoring differences in comments and spacing
- Uses a tokenization and hashing technique
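A minimal sketch of file-copy detection by tokenization and hashing, assuming C-style comments and a SHA-1 digest of the token sequence. Both are assumptions; the slides do not say which comment syntax or hash FCFinder uses, and the function names are hypothetical. Comment markers inside string literals, such as "http://", are not handled.

```python
import hashlib
import re
from collections import defaultdict

# C-style comments; markers inside string literals are not handled (assumption).
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S')


def fingerprint(source):
    """SHA-1 of the token sequence, ignoring comments and spacing."""
    tokens = TOKEN_RE.findall(COMMENT_RE.sub(" ", source))
    return hashlib.sha1("\n".join(tokens).encode("utf-8")).hexdigest()


def file_clones(files):
    """Groups of file names whose normalized token sequences are identical.

    files: dict mapping file name -> source text.
    """
    groups = defaultdict(list)
    for name, text in files.items():
        groups[fingerprint(text)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


files = {
    "a.c": "int main() { return 0; } /* original */",
    "b.c": "int main(){\n  // copied\n  return 0;\n}\n",
    "c.c": "int main() { return 1; }",
}
print(file_clones(files))  # [['a.c', 'b.c']]
```

Because each file is reduced to one digest, grouping is a single pass over the files instead of a pairwise comparison.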

Analysis for FreeBSD Ports Collection
- 14.6 hours for a 7 GB target

Analysis for FreeBSD Ports Collection (2)

Summary

Conclusion
- We have developed code clone analysis tools
  - CCFinder family
    - CCFinder, CCFinderX, Gemini, …
  - Scalable tools
    - D-CCFinder, Yocca, FCFinder
- We have promoted academic-industrial collaboration
  - Applied to many industrial practices

5th International Workshop on Software Clones (IWSC 2011)
- In conjunction with the 33rd International Conference on Software Engineering (ICSE 2011)
- May 2011 @ Honolulu, Hawaii