Finding Code Clones for Refactoring with Clone Metrics




Finding Code Clones for Refactoring with Clone Metrics: A Case Study of Open Source Software
Eunjong Choi†, Norihiro Yoshida‡, Takashi Ishio†, Katsuro Inoue†, and Tateki Sano*
†Osaka University, Japan
‡Nara Institute of Science and Technology, Japan
*NEC Corporation, Japan
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Contents
1. Background
2. Clone Metrics
3. Industrial Case Study
4. Case Study of Open Source Software
5. Summary and Future Work

Background: Clone
- Identical or similar code fragments in source code
- The presence of code clones
  - Indicates low maintainability of software
    - If a bug is found in a code clone, the other code clones have to be checked for the same defect.
Background: Refactoring [Fowler 1999] (1/2)
- Refactoring is a process of restructuring existing code.
  - Alter the software's internal structure without changing its external behavior
  - Improve the maintainability of software

[Fowler 1999] M. Fowler, et al., Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
Background: Refactoring [Fowler 1999] (2/2)
- Refactoring code clones
  - Merge code clones into a single program unit
  - (Figure: each code clone is replaced by a call statement to the merged unit)
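As a rough illustration of merging code clones into a single program unit, the sketch below shows two methods that once contained the same fragment and now each consist of a call statement to one extracted method. The class and method names (`ReportFormatter`, `normalize`, `formatTitle`, `formatAuthor`) are invented for this example and do not come from the slides.

```java
/** Hypothetical example: two former code clones now call one extracted method. */
final class ReportFormatter {

    // The merged program unit: the fragment that used to be duplicated.
    static String normalize(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.trim();
    }

    // Former clone 1, reduced to a call statement.
    static String formatTitle(String title) {
        return "Title: " + normalize(title);
    }

    // Former clone 2, reduced to a call statement.
    static String formatAuthor(String author) {
        return "Author: " + normalize(author);
    }

    public static void main(String[] args) {
        System.out.println(formatTitle("  Clone Metrics "));
        System.out.println(formatAuthor(null));
    }
}
```

A fix to the null handling now has to be made in `normalize` only, which is the maintainability benefit the slide refers to.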

Background: Language-dependent Code Clone
- Some code clones are unavoidable in source code
  - because of the specifications of the programming language used.

    replacement.setTaskType(taskType);
    replacement.setTaskName(taskName);
    replacement.setLocation(location);
    replacement.setOwningTarget(target);
    replacement.setRuntime(wrapper);
    wrapper.setProxy(replacement);

Example of a language-dependent code clone (consecutive setter invocations)

Background: Clone Set
- A set of code clones
- (Figure: Code Clone 1, Code Clone 2, and Code Clone 3 together form one clone set)
Background: Clone Metrics [Higo 2007]
- Quantitative information on clone sets
  - E.g., LEN(S), RNR(S), POP(S)
- Purposes
  - To check features of code clones in software
  - To extract code clones for several purposes
    - E.g., the code clones with the greatest length

[Higo 2007] Y. Higo, T. Kamiya, S. Kusumoto, K. Inoue, "Method and Implementation for Investigating Code Clones in a Software System", Information and Software Technology, pp. 985-998 (2007-9)

Clone Metrics: LEN(S)
- The average length of the token sequences of the code clones in a clone set S
- Example: the token sequence [a b b] is detected as a code clone, so for the resulting clone set S, LEN(S) = 3

Clone Metrics: RNR(S)
- The ratio of non-repeated token sequences of the code clones in a clone set S
- Used to eliminate language-dependent code clones
  - By keeping clone sets with a high RNR value

$\mathrm{RNR}(S) = \frac{\text{length of non-repeated token sequence}}{\text{length of whole token sequence}} \times 100$

Example for the clone set S on the slide: $\mathrm{RNR}(S) = \frac{1}{3} \times 100 = 33.3$

Clone Metrics: POP(S)
- The number of code clones in a clone set S
- Example: clone set S contains code clones 1, 2, and 3, so POP(S) = 3
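The three metrics can be sketched in a few lines of Java. This is only an illustration under stated assumptions: a code clone is modeled as an array of per-token flags saying whether the token lies in a repeated token sequence, and those flags are taken as given (in practice the clone detector decides them; the slides do not show how). The names `CloneMetricsSketch` and `CodeClone` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of LEN(S), RNR(S), and POP(S) for one clone set S. */
final class CloneMetricsSketch {

    /** One code clone; repeated[i] is true when token i lies in a repeated token sequence. */
    static final class CodeClone {
        private final boolean[] repeated;

        CodeClone(boolean... repeated) {
            this.repeated = repeated.clone();
        }

        int length() {
            return repeated.length;
        }

        int nonRepeatedLength() {
            int count = 0;
            for (boolean r : repeated) {
                if (!r) {
                    count++;
                }
            }
            return count;
        }
    }

    /** LEN(S): average token length of the code clones in S. */
    static double len(List<CodeClone> s) {
        return s.stream().mapToInt(CodeClone::length).average().orElse(0.0);
    }

    /** RNR(S): non-repeated tokens divided by all tokens, as a percentage. */
    static double rnr(List<CodeClone> s) {
        int whole = s.stream().mapToInt(CodeClone::length).sum();
        int nonRepeated = s.stream().mapToInt(CodeClone::nonRepeatedLength).sum();
        return whole == 0 ? 0.0 : 100.0 * nonRepeated / whole;
    }

    /** POP(S): number of code clones in S. */
    static int pop(List<CodeClone> s) {
        return s.size();
    }

    public static void main(String[] args) {
        // Three two-token clones; "true" marks a token inside a repeated sequence.
        List<CodeClone> s = Arrays.asList(
                new CodeClone(false, true),
                new CodeClone(true, true),
                new CodeClone(false, true));
        System.out.println("LEN(S) = " + len(s));
        System.out.println("RNR(S) = " + rnr(s));
        System.out.println("POP(S) = " + pop(s));
    }
}
```

In the sample clone set, 2 of the 6 tokens are non-repeated, so RNR(S) comes out at about 33.3, while LEN(S) is 2 and POP(S) is 3.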

Single Clone Metric (1/3)
- Clone sets whose LEN(S) is higher
  - They include many consecutive if (or if-else) blocks
    - that involve similar but different conditional expressions.

    if ((p = getProject().getProperty("ant.netrexxc.binary")) != null) {
        this.binary = Project.toBoolean(p);
    }
    // classpath makes no sense
    if ((p = getProject().getProperty("ant.netrexxc.comments")) != null) {
        this.comments = Project.toBoolean(p);
    }
    /* ... the last part is omitted ... */

Code clone in a clone set whose LEN(S) is the highest in Ant 1.7.0

Single Clone Metric (2/3)
- Clone sets whose RNR(S) is higher
  - They do not form a single semantic unit
    - Semantic unit: a group of instructions that together implement a single functionality

    else {
        // is the zip file in the cache
        ZipFile zipFile = (ZipFile) zipFiles.get(file);
        if (zipFile == null) {
            zipFile = new ZipFile(file);
            zipFiles.put(file, zipFile);
        }
        ZipEntry entry = zipFile.getEntry(resourceName);
        if (entry != null) {

Code clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0. The clone covers only a part of a semantic unit.

Single Clone Metric (3/3)
- Clone sets whose POP(S) is higher
  - They include many language-dependent code clones

    out.println("\">");
    out.println("");
    out.print("<!ELEMENT project (target | ");
    out.print(TASKS);
    out.print(" | ");
    out.print(TYPES);

Code clone in a clone set whose POP(S) is higher than others

Key Idea
- In our experience, it is not appropriate to extract code clones for refactoring using just a single clone metric
- We propose a method based on combined clone metrics
  - To overcome the weaknesses of single-metric-based extraction

Combined Clone Metrics
- Clone sets whose RNR(S) and POP(S) are higher
  - Each code clone forms a single semantic unit, so it is appropriate for refactoring

        if (ifProperty != null && p.getProperty(ifProperty) == null) {
            return false;
        } else if (unlessProperty != null && p.getProperty(unlessProperty) != null) {
            return false;
        }
        return true;
    }

Code clone in a clone set whose RNR(S) and POP(S) are higher than others

Industrial Case Study (1/2)
- Goal: validating our key idea
  - Using combined clone metrics is a feasible method to extract code clones for refactoring
- Target system
  - Industrial Java software developed by NEC
  - 110 KLOC, 736 clone sets

Industrial Case Study (2/2)
- Experimental steps
  1. Selected 62 clone sets from CCFinder's output using clone metrics.
  2. Conducted a survey about these clone sets and got feedback from a developer.
- (Figure: source files → CCFinder → clone sets → selection using clone metrics → survey → feedback)

Subject Code Clones (1/2)
- Clone sets for which any single clone metric value is high
  - S_LEN: clone sets whose LEN(S) value is in the top 10
  - S_RNR: clone sets whose RNR(S) value is in the top 10
  - S_POP: clone sets whose POP(S) value is in the top 10

Subject Code Clones (2/2)
- Clone sets whose combined clone metric values are high
  - S_LEN·RNR: 15 clone sets whose LEN(S) and RNR(S) values both rank in the top 15
  - S_LEN·POP: 7 clone sets whose LEN(S) and POP(S) values both rank in the top 15
  - S_RNR·POP: 18 clone sets whose RNR(S) and POP(S) values both rank in the top 15
  - S_LEN·RNR·POP: 1 clone set whose LEN(S), RNR(S), and POP(S) values all rank in the top 15
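One way to read the combined selection is as an intersection of per-metric rankings. The sketch below assumes that the cutoff is a plain count `k` of the highest-ranked clone sets per metric; the slides do not spell out how the cutoff or ties are handled, so treat this as an illustration only. The `CloneSet` container, the method names, and all sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: pick clone sets that rank high on two metrics at once. */
final class CombinedMetricSelection {

    /** Minimal container for a clone set id and its metric values. */
    static final class CloneSet {
        final String id;
        final double len;
        final double rnr;
        final int pop;

        CloneSet(String id, double len, double rnr, int pop) {
            this.id = id;
            this.len = len;
            this.rnr = rnr;
            this.pop = pop;
        }
    }

    /** Ids of the k clone sets with the highest value of one metric. */
    static Set<String> topK(List<CloneSet> all, ToDoubleFunction<CloneSet> metric, int k) {
        List<CloneSet> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.<CloneSet>comparingDouble(metric).reversed());
        Set<String> ids = new TreeSet<>();
        for (CloneSet c : sorted.subList(0, Math.min(k, sorted.size()))) {
            ids.add(c.id);
        }
        return ids;
    }

    /** Ids of the clone sets that are in the top k for both metrics. */
    static Set<String> topKForBoth(List<CloneSet> all, int k,
                                   ToDoubleFunction<CloneSet> first,
                                   ToDoubleFunction<CloneSet> second) {
        Set<String> ids = topK(all, first, k);
        ids.retainAll(topK(all, second, k));
        return ids;
    }

    public static void main(String[] args) {
        // Made-up metric values, for illustration only.
        List<CloneSet> all = Arrays.asList(
                new CloneSet("A", 40, 90, 2),
                new CloneSet("B", 35, 95, 8),
                new CloneSet("C", 10, 30, 9),
                new CloneSet("D", 12, 85, 7));
        System.out.println(topKForBoth(all, 2, c -> c.len, c -> c.rnr)); // [A, B]
        System.out.println(topKForBoth(all, 2, c -> c.rnr, c -> c.pop)); // [B]
    }
}
```

A selection on all three metrics would intersect a third ranking in the same way.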

In Survey: About Clone Set XXX
Q. Which practice is appropriate for this clone set?
[ ] Perform refactoring.
[ ] Write comments about code clones, but don't perform refactoring.
[ ] Change nothing.
[ ] Others. ( )

In Survey: About Clone Set XXX
Q. Which practice is appropriate for this clone set?
[√] Perform refactoring. (= appropriate for refactoring)
[ ] Write comments about code clones, but don't perform refactoring.
[ ] Change nothing.
[ ] Others. ( )

In Survey: About Clone Set XXX
Q. Which practice is appropriate for this clone set?
[ ] Perform refactoring.
[√] Write comments about code clones, but don't perform refactoring.
[√] Change nothing.
[√] Others. ( )
Any of the three checked answers = inappropriate for refactoring

Results of Case Study (1/2)

| Filtering | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| Each single clone metric | 30 | 14 | 0.47 |
| Combined clone metrics | 41 | 34 | 0.87 |

- #Selected Clone Sets: the number of selected clone sets
- #Refactoring: the number of clone sets marked as "Perform refactoring" in the survey

Results of Case Study (2/2)
- Precision: "How many refactoring candidates were accepted by a developer?"

  $\text{Precision} = \frac{\#\text{Refactoring}}{\#\text{Selected Clone Sets}}$

- Clone sets selected with combined clone metrics were accepted as refactoring candidates by the developer more often than those selected with a single clone metric
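The precision formula is a simple ratio. A minimal sketch, with the hypothetical names `PrecisionSketch` and `precision`, applied to the single-metric row of the industrial case study (14 accepted out of 30 selected):

```java
/** Hypothetical helper for the precision measure used in the case study. */
final class PrecisionSketch {

    /** Precision = #Refactoring / #Selected Clone Sets. */
    static double precision(int refactoring, int selected) {
        if (selected <= 0) {
            throw new IllegalArgumentException("selected must be positive");
        }
        return (double) refactoring / selected;
    }

    public static void main(String[] args) {
        // 14 of the 30 clone sets selected by a single metric were accepted.
        double p = precision(14, 30);
        System.out.println(Math.round(p * 100) / 100.0);
    }
}
```

Rounded to two decimals, 14 / 30 gives the 0.47 reported for single clone metrics.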

Case Study of Open Source Software
- Goal: validating our key idea
  - Using combined clone metrics is a feasible method to extract code clones for refactoring
  - Using open source software
- Experimental steps
  1. Selected clone sets from CCFinder's output using clone metrics.
  2. Checked whether the clone sets are appropriate for refactoring.

Target Systems
- Implemented in Java
  - Apache Ant: 198 KLOC, 998 clone sets
  - JBoss: 633 KLOC, 4284 clone sets

Subject Clone Sets
- Subject clone sets
  - Apache Ant: 87 clone sets
  - JBoss: 299 clone sets
- In both systems, the subject clone sets are
  - Clone sets for which any single clone metric value is in the top 10
  - Clone sets whose combined clone metric values rank in the top 15

Subject Code Clones (Apache Ant)

| Filtering | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| Each single clone metric | 30 | 6 | 0.20 |
| Combined clone metrics | 60 | 31 | 0.53 |

Subject Code Clones (JBoss)

| Filtering | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| Each single clone metric | 30 | 9 | 0.30 |
| Combined clone metrics | 298 | 76 | 0.25 |

Q. Why do the results differ between the software systems? Is it because the open source software does not enforce coding rules?

Analysis of Results: Defects of RNR Metric (1/2)
- The RNR metric sometimes extracts unintended code clones
  - E.g., language-dependent code clones

Analysis of Results: Defects of RNR Metric (2/2)

    lIndex = lReturn.indexOf( "*" );
    while( lIndex >= 0 ) {
        lReturn = ( lIndex > 0 ? lReturn.substring( 0, lIndex ) : "" ) + "%2a"
            + ( ( lIndex + 1 ) < lReturn.length() ? lReturn.substring( lIndex + 1 ) : "" );
        lIndex = lReturn.indexOf( "*" );
    }
    lIndex = lReturn.indexOf( ":" );
    while( lIndex >= 0 ) {
        lReturn = ( lIndex > 0 ? lReturn.substring( 0, lIndex ) : "" ) + "%3a"
            + ( ( lIndex + 1 ) < lReturn.length() ? lReturn.substring( lIndex + 1 ) : "" );
        lIndex = lReturn.indexOf( ":" );
    }

Code clone in a clone set whose LEN(S) and RNR(S) (= 96) values rank in the top 15 in JBoss

Analysis of Results: Defects of RNR Metric (2/2)
- Is the RNR value of this code clone really 96?


Analysis of Results: Defects of RNR Metric (2/2)
- Code clone in a clone set whose LEN(S) and RNR(S) (= 96) values rank in the top 15 in JBoss
- RNR value of this clone set
  - Code clone in a clone set whose LEN(S) and RNR(S) (= 50)

Summary and Future Work
- Summary
  - We conducted a case study to validate our key idea and discussed its results
- Future work
  - Update the metrics used
  - Investigate recall
  - Use more metrics
  - Conduct case studies of open source software

Thank You for Your Attention! 감사합니다. ありがとうございます

Example of a clone set that is not selected

    boolean isEqual(final DeweyDecimal other) {
        final int max = Math.max(other.components.length, components.length);
        for (int i = 0; i < max; i++) {
            final int component1 = (i < components.length) ? components[i] : 0;
            final int component2 = (i < other.components.length) ? other.components[i] : 0;
            if (

- It is too short to form a semantic unit.
- The RNR metric sometimes extracts unintended code clones
  - E.g., language-dependent code clones

Clone sets whose RNR(S) is higher than others
- Each code clone in a clone set S consists of more non-repeated token sequences

    /* Code clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */
    else {
        // is the zip file in the cache
        ZipFile zipFile = (ZipFile) zipFiles.get(file);
        if (zipFile == null) {
            zipFile = new ZipFile(file);
            zipFiles.put(file, zipFile);
        }
        ZipEntry entry = zipFile.getEntry(resourceName);
        if (entry != null) {
    /* … */

Clone sets whose RNR(S) is lower than others
- They consist of more repeated token sequences
  - They tend to be language-dependent code clones, e.g., consecutive variable declarations

    /* Code clone in a clone set whose RNR(S) is the lowest in Ant 1.7.0 */
    String sosCmdDir = null;
    /* ... code skipped ... */
    private String filename = null;
    private boolean noCompress = false;
    private boolean noCache = false;
    private boolean recursive = false;
    private boolean verbose = false;
    /* … */

Clone metric: RNR(S) (1/2)
- Files (a superscript * indicates that the token is in a repeated token sequence)
  - F1: a b c a b
  - F2: c c* c* a b
  - F3: d a b e f
  - F4: c c* d e f
- Clone set S1: {a b, a b, a b, a b}
- RNR(S1) of clone set S1 is

  $\mathrm{RNR}(S_1) = \frac{2 + 2 + 2 + 2}{2 + 2 + 2 + 2} \times 100 = 100$

Clone metric: RNR(S) (2/2)
- Files (a superscript * indicates that the token is in a repeated token sequence)
  - F1: a b c a b
  - F2: c c* c* a b
  - F3: d a b e f
  - F4: c c* d e f
- Clone set S2: {c c*, c* c*, c c*}
- RNR(S2) of clone set S2 is

  $\mathrm{RNR}(S_2) = \frac{1 + 0 + 1}{2 + 2 + 2} \times 100 = 33.3$

The Number of Duplicate Clone Sets (Industrial)
- |S_RNR ∩ S_POP ∩ S_RNR·POP| = 1
- |S_RNR ∩ S_RNR·POP| = 2
- |S_POP ∩ S_RNR·POP| = 2
- |S_LEN·RNR ∩ S_LEN·POP ∩ S_RNR·POP ∩ S_LEN·RNR·POP| = 1

CS Seminar, 2010/12/01

The Number of Duplicate Clone Sets (Apache Ant)
- |S_RNR ∩ S_RNR·POP| = 1
- |S_POP ∩ S_LEN·POP| = 1

The Number of Duplicate Clone Sets (JBoss)
- |S_RNR ∩ S_LEN·RNR| = 3
- |S_RNR ∩ S_RNR·POP| = 1
- |S_LEN·RNR ∩ S_LEN·POP ∩ S_RNR·POP ∩ S_LEN·RNR·POP| = 2

Clone set metrics
- LEN(C): length of the token sequence of each element in clone set C
- POP(C): number of elements in clone set C
- DFL(C): estimate of how many tokens would be removed from the source files if all code fragments of clone set C were replaced with caller statements of a new identical routine
  - (Figure: the code fragments are replaced by caller statements that invoke a new subroutine)
- RAD(C): distribution in the file system of the elements in clone set C

Results and precision of each clone set group in the survey

| Filtering | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| Clone sets whose LEN(S) value is in the top 10 | 10 | 7 | 0.70 |
| Clone sets whose RNR(S) value is in the top 10 | 10 | 4 | 0.40 |
| Clone sets whose POP(S) value is in the top 10 | 10 | 3 | 0.30 |
| Clone sets whose LEN(S) and RNR(S) values rank in the top 15 | 15 | 13 | 0.87 |
| Clone sets whose LEN(S) and POP(S) values rank in the top 15 | 7 | 6 | 0.86 |
| Clone sets whose RNR(S) and POP(S) values rank in the top 15 | 18 | 14 | 0.78 |
| Clone set whose LEN(S), RNR(S), and POP(S) values rank in the top 15 | 1 | 1 | 1.00 |

Subject Code Clones (Apache Ant)

| Clone Sets | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| S_LEN | 10 | 0 | 0.00 |
| S_RNR | 10 | 6 | 0.60 |
| S_POP | 10 | 0 | 0.00 |
| S_LEN·RNR | 8 | 6 | 0.75 |
| S_LEN·POP | 18 | 9 | 0.50 |
| S_RNR·POP | 34 | 16 | 0.47 |
| S_LEN·RNR·POP | - | - | - |

Subject Code Clones (JBoss)

| Clone Sets | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| S_LEN | 10 | 2 | 0.20 |
| S_RNR | 10 | 7 | 0.60 |
| S_POP | 10 | 0 | 0.00 |
| S_LEN·RNR | 63 | 37 | 0.59 |
| S_LEN·POP | 104 | 5 | 0.05 |
| S_RNR·POP | 129 | 32 | 0.25 |
| S_LEN·RNR·POP | 2 | 2 | 1.00 |