Cloned Buggy Code Detection in Practice Using Normalized Compression Distance
Takashi Ishio (NAIST, Japan)
Naoto Maeda, Kensuke Shibuya (NEC, Japan)
Katsuro Inoue (Osaka University, Japan)
ICSME 2018 Industry Track - Short Paper
Motivation: Cloned Bugs
• Similar source code fragments may contain the same mistake [Chou, 2001], [Pham, 2010], [Yue, 2017]
• An actual bug fix in a system (C#; identifiers are anonymized), in which break was replaced with continue:

    for (var i=0; i < row.Cells.Count; i++) {
        if (row.Cells[i].Value == null) {
            break;      // removed by the fix
            continue;   // added by the fix
        }
        ret.Add((string)row.Cells[i].Value);
    }

• Clones in the same system:

    for (var i=0; i < row.Cells.Count; i++) {
        if (row.Cells[i].Value == null) {
            break;
        }
        ret.Add((string)row.Cells[i].Value);
    }

    for (var i=0; i < row.Cells.Count; i++) {
        if (row.Cells[i].Value == null) {
            continue;
        }
        ret.Add((string)row.Cells[i].Value);
    }

Cloned Buggy Code Detection in Practice Using Normalized Compression Distance, ICSME 2018 Industry Track
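To see why this one-token fix matters, the Java sketch below mirrors the two variants of the C# loop. A plain List<Object> stands in for row.Cells, which is an assumption since the original type is not shown, and ClonedBugDemo and its methods are hypothetical names. It assumes, as the fix suggests, that values after an empty cell should still be collected: with break, collection stops at the first empty cell and later values are silently dropped; with continue, only the empty cell is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClonedBugDemo {
    // Buggy variant: stops at the first null cell (mirrors the 'break' clone).
    static List<String> collectWithBreak(List<Object> cells) {
        List<String> ret = new ArrayList<>();
        for (int i = 0; i < cells.size(); i++) {
            if (cells.get(i) == null) {
                break;
            }
            ret.add((String) cells.get(i));
        }
        return ret;
    }

    // Fixed variant: skips a null cell and keeps collecting (mirrors the 'continue' clone).
    static List<String> collectWithContinue(List<Object> cells) {
        List<String> ret = new ArrayList<>();
        for (int i = 0; i < cells.size(); i++) {
            if (cells.get(i) == null) {
                continue;
            }
            ret.add((String) cells.get(i));
        }
        return ret;
    }

    public static void main(String[] args) {
        List<Object> cells = Arrays.<Object>asList("a", null, "b");
        System.out.println(collectWithBreak(cells));    // prints [a]
        System.out.println(collectWithContinue(cells)); // prints [a, b]
    }
}
```

Both variants compile and look almost identical, which is why a fix applied to one copy is easily missed in its clones.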
Existing Tools
• Cloned Buggy Code Detector [Li, 2012]: detects clones of a query code fragment
  + Designed for the cloned bug problem
  – Built on program dependence analysis, so it is hard to support multiple languages (C/C++, C#, Java, and JavaScript)
• CCFinderX: detects all clones in a system
  + Supports popular languages
  + Already used by project managers in the company
  – Not so easy to use: training would be costly for 10,000+ developers
Our Tool = grep-like Clone Search
• Input: a query code fragment and a set of source files
• Every code fragment in the source files is compared with the query and assigned a distance d (the example shows fragments with d = 0.5, 0.4, 0.3, 0.1, and 0.7); a smaller d means a more similar fragment
• Like grep, the tool reports the fragments whose distance is within a threshold

Query:

    for (var i=0; i < row.Cells.Count; i++) {
        if (row.Cells[i].Value == null) {
            break;
        }
        ret.Add((string)row.Cells[i].Value);
    }

Example source file:

    void save() {
        List<string> ret = new List<string>();
        for (var i=0; i < row.Cells.Count; i++) {
            if (row.Cells[i].Value == null) {
                break;
            }
            ret.Add((string)row.Cells[i].Value);
        }
        TextWriter tw = new StreamWriter("SavedLists.txt");
        tw.WriteLine(ret);
        tw.Close();
    }
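The slide does not show how d is computed. The tool's name refers to Normalized Compression Distance, whose standard definition is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed size of s. The Java sketch below is a minimal illustration under stated assumptions: it uses DEFLATE (java.util.zip.Deflater) as the compressor C and slides a window with as many lines as the query over a file. NCDSearch's actual compressor, tokenization, and window strategy may differ; NcdSketch, ncd, and search are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class NcdSketch {
    // C(s): size in bytes of the DEFLATE-compressed input (assumed compressor).
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)); values near 0 mean similar.
    static double ncd(String x, String y) {
        int cx = compressedSize(x.getBytes(StandardCharsets.UTF_8));
        int cy = compressedSize(y.getBytes(StandardCharsets.UTF_8));
        int cxy = compressedSize((x + y).getBytes(StandardCharsets.UTF_8));
        return (double) (cxy - Math.min(cx, cy)) / Math.max(cx, cy);
    }

    // Hypothetical grep-like search: slide a window with the query's line count over
    // the file and return the 1-based start line of each window with NCD <= threshold.
    static List<Integer> search(String query, List<String> fileLines, double threshold) {
        List<Integer> hits = new ArrayList<>();
        int window = (int) query.lines().count();
        if (window == 0) {
            return hits; // an empty query matches nothing
        }
        for (int start = 0; start + window <= fileLines.size(); start++) {
            String fragment = String.join("\n", fileLines.subList(start, start + window));
            if (ncd(query, fragment) <= threshold) {
                hits.add(start + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String query = String.join("\n",
            "for (var i=0; i < row.Cells.Count; i++) {",
            "    if (row.Cells[i].Value == null) {",
            "        break;",
            "    }",
            "    ret.Add((string)row.Cells[i].Value);",
            "}");
        List<String> file = new ArrayList<>(List.of("void save() {",
            "    List<string> ret = new List<string>();"));
        file.addAll(List.of(query.split("\n")));
        file.add("}");
        // Start lines of the windows within th = 0.5; the exact copy starts at line 3.
        System.out.println(search(query, file, 0.5));
    }
}
```

An exact copy of the query yields a distance close to 0, while unrelated text approaches 1. Windows that overlap a match by most of their lines may also fall under the threshold, and for very short fragments the compressor's fixed overhead (header and checksum) can distort d, which is one reason the choice of threshold matters.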
Experiment
1. Benchmark-based evaluation with OSS
   – Actual fixes of cloned bugs in OSS [Li, 2012]

  Project     #Queries  #Bugs  Median #Files  Median #LOC
  PostgreSQL        14     34          1,058      277,959
  Git                5      8            261       67,028
  Linux             34     39         22,181    6,931,715
  Total             53     81        792,432  241,074,652
  (The Total row gives the sum over all queries, not a median.)

   – Baselines
     • Textual similarity: Normalized Levenshtein Distance on tokens
     • Code clone detection tools: CCFinderX and NiCad
2. Example-based evaluation in the company
   – Two actual bug fixes
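The slide names Normalized Levenshtein Distance on tokens as the textual-similarity baseline without defining it. The Java sketch below assumes a common definition: the token-level edit distance divided by the length of the longer token sequence, so the value lies in [0, 1]. The naive tokenizer (identifiers, numbers, and single non-space characters) and the names TokenLevenshtein, tokenize, levenshtein, and normalizedDistance are hypothetical; the tokenizer and normalization used in the experiment may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenLevenshtein {
    // Naive tokenizer: identifiers, numbers, or any single non-space character.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\\S");

    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Classic dynamic-programming edit distance over token sequences, two rows at a time.
    static int levenshtein(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.size(); j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()]; // after the final swap, prev holds the last row
    }

    // Assumed normalization: divide by the longer token sequence.
    static double normalizedDistance(String x, String y) {
        List<String> a = tokenize(x);
        List<String> b = tokenize(y);
        int longer = Math.max(a.size(), b.size());
        return longer == 0 ? 0.0 : (double) levenshtein(a, b) / longer;
    }

    public static void main(String[] args) {
        // 1 differing token out of 11, i.e. about 0.09
        System.out.println(normalizedDistance("if (v == null) { break; }",
                                              "if (v == null) { continue; }"));
    }
}
```

With this definition, the buggy and fixed loop bodies are far below the th = 0.5 threshold used in the experiment, since they differ by a single token.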
Accuracy
  Configuration                               #Report  Precision  Recall    MAP
  NCD (our tool, th = 0.5)                      8,107      0.010   1.000  0.741
  Normalized Levenshtein Distance (th = 0.5)  712,087     <0.001   0.988  0.742
  CCFinderX (50 tokens)                            70      0.629   0.728    N/A
  NiCad (Block, 3 lines)                           19      0.632     0.3    N/A
• Our tool identified all the clones of the bugs (recall 1.000)
  – Normalized Levenshtein Distance missed a corner case (recall 0.988) and reported far more fragments (712,087)
  – The clone detection tools are highly precise, but their recall is limited
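The accuracy table reports precision, recall, and MAP without defining them on the slide. The Java sketch below uses the textbook definitions, which are an assumption rather than the paper's exact procedure: precision and recall over the reported fragments, average precision over a list ranked from most to least similar, and MAP as the mean of average precision over queries. How the paper handles ties or bugs that are never reported may differ; RankingMetrics and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RankingMetrics {
    // Precision = relevant reported items / all reported items.
    static double precision(List<String> reported, Set<String> relevant) {
        if (reported.isEmpty()) return 0.0;
        long hits = reported.stream().filter(relevant::contains).count();
        return (double) hits / reported.size();
    }

    // Recall = relevant reported items / all relevant items.
    static double recall(List<String> reported, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        long hits = reported.stream().filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    // Average precision of a ranked list (most similar first): sum of precision@k
    // at each rank k holding a relevant item, divided by the number of relevant items.
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int k = 0; k < ranked.size(); k++) {
            if (relevant.contains(ranked.get(k))) {
                hits++;
                sum += (double) hits / (k + 1);
            }
        }
        return sum / relevant.size();
    }

    // MAP = mean of average precision over all queries.
    static double meanAveragePrecision(List<List<String>> rankings, List<Set<String>> relevantSets) {
        if (rankings.isEmpty()) return 0.0;
        double sum = 0.0;
        for (int q = 0; q < rankings.size(); q++) {
            sum += averagePrecision(rankings.get(q), relevantSets.get(q));
        }
        return sum / rankings.size();
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("a", "x", "b");
        Set<String> relevant = new HashSet<>(Arrays.asList("a", "b"));
        // (1/1 + 2/3) / 2, about 0.83
        System.out.println(averagePrecision(ranked, relevant));
    }
}
```

Under these definitions, a tool can have very low precision yet a high MAP when the buggy clones are ranked near the top of its report, which is consistent with NCD's precision of 0.010 and MAP of 0.741.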
Runtime Performance
  Configuration                               PostgreSQL    Git  Linux   Total time
  NCD (our tool, th = 0.5)                            48      7  1,130  12 h 26 min
  Normalized Levenshtein Distance (th = 0.5)           7      2    165   2 h 45 min
  CCFinderX (50 tokens)                               69     29  2,234  21 h 30 min
  NiCad (Block, 3 lines)                             117     59  2,498  24 h 53 min
  (PostgreSQL, Git, Linux: median time per query in seconds)
• Single-threaded, on an Intel Xeon 2.60 GHz machine with an SSD
• Practical performance
  – "20 minutes for 6 million LOC" is acceptable for managers
Example-based Evaluation in the Company
Incorrect loop implementation (C#):

    for (var i=0; i < row.Cells.Count; i++) {
        if (row.Cells[i].Value == null) {
            break;
        }
        ret.Add((string)row.Cells[i].Value);
    }

Memory leak (Visual C++):

    if ((*list)->count > 0) {
        i.Count = (*list)->count - 1;
        for (; i.Count >= 0; i.Count--) {
            if ((*list)->filename[i.Count] != NULL) {
                free((*list)->filename[i.Count]);
            }
        }
        free(*list);
    }

• Reported 13 clones for the two bugs in total
• The buggy clones were ranked as the most similar to the queries (more details are in our paper)
Lessons Learned: What Matters to End Users
1. Usability
   – A simple CLI and GUI
   – Potential users can try the tool by themselves
2. Actual examples from the company, not from OSS
   – Most potential end users asked for these first
3. Availability of the tool's source code
   – The tool must be verifiable for security
Summary and Future Work
• NCDSearch: a new code clone detection tool
  – Finds clones of a query code fragment
  – Very simple, yet with practical performance
  – Accepted by early adopters in the company
• https://github.com/takashi-ishio/NCDSearch
  – The tool, except for the GUI, is available
• Future work
  – Evaluate the long-term effect of the tool
  – Establish best practices for using the tool, e.g., how to write a good query and how to choose a threshold