Automated Source Code Changes Classification for Effective Code Review and Analysis
Evgeny G. Knyazev
Senior Developer, «Transas Technologies»
Postgraduate student, SPb State University of Information Technologies, Fine Mechanics and Optics

Source Code Review
  • An informal read-through of source code, looking for various kinds of problems in it

Source Code Review Helps to…
  • increase code quality
  • find errors at early stages
  • know all the code of a system
  • keep an eye on novices’ work

Source Control System and Code Changes Review
  • The source control system keeps the development history
  • This makes it possible to review only the changed code
  [Diagram labels: Developer, Change Request (Revision X), Source Control System, Review]

Code Change Review Example [figure]

Changes Review Task Complexity
  • In a large project, a lot of changes need to be reviewed

Project        Size, LOC       Observation period (~1 month)   Changes count
TortoiseSVN    ~200 thousand   22.09.2007 – 22.10.2007         215
Navi-Manager   ~250 thousand   22.09.2007 – 22.10.2007         72
KDE            ~4.7 million    17.09.2007 – 14.10.2007         11841 (!)

The Solution
  1. Split changes into classes
     ◦ New Functionality Implementation
     ◦ Code Deletion
     ◦ Cosmetics
     ◦ Refactoring
     ◦ Bugfix
  2. Choose a class for review

The Solution (2)
  3. Automate changes classification
  [Diagram: Source Control System → Automated Changes Classifier → Change Class → “Is this class interesting?” → Yes → Review by Developer]

Known Code Changes Classification Methods
  • Classification by change comments
     ◦ “bug”, “fixed” – a bug fix
     ◦ “implement”, “feature” – new feature implementation
  • Refactoring search using change metrics
     ◦ Extract parent class (DIT>0 and NOM<0, …)
     ◦ Move to other class (DIT=0 and NOM<0, …)
     ◦ Split method (NOM < T, ...)
  • Difference search in semantic graphs
     ◦ Build the code graph before and after the change
     ◦ Generate a transition script
     ◦ Search for refactoring templates
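The comment-based method can be illustrated with a minimal sketch. The four keywords are the ones listed on the slide; the function name `classify_comment`, the prefix-matching rule, and the "unknown" fallback are assumptions made for illustration, not part of any tool described in the talk.

```python
import re

# Keywords per class as listed on the slide. A real project would
# likely extend these lists; that extension is not shown here.
KEYWORDS = {
    "bugfix": ("bug", "fixed"),
    "new feature": ("implement", "feature"),
}


def classify_comment(comment: str) -> str:
    """Return the first class whose keyword starts a word in the comment."""
    words = re.findall(r"[a-z]+", comment.lower())
    for change_class, keywords in KEYWORDS.items():
        # Prefix match, so "implementation" matches "implement".
        if any(word.startswith(k) for word in words for k in keywords):
            return change_class
    return "unknown"  # assumed fallback when no keyword is present
```

A comment such as «Deleted an extra commit command.» contains none of the keywords and falls through to "unknown", which shows why comment text alone may not be enough to classify every change.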

Changes Metrics Clustering Method: Learning Phase
  1. Learning Set Preparation
  2. Expert Classification of Learning Set
  3. Change Metrics Calculation
  4. Change Metrics Vectors Clustering
  5. Mapping Clusters to Expert Classes
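The slides do not say how step 5 maps clusters to expert classes. One plausible reading is a majority vote: each cluster takes the class the expert assigned most often to its members. The sketch below assumes that rule; `map_clusters_to_classes` and the sample data in the usage note are hypothetical.

```python
from collections import Counter, defaultdict


def map_clusters_to_classes(cluster_ids, expert_labels):
    """Map each cluster to the expert class most frequent among its members.

    Majority voting is an assumption; the talk does not specify the rule.
    """
    votes = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, expert_labels):
        votes[cluster][label] += 1
    return {
        cluster: counts.most_common(1)[0][0]
        for cluster, counts in votes.items()
    }
```

For example, with hypothetical cluster ids `[1, 1, 2, 2, 2, 3]` and expert labels `["Refactoring", "Refactoring", "Code Deletion", "Bugfix", "Code Deletion", "Bugfix"]`, cluster 2 is mapped to "Code Deletion" because two of its three members carry that label.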

Fuzzy Change Metrics Clustering Algorithm [figure]
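The algorithm figure is not preserved in this transcript. As a rough illustration only, the sketch below computes membership degrees with the standard fuzzy c-means rule, u_i = 1 / Σ_j (d_i / d_j)^(2/(m-1)). Whether the talk uses exactly this rule is an assumption, and the function name, the fuzzifier default `m=2.0`, and the sample points are hypothetical.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster center.

    Assumes Euclidean distance and fuzzifier m > 1.
    """
    distances = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to it.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]
```

The memberships of one point always sum to 1, so a change vector lying between two centers is shared between the corresponding clusters instead of being forced into one of them.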

Changes Metrics Clustering Method: Changes Classification
  1. Changes Metrics Calculation
  2. Map Changes to Nearest Clusters of Learning Set
  3. Computation of Change Class by Cluster-Class Mapping, Built During Learning
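The three classification steps can be sketched as follows. The cluster-to-class mapping is the one reported for the Navi-Manager learning example in these slides; the cluster centers, the Euclidean distance, and the name `classify_change` are assumptions for illustration, since real centers come out of the learning phase.

```python
import math

# Cluster-to-class mapping from the Navi-Manager learning example.
CLUSTER_CLASS = {
    1: "Refactoring",
    2: "Code Deletion",
    3: "New Functionality Implementation",
    4: "Bugfix",
}

# Hypothetical cluster centers in (dCC, dCS, dELOC) space.
CENTERS = {
    1: (-2.0, 0.0, -5.0),
    2: (0.0, 0.0, -2.0),
    3: (6.0, 2.0, 50.0),
    4: (0.0, 0.0, 2.0),
}


def classify_change(delta_metrics):
    """Return (nearest cluster, class) for a change-metrics vector."""
    nearest = min(
        CENTERS, key=lambda c: math.dist(delta_metrics, CENTERS[c])
    )
    return nearest, CLUSTER_CLASS[nearest]
```

With these made-up centers, a change that only removes one effective line, `(0, 0, -1)`, lands in cluster 2 and is reported as "Code Deletion".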

Changes Metrics
  • Calculated as the difference between the metrics of two consecutive revisions
     ◦ ∆M = M_r – M_{r-1}
  • CC – Cyclomatic Complexity (number of linearly independent paths in the execution graph)
  • CS – number of Classes/Structures
  • eLOC – Effective Lines of Code (without empty and comment lines)
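A minimal sketch of the ∆M computation for the eLOC metric follows. It assumes that only `//` line comments need to be skipped, which is a simplification (block comments are not handled); the names `eloc` and `delta_metrics` and the sample sources are hypothetical.

```python
def eloc(source: str) -> int:
    """Count effective lines: non-blank lines that are not `//` comment lines."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            count += 1
    return count


def delta_metrics(previous: dict, current: dict) -> dict:
    """Change metrics per metric name: dM = M_r - M_(r-1)."""
    return {name: current[name] - previous[name] for name in current}


# Hypothetical revision r-1 and revision r of the same file.
before = "int a;\n\n// note\ncommit();\n"
after = "int a;\n\n// note\n"
change = delta_metrics({"eLOC": eloc(before)}, {"eLOC": eloc(after)})
```

Here `change` is `{"eLOC": -1}`: the revision removed one effective line, while the blank line and the comment do not count in either revision.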

Metrics Calculation and Clustering of Changes from Navi-Manager Project

Revision   eLOC   Nearest cluster   Change comment
16820      -5     1                 Vessel objects now merged in one transaction.
16833      -1     2                 Deleted an extra commit command.
17026      +18    3                 Full format of lat and lon during polling report.
17029      +1     4                 Set MessageSourceUpdateTime after processing of each change.
17038      -1     2                 Revert changes from r17029. There’s no need to update time after each message processed.
17107      +89    3                 Implementation of first version of vessel tracks loading from MonServer.

  CC and IC column values in extraction order (row assignment not recoverable): -2, 0, +4, 0, 0, +1, 0, +4, 0, 0, +4, +12

Fuzzy Clusters of Revisions Table

Revision   Cluster 1   Cluster 2   Cluster 3   Cluster 4
16820      0.78        0.14        0.00        0.08
16833      0.02        0.79        0.00        0.21
17026      0.32, 0.11, 0.36 (one value missing; column positions uncertain)
17029      0.03        0.30        0.00        0.67
17038      0.02        0.79        0.00        0.20
17107      0.11, 0.68, 0.11 (one value missing; column positions uncertain)

Method Learning Example
  • Project: Navi-Manager
  • Size of Learning Set: 29 changes
  • Number of Clusters: 4

Cluster   Expert Class
1         Refactoring
2         Code Deletion
3         New Functionality Implementation
4         Bugfix

Classification Example

Revision   Nearest class   Nearest cluster   Change comment
16820      Refactor.       1                 Vessel objects now merged in one transaction.
16833      Delete Func.    2                 Deleted an extra commit command.
17026      New Feature     3                 Full format of lat and lon during polling report.
17029      Bugfix          4                 Set MessageSourceUpdateTime after processing of each change.
17038      Del. Func.      2                 Revert changes from r17029. There’s no need to update time after each message processed.
17107      New Feature     3                 Implementation of first version of vessel tracks loading from MonServer.

Classification Fuzziness
  • Change r16833 «Deleted an extra commit command» is classified as:
     ◦ 2% refactoring
     ◦ 79% code deletion
     ◦ 0% new functionality implementation
     ◦ 20% bugfix
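The percentages above can be read as cluster membership degrees passed through the cluster-to-class mapping. The sketch below uses the r16833 degrees and the Navi-Manager mapping from these slides; summing the degrees per class and the name `class_memberships` are assumptions for illustration.

```python
# Cluster-to-class mapping from the Navi-Manager learning example.
CLUSTER_CLASS = {
    1: "Refactoring",
    2: "Code Deletion",
    3: "New Functionality Implementation",
    4: "Bugfix",
}


def class_memberships(cluster_memberships):
    """Sum cluster membership degrees per expert class."""
    result = {}
    for cluster, degree in cluster_memberships.items():
        change_class = CLUSTER_CLASS[cluster]
        result[change_class] = result.get(change_class, 0.0) + degree
    return result


# Membership degrees of r16833 as stated on the slide.
r16833 = class_memberships({1: 0.02, 2: 0.79, 3: 0.00, 4: 0.20})
top_class = max(r16833, key=r16833.get)
```

`top_class` is "Code Deletion", while the 20% bugfix share remains visible to a reviewer who wants to see borderline changes as well.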

Code Changes Classification in Software Development Process
  [Diagram labels: Project Manager, Dev. Team Leader, Developer, Testing Team Leader, Source Code]

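Changes Control During Important Development Phases
  • Deny classes of potentially destabilizing changes
  • Development phases (table columns): Main Dev Cycle, Stop Code, Code Freeze
  • Change classes (table rows): New Functionality Implementation, Code Deletion, Refactoring, Small Bugfixes, Critical Bugfixes
  • Allowed (+) / denied (–) marks as extracted: New Functionality Implementation + – –; remaining marks (row assignment not recoverable): + + – – – +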

Request List of Changes by Class
  • For example: request the list of refactorings done in a specific version X
  [Diagram: Dev Team Leader sends “Request Refactorings in Version X” to the Automated Source Code Changes Classifier; the classifier takes the “List of Changes in Version X” from the Source Control System and returns the “List of Refactorings in Version X”]

Project Statistics Analysis
  • Navi-Manager change statistics
     ◦ Small bugfixes: 70%
     ◦ Small new features + refactoring: 22%
     ◦ Big new features: 6%
     ◦ Unlabeled pie-chart slice: 2%
  • TortoiseSVN change statistics
     ◦ Bugfixes: 38%
     ◦ New functionality: 34%
     ◦ Refactoring: 28%

Achieved Results on Navi-Manager Project
  • Effectiveness
     ◦ More than 50% of code review time saved
  • Development problems discovered
     ◦ Too many bugfixes compared to new feature implementations

Automated Changes Classification Tool
  • Works with Subversion
  • Largely independent of the programming language
  • Calculates CC, CS, eLOC metrics
  • Discovers change classes:
     ◦ New feature implementation
     ◦ Code deletion
     ◦ Refactoring
     ◦ Cosmetic changes
     ◦ Bugfixes*

Future Research
  • Method improvements
     ◦ Gustafson-Kessel clustering
     ◦ Use of object and coupling metrics
  • Classification of refactorings
  • Wider application
     ◦ Use in the development process on a constant basis
     ◦ Adaptability analysis for different types of projects

Thank you! Any questions?
evgeny.knyazev@gmail.com
