Fuzzy Set and Cache-based Approach for Bug Triaging
Ahmed Y. Tamrawi
Electrical and Computer Engineering Department
Iowa State University
2011
Software Bugs
Definition (Software Bug): A common term used to describe a flaw, mistake, or failure in a computer system that produces an incorrect or unexpected result, or causes it to behave in unintended ways.
[Figure: the term “Bug” (September 9, 1947)]
• Bugs can occur in any software, ranging from operating systems and flight autopilot software to a simple arithmetic program.
• Software bugs cost about US$60 billion per year.
More Bugs
Bug Repository
• Software users and developers report bugs so that developers can fix them.
• Bugs are reported as bug reports, which are added to an issue tracking system or bug repository.
[Figure: bugs are reported through an interface and stored in the bug repository]
Bug Triaging
Definition (Bug Triaging): Assigning a bug to the most appropriate/capable developer who will fix it.
• Manual bug triaging is a difficult, expensive, and lengthy process, because the bug triager must read and analyze each newly reported bug and assign it to a fixer.
Bug Triaging
[Figure: new bug reports flow from the bug repository to the bug triager, who assigns them to software developers]
Bug Triaging
• Bug triager challenges:
– Knowledge about the system/project;
– Descriptiveness of the bug report;
– Rate of reporting bugs;
– Many developers, different projects, and varied expertise.
[Figure: Eclipse, Feb 2011]
• Why not automate the bug triaging process?
– Improve software quality;
– Reduce cost and time.
Example
Assigned to: James Moody
Summary: New Repository wizard follows implementation model, not user model.
Description: The new CVS Repository Connection wizard's layout is confusing. This is because it follows the implementation model of the order of fields in the full CVS location path rather than the user model...

Assigned to: James Moody
Summary: Opening repository resources doesn't honor type.
Description: Opening repository resource always open the default text editor and doesn't honor any mapping between resource types and editors. As a result it is not possible to view the contents of an image (*.gif file) in a sensible way...

Technical Aspect: Version Control Management (VCM)
This aspect is concerned with various Concurrent Versions System (CVS) repository features and operations within the Eclipse project.
Technical Aspects & Terms
• A software system has many technical aspects.
• Technical aspects are described via the technical terms extracted from software artifacts.
• A bug report describes issues related to technical aspects via its terms.
Automatic Bug Triaging
Key Philosophy for Automatic Bug Triaging: The developers who have the most bug-fixing capability/expertise with respect to the reported technical aspect(s) in a given bug report should be the fixer(s).
Problem Definition
Problem (Automatic Bug Assignment): In a software system, given a bug report B and a set of developers D who have past fixing activity, find the developer(s) with the most fixing expertise with respect to the reported technical aspect(s) in B.
[Figure: new bug report B, software developers, and the bug repository]
Bugzie Overview
• Bugzie treats the problem as a ranking problem.
– State-of-the-art approaches view it as a classification problem.
• For a bug report, Bugzie produces a ranked list of the developers most capable of fixing the reported issue(s).
Bugzie Overview
• Bugzie uses fuzzy set theory to rank the fixing expertise of developers toward the technical aspects.
• Bugzie models the association between a developer and technical aspects.
• A developer with a higher fixing association with a technical aspect has higher expertise and a higher rank for that aspect.
Association of Fixer & Term
Definition (Capable Fixer toward a Term): Ct is the fuzzy set of capable fixers for a term t.
• A developer with a higher membership score in Ct is more capable than one with a lower score in the issues related to t.
Association of Fixer & Term
• The membership score of a developer d toward a term t is defined over two sets of bug reports:
– Dd: bug reports that d has fixed.
– Dt: bug reports containing t.
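The formula itself did not survive on this slide. As a minimal sketch, the score can be read as an overlap ratio between Dd and Dt, |Dd ∩ Dt| / |Dd ∪ Dt|. That form is an assumption here, as are the function name and the sample report ids.

```python
def membership(fixed_by_d, containing_t):
    """Overlap ratio |Dd ∩ Dt| / |Dd ∪ Dt|.

    Assumed form: the slide's own formula is missing, so this is one
    reading that uses exactly the two sets the slide defines.
    """
    union = fixed_by_d | containing_t
    if not union:
        return 0.0  # no evidence at all: treat the association as zero
    return len(fixed_by_d & containing_t) / len(union)


# Hypothetical bug-report ids, for illustration only.
D_d = {"b1", "b2", "b3", "b4"}  # reports fixed by developer d
D_t = {"b3", "b4", "b5"}        # reports containing term t
score = membership(D_d, D_t)    # 2 shared reports out of 5 distinct ones
```

Under this reading the score lies between 0 (d never fixed a report containing t) and 1 (d fixed exactly the reports containing t).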
Association of Fixer & Bug Report
[Figure: bug report (B) with its terms t1, t2, ..., tn and the fuzzy set CB]
Association of Fixer & Bug Report
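The combination rule on this slide was lost in extraction. One common way to merge per-term fuzzy sets into a set for the whole report is the fuzzy union under the algebraic sum, 1 − ∏(1 − μt(d)). The sketch below assumes that operator; the function names, developer names, and scores are hypothetical.

```python
from math import prod


def report_membership(term_scores):
    """Fuzzy union (algebraic sum) of per-term scores: 1 - prod(1 - s).

    Assumption: the slide's formula is missing; this is a standard
    fuzzy-union operator, not necessarily the one on the slide.
    """
    return 1.0 - prod(1.0 - s for s in term_scores)


def rank_developers(scores_by_dev, top_n):
    """Rank developers by their combined score for one bug report."""
    ranked = sorted(
        scores_by_dev,
        key=lambda dev: report_membership(scores_by_dev[dev]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical per-term membership scores for the terms of one report.
scores = {
    "alice": [0.5, 0.5],  # combined: 0.75
    "bob": [0.9],         # combined: 0.90
    "carol": [0.1, 0.2],  # combined: 0.28
}
top_two = rank_developers(scores, top_n=2)
```

With this operator, one strongly associated term is enough to give a developer a high score for the report, and additional matching terms can only raise it.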
Bugzie Model
[Figure: Bugzie workflow: initial training from the bug repository, pre-processing of a bug report (B) into terms t1, t2, ..., tn, recommendation of a developer list, and updating]
Bugzie Caching
• Fixer candidates selection (Developers Caching).
• Significant terms selection (Terms Caching).
[Figure: initial training from the bug repository populates the Developers Cache F(x) and the Terms Cache T(k)]
Data Collection
• We collected all fixed bug reports from 7 bug repositories.
• For each bug report, we extracted and merged the summary and description.
• For each system, we pre-processed these reports: stemming, stop-word removal, etc.
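The pre-processing steps above can be sketched as follows. This is only an illustration: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a rough stand-in for a real stemmer, since the slides do not say which stemmer or stop-word list was used.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the one used in the study).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "not"}


def crude_stem(word):
    """Very rough suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(summary, description):
    """Merge summary and description, tokenize, drop stop words, stem."""
    text = f"{summary} {description}".lower()
    tokens = re.findall(r"[a-z]+", text)
    return [crude_stem(tok) for tok in tokens if tok not in STOP_WORDS]


terms = preprocess("Opening repository resources", "The editor is not opening")
```

The resulting list of terms is what the rest of the pipeline would treat as the content of the bug report.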
Locality of Fixing Activity
[Figure: bug reports placed on a timeline from 2006 to 2010]
Locality of Fixing Activity
Hypothesis (Locality of Fixing Activity): The recent fixing developers are likely to fix bug reports in the near future.
[Figure: fixing timeline from 2006 to 2010; of all developers who have been fixing before bug report B (fixed by d), the most recent x% form the Developers Cache F(x)]
Locality of Fixing Activity
[Figure: locality of fixing activity; annotated ranges 96% - 99% and 94% - 98%]
Selection of Fixer Candidates
• The locality of fixing activity suggests that the actual fixer of a given bug report is likely to be a developer with recent fixing activity.
• For each bug report, Bugzie chooses the top x% of developers, sorted by their most recent fixing time, as the fixer candidates F(x).
[Figure: fixing timeline from 2006 to 2010; of all developers who have been fixing before bug report B (fixed by d), the most recent x% form the Developers Cache F(x)]
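The candidate selection above can be sketched as a sort by last fixing time followed by a cut at x%. The function name, the rounding choice (round up, so a non-empty pool yields at least one candidate), and the sample data are assumptions for illustration.

```python
import math


def developers_cache(last_fix_time, x_percent):
    """Return F(x): the top x% of developers by most recent fixing time.

    last_fix_time maps a developer to the time of their latest fix
    (any comparable value, e.g. a timestamp or a year).
    """
    ranked = sorted(last_fix_time, key=last_fix_time.get, reverse=True)
    size = math.ceil(len(ranked) * x_percent / 100)  # rounding up is an assumption
    return ranked[:size]


# Hypothetical developers and the year of their latest fix.
latest = {"alice": 2010, "bob": 2007, "carol": 2009, "dave": 2006}
candidates = developers_cache(latest, x_percent=50)
```

Only the developers in `candidates` would then be scored and ranked for the incoming bug report, which is where the time savings come from.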
Developers Caching
[Figure: Bugzie workflow with the Developers Cache F(x): initial training from the bug repository, pre-processing of a bug report (B) into terms t1, t2, ..., tn, recommendation of a developer list, and updating]
Selection of Descriptive Terms
Terms Caching
[Figure: Bugzie workflow with the Terms Cache T(k): initial training from the bug repository, pre-processing of a bug report (B) into terms t1, t2, ..., tn, recommendation of a developer list, and updating]
Empirical Evaluation
• We evaluated Bugzie on our collected datasets.
• Experiments:
– Selection of fixer candidates;
– Selection of terms;
– Selection of developers and terms;
– Comparison with state-of-the-art approaches.
Experiment Setup
1. Bugzie uses frame 0 for initial training.
2. Using the training data, Bugzie recommends the top-n developers to fix bug report B.
3. Bugzie updates the training data with the tested bug report B, then moves to the next bug report.
Bugzie repeats steps 2 and 3 until it consumes all bug reports.
[Figure: creation timeline divided into frames 0 to 10, with the recommendation list for bug report B]
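The procedure above is an incremental train, test, update loop, sketched below. The `recommend` function is a deliberately simple placeholder (most frequent past fixers), not Bugzie's ranking; the frame layout and all names are hypothetical.

```python
from collections import Counter


def recommend(history, top_n):
    """Placeholder recommender: the most frequent past fixers.

    This is NOT Bugzie's ranking; it only makes the loop runnable.
    """
    counts = Counter(fixer for _, fixer in history)
    return [dev for dev, _ in counts.most_common(top_n)]


def evaluate(frames, top_n):
    """frames: lists of (report_id, actual_fixer), in creation order."""
    history = list(frames[0])  # step 1: frame 0 is used only for training
    hits_per_frame = []
    for frame in frames[1:]:
        hits = 0
        for report, fixer in frame:
            if fixer in recommend(history, top_n):  # step 2: recommend top-n
                hits += 1
            history.append((report, fixer))         # step 3: update with B
        hits_per_frame.append(hits)
    return hits_per_frame


# Hypothetical data: three frames of (report, fixer) pairs.
frames = [
    [("r1", "alice"), ("r2", "alice"), ("r3", "bob")],
    [("r4", "alice"), ("r5", "carol")],
    [("r6", "carol"), ("r7", "carol")],
]
hits_top1 = evaluate(frames, top_n=1)
```

Because each report is added to the history only after it has been tested, no recommendation ever sees its own answer.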
Prediction Accuracy
• If the recommendation list for a bug report contains its actual fixer, we count this as a hit (i.e., a correct recommendation).
• For each frame under test, we calculated the Prediction Accuracy (PA), the percentage of bug reports that are hits.
• For example, if we have 100 bugs and the actual fixer is in our Top-2 list for 60 of them, then the Top-2 prediction accuracy is 60%.
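The hit-counting rule above amounts to a top-n accuracy computation. The sketch below reproduces the slide's 100-bug example with made-up developer names; the function name is an assumption.

```python
def prediction_accuracy(recommendations, actual_fixers):
    """Fraction of bug reports whose actual fixer is in the recommended list."""
    if not actual_fixers:
        raise ValueError("no bug reports to evaluate")
    hits = sum(
        1 for recs, fixer in zip(recommendations, actual_fixers) if fixer in recs
    )
    return hits / len(actual_fixers)


# 100 bugs, each with a hypothetical Top-2 list; the actual fixer is in
# the list for 60 of them and missing for the other 40.
top2_lists = [["alice", "bob"]] * 100
actual = ["alice"] * 60 + ["carol"] * 40
pa = prediction_accuracy(top2_lists, actual)  # 0.6, i.e. 60%
```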
Selection of Fixer Candidates
[Figure: Bugzie workflow with the Developers Cache F(x), shown alongside the fixing timeline from 2006 to 2010 in which the most recent x% of developers who have been fixing before bug report B form F(x)]
Selection of Fixer Candidates
[Figure: two panels, Top-1 Prediction Accuracy and Top-5 Prediction Accuracy]
Firefox: at x = 10%, PA = 72.4%; at x = 100%, PA = 70.7%.
Selection of Fixer Candidates
• Selecting a suitable portion of recent fixers does not reduce accuracy much, and sometimes improves it, as in the cases of Firefox and Eclipse, among others.
• Selecting only a portion of the available developers as candidates also improves time efficiency.
Selection of Terms
[Figure: Bugzie workflow with the Terms Cache T(k): initial training from the bug repository, pre-processing of a bug report (B) into terms t1, t2, ..., tn, recommendation of a developer list, and updating]
Selection of Terms
[Figure: two panels, Top-1 Prediction Accuracy and Top-5 Prediction Accuracy, each with a marked peak range]
Eclipse: at k = 16, PA = 80%; at k = all terms, PA = 72%.
Selection of Terms
• Selecting terms can substantially improve prediction accuracy.
• The results suggest that a small yet significant set of terms per developer is enough to describe that developer's bug-fixing expertise.
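A per-developer terms cache T(k) can be sketched as keeping the k highest-scoring terms for each developer. Ranking terms by membership score is an assumption here, since the slides that described the selection criterion are empty; the names and scores are hypothetical.

```python
import heapq


def terms_cache(term_scores, k):
    """Keep the k terms with the highest score for one developer.

    term_scores maps a term to the developer's membership score for it.
    Selecting by membership score is an assumed criterion.
    """
    return heapq.nlargest(k, term_scores, key=term_scores.get)


# Hypothetical scores for one developer.
scores = {"cvs": 0.8, "repository": 0.6, "wizard": 0.3, "editor": 0.1}
significant = terms_cache(scores, k=2)
```

Only terms in a developer's cache would then contribute to that developer's score for a new bug report, so weakly associated terms no longer add noise.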
Selection of Developers & Terms
• We studied the combined impact of developer selection (x) and term selection (k).
[Figure: two panels, Eclipse and Firefox]
Selection of Developers & Terms
• Base: the base model with all developers and all terms.
• C.S.: Candidate Selection.
• T.S.: Terms Selection.
• Both: the best PA when applying both C.S. and T.S.
Comparison
• We compared Bugzie's results with state-of-the-art approaches.
• We used Weka to re-implement those approaches.
Comparison
• Some of the approaches (e.g., C4.5 decision trees) do not scale well to our dataset.
• We prepared a smaller dataset: 3-year histories taken from the full dataset.
Comparison Results
Time units: (d) days, (h) hours, (m) minutes, (s) seconds.
Conclusions
• Bugzie achieves higher accuracy and efficiency than state-of-the-art approaches.
• Bugzie can accommodate the locality of fixing activity and software evolution with flexible caching of developers and terms.
Thesis Contributions
• Bugzie, a scalable, fuzzy set and cache-based automatic bug triaging approach, which is significantly more efficient and accurate than existing state-of-the-art approaches.
• The finding of the locality of fixing activity.
• A comprehensive evaluation of the efficiency and correctness of Bugzie in comparison with state-of-the-art approaches.
• An observation/method to capture a small and significant set of terms describing developers’ bug-fixing expertise.
Future Work
• Use different caching mechanisms for developers and terms.
• Explore the use of other textual and non-textual contents of bug reports for bug triaging.
• Use other software artifacts to measure developers’ expertise more accurately.
Thank You!