Revisiting Software Development Effort Estimation Based on Early Phase Development Activities
MSR 2013
Masateru Tsunoda, Koji Toda, Kyohei Fushida, Yasutaka Kamei, Meiyappan Nagappan, Naoyasu Ubayashi
(Kinki University, Japan; Fukuoka Institute of Technology, Japan; NTT DATA Corporation, Japan; Kyushu University, Japan; Queen's University, Canada)
Effort is estimated using software size
- Total effort is estimated from software size in order to decide staffing and schedule.
  - High accuracy is needed to avoid project failure.
  - Example: estimated effort = 4 person-months; with 2 developers, the schedule is 2 months.
- Linear regression is one of the common methods for building the model [1].
  - Size is determined by a method such as function point analysis.
  - Example model: effort (estimated) = 1.5 + 0.069 × size

[1] L. Briand, T. Langley, and I. Wieczorek, "A replicated assessment and comparison of common software cost modeling techniques," in Proc. of International Conference on Software Engineering (ICSE), pp. 377–386, Limerick, Ireland, June 2000.
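The size-based model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`fit_line`, `estimate_effort`, `schedule_months`) are hypothetical, the default coefficients are the slide's example values (whose units the slide does not state), and plain ordinary least squares is assumed as the fitting method.

```python
def fit_line(sizes, efforts):
    """Fit effort = intercept + slope * size by ordinary least squares.

    Assumption: plain OLS on untransformed values; the slide only says
    "linear regression" and does not specify any transformation.
    """
    n = len(sizes)
    if n < 2 or n != len(efforts):
        raise ValueError("need at least two (size, effort) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(efforts) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, efforts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_effort(size, intercept=1.5, slope=0.069):
    """Apply the slide's example model: effort = 1.5 + 0.069 * size."""
    return intercept + slope * size


def schedule_months(effort_person_months, developers):
    """Convert estimated effort into calendar months for a given team size."""
    if developers <= 0:
        raise ValueError("developers must be positive")
    return effort_person_months / developers


if __name__ == "__main__":
    # The slide's staffing example: 4 person-months shared by 2 developers.
    print(schedule_months(4, 2))
```

Dividing effort by head count assumes the work splits evenly across developers, which is the simplification the slide's staffing example also makes.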
Estimation using early phase activity
- Practitioners use the ratio of early phase effort to the effort of the whole project.
  - Example: effort (estimated) = 3 × early phase effort
  - The effort spent before the design phase (the estimation timing) is 33% of total effort on average.
- Which shows higher estimation accuracy?
  - effort (estimated) = 3 × early phase effort
  - effort (estimated) = 1.5 + 0.069 × size
[Figure: project timeline marking the estimation timing and the estimated total effort.]
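The ratio rule above amounts to dividing the observed early phase effort by its average share of total effort. A minimal sketch, assuming a hypothetical helper name `estimate_total_from_early` and the slide's 33% average share (rounded to one third, which gives the multiplier of 3):

```python
def estimate_total_from_early(early_effort, early_share=1 / 3):
    """Estimate total effort from effort already spent in the early phase.

    early_share is the fraction of total effort that the early phase
    consumes on average; 1/3 mirrors the slide's "33% on avg." figure.
    """
    if not 0 < early_share <= 1:
        raise ValueError("early_share must be in (0, 1]")
    if early_effort < 0:
        raise ValueError("early_effort must be non-negative")
    return early_effort / early_share


if __name__ == "__main__":
    # 100 hours spent before design implies roughly 300 hours in total.
    print(round(estimate_total_from_early(100.0)))
```

An organization would replace the default share with the average observed in its own past projects.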
What should be clarified?
- RQ 1: When a model is built using either software size or early phase effort, which shows higher accuracy?
  - Collecting data requires effort, so some organizations do not have detailed data.
- RQ 2: When other variables are added to the models from RQ 1, which model shows higher accuracy?
[Figure: two models, one estimating total effort from software size and one from early phase effort; for RQ 2, programming language is added as an input to each.]
What should be clarified? (contd.)
- RQ 3: When both software size and early phase effort are used, is estimation accuracy improved? Does multicollinearity arise from using them together?
[Figure: one model estimating total effort from both software size and early phase effort.]
Built models using ISBSG dataset
- 118 projects (data points) from the ISBSG dataset
  - Collected by ISBSG from organizations in 20 countries.
- Definition of early phase effort
  - Planning-and-analysis effort
- Built models (explanatory variables)
  - FP (software size): baseline
  - planning effort
  - planning-and-analysis effort
  - planning effort, FP
  - planning-and-analysis effort, FP
How to compare estimation accuracy
- Accuracy was evaluated by the difference in average balanced relative error (BRE) from the baseline (the FP model).
  - effort = a × FP (baseline): average BRE = 100%
  - effort = b × planning effort: average BRE = 50%
  - Difference of average BRE = 100% - 50% = 50% (improved)
- Example of BRE
  - Actual effort: 100 hours; estimated effort: 50 hours
  - BRE = |100 - 50| / 50 = 100%
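The BRE example and the comparison against the baseline can be reproduced with a short sketch. It assumes the usual definition of BRE, the absolute error divided by the smaller of the actual and estimated values, which matches the slide's worked example; the function names are hypothetical.

```python
def bre(actual, estimated):
    """Balanced relative error: |actual - estimated| / min(actual, estimated).

    Dividing by the smaller value penalizes under- and overestimation
    symmetrically (50 vs. 100 scores the same as 100 vs. 50).
    """
    if actual <= 0 or estimated <= 0:
        raise ValueError("actual and estimated effort must be positive")
    return abs(actual - estimated) / min(actual, estimated)


def mean_bre(actuals, estimates):
    """Average BRE over paired actual and estimated efforts."""
    if not actuals or len(actuals) != len(estimates):
        raise ValueError("need equally sized, non-empty sequences")
    return sum(bre(a, e) for a, e in zip(actuals, estimates)) / len(actuals)


def improvement_over_baseline(baseline_mean_bre, candidate_mean_bre):
    """Positive result means the candidate model beats the baseline."""
    return baseline_mean_bre - candidate_mean_bre


if __name__ == "__main__":
    # Slide example: actual 100 hours, estimated 50 hours.
    print(f"{bre(100, 50):.0%}")
    # Slide example: baseline average BRE 100%, candidate 50%.
    print(f"{improvement_over_baseline(1.0, 0.5):.0%}")
```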
Early phase effort improves accuracy
- A model using early phase effort is more accurate than a model using software size.
  - The models were built without other variables.
- Size (FP) and early phase effort > Size (Ans. to RQ 3, when other variables are not used)
- Multicollinearity did not arise when using both early phase effort and FP.
[Figure: chart comparing the models; axis label: Difference.]
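The slides do not say how multicollinearity was checked. One common diagnostic is the variance inflation factor (VIF); for a model with exactly two explanatory variables it reduces to 1 / (1 - r²), where r is their Pearson correlation. The sketch below assumes that diagnostic and uses hypothetical function names; the threshold mentioned afterwards is a rule of thumb, not a figure from the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally sized samples."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equally sized samples of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("samples must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """VIF shared by both predictors in a two-predictor regression.

    Assumption: VIF is the chosen diagnostic; the slides only state that
    multicollinearity did not arise, not how it was tested.
    """
    r_squared = pearson_r(xs, ys) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)
```

Here `xs` would hold FP values and `ys` early phase effort for the same projects. A VIF near 1 indicates little shared variance; values above roughly 10 are often treated as a warning sign.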
Other variables did not work well
- Adding other variables improved the accuracy of the FP model.
- But it did not improve the accuracy of the other models.
  - Other variables: development type, platform type, language type
- The baseline was set to the FP model with these variables, and it was compared against the other models without the variables.
  - effort = a × FP + c × platform type + … (baseline): avg. BRE = 100%
  - effort = b × planning effort: avg. BRE = 50%
  - Difference of average BRE = 100% - 50% = 50% (improved)
Early phase effort is still effective
- A model using planning-and-analysis effort is more accurate than the software size model, but a model using planning effort is not (Ans. to RQ 2).
- Size and early phase effort > Size
- Using both early phase effort and software size improves accuracy, and multicollinearity does not arise (Ans. to RQ 3).
[Figure: chart comparing the models; axis label: Difference.]
How to build estimation model?
- It is preferable to build a model that uses only early phase effort as an explanatory variable.
  - Applies if an organization does not collect data in detail.
- It might not be preferable to use the variables that we used as additional variables.
  - Applies to organizations that collect other data in detail.
- Using both early phase effort and software size improves accuracy without multicollinearity.
  - Applies if software size is determined precisely by a method such as function point analysis.