TY - GEN
T1 - Realistic assessment of software effort estimation models
AU - Sigweni, Boyce
AU - Shepperd, Martin
AU - Turchi, Tommaso
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/6/1
Y1 - 2016/6/1
AB - Context: It is unclear whether current approaches to evaluating or comparing competing software cost or effort models give a realistic picture of how they would perform in actual use. Specifically, we are concerned that the usual practice of using all data with some holdout strategy is at variance with the reality of a data set growing as projects complete. Objective: This study investigates the impact of using unrealistic, though possibly convenient to the researchers, ways to compare models on commercial data sets. Our questions are: does this lead to different conclusions in the comparisons and, if so, are the results biased, e.g., more optimistic than those that might realistically be achieved in practice? Method: We compare a traditional approach based on leave-one-out cross-validation (LOOCV) with growing the data set chronologically, using the Finnish and Desharnais data sets. Results: Our realistic, time-based approach to validation is significantly more conservative than LOOCV for both data sets. Conclusion: If we want our research to lead to actionable findings, it is incumbent upon researchers to evaluate their models in realistic ways. This means a departure from LOOCV techniques, while further investigation is needed for other validation techniques, such as k-fold cross-validation.
UR - http://www.scopus.com/inward/record.url?scp=84978535952&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84978535952&partnerID=8YFLogxK
U2 - 10.1145/2915970.2916005
DO - 10.1145/2915970.2916005
M3 - Conference contribution
AN - SCOPUS:84978535952
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, EASE 2016
PB - Association for Computing Machinery
T2 - 20th International Conference on Evaluation and Assessment in Software Engineering, EASE 2016
Y2 - 1 June 2016 through 3 June 2016
ER -