TY - GEN
T1 - An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach
AU - Fucci, Davide
AU - Scanniello, Giuseppe
AU - Romano, Simone
AU - Shepperd, Martin
AU - Sigweni, Boyce
AU - Uyaguari, Fernando
AU - Turhan, Burak
AU - Juristo, Natalia
AU - Oivo, Markku
PY - 2016/9/8
Y1 - 2016/9/8
N2 - Context: Test-driven development (TDD) is an agile practice claimed to improve the quality of a software product, as well as the productivity of its developers. A previous study (i.e., baseline experiment) at the University of Oulu (Finland) compared TDD to a test-last development (TLD) approach through a randomized controlled trial. The results failed to support the claims. Goal: We want to validate the original study results by replicating it at the University of Basilicata (Italy), using a different design. Method: We replicated the baseline experiment, using a crossover design, with 21 graduate students. We kept the settings and context as close as possible to the baseline experiment. To limit researchers' bias, we involved two other sites (UPM, Spain, and Brunel, UK) to conduct a blind analysis of the data. Results: The Kruskal-Wallis tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), and developers' productivity (p-value = .83). Nevertheless, our data revealed a difference based on the order in which TDD and TLD were applied, though no carryover effect. Conclusions: We verify the baseline study results, yet our results raise concerns regarding the selection of experimental objects, particularly with respect to their interaction with the order in which treatments are applied. We recommend that future studies survey the tasks used in experiments evaluating TDD. Finally, to lower the cost of replication studies and reduce researchers' bias, we encourage other research groups to adopt a multi-site blind analysis approach similar to the one described in this paper.
AB - Context: Test-driven development (TDD) is an agile practice claimed to improve the quality of a software product, as well as the productivity of its developers. A previous study (i.e., baseline experiment) at the University of Oulu (Finland) compared TDD to a test-last development (TLD) approach through a randomized controlled trial. The results failed to support the claims. Goal: We want to validate the original study results by replicating it at the University of Basilicata (Italy), using a different design. Method: We replicated the baseline experiment, using a crossover design, with 21 graduate students. We kept the settings and context as close as possible to the baseline experiment. To limit researchers' bias, we involved two other sites (UPM, Spain, and Brunel, UK) to conduct a blind analysis of the data. Results: The Kruskal-Wallis tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), and developers' productivity (p-value = .83). Nevertheless, our data revealed a difference based on the order in which TDD and TLD were applied, though no carryover effect. Conclusions: We verify the baseline study results, yet our results raise concerns regarding the selection of experimental objects, particularly with respect to their interaction with the order in which treatments are applied. We recommend that future studies survey the tasks used in experiments evaluating TDD. Finally, to lower the cost of replication studies and reduce researchers' bias, we encourage other research groups to adopt a multi-site blind analysis approach similar to the one described in this paper.
UR - http://www.scopus.com/inward/record.url?scp=84991666654&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84991666654&partnerID=8YFLogxK
U2 - 10.1145/2961111.2962592
DO - 10.1145/2961111.2962592
M3 - Conference contribution
AN - SCOPUS:84991666654
T3 - International Symposium on Empirical Software Engineering and Measurement
BT - 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016
PB - IEEE Computer Society
T2 - 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016
Y2 - 8 September 2016 through 9 September 2016
ER -