Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/57025
Full metadata record
DC Field                        Value                                                 Language
dc.contributor.author           Kwabena Ebo Bennin                                    en_US
dc.contributor.author           Jacky Keung                                           en_US
dc.contributor.author           Akito Monden                                          en_US
dc.contributor.author           Passakorn Phannachitta                                en_US
dc.contributor.author           Solomon Mensah                                        en_US
dc.date.accessioned             2018-09-05T03:34:06Z                                  -
dc.date.available               2018-09-05T03:34:06Z                                  -
dc.date.issued                  2017-12-07                                            en_US
dc.identifier.issn              19493789                                              en_US
dc.identifier.issn              19493770                                              en_US
dc.identifier.other             2-s2.0-85042378748                                    en_US
dc.identifier.other             10.1109/ESEM.2017.50                                  en_US
dc.identifier.uri               https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85042378748&origin=inward  en_US
dc.identifier.uri               http://cmuir.cmu.ac.th/jspui/handle/6653943832/57025  -
dc.description.abstract         © 2017 IEEE. Context: Recent studies have shown that the performance of defect prediction models can be affected when data sampling approaches are applied to imbalanced training data. However, the magnitude (degree and power) of the effect of these sampling methods on the classification and prioritization performance of defect prediction models is still unknown. Goal: To investigate the statistical and practical significance of using resampled data for constructing defect prediction models. Method: We examine the practical effects of six data sampling methods on the performance of five defect prediction models. The prediction performance of models trained on the default datasets (no sampling method) is compared with that of models trained on the resampled datasets (sampling methods applied). To determine whether the performance changes are significant, robust statistical tests are performed and effect sizes are computed. Twenty releases of ten open-source projects extracted from the PROMISE repository are evaluated using the AUC, pd, pf, and G-mean performance measures. Results: There are statistically significant differences and practical effects in classification performance (pd, pf, and G-mean) between models trained on resampled datasets and those trained on the default datasets. However, sampling methods have no statistically or practically significant effect on defect prioritization performance (AUC), with small or negligible effect sizes obtained from models trained on the resampled datasets. Conclusions: Existing sampling methods can properly set the threshold between buggy and clean samples, but they cannot improve the prediction of defect-proneness itself. Sampling methods are highly recommended for defect classification purposes when all faulty modules are to be considered for testing. (See the illustrative sketch of this comparison protocol after this record.)  en_US
dc.subject                      Computer Science                                      en_US
dc.title                        The Significant Effects of Data Sampling Approaches on Software Defect Prioritization and Classification  en_US
dc.type                         Conference Proceeding                                 en_US
article.title.sourcetitle       International Symposium on Empirical Software Engineering and Measurement  en_US
article.volume                  2017-November                                         en_US
article.stream.affiliations     City University of Hong Kong                          en_US
article.stream.affiliations     Okayama University                                    en_US
article.stream.affiliations     Chiang Mai University                                 en_US
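
The Method described in the abstract above is a paired comparison: the same learner is trained once on the default (imbalanced) data and once on resampled data, then judged on the classification measures (pd, pf, G-mean) and the prioritization measure (AUC). The following is a minimal, illustrative sketch of that protocol, not the study's artifact: SMOTE from imbalanced-learn stands in for the six sampling methods, a random forest for the five models, and a synthetic imbalanced dataset for the twenty PROMISE releases. The metric definitions used, pd = TP/(TP+FN), pf = FP/(FP+TN), and G-mean = sqrt(pd * (1 - pf)), are the standard ones in defect prediction.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE  # assumption: imbalanced-learn is installed

def pd_pf_gmean(y_true, y_pred):
    # pd (probability of detection) = TP/(TP+FN); pf (probability of
    # false alarm) = FP/(FP+TN); G-mean = sqrt(pd * (1 - pf)).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    pd = tp / (tp + fn)
    pf = fp / (fp + tn)
    return pd, pf, np.sqrt(pd * (1.0 - pf))

# Synthetic stand-in for a PROMISE release: roughly 10% defective modules.
X, y = make_classification(n_samples=1000, weights=[0.9], flip_y=0.05,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, (Xs, ys) in {
    "default":   (X_tr, y_tr),                                    # no sampling method
    "resampled": SMOTE(random_state=0).fit_resample(X_tr, y_tr),  # sampling method applied
}.items():
    model = RandomForestClassifier(random_state=0).fit(Xs, ys)
    pd_, pf_, gm = pd_pf_gmean(y_te, model.predict(X_te))
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:9s}  pd={pd_:.2f}  pf={pf_:.2f}  G-mean={gm:.2f}  AUC={auc:.2f}")

On data like this, resampling typically trades a higher pd for a higher pf, shifting G-mean, while AUC, which depends only on how the predicted probabilities rank the modules, barely moves. That pattern matches the abstract's conclusion that sampling methods help defect classification but not prioritization.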
Appears in Collections: CMUL: Journal Articles

Files in This Item:
There are no files associated with this item.


Items in CMUIR are protected by copyright, with all rights reserved, unless otherwise indicated.