A Novel Ensemble Classifier Selection Method for Software Defect Prediction

Bibliographic Details
Main Authors:	Xin Dong, Jie Wang, Yan Liang
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Diversity measures; double fault; disagreement; ensemble classifier selection; imbalanced data classification; software defect prediction
Online Access:	https://ieeexplore.ieee.org/document/10869442/

Description
Summary:	Software defects significantly degrade the quality of software systems and increase development and maintenance costs. To improve quality and reduce costs, defects need to be predicted early in the software development lifecycle. This paper proposes Double Fault Disagreement (DFD), a novel diversity metric and method for selecting competent base classifiers for ensemble learning-based software defect prediction. To account for diversity among the base learners, several highly diverse base learners are chosen to build the ensemble. The method exploits both the diversity and the classification ability of the base learners, optimizes how ensemble members are selected, and improves the predictive performance of the ensemble model. Experimental results show that the DFD ensemble learning-based software defect prediction model outperforms ten other models: five common machine learning (ML) classification algorithms (logistic regression (LR), naïve Bayes (NB), K-nearest neighbor (KNN), decision tree (DT), and support vector machine (SVM)), two deep learning (DL) algorithms (multi-layer perceptron (MLP) and convolutional neural network (CNN)), and three ensemble learning algorithms (random forest (RF), extreme gradient boosting (XGB), and stacking). The DFD model achieves superior performance on eight public NASA and PROMISE datasets (six of which are imbalanced) across five performance indicators: area under the curve (AUC), geometric mean (G-Mean), F1 score, Matthews correlation coefficient (MCC), and Balance. Furthermore, the DFD method needs only a small number of base learners to converge rapidly, thereby reducing the amount of data and computation time required.
ISSN:	2169-3536
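
The summary names two pairwise diversity measures, double fault and disagreement, as the basis of DFD. This record does not say how the paper combines them, so the sketch below computes only the two standard pairwise measures as they are commonly defined in the ensemble diversity literature. The function name `pairwise_diversity` and the sample labels are illustrative assumptions, not taken from the article.

```python
from typing import Sequence, Tuple


def pairwise_diversity(
    y_true: Sequence[int],
    pred_a: Sequence[int],
    pred_b: Sequence[int],
) -> Tuple[float, float]:
    """Return (double_fault, disagreement) for two classifiers.

    double_fault: fraction of samples that BOTH classifiers misclassify
                  (lower values suggest more complementary errors).
    disagreement: fraction of samples on which exactly ONE classifier
                  is correct (higher values suggest more diversity).

    These are the textbook pairwise definitions; how the DFD method
    combines them is not stated in this record and is not modeled here.
    """
    n = len(y_true)
    if n == 0 or len(pred_a) != n or len(pred_b) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    both_wrong = 0
    exactly_one_correct = 0
    for truth, a, b in zip(y_true, pred_a, pred_b):
        a_correct = a == truth
        b_correct = b == truth
        if not a_correct and not b_correct:
            both_wrong += 1
        elif a_correct != b_correct:
            exactly_one_correct += 1

    return both_wrong / n, exactly_one_correct / n


# Hypothetical labels: 1 = defective module, 0 = clean module.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
pred_a = [1, 0, 0, 1, 0, 1, 1, 0]
pred_b = [1, 1, 0, 1, 0, 0, 1, 0]

double_fault, disagreement = pairwise_diversity(y_true, pred_a, pred_b)
print(double_fault, disagreement)  # 0.125 0.25
```

In a selection setting, such scores would typically be computed for every pair of candidate base classifiers on a validation set, favoring pairs with low double fault and high disagreement. That usage is a general pattern, not a description of the paper's procedure.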

Similar Items