Evaluating the Predictive Power of Software Metrics for Fault Localization

Fault localization remains a critical challenge in software engineering, directly impacting debugging efficiency and software quality. This study investigates the predictive power of various software metrics for fault localization by framing the task as a multi-class classification problem and evalu...

Bibliographic Details
Main Authors: Issar Arab, Kenneth Magel, Mohammed Akour
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Computers
Subjects:
Online Access: https://www.mdpi.com/2073-431X/14/6/222
collection DOAJ
description Fault localization remains a critical challenge in software engineering, directly impacting debugging efficiency and software quality. This study investigates the predictive power of various software metrics for fault localization by framing the task as a multi-class classification problem and evaluating it using the Defects4J dataset. We fitted thousands of models and benchmarked different algorithms—including deep learning, Random Forest, XGBoost, and LightGBM—to choose the best-performing model. To enhance model transparency, we applied explainable AI techniques to analyze feature importance. The results revealed that test suite metrics consistently outperform static and dynamic metrics, making them the most effective predictors for identifying faulty classes. These findings underscore the critical role of test quality and coverage in automated fault localization. By combining machine learning with transparent feature analysis, this work delivers practical insights to support more efficient debugging workflows. It lays the groundwork for an iterative process that integrates metric-based predictive models with large language models (LLMs), enabling future systems to automatically generate targeted test cases for the most fault-prone components, which further enhances the automation and precision of software testing.
id doaj-art-ca95ec42e9354fa287f46b3e4de7ff02
institution Kabale University
issn 2073-431X
spelling doaj-art-ca95ec42e9354fa287f46b3e4de7ff02, 2025-08-20T03:27:30Z, eng, MDPI AG, Computers, ISSN 2073-431X, 2025-06-01, vol. 14, no. 6, art. 222, DOI 10.3390/computers14060222
Evaluating the Predictive Power of Software Metrics for Fault Localization
Issar Arab: Adrem Data Laboratory, Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
Kenneth Magel: Faculty of Computer Science, North Dakota State University, Fargo, ND 58108, USA
Mohammed Akour: College of Computer and Information Sciences, Prince Sultan University, Riyadh 12435, Saudi Arabia
topic fault localization
software quality assurance
machine learning
software metrics
test coverage
automated debugging