Leveraging an Enhanced CodeBERT-Based Model for Multiclass Software Defect Prediction via Defect Classification

Ensuring software reliability through early-stage defect prevention and prediction is crucial, particularly as software systems become increasingly complex. Automated testing has emerged as the most practical approach to achieving bug-free and efficient code. In this context, machine learning-driven...

Bibliographic Details
Main Authors: Rida Ghafoor Hussain, Kin-Choong Yow, Marco Gori
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Software defect prediction; CodeBERT; defects; GPT; code snippets; software reliability
Online Access: https://ieeexplore.ieee.org/document/10820528/
_version_ 1823859636420214784
author Rida Ghafoor Hussain
Kin-Choong Yow
Marco Gori
author_facet Rida Ghafoor Hussain
Kin-Choong Yow
Marco Gori
author_sort Rida Ghafoor Hussain
collection DOAJ
description Ensuring software reliability through early-stage defect prevention and prediction is crucial, particularly as software systems become increasingly complex. Automated testing has emerged as the most practical approach to achieving bug-free and efficient code. In this context, machine learning-driven methods, especially those leveraging natural language models, have gained significant traction for developing effective techniques. This paper introduces a novel framework for automating software defect prediction, focusing on eight specific defects: SIGFPE, NZEC, LOGICAL, SYNTAX, SIGSEGV, SIGABRT, SEMANTIC, and LINKER. Our research involves a specialized dataset comprising nine classes: eight common programming errors and one error-free class. The goal is to enhance software testing and development processes by identifying defects within code snippets. The proposed framework utilizes a CodeBERT-based algorithm for defect prediction, optimizing model hyperparameters to achieve superior accuracy. Comparative analysis against established models such as RoBERTa, Microsoft CodeBERT, and GPT-2 demonstrates that our approach yields significant improvements in prediction performance, with accuracy gains of up to 20% and 7% in binary and multiclass experiments, respectively. Empirical studies validate the effectiveness of neural language models like CodeBERT for software defect prediction, highlighting substantial advancements in software testing and development techniques. These findings underscore the potential benefits of incorporating advanced machine learning models into the software development lifecycle.
format Article
id doaj-art-b4d85971d30a46ab80037d89852f70ba
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-b4d85971d30a46ab80037d89852f70ba
2025-02-11T00:01:32Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
13
24383
24397
10.1109/ACCESS.2024.3525069
10820528
Leveraging an Enhanced CodeBERT-Based Model for Multiclass Software Defect Prediction via Defect Classification
Rida Ghafoor Hussain (0), https://orcid.org/0009-0000-5734-9922
Kin-Choong Yow (1), https://orcid.org/0000-0002-8610-661X
Marco Gori (2)
(0) Department of Information Engineering, University of Florence, Florence, Italy
(1) Faculty of Engineering and Applied Sciences, University of Regina, Regina, SK, Canada
(2) Department of Information Engineering, University of Siena, Siena, Italy
https://ieeexplore.ieee.org/document/10820528/
Software defect prediction
CodeBERT
defects
GPT
code snippets
software reliability
spellingShingle Rida Ghafoor Hussain
Kin-Choong Yow
Marco Gori
Leveraging an Enhanced CodeBERT-Based Model for Multiclass Software Defect Prediction via Defect Classification
IEEE Access
Software defect prediction
CodeBERT
defects
GPT
code snippets
software reliability
title Leveraging an Enhanced CodeBERT-Based Model for Multiclass Software Defect Prediction via Defect Classification
title_full Leveraging an Enhanced CodeBERT-Based Model for Multiclass Software Defect Prediction via Defect Classification
title_fullStr Leveraging an Enhanced CodeBERT-Based Model for Multiclass Software Defect Prediction via Defect Classification
title_full_unstemmed Leveraging an Enhanced CodeBERT-Based Model for Multiclass Software Defect Prediction via Defect Classification
title_short Leveraging an Enhanced CodeBERT-Based Model for Multiclass Software Defect Prediction via Defect Classification
title_sort leveraging an enhanced codebert based model for multiclass software defect prediction via defect classification
topic Software defect prediction
CodeBERT
defects
GPT
code snippets
software reliability
url https://ieeexplore.ieee.org/document/10820528/
work_keys_str_mv AT ridaghafoorhussain leveraginganenhancedcodebertbasedmodelformulticlasssoftwaredefectpredictionviadefectclassification
AT kinchoongyow leveraginganenhancedcodebertbasedmodelformulticlasssoftwaredefectpredictionviadefectclassification
AT marcogori leveraginganenhancedcodebertbasedmodelformulticlasssoftwaredefectpredictionviadefectclassification