Cross-Project Defect Prediction Using Transfer Learning with Long Short-Term Memory Networks

As the number of software projects grows, within-project defect prediction (WPDP) can no longer meet demand, and cross-project defect prediction (CPDP) is playing an increasingly significant role in software engineering. The classic CPDP methods mainly concentrate...

Bibliographic Details
Main Authors:	Hongwei Tao, Lianyou Fu, Qiaoling Cao, Xiaoxu Niu, Haoran Chen, Songtao Shang, Yang Xian
Format:	Article
Language:	English
Published:	Wiley 2024-01-01
Series:	IET Software
Online Access:	http://dx.doi.org/10.1049/2024/5550801

_version_	1849468173935443968
author	Hongwei Tao, Lianyou Fu, Qiaoling Cao, Xiaoxu Niu, Haoran Chen, Songtao Shang, Yang Xian
author_facet	Hongwei Tao, Lianyou Fu, Qiaoling Cao, Xiaoxu Niu, Haoran Chen, Songtao Shang, Yang Xian
author_sort	Hongwei Tao
collection	DOAJ
description	As the number of software projects grows, within-project defect prediction (WPDP) can no longer meet demand, and cross-project defect prediction (CPDP) is playing an increasingly significant role in software engineering. Classic CPDP methods mainly apply metric features to predict defects. However, these approaches fail to consider the rich semantic information, which usually contains the relationship between software defects and context. Because traditional methods cannot exploit this information, their performance is often unsatisfactory. In this paper, a transfer long short-term memory (TLSTM) network model is first proposed; it extracts transferable semantic features by adding a transfer learning algorithm to the long short-term memory (LSTM) network. The traditional metric features and the semantic features are then combined for CPDP. First, abstract syntax trees (ASTs) are generated from the source code. Second, the AST node contents are converted into integer vectors that serve as inputs to the TLSTM model, which extracts the semantic features of the program. In parallel, transferable metric features are extracted by transfer component analysis (TCA). Finally, the semantic features and metric features are combined and fed into a logistic regression (LR) classifier for training. According to results on several open-source projects from the PROMISE repository, the TLSTM model outperforms other machine learning and deep learning models on the F-measure. The TLSTM model built with a single feature achieves improvements of 0.7% and 2.1% on Log4j-1.2 and Xalan-2.7, respectively. The model trained with the combined features is called transfer long short-term memory for defect prediction (DPTLSTM); it achieves improvements of 2.9% and 5% on Synapse-1.2 and Xerces-1.4.4, respectively. Both results support the superiority of the proposed model on the CPDP task. This is because the LSTM captures long-term dependencies in sequence data and extracts features that contain source code structure and context information. It can be concluded that (1) the TLSTM model has the advantage of preserving information, which allows it to better retain the semantic features related to software defects; and (2) compared with a CPDP model trained with traditional metric features alone, performance can be effectively improved by combining semantic features and metric features.
format	Article
id	doaj-art-f2f1f099b62e44acb8f65b3aa89b59e4
institution	Kabale University
issn	1751-8814
language	English
publishDate	2024-01-01
publisher	Wiley
record_format	Article
series	IET Software
spelling	doaj-art-f2f1f099b62e44acb8f65b3aa89b59e4; 2025-08-20T03:25:56Z; eng; Wiley; IET Software; 1751-8814; 2024-01-01; 2024; 10.1049/2024/5550801; Cross-Project Defect Prediction Using Transfer Learning with Long Short-Term Memory Networks; Hongwei Tao, Lianyou Fu, Qiaoling Cao, Xiaoxu Niu, Haoran Chen, Songtao Shang, Yang Xian (all: School of Computer Science and Technology); http://dx.doi.org/10.1049/2024/5550801
spellingShingle	Hongwei Tao, Lianyou Fu, Qiaoling Cao, Xiaoxu Niu, Haoran Chen, Songtao Shang, Yang Xian; Cross-Project Defect Prediction Using Transfer Learning with Long Short-Term Memory Networks; IET Software
title	Cross-Project Defect Prediction Using Transfer Learning with Long Short-Term Memory Networks
title_full	Cross-Project Defect Prediction Using Transfer Learning with Long Short-Term Memory Networks
title_fullStr	Cross-Project Defect Prediction Using Transfer Learning with Long Short-Term Memory Networks
title_full_unstemmed	Cross-Project Defect Prediction Using Transfer Learning with Long Short-Term Memory Networks
title_short	Cross-Project Defect Prediction Using Transfer Learning with Long Short-Term Memory Networks
title_sort	cross project defect prediction using transfer learning with long short term memory networks
url	http://dx.doi.org/10.1049/2024/5550801
work_keys_str_mv	AT hongweitao crossprojectdefectpredictionusingtransferlearningwithlongshorttermmemorynetworks AT lianyoufu crossprojectdefectpredictionusingtransferlearningwithlongshorttermmemorynetworks AT qiaolingcao crossprojectdefectpredictionusingtransferlearningwithlongshorttermmemorynetworks AT xiaoxuniu crossprojectdefectpredictionusingtransferlearningwithlongshorttermmemorynetworks AT haoranchen crossprojectdefectpredictionusingtransferlearningwithlongshorttermmemorynetworks AT songtaoshang crossprojectdefectpredictionusingtransferlearningwithlongshorttermmemorynetworks AT yangxian crossprojectdefectpredictionusingtransferlearningwithlongshorttermmemorynetworks
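
The description field above says that AST node contents are converted into integer vectors before being passed to the TLSTM model. The record does not give the encoding details, so the following is only a minimal sketch of one common way to do this, under stated assumptions: Python's standard ast module stands in for a Java parser (the PROMISE projects named in the abstract are Java), node type names stand in for "node contents", and the names node_tokens, build_vocab, encode, PAD and UNK are hypothetical, not taken from the paper.

```python
import ast
from collections import Counter

PAD = 0  # assumed index reserved for padding short sequences
UNK = 1  # assumed index for node types never seen in the source project


def node_tokens(source: str) -> list[str]:
    """Return AST node type names in depth-first pre-order."""
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(token_lists: list[list[str]], min_count: int = 1) -> dict[str, int]:
    """Map each sufficiently frequent token to an integer index >= 2."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab: dict[str, int] = {}
    for tok in sorted(counts):  # sorted for a reproducible mapping
        if counts[tok] >= min_count:
            vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], length: int) -> list[int]:
    """Convert tokens to a fixed-length integer vector (truncate or pad)."""
    ids = [vocab.get(tok, UNK) for tok in tokens][:length]
    return ids + [PAD] * (length - len(ids))


if __name__ == "__main__":
    # The vocabulary is built from the source project only; the target
    # project is encoded with that same vocabulary.
    source_files = ["def f(x):\n    return x + 1\n"]
    target_file = "class A:\n    pass\n"
    vocab = build_vocab([node_tokens(s) for s in source_files])
    print(encode(node_tokens(target_file), vocab, 8))
```

Building the vocabulary from the source project alone reflects the cross-project setting: node types that appear only in the target project fall back to UNK. Whether the paper handles unseen tokens this way is not stated in this record.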

Similar Items