Leveraging Cross-Project Similarity for Data Augmentation and Security Bug Report Prediction

Bibliographic Details
Main Authors: Jinfeng Ji, Geunseok Yang
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10978022/
Description
Summary:Accurately identifying security bug reports remains a key challenge in software development. Due to the varying expertise of bug reporters, many security bug reports are incorrectly labeled as non-security bug reports, which increases the security risk of the software and the workload of developers, who must find the mislabeled reports among all bug reports. This study aims to improve the prediction of security bug reports by addressing the class imbalance problem and enhancing the generalization ability of the model across projects. To achieve this goal, we propose a deep learning-based prediction method combined with a novel data augmentation method based on cross-project text similarity. The bug report data is collected from four open-source projects: Ambari, Camel, Derby, and Wicket, containing 56, 74, 179, and 47 security bug reports, respectively, while the number of non-security bug reports is significantly higher. To alleviate this imbalance and leverage cross-project knowledge, we augment the dataset by identifying and merging semantically similar security bug reports from other projects. We evaluate five deep learning models: CNN, LSTM, GRU, Transformer, and BERT. Our approach achieves F1 scores between 0.60 and 0.98, with the best performance from the LSTM and GRU models; in particular, LSTM on Ambari and GRU on Camel and Ambari each achieve an F1 score of 0.98. The overall average F1 score is 0.77, a significant improvement over the baseline classification. The results show that data augmentation based on cross-project similarity is an effective strategy for improving security bug report prediction, especially on imbalanced datasets. This approach can help developers detect security-related issues more effectively, reduce the risk of misclassification, and enhance overall software security.
ISSN:2169-3536
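The cross-project augmentation step described in the summary can be sketched roughly as follows. This is an illustrative assumption, not the paper's exact algorithm: the function names and the 0.3 threshold are hypothetical, and a simple Jaccard token overlap stands in for whatever text-similarity measure the authors actually use.

```python
# Illustrative sketch (NOT the paper's exact method): enlarge the minority
# "security" class of a target project with semantically similar security
# reports drawn from other projects.

def jaccard_similarity(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two bug report texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def augment_with_cross_project_reports(target_reports, other_reports,
                                       threshold=0.3):
    """Return target_reports plus each cross-project report whose similarity
    to at least one target report meets `threshold` (hypothetical cutoff)."""
    augmented = list(target_reports)
    for candidate in other_reports:
        if any(jaccard_similarity(candidate, t) >= threshold
               for t in target_reports):
            augmented.append(candidate)
    return augmented

# Toy example: one security report from the target project, two candidates
# from another project; only the semantically similar one is merged in.
target = ["buffer overflow allows remote code execution"]
others = ["heap buffer overflow leads to code execution",
          "typo in documentation page"]
result = augment_with_cross_project_reports(target, others)
```

In practice a lexical overlap measure like this would likely be replaced by embedding-based similarity (e.g. from the BERT model the study already evaluates) so that reports phrased differently across projects can still be matched.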