Noisy Token Removal for Bug Localization: The Impact of Semantically Confusing Misguiding Terms
| Main Authors: | Youngkyoung Kim, Misoo Kim, Eunseok Lee |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
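| Subjects: | Automated debugging; bug localization; bug report; deep learning; explainable AI; information retrieval |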
| Online Access: | https://ieeexplore.ieee.org/document/10755074/ |

| Field | Value |
|---|---|
| author | Youngkyoung Kim, Misoo Kim, Eunseok Lee |
| collection | DOAJ |
| description | A bug report is a technical document describing bugs that have occurred in software. Finding the source code files needed to resolve a reported bug is a laborious task. To automate this process, information retrieval-based bug localization (IRBL) techniques have been proposed. These techniques assess the relevance between a bug report and the source files, providing developers with a ranked list of source files. Because they rely heavily on text tokens, removing noisy tokens from their input is essential. To address the problem of prevalent noisy tokens degrading IRBL performance, we define impactful noisy words as misguiding terms and investigate their prevalence and impact. We employed a deep learning model combined with explainable AI techniques to detect misguiding terms, leveraging the model's semantic embedding capabilities. We conducted extensive experiments on 24 open-source software projects and three IRBL models. Removing misguiding terms improved the mean reciprocal rank (MRR) of bug localization by 19%, 17%, and 27% on average for the three models, and by up to 120%. The proposed approach effectively distinguishes between beneficial terms and noise, outperforming existing noise detection approaches in IRBL performance, with consistent improvements observed across all 24 projects. |
| format | Article |
| id | doaj-art-eb58e899f5d947169cf18d703e7eceac |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| series | IEEE Access |
| doi | 10.1109/ACCESS.2024.3500367 |
| volume | 12 |
| pages | 172396-172409 |
| document_number | 10755074 |
| author_orcid | Youngkyoung Kim: 0000-0001-5457-7997; Misoo Kim: 0000-0002-8274-5457; Eunseok Lee: 0000-0002-6557-8087 |
| affiliation | Youngkyoung Kim: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon-si, Gyeonggi-do, South Korea; Misoo Kim: Department of Artificial Intelligence Convergence, Chonnam National University, Gwangju, South Korea; Eunseok Lee: College of Computing and Informatics, Sungkyunkwan University, Suwon-si, Gyeonggi-do, South Korea |
| title | Noisy Token Removal for Bug Localization: The Impact of Semantically Confusing Misguiding Terms |
| topic | Automated debugging; bug localization; bug report; deep learning; explainable AI; information retrieval |
| url | https://ieeexplore.ieee.org/document/10755074/ |
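
The abstract describes IRBL as scoring the relevance between a bug report and each source file to produce a ranked list, and it reports the effect of removing misguiding terms as a change in mean reciprocal rank (MRR). The sketch below illustrates those two ideas under stated assumptions; it is not the authors' implementation. It substitutes a plain TF-IDF cosine ranker (smoothed IDF, log((1 + N) / (1 + df)) + 1) for the three IRBL models evaluated in the paper, and it assumes the misguiding terms for each report are already known, whereas the paper detects them with a deep learning model and explainable AI techniques that are not reproduced here. All file names, report texts, and helper functions (`tokenize`, `rank_files`, `remove_terms`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split camelCase identifiers (e.g. "ConfigParser" -> "Config Parser"),
    # lowercase, and keep alphabetic runs only.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())


def tfidf(tokens: list[str], idf: dict[str, float], default_idf: float) -> dict[str, float]:
    # Raw term frequency times IDF; terms unseen in the corpus get default_idf.
    counts = Counter(tokens)
    return {term: count * idf.get(term, default_idf) for term, count in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either vector is empty.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(report_tokens: list[str], files: dict[str, list[str]]) -> list[str]:
    # Smoothed IDF over the source files: log((1 + N) / (1 + df)) + 1.
    n = len(files)
    df = Counter(term for tokens in files.values() for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    default_idf = math.log(1 + n) + 1  # df = 0 for terms absent from every file
    query = tfidf(report_tokens, idf, default_idf)
    scores = {name: cosine(query, tfidf(tokens, idf, default_idf))
              for name, tokens in files.items()}
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda name: (-scores[name], name))


def remove_terms(tokens: list[str], misguiding: set[str]) -> list[str]:
    # Drop every occurrence of a term flagged as misguiding for this report.
    return [t for t in tokens if t not in misguiding]


def reciprocal_rank(ranking: list[str], buggy_files: set[str]) -> float:
    # 1 / (1-based position of the first buggy file), or 0.0 if none is ranked.
    for position, name in enumerate(ranking, start=1):
        if name in buggy_files:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rrs: list[float]) -> float:
    # Average of per-report reciprocal ranks.
    return sum(rrs) / len(rrs) if rrs else 0.0


# Hypothetical toy data: token bags standing in for preprocessed source files.
files = {
    "ui/DialogRenderer.java": tokenize("dialog window render dialog window"),
    "io/ConfigParser.java": tokenize("config parse key value"),
}
# (bug report text, files that fixed the bug, misguiding terms assumed to be known)
reports = [
    ("config parse error shown in dialog window", {"io/ConfigParser.java"}, {"dialog", "window"}),
    ("render dialog layout broken", {"ui/DialogRenderer.java"}, set()),
]

before, after = [], []
for text, buggy, misguiding in reports:
    tokens = tokenize(text)
    before.append(reciprocal_rank(rank_files(tokens, files), buggy))
    after.append(reciprocal_rank(rank_files(remove_terms(tokens, misguiding), files), buggy))

print(mean_reciprocal_rank(before), mean_reciprocal_rank(after))  # 0.75 1.0
```

In this toy data, "dialog" and "window" pull the unrelated renderer above the parser for the first report, so its reciprocal rank is 0.5; removing those two terms lifts it to 1.0 and the MRR from 0.75 to 1.0. The second report keeps "dialog" because, for that report, the term points at the right file, which mirrors the abstract's point that beneficial terms must be distinguished from noise rather than removed globally.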