Joint Tomek Links (JTL): An Innovative Approach to Noise Reduction for Enhanced Classification Performance
Noisy data is a prevalent issue in data mining, significantly impacting the performance of classification algorithms. Mathematical methods are crucial in tackling this obstacle, particularly in optimizing noise detection and data preprocessing. This study proposes a novel approach—Joint T...
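| Main Authors: | Goksu Tuysuzoglu, Yunus Dogan, Elife Ozturk Kiyak, Mustafa Ersahin, Bita Ghasemkhani, Kokten Ulas Birant, Derya Birant |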
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
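| Subjects: | Artificial intelligence; classification; data mining; machine learning; noise reduction; Tomek links |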
| Online Access: | https://ieeexplore.ieee.org/document/11037412/ |
| _version_ | 1849714591770083328 |
|---|---|
| author | Goksu Tuysuzoglu; Yunus Dogan; Elife Ozturk Kiyak; Mustafa Ersahin; Bita Ghasemkhani; Kokten Ulas Birant; Derya Birant |
| author_sort | Goksu Tuysuzoglu |
| collection | DOAJ |
| description | Noisy data is a prevalent issue in data mining, significantly impacting the performance of classification algorithms. Mathematical methods are crucial in tackling this obstacle, particularly in optimizing noise detection and data preprocessing. This study proposes a novel approach, Joint Tomek Links (JTL), to identify and eliminate noisy instances by detecting pairs of nearest neighbors from different classes. It first finds the Tomek links and then refines a probabilistic method to determine which instance from a pair will be removed. In our approach, a random tree classifier serves as the base model. We conducted experiments on 40 benchmark datasets spanning various domains, achieving an average classification accuracy of 83.26% for JTL. The results demonstrate that JTL attains an average improvement of 5.33% in accuracy compared to the original classification with a random tree. Furthermore, JTL surpasses existing techniques, delivering a noteworthy accuracy gain of 12.30% on the same datasets. These findings underscore the effectiveness of JTL in enhancing data quality and boosting classification performance in data mining tasks. |
| format | Article |
| id | doaj-art-bd25832ac85449e38335cee2b7a1b062 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-bd25832ac85449e38335cee2b7a1b062; 2025-08-20T03:13:39Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13; pp. 123059-123082; DOI 10.1109/ACCESS.2025.3580290; document 11037412; Authors: [0] Goksu Tuysuzoglu (https://orcid.org/0000-0002-2926-4267), [1] Yunus Dogan (https://orcid.org/0000-0002-0353-5014), [2] Elife Ozturk Kiyak, [3] Mustafa Ersahin (https://orcid.org/0000-0003-4318-8288), [4] Bita Ghasemkhani (https://orcid.org/0000-0002-0394-8847), [5] Kokten Ulas Birant (https://orcid.org/0000-0002-5107-6406), [6] Derya Birant (https://orcid.org/0000-0003-3138-0432); Affiliations, in record order: [0] Department of Computer Engineering, Dokuz Eylul University, İzmir, Türkiye; [1] Department of Computer Engineering, Dokuz Eylul University, İzmir, Türkiye; [2] Independent Researcher, İzmir, Türkiye; [3] Research and Development Department, Commencis Teknoloji, Istanbul, Türkiye; [4] Graduate School of Natural and Applied Sciences, Dokuz Eylul University, İzmir, Türkiye; [5] Department of Computer Engineering, Dokuz Eylul University, İzmir, Türkiye; [6] Department of Computer Engineering, Dokuz Eylul University, İzmir, Türkiye |
| title | Joint Tomek Links (JTL): An Innovative Approach to Noise Reduction for Enhanced Classification Performance |
| topic | Artificial intelligence; classification; data mining; machine learning; noise reduction; Tomek links |
| url | https://ieeexplore.ieee.org/document/11037412/ |
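
The description in this record characterizes a Tomek link as a pair of nearest neighbors from different classes. The following is a minimal sketch of that detection step only, assuming Euclidean distance and the usual reading of the definition (two instances that are each other's nearest neighbor and carry different labels). The function name `find_tomek_links` and the toy data are illustrative and do not come from the article. The probabilistic rule JTL uses to decide which instance of a pair to remove is not specified in this record, so it is not reproduced here.

```python
import numpy as np


def find_tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours under Euclidean distance and have different labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise squared Euclidean distances (n x n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    # A point must never be selected as its own nearest neighbour.
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)

    links = []
    for i, j in enumerate(nearest):
        # i < j reports each mutual pair once.
        if i < j and nearest[j] == i and y[i] != y[j]:
            links.append((i, int(j)))
    return links


# Toy data (hypothetical): two tight same-class pairs and one
# cross-class pair sitting on the class boundary.
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [0.55, 0.5], [0.6, 0.5]]
y = [0, 0, 1, 1, 0, 1]

print(find_tomek_links(X, y))  # [(4, 5)]
```

Only the boundary pair (indices 4 and 5) is reported, because the other mutual nearest-neighbor pairs share a label. The dense distance matrix keeps the sketch short but needs memory quadratic in the number of instances; a nearest-neighbor index would be the natural substitute on larger datasets.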