Joint Tomek Links (JTL): An Innovative Approach to Noise Reduction for Enhanced Classification Performance

Noisy data is a prevalent issue in data mining, significantly impacting the performance of classification algorithms. Mathematical methods are crucial in tackling this obstacle, particularly in optimizing noise detection and data preprocessing. This study proposes a novel approach—Joint T...

Bibliographic Details
Main Authors: Goksu Tuysuzoglu, Yunus Dogan, Elife Ozturk Kiyak, Mustafa Ersahin, Bita Ghasemkhani, Kokten Ulas Birant, Derya Birant
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11037412/
collection DOAJ
description Noisy data is a prevalent issue in data mining and significantly degrades the performance of classification algorithms. Mathematical methods are crucial in tackling this obstacle, particularly in optimizing noise detection and data preprocessing. This study proposes a novel approach, Joint Tomek Links (JTL), to identify and eliminate noisy instances by detecting pairs of nearest neighbors from different classes. It first finds the Tomek links and then refines a probabilistic method to determine which instance of each pair will be removed. In our approach, a random tree classifier serves as the base model. We conducted experiments on 40 benchmark datasets spanning various domains, achieving an average classification accuracy of 83.26% for JTL. The results demonstrate that JTL attains an average improvement of 5.33% in accuracy compared to the original classification with a random tree. Furthermore, JTL surpasses existing techniques, delivering a noteworthy accuracy gain of 12.30% on the same datasets. These findings underscore the effectiveness of JTL in enhancing data quality and boosting classification performance in data mining tasks.
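The first step named in the description, finding Tomek links, can be illustrated with a minimal sketch. It assumes the classic definition (two instances from different classes that are each other's nearest neighbor) and Euclidean distance. The function names and toy data are hypothetical. The abstract does not specify the probabilistic rule JTL uses to choose which instance of a pair to remove, so that step is not shown, and this is not the authors' implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Return the index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbors
    and carry different class labels (the classic Tomek-link definition)."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


# Hypothetical toy data: three "A" points, one "B" point placed among them,
# and a separate cluster of three "B" points.
points = [
    (0.0, 0.0), (2.0, 0.0), (0.0, 2.0),  # class A
    (0.5, 0.5),                          # class B, sitting inside the A region
    (5.0, 5.0), (6.0, 5.0), (5.0, 6.5),  # class B
]
labels = ["A", "A", "A", "B", "B", "B", "B"]

links = tomek_links(points, labels)
print(links)  # [(0, 3)]
```

In this toy set, only the pair (0, 3) is reported: points 4 and 5 are also mutual nearest neighbors, but they share a label and therefore do not form a Tomek link. A noise filter would then have to decide which member of each reported pair to discard, which is the decision the description says JTL makes probabilistically.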
id doaj-art-bd25832ac85449e38335cee2b7a1b062
institution DOAJ
issn 2169-3536
spelling
Record ID: doaj-art-bd25832ac85449e38335cee2b7a1b062
Timestamp: 2025-08-20T03:13:39Z
Language: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Published: 2025-01-01
Volume: 13
Pages: 123059-123082
DOI: 10.1109/ACCESS.2025.3580290
Article number: 11037412
Title: Joint Tomek Links (JTL): An Innovative Approach to Noise Reduction for Enhanced Classification Performance
Authors and affiliations:
Goksu Tuysuzoglu (https://orcid.org/0000-0002-2926-4267), Department of Computer Engineering, Dokuz Eylul University, İzmir, Türkiye
Yunus Dogan (https://orcid.org/0000-0002-0353-5014), Department of Computer Engineering, Dokuz Eylul University, İzmir, Türkiye
Elife Ozturk Kiyak, Independent Researcher, İzmir, Türkiye
Mustafa Ersahin (https://orcid.org/0000-0003-4318-8288), Research and Development Department, Commencis Teknoloji, Istanbul, Türkiye
Bita Ghasemkhani (https://orcid.org/0000-0002-0394-8847), Graduate School of Natural and Applied Sciences, Dokuz Eylul University, İzmir, Türkiye
Kokten Ulas Birant (https://orcid.org/0000-0002-5107-6406), Department of Computer Engineering, Dokuz Eylul University, İzmir, Türkiye
Derya Birant (https://orcid.org/0000-0003-3138-0432), Department of Computer Engineering, Dokuz Eylul University, İzmir, Türkiye
topic Artificial intelligence
classification
data mining
machine learning
noise reduction
Tomek links