Effective k-nearest neighbor models for data classification enhancement
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2025-04-01 |
| Series: | Journal of Big Data |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s40537-025-01137-2 |
| Summary: | Abstract: Imbalanced class distributions and class overlap are major problems often associated with degraded classifier performance. To lessen their impact, the literature continues to introduce new methods, including kNN extensions; nevertheless, to our knowledge, kNN's performance has not yet reached satisfactory levels. Our study therefore presents kNN models aimed at substantially alleviating the negative effects of these problems. Unlike much previous research, which typically addresses a single problem, each of our models addresses more than one. First, we propose a straightforward yet effective technique for identifying irrelevant "noisy" points, the proximal ratio (PR), which minimizes the contribution of overlapped points to the classification decision. PR measures how consistent a point is with its neighbors, given the distribution of classes within its local neighborhood, and explicitly flags points with low PR scores, i.e., points surrounded by neighbors from multiple classes, as outliers. Second, a weighting computation is introduced to reduce the impact of class imbalance. Finally, the PR and weighting equations are integrated with the kNN algorithm to develop PRkNN models, which outperform rival kNN variants and compete closely with popular machine learning models. An extensive evaluation is conducted across fifty-two datasets in six experimental phases using six evaluation metrics. The results, supported by statistical tests, show that, on average, the proposed models are highly competitive without being appreciably slower than their competitors. |
| ISSN: | 2196-1115 |
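
The abstract describes PR and the weighting computation only at a high level; the exact equations appear in the full article (linked above). Below is a minimal, hypothetical Python sketch of the general idea, assuming PR is the fraction of a point's k nearest neighbors that share its class and that the imbalance weighting is inverse class frequency. The names `proximal_ratio`, `class_weights`, `prknn_predict`, and the `pr_threshold` parameter are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of the proximal-ratio (PR) idea from the abstract.
# Assumptions (not given in this record): PR = fraction of a point's k
# nearest neighbors sharing its class; imbalance weighting = inverse
# class frequency.
import numpy as np

def proximal_ratio(X, y, k=5):
    """Assumed PR: fraction of each point's k nearest neighbors (excluding
    itself) that belong to the same class. A low score means the point sits
    in a region dominated by other classes, i.e., a likely overlap/noise
    point."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self from the neighbor search
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbors
    return (y[nn] == y[:, None]).mean(axis=1)

def class_weights(y):
    """Assumed imbalance weighting: inverse class frequency, so
    minority-class neighbors count more in the vote."""
    classes, counts = np.unique(y, return_counts=True)
    return dict(zip(classes, len(y) / (len(classes) * counts)))

def prknn_predict(X_train, y_train, X_test, k=5, pr_threshold=0.5):
    """One plausible PRkNN-style classifier: discard low-PR training points,
    then take a class-weighted kNN majority vote. An illustration only, not
    the authors' algorithm."""
    pr = proximal_ratio(X_train, y_train, k=k)
    keep = pr >= pr_threshold                  # drop likely overlap/noise points
    Xk, yk = X_train[keep], y_train[keep]
    w = class_weights(y_train)                 # weights from the full training set
    preds = []
    for x in X_test:
        d = np.linalg.norm(Xk - x, axis=1)
        nn = np.argsort(d)[:k]
        votes = {}
        for j in nn:                           # each neighbor votes with its class weight
            votes[yk[j]] = votes.get(yk[j], 0.0) + w[yk[j]]
        preds.append(max(votes, key=votes.get))
    return np.array(preds)
```

Under this reading, `pr_threshold` controls how aggressively overlapped points are discarded: at 0.0 the model reduces to a class-weighted kNN, while values near 1.0 keep only training points whose neighborhoods are class-pure.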