Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets
Abstract When k-Nearest-Neighbors ( $$k$$ -NN) was conceived more than 70 years ago, computation, as we use it now, would be hardly recognizable. Since then, technology has improved by orders of magnitude, including unprecedented connectivity. However, $$k$$ -NN has remained virtually unchanged, exp...
Saved in:
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Nature Portfolio
2025-07-01
|
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-09856-5 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1849235512026464256 |
|---|---|
| author | Madhavan K R Hasan Kurban Oguzhan M. Kulekci Mehmet M. Dalkilic |
| author_facet | Madhavan K R Hasan Kurban Oguzhan M. Kulekci Mehmet M. Dalkilic |
| author_sort | Madhavan K R |
| collection | DOAJ |
| description | Abstract When k-Nearest-Neighbors ( $$k$$ -NN) was conceived more than 70 years ago, computation, as we use it now, would be hardly recognizable. Since then, technology has improved by orders of magnitude, including unprecedented connectivity. However, $$k$$ -NN has remained virtually unchanged, exposing its shortcomings for today’s needs: becoming overwhelmed when presented with large, high-dimensional data. Although space partitioning data structures, especially k-d trees and ball-trees, have improved performance in larger data, they remain inadequate when data is also high-dimensional. Experiments confirm that space partitioning becomes ineffective in high-dimensional data because most of the search space is explored needlessly. Our strategy is to partition the data into small groups of points similarly distanced from a reference point in a B+ tree data structure and use this data structure to limit the search space of a $$k$$ -NN query. Further, we establish that the limited search space chosen by the B+ tree structure can be effectively explored by any indexing techniques applicable to the entire data. We then present our algorithm $$k$$ -NN with partitioning (ti $$k$$ -NN), including computational analysis and experiments. Our detailed evaluation demonstrates significant speedup achieved by ti $$k$$ -NN over the naive, $$k$$ -d tree, $$ball$$ -tree based $$k$$ -NN and other state-of-the-art approximate $$k$$ -NN search approaches in high dimensional data. |
| format | Article |
| id | doaj-art-0b8d2c78edaf41c9b75a35abcc2b59ab |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-0b8d2c78edaf41c9b75a35abcc2b59ab2025-08-20T04:02:45ZengNature PortfolioScientific Reports2045-23222025-07-0115111610.1038/s41598-025-09856-5Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data setsMadhavan K R0Hasan Kurban1Oguzhan M. Kulekci2Mehmet M. Dalkilic3Computer Science Department, Indiana UniversityCollege of Science and Engineering, Hamad Bin Khalifa UniversityDepartment of Computer Science and Software Engineering, Miami UniversityComputer Science Department, Indiana UniversityAbstract When k-Nearest-Neighbors ( $$k$$ -NN) was conceived more than 70 years ago, computation, as we use it now, would be hardly recognizable. Since then, technology has improved by orders of magnitude, including unprecedented connectivity. However, $$k$$ -NN has remained virtually unchanged, exposing its shortcomings for today’s needs: becoming overwhelmed when presented with large, high-dimensional data. Although space partitioning data structures, especially k-d trees and ball-trees, have improved performance in larger data, they remain inadequate when data is also high-dimensional. Experiments confirm that space partitioning becomes ineffective in high-dimensional data because most of the search space is explored needlessly. Our strategy is to partition the data into small groups of points similarly distanced from a reference point in a B+ tree data structure and use this data structure to limit the search space of a $$k$$ -NN query. Further, we establish that the limited search space chosen by the B+ tree structure can be effectively explored by any indexing techniques applicable to the entire data. We then present our algorithm $$k$$ -NN with partitioning (ti $$k$$ -NN), including computational analysis and experiments. Our detailed evaluation demonstrates significant speedup achieved by ti $$k$$ -NN over the naive, $$k$$ -d tree, $$ball$$ -tree based $$k$$ -NN and other state-of-the-art approximate $$k$$ -NN search approaches in high dimensional data.https://doi.org/10.1038/s41598-025-09856-5Data MiningNearest NeighborsB+ TreesTelescope IndexingHigh Dimensional Data |
| spellingShingle | Madhavan K R Hasan Kurban Oguzhan M. Kulekci Mehmet M. Dalkilic Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets Scientific Reports Data Mining Nearest Neighbors B+ Trees Telescope Indexing High Dimensional Data |
| title | Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets |
| title_full | Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets |
| title_fullStr | Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets |
| title_full_unstemmed | Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets |
| title_short | Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets |
| title_sort | telescope indexing for k nearest neighbor search algorithms over high dimensional data large data sets |
| topic | Data Mining Nearest Neighbors B+ Trees Telescope Indexing High Dimensional Data |
| url | https://doi.org/10.1038/s41598-025-09856-5 |
| work_keys_str_mv | AT madhavankr telescopeindexingforknearestneighborsearchalgorithmsoverhighdimensionaldatalargedatasets AT hasankurban telescopeindexingforknearestneighborsearchalgorithmsoverhighdimensionaldatalargedatasets AT oguzhanmkulekci telescopeindexingforknearestneighborsearchalgorithmsoverhighdimensionaldatalargedatasets AT mehmetmdalkilic telescopeindexingforknearestneighborsearchalgorithmsoverhighdimensionaldatalargedatasets |