Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets

Abstract When k-Nearest-Neighbors ( $$k$$ -NN) was conceived more than 70 years ago, computation, as we use it now, would be hardly recognizable. Since then, technology has improved by orders of magnitude, including unprecedented connectivity. However, $$k$$ -NN has remained virtually unchanged, exp...

Full description

Saved in:
Bibliographic Details
Main Authors: Madhavan K R, Hasan Kurban, Oguzhan M. Kulekci, Mehmet M. Dalkilic
Format: Article
Language:English
Published: Nature Portfolio 2025-07-01
Series:Scientific Reports
Subjects:
Online Access:https://doi.org/10.1038/s41598-025-09856-5
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1849235512026464256
author Madhavan K R
Hasan Kurban
Oguzhan M. Kulekci
Mehmet M. Dalkilic
author_facet Madhavan K R
Hasan Kurban
Oguzhan M. Kulekci
Mehmet M. Dalkilic
author_sort Madhavan K R
collection DOAJ
description Abstract When k-Nearest-Neighbors ( $$k$$ -NN) was conceived more than 70 years ago, computation, as we use it now, would be hardly recognizable. Since then, technology has improved by orders of magnitude, including unprecedented connectivity. However, $$k$$ -NN has remained virtually unchanged, exposing its shortcomings for today’s needs: becoming overwhelmed when presented with large, high-dimensional data. Although space partitioning data structures, especially k-d trees and ball-trees, have improved performance in larger data, they remain inadequate when data is also high-dimensional. Experiments confirm that space partitioning becomes ineffective in high-dimensional data because most of the search space is explored needlessly. Our strategy is to partition the data into small groups of points similarly distanced from a reference point in a B+ tree data structure and use this data structure to limit the search space of a $$k$$ -NN query. Further, we establish that the limited search space chosen by the B+ tree structure can be effectively explored by any indexing techniques applicable to the entire data. We then present our algorithm $$k$$ -NN with partitioning (ti $$k$$ -NN), including computational analysis and experiments. Our detailed evaluation demonstrates significant speedup achieved by ti $$k$$ -NN over the naive, $$k$$ -d tree, $$ball$$ -tree based $$k$$ -NN and other state-of-the-art approximate $$k$$ -NN search approaches in high dimensional data.
format Article
id doaj-art-0b8d2c78edaf41c9b75a35abcc2b59ab
institution Kabale University
issn 2045-2322
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-0b8d2c78edaf41c9b75a35abcc2b59ab2025-08-20T04:02:45ZengNature PortfolioScientific Reports2045-23222025-07-0115111610.1038/s41598-025-09856-5Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data setsMadhavan K R0Hasan Kurban1Oguzhan M. Kulekci2Mehmet M. Dalkilic3Computer Science Department, Indiana UniversityCollege of Science and Engineering, Hamad Bin Khalifa UniversityDepartment of Computer Science and Software Engineering, Miami UniversityComputer Science Department, Indiana UniversityAbstract When k-Nearest-Neighbors ( $$k$$ -NN) was conceived more than 70 years ago, computation, as we use it now, would be hardly recognizable. Since then, technology has improved by orders of magnitude, including unprecedented connectivity. However, $$k$$ -NN has remained virtually unchanged, exposing its shortcomings for today’s needs: becoming overwhelmed when presented with large, high-dimensional data. Although space partitioning data structures, especially k-d trees and ball-trees, have improved performance in larger data, they remain inadequate when data is also high-dimensional. Experiments confirm that space partitioning becomes ineffective in high-dimensional data because most of the search space is explored needlessly. Our strategy is to partition the data into small groups of points similarly distanced from a reference point in a B+ tree data structure and use this data structure to limit the search space of a $$k$$ -NN query. Further, we establish that the limited search space chosen by the B+ tree structure can be effectively explored by any indexing techniques applicable to the entire data. We then present our algorithm $$k$$ -NN with partitioning (ti $$k$$ -NN), including computational analysis and experiments. Our detailed evaluation demonstrates significant speedup achieved by ti $$k$$ -NN over the naive, $$k$$ -d tree, $$ball$$ -tree based $$k$$ -NN and other state-of-the-art approximate $$k$$ -NN search approaches in high dimensional data.https://doi.org/10.1038/s41598-025-09856-5Data MiningNearest NeighborsB+ TreesTelescope IndexingHigh Dimensional Data
spellingShingle Madhavan K R
Hasan Kurban
Oguzhan M. Kulekci
Mehmet M. Dalkilic
Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets
Scientific Reports
Data Mining
Nearest Neighbors
B+ Trees
Telescope Indexing
High Dimensional Data
title Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets
title_full Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets
title_fullStr Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets
title_full_unstemmed Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets
title_short Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets
title_sort telescope indexing for k nearest neighbor search algorithms over high dimensional data large data sets
topic Data Mining
Nearest Neighbors
B+ Trees
Telescope Indexing
High Dimensional Data
url https://doi.org/10.1038/s41598-025-09856-5
work_keys_str_mv AT madhavankr telescopeindexingforknearestneighborsearchalgorithmsoverhighdimensionaldatalargedatasets
AT hasankurban telescopeindexingforknearestneighborsearchalgorithmsoverhighdimensionaldatalargedatasets
AT oguzhanmkulekci telescopeindexingforknearestneighborsearchalgorithmsoverhighdimensionaldatalargedatasets
AT mehmetmdalkilic telescopeindexingforknearestneighborsearchalgorithmsoverhighdimensionaldatalargedatasets