A Distribution Agnostic Rank-Based Measure for Proximity Search

Proximity search is extensively used in modern machine learning algorithms across various applications. Proximity search aims at finding data points which are close to the data point of interest. Extant algorithms depend on distance-based metrics to find the closest data points. However, these metri...

Bibliographic Details
Main Authors:	Mayur Garg, Ashutosh Nayak, Rajasekhara Reddy Duvvuru Muni
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Rank based methods; nearest neighbor; outlier detection; unsupervised learning; proximity search; text similarity
Online Access:	https://ieeexplore.ieee.org/document/10815932/

author	Mayur Garg, Ashutosh Nayak, Rajasekhara Reddy Duvvuru Muni
collection	DOAJ
description	Proximity search is extensively used in modern machine learning algorithms across various applications. Proximity search aims at finding data points that are close to the data point of interest. Extant algorithms depend on distance-based metrics to find the closest data points. However, these metrics are limited by their dependency on the distribution of data along different dimensions, making them sensitive to scaling and translation. Their performance also suffers as the number of dimensions increases. Furthermore, proximity estimation between any two data points in extant metrics does not factor in the relative position of the rest of the data. In this paper, we aim to provide an alternative to these metrics by proposing the Rank Adjacency Measure (RAM), which is agnostic to the distribution of the data. RAM estimates the probability of proximity between points by extending the concept of ordering in one dimension. We provide a detailed mathematical construction of RAM. We illustrate the effectiveness of the proposed methodology using five datasets in three application areas: Outlier Detection, Nearest Neighbor Search, and Text Similarity. While our proposed methodology outperforms existing algorithms in outlier detection by 50%, it performs on par with existing metrics for the other two applications. We conclude the paper with a discussion of its limitations and research directions for improving RAM.
format	Article
id	doaj-art-8089ca079d004f98a6d262cef02bb436
institution	Kabale University
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-8089ca079d004f98a6d262cef02bb436; 2025-01-24T00:01:51Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 12103-12112; DOI 10.1109/ACCESS.2024.3522669; document 10815932; A Distribution Agnostic Rank-Based Measure for Proximity Search; authors: [0] Mayur Garg, https://orcid.org/0009-0006-3126-2777; [1] Ashutosh Nayak, https://orcid.org/0000-0002-6688-4780; [2] Rajasekhara Reddy Duvvuru Muni, https://orcid.org/0000-0001-6838-0482; affiliations: [0] United Airlines, Gurugram, India; [1] Samsung Research Institute Bangalore, Bengaluru, India; [2] Samsung Research Institute Bangalore, Bengaluru, India; https://ieeexplore.ieee.org/document/10815932/
title	A Distribution Agnostic Rank-Based Measure for Proximity Search
topic	Rank based methods; nearest neighbor; outlier detection; unsupervised learning; proximity search; text similarity
url	https://ieeexplore.ieee.org/document/10815932/
