Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets

K-nearest neighbours (kNN) is a very popular instance-based classifier due to its simplicity and good empirical performance. However, large-scale datasets are a big problem for building fast and compact neighbourhood-based classifiers. This work presents the design and implementation of a classifica...

Full description

Saved in:
Bibliographic Details
Main Authors: Stanislav Protasov, Adil Mehmood Khan
Format: Article
Language:English
Published: Wiley 2021-01-01
Series:Complexity
Online Access:http://dx.doi.org/10.1155/2021/2011738
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1849306940465741824
author Stanislav Protasov
Adil Mehmood Khan
author_facet Stanislav Protasov
Adil Mehmood Khan
author_sort Stanislav Protasov
collection DOAJ
description K-nearest neighbours (kNN) is a very popular instance-based classifier due to its simplicity and good empirical performance. However, large-scale datasets are a big problem for building fast and compact neighbourhood-based classifiers. This work presents the design and implementation of a classification algorithm with index data structures, which would allow us to build fast and scalable solutions for large multidimensional datasets. We propose a novel approach that uses navigable small-world (NSW) proximity graph representation of large-scale datasets. Our approach shows 2–4 times classification speedup for both average and 99th percentile time with asymptotically close classification accuracy compared to the 1-NN method. We observe two orders of magnitude better classification time in cases when method uses swap memory. We show that NSW graph used in our method outperforms other proximity graphs in classification accuracy. Our results suggest that the algorithm can be used in large-scale applications for fast and robust classification, especially when the search index is already constructed for the data.
format Article
id doaj-art-2b877f8528d84394ba3f6fd5d2776363
institution Kabale University
issn 1099-0526
language English
publishDate 2021-01-01
publisher Wiley
record_format Article
series Complexity
spelling doaj-art-2b877f8528d84394ba3f6fd5d27763632025-08-20T03:54:56ZengWileyComplexity1099-05262021-01-01202110.1155/2021/2011738Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large DatasetsStanislav Protasov0Adil Mehmood Khan1Machine Learning and Knowledge Representation LabMachine Learning and Knowledge Representation LabK-nearest neighbours (kNN) is a very popular instance-based classifier due to its simplicity and good empirical performance. However, large-scale datasets are a big problem for building fast and compact neighbourhood-based classifiers. This work presents the design and implementation of a classification algorithm with index data structures, which would allow us to build fast and scalable solutions for large multidimensional datasets. We propose a novel approach that uses navigable small-world (NSW) proximity graph representation of large-scale datasets. Our approach shows 2–4 times classification speedup for both average and 99th percentile time with asymptotically close classification accuracy compared to the 1-NN method. We observe two orders of magnitude better classification time in cases when method uses swap memory. We show that NSW graph used in our method outperforms other proximity graphs in classification accuracy. Our results suggest that the algorithm can be used in large-scale applications for fast and robust classification, especially when the search index is already constructed for the data.http://dx.doi.org/10.1155/2021/2011738
spellingShingle Stanislav Protasov
Adil Mehmood Khan
Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets
Complexity
title Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets
title_full Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets
title_fullStr Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets
title_full_unstemmed Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets
title_short Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets
title_sort using proximity graph cut for fast and robust instance based classification in large datasets
url http://dx.doi.org/10.1155/2021/2011738
work_keys_str_mv AT stanislavprotasov usingproximitygraphcutforfastandrobustinstancebasedclassificationinlargedatasets
AT adilmehmoodkhan usingproximitygraphcutforfastandrobustinstancebasedclassificationinlargedatasets