Accelerating the k-Means++ Algorithm by Using Geometric Information

Clustering is a fundamental task in data analysis with applications across a wide range of fields, such as computer vision, pattern recognition, and data mining. Real-world use cases include social network analysis, medical imaging, market segmentation, and anomaly detection, to name a few. In this...

Full description

Saved in:
Bibliographic Details
Main Authors: Guillem Rodriguez Corominas, Maria J. Blesa, Christian Blum
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10966875/
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1849239075885678592
author Guillem Rodriguez Corominas
Maria J. Blesa
Christian Blum
author_facet Guillem Rodriguez Corominas
Maria J. Blesa
Christian Blum
author_sort Guillem Rodriguez Corominas
collection DOAJ
description Clustering is a fundamental task in data analysis with applications across a wide range of fields, such as computer vision, pattern recognition, and data mining. Real-world use cases include social network analysis, medical imaging, market segmentation, and anomaly detection, to name a few. In this paper, we propose an acceleration of the exact k-means++ algorithm using geometric information, specifically the Triangle Inequality and additional norm filters, along with a two-step sampling procedure. Our experiments demonstrate that the accelerated version outperforms the standard k-means++ version in terms of the number of visited points and distance calculations, achieving greater speedup as the number of clusters increases. The version utilizing the Triangle Inequality is particularly effective for low-dimensional data, while the additional norm-based filter enhances performance in high-dimensional instances with greater norm variance among points. Additional experiments show the behavior of our algorithms when executed concurrently across multiple jobs and examine how memory performance impacts practical speedup.
format Article
id doaj-art-112872ae03da4fb59a0aa0c8470bed7e
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-112872ae03da4fb59a0aa0c8470bed7e2025-08-20T04:01:15ZengIEEEIEEE Access2169-35362025-01-0113676936771710.1109/ACCESS.2025.356129310966875Accelerating the k-Means++ Algorithm by Using Geometric InformationGuillem Rodriguez Corominas0https://orcid.org/0000-0002-3863-2017Maria J. Blesa1Christian Blum2https://orcid.org/0000-0002-1736-3559Department of Computer Science, Universitat Politècnica de Catalunya (UPC), Barcelona, SpainDepartment of Computer Science, Universitat Politècnica de Catalunya (UPC), Barcelona, SpainArtificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, SpainClustering is a fundamental task in data analysis with applications across a wide range of fields, such as computer vision, pattern recognition, and data mining. Real-world use cases include social network analysis, medical imaging, market segmentation, and anomaly detection, to name a few. In this paper, we propose an acceleration of the exact k-means++ algorithm using geometric information, specifically the Triangle Inequality and additional norm filters, along with a two-step sampling procedure. Our experiments demonstrate that the accelerated version outperforms the standard k-means++ version in terms of the number of visited points and distance calculations, achieving greater speedup as the number of clusters increases. The version utilizing the Triangle Inequality is particularly effective for low-dimensional data, while the additional norm-based filter enhances performance in high-dimensional instances with greater norm variance among points. Additional experiments show the behavior of our algorithms when executed concurrently across multiple jobs and examine how memory performance impacts practical speedup.https://ieeexplore.ieee.org/document/10966875/ClusteringD² samplingk-means++normtriangle inequality
spellingShingle Guillem Rodriguez Corominas
Maria J. Blesa
Christian Blum
Accelerating the k-Means++ Algorithm by Using Geometric Information
IEEE Access
Clustering
D² sampling
k-means++
norm
triangle inequality
title Accelerating the k-Means++ Algorithm by Using Geometric Information
title_full Accelerating the k-Means++ Algorithm by Using Geometric Information
title_fullStr Accelerating the k-Means++ Algorithm by Using Geometric Information
title_full_unstemmed Accelerating the k-Means++ Algorithm by Using Geometric Information
title_short Accelerating the k-Means++ Algorithm by Using Geometric Information
title_sort accelerating the k means algorithm by using geometric information
topic Clustering
D² sampling
k-means++
norm
triangle inequality
url https://ieeexplore.ieee.org/document/10966875/
work_keys_str_mv AT guillemrodriguezcorominas acceleratingthekmeansalgorithmbyusinggeometricinformation
AT mariajblesa acceleratingthekmeansalgorithmbyusinggeometricinformation
AT christianblum acceleratingthekmeansalgorithmbyusinggeometricinformation