Vectorized Highly Parallel Density-Based Clustering for Applications With Noise


Bibliographic Details
Main Authors: Joseph Arnold Xavier, Juan Pedro Gutierrez Hermosillo Muriedas, Stepan Nassyr, Rocco Sedona, Markus Gotz, Achim Streit, Morris Riedel, Gabriele Cavallaro
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: High-performance computing, density-based clustering, vectorization, VHPDBSCAN
Online Access: https://ieeexplore.ieee.org/document/10769413/
collection DOAJ
description Clustering in data mining involves grouping similar objects into categories based on their characteristics. As data volumes grow and high-performance computing advances, a critical need has emerged for algorithms that can process these computations efficiently and exploit the various levels of parallelism offered by modern supercomputing systems. Exploiting Single Instruction Multiple Data (SIMD) instructions enhances parallelism at the instruction level and minimizes data movement within the memory hierarchy. To fully harness a processor’s SIMD capabilities and achieve optimal performance, algorithms must be adapted for better compatibility with vector operations. In this paper, we introduce a vectorized implementation of the Density-based Clustering for Applications with Noise (DBSCAN) algorithm suitable for execution on both shared and distributed memory systems. By leveraging SIMD, we enhance the performance of distance computations. Our proposed Vectorized HPDBSCAN (VHPDBSCAN) demonstrates a performance improvement of up to two times over the state-of-the-art parallel version, Highly Parallel DBSCAN (HPDBSCAN), on the ARM-based A64FX processor on two different datasets with varying dimensions. We have also parallelized the computations that are essential for efficient workload distribution, which significantly improves performance on higher-dimensional datasets. Additionally, we evaluate VHPDBSCAN’s energy consumption on the A64FX and Intel Xeon processors. The results show that, owing to the reduced runtime, the total energy consumption of the application compared to HPDBSCAN is reduced by 50% on the A64FX Central Processing Unit (CPU) and by approximately 19% on the Intel Xeon 8368 CPU.
format Article
id doaj-art-89f509a67be546e0bbae4250b331d8ca
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-89f509a67be546e0bbae4250b331d8ca
2025-08-20T02:38:35Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
Volume 12, pages 181679-181692
DOI: 10.1109/ACCESS.2024.3507193
IEEE document number: 10769413
Vectorized Highly Parallel Density-Based Clustering for Applications With Noise
Joseph Arnold Xavier (0), https://orcid.org/0009-0007-5215-6022, Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Jülich, Germany
Juan Pedro Gutierrez Hermosillo Muriedas (1), https://orcid.org/0000-0001-8439-7145, Scientific Computing Center (SCC), Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
Stepan Nassyr (2), Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Jülich, Germany
Rocco Sedona (3), https://orcid.org/0000-0003-4089-972X, Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Jülich, Germany
Markus Gotz (4), https://orcid.org/0000-0002-2233-1041, Scientific Computing Center (SCC), Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
Achim Streit (5), https://orcid.org/0000-0002-5065-469X, Scientific Computing Center (SCC), Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
Morris Riedel (6), https://orcid.org/0000-0003-1810-9330, Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Jülich, Germany
Gabriele Cavallaro (7), https://orcid.org/0000-0002-3239-9904, Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Jülich, Germany
https://ieeexplore.ieee.org/document/10769413/
High-performance computing; density-based clustering; vectorization; VHPDBSCAN
title Vectorized Highly Parallel Density-Based Clustering for Applications With Noise
topic High-performance computing
density-based clustering
vectorization
VHPDBSCAN
url https://ieeexplore.ieee.org/document/10769413/