A Clustering Algorithm for Large Datasets Based on Detection of Density Variations

Bibliographic Details
Main Authors: Adrián Josué Ramírez-Díaz, José Francisco Martínez-Trinidad, Jesús Ariel Carrasco-Ochoa
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/13/14/2272
Collection: DOAJ

Description: Clustering algorithms help handle unlabeled datasets. In large datasets, density-based clustering algorithms effectively capture the intricate structures and varied distributions that these datasets often exhibit. However, although these algorithms adapt to large datasets by building clusters of arbitrary shape through the identification of low-density regions, they usually struggle to identify density variations. This paper proposes a Variable DEnsity Clustering Algorithm for Large datasets (VDECAL) to address this limitation. VDECAL introduces a partitioning strategy for large datasets that allows working with manageable subsets and prevents workload imbalance. Within each partition, relevant object subsets characterized by attributes such as density, position, and overlap ratio are computed to identify both low-density regions and density variations, thereby facilitating cluster building. Extensive experiments on diverse datasets show that VDECAL effectively detects density variations, improving clustering quality and runtime performance compared to state-of-the-art DBSCAN-based algorithms developed for clustering large datasets.

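The limitation the description points to, namely that a density-based method driven by a single global density threshold struggles with density variations, can be illustrated with plain DBSCAN. The sketch below is textbook DBSCAN, not VDECAL: this record does not detail VDECAL's partitioning or relevant-object-subset steps. The grid data, the `eps`/`min_pts` values, and the helper names (`grid`, `dbscan`, `n_clusters`) are hypothetical choices made only for illustration.

```python
import math
from collections import deque
from typing import List, Tuple

Point = Tuple[float, float]
UNVISITED, NOISE = -2, -1


def grid(cx: float, cy: float, step: float, n: int = 5) -> List[Point]:
    """Hypothetical toy data: an n x n grid starting at (cx, cy); smaller step = denser."""
    return [(cx + i * step, cy + j * step) for i in range(n) for j in range(n)]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN with one global (eps, min_pts); returns a cluster id or NOISE per point."""
    # eps-neighbourhoods; each point counts itself. O(n^2), fine for toy data only.
    neigh = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points]
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        if len(neigh[i]) < min_pts:
            labels[i] = NOISE  # not core; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neigh[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # reachable from a core point: border point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neigh[j]) >= min_pts:
                queue.extend(neigh[j])  # only core points extend the cluster
    return labels


def n_clusters(labels: List[int]) -> int:
    return len({label for label in labels if label >= 0})


dense_a = grid(0.0, 0.0, step=0.1)
dense_b = grid(1.0, 0.0, step=0.1)   # 0.6 away from dense_a
sparse = grid(10.0, 10.0, step=1.0)  # same shape, 10x wider spacing
data = dense_a + dense_b + sparse

small = dbscan(data, eps=0.15, min_pts=5)  # eps tuned to the dense clusters
large = dbscan(data, eps=1.5, min_pts=5)   # eps tuned to the sparse cluster
print(n_clusters(small), small.count(NOISE))  # 2 25
print(n_clusters(large), large.count(NOISE))  # 2 0
```

Neither setting recovers all three groups. The small `eps` separates the two dense grids but discards the sparse one as noise, while the large `eps` recovers the sparse grid but merges the two dense ones. The all-pairs neighbourhood computation is also the kind of cost that partitioning strategies for large datasets aim to avoid.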
ISSN: 2227-7390
DOI: 10.3390/math13142272
Volume: 13, Issue: 14, Article: 2272
Affiliation (all authors): Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro # 1, Tonantzintla, Puebla 72840, Mexico
Keywords: clustering; large dataset; density variations