A Clustering Algorithm for Large Datasets Based on Detection of Density Variations

Clustering algorithms help handle unlabeled datasets. In large datasets, density-based clustering algorithms effectively capture the intricate structures and varied distributions that these datasets often exhibit. However, while these algorithms can adapt to large datasets by building clusters with...

Bibliographic Details
Main Authors:	Adrián Josué Ramírez-Díaz, José Francisco Martínez-Trinidad, Jesús Ariel Carrasco-Ochoa
Format:	Article
Language:	English
Published:	MDPI AG 2025-07-01
Series:	Mathematics
Subjects:	clustering; large dataset; density variations
Online Access:	https://www.mdpi.com/2227-7390/13/14/2272

author	Adrián Josué Ramírez-Díaz, José Francisco Martínez-Trinidad, Jesús Ariel Carrasco-Ochoa
collection	DOAJ
description	Clustering algorithms help handle unlabeled datasets. In large datasets, density-based clustering algorithms effectively capture the intricate structures and varied distributions that these datasets often exhibit. However, while these algorithms can adapt to large datasets by building clusters with arbitrary shapes by identifying low-density regions, they usually struggle to identify density variations. This paper proposes a Variable DEnsity Clustering Algorithm for Large datasets (VDECAL) to address this limitation. VDECAL introduces a large-dataset partitioning strategy that allows working with manageable subsets and prevents workload imbalance. Within each partition, relevant object subsets characterized by attributes such as density, position, and overlap ratio are computed to identify both low-density regions and density variations, thereby facilitating the building of the clusters. Extensive experiments on diverse datasets show that VDECAL effectively detects density variations, improving clustering quality and runtime performance compared to state-of-the-art DBSCAN-based algorithms developed for clustering large datasets.
format	Article
id	doaj-art-2a93d538e0c4482485014c19f1fc7e13
institution	DOAJ
issn	2227-7390
language	English
publishDate	2025-07-01
publisher	MDPI AG
record_format	Article
series	Mathematics
doi	10.3390/math13142272
volume	13
issue	14
article_number	2272
affiliation	Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro # 1, Tonantzintla, Puebla 72840, Mexico
title	A Clustering Algorithm for Large Datasets Based on Detection of Density Variations
topic	clustering; large dataset; density variations
url	https://www.mdpi.com/2227-7390/13/14/2272
