A Clustering Algorithm for Large Datasets Based on Detection of Density Variations
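| Main Authors: | Adrián Josué Ramírez-Díaz, José Francisco Martínez-Trinidad, Jesús Ariel Carrasco-Ochoa |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Mathematics |
| Subjects: | clustering; large dataset; density variations |
| Online Access: | https://www.mdpi.com/2227-7390/13/14/2272 |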
| author | Adrián Josué Ramírez-Díaz, José Francisco Martínez-Trinidad, Jesús Ariel Carrasco-Ochoa |
|---|---|
| collection | DOAJ |
| description | Clustering algorithms help handle unlabeled datasets. In large datasets, density-based clustering algorithms effectively capture the intricate structures and varied distributions that these datasets often exhibit. However, while these algorithms can adapt to large datasets and build arbitrarily shaped clusters by identifying low-density regions, they usually struggle to identify density variations. This paper proposes a Variable DEnsity Clustering Algorithm for Large datasets (VDECAL) to address this limitation. VDECAL introduces a large-dataset partitioning strategy that allows working with manageable subsets and prevents workload imbalance. Within each partition, subsets of relevant objects, characterized by attributes such as density, position, and overlap ratio, are computed to identify both low-density regions and density variations, thereby facilitating cluster construction. Extensive experiments on diverse datasets show that VDECAL effectively detects density variations, improving clustering quality and runtime performance compared to state-of-the-art DBSCAN-based algorithms developed for clustering large datasets. |
| format | Article |
| id | doaj-art-2a93d538e0c4482485014c19f1fc7e13 |
| institution | DOAJ |
| issn | 2227-7390 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | MDPI AG |
| series | Mathematics |
| doi | 10.3390/math13142272 |
| volume | 13 |
| issue | 14 |
| article number | 2272 |
| affiliation | Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro # 1, Tonantzintla, Puebla 72840, Mexico (all three authors) |
| title | A Clustering Algorithm for Large Datasets Based on Detection of Density Variations |
| topic | clustering; large dataset; density variations |
| url | https://www.mdpi.com/2227-7390/13/14/2272 |
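The abstract states that density-based algorithms such as DBSCAN usually struggle to identify density variations. The minimal sketch below illustrates that limitation only; it is not VDECAL, and the helper names (`kth_neighbor_distance`, `make_blob`) and all parameters are hypothetical. It assumes two synthetic 2-D blobs with the same number of points but a 25-fold difference in area, uses the brute-force distance to the k-th nearest neighbour as a local density estimate, and applies DBSCAN's core-point condition (a point is core when at least `minPts = k + 1` points, itself included, lie within `eps`).

```python
import math
import random


def kth_neighbor_distance(points, k):
    """For each point, return the distance to its k-th nearest other point (brute force, O(n^2))."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def make_blob(center, half_width, n, rng):
    """Hypothetical helper: sample n 2-D points uniformly from a square around `center`."""
    cx, cy = center
    return [
        (cx + rng.uniform(-half_width, half_width), cy + rng.uniform(-half_width, half_width))
        for _ in range(n)
    ]


rng = random.Random(0)
n = 200
dense = make_blob((0.0, 0.0), 1.0, n, rng)    # n points in a 2 x 2 square
sparse = make_blob((20.0, 0.0), 5.0, n, rng)  # n points in a 10 x 10 square (25x the area)

k = 4  # equivalent to DBSCAN minPts = k + 1 = 5, counting the point itself
kd = kth_neighbor_distance(dense + sparse, k)
median_dense = sorted(kd[:n])[n // 2]
median_sparse = sorted(kd[n:])[n // 2]

# A single global eps tuned for the dense blob (an assumption for illustration).
eps = 2 * median_dense
# k-th neighbour distance <= eps means at least k + 1 points lie within eps: a core point.
core_dense = sum(d <= eps for d in kd[:n]) / n
core_sparse = sum(d <= eps for d in kd[n:]) / n

print(f"median {k}-NN distance: dense={median_dense:.3f}, sparse={median_sparse:.3f}")
print(f"core-point fraction with eps={eps:.3f}: dense={core_dense:.2f}, sparse={core_sparse:.2f}")
```

With `eps` tuned to the dense blob, nearly every dense point qualifies as a core point while almost no sparse point does, so the sparse cluster would be labeled largely as noise; enlarging `eps` to cover the sparse blob instead risks merging nearby dense structures. Handling such density variations is the limitation VDECAL is described as addressing.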