Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions
Traditional <i>K</i>-means clustering assumes, to some extent, a uniform distribution of data around predefined centroids, which limits its effectiveness for many realistic datasets. In this paper, a new clustering technique, simulated-annealing-based ellipsoidal clustering (SAELLC), is...
Saved in:
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2024-12-01
|
| Series: | Algorithms |
| Subjects: | |
| Online Access: | https://www.mdpi.com/1999-4893/17/12/551 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1850240061084270592 |
|---|---|
| author | Alaa E. Abdel-Hakim Abdel-Monem M. Ibrahim Kheir Eddine Bouazza Wael Deabes Abdel-Rahman Hedar |
| author_facet | Alaa E. Abdel-Hakim Abdel-Monem M. Ibrahim Kheir Eddine Bouazza Wael Deabes Abdel-Rahman Hedar |
| author_sort | Alaa E. Abdel-Hakim |
| collection | DOAJ |
| description | Traditional <i>K</i>-means clustering assumes, to some extent, a uniform distribution of data around predefined centroids, which limits its effectiveness for many realistic datasets. In this paper, a new clustering technique, simulated-annealing-based ellipsoidal clustering (SAELLC), is proposed to automatically partition data into an optimal number of ellipsoidal clusters, a capability absent in traditional methods. SAELLC transforms each identified cluster into a hyperspherical cluster, where the diameter of the hypersphere equals the minor axis of the original ellipsoid, and the center is encoded to represent the entire cluster. During the assignment of points to clusters, local ellipsoidal properties are independently considered. For objective function evaluation, the method adaptively transforms these ellipsoidal clusters into a variable number of global clusters. Two objective functions are simultaneously optimized: one reflecting partition compactness using the silhouette function (SF) and Euclidean distance, and another addressing cluster connectedness through a nearest-neighbor algorithm. This optimization is achieved using a newly-developed multiobjective simulated annealing approach. SAELLC is designed to automatically determine the optimal number of clusters, achieve precise partitioning, and accommodate a wide range of cluster shapes, including spherical, ellipsoidal, and non-symmetric forms. Extensive experiments conducted on UCI datasets demonstrated SAELLC’s superior performance compared to six well-known clustering algorithms. The results highlight its remarkable ability to handle diverse data distributions and automatically identify the optimal number of clusters, making it a robust choice for advanced clustering analysis. |
| format | Article |
| id | doaj-art-2460f6d70caa4a5086b1efd2f695822d |
| institution | OA Journals |
| issn | 1999-4893 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Algorithms |
| spelling | doaj-art-2460f6d70caa4a5086b1efd2f695822d2025-08-20T02:00:59ZengMDPI AGAlgorithms1999-48932024-12-01171255110.3390/a17120551Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data DistributionsAlaa E. Abdel-Hakim0Abdel-Monem M. Ibrahim1Kheir Eddine Bouazza2Wael Deabes3Abdel-Rahman Hedar4Department of Computer Science in Jamoum, Umm Al-Qura University, Makkah 25371, Saudi ArabiaDepartment of Mathematics, Faculty of Science, Al-Azhar University, Assiut Branch, Assiut 71524, EgyptComputer and Information Science Division, Higher Colleges of Technology, Abu Dhabi 25026, United Arab EmiratesDepartment of Computational, Engineering, Mathematical Sciences (CEMS), Texas A&M University-San Antonio, San Antonio, TX 78224, USADepartment of Computer Science, Faculty of Computers & Information, Assiut University, Assiut 71526, EgyptTraditional <i>K</i>-means clustering assumes, to some extent, a uniform distribution of data around predefined centroids, which limits its effectiveness for many realistic datasets. In this paper, a new clustering technique, simulated-annealing-based ellipsoidal clustering (SAELLC), is proposed to automatically partition data into an optimal number of ellipsoidal clusters, a capability absent in traditional methods. SAELLC transforms each identified cluster into a hyperspherical cluster, where the diameter of the hypersphere equals the minor axis of the original ellipsoid, and the center is encoded to represent the entire cluster. During the assignment of points to clusters, local ellipsoidal properties are independently considered. For objective function evaluation, the method adaptively transforms these ellipsoidal clusters into a variable number of global clusters. Two objective functions are simultaneously optimized: one reflecting partition compactness using the silhouette function (SF) and Euclidean distance, and another addressing cluster connectedness through a nearest-neighbor algorithm. This optimization is achieved using a newly-developed multiobjective simulated annealing approach. SAELLC is designed to automatically determine the optimal number of clusters, achieve precise partitioning, and accommodate a wide range of cluster shapes, including spherical, ellipsoidal, and non-symmetric forms. Extensive experiments conducted on UCI datasets demonstrated SAELLC’s superior performance compared to six well-known clustering algorithms. The results highlight its remarkable ability to handle diverse data distributions and automatically identify the optimal number of clusters, making it a robust choice for advanced clustering analysis.https://www.mdpi.com/1999-4893/17/12/551unsupervised learningdata clusteringsimulated annealingsilhouette functionadaptive ellipsoidal <i>K</i>-meansautomatic clustering |
| spellingShingle | Alaa E. Abdel-Hakim Abdel-Monem M. Ibrahim Kheir Eddine Bouazza Wael Deabes Abdel-Rahman Hedar Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions Algorithms unsupervised learning data clustering simulated annealing silhouette function adaptive ellipsoidal <i>K</i>-means automatic clustering |
| title | Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions |
| title_full | Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions |
| title_fullStr | Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions |
| title_full_unstemmed | Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions |
| title_short | Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions |
| title_sort | ellipsoidal i k i means an automatic clustering approach for non uniform data distributions |
| topic | unsupervised learning data clustering simulated annealing silhouette function adaptive ellipsoidal <i>K</i>-means automatic clustering |
| url | https://www.mdpi.com/1999-4893/17/12/551 |
| work_keys_str_mv | AT alaaeabdelhakim ellipsoidalikimeansanautomaticclusteringapproachfornonuniformdatadistributions AT abdelmonemmibrahim ellipsoidalikimeansanautomaticclusteringapproachfornonuniformdatadistributions AT kheireddinebouazza ellipsoidalikimeansanautomaticclusteringapproachfornonuniformdatadistributions AT waeldeabes ellipsoidalikimeansanautomaticclusteringapproachfornonuniformdatadistributions AT abdelrahmanhedar ellipsoidalikimeansanautomaticclusteringapproachfornonuniformdatadistributions |