Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions

Traditional <i>K</i>-means clustering assumes, to some extent, a uniform distribution of data around predefined centroids, which limits its effectiveness for many realistic datasets. In this paper, a new clustering technique, simulated-annealing-based ellipsoidal clustering (SAELLC), is...

Full description

Saved in:
Bibliographic Details
Main Authors: Alaa E. Abdel-Hakim, Abdel-Monem M. Ibrahim, Kheir Eddine Bouazza, Wael Deabes, Abdel-Rahman Hedar
Format: Article
Language:English
Published: MDPI AG 2024-12-01
Series:Algorithms
Subjects:
Online Access:https://www.mdpi.com/1999-4893/17/12/551
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1850240061084270592
author Alaa E. Abdel-Hakim
Abdel-Monem M. Ibrahim
Kheir Eddine Bouazza
Wael Deabes
Abdel-Rahman Hedar
author_facet Alaa E. Abdel-Hakim
Abdel-Monem M. Ibrahim
Kheir Eddine Bouazza
Wael Deabes
Abdel-Rahman Hedar
author_sort Alaa E. Abdel-Hakim
collection DOAJ
description Traditional <i>K</i>-means clustering assumes, to some extent, a uniform distribution of data around predefined centroids, which limits its effectiveness for many realistic datasets. In this paper, a new clustering technique, simulated-annealing-based ellipsoidal clustering (SAELLC), is proposed to automatically partition data into an optimal number of ellipsoidal clusters, a capability absent in traditional methods. SAELLC transforms each identified cluster into a hyperspherical cluster, where the diameter of the hypersphere equals the minor axis of the original ellipsoid, and the center is encoded to represent the entire cluster. During the assignment of points to clusters, local ellipsoidal properties are independently considered. For objective function evaluation, the method adaptively transforms these ellipsoidal clusters into a variable number of global clusters. Two objective functions are simultaneously optimized: one reflecting partition compactness using the silhouette function (SF) and Euclidean distance, and another addressing cluster connectedness through a nearest-neighbor algorithm. This optimization is achieved using a newly-developed multiobjective simulated annealing approach. SAELLC is designed to automatically determine the optimal number of clusters, achieve precise partitioning, and accommodate a wide range of cluster shapes, including spherical, ellipsoidal, and non-symmetric forms. Extensive experiments conducted on UCI datasets demonstrated SAELLC’s superior performance compared to six well-known clustering algorithms. The results highlight its remarkable ability to handle diverse data distributions and automatically identify the optimal number of clusters, making it a robust choice for advanced clustering analysis.
format Article
id doaj-art-2460f6d70caa4a5086b1efd2f695822d
institution OA Journals
issn 1999-4893
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Algorithms
spelling doaj-art-2460f6d70caa4a5086b1efd2f695822d2025-08-20T02:00:59ZengMDPI AGAlgorithms1999-48932024-12-01171255110.3390/a17120551Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data DistributionsAlaa E. Abdel-Hakim0Abdel-Monem M. Ibrahim1Kheir Eddine Bouazza2Wael Deabes3Abdel-Rahman Hedar4Department of Computer Science in Jamoum, Umm Al-Qura University, Makkah 25371, Saudi ArabiaDepartment of Mathematics, Faculty of Science, Al-Azhar University, Assiut Branch, Assiut 71524, EgyptComputer and Information Science Division, Higher Colleges of Technology, Abu Dhabi 25026, United Arab EmiratesDepartment of Computational, Engineering, Mathematical Sciences (CEMS), Texas A&M University-San Antonio, San Antonio, TX 78224, USADepartment of Computer Science, Faculty of Computers & Information, Assiut University, Assiut 71526, EgyptTraditional <i>K</i>-means clustering assumes, to some extent, a uniform distribution of data around predefined centroids, which limits its effectiveness for many realistic datasets. In this paper, a new clustering technique, simulated-annealing-based ellipsoidal clustering (SAELLC), is proposed to automatically partition data into an optimal number of ellipsoidal clusters, a capability absent in traditional methods. SAELLC transforms each identified cluster into a hyperspherical cluster, where the diameter of the hypersphere equals the minor axis of the original ellipsoid, and the center is encoded to represent the entire cluster. During the assignment of points to clusters, local ellipsoidal properties are independently considered. For objective function evaluation, the method adaptively transforms these ellipsoidal clusters into a variable number of global clusters. Two objective functions are simultaneously optimized: one reflecting partition compactness using the silhouette function (SF) and Euclidean distance, and another addressing cluster connectedness through a nearest-neighbor algorithm. This optimization is achieved using a newly-developed multiobjective simulated annealing approach. SAELLC is designed to automatically determine the optimal number of clusters, achieve precise partitioning, and accommodate a wide range of cluster shapes, including spherical, ellipsoidal, and non-symmetric forms. Extensive experiments conducted on UCI datasets demonstrated SAELLC’s superior performance compared to six well-known clustering algorithms. The results highlight its remarkable ability to handle diverse data distributions and automatically identify the optimal number of clusters, making it a robust choice for advanced clustering analysis.https://www.mdpi.com/1999-4893/17/12/551unsupervised learningdata clusteringsimulated annealingsilhouette functionadaptive ellipsoidal <i>K</i>-meansautomatic clustering
spellingShingle Alaa E. Abdel-Hakim
Abdel-Monem M. Ibrahim
Kheir Eddine Bouazza
Wael Deabes
Abdel-Rahman Hedar
Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions
Algorithms
unsupervised learning
data clustering
simulated annealing
silhouette function
adaptive ellipsoidal <i>K</i>-means
automatic clustering
title Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions
title_full Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions
title_fullStr Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions
title_full_unstemmed Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions
title_short Ellipsoidal <i>K</i>-Means: An Automatic Clustering Approach for Non-Uniform Data Distributions
title_sort ellipsoidal i k i means an automatic clustering approach for non uniform data distributions
topic unsupervised learning
data clustering
simulated annealing
silhouette function
adaptive ellipsoidal <i>K</i>-means
automatic clustering
url https://www.mdpi.com/1999-4893/17/12/551
work_keys_str_mv AT alaaeabdelhakim ellipsoidalikimeansanautomaticclusteringapproachfornonuniformdatadistributions
AT abdelmonemmibrahim ellipsoidalikimeansanautomaticclusteringapproachfornonuniformdatadistributions
AT kheireddinebouazza ellipsoidalikimeansanautomaticclusteringapproachfornonuniformdatadistributions
AT waeldeabes ellipsoidalikimeansanautomaticclusteringapproachfornonuniformdatadistributions
AT abdelrahmanhedar ellipsoidalikimeansanautomaticclusteringapproachfornonuniformdatadistributions