Automated Exploratory Clustering to Democratize Clustering Analysis
AutoML is enabling many practitioners to use sophisticated Machine Learning pipelines even without being experienced in building application-specific solutions. Adapting AutoML to the field of unsupervised learning, particularly to the task of clustering, is challenging, as clustering is highly subj...
| Main Authors: | Georg Stefan Schlake, Max Pernklau, Christian Beecks |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Applied Sciences |
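| Subjects: | automated clustering; AutoML; skyline operator; clustering validation; human in the loop; automated exploratory clustering |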
| Online Access: | https://www.mdpi.com/2076-3417/15/12/6876 |
| _version_ | 1849467571745587200 |
|---|---|
| author | Georg Stefan Schlake; Max Pernklau; Christian Beecks |
| author_facet | Georg Stefan Schlake; Max Pernklau; Christian Beecks |
| author_sort | Georg Stefan Schlake |
| collection | DOAJ |
| description | AutoML enables many practitioners to use sophisticated Machine Learning pipelines without being experienced in building application-specific solutions. Adapting AutoML to unsupervised learning, and to clustering in particular, is challenging because clustering is highly subjective and application-specific: the goal is not to find the best way to group data objects based on previously seen examples, but to find interesting new structures within potentially unknown data objects that provide actionable insights. How interesting a clustering is depends on the user and on a variety of characteristics, so the same dataset admits several different meaningful clusterings (e.g., grouping people by age, gender, or special interests). In this paper, we propose an Automated Exploratory Clustering framework that automatically determines multiple clusterings satisfying different notions of interestingness. To this end, we generate multiple clusterings via AutoML processes and return a selection of clusterings from which the user can explore the preferred ones. We use methods such as the skyline operator to prune clusterings that are not Pareto-optimal with respect to different dimensions of interestingness, and deliver a small set of valuable clusterings. In this way, our approach enables practitioners and domain experts to identify valuable clusterings without also becoming experts in clustering, thus reducing the human effort and resources needed to find application-specific solutions. Our empirical investigation with current state-of-the-art methods is carried out on a number of benchmark datasets, where a well-established ground truth can serve as a proxy for the wishes of a domain expert and multiple interestingness properties of the clusterings. |
| format | Article |
| id | doaj-art-12684da0d51c4575a3121f2dfb6cfbe2 |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-12684da0d51c4575a3121f2dfb6cfbe2; 2025-08-20T03:26:10Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-06-01; vol. 15, no. 12, art. 6876; DOI 10.3390/app15126876; Automated Exploratory Clustering to Democratize Clustering Analysis; Georg Stefan Schlake (0), Max Pernklau (1), Christian Beecks (2), all: Chair of Data Science, University of Hagen, 58084 Hagen, Germany; https://www.mdpi.com/2076-3417/15/12/6876; keywords: automated clustering, AutoML, skyline operator, clustering validation, human in the loop, automated exploratory clustering |
| spellingShingle | Georg Stefan Schlake Max Pernklau Christian Beecks Automated Exploratory Clustering to Democratize Clustering Analysis Applied Sciences automated clustering AutoML skyline operator clustering validation human in the loop automated exploratory clustering |
| title | Automated Exploratory Clustering to Democratize Clustering Analysis |
| title_full | Automated Exploratory Clustering to Democratize Clustering Analysis |
| title_fullStr | Automated Exploratory Clustering to Democratize Clustering Analysis |
| title_full_unstemmed | Automated Exploratory Clustering to Democratize Clustering Analysis |
| title_short | Automated Exploratory Clustering to Democratize Clustering Analysis |
| title_sort | automated exploratory clustering to democratize clustering analysis |
| topic | automated clustering; AutoML; skyline operator; clustering validation; human in the loop; automated exploratory clustering |
| url | https://www.mdpi.com/2076-3417/15/12/6876 |
| work_keys_str_mv | AT georgstefanschlake automatedexploratoryclusteringtodemocratizeclusteringanalysis AT maxpernklau automatedexploratoryclusteringtodemocratizeclusteringanalysis AT christianbeecks automatedexploratoryclusteringtodemocratizeclusteringanalysis |
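
The abstract mentions using the skyline operator to prune clusterings that are not Pareto-optimal across several dimensions of interestingness. The following is a minimal sketch of that idea, not the authors' implementation. It assumes every dimension is a score where higher is better; the candidate names, the three score dimensions, and the values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every dimension
    and strictly better in at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of all candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# Hypothetical interestingness scores per clustering:
# (compactness, separation, stability), all to be maximized.
candidates = {
    "kmeans_k3": (0.80, 0.60, 0.70),
    "kmeans_k5": (0.70, 0.50, 0.65),  # worse than kmeans_k3 everywhere
    "dbscan": (0.50, 0.90, 0.60),     # best separation, so it survives
    "agglo_k4": (0.80, 0.60, 0.70),   # ties with kmeans_k3; ties do not dominate
}

result = skyline(candidates)
print(result)  # ['kmeans_k3', 'dbscan', 'agglo_k4']
```

This pairwise comparison is quadratic in the number of candidate clusterings, which is usually acceptable for the small candidate sets an exploratory workflow presents to a user. Measures where lower is better would need to be negated before being passed in.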