Automated Exploratory Clustering to Democratize Clustering Analysis

AutoML is enabling many practitioners to use sophisticated Machine Learning pipelines even without being experienced in building application-specific solutions. Adapting AutoML to the field of unsupervised learning, particularly to the task of clustering, is challenging, as clustering is highly subj...

Bibliographic Details
Main Authors: Georg Stefan Schlake, Max Pernklau, Christian Beecks
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Applied Sciences
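Subjects: automated clustering, AutoML, skyline operator, clustering validation, human in the loop, automated exploratory clustering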
Online Access: https://www.mdpi.com/2076-3417/15/12/6876
_version_ 1849467571745587200
author Georg Stefan Schlake
Max Pernklau
Christian Beecks
author_facet Georg Stefan Schlake
Max Pernklau
Christian Beecks
author_sort Georg Stefan Schlake
collection DOAJ
description AutoML enables many practitioners to use sophisticated Machine Learning pipelines without being experienced in building application-specific solutions. Adapting AutoML to unsupervised learning, particularly to the task of clustering, is challenging, as clustering is highly subjective and application-specific; the goal is not to find the best way to group data objects based on previously seen examples, but to find interesting new structures within potentially unknown data objects that provide actionable insights. How interesting a clustering is depends on a variety of characteristics, so the same dataset admits several different clusterings (e.g., grouping people by age, gender, or special interests). In this paper, we propose an Automated Exploratory Clustering framework which automatically determines multiple clusterings satisfying different notions of interestingness. To this end, we generate multiple clusterings via AutoML processes and return a selection of clusterings, from which the user can explore the most preferred ones. We use methods such as the skyline operator to prune non-Pareto-optimal clusterings with respect to different dimensions of interestingness and deliver a small set of valuable clusterings. In this way, our approach enables practitioners as well as domain experts to identify valuable clusterings without becoming experts in clustering themselves, thus reducing the human effort and resources needed to find application-specific solutions. Our empirical investigation with current state-of-the-art methods is carried out on a number of benchmark datasets, where a well-established ground truth can proxy for the wishes of a domain expert and multiple interestingness properties of the clusterings.
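The pruning step named in the description can be illustrated with a minimal sketch of a skyline (Pareto) filter. This is not the authors' implementation: the candidate names, the two score dimensions, and the convention that higher scores are better are all assumptions made for illustration, since this record does not state which interestingness measures the paper uses.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` in every dimension
    and strictly better in at least one (higher is assumed to be better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of all candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# Hypothetical candidate clusterings, each scored on two assumed
# interestingness dimensions (compactness, separation).
candidates = {
    "kmeans_k3": (0.62, 0.40),
    "kmeans_k5": (0.55, 0.35),  # worse than kmeans_k3 in both dimensions
    "dbscan": (0.30, 0.80),     # best separation, so it is not dominated
    "agglo_k4": (0.62, 0.38),   # ties on compactness, loses on separation
}

result = skyline(candidates)
print(result)
```

Under these assumed scores, only the candidates that represent a distinct trade-off between the two dimensions remain, which is the small set a user would then explore.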
format Article
id doaj-art-12684da0d51c4575a3121f2dfb6cfbe2
institution Kabale University
issn 2076-3417
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-12684da0d51c4575a3121f2dfb6cfbe2
2025-08-20T03:26:10Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-06-01
15
12
6876
10.3390/app15126876
Automated Exploratory Clustering to Democratize Clustering Analysis
Georg Stefan Schlake (Chair of Data Science, University of Hagen, 58084 Hagen, Germany)
Max Pernklau (Chair of Data Science, University of Hagen, 58084 Hagen, Germany)
Christian Beecks (Chair of Data Science, University of Hagen, 58084 Hagen, Germany)
https://www.mdpi.com/2076-3417/15/12/6876
automated clustering
AutoML
skyline operator
clustering validation
human in the loop
automated exploratory clustering
spellingShingle Georg Stefan Schlake
Max Pernklau
Christian Beecks
Automated Exploratory Clustering to Democratize Clustering Analysis
Applied Sciences
automated clustering
AutoML
skyline operator
clustering validation
human in the loop
automated exploratory clustering
title Automated Exploratory Clustering to Democratize Clustering Analysis
title_full Automated Exploratory Clustering to Democratize Clustering Analysis
title_fullStr Automated Exploratory Clustering to Democratize Clustering Analysis
title_full_unstemmed Automated Exploratory Clustering to Democratize Clustering Analysis
title_short Automated Exploratory Clustering to Democratize Clustering Analysis
title_sort automated exploratory clustering to democratize clustering analysis
topic automated clustering
AutoML
skyline operator
clustering validation
human in the loop
automated exploratory clustering
url https://www.mdpi.com/2076-3417/15/12/6876
work_keys_str_mv AT georgstefanschlake automatedexploratoryclusteringtodemocratizeclusteringanalysis
AT maxpernklau automatedexploratoryclusteringtodemocratizeclusteringanalysis
AT christianbeecks automatedexploratoryclusteringtodemocratizeclusteringanalysis