A deep multiple self-supervised clustering model based on autoencoder networks

Bibliographic Details
Main Authors: Ling Zhu, Zijin Liu, Guangyu Liu
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-00349-z
Description
Summary: Numerous deep clustering models have been proposed in recent years, exhibiting remarkable performance in unsupervised learning. However, they often concentrate on the features of the data itself and seldom take the structure and distribution of the data into account during representation learning. To address this challenge, we propose a new Deep Multiple Self-supervised Clustering model, termed DMSC, which places greater emphasis on the structural distribution of the data. The proposed model integrates the advantages of the autoencoder and fuzzy C-Means clustering, performing multi-level clustering evaluations throughout multiple iterations of the autoencoder training process. It leverages a gradient-like approach to data reconstruction, enabling the autoencoder to learn features more conducive to clustering and thereby enhancing clustering performance. Experimental results show that the model significantly outperforms common clustering algorithms on datasets of different types across multiple domains. Furthermore, to boost the efficiency of the multi-layer clustering module within the model and minimize algorithmic overhead, we integrate a distance-based two-stage fuzzy C-Means clustering method. This approach introduces an efficient, adaptable, and principled technique for initializing the cluster centers and membership matrix of fuzzy C-Means clustering, so that the loss function converges in a shorter time. Compared with traditional fuzzy C-Means clustering on various public datasets, the proposed method significantly reduces computation time and noticeably improves iteration efficiency.
ISSN: 2045-2322
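
The abstract does not spell out the two-stage initialization, so the following is only a minimal sketch, assuming the standard fuzzy C-Means updates and a simple distance-based (farthest-point) choice of initial cluster centers; the function names and the maximin heuristic are illustrative assumptions, not the paper's actual method. The membership matrix here follows from the usual update rule u_ik proportional to d_ik^(-2/(m-1)) applied to the initial center distances.

# Minimal sketch (assumed, not the authors' code): standard fuzzy C-Means with a
# distance-based, farthest-point initialization of the cluster centers.
import numpy as np


def init_centers_by_distance(X, n_clusters, rng):
    # Pick the first center at random, then repeatedly pick the point that is
    # farthest from all centers chosen so far (a simple maximin heuristic).
    centers = [X[rng.integers(len(X))]]
    for _ in range(n_clusters - 1):
        d = np.linalg.norm(X[:, None, :] - np.asarray(centers)[None, :, :], axis=2)
        centers.append(X[np.argmax(d.min(axis=1))])
    return np.asarray(centers)


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    centers = init_centers_by_distance(X, n_clusters, rng)
    for _ in range(max_iter):
        # Distances from every point to every center (offset avoids division by zero).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Standard FCM membership update: u_ik proportional to dist_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Standard FCM center update: weighted mean of the points with weights u_ik^m.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, U


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.3, (100, 2)), rng.normal(3.0, 0.3, (100, 2))])
    centers, U = fuzzy_c_means(X, n_clusters=2)
    print("centers:\n", centers)
    print("hard labels of the first five points:", U[:5].argmax(axis=1))

The farthest-point scheme is only one plausible distance-based initialization; the paper's actual two-stage procedure and its membership-matrix initialization should be taken from the full text at the DOI above.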