Class-aware contrastive optimization for imbalanced text classification

Bibliographic Details
Main Authors: Grigorii Khvatskii, Nuno Moniz, Khoa D. Doan, Nitesh V. Chawla
Format: Article
Language: English
Published: Springer, 2025-07-01
Series: Discover Data
Online Access: https://doi.org/10.1007/s44248-025-00064-0
Description
Summary: The unique characteristics of text data make classification a complex problem. Advances in unsupervised and semi-supervised learning and in autoencoder architectures have addressed several challenges. However, these approaches still struggle with imbalanced text classification, a common scenario in real-world applications, and tend to produce embeddings with unfavorable properties such as class overlap. In this paper, we show that class-aware contrastive optimization combined with denoising autoencoders can successfully tackle imbalanced text classification tasks, outperforming other strong text classification models. Concretely, our proposal combines a reconstruction loss with contrastive class separation in the embedding space, striking a better balance between the faithfulness of the generated embeddings and the model's ability to separate classes. Compared against an extensive set of traditional and deep-learning-based competing methods, our proposal achieves a notable increase in performance across a wide variety of text datasets.
ISSN:2731-6955
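The abstract describes a combined objective: a reconstruction loss balanced against a contrastive term that separates classes in the embedding space. A minimal NumPy sketch of that idea is shown below. This is an illustration only, not the paper's implementation: the function name `combined_loss`, the SupCon-style form of the contrastive term, and the weighting parameter `alpha` are all assumptions.

```python
import numpy as np

def combined_loss(x, x_recon, z, labels, alpha=0.5, temperature=0.1):
    """Hypothetical sketch: reconstruction loss plus a class-aware
    (supervised-contrastive-style) separation term over embeddings z.

    x        : (n, d)  original inputs
    x_recon  : (n, d)  autoencoder reconstructions
    z        : (n, k)  embeddings from the encoder
    labels   : (n,)    class labels
    alpha    : assumed trade-off weight between the two terms
    """
    # Reconstruction term: mean squared error between input and output.
    recon = np.mean((x - x_recon) ** 2)

    # Class-aware contrastive term: pull same-class embeddings together,
    # push different-class embeddings apart (SupCon-style formulation).
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize rows
    sim = z @ z.T / temperature                       # pairwise similarities

    n = len(labels)
    contrast = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchor has no same-class partner in the batch
        # log of the softmax denominator over all non-self pairs
        log_denom = np.log(np.exp(np.delete(sim[i], i)).sum())
        # average negative log-probability assigned to same-class pairs
        contrast += -np.mean([sim[i, j] - log_denom for j in positives])
    contrast /= n

    return recon + alpha * contrast
```

In this sketch, minimizing the loss simultaneously rewards faithful reconstructions (keeping the embeddings truthful to the input) and rewards embeddings whose nearest neighbors share a class label (reducing class overlap), which mirrors the trade-off the abstract describes.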