Optimised knowledge distillation for efficient social media emotion recognition using DistilBERT and ALBERT

Bibliographic Details
Main Authors: Muhammad Hussain, Caikou Chen, Muzammil Hussain, Muhammad Anwar, Mohammed Abaker, Abdelzahir Abdelmaboud, Iqra Yamin
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Scientific Reports
Subjects:
Online Access: https://doi.org/10.1038/s41598-025-16001-9
Description
Summary: Accurate emotion recognition in social media text is critical for applications such as sentiment analysis, mental health monitoring, and human-computer interaction. However, existing approaches face challenges such as computational complexity and class imbalance, limiting their deployment in resource-constrained environments. While transformer-based models achieve state-of-the-art performance, their size and latency hinder real-time applications. To address these issues, we propose a novel knowledge distillation framework that transfers knowledge from a fine-tuned BERT-base teacher model to lightweight DistilBERT and ALBERT student models, optimised for efficient emotion recognition. Our approach integrates a hybrid loss function combining focal loss and Kullback-Leibler (KL) divergence to enhance minority-class recognition, attention-head alignment for effective contextual knowledge transfer, and semantic-preserving data augmentation to mitigate class imbalance. Experiments on two datasets, Twitter Emotions (416 K samples, six classes) and Social Media Emotion (75 K samples, five classes), show that our distilled models achieve near-teacher performance (97.35% and 73.86% accuracy, respectively), with only <1% and <6% accuracy drops, while reducing model size by 40% and inference latency by 3.2×. Notably, our method significantly improves F1-scores for minority classes. Our work sets a new state of the art in efficient emotion recognition, enabling practical deployment in edge computing and mobile applications.
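
The summary describes a hybrid loss combining focal loss on hard labels with KL-divergence distillation against the teacher's soft predictions. The sketch below shows one plausible PyTorch formulation of such a loss; the weighting scheme and the hyperparameter names (alpha, gamma, T) are illustrative assumptions, not details taken from the paper.

# Hypothetical sketch of a hybrid distillation loss: focal loss on hard
# labels plus temperature-scaled KL divergence against the teacher's logits.
# Hyperparameter names and the convex weighting are assumptions.
import torch
import torch.nn.functional as F

def hybrid_distillation_loss(student_logits, teacher_logits, labels,
                             gamma=2.0, T=2.0, alpha=0.5):
    # Focal loss: down-weight easy examples so minority classes
    # contribute more to the gradient than in plain cross-entropy.
    log_probs = F.log_softmax(student_logits, dim=-1)
    ce = F.nll_loss(log_probs, labels, reduction="none")
    pt = torch.exp(-ce)                      # probability of the true class
    focal = ((1.0 - pt) ** gamma * ce).mean()

    # KL divergence between temperature-softened teacher and student
    # distributions; the T*T factor keeps gradient magnitudes comparable
    # to the hard-label term, as in standard distillation.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    return alpha * focal + (1.0 - alpha) * kl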
ISSN: 2045-2322