Building a Gender-Bias-Resistant Super Corpus as a Deep Learning Baseline for Speech Emotion Recognition

The focus on Speech Emotion Recognition has dramatically increased in recent years, driven by the need for automatic speech-recognition-based systems and intelligent assistants to enhance user experience by incorporating emotional content. While deep learning techniques have significantly advanced S...

Bibliographic Details
Main Authors: Babak Abbaschian, Adel Elmaghraby
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Sensors
Subjects: speech emotion recognition; deep learning; LSTM; CNN; gender bias; fairness
Online Access: https://www.mdpi.com/1424-8220/25/7/1991
_version_ 1850189063705853952
author Babak Abbaschian
Adel Elmaghraby
collection DOAJ
description The focus on Speech Emotion Recognition has dramatically increased in recent years, driven by the need for automatic speech-recognition-based systems and intelligent assistants to enhance user experience by incorporating emotional content. While deep learning techniques have significantly advanced SER systems, their robustness concerning speaker gender and out-of-distribution data has not been thoroughly examined. Furthermore, standards for SER remain rooted in landmark papers from the 2000s, even though modern deep learning architectures can achieve comparable or superior results to the state of the art of that era. In this research, we address these challenges by creating a new super corpus from existing databases, providing a larger pool of samples. We benchmark this dataset using various deep learning architectures, setting a new baseline for the task. Additionally, our experiments reveal that models trained on this super corpus demonstrate superior generalization and accuracy and exhibit lower gender bias compared to models trained on individual databases. We further show that traditional preprocessing techniques, such as denoising and normalization, are insufficient to address inherent biases in the data. However, our data augmentation approach effectively shifts these biases, improving model fairness across gender groups and emotions and, in some cases, fully debiasing the models.
format Article
id doaj-art-7681122da282451686ce8db53150edb9
institution OA Journals
issn 1424-8220
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-7681122da282451686ce8db53150edb9 | 2025-08-20T02:15:42Z | eng | MDPI AG | Sensors | 1424-8220 | 2025-03-01 | 25 | 7 | 1991 | 10.3390/s25071991 | Building a Gender-Bias-Resistant Super Corpus as a Deep Learning Baseline for Speech Emotion Recognition | Babak Abbaschian (Computer Science and Engineering Department, University of Louisville, Louisville, KY 40292, USA) | Adel Elmaghraby (Computer Science and Engineering Department, University of Louisville, Louisville, KY 40292, USA) | https://www.mdpi.com/1424-8220/25/7/1991 | speech emotion recognition; deep learning; LSTM; CNN; gender bias; fairness
title Building a Gender-Bias-Resistant Super Corpus as a Deep Learning Baseline for Speech Emotion Recognition
topic speech emotion recognition
deep learning
LSTM
CNN
gender bias
fairness
url https://www.mdpi.com/1424-8220/25/7/1991