Building a Gender-Bias-Resistant Super Corpus as a Deep Learning Baseline for Speech Emotion Recognition
The focus on Speech Emotion Recognition has dramatically increased in recent years, driven by the need for automatic speech-recognition-based systems and intelligent assistants to enhance user experience by incorporating emotional content. While deep learning techniques have significantly advanced S...
| Main Authors: | Babak Abbaschian, Adel Elmaghraby |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Sensors |
| Subjects: | speech emotion recognition, deep learning, LSTM, CNN, gender bias, fairness |
| Online Access: | https://www.mdpi.com/1424-8220/25/7/1991 |
| _version_ | 1850189063705853952 |
|---|---|
| author | Babak Abbaschian; Adel Elmaghraby |
| author_sort | Babak Abbaschian |
| collection | DOAJ |
| description | The focus on Speech Emotion Recognition has dramatically increased in recent years, driven by the need for automatic speech-recognition-based systems and intelligent assistants to enhance user experience by incorporating emotional content. While deep learning techniques have significantly advanced SER systems, their robustness concerning speaker gender and out-of-distribution data has not been thoroughly examined. Furthermore, standards for SER remain rooted in landmark papers from the 2000s, even though modern deep learning architectures can achieve comparable or superior results to the state of the art of that era. In this research, we address these challenges by creating a new super corpus from existing databases, providing a larger pool of samples. We benchmark this dataset using various deep learning architectures, setting a new baseline for the task. Additionally, our experiments reveal that models trained on this super corpus demonstrate superior generalization and accuracy and exhibit lower gender bias compared to models trained on individual databases. We further show that traditional preprocessing techniques, such as denoising and normalization, are insufficient to address inherent biases in the data. However, our data augmentation approach effectively shifts these biases, improving model fairness across gender groups and emotions and, in some cases, fully debiasing the models. |
| format | Article |
| id | doaj-art-7681122da282451686ce8db53150edb9 |
| institution | OA Journals |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | doaj-art-7681122da282451686ce8db53150edb9; 2025-08-20T02:15:42Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2025-03-01; vol. 25, no. 7, art. 1991; doi:10.3390/s25071991; Babak Abbaschian and Adel Elmaghraby (both: Computer Science and Engineering Department, University of Louisville, Louisville, KY 40292, USA); https://www.mdpi.com/1424-8220/25/7/1991; keywords: speech emotion recognition, deep learning, LSTM, CNN, gender bias, fairness |
| title | Building a Gender-Bias-Resistant Super Corpus as a Deep Learning Baseline for Speech Emotion Recognition |
| topic | speech emotion recognition; deep learning; LSTM; CNN; gender bias; fairness |
| url | https://www.mdpi.com/1424-8220/25/7/1991 |