BiMER: Design and Implementation of a Bimodal Emotion Recognition System Enhanced by Data Augmentation Techniques

Bibliographic Details
Main Authors: Emrah Dikbiyik, Onder Demir, Buket Dogan
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
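Subjects: Bimodal emotion recognition; intermediate fusion; data augmentation; real-time emotion recognition; IEMOCAP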
Online Access: https://ieeexplore.ieee.org/document/10960679/
author Emrah Dikbiyik
Onder Demir
Buket Dogan
collection DOAJ
description Accurately understanding and interpreting emotions is important in human-computer interaction. This study addresses the emotion recognition problem on both speech and text data using the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. First, the problem of datasets with a limited number of records and an unbalanced distribution across classes was addressed: a dataset built from the improvised recordings in IEMOCAP was used, and data augmentation methods were applied to both the speech and the text data. Using the datasets balanced through data augmentation, single-modality emotion recognition experiments were performed with models developed for Speech Emotion Recognition (SER) and Textual Emotion Recognition (TER). Subsequently, the features obtained from these two modalities were combined with the intermediate fusion method to provide more comprehensive and more accurate emotion recognition, and the Bimodal Emotion Recognition (BiMER) system was developed. The ResNet50-CRNN+AT model, which achieved the highest accuracy among the three models developed for SER, forms the speech branch of BiMER, while the Bidirectional Encoder Representations from Transformers (BERT) model used for TER forms the text branch. Supported by the data augmentation methods, BiMER showed improved robustness and generalization ability, reaching 88.33% accuracy. Finally, the BiMER system was implemented as a real-time web application using the Flask framework, and the capacity of this application to recognize emotions interactively through the user interface was tested.
format Article
id doaj-art-5d4bf5371f7e46658051bfd951b28a25
institution OA Journals
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Record: doaj-art-5d4bf5371f7e46658051bfd951b28a25 (2025-08-20T02:18:24Z)
Journal: IEEE Access (IEEE), ISSN 2169-3536, vol. 13, pp. 64330-64352, 2025-01-01
DOI: 10.1109/ACCESS.2025.3559339 (IEEE document 10960679)
Emrah Dikbiyik, https://orcid.org/0000-0002-3568-4938, Department of Computer Technologies, Vocational School of Technical Sciences, Istanbul University-Cerrahpaşa, Istanbul, Türkiye
Onder Demir, https://orcid.org/0000-0003-4540-663X, Department of Computer Engineering, Faculty of Technology, Marmara University, Istanbul, Türkiye
Buket Dogan, https://orcid.org/0000-0003-1062-2439, Department of Computer Engineering, Faculty of Technology, Marmara University, Istanbul, Türkiye
title BiMER: Design and Implementation of a Bimodal Emotion Recognition System Enhanced by Data Augmentation Techniques
topic Bimodal emotion recognition
intermediate fusion
data augmentation
real-time emotion recognition
IEMOCAP
url https://ieeexplore.ieee.org/document/10960679/
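
The intermediate fusion step named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: in BiMER the feature vectors come from the ResNet50-CRNN+AT speech model and the BERT text model, whereas here they are short toy lists. The label set, the weights, and the function names fuse and classify are all hypothetical, chosen only to show features from two modalities being joined before a single shared classifier.

```python
import math
from typing import List, Sequence

# Hypothetical label set; the record does not list the emotion classes used.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(speech_features: Sequence[float], text_features: Sequence[float]) -> List[float]:
    """Intermediate fusion: concatenate the per-modality feature vectors."""
    return list(speech_features) + list(text_features)


def classify(fused: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> List[float]:
    """One linear layer over the fused vector, followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy stand-ins for the features each single-modality model would produce.
speech_features = [0.2, -0.1, 0.7]
text_features = [0.5, 0.3]
fused = fuse(speech_features, text_features)

# Hypothetical classifier parameters: one weight row per emotion class.
weights = [
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.4, 0.0, 0.9, 0.2, 0.1],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [-0.3, -0.3, -0.3, -0.3, -0.3],
]
bias = [0.0, 0.0, 0.0, 0.0]

probs = classify(fused, weights, bias)
predicted = EMOTIONS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

The point of fusing at the feature level, rather than averaging two finished predictions, is that the shared classifier can weigh speech and text evidence jointly. In a trained system the weights would be learned from the augmented data rather than written by hand.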