Multimodal Emotion Recognition Based on Facial Expressions, Speech, and EEG
*Goal:* As an essential human-machine interaction task, emotion recognition has become an active research area over recent decades. Although previous attempts to classify emotions have achieved high performance, several challenges remain open: 1) how to effectively recognize emotions from different modalities, and 2) given the growing computing power required for deep learning, how to provide real-time detection and improve the robustness of deep neural networks. *Method:* In this paper, we propose a deep learning-based multimodal emotion recognition (MER) framework called Deep-Emotion, which adaptively integrates the most discriminative features from facial expressions, speech, and electroencephalogram (EEG) signals to improve MER performance. The Deep-Emotion framework consists of three branches: a facial branch, a speech branch, and an EEG branch. The facial branch uses the improved GhostNet neural network proposed in this paper for feature extraction, which alleviates overfitting during training and improves classification accuracy over the original GhostNet. For the speech branch, this paper proposes a lightweight fully convolutional neural network (LFCNN) for the efficient extraction of speech emotion features. For the EEG branch, we propose a tree-like LSTM (tLSTM) model capable of fusing multi-stage features for EEG emotion feature extraction. Finally, we adopt decision-level fusion to integrate the recognition results of the three modalities, yielding more comprehensive and accurate performance. *Result and Conclusions:* Extensive experiments on the CK+, EMO-DB, and MAHNOB-HCI datasets demonstrate the effectiveness of the proposed Deep-Emotion method and the feasibility and superiority of the MER approach.
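The abstract describes decision-level fusion of the facial, speech, and EEG branch outputs. As a rough illustration only (not code from the paper), the following minimal Python sketch assumes each branch emits softmax class probabilities and fuses them by a weighted average; the function name, branch weights, and class count are hypothetical.

```python
import numpy as np

def decision_level_fusion(face_probs, speech_probs, eeg_probs, weights=(1.0, 1.0, 1.0)):
    """Decision-level fusion sketch: weighted average of per-branch class probabilities.

    face_probs, speech_probs, eeg_probs: 1-D arrays of softmax outputs (one entry per emotion class).
    weights: hypothetical relative confidence assigned to the facial, speech, and EEG branches.
    Returns the fused predicted class index and the fused probability vector.
    """
    stacked = np.stack([face_probs, speech_probs, eeg_probs])            # shape: (3, n_classes)
    fused = np.average(stacked, axis=0, weights=np.asarray(weights, dtype=float))
    return int(np.argmax(fused)), fused

# Illustrative example with four emotion classes (values are made up):
face = np.array([0.70, 0.10, 0.10, 0.10])    # e.g., facial branch (improved GhostNet)
speech = np.array([0.40, 0.30, 0.20, 0.10])  # e.g., speech branch (LFCNN)
eeg = np.array([0.25, 0.25, 0.25, 0.25])     # e.g., EEG branch (tLSTM)
label, fused = decision_level_fusion(face, speech, eeg)
print(label, fused)
```

The paper's actual fusion weights and decision rule may differ; the sketch only shows the general decision-level pattern of combining per-branch predictions rather than features.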
| Main Authors: | Jiahui Pan, Weijie Fang, Zhihang Zhang, Bingzhi Chen, Zheng Zhang, Shuihua Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Open Journal of Engineering in Medicine and Biology |
| Subjects: | Multimodal emotion recognition; electroencephalogram; facial expressions; speech |
| Online Access: | https://ieeexplore.ieee.org/document/10026861/ |
| author | Jiahui Pan; Weijie Fang; Zhihang Zhang; Bingzhi Chen; Zheng Zhang; Shuihua Wang |
|---|---|
| collection | DOAJ |
| description | *Goal:* As an essential human-machine interaction task, emotion recognition has become an active research area over recent decades. Although previous attempts to classify emotions have achieved high performance, several challenges remain open: 1) how to effectively recognize emotions from different modalities, and 2) given the growing computing power required for deep learning, how to provide real-time detection and improve the robustness of deep neural networks. *Method:* In this paper, we propose a deep learning-based multimodal emotion recognition (MER) framework called Deep-Emotion, which adaptively integrates the most discriminative features from facial expressions, speech, and electroencephalogram (EEG) signals to improve MER performance. The Deep-Emotion framework consists of three branches: a facial branch, a speech branch, and an EEG branch. The facial branch uses the improved GhostNet neural network proposed in this paper for feature extraction, which alleviates overfitting during training and improves classification accuracy over the original GhostNet. For the speech branch, this paper proposes a lightweight fully convolutional neural network (LFCNN) for the efficient extraction of speech emotion features. For the EEG branch, we propose a tree-like LSTM (tLSTM) model capable of fusing multi-stage features for EEG emotion feature extraction. Finally, we adopt decision-level fusion to integrate the recognition results of the three modalities, yielding more comprehensive and accurate performance. *Result and Conclusions:* Extensive experiments on the CK+, EMO-DB, and MAHNOB-HCI datasets demonstrate the effectiveness of the proposed Deep-Emotion method and the feasibility and superiority of the MER approach. |
| format | Article |
| id | doaj-art-4d724c47499a483da4d4393346854a3f |
| institution | Kabale University |
| issn | 2644-1276 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Open Journal of Engineering in Medicine and Biology |
| spelling | IEEE Open Journal of Engineering in Medicine and Biology, vol. 5, pp. 396-403, 2024-01-01. ISSN 2644-1276. DOI: 10.1109/OJEMB.2023.3240280. IEEE article number 10026861. Authors: Jiahui Pan (ORCID 0000-0002-7576-6743), Weijie Fang (ORCID 0000-0003-0354-8373), Zhihang Zhang, and Bingzhi Chen (ORCID 0000-0002-2497-6214), School of Software, South China Normal University, Guangzhou, China; Zheng Zhang (ORCID 0000-0003-1470-6998), Shenzhen Medical Biometrics Perception and Analysis Engineering Laboratory, Harbin Institute of Technology, Shenzhen, China; Shuihua Wang (ORCID 0000-0003-2238-6808), School of Computing and Mathematical Sciences, University of Leicester, Leicester, U.K. Online access: https://ieeexplore.ieee.org/document/10026861/. Keywords: Multimodal emotion recognition; electroencephalogram; facial expressions; speech. Record indexed 2025-08-20T03:30:52Z. |
| title | Multimodal Emotion Recognition Based on Facial Expressions, Speech, and EEG |
| topic | Multimodal emotion recognition; electroencephalogram; facial expressions; speech |
| url | https://ieeexplore.ieee.org/document/10026861/ |