Graph convolutional network model with a feature compensation module and dual-channel second-order pooling module for multimodal emotion recognition in conversation


Bibliographic Details
Main Authors: Xiaocong Tan, Zhengze Gong, Mengkun Gan, Weijie Xie, Wenhui Wang
Format: Article
Language: English
Published: Springer, 2025-07-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Damaged multimodal data; Multimodal emotion recognition in conversation; Graph convolutional networks (GCNs); Dual-channel second-order pooling
Online Access: https://doi.org/10.1007/s44443-025-00091-6
author Xiaocong Tan
Zhengze Gong
Mengkun Gan
Weijie Xie
Wenhui Wang
collection DOAJ
description Abstract Multimodal emotion recognition in conversation (MERC) involves predicting the emotion category of a conversation on the basis of textual, acoustic, and visual modalities. Information from these diverse modalities can reinforce each other to enhance the accuracy of emotion prediction. However, some information modalities may be absent in real-world applications and information from various modalities may be difficult to integrate. Therefore, a suitable strategy is required to compensate for missing modalities by using information from the available modalities and prioritizing important information. Consequently, this study developed a graph convolutional network (GCN) model with a feature compensation module and dual-channel second-order pooling module for MERC. This model initially uses a GCN to compensate for missing features by aggregating features corresponding to the same utterance node. Subsequently, it applies dual-channel second-order pooling to sift through and integrate all features. Empirical evaluations of the proposed model against other baseline models on two benchmark datasets, namely the IEMOCAP and MELD datasets, indicated that the proposed model outperformed the other models; this finding underscores the effectiveness of the proposed model in recognizing the emotions expressed in multimodal dialogue data.
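The description above outlines two mechanisms: a GCN that compensates for a missing modality by aggregating the features of the other modality nodes belonging to the same utterance, and second-order pooling over the resulting features. The sketch below is an illustrative reconstruction, not the authors' implementation; the toy graph construction, dimensions, and single-channel pooling are assumptions made for illustration (the paper's dual-channel design and exact architecture are not reproduced here).

```python
import numpy as np

def gcn_layer(A, X, W):
    # One symmetric-normalized graph convolution: ReLU(D^-1/2 (A+I) D^-1/2 X W).
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

def second_order_pool(X):
    # Second-order (covariance) pooling over node features -> (d, d) descriptor.
    Xc = X - X.mean(axis=0, keepdims=True)
    return (Xc.T @ Xc) / X.shape[0]

# Toy setup (hypothetical): 2 utterances x 3 modality nodes = 6 nodes, dim 4.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
X[2] = 0.0  # pretend the visual features of utterance 0 are missing

# Adjacency connecting the three modality nodes of each utterance to each other,
# so a missing-modality node can aggregate from its sibling modalities.
A = np.zeros((6, 6))
for u in range(2):
    idx = [3 * u, 3 * u + 1, 3 * u + 2]
    for i in idx:
        for j in idx:
            if i != j:
                A[i, j] = 1.0

W = rng.standard_normal((4, 4)) * 0.1
H = gcn_layer(A, X, W)       # node 2 now carries information from its neighbors
P = second_order_pool(H)     # symmetric (4, 4) second-order descriptor
```

The covariance descriptor `P` is symmetric positive semidefinite by construction, which is what makes second-order pooling capture pairwise feature interactions rather than only first-order means.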
format Article
id doaj-art-44aaee35f99247269e6ce24166c64221
institution Kabale University
issn 1319-1578
2213-1248
language English
publishDate 2025-07-01
publisher Springer
record_format Article
series Journal of King Saud University: Computer and Information Sciences
affiliation Information and Data Centre, Guangzhou First People’s Hospital, Guangzhou Medical University (all five authors)
title Graph convolutional network model with a feature compensation module and dual-channel second-order pooling module for multimodal emotion recognition in conversation
topic Damaged multimodal data
Multimodal emotion recognition in conversation
Graph convolutional networks (GCNs)
Dual-channel second-order pooling
url https://doi.org/10.1007/s44443-025-00091-6