Graph convolutional network model with a feature compensation module and dual-channel second-order pooling module for multimodal emotion recognition in conversation
Abstract Multimodal emotion recognition in conversation (MERC) involves predicting the emotion category of a conversation on the basis of textual, acoustic, and visual modalities. Information from these diverse modalities can reinforce each other to enhance the accuracy of emotion prediction. However, some information modalities may be absent in real-world applications, and information from various modalities may be difficult to integrate. Therefore, a suitable strategy is required to compensate for missing modalities by using information from the available modalities and prioritizing important information. Consequently, this study developed a graph convolutional network (GCN) model with a feature compensation module and a dual-channel second-order pooling module for MERC. The model first uses a GCN to compensate for missing features by aggregating the features attached to the same utterance node, and then applies dual-channel second-order pooling to sift through and integrate all features. Empirical evaluations against baseline models on two benchmark datasets, IEMOCAP and MELD, indicated that the proposed model outperformed the other models, underscoring its effectiveness in recognizing the emotions expressed in multimodal dialogue data.
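The two ideas named in the abstract can be sketched briefly: filling a missing modality by aggregating the features of the other modalities attached to the same utterance node, and fusing features with second-order (covariance) pooling. This is a minimal illustration under simplifying assumptions (mean aggregation, a single utterance), not the authors' implementation; both function names are hypothetical.

```python
import numpy as np

def compensate_missing(modalities):
    """Fill a missing modality (value None) with the mean of the
    available modality vectors for the same utterance node -- a
    one-step neighbourhood aggregation in the spirit of a GCN layer."""
    available = [v for v in modalities.values() if v is not None]
    mean = np.mean(available, axis=0)
    return {k: (v if v is not None else mean) for k, v in modalities.items()}

def second_order_pooling(X):
    """Second-order pooling of a feature matrix X (n x d): centre the
    rows and return the d x d covariance, which captures pairwise
    feature interactions rather than only first-order means."""
    Xc = X - X.mean(axis=0, keepdims=True)
    return (Xc.T @ Xc) / max(X.shape[0] - 1, 1)

# One utterance with the visual modality missing.
utt = {"text": np.ones(4), "audio": 3 * np.ones(4), "video": None}
filled = compensate_missing(utt)
cov = second_order_pooling(np.stack(list(filled.values())))
```

In the paper the aggregation weights are learned by the GCN and the pooled statistics feed two channels; here plain means and a single covariance stand in for both.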
| Main Authors: | Xiaocong Tan, Zhengze Gong, Mengkun Gan, Weijie Xie, Wenhui Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-07-01 |
| Series: | Journal of King Saud University: Computer and Information Sciences |
| Subjects: | Damaged multimodal data; Multimodal emotion recognition in conversation; Graph convolutional networks (GCNs); Dual-channel second-order pooling |
| Online Access: | https://doi.org/10.1007/s44443-025-00091-6 |
| _version_ | 1849341986665922560 |
|---|---|
| author | Xiaocong Tan; Zhengze Gong; Mengkun Gan; Weijie Xie; Wenhui Wang |
| author_facet | Xiaocong Tan; Zhengze Gong; Mengkun Gan; Weijie Xie; Wenhui Wang |
| author_sort | Xiaocong Tan |
| collection | DOAJ |
| description | Abstract Multimodal emotion recognition in conversation (MERC) involves predicting the emotion category of a conversation on the basis of textual, acoustic, and visual modalities. Information from these diverse modalities can reinforce each other to enhance the accuracy of emotion prediction. However, some information modalities may be absent in real-world applications and information from various modalities may be difficult to integrate. Therefore, a suitable strategy is required to compensate for missing modalities by using information from the available modalities and prioritizing important information. Consequently, this study developed a graph convolutional network (GCN) model with a feature compensation module and dual-channel second-order pooling module for MERC. This model initially uses a GCN to compensate for missing features by aggregating features corresponding to the same utterance node. Subsequently, it applies dual-channel second-order pooling to sift through and integrate all features. Empirical evaluations of the proposed model against other baseline models on two benchmark datasets, namely the IEMOCAP and MELD datasets, indicated that the proposed model outperformed the other models; this finding underscores the effectiveness of the proposed model in recognizing the emotions expressed in multimodal dialogue data. |
| format | Article |
| id | doaj-art-44aaee35f99247269e6ce24166c64221 |
| institution | Kabale University |
| issn | 1319-1578; 2213-1248 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Springer |
| record_format | Article |
| series | Journal of King Saud University: Computer and Information Sciences |
| spelling | doaj-art-44aaee35f99247269e6ce24166c64221; 2025-08-20T03:43:31Z; eng; Springer; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2213-1248; 2025-07-01; 37(5):1-12; 10.1007/s44443-025-00091-6; Graph convolutional network model with a feature compensation module and dual-channel second-order pooling module for multimodal emotion recognition in conversation; Xiaocong Tan, Zhengze Gong, Mengkun Gan, Weijie Xie, Wenhui Wang (all: Information and Data Centre, Guangzhou First People's Hospital, Guangzhou Medical University); https://doi.org/10.1007/s44443-025-00091-6; Damaged multimodal data; Multimodal emotion recognition in conversation; Graph convolutional networks (GCNs); Dual-channel second-order pooling |
| spellingShingle | Xiaocong Tan; Zhengze Gong; Mengkun Gan; Weijie Xie; Wenhui Wang; Graph convolutional network model with a feature compensation module and dual-channel second-order pooling module for multimodal emotion recognition in conversation; Journal of King Saud University: Computer and Information Sciences; Damaged multimodal data; Multimodal emotion recognition in conversation; Graph convolutional networks (GCNs); Dual-channel second-order pooling |
| title | Graph convolutional network model with a feature compensation module and dual-channel second-order pooling module for multimodal emotion recognition in conversation |
| title_full | Graph convolutional network model with a feature compensation module and dual-channel second-order pooling module for multimodal emotion recognition in conversation |
| title_fullStr | Graph convolutional network model with a feature compensation module and dual-channel second-order pooling module for multimodal emotion recognition in conversation |
| title_full_unstemmed | Graph convolutional network model with a feature compensation module and dual-channel second-order pooling module for multimodal emotion recognition in conversation |
| title_short | Graph convolutional network model with a feature compensation module and dual-channel second-order pooling module for multimodal emotion recognition in conversation |
| title_sort | graph convolutional network model with a feature compensation module and dual channel second order pooling module for multimodal emotion recognition in conversation |
| topic | Damaged multimodal data; Multimodal emotion recognition in conversation; Graph convolutional networks (GCNs); Dual-channel second-order pooling |
| url | https://doi.org/10.1007/s44443-025-00091-6 |
| work_keys_str_mv | AT xiaocongtan graphconvolutionalnetworkmodelwithafeaturecompensationmoduleanddualchannelsecondorderpoolingmoduleformultimodalemotionrecognitioninconversation AT zhengzegong graphconvolutionalnetworkmodelwithafeaturecompensationmoduleanddualchannelsecondorderpoolingmoduleformultimodalemotionrecognitioninconversation AT mengkungan graphconvolutionalnetworkmodelwithafeaturecompensationmoduleanddualchannelsecondorderpoolingmoduleformultimodalemotionrecognitioninconversation AT weijiexie graphconvolutionalnetworkmodelwithafeaturecompensationmoduleanddualchannelsecondorderpoolingmoduleformultimodalemotionrecognitioninconversation AT wenhuiwang graphconvolutionalnetworkmodelwithafeaturecompensationmoduleanddualchannelsecondorderpoolingmoduleformultimodalemotionrecognitioninconversation |