Multimodal emotion recognition method in complex dynamic scenes

Multimodal emotion recognition technology leverages the power of deep learning to address advanced visual and emotional tasks. While generic deep networks can handle simple emotion recognition tasks, their generalization capability in complex and noisy environments, such as multi-scene outdoor settings, remains limited. To overcome these challenges, this paper proposes a novel multimodal emotion recognition framework. First, we develop a robust network architecture based on the T5-small model, designed for dynamic-static fusion in complex scenarios, effectively mitigating the impact of noise. Second, we introduce a dynamic-static cross fusion network (D-SCFN) to enhance the integration and extraction of dynamic and static information, embedding it seamlessly within the T5 framework. Finally, we design and evaluate three distinct multi-task analysis frameworks to explore dependencies among tasks. The experimental results demonstrate that our model significantly outperforms other existing models, showcasing exceptional stability and remarkable adaptability to complex and dynamic scenarios.


Bibliographic Details
Main Authors: Long Liu, Qingquan Luo, Wenbo Zhang, Mengxuan Zhang, Bowen Zhai
Format: Article
Language:English
Published: KeAi Communications Co., Ltd. 2025-05-01
Series:Journal of Information and Intelligence
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S2949715925000046
_version_ 1849433355969363968
author Long Liu
Qingquan Luo
Wenbo Zhang
Mengxuan Zhang
Bowen Zhai
author_facet Long Liu
Qingquan Luo
Wenbo Zhang
Mengxuan Zhang
Bowen Zhai
author_sort Long Liu
collection DOAJ
description Multimodal emotion recognition technology leverages the power of deep learning to address advanced visual and emotional tasks. While generic deep networks can handle simple emotion recognition tasks, their generalization capability in complex and noisy environments, such as multi-scene outdoor settings, remains limited. To overcome these challenges, this paper proposes a novel multimodal emotion recognition framework. First, we develop a robust network architecture based on the T5-small model, designed for dynamic-static fusion in complex scenarios, effectively mitigating the impact of noise. Second, we introduce a dynamic-static cross fusion network (D-SCFN) to enhance the integration and extraction of dynamic and static information, embedding it seamlessly within the T5 framework. Finally, we design and evaluate three distinct multi-task analysis frameworks to explore dependencies among tasks. The experimental results demonstrate that our model significantly outperforms other existing models, showcasing exceptional stability and remarkable adaptability to complex and dynamic scenarios.
format Article
id doaj-art-08b67d6e0c524fcf919dab65d31fee3d
institution Kabale University
issn 2949-7159
language English
publishDate 2025-05-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Journal of Information and Intelligence
spelling doaj-art-08b67d6e0c524fcf919dab65d31fee3d
2025-08-20T03:27:05Z
eng
KeAi Communications Co., Ltd.
Journal of Information and Intelligence
2949-7159
2025-05-01
Volume 3, Issue 3, pp. 257-274
DOI: 10.1016/j.jiixd.2025.02.004
Multimodal emotion recognition method in complex dynamic scenes
Long Liu (School of Electronic Engineering, Xidian University, Xi'an 710071, China; Corresponding author)
Qingquan Luo (School of Electronic Engineering, Xidian University, Xi'an 710071, China)
Wenbo Zhang (School of Electronic Engineering, Xidian University, Xi'an 710071, China)
Mengxuan Zhang (School of Artificial Intelligence, Xidian University, Xi'an 710071, China)
Bowen Zhai (School of Electronic Engineering, Xidian University, Xi'an 710071, China)
Multimodal emotion recognition technology leverages the power of deep learning to address advanced visual and emotional tasks. While generic deep networks can handle simple emotion recognition tasks, their generalization capability in complex and noisy environments, such as multi-scene outdoor settings, remains limited. To overcome these challenges, this paper proposes a novel multimodal emotion recognition framework. First, we develop a robust network architecture based on the T5-small model, designed for dynamic-static fusion in complex scenarios, effectively mitigating the impact of noise. Second, we introduce a dynamic-static cross fusion network (D-SCFN) to enhance the integration and extraction of dynamic and static information, embedding it seamlessly within the T5 framework. Finally, we design and evaluate three distinct multi-task analysis frameworks to explore dependencies among tasks. The experimental results demonstrate that our model significantly outperforms other existing models, showcasing exceptional stability and remarkable adaptability to complex and dynamic scenarios.
http://www.sciencedirect.com/science/article/pii/S2949715925000046
Multimodal sentiment recognition; Attention mechanisms; Contrastive learning; Multitask analysis
spellingShingle Long Liu
Qingquan Luo
Wenbo Zhang
Mengxuan Zhang
Bowen Zhai
Multimodal emotion recognition method in complex dynamic scenes
Journal of Information and Intelligence
Multimodal sentiment recognition
Attention mechanisms
Contrastive learning
Multitask analysis
title Multimodal emotion recognition method in complex dynamic scenes
title_full Multimodal emotion recognition method in complex dynamic scenes
title_fullStr Multimodal emotion recognition method in complex dynamic scenes
title_full_unstemmed Multimodal emotion recognition method in complex dynamic scenes
title_short Multimodal emotion recognition method in complex dynamic scenes
title_sort multimodal emotion recognition method in complex dynamic scenes
topic Multimodal sentiment recognition
Attention mechanisms
Contrastive learning
Multitask analysis
url http://www.sciencedirect.com/science/article/pii/S2949715925000046
work_keys_str_mv AT longliu multimodalemotionrecognitionmethodincomplexdynamicscenes
AT qingquanluo multimodalemotionrecognitionmethodincomplexdynamicscenes
AT wenbozhang multimodalemotionrecognitionmethodincomplexdynamicscenes
AT mengxuanzhang multimodalemotionrecognitionmethodincomplexdynamicscenes
AT bowenzhai multimodalemotionrecognitionmethodincomplexdynamicscenes