Multi-modal emotion recognition in conversation based on prompt learning with text-audio fusion features
Abstract With the widespread adoption of interactive machine applications, Emotion Recognition in Conversations (ERC) technology has garnered increasing attention. Although existing methods have improved recognition accuracy by integrating structured data, language barriers and the scarcity of non-English resources limit their cross-lingual applications. In light of this, the MERC-PLTAF method proposed in this paper innovatively focuses on multimodal emotion recognition in conversations, aiming to overcome the limitations of single modality and language barriers through refined feature extraction and a sophisticated cross-fusion strategy. We conducted extensive validation on multiple English and Chinese datasets, and the experimental results demonstrate that this method not only significantly improves emotion recognition accuracy but also exhibits exceptional performance on the Chinese M3ED dataset, paving a new path for cross-lingual emotion recognition. This research not only advances the boundaries of emotion recognition technology but also lays a solid theoretical foundation and practical framework for creating more intelligent and human-centric interactive experiences.
| Main Authors: | Yuezhou Wu, Siling Zhang, Pengfei Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-89758-8 |
| _version_ | 1850039571133235200 |
|---|---|
| author | Yuezhou Wu; Siling Zhang; Pengfei Li |
| author_sort | Yuezhou Wu |
| collection | DOAJ |
| format | Article |
| id | doaj-art-877fc20f20df4acb91acdaaa6b791773 |
| institution | DOAJ |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-877fc20f20df4acb91acdaaa6b791773; 2025-08-20T02:56:19Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; published 2025-03-01; vol. 15, iss. 1, pp. 1–15; doi:10.1038/s41598-025-89758-8; Yuezhou Wu, Siling Zhang, Pengfei Li (all: School of Computer Science, Civil Aviation Flight University of China); https://doi.org/10.1038/s41598-025-89758-8 |
| title | Multi-modal emotion recognition in conversation based on prompt learning with text-audio fusion features |
| url | https://doi.org/10.1038/s41598-025-89758-8 |
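The abstract describes MERC-PLTAF's core idea, fusing text and audio features through a cross-fusion strategy ahead of a prompt-learning classifier, only at a high level. As a rough illustration of what such a text-audio cross-fusion step can look like, here is a minimal PyTorch sketch. Everything in it is an assumption for illustration (the module name `TextAudioCrossFusion`, the feature dimensions, the bidirectional cross-attention, the mean pooling, and the plain linear head standing in for the paper's prompt-learning stage); it is not the authors' implementation.

```python
# Hypothetical sketch only: dimensions, names, and layer choices are
# assumptions, not the MERC-PLTAF architecture from the paper.
import torch
import torch.nn as nn


class TextAudioCrossFusion(nn.Module):
    """Fuse utterance-level text and audio features with bidirectional
    cross-attention, then pool and project to a shared representation."""

    def __init__(self, text_dim=768, audio_dim=512, hidden_dim=256, num_heads=4):
        super().__init__()
        # Project both modalities into a common hidden size.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        # Text queries audio and audio queries text, so each modality can
        # pull in complementary cues from the other.
        self.text_to_audio = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.audio_to_text = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)
        # A plain linear head stands in for the prompt-learning stage; in a
        # prompt-based setup the fused vector would instead condition a
        # pretrained language model whose masked-token logits are mapped
        # to emotion label words.
        self.classifier = nn.Linear(hidden_dim, 7)  # 7 emotion classes, assumed

    def forward(self, text_feats, audio_feats):
        # text_feats: (batch, text_len, text_dim); audio_feats: (batch, audio_len, audio_dim)
        t = self.text_proj(text_feats)
        a = self.audio_proj(audio_feats)
        t_enriched, _ = self.text_to_audio(query=t, key=a, value=a)
        a_enriched, _ = self.audio_to_text(query=a, key=t, value=t)
        # Mean-pool over time and concatenate the enriched streams.
        fused = torch.cat([t_enriched.mean(dim=1), a_enriched.mean(dim=1)], dim=-1)
        return self.classifier(self.fuse(fused))  # (batch, num_classes)


if __name__ == "__main__":
    model = TextAudioCrossFusion()
    text = torch.randn(2, 20, 768)   # e.g. token embeddings from a text encoder
    audio = torch.randn(2, 50, 512)  # e.g. frame features from an audio encoder
    print(model(text, audio).shape)  # torch.Size([2, 7])
```

Bidirectional cross-attention is one common fusion choice because it lets each modality query the other for complementary emotional cues (for example, prosody disambiguating a neutral transcript); the paper's actual fusion design may differ.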