Analysis of the fusion of multimodal sentiment perception and physiological signals in Chinese-English cross-cultural communication: Transformer approach incorporating self-attention enhancement

With the acceleration of globalization, cross-cultural communication has become a crucial issue in various fields. Emotion, as an essential component of communication, plays a key role in improving understanding and interaction efficiency across different cultures. However, accurately recognizing em...

Bibliographic Details
Main Authors:	Xin Bi, Tian Zhang
Format:	Article
Language:	English
Published:	PeerJ Inc. 2025-05-01
Series:	PeerJ Computer Science
Subjects:	MFCC; Cross cultural communication; Transformer; Information fusion
Online Access:	https://peerj.com/articles/cs-2890.pdf

_version_	1850140157309616128
author	Xin Bi; Tian Zhang
author_facet	Xin Bi; Tian Zhang
author_sort	Xin Bi
collection	DOAJ
description	With the acceleration of globalization, cross-cultural communication has become a crucial issue in various fields. Emotion, as an essential component of communication, plays a key role in improving understanding and interaction efficiency across different cultures. However, accurately recognizing emotions across cultural backgrounds remains a major challenge in affective computing, particularly due to limitations in multimodal feature fusion and temporal dependency modeling in traditional approaches. To address this, we propose the TAF-ATRM framework, which integrates Transformer and multi-head attention mechanisms for cross-cultural emotion recognition. Specifically, the framework employs bidirectional encoder representations from transformers (BERT) for semantic feature extraction from text, Mel-frequency Cepstral Coefficients (MFCC) and Residual Neural Network (ResNet) for capturing critical features from speech and facial expressions, respectively, thereby enhancing multimodal emotion recognition capability. To improve the fusion of multimodal data, the Transformer is utilized for temporal feature modeling, while multi-head attention reinforces feature representation by capturing complex inter-modal dependencies. The framework is evaluated on the MOSI and MOSEI datasets, where experimental results demonstrate that TAF-ATRM outperforms traditional methods in emotion classification accuracy and robustness, particularly in cross-cultural emotion recognition tasks. This study provides a strong technical foundation for future advancements in multimodal emotion analysis and cross-cultural affective computing.
format	Article
id	doaj-art-5164fa4cdb02466daea6c350742217f9
institution	OA Journals
issn	2376-5992
language	English
publishDate	2025-05-01
publisher	PeerJ Inc.
record_format	Article
series	PeerJ Computer Science
spelling	doaj-art-5164fa4cdb02466daea6c350742217f9 | 2025-08-20T02:29:55Z | eng | PeerJ Inc. | PeerJ Computer Science | 2376-5992 | 2025-05-01 | 11 | e2890 | 10.7717/peerj-cs.2890 | Analysis of the fusion of multimodal sentiment perception and physiological signals in Chinese-English cross-cultural communication: Transformer approach incorporating self-attention enhancement | Xin Bi (0) | Tian Zhang (1) | School of Literature, Heilongjiang University, Harbin, Heilongjiang, China | Department of Languages and Literary Studies, Lafayette College, Easton, PA, United States | With the acceleration of globalization, cross-cultural communication has become a crucial issue in various fields. Emotion, as an essential component of communication, plays a key role in improving understanding and interaction efficiency across different cultures. However, accurately recognizing emotions across cultural backgrounds remains a major challenge in affective computing, particularly due to limitations in multimodal feature fusion and temporal dependency modeling in traditional approaches. To address this, we propose the TAF-ATRM framework, which integrates Transformer and multi-head attention mechanisms for cross-cultural emotion recognition. Specifically, the framework employs bidirectional encoder representations from transformers (BERT) for semantic feature extraction from text, Mel-frequency Cepstral Coefficients (MFCC) and Residual Neural Network (ResNet) for capturing critical features from speech and facial expressions, respectively, thereby enhancing multimodal emotion recognition capability. To improve the fusion of multimodal data, the Transformer is utilized for temporal feature modeling, while multi-head attention reinforces feature representation by capturing complex inter-modal dependencies. The framework is evaluated on the MOSI and MOSEI datasets, where experimental results demonstrate that TAF-ATRM outperforms traditional methods in emotion classification accuracy and robustness, particularly in cross-cultural emotion recognition tasks. This study provides a strong technical foundation for future advancements in multimodal emotion analysis and cross-cultural affective computing. | https://peerj.com/articles/cs-2890.pdf | MFCC | Cross cultural communication | Transformer | Information fusion
title	Analysis of the fusion of multimodal sentiment perception and physiological signals in Chinese-English cross-cultural communication: Transformer approach incorporating self-attention enhancement
topic	MFCC; Cross cultural communication; Transformer; Information fusion
url	https://peerj.com/articles/cs-2890.pdf
work_keys_str_mv	AT xinbi analysisofthefusionofmultimodalsentimentperceptionandphysiologicalsignalsinchineseenglishcrossculturalcommunicationtransformerapproachincorporatingselfattentionenhancement AT tianzhang analysisofthefusionofmultimodalsentimentperceptionandphysiologicalsignalsinchineseenglishcrossculturalcommunicationtransformerapproachincorporatingselfattentionenhancement
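
The description in this record says that multi-head attention is used to capture inter-modal dependencies between text, speech, and facial features. The sketch below illustrates that general idea only; it is not the authors' TAF-ATRM implementation. It assumes each modality has already been reduced to a feature vector of the same size, uses identity projections in place of learned weights, and all function names and feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([sum(wi * v[d] for wi, v in zip(w, values))
                        for d in range(len(values[0]))])
    return outputs, weights


def multi_head_fuse(modalities, num_heads):
    """Split each modality vector into heads, attend across modalities
    within each head, then concatenate the head outputs.
    Assumption: identity projections stand in for learned Q/K/V weights."""
    dim = len(modalities[0])
    if dim % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    head_dim = dim // num_heads
    fused = [[] for _ in modalities]
    for h in range(num_heads):
        chunk = [m[h * head_dim:(h + 1) * head_dim] for m in modalities]
        out, _ = attend(chunk, chunk, chunk)
        for row, o in zip(fused, out):
            row.extend(o)
    return fused


# Hypothetical 4-dimensional features standing in for BERT (text),
# MFCC (speech), and ResNet (face) outputs.
text_feat = [0.2, 0.1, 0.4, 0.3]
audio_feat = [0.5, 0.0, 0.1, 0.2]
face_feat = [0.1, 0.3, 0.3, 0.1]
fused = multi_head_fuse([text_feat, audio_feat, face_feat], num_heads=2)
```

Each row of `fused` is one modality's vector after it has been mixed with the other two, so a downstream classifier would see features that already reflect cross-modal context. A real system would add learned projections and the temporal Transformer encoder that the description mentions.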

Similar Items