Analysis of the fusion of multimodal sentiment perception and physiological signals in Chinese-English cross-cultural communication: Transformer approach incorporating self-attention enhancement


Bibliographic Details
Main Authors: Xin Bi, Tian Zhang
Format: Article
Language: English
Published: PeerJ Inc. 2025-05-01
Series: PeerJ Computer Science
Subjects: MFCC; Cross cultural communication; Transformer; Information fusion
Online Access: https://peerj.com/articles/cs-2890.pdf
collection DOAJ
description With the acceleration of globalization, cross-cultural communication has become a crucial issue in many fields. Emotion, an essential component of communication, plays a key role in improving understanding and interaction efficiency across cultures. However, accurately recognizing emotions across cultural backgrounds remains a major challenge in affective computing, particularly because traditional approaches are limited in multimodal feature fusion and temporal dependency modeling. To address this, we propose the TAF-ATRM framework, which integrates Transformer and multi-head attention mechanisms for cross-cultural emotion recognition. Specifically, the framework uses Bidirectional Encoder Representations from Transformers (BERT) to extract semantic features from text, and Mel-frequency cepstral coefficients (MFCC) and a Residual Neural Network (ResNet) to capture critical features from speech and facial expressions, respectively. To improve multimodal fusion, the Transformer models temporal features, while multi-head attention reinforces the feature representation by capturing complex inter-modal dependencies. The framework is evaluated on the MOSI and MOSEI datasets, where experimental results show that TAF-ATRM outperforms traditional methods in emotion classification accuracy and robustness, particularly in cross-cultural emotion recognition tasks. This study provides a strong technical foundation for future work in multimodal emotion analysis and cross-cultural affective computing.
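The multi-head attention fusion named in the description can be illustrated with a small sketch. This is not the authors' TAF-ATRM implementation: it assumes one feature vector per modality, uses identity query/key/value projections instead of learned weights, and the feature values and function names (`attention_head`, `multi_head_fusion`) are hypothetical placeholders chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections
    (an assumption; a trained model would use learned projections)."""
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all modality tokens.
        outputs.append([sum(w * value[i] for w, value in zip(row, tokens))
                        for i in range(dim)])
    return outputs, weights


def multi_head_fusion(tokens, num_heads):
    """Split each token into num_heads slices, attend per slice, concatenate."""
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    head_dim = dim // num_heads
    fused = [[] for _ in tokens]
    all_weights = []
    for h in range(num_heads):
        start = h * head_dim
        head_tokens = [t[start:start + head_dim] for t in tokens]
        head_out, head_weights = attention_head(head_tokens)
        all_weights.append(head_weights)
        for row, out in zip(fused, head_out):
            row.extend(out)
    return fused, all_weights


# Placeholder 4-dimensional features (hypothetical values, not real embeddings).
modality_tokens = [
    [0.9, 0.1, 0.0, 0.3],  # stands in for a BERT text feature
    [0.8, 0.2, 0.1, 0.4],  # stands in for an MFCC-derived speech feature
    [0.1, 0.9, 0.7, 0.0],  # stands in for a ResNet facial feature
]
fused_tokens, attention_weights = multi_head_fusion(modality_tokens, num_heads=2)
```

In this sketch, `attention_weights[h][i][j]` is the weight that modality `i` places on modality `j` in head `h`, which is one way to inspect the inter-modal dependencies the description refers to.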
format Article
id doaj-art-5164fa4cdb02466daea6c350742217f9
institution OA Journals
issn 2376-5992
language English
publishDate 2025-05-01
publisher PeerJ Inc.
record_format Article
series PeerJ Computer Science
spelling doaj-art-5164fa4cdb02466daea6c350742217f9 (record timestamp 2025-08-20T02:29:55Z)
volume 11
article e2890
doi 10.7717/peerj-cs.2890
affiliation Xin Bi: School of Literature, Heilongjiang University, Harbin, Heilongjiang, China
affiliation Tian Zhang: Department of Languages and Literary Studies, Lafayette College, Easton, PA, United States
topic MFCC
Cross cultural communication
Transformer
Information fusion