Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion

Virtual reality technology has been widely applied in various fields of society, and emotion recognition for its content has received much attention. Recognizing emotions in virtual reality content can be used to regulate emotional states in accordance with the emotional content, to treat mental illness, and to assess psychological cognition.


Bibliographic Details
Main Authors: Siqi Guo, Mian Wu, Chunhui Zhang, Ling Zhong
Format: Article
Language:English
Published: Elsevier 2025-06-01
Series:Egyptian Informatics Journal
Subjects: CNN; LSTM; Attention mechanism; Emotion recognition; Virtual reality
Online Access:http://www.sciencedirect.com/science/article/pii/S1110866525000908
author Siqi Guo
Mian Wu
Chunhui Zhang
Ling Zhong
collection DOAJ
description Virtual reality technology has been widely applied in various fields of society, and emotion recognition for its content has received much attention. Recognizing emotions in virtual reality content can be used to regulate emotional states in accordance with the emotional content, to treat mental illness, and to assess psychological cognition. Nevertheless, current research on emotion induction and recognition in virtual reality scenes lacks scientific, quantitative methods for establishing the mapping between virtual reality scenes and emotion labels. Furthermore, the associated methods lack clarity regarding image feature extraction, which reduces the accuracy of emotion recognition in virtual reality content. To address this inaccuracy, this study combines convolutional neural networks and long short-term memory networks, and introduces an attention mechanism and multi-modal feature fusion to improve the speed of feature extraction and convergence. Finally, an emotion recognition model for panoramic audio and video virtual reality based on the improved algorithm is proposed. The average accuracy of the proposed algorithm, the XLNet-BiGRU-Attention algorithm, and the CNN-BiLSTM algorithm was 98.87%, 90.25%, and 86.21%, respectively; the average precision was 98.97%, 97.24%, and 97.69%, respectively. The proposed algorithm was significantly superior to the comparison algorithms. In a performance comparison of panoramic audio and video virtual reality emotion recognition models, the improved algorithm's mean square error is 0.17 and its mean absolute error is 0.19, clearly better than those of the other models. In the analysis of visual classification results, the proposed model shows the best classification aggregation and is significantly superior to the other models. Therefore, the improved algorithm and the panoramic audio and video virtual reality emotion recognition model based on it are effective and of practical value.
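As a rough illustration of the attention-based multi-modal feature fusion the abstract describes (not the authors' implementation; the feature values, scores, and function names below are hypothetical), per-modality feature vectors can be fused as an attention-weighted sum, sketched here in plain Python:

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features, scores):
    # Fuse per-modality feature vectors as an attention-weighted sum:
    # each modality's vector is scaled by its softmax weight and summed.
    weights = softmax(scores)
    dim = len(features[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, features):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return weights, fused

# Hypothetical inputs: e.g. CNN-derived video features and LSTM-derived
# audio features, with per-modality relevance scores (learned in a real model)
video_feat = [0.8, 0.1, 0.3]
audio_feat = [0.2, 0.9, 0.4]
scores = [1.2, 0.5]
weights, fused = attention_fuse([video_feat, audio_feat], scores)
```

In an actual CNN-LSTM model the scores would come from a learned scoring layer rather than being fixed; the key property is that the softmax weights sum to one, so the fused vector stays on the scale of the inputs.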
format Article
id doaj-art-6e802d4eaf7c41d289acf6cb8b789f16
institution OA Journals
issn 1110-8665
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Egyptian Informatics Journal
doi 10.1016/j.eij.2025.100697
affiliation School of Art, East China Jiaotong University, Nanchang 330013, China (all four authors; Siqi Guo is the corresponding author)
title Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion
topic CNN
LSTM
Attention mechanism
Emotion recognition
Virtual reality