Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion

Virtual reality technology has been widely applied in various fields of society, and emotion recognition for its content has received much attention. Recognizing emotions in virtual reality content can be used to regulate emotional states in accordance with the emotional content, to treat mental illness, and to assess psychological cognition.


Bibliographic Details
Main Authors: Siqi Guo, Mian Wu, Chunhui Zhang, Ling Zhong
Format: Article
Language:English
Published: Elsevier 2025-06-01
Series:Egyptian Informatics Journal
Subjects: CNN; LSTM; Attention mechanism; Emotion recognition; Virtual reality
Online Access:http://www.sciencedirect.com/science/article/pii/S1110866525000908
author Siqi Guo
Mian Wu
Chunhui Zhang
Ling Zhong
collection DOAJ
description Virtual reality technology has been widely applied in various fields of society, and emotion recognition for its content has received much attention. Recognizing emotions in virtual reality content can be used to regulate emotional states in accordance with the emotional content, to treat mental illness, and to assess psychological cognition. Nevertheless, current research on emotion induction and recognition in virtual reality scenes lacks scientific, quantitative methods for establishing the mapping between virtual reality scenes and emotion labels. Furthermore, the associated methods lack clarity regarding image feature extraction, which reduces the accuracy of emotion recognition in virtual reality content. To address this inaccuracy, this study combines convolutional neural networks and long short-term memory networks, and introduces an attention mechanism and multi-modal feature fusion to improve the speed of feature extraction and convergence. Finally, an emotion recognition model for panoramic audio and video virtual reality based on the improved algorithm is proposed. The average accuracy of the proposed algorithm, the XLNet-BiGRU-Attention algorithm, and the CNN-BiLSTM algorithm was 98.87%, 90.25%, and 86.21%, respectively; the average precision was 98.97%, 97.24%, and 97.69%, respectively. The proposed algorithm was significantly superior to the comparison algorithms. In a performance comparison of panoramic audio and video virtual reality emotion recognition models, the improved algorithm's mean square error is 0.17 and its mean absolute error is 0.19, clearly better than those of the other models. In the analysis of visual classification results, the proposed model shows the best classification aggregation and is significantly superior to the other models. Therefore, the improved algorithm and the panoramic audio and video virtual reality emotion recognition model based on it are effective and of practical value.
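As a rough illustration of the attention-based multi-modal feature fusion the abstract describes (not the authors' implementation; the feature values, scores, and function names below are hypothetical), per-modality feature vectors can be fused as an attention-weighted sum, sketched here in plain Python:

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features, scores):
    # Fuse per-modality feature vectors as an attention-weighted sum:
    # each modality's vector is scaled by its softmax weight and summed.
    weights = softmax(scores)
    dim = len(features[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, features):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return weights, fused

# Hypothetical inputs: e.g. CNN-derived video features and LSTM-derived
# audio features, with per-modality relevance scores (learned in a real model)
video_feat = [0.8, 0.1, 0.3]
audio_feat = [0.2, 0.9, 0.4]
scores = [1.2, 0.5]
weights, fused = attention_fuse([video_feat, audio_feat], scores)
```

In an actual CNN-LSTM model the scores would come from a learned scoring layer rather than being fixed; the key property is that the softmax weights sum to one, so the fused vector stays on the scale of the inputs.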
format Article
id doaj-art-6e802d4eaf7c41d289acf6cb8b789f16
institution OA Journals
issn 1110-8665
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Egyptian Informatics Journal
doi 10.1016/j.eij.2025.100697
affiliation School of Art, East China Jiaotong University, Nanchang 330013, China (all four authors; Siqi Guo is the corresponding author)
title Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion
topic CNN
LSTM
Attention mechanism
Emotion recognition
Virtual reality