Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-06-01 |
| Series: | Egyptian Informatics Journal |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S1110866525000908 |
| Summary: | Virtual reality technology has been widely applied across many fields of society, and emotion recognition for its content has received much attention. Recognizing emotions in virtual reality content can be used to regulate emotional states in accordance with the emotional content, to treat mental illness, and to assess psychological cognition. Nevertheless, current research on emotion induction and recognition in virtual reality scenes lacks scientific, quantitative methods for establishing the mapping between virtual reality scenes and emotion labels. Furthermore, the associated methods lack clarity regarding image feature extraction, which reduces the accuracy of emotion recognition in virtual reality content. To address this, the study combines convolutional neural networks and long short-term memory networks, introducing an attention mechanism and multi-modal feature fusion to improve the speed of feature extraction and convergence. Finally, an emotion recognition model for panoramic audio and video virtual reality based on the improved algorithm is proposed. The average accuracy of the proposed algorithm, the XLNet-BiGRU-Attention algorithm, and the CNN-BiLSTM algorithm was 98.87%, 90.25%, and 86.21%, respectively; the average precision was 98.97%, 97.24%, and 97.69%, respectively. The proposed algorithm was significantly superior to the comparison algorithms. A performance comparison was also conducted among panoramic audio and video virtual reality emotion recognition models: the improved algorithm's mean square error is 0.17 and its mean absolute error is 0.19, clearly better than the other comparison models. In the analysis of visual classification results, the proposed model has the best classification aggregation effect and is significantly superior to the other models. Therefore, the improved algorithm and the panoramic audio and video virtual reality emotion recognition model based on it have good effectiveness and practical value. |
|---|---|
| ISSN: | 1110-8665 |
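The summary describes attention-weighted fusion of audio and video features ahead of classification. The article's exact architecture is not reproduced in this record, so the following is only a minimal sketch of one common form of attention-based multi-modal feature fusion: each modality's features are projected into a shared space, a softmax over learned scores yields one attention weight per modality, and the fused vector is the weighted sum. All names, dimensions, and weights here are illustrative placeholders, not taken from the article.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_modalities(features, proj_weights, attn_vectors):
    """Attention-weighted multi-modal feature fusion (illustrative sketch).

    features:     dict modality -> raw feature vector (per-modality length)
    proj_weights: dict modality -> matrix projecting into a shared d-dim space
    attn_vectors: dict modality -> scoring vector in the shared space
    Returns the fused d-dim vector and the per-modality attention weights.
    """
    names = sorted(features)
    projected = [proj_weights[n] @ features[n] for n in names]
    scores = np.array([attn_vectors[n] @ p for n, p in zip(names, projected)])
    weights = softmax(scores)  # one attention weight per modality, sums to 1
    fused = sum(w * p for w, p in zip(weights, projected))
    return fused, dict(zip(names, weights))

# Toy inputs standing in for audio/video features extracted upstream
rng = np.random.default_rng(0)
d = 8  # shared fusion dimension (illustrative)
feats = {"audio": rng.normal(size=16), "video": rng.normal(size=32)}
W = {"audio": rng.normal(size=(d, 16)), "video": rng.normal(size=(d, 32))}
a = {"audio": rng.normal(size=d), "video": rng.normal(size=d)}

fused, attn = fuse_modalities(feats, W, a)
print(fused.shape, {k: round(v, 3) for k, v in attn.items()})
```

In a trained model the projection matrices and scoring vectors would be learned jointly with the downstream classifier; here they are random only so the sketch runs end to end.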