Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion
Virtual reality technology has been widely applied across many fields of society, and emotion recognition of its content has received much attention. Recognizing the emotions in virtual reality content can be used to regulate emotional states in accordance with that content, to treat mental illness, and to assess psychological cognition.
| Main Authors: | Siqi Guo, Mian Wu, Chunhui Zhang, Ling Zhong |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-06-01 |
| Series: | Egyptian Informatics Journal |
| Subjects: | CNN; LSTM; Attention mechanism; Emotion recognition; Virtual reality |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S1110866525000908 |
| _version_ | 1850213794210381824 |
|---|---|
| author | Siqi Guo; Mian Wu; Chunhui Zhang; Ling Zhong |
| author_facet | Siqi Guo; Mian Wu; Chunhui Zhang; Ling Zhong |
| author_sort | Siqi Guo |
| collection | DOAJ |
| description | Virtual reality technology has been widely applied across many fields of society, and emotion recognition of its content has received much attention. Recognizing the emotions in virtual reality content can be used to regulate emotional states in accordance with that content, to treat mental illness, and to assess psychological cognition. Nevertheless, current research on emotion induction and recognition in virtual reality scenes lacks scientific, quantitative methods for establishing the mapping between virtual reality scenes and emotion labels. Furthermore, the associated methods lack clarity regarding image feature extraction, which reduces the accuracy of emotion recognition in virtual reality content. To address this inaccuracy, this study combines convolutional neural networks (CNN) and long short-term memory (LSTM) networks, and introduces an attention mechanism and multi-modal feature fusion to improve the speed of feature extraction and convergence. Finally, an emotion recognition model for panoramic audio and video virtual reality based on the improved algorithm is proposed. The average accuracy of the proposed algorithm, the XLNet-BiGRU-Attention algorithm, and the CNN-BiLSTM algorithm was 98.87%, 90.25%, and 86.21%, respectively; the average precision was 98.97%, 97.24%, and 97.69%, respectively. The proposed algorithm was significantly superior to the comparison algorithms. The panoramic audio and video virtual reality emotion recognition model based on the improved algorithm was also compared against other models: its mean squared error is 0.17 and its mean absolute error is 0.19, clearly better than those of the comparison models. In the visual analysis of classification results, the proposed model shows the best classification aggregation and is significantly superior to the other models. Therefore, the improved algorithm and the panoramic audio and video virtual reality emotion recognition model based on it are effective and of practical value. |
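The abstract names attention-weighted multi-modal feature fusion as a core ingredient but gives no implementation details. The sketch below illustrates only that general idea, assuming per-modality feature vectors have already been produced by upstream branches (e.g. a CNN for video frames, an LSTM for audio); the function names, dimensions, and raw attention scores here are illustrative, not the paper's actual design.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_fuse(features, scores):
    """Fuse per-modality feature vectors with softmax attention weights.

    features: (n_modalities, dim) array of modality embeddings
    scores:   (n_modalities,) raw attention scores (in a trained model
              these would come from a small learned scoring network)
    Returns the attention-weighted fused feature of shape (dim,).
    """
    w = softmax(scores)          # weights sum to 1 across modalities
    return w @ features          # weighted sum of modality embeddings

# Toy example: hypothetical visual and audio embeddings (dim = 4)
visual = np.array([1.0, 0.0, 2.0, 0.0])
audio = np.array([0.0, 1.0, 0.0, 2.0])
fused = attention_fuse(np.stack([visual, audio]), np.array([2.0, 0.0]))
```

With raw scores (2.0, 0.0) the softmax weights favor the visual branch (about 0.88 vs 0.12), so the fused vector lies closer to the visual embedding; the downstream classifier then operates on this single fused representation.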
| format | Article |
| id | doaj-art-6e802d4eaf7c41d289acf6cb8b789f16 |
| institution | OA Journals |
| issn | 1110-8665 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Egyptian Informatics Journal |
| spelling | doaj-art-6e802d4eaf7c41d289acf6cb8b789f162025-08-20T02:09:04ZengElsevierEgyptian Informatics Journal1110-86652025-06-013010069710.1016/j.eij.2025.100697Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusionSiqi Guo0Mian Wu1Chunhui Zhang2Ling Zhong3Corresponding author.; School of Art, East China Jiaotong University, Nanchang 330013, ChinaSchool of Art, East China Jiaotong University, Nanchang 330013, ChinaSchool of Art, East China Jiaotong University, Nanchang 330013, ChinaSchool of Art, East China Jiaotong University, Nanchang 330013, ChinaVirtual reality technology has been widely applied in various fields of society, and its content emotion recognition has received much attention. The recognition of emotions in virtual reality content can be employed to regulate emotional states in accordance with the emotional content, to treat mental illness and to assess psychological cognition. Nevertheless, the current research on emotion induction and recognition of virtual reality scenes lacks scientific and quantitative methods for establishing the mapping relationship between virtual reality scenes and emotion labels. Furthermore, the associated methods lack clarity regarding image feature extraction, which contributes to the diminished accuracy of emotion recognition in virtual reality content. To solve the current issue of inaccurate emotion recognition in virtual reality content, this study combines convolutional neural networks and long short-term memory. The attention mechanism and multi-modal feature fusion are introduced to improve the speed of feature extraction and convergence. Finally, an improved algorithm-based emotion recognition model for panoramic audio and video virtual reality is proposed. The average accuracy of the proposed algorithm, XLNet-BIGRU-Attention algorithm, and CNN-BiLSTM algorithm was 98.87%, 90.25%, and 86.21%, respectively. 
The average precision was 98.97%, 97.24% and 97.69%, respectively. The proposed algorithm was significantly superior to the comparison algorithms. A performance comparison was conducted between panoramic audio and video virtual reality emotion recognition models based on the improved algorithm. The improved algorithm's mean squared error is 0.17 and its mean absolute error is 0.19, clearly better than those of the comparison models. In the analysis of visual classification results, the proposed model has the best classification aggregation effect and is significantly superior to other models. Therefore, the improved algorithm and the panoramic audio and video virtual reality emotion recognition model based on the improved algorithm have good effectiveness and practical value.http://www.sciencedirect.com/science/article/pii/S1110866525000908CNNLSTMAttention mechanismEmotion recognitionVirtual reality |
| spellingShingle | Siqi Guo Mian Wu Chunhui Zhang Ling Zhong Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion Egyptian Informatics Journal CNN LSTM Attention mechanism Emotion recognition Virtual reality |
| title | Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion |
| title_full | Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion |
| title_fullStr | Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion |
| title_full_unstemmed | Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion |
| title_short | Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion |
| title_sort | emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion |
| topic | CNN LSTM Attention mechanism Emotion recognition Virtual reality |
| url | http://www.sciencedirect.com/science/article/pii/S1110866525000908 |
| work_keys_str_mv | AT siqiguo emotionrecognitioninpanoramicaudioandvideovirtualrealitybasedondeeplearningandfeaturefusion AT mianwu emotionrecognitioninpanoramicaudioandvideovirtualrealitybasedondeeplearningandfeaturefusion AT chunhuizhang emotionrecognitioninpanoramicaudioandvideovirtualrealitybasedondeeplearningandfeaturefusion AT lingzhong emotionrecognitioninpanoramicaudioandvideovirtualrealitybasedondeeplearningandfeaturefusion |