A Learning Emotion Recognition Model Based on Feature Fusion of Photoplethysmography and Video Signal

Bibliographic Details
Main Authors: Xiaoliang Zhu, Zili He, Chuanyong Wang, Zhicheng Dai, Liang Zhao
Format: Article
Language: English
Published: MDPI AG, 2024-12-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/14/24/11594
Description
Summary: The ability to recognize learning emotions enables timely detection of students’ difficulties during the learning process, supports teachers in adjusting instructional strategies, and allows for personalized student assistance. Detecting learning emotions from convenient, non-intrusive signals such as photoplethysmography (PPG) and video is highly practical; however, it presents new challenges. First, PPG-based emotion recognition is susceptible to external factors such as movement and lighting conditions, which degrade signal quality and reduce recognition accuracy. Second, video-based emotion recognition algorithms can lose accuracy in spontaneous scenes because of variations, occlusions, and uneven lighting. It is therefore necessary both to improve each of the two recognition methods on its own and to exploit their complementary strengths through multimodal fusion. To address these concerns, our work mainly includes the following: (i) the development of a temporal convolutional network model incorporating channel attention to overcome the challenges of PPG-based emotion recognition; (ii) the introduction of a network model that integrates multi-scale spatiotemporal features to address emotion recognition in videos of spontaneous environments; (iii) an exploration of a dual-mode fusion approach, together with an improved model-level fusion scheme within a parallel connection attention aggregation network. Experimental comparisons demonstrate the efficacy of the proposed methods; in particular, the bimodal fusion substantially improves the accuracy of learning emotion recognition, reaching 95.75%.
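To make the abstract's architectural ideas concrete, the following is a minimal Python (PyTorch) sketch of two of the described components, not the authors' released implementation: a dilated temporal convolution block with squeeze-and-excitation-style channel attention for PPG sequences, and an attention-weighted model-level fusion of PPG and video embeddings. All class names, layer sizes, and the four-class output are assumptions made for illustration.

# Minimal sketch (assumed design, NOT the paper's code): a dilated temporal
# convolution block with channel attention for PPG, plus attention-weighted
# model-level fusion of PPG and video features.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Squeeze-and-excitation: reweight channels using global temporal context.
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (batch, channels, time)
        w = self.fc(x.mean(dim=-1))             # squeeze over the time axis
        return x * w.unsqueeze(-1)              # per-channel excitation

class TCNBlock(nn.Module):
    # Causal dilated convolution with a residual connection.
    def __init__(self, in_ch, out_ch, dilation, k=3):
        super().__init__()
        self.pad = nn.ConstantPad1d(((k - 1) * dilation, 0), 0.0)  # left pad
        self.conv = nn.Conv1d(in_ch, out_ch, k, dilation=dilation)
        self.attn = ChannelAttention(out_ch)
        self.res = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        y = torch.relu(self.conv(self.pad(x)))
        return self.attn(y) + self.res(x)

class AttentionFusion(nn.Module):
    # Model-level fusion: learn per-sample weights for the two modalities.
    def __init__(self, dim, n_classes=4):       # 4 classes is an assumption
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.head = nn.Linear(dim, n_classes)

    def forward(self, f_ppg, f_video):          # each: (batch, dim)
        w = self.gate(torch.cat([f_ppg, f_video], dim=-1))
        fused = w[:, :1] * f_ppg + w[:, 1:] * f_video
        return self.head(fused)

if __name__ == "__main__":
    ppg_net = nn.Sequential(TCNBlock(1, 32, 1), TCNBlock(32, 32, 2),
                            TCNBlock(32, 32, 4))
    f_ppg = ppg_net(torch.randn(8, 1, 256)).mean(dim=-1)   # (8, 32)
    f_video = torch.randn(8, 32)      # stand-in for video-branch features
    print(AttentionFusion(32)(f_ppg, f_video).shape)       # torch.Size([8, 4])

The softmax gate lets the model lean on the video branch when the PPG signal is noisy (and vice versa), which is one plausible reading of the complementarity argument the abstract makes.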
ISSN: 2076-3417