A Gaze Estimation Method Based on Spatial and Channel Reconstructed ResNet Combined with Multi-Clue Fusion


Bibliographic Details
Main Authors: Zhaoyu Shou, Yanjun Lin, Jianwen Mo, Ziyong Wu
Format: Article
Language: English
Published: MDPI AG, 2025-03-01
Series: Journal of Imaging
Subjects: gaze estimation; SCConv; ResNet
Online Access: https://www.mdpi.com/2313-433X/11/4/99
collection DOAJ
description The complexity of the many factors influencing online learning makes learning concentration difficult to characterize, and accurately estimating students’ gaze points while they watch instructional videos is a critical scientific challenge in assessing and enhancing the attentiveness of online learners. However, current appearance-based gaze estimation models do not focus on extracting essential features and fail to effectively model the spatio-temporal relationships among the head, face, and eye regions, which limits their ability to achieve low angular errors. This paper proposes an appearance-based gaze estimation model, RSP-MCGaze. The model constructs a feature-extraction backbone for gaze estimation, ResNetSC, by integrating ResNet with SCConv; this integration strengthens the extraction of important features while reducing spatial and channel redundancy. Building on the ResNetSC backbone, video gaze estimation is further optimized by jointly localizing the head, eyes, and face. Experimental results show that the model significantly outperforms existing baselines on public datasets, confirming the superiority of the method for the gaze estimation task. The model achieves an angular error of 9.86° on the Gaze360 dataset and 7.11° on the detectable-face subset of Gaze360.
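The abstract gives no implementation details, but the spatial-reconstruction idea behind SCConv (weight channels by their group-normalization scale factors, split the feature map into informative and redundant streams, then cross-reconstruct the two streams) can be loosely sketched in NumPy. The function name `sru_sketch` and the exact gating scheme below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def sru_sketch(x, gamma):
    """Loose sketch of an SCConv-style Spatial Reconstruction Unit.

    x     : feature map, shape (C, H, W), C assumed even
    gamma : per-channel scale factors from group normalization, shape (C,)
    """
    c = x.shape[0]
    w = gamma / gamma.sum()                       # normalized channel importance
    gate = 1.0 / (1.0 + np.exp(-(w - w.mean())))  # soft sigmoid gate around the mean
    w1 = np.where(w >= w.mean(), gate, 0.0)       # weights for informative channels
    w2 = np.where(w < w.mean(), gate, 0.0)        # weights for redundant channels
    x1 = w1[:, None, None] * x                    # informative stream
    x2 = w2[:, None, None] * x                    # redundant stream
    # cross-reconstruction: swap channel halves so the streams exchange information
    h = c // 2
    top = x1[:h] + x2[h:]
    bot = x1[h:] + x2[:h]
    return np.concatenate([top, bot], axis=0)     # same shape as the input
```

In the paper, units like this are presumably interleaved with ResNet residual blocks to form the ResNetSC backbone; here the sketch only shows the channel-weighting and cross-reconstruction step in isolation.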
format Article
id doaj-art-0744363965af4f3bbaa203db07474196
institution OA Journals
issn 2313-433X
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Journal of Imaging
spelling doaj-art-0744363965af4f3bbaa203db07474196 (indexed 2025-08-20T02:28:15Z)
English. MDPI AG. Journal of Imaging, ISSN 2313-433X. 2025-03-01, vol. 11, iss. 4, art. 99. DOI: 10.3390/jimaging11040099
A Gaze Estimation Method Based on Spatial and Channel Reconstructed ResNet Combined with Multi-Clue Fusion
Zhaoyu Shou (School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China)
Yanjun Lin (School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China)
Jianwen Mo (School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China)
Ziyong Wu (Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China)
https://www.mdpi.com/2313-433X/11/4/99
Keywords: gaze estimation; SCConv; ResNet
title A Gaze Estimation Method Based on Spatial and Channel Reconstructed ResNet Combined with Multi-Clue Fusion
topic gaze estimation
SCConv
ResNet
url https://www.mdpi.com/2313-433X/11/4/99