Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network

Recently, with the widespread application of deep learning networks, appearance-based gaze estimation has made breakthrough progress. However, most methods focus on feature extraction from the facial region while neglecting the critical role of the eye region in gaze estimation, leading to insufficient representation of eye detail. To address this issue, this paper proposes an appearance-based multi-stream multi-input network architecture (MSMI-Net). The model consists of two independent streams: one extracts high-dimensional eye features, while the other extracts low-dimensional features that integrate both eye and facial information. A parallel channel and spatial attention mechanism fuses the low-dimensional eye and facial features, and an adaptive weight adjustment mechanism (AWAM) dynamically determines the contribution ratio of the eye and facial features. The high-dimensional features are then concatenated with the fused low-dimensional features and passed through fully connected layers to predict the final gaze direction. Extensive experiments on the EYEDIAP, MPIIFaceGaze, and Gaze360 datasets validate the superiority of the proposed method.
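The adaptive eye/face weighting mentioned in this description can be pictured as a small learned gate. A minimal PyTorch sketch follows; since the record does not include implementation details, the module name, layer sizes, and softmax gating design are illustrative assumptions, not the authors' published AWAM.

```python
import torch
import torch.nn as nn

class AdaptiveWeightGate(nn.Module):
    """Hypothetical AWAM-style fusion gate: learns per-sample weights
    that balance eye features against facial features."""

    def __init__(self, dim: int):
        super().__init__()
        # Small gating head mapping the concatenated features to two weights.
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(dim // 4, 2),
        )

    def forward(self, eye_feat: torch.Tensor, face_feat: torch.Tensor) -> torch.Tensor:
        # Softmax makes the two weights sum to 1, so the fused feature is a
        # per-sample convex combination of the eye and face features.
        w = torch.softmax(self.gate(torch.cat([eye_feat, face_feat], dim=1)), dim=1)
        return w[:, 0:1] * eye_feat + w[:, 1:2] * face_feat
```

For example, with 64-dimensional inputs, `AdaptiveWeightGate(64)(eye_vec, face_vec)` returns a `(batch, 64)` tensor whose eye/face balance varies per sample.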

Bibliographic Details
Main Authors: Changli Li, Enrui Tong, Kao Zhang, Nenglun Cheng, Zhongyuan Lai, Zhigeng Pan
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Subjects: gaze estimation; multi-stream network; adaptive feature fusion mechanism; deep learning
Online Access: https://www.mdpi.com/2076-3417/15/7/3684
_version_ 1849738774546743296
author Changli Li
Enrui Tong
Kao Zhang
Nenglun Cheng
Zhongyuan Lai
Zhigeng Pan
author_facet Changli Li
Enrui Tong
Kao Zhang
Nenglun Cheng
Zhongyuan Lai
Zhigeng Pan
author_sort Changli Li
collection DOAJ
description Recently, with the widespread application of deep learning networks, appearance-based gaze estimation has made breakthrough progress. However, most methods focus on feature extraction from the facial region while neglecting the critical role of the eye region in gaze estimation, leading to insufficient representation of eye detail. To address this issue, this paper proposes an appearance-based multi-stream multi-input network architecture (MSMI-Net). The model consists of two independent streams: one extracts high-dimensional eye features, while the other extracts low-dimensional features that integrate both eye and facial information. A parallel channel and spatial attention mechanism fuses the low-dimensional eye and facial features, and an adaptive weight adjustment mechanism (AWAM) dynamically determines the contribution ratio of the eye and facial features. The high-dimensional features are then concatenated with the fused low-dimensional features and passed through fully connected layers to predict the final gaze direction. Extensive experiments on the EYEDIAP, MPIIFaceGaze, and Gaze360 datasets validate the superiority of the proposed method.
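To make the described pipeline concrete, here is a minimal self-contained PyTorch sketch of a two-stream architecture in the spirit of MSMI-Net. The backbones, feature sizes, the parallel (summed) combination of channel and spatial attention, and the gating head are all assumptions for illustration; the authors' actual MSMI-Net is specified in the linked paper.

```python
import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    """Channel and spatial attention computed in parallel (assumed design:
    the two attended maps are summed rather than applied sequentially)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel = nn.Sequential(           # squeeze-and-excite-style branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(           # 7x7 conv spatial mask
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.channel(x) + x * self.spatial(x)


def small_cnn(out_channels: int) -> nn.Sequential:
    """Tiny stand-in backbone; the real streams would use deeper CNNs."""
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(16, out_channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    )


class MSMINetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Stream 1: high-dimensional eye features.
        self.eye_stream = nn.Sequential(
            small_cnn(64), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 256)
        )
        # Stream 2: low-dimensional eye and face features, refined by attention.
        self.low_eye, self.low_face = small_cnn(32), small_cnn(32)
        self.attn = ParallelAttention(32)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # AWAM-style gate: two weights that sum to 1 per sample.
        self.gate = nn.Sequential(nn.Linear(64, 16), nn.ReLU(inplace=True), nn.Linear(16, 2))
        # Regression head on the concatenated features -> (pitch, yaw).
        self.head = nn.Sequential(
            nn.Linear(256 + 32, 128), nn.ReLU(inplace=True), nn.Linear(128, 2)
        )

    def forward(self, eye_img: torch.Tensor, face_img: torch.Tensor) -> torch.Tensor:
        hi = self.eye_stream(eye_img)                       # (B, 256)
        e = self.pool(self.attn(self.low_eye(eye_img)))     # (B, 32)
        f = self.pool(self.attn(self.low_face(face_img)))   # (B, 32)
        w = torch.softmax(self.gate(torch.cat([e, f], 1)), dim=1)
        fused = w[:, 0:1] * e + w[:, 1:2] * f               # adaptive fusion
        return self.head(torch.cat([hi, fused], dim=1))     # gaze direction


# Example: batched 96x96 eye crop and 224x224 face crop (sizes assumed).
gaze = MSMINetSketch()(torch.randn(4, 3, 96, 96), torch.randn(4, 3, 224, 224))
print(gaze.shape)  # torch.Size([4, 2])
```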
format Article
id doaj-art-7ea5068552214e31af889b282e9bcaa0
institution DOAJ
issn 2076-3417
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-7ea5068552214e31af889b282e9bcaa0
2025-08-20T03:06:28Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-03-01
Volume 15, Issue 7, Article 3684
10.3390/app15073684
Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network
Changli Li (School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China)
Enrui Tong (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China)
Kao Zhang (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China)
Nenglun Cheng (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China)
Zhongyuan Lai (State Key Laboratory of Precision Blasting, Jianghan University, Wuhan 430056, China)
Zhigeng Pan (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China)
Recently, with the widespread application of deep learning networks, appearance-based gaze estimation has made breakthrough progress. However, most methods focus on feature extraction from the facial region while neglecting the critical role of the eye region in gaze estimation, leading to insufficient representation of eye detail. To address this issue, this paper proposes an appearance-based multi-stream multi-input network architecture (MSMI-Net). The model consists of two independent streams: one extracts high-dimensional eye features, while the other extracts low-dimensional features that integrate both eye and facial information. A parallel channel and spatial attention mechanism fuses the low-dimensional eye and facial features, and an adaptive weight adjustment mechanism (AWAM) dynamically determines the contribution ratio of the eye and facial features. The high-dimensional features are then concatenated with the fused low-dimensional features and passed through fully connected layers to predict the final gaze direction. Extensive experiments on the EYEDIAP, MPIIFaceGaze, and Gaze360 datasets validate the superiority of the proposed method.
https://www.mdpi.com/2076-3417/15/7/3684
gaze estimation
multi-stream network
adaptive feature fusion mechanism
deep learning
spellingShingle Changli Li
Enrui Tong
Kao Zhang
Nenglun Cheng
Zhongyuan Lai
Zhigeng Pan
Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network
Applied Sciences
gaze estimation
multi-stream network
adaptive feature fusion mechanism
deep learning
title Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network
title_full Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network
title_fullStr Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network
title_full_unstemmed Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network
title_short Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network
title_sort gaze estimation based on a multi stream adaptive feature fusion network
topic gaze estimation
multi-stream network
adaptive feature fusion mechanism
deep learning
url https://www.mdpi.com/2076-3417/15/7/3684
work_keys_str_mv AT changlili gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork
AT enruitong gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork
AT kaozhang gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork
AT nengluncheng gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork
AT zhongyuanlai gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork
AT zhigengpan gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork