Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network
Recently, with the widespread application of deep learning networks, appearance-based gaze estimation has made breakthrough progress. However, most methods focus on feature extraction from the facial region while neglecting the critical role of the eye region in gaze estimation, leading to insufficient eye detail representation. To address this issue, this paper proposes a multi-stream multi-input network architecture (MSMI-Net) based on appearance. The model consists of two independent streams designed to extract high-dimensional eye features and low-dimensional features, integrating both eye and facial information. A parallel channel and spatial attention mechanism is employed to fuse low-dimensional eye and facial features, while an adaptive weight adjustment mechanism (AWAM) dynamically determines the contribution ratio of eye and facial features. The concatenated high-dimensional and fused low-dimensional features are processed through fully connected layers to predict the final gaze direction. Extensive experiments on the EYEDIAP, MPIIFaceGaze, and Gaze360 datasets validate the superiority of the proposed method.
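The abstract describes an adaptive weight adjustment mechanism (AWAM) that dynamically balances the contributions of eye and facial features before the fused result is concatenated and passed to fully connected layers. The record does not specify the mechanism's formulation, so the following is a minimal illustrative sketch assuming a single sigmoid gate computed from the concatenated features; `adaptive_fuse`, `w_gate`, and `b_gate` are hypothetical names, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fuse(eye_feat, face_feat, w_gate, b_gate=0.0):
    """Illustrative adaptive weighting: a learned scalar gate alpha in (0, 1)
    sets the contribution ratio of eye vs. facial features."""
    combined = np.concatenate([eye_feat, face_feat])
    alpha = sigmoid(float(w_gate @ combined) + b_gate)
    # Convex combination: alpha weights the eye stream, (1 - alpha) the face stream.
    fused = alpha * eye_feat + (1.0 - alpha) * face_feat
    return fused, alpha

rng = np.random.default_rng(0)
eye = rng.standard_normal(8)    # toy low-dimensional eye features
face = rng.standard_normal(8)   # toy low-dimensional facial features
w = rng.standard_normal(16) * 0.1
fused, alpha = adaptive_fuse(eye, face, w)
print(fused.shape, 0.0 < alpha < 1.0)
```

In the actual model the gate would be learned jointly with the attention-based fusion; the sketch only shows how a single scalar weight can trade off the two feature streams.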
| Main Authors: | Changli Li, Enrui Tong, Kao Zhang, Nenglun Cheng, Zhongyuan Lai, Zhigeng Pan |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Applied Sciences |
| Subjects: | gaze estimation; multi-stream network; adaptive feature fusion mechanism; deep learning |
| Online Access: | https://www.mdpi.com/2076-3417/15/7/3684 |
| _version_ | 1849738774546743296 |
|---|---|
| author | Changli Li; Enrui Tong; Kao Zhang; Nenglun Cheng; Zhongyuan Lai; Zhigeng Pan |
| author_facet | Changli Li; Enrui Tong; Kao Zhang; Nenglun Cheng; Zhongyuan Lai; Zhigeng Pan |
| author_sort | Changli Li |
| collection | DOAJ |
| description | Recently, with the widespread application of deep learning networks, appearance-based gaze estimation has made breakthrough progress. However, most methods focus on feature extraction from the facial region while neglecting the critical role of the eye region in gaze estimation, leading to insufficient eye detail representation. To address this issue, this paper proposes a multi-stream multi-input network architecture (MSMI-Net) based on appearance. The model consists of two independent streams designed to extract high-dimensional eye features and low-dimensional features, integrating both eye and facial information. A parallel channel and spatial attention mechanism is employed to fuse low-dimensional eye and facial features, while an adaptive weight adjustment mechanism (AWAM) dynamically determines the contribution ratio of eye and facial features. The concatenated high-dimensional and fused low-dimensional features are processed through fully connected layers to predict the final gaze direction. Extensive experiments on the EYEDIAP, MPIIFaceGaze, and Gaze360 datasets validate the superiority of the proposed method. |
| format | Article |
| id | doaj-art-7ea5068552214e31af889b282e9bcaa0 |
| institution | DOAJ |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-7ea5068552214e31af889b282e9bcaa0; 2025-08-20T03:06:28Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-03-01; vol. 15, no. 7, art. 3684; DOI 10.3390/app15073684; Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network; Changli Li (School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China); Enrui Tong (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China); Kao Zhang (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China); Nenglun Cheng (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China); Zhongyuan Lai (State Key Laboratory of Precision Blasting, Jianghan University, Wuhan 430056, China); Zhigeng Pan (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China); https://www.mdpi.com/2076-3417/15/7/3684; gaze estimation; multi-stream network; adaptive feature fusion mechanism; deep learning |
| spellingShingle | Changli Li; Enrui Tong; Kao Zhang; Nenglun Cheng; Zhongyuan Lai; Zhigeng Pan; Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network; Applied Sciences; gaze estimation; multi-stream network; adaptive feature fusion mechanism; deep learning |
| title | Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network |
| title_full | Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network |
| title_fullStr | Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network |
| title_full_unstemmed | Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network |
| title_short | Gaze Estimation Based on a Multi-Stream Adaptive Feature Fusion Network |
| title_sort | gaze estimation based on a multi stream adaptive feature fusion network |
| topic | gaze estimation; multi-stream network; adaptive feature fusion mechanism; deep learning |
| url | https://www.mdpi.com/2076-3417/15/7/3684 |
| work_keys_str_mv | AT changlili gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork AT enruitong gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork AT kaozhang gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork AT nengluncheng gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork AT zhongyuanlai gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork AT zhigengpan gazeestimationbasedonamultistreamadaptivefeaturefusionnetwork |