Detecting Driver Drowsiness Using Hybrid Facial Features and Ensemble Learning

Drowsiness while driving poses a significant risk in terms of road safety, making effective drowsiness detection systems essential for the prevention of accidents. Facial signal-based detection methods have proven to be an effective approach to drowsiness detection. However, they bring challenges ar...

Full description

Saved in:
Bibliographic Details
Main Authors: Changbiao Xu, Wenhao Huang, Jiao Liu, Lang Li
Format: Article
Language:English
Published: MDPI AG 2025-04-01
Series:Information
Subjects:
Online Access:https://www.mdpi.com/2078-2489/16/4/294
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Drowsiness while driving poses a significant risk in terms of road safety, making effective drowsiness detection systems essential for the prevention of accidents. Facial signal-based detection methods have proven to be an effective approach to drowsiness detection. However, they bring challenges arising from inter-individual differences among drivers. Variations in facial structure necessitate personalized feature extraction thresholds, yet existing methods apply a uniform threshold, leading to inaccurate feature extraction. Furthermore, many current methods focus on only one or two facial regions, overlooking the possibility that drowsiness may manifest differently across different facial areas among different drivers. To address these issues, we propose a drowsiness detection method that combines an ensemble model with hybrid facial features. This approach enables the accurate extraction of features from four key facial regions—the eye region, mouth contour, head pose, and gaze direction—through adaptive threshold correction to ensure comprehensive coverage. An ensemble model, combining Random Forest, XGBoost, and Multilayer Perceptron with a soft voting criterion, is then employed to classify the drivers’ drowsiness state. Additionally, we use the SHAP method to ensure model explainability and analyze the correlations between features from various facial regions. Trained and tested on the UTA-RLDD dataset, our method achieves a video accuracy (VA) of 86.52%, outperforming similar techniques introduced in recent years. The interpretability analysis demonstrates the value of our approach, offering a valuable reference for future research and contributing significantly to road safety.
ISSN:2078-2489