An Interpretable Machine Learning-Based Hurdle Model for Zero-Inflated Road Crash Frequency Data Analysis: Real-World Assessment and Validation
Road traffic crashes pose significant economic and public health burdens, necessitating an in-depth understanding of crash causation and its links to underlying factors. This study introduces a machine learning-based hurdle model framework tailored for analyzing zero-inflated crash frequency data, a...
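The record does not reproduce the paper's equations. As a rough orientation only, a generic two-part (hurdle) count model can be written as below. The symbols $\pi$, $g$, and $\mu_{+}$ are introduced here for illustration and are not taken from the article: $\pi(x_i)$ is the probability that road segment $i$ has at least one crash, $g(\cdot \mid x_i)$ is a distribution over positive counts (so it sums to 1 over $k \ge 1$), and $\mu_{+}(x_i)$ is its mean.

```latex
\begin{align}
P(Y_i = 0 \mid x_i) &= 1 - \pi(x_i), \\
P(Y_i = k \mid x_i) &= \pi(x_i)\, g(k \mid x_i), \qquad k = 1, 2, \dots, \\
E[Y_i \mid x_i] &= \pi(x_i)\, \mu_{+}(x_i),
\qquad \mu_{+}(x_i) = \sum_{k \ge 1} k\, g(k \mid x_i).
\end{align}
```

Under this reading, a first-stage binary classifier would estimate $\pi(x_i)$ and a second-stage count regressor would estimate $\mu_{+}(x_i)$. How the authors combine the two stages is not stated in this record.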
| Main Authors: | Moataz Bellah Ben Khedher, Dukgeun Yun |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-11-01 |
| Series: | Applied Sciences |
| Subjects: | crash frequency; machine learning; CatBoost; SHAP; accident analysis; road safety |
| Online Access: | https://www.mdpi.com/2076-3417/14/23/10790 |
| author | Moataz Bellah Ben Khedher, Dukgeun Yun |
|---|---|
| collection | DOAJ |
| description | Road traffic crashes pose significant economic and public health burdens, necessitating an in-depth understanding of crash causation and its links to underlying factors. This study introduces a machine learning-based hurdle model framework tailored for analyzing zero-inflated crash frequency data, addressing the limitations of traditional statistical models like the Poisson and negative binomial models, which struggle with zero-inflation and overdispersion. The research employs a two-stage modeling process using CatBoost. The first stage uses binary classification to identify road segments with potential crash occurrences, applying a customized loss function to tackle data imbalance. The second stage predicts crash frequency, also utilizing a customized loss function for count data. SHapley Additive exPlanations (SHAP) analysis interprets the model outcomes, providing insights into factors affecting crash likelihood and frequency. This study validates the model’s performance with real-world crash data from 2011 to 2015 in South Korea, demonstrating superior accuracy in both the classification and regression stages compared to other machine learning algorithms and traditional models. These findings have significant implications for traffic safety research and policymaking, offering stakeholders a more accurate and interpretable tool for crash data analysis to develop targeted safety interventions. |
| id | doaj-art-dd81da29e6de4ac98fb00e2f3c52d7a4 |
| institution | OA Journals |
| issn | 2076-3417 |
| spelling | DOI: 10.3390/app142310790. Applied Sciences, volume 14, issue 23, article 10790 (MDPI AG, 2024-11-01). Affiliation of both authors: Department of Civil and Environmental Engineering, KICT School, University of Science and Technology, Daejeon 34113, Republic of Korea. |
| title | An Interpretable Machine Learning-Based Hurdle Model for Zero-Inflated Road Crash Frequency Data Analysis: Real-World Assessment and Validation |
| topic | crash frequency; machine learning; CatBoost; SHAP; accident analysis; road safety |