An Interpretable Machine Learning-Based Hurdle Model for Zero-Inflated Road Crash Frequency Data Analysis: Real-World Assessment and Validation

Bibliographic Details
Main Authors: Moataz Bellah Ben Khedher, Dukgeun Yun
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Applied Sciences
Subjects: crash frequency; machine learning; CatBoost; SHAP; accident analysis; road safety
Online Access: https://www.mdpi.com/2076-3417/14/23/10790
author Moataz Bellah Ben Khedher
Dukgeun Yun
collection DOAJ
description Road traffic crashes pose significant economic and public health burdens, necessitating an in-depth understanding of crash causation and its links to underlying factors. This study introduces a machine learning-based hurdle model framework tailored for analyzing zero-inflated crash frequency data, addressing the limitations of traditional statistical models like the Poisson and negative binomial models, which struggle with zero-inflation and overdispersion. The research employs a two-stage modeling process using CatBoost. The first stage uses binary classification to identify road segments with potential crash occurrences, applying a customized loss function to tackle data imbalance. The second stage predicts crash frequency, also utilizing a customized loss function for count data. SHapley Additive exPlanations (SHAP) analysis interprets the model outcomes, providing insights into factors affecting crash likelihood and frequency. This study validates the model’s performance with real-world crash data from 2011 to 2015 in South Korea, demonstrating superior accuracy in both the classification and regression stages compared to other machine learning algorithms and traditional models. These findings have significant implications for traffic safety research and policymaking, offering stakeholders a more accurate and interpretable tool for crash data analysis to develop targeted safety interventions.
format Article
id doaj-art-dd81da29e6de4ac98fb00e2f3c52d7a4
institution OA Journals
issn 2076-3417
language English
publishDate 2024-11-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-dd81da29e6de4ac98fb00e2f3c52d7a4; 2025-08-20T01:55:26Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2024-11-01; 14; 23; 10790; 10.3390/app142310790
Moataz Bellah Ben Khedher (0); Dukgeun Yun (1)
0: Department of Civil and Environmental Engineering, KICT School, University of Science and Technology, Daejeon 34113, Republic of Korea
1: Department of Civil and Environmental Engineering, KICT School, University of Science and Technology, Daejeon 34113, Republic of Korea
title An Interpretable Machine Learning-Based Hurdle Model for Zero-Inflated Road Crash Frequency Data Analysis: Real-World Assessment and Validation
topic crash frequency
machine learning
CatBoost
SHAP
accident analysis
road safety
url https://www.mdpi.com/2076-3417/14/23/10790
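
The two-stage process summarized in the description (a binary stage for whether a segment has any crash, then a count stage for how many) can be illustrated by showing how the two outputs combine into a single prediction. The sketch below is not the authors' implementation: the record does not specify their customized loss functions, so it assumes a zero-truncated Poisson for the positive counts, and every function name here is hypothetical. The inputs `p_crash` (stage-one probability) and `lam` (stage-two rate) stand in for what the fitted CatBoost models would supply.

```python
import math


def truncated_poisson_mean(lam: float) -> float:
    """Mean of a zero-truncated Poisson with rate lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    # E[Y | Y > 0] = lam / (1 - exp(-lam)); expm1 keeps small lam accurate.
    return lam / -math.expm1(-lam)


def truncated_poisson_nll(y: int, lam: float) -> float:
    """Negative log-likelihood of a count y >= 1 under a zero-truncated Poisson.

    Assumed here as one possible count-data loss for the second stage.
    """
    if y < 1:
        raise ValueError("the truncated model only covers y >= 1")
    log_pmf = (
        y * math.log(lam)
        - lam
        - math.lgamma(y + 1)            # log(y!)
        - math.log(-math.expm1(-lam))   # renormalize after removing y = 0
    )
    return -log_pmf


def hurdle_pmf(k: int, p_crash: float, lam: float) -> float:
    """P(Y = k) for a hurdle model with crossing probability p_crash."""
    if k == 0:
        return 1.0 - p_crash
    return p_crash * math.exp(-truncated_poisson_nll(k, lam))


def hurdle_expected_count(p_crash: float, lam: float) -> float:
    """E[Y] = P(Y > 0) * E[Y | Y > 0]."""
    return p_crash * truncated_poisson_mean(lam)


if __name__ == "__main__":
    # Hypothetical segment: 30% chance of any crash, rate 2.0 given a crash.
    print(hurdle_expected_count(0.3, 2.0))
```

Because zeros are handled only by the first stage, the share of zero-crash segments can be arbitrarily large without distorting the count stage, which is the property that makes a hurdle structure suit zero-inflated data.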