Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration

Machine learning (ML) has become a powerful tool for predicting suspended sediment concentration (SSC). Nonetheless, the difficulty of interpreting the underlying physical processes is considered the main obstacle to applying most ML approaches. In this regard, the current study presents a novel framework involving fo...

Bibliographic Details
Main Authors: Houda Lamane, Latifa Mouhir, Rachid Moussadek, Bouamar Baghdad, Ozgur Kisi, Ali El Bilali
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2025-02-01
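Series: International Journal of Sediment Research
Subjects: Interpretability; Machine learning (ML); Shapley values; Suspended sediment concentration (SSC); Soil erosion; Bouregreg watershed (BW)
Online Access: http://www.sciencedirect.com/science/article/pii/S1001627924001070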
_version_ 1841545574231310336
author Houda Lamane
Latifa Mouhir
Rachid Moussadek
Bouamar Baghdad
Ozgur Kisi
Ali El Bilali
author_sort Houda Lamane
collection DOAJ
description Machine learning (ML) has become a powerful tool for predicting suspended sediment concentration (SSC). Nonetheless, the difficulty of interpreting the underlying physical processes is considered the main obstacle to applying most ML approaches. In this regard, the current study presents a novel framework involving four standalone ML models (extra trees (ET), random forest (RF), categorical boosting (CatBoost), and extreme gradient boosting (XGBoost)) and their combination with genetic programming (GP). Three metrics (coefficient of correlation (r), root mean square error (RMSE), and Nash–Sutcliffe model-fit efficiency (NSE)) are used to assess the performance of these models applied to hydro-climatic datasets for prediction of SSC, and a more advanced interpretation system, SHapley Additive exPlanations (SHAP), is used to interpret them. The calibration process was based on data from 2016 to 2020, and the validation was done with 2021 data. Further description and application of the framework are provided based on a case study of the Bouregreg watershed. The results revealed that all implemented models are efficient in SSC prediction, with NSE, RMSE, and r ranging from 0.53 to 0.86, 1.20 to 2.55 g/L, and 0.83 to 0.91, respectively. Box plot diagrams confirm the enhanced performance of the combined models. The best-performing models at the four hydrological stations are the combined RF + GP model at the Aguibat Ziar station, the combined XGBoost + GP model at the Ain Loudah station, the CatBoost model at the Ras Fathia station, and the RF model at the Sidi Med Cherif station. The interpretability results showed that flow (Q) and seasonality (S) are the features with the greatest impact on SSC. These outcomes indicate that the applied models can extract accurate and detailed information from the interactions between the hydroclimatic factors and the generation of sediment by erosion (output). The ML approaches demonstrated good reliability and transparency for predicting SSC in a semi-arid setting, offered new perspectives for reducing the “black box” character of ML models, and provided a useful source of information for assessing the consequences of SSC on water quality. Applying SHAP and exploring other interpretability techniques are recommended to provide further information in future research. In addition, incorporating additional input data could enhance SSC predictions and deepen understanding of sediment transport dynamics.
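The three performance metrics named in the description have standard textbook definitions, which can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`pearson_r`, `rmse`, `nse`) and the sample SSC values are hypothetical and do not come from the study.

```python
import math


def pearson_r(observed, predicted):
    """Coefficient of correlation r (dimensionless, between -1 and 1)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error, in the same unit as the data (g/L for SSC)."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_o = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - residual / spread


if __name__ == "__main__":
    # Made-up SSC values in g/L, for illustration only.
    observed = [0.5, 1.2, 3.4, 2.1, 0.8]
    predicted = [0.6, 1.0, 3.0, 2.4, 0.9]
    print("r    =", round(pearson_r(observed, predicted), 3))
    print("RMSE =", round(rmse(observed, predicted), 3), "g/L")
    print("NSE  =", round(nse(observed, predicted), 3))
```

Note that r and NSE are dimensionless, while RMSE carries the unit of the predicted variable.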
format Article
id doaj-art-3ca14ec10573465da07b89b0efb5261a
institution Kabale University
issn 1001-6279
language English
publishDate 2025-02-01
publisher KeAi Communications Co., Ltd.
record_format Article
series International Journal of Sediment Research
spelling doaj-art-3ca14ec10573465da07b89b0efb5261a
2025-01-12T05:24:18Z
eng
KeAi Communications Co., Ltd.
International Journal of Sediment Research
1001-6279
2025-02-01
Volume 40, Issue 1, pp. 91-107
Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration
Houda Lamane (corresponding author): Department of Process Engineering and Environment, Faculty of Sciences and Techniques of Mohammedia, Hassan II University of Casablanca, Mohammedia 28806, Morocco; Department of Environment and Natural Resources, National Institute for Agricultural Research (INRA), Rabat 10000, Morocco; International Centre for Agriculture Research in the Dry Areas (ICARDA), Rabat 10000, Morocco
Latifa Mouhir: Department of Process Engineering and Environment, Faculty of Sciences and Techniques of Mohammedia, Hassan II University of Casablanca, Mohammedia 28806, Morocco
Rachid Moussadek: Department of Environment and Natural Resources, National Institute for Agricultural Research (INRA), Rabat 10000, Morocco; International Centre for Agriculture Research in the Dry Areas (ICARDA), Rabat 10000, Morocco
Bouamar Baghdad: School of Architecture and Landscape, Casablanca 20100, Morocco
Ozgur Kisi: Department of Civil Engineering, Luebeck University of Applied Sciences, Lübeck 23562, Germany; Department of Civil Engineering, Ilia State University, Tbilisi 0162, Georgia
Ali El Bilali: River Basin Agency of Bouregreg and Chaouia, Benslimane 13000, Morocco
title Interpreting machine learning models based on SHAP values in predicting suspended sediment concentration
topic Interpretability
Machine learning (ML)
Shapley values
Suspended sediment concentration (SSC)
Soil erosion
Bouregreg watershed (BW)
url http://www.sciencedirect.com/science/article/pii/S1001627924001070