Development of interpretable intelligent frameworks for estimating river water turbidity

Bibliographic Details
Main Authors: Amin Gharehbaghi, Salim Heddam, Saeid Mehdizadeh, Sungwon Kim
Format: Article
Language: English
Published: Taylor & Francis Group 2025-12-01
Series: Engineering Applications of Computational Fluid Mechanics
Online Access: https://www.tandfonline.com/doi/10.1080/19942060.2025.2511886
Description
Summary: Turbidity (TU) is one of the most important water quality indicators in rivers and streams, so knowledge of water TU plays a fundamental role in optimally managing and monitoring river water quality. This study aimed to develop four intelligent schemes: three boosting methods, i.e. Categorical Boosting (CatBoost), Light Gradient-Boosting Machine (LightGBM), and eXtreme Gradient Boosting (XGBoost), and a deep learning method, the Convolutional Neural Network (CNN). To evaluate the performance of the proposed models, two river gauging stations in the United States (USGS 14206950 and USGS 14211720) were selected as a case study. The models were developed using 70% of the data for training and the remaining 30% for validation. The novel aspects of this study, rarely considered in preceding studies of river water TU estimation, are the development of boosting models, the comparison of their performance with a deep learning model, and the assessment of the impacts of input features on the models’ outputs using SHapley Additive exPlanations (SHAP). Based on the results for the validation period, the CatBoost and XGBoost models were generally the best models for accurate estimation of river water TU at the studied sites. The best-performing models during the validation period were XGBoost (R = 0.951, NSE = 0.903, RMSE = 3.552 FNU, MAE = 1.816 FNU) at USGS 14206950 and CatBoost (R = 0.961, NSE = 0.920, RMSE = 2.502 FNU, MAE = 1.219 FNU) at USGS 14211720, both established using full-input estimators. Finally, an interpretability assessment of the developed models was conducted using SHAP. Analysis of the SHAP graphs at the global level during the validation phase showed that river discharge was the most important input variable affecting the outputs of the best-performing models.
ISSN: 1994-2060, 1997-003X
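
Note on the metrics in the summary: the abstract reports R, NSE, RMSE and MAE without defining them. The sketch below shows how these four scores are conventionally computed, assuming R is the Pearson correlation coefficient and NSE is the Nash-Sutcliffe efficiency. The function names and the `split_70_30` helper are illustrative only and are not taken from the article. In particular, the record does not say how the 70%/30% split was made, so the ordered split here (first 70% for training, last 30% for validation) is an assumption.

```python
import math


def split_70_30(records):
    """Split a sequence into the first 70% and the last 30%.

    Assumption: an ordered (e.g. chronological) split; the article may
    have split the data differently.
    """
    cut = (len(records) * 7) // 10  # integer arithmetic avoids float rounding
    return records[:cut], records[cut:]


def pearson_r(observed, estimated):
    """Pearson correlation coefficient between observed and estimated values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated))
    spread_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    spread_est = math.sqrt(sum((e - mean_est) ** 2 for e in estimated))
    return cov / (spread_obs * spread_est)


def nse(observed, estimated):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum


def rmse(observed, estimated):
    """Root mean square error, in the units of the data (FNU for turbidity)."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean absolute error, in the units of the data (FNU for turbidity)."""
    n = len(observed)
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / n


if __name__ == "__main__":
    # Made-up turbidity values in FNU, for illustration only (not study data).
    observed = [4.0, 6.5, 12.0, 30.0, 9.0]
    estimated = [4.5, 6.0, 13.5, 27.0, 9.5]
    print("R    =", round(pearson_r(observed, estimated), 3))
    print("NSE  =", round(nse(observed, estimated), 3))
    print("RMSE =", round(rmse(observed, estimated), 3))
    print("MAE  =", round(mae(observed, estimated), 3))
```

Under these definitions, R and NSE are dimensionless and equal 1 for a perfect estimate, while RMSE and MAE carry the units of turbidity and equal 0 for a perfect estimate. That matches how the summary reports RMSE and MAE in FNU.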