Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River

Given the key role of rivers in supplying drinking water, supporting industry, agriculture, and ecosystems, water quality assessment and pollution quantification are essential for sustainable use. This study evaluated five machine learning models, i.e., Lasso Regression, Random Forest (RF), Gradient...

Bibliographic Details
Main Authors: Elham Fazel Najafabadi, Paria Shojaei, Mojgan Askarizadeh
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Results in Engineering
Subjects: Water quality prediction; Machine learning algorithms; Zayandeh Rood River; Surface water
Online Access: http://www.sciencedirect.com/science/article/pii/S259012302502732X
author Elham Fazel Najafabadi
Paria Shojaei
Mojgan Askarizadeh
collection DOAJ
description Given the key role of rivers in supplying drinking water and supporting industry, agriculture, and ecosystems, water quality assessment and pollution quantification are essential for sustainable use. This study evaluated five machine learning models, namely Lasso Regression, Random Forest (RF), Gradient Boosting (GB), XGBoost, and Support Vector Machine (SVM), for predicting four water quality parameters: Electrical Conductivity (EC), Total Dissolved Solids (TDS), Sodium Adsorption Ratio (SAR), and Total Hardness (TH). The data were collected over a 31-year period from eight monitoring stations along the Zayandeh Rood River, a vital water source for drinking, agriculture, and industry in the arid region of central Iran. The models were evaluated on five statistical criteria: R², RMSE, RRMSE, r, and MAE. Two dimensionality reduction techniques, PCA and correlation matrix-based feature reduction, were implemented to enhance model efficiency and mitigate multicollinearity. The findings indicate that the best-performing model for a given parameter varied across stations, although the differences in evaluation metrics between the best models were small at most stations. The GB and SVM models outperformed the other models in predicting EC and TDS (0.80<R²<0.99). For SAR, the GB and XGBoost models achieved higher accuracy (0.955<R²<0.999), and for TH, the Lasso and SVM models did (0.830<R²<0.996). The Lasso regression model proved the most effective for predicting TH at half of the monitoring stations.
format Article
id doaj-art-036f2b0ad4a543fd8359eb9b5bfe8d4e
institution Kabale University
issn 2590-1230
language English
publishDate 2025-09-01
publisher Elsevier
record_format Article
series Results in Engineering
spelling doaj-art-036f2b0ad4a543fd8359eb9b5bfe8d4e 2025-08-20T05:07:32Z eng Elsevier Results in Engineering 2590-1230 2025-09-01 27 106665 10.1016/j.rineng.2025.106665
Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River
Elham Fazel Najafabadi, Department of Water Science and Engineering. College of Agriculture, Isfahan University of Technology, Isfahan, Iran; Corresponding author.
Paria Shojaei, Department of Architecture and Civil Engineering, University of Bath, Bath, UK
Mojgan Askarizadeh, Department of Computer Engineering, Faculty of Engineering, Ardakan University, Ardakan, Yazd, Iran
http://www.sciencedirect.com/science/article/pii/S259012302502732X
Water quality prediction
Machine learning algorithms
Zayandeh Rood River
Surface water
title Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River
topic Water quality prediction
Machine learning algorithms
Zayandeh Rood River
Surface water
url http://www.sciencedirect.com/science/article/pii/S259012302502732X