Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River
Given the key role of rivers in supplying drinking water, supporting industry, agriculture, and ecosystems, water quality assessment and pollution quantification are essential for sustainable use. This study evaluated five machine learning models, i.e., Lasso Regression, Random Forest (RF), Gradient...
| Main Authors: | Elham Fazel Najafabadi, Paria Shojaei, Mojgan Askarizadeh |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-09-01 |
| Series: | Results in Engineering |
| Subjects: | Water quality prediction; Machine learning algorithms; Zayandeh Rood River; Surface water |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S259012302502732X |
| _version_ | 1849233517687341056 |
|---|---|
| author | Elham Fazel Najafabadi Paria Shojaei Mojgan Askarizadeh |
| author_facet | Elham Fazel Najafabadi Paria Shojaei Mojgan Askarizadeh |
| author_sort | Elham Fazel Najafabadi |
| collection | DOAJ |
| description | Given the key role of rivers in supplying drinking water and supporting industry, agriculture, and ecosystems, water quality assessment and pollution quantification are essential for sustainable use. This study evaluated five machine learning models, namely Lasso Regression, Random Forest (RF), Gradient Boosting (GB), XGBoost, and Support Vector Machine (SVM), for predicting four water quality parameters: Electrical Conductivity (EC), Total Dissolved Solids (TDS), Sodium Adsorption Ratio (SAR), and Total Hardness (TH). The data were collected over a 31-year period from eight monitoring stations along the Zayandeh Rood River, a vital water source for drinking, agriculture, and industry in the arid region of central Iran. The models were evaluated on five statistical criteria: R², RMSE, RRMSE, r, and MAE. Two dimensionality reduction techniques, PCA and correlation matrix-based feature reduction, were implemented to enhance model efficiency and mitigate multicollinearity. The findings indicate that the best-performing model for a given parameter varied across stations, although the differences in evaluation metrics between the best models were small at most stations. The GB and SVM models outperformed the other models in predicting EC and TDS (0.80 < R² < 0.99). In predicting SAR, the GB and XGBoost models achieved higher accuracy (0.955 < R² < 0.999), and in predicting TH, the Lasso and SVM models did (0.830 < R² < 0.996). The Lasso regression model proved to be the most effective for predicting TH at half of the monitoring stations. |
| format | Article |
| id | doaj-art-036f2b0ad4a543fd8359eb9b5bfe8d4e |
| institution | Kabale University |
| issn | 2590-1230 |
| language | English |
| publishDate | 2025-09-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Results in Engineering |
| spelling | doaj-art-036f2b0ad4a543fd8359eb9b5bfe8d4e; 2025-08-20T05:07:32Z; eng; Elsevier; Results in Engineering; 2590-1230; 2025-09-01; 27; 106665; 10.1016/j.rineng.2025.106665; Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River; Elham Fazel Najafabadi (0), Paria Shojaei (1), Mojgan Askarizadeh (2); (0) Department of Water Science and Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran (corresponding author); (1) Department of Architecture and Civil Engineering, University of Bath, Bath, UK; (2) Department of Computer Engineering, Faculty of Engineering, Ardakan University, Ardakan, Yazd, Iran; http://www.sciencedirect.com/science/article/pii/S259012302502732X; Water quality prediction; Machine learning algorithms; Zayandeh Rood River; Surface water |
| spellingShingle | Elham Fazel Najafabadi Paria Shojaei Mojgan Askarizadeh Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River Results in Engineering Water quality prediction Machine learning algorithms Zayandeh Rood River Surface water |
| title | Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River |
| title_full | Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River |
| title_fullStr | Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River |
| title_full_unstemmed | Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River |
| title_short | Comparative analysis of machine learning models for predicting river water quality: a case study of the Zayandeh Rood River |
| title_sort | comparative analysis of machine learning models for predicting river water quality a case study of the zayandeh rood river |
| topic | Water quality prediction Machine learning algorithms Zayandeh Rood River Surface water |
| url | http://www.sciencedirect.com/science/article/pii/S259012302502732X |
| work_keys_str_mv | AT elhamfazelnajafabadi comparativeanalysisofmachinelearningmodelsforpredictingriverwaterqualityacasestudyofthezayandehroodriver AT pariashojaei comparativeanalysisofmachinelearningmodelsforpredictingriverwaterqualityacasestudyofthezayandehroodriver AT mojganaskarizadeh comparativeanalysisofmachinelearningmodelsforpredictingriverwaterqualityacasestudyofthezayandehroodriver |
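
The description field names five evaluation criteria (R², RMSE, RRMSE, r, and MAE) without defining them. The following is a minimal sketch of how such metrics are commonly computed, not the authors' code. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares) and that RRMSE is RMSE divided by the mean of the observed values; the record does not state which definitions the study used. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, RRMSE, Pearson r, and MAE for two equal-length sequences.

    Assumptions (not confirmed by the record): R2 is the coefficient of
    determination and RRMSE is RMSE divided by the mean observed value.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in residuals)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)     # total sum of squares
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)  # spread of the predictions
    if ss_tot == 0 or ss_pred == 0 or mean_obs == 0:
        raise ValueError("metrics are undefined for constant series or zero observed mean")

    rmse = math.sqrt(ss_res / n)
    covariance_sum = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "RRMSE": rmse / mean_obs,
        "r": covariance_sum / math.sqrt(ss_tot * ss_pred),
        "MAE": sum(abs(e) for e in residuals) / n,
    }
```

Under these definitions a prediction with a constant offset keeps r at 1 while R² drops, which is one reason a study may report both.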