Optimizing machine learning methods for groundwater quality prediction: Case study in District Bagh, Azad Kashmir, Pakistan

Groundwater quality monitoring is crucial for protecting the environment and human health. Machine learning (ML) offers substantial potential for enhancing groundwater quality prediction, classification, and identification of pollution indicators. This study evaluates various base ML algorithms and...

Bibliographic Details
Main Authors: Usman Basharat, Wenjing Zhang, Cuihong Han, Shoukat Husain Khan, Arshad Abbasi, Sehrish Mahroof, Shuxin Li
Format: Article
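Language: English
Published: Elsevier 2025-09-01
Series: Ecotoxicology and Environmental Safety
Subjects: Groundwater quality; Machine learning; Key pollution indicators; Feature importance; Stacking ensemble learning; Classification algorithms
Online Access: http://www.sciencedirect.com/science/article/pii/S0147651325009558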
author Usman Basharat
Wenjing Zhang
Cuihong Han
Shoukat Husain Khan
Arshad Abbasi
Sehrish Mahroof
Shuxin Li
collection DOAJ
description Groundwater quality monitoring is crucial for protecting the environment and human health. Machine learning (ML) offers substantial potential for enhancing groundwater quality prediction, classification, and identification of pollution indicators. This study evaluates various base ML algorithms and stacking ensemble classifiers (meta-classifiers) using data from 90 groundwater samples collected in District Bagh, Azad Kashmir, Pakistan. The aim was to establish a reliable method for predicting groundwater quality classification. Six supervised machine learning classifiers were utilized, namely Logistic Regression (LR), K-Nearest Neighbours (KNN), Decision Trees (DT), Support Vector Machines (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB). These classifiers, along with their corresponding meta-classifiers (Meta-LR, Meta-KNN, Meta-DT, Meta-SVM, Meta-RF, and Meta-XGB), were developed and compared to evaluate their effectiveness in classifying and predicting groundwater quality. Evaluation metrics such as precision, recall, F1-score, accuracy, R², RMSE and ROC curves were used to assess the classifiers' performance. Among all the classifiers, SVM and its meta-classifier (Meta-SVM) emerged as the most effective, achieving the highest accuracy score of 0.85–0.89, F1-score (0.88–0.89), R² (0.88–1), RMSE (6.72), and Area Under the Curve (AUC) of 0.795. Meta-classifiers achieved better performance than base models for LR (0.85–0.92), SVM (0.88–1.00), and XGB (0.52–0.89). The study also identified key pollution indicators influencing groundwater quality in the area, such as Total Dissolved Solids (TDS), Sulphate (SO4), and Nitrate (NO3). These indicators showed an increasing trend over time. The research highlights the potential of ML techniques, particularly SVM and Meta-SVM, in predicting groundwater quality based on key pollution indicators. The findings underscore the importance of ongoing monitoring and predictive modeling in managing groundwater resources effectively and mitigating pollution impacts. Future applications could refine models and expand datasets to enhance predictive accuracy and applicability across regions and conditions.
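The classification metrics named in the description (accuracy, precision, recall, F1-score) and RMSE can be illustrated with a minimal sketch. This is not the authors' code: the labels below are made up for illustration and are not the study's data, the binary "1 = suitable, 0 = unsuitable" coding is an assumption, and the function names `classification_metrics` and `rmse` are hypothetical.

```python
from math import sqrt


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Illustrative labels only (assumed coding: 1 = suitable, 0 = unsuitable).
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0]

metrics = classification_metrics(y_true, y_pred)
error = rmse(y_true, y_pred)
print(metrics, round(error, 4))
```

With one false positive and one false negative out of ten samples, all four classification metrics come out at 0.8 here. The record does not say how the study computed RMSE or R² for its classifiers, so applying `rmse` to class labels is only one possible reading.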
format Article
id doaj-art-e82405ecda7a4312b8919ac7d33ee534
institution Kabale University
issn 0147-6513
language English
publishDate 2025-09-01
publisher Elsevier
record_format Article
series Ecotoxicology and Environmental Safety
spelling doaj-art-e82405ecda7a4312b8919ac7d33ee534
Record timestamp: 2025-08-20T03:41:04Z
Language: eng
Publisher: Elsevier
Journal: Ecotoxicology and Environmental Safety
ISSN: 0147-6513
Published: 2025-09-01
Volume: 302
Article number: 118610
DOI: 10.1016/j.ecoenv.2025.118610
Author affiliations:
Usman Basharat (0): Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China; College of New Energy and Environment, Jilin University, Changchun 130021, China
Wenjing Zhang (1): Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China; College of New Energy and Environment, Jilin University, Changchun 130021, China; Corresponding author at: Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China.
Cuihong Han (2): Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China; College of New Energy and Environment, Jilin University, Changchun 130021, China
Shoukat Husain Khan (3): School of Earth and Space Sciences, University of Science and Technology of China, Hefei, Anhui 230026, China
Arshad Abbasi (4): College of New Energy and Environment, Jilin University, Changchun 130021, China
Sehrish Mahroof (5): Institute of Grassland Science, Key Laboratory of Vegetation Ecology of the Ministry of Education, Jilin Songnen Grassland Ecosystem National Observation and Research Station, Northeast Normal University, Changchun 130024, China
Shuxin Li (6): Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China; College of New Energy and Environment, Jilin University, Changchun 130021, China
title Optimizing machine learning methods for groundwater quality prediction: Case study in District Bagh, Azad Kashmir, Pakistan
topic Groundwater quality
Machine learning
Key pollution indicators
Feature importance
Stacking ensemble learning
Classification algorithms
url http://www.sciencedirect.com/science/article/pii/S0147651325009558