Optimizing machine learning methods for groundwater quality prediction: Case study in District Bagh, Azad Kashmir, Pakistan

Bibliographic Details
Main Authors: Usman Basharat, Wenjing Zhang, Cuihong Han, Shoukat Husain Khan, Arshad Abbasi, Sehrish Mahroof, Shuxin Li
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Ecotoxicology and Environmental Safety
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S0147651325009558
Description
Summary: Groundwater quality monitoring is crucial for protecting the environment and human health. Machine learning (ML) offers substantial potential for enhancing groundwater quality prediction, classification, and identification of pollution indicators. This study evaluates various base ML algorithms and stacking ensemble classifiers (meta-classifiers) using data from 90 groundwater samples collected in District Bagh, Azad Kashmir, Pakistan. The aim was to establish a reliable method for predicting groundwater quality classification. Six supervised ML classifiers were used: Logistic Regression (LR), K-Nearest Neighbours (KNN), Decision Trees (DT), Support Vector Machines (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB). These classifiers and their corresponding meta-classifiers (Meta-LR, Meta-KNN, Meta-DT, Meta-SVM, Meta-RF, and Meta-XGB) were developed and compared to evaluate their effectiveness in classifying and predicting groundwater quality. Evaluation metrics such as precision, recall, F1-score, accuracy, R², RMSE, and ROC curves were used to assess classifier performance. Among all the classifiers, SVM and its meta-classifier (Meta-SVM) emerged as the most effective, achieving the highest accuracy (0.85–0.89), F1-score (0.88–0.89), R² (0.88–1), RMSE (6.72), and Area Under the Curve (AUC) of 0.795. Meta-classifiers achieved better performance than base models for LR (0.85–0.92), SVM (0.88–1.00), and XGB (0.52–0.89). The study also identified key pollution indicators influencing groundwater quality in the area, such as Total Dissolved Solids (TDS), Sulphate (SO4), and Nitrate (NO3). These indicators showed an increasing trend over time. The research highlights the potential of ML techniques, particularly SVM and Meta-SVM, in predicting groundwater quality based on key pollution indicators.
The findings underscore the importance of ongoing monitoring and predictive modeling in managing groundwater resources effectively and mitigating pollution impacts. Future applications could refine models and expand datasets to enhance predictive accuracy and applicability across regions and conditions.
ISSN: 0147-6513
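
Note: The summary names precision, recall, F1-score, accuracy, and RMSE among the evaluation metrics. As a reading aid, the minimal sketch below shows how these quantities are conventionally computed for a two-class problem. It is not the authors' code: the function names (`binary_metrics`, `rmse`) and the eight example labels are hypothetical and are not taken from the study's 90 samples.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1-score, and accuracy for two-class labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length numeric sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical labels for illustration only (1 = one quality class, 0 = the other).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

metrics = binary_metrics(y_true, y_pred)
error = rmse(y_true, y_pred)
print(metrics, error)
```

With more than two quality classes, precision, recall, and F1-score are usually computed per class and then averaged; the summary does not state which averaging scheme the study used.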