Optimizing machine learning methods for groundwater quality prediction: Case study in District Bagh, Azad Kashmir, Pakistan
Groundwater quality monitoring is crucial for protecting the environment and human health. Machine learning (ML) offers substantial potential for enhancing groundwater quality prediction, classification, and identification of pollution indicators. This study evaluates various base ML algorithms and...
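
The record's description compares base classifiers with stacking ensembles (meta-classifiers). The following is a minimal sketch of the stacking idea only, not the authors' pipeline. It assumes synthetic two-class data in place of the 90 groundwater samples, two simple base learners (nearest centroid and k-nearest neighbours) instead of the six used in the study, and a small logistic-regression meta-learner. All function names are hypothetical and the code uses only the Python standard library.

```python
import math
import random


def make_data(n_per_class=45, seed=0):
    """Synthetic two-feature, two-class data (a stand-in, not the study's samples)."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    rng.shuffle(data)
    return data


def centroid_fit(train):
    """Base learner 1: predict the class whose centroid is nearest."""
    cents = {}
    for lab in (0, 1):
        pts = [x for x, y in train if y == lab]
        cents[lab] = [sum(col) / len(pts) for col in zip(*pts)]
    return lambda x: min(cents, key=lambda lab: math.dist(x, cents[lab]))


def knn_fit(train, k=3):
    """Base learner 2: k-nearest neighbours with a majority vote."""
    def predict(x):
        nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
        return int(sum(y for _, y in nearest) * 2 > k)
    return predict


def out_of_fold(train, fitters, n_folds=5):
    """Meta-features: each base learner predicts only rows it was not fitted on."""
    meta = [None] * len(train)
    for fold in range(n_folds):
        held = set(range(fold, len(train), n_folds))
        fit_rows = [row for i, row in enumerate(train) if i not in held]
        models = [fit(fit_rows) for fit in fitters]
        for i in held:
            meta[i] = [float(m(train[i][0])) for m in models]
    return meta


def logistic_fit(rows, labels, lr=0.5, epochs=500):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                gw[j] += (p - y) * xj
            gb += p - y
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: int(sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0)


data = make_data()
train, test = data[:72], data[72:]  # 80/20 split, an arbitrary choice
fitters = [centroid_fit, knn_fit]

# Level 0: out-of-fold base predictions become the meta-learner's inputs.
meta_rows = out_of_fold(train, fitters)
meta_model = logistic_fit(meta_rows, [y for _, y in train])

# Refit the base learners on all training rows for use at prediction time.
base_models = [fit(train) for fit in fitters]


def stacked_predict(x):
    """Level 1: feed the base predictions for x into the meta-learner."""
    return meta_model([float(m(x)) for m in base_models])


y_true = [y for _, y in test]
y_pred = [stacked_predict(x) for x, _ in test]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"stacked accuracy on held-out rows: {accuracy:.2f}")
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for rows the base learners have already seen, is the usual way to keep the second level from simply learning the base learners' training-set overfitting. With only 90 samples, as in the study, the choice of fold count and split can noticeably change the reported scores.
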
| Main Authors: | Usman Basharat, Wenjing Zhang, Cuihong Han, Shoukat Husain Khan, Arshad Abbasi, Sehrish Mahroof, Shuxin Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-09-01 |
| Series: | Ecotoxicology and Environmental Safety |
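| Subjects: | Groundwater quality; Machine learning; Key pollution indicators; Feature importance; Stacking ensemble learning; Classification algorithms |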
| Online Access: | http://www.sciencedirect.com/science/article/pii/S0147651325009558 |
| _version_ | 1849391471263744000 |
|---|---|
| author | Usman Basharat; Wenjing Zhang; Cuihong Han; Shoukat Husain Khan; Arshad Abbasi; Sehrish Mahroof; Shuxin Li |
| author_sort | Usman Basharat |
| collection | DOAJ |
| description | Groundwater quality monitoring is crucial for protecting the environment and human health. Machine learning (ML) offers substantial potential for enhancing groundwater quality prediction, classification, and identification of pollution indicators. This study evaluates various base ML algorithms and stacking ensemble classifiers (meta-classifiers) using data from 90 groundwater samples collected in District Bagh, Azad Kashmir, Pakistan. The aim was to establish a reliable method for predicting groundwater quality classification. Six supervised machine learning classifiers were used: Logistic Regression (LR), K-Nearest Neighbours (KNN), Decision Trees (DT), Support Vector Machines (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB). These classifiers, along with their corresponding meta-classifiers (Meta-LR, Meta-KNN, Meta-DT, Meta-SVM, Meta-RF, and Meta-XGB), were developed and compared to evaluate their effectiveness in classifying and predicting groundwater quality. Evaluation metrics such as precision, recall, F1-score, accuracy, R², RMSE, and ROC curves were used to assess classifier performance. Among all the classifiers, SVM and its meta-classifier (Meta-SVM) emerged as the most effective, achieving the highest accuracy score of 0.85–0.89, F1-score (0.88–0.89), R² (0.88–1), RMSE (6.72), and Area Under the Curve (AUC) of 0.795. Meta-classifiers achieved better performance than base models for LR (0.85–0.92), SVM (0.88–1.00), and XGB (0.52–0.89). The study also identified key pollution indicators influencing groundwater quality in the area, such as Total Dissolved Solids (TDS), Sulphate (SO4), and Nitrate (NO3). These indicators showed an increasing trend over time. The research highlights the potential of ML techniques, particularly SVM and Meta-SVM, in predicting groundwater quality based on key pollution indicators. The findings underscore the importance of ongoing monitoring and predictive modeling in managing groundwater resources effectively and mitigating pollution impacts. Future applications could refine models and expand datasets to enhance predictive accuracy and applicability across regions and conditions. |
| format | Article |
| id | doaj-art-e82405ecda7a4312b8919ac7d33ee534 |
| institution | Kabale University |
| issn | 0147-6513 |
| language | English |
| publishDate | 2025-09-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Ecotoxicology and Environmental Safety |
| spelling | doaj-art-e82405ecda7a4312b8919ac7d33ee534 · 2025-08-20T03:41:04Z · eng · Elsevier · Ecotoxicology and Environmental Safety · 0147-6513 · 2025-09-01 · 302 · 118610 · 10.1016/j.ecoenv.2025.118610 · Optimizing machine learning methods for groundwater quality prediction: Case study in District Bagh, Azad Kashmir, Pakistan · Usman Basharat [0], Wenjing Zhang [1], Cuihong Han [2], Shoukat Husain Khan [3], Arshad Abbasi [4], Sehrish Mahroof [5], Shuxin Li [6] · [0] Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China; College of New Energy and Environment, Jilin University, Changchun 130021, China · [1] Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China; College of New Energy and Environment, Jilin University, Changchun 130021, China; Corresponding author at: Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China. · [2] Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China; College of New Energy and Environment, Jilin University, Changchun 130021, China · [3] School of Earth and Space Sciences, University of Science and Technology of China, Hefei, Anhui 230026, China · [4] College of New Energy and Environment, Jilin University, Changchun 130021, China · [5] Institute of Grassland Science, Key Laboratory of Vegetation Ecology of the Ministry of Education, Jilin Songnen Grassland Ecosystem National Observation and Research Station, Northeast Normal University, Changchun 130024, China · [6] Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China; College of New Energy and Environment, Jilin University, Changchun 130021, China · http://www.sciencedirect.com/science/article/pii/S0147651325009558 · Groundwater quality; Machine learning; Key pollution indicators; Feature importance; Stacking ensemble learning; Classification algorithms |
| title | Optimizing machine learning methods for groundwater quality prediction: Case study in District Bagh, Azad Kashmir, Pakistan |
| topic | Groundwater quality; Machine learning; Key pollution indicators; Feature importance; Stacking ensemble learning; Classification algorithms |
| url | http://www.sciencedirect.com/science/article/pii/S0147651325009558 |