GeoTemporal clustering for aquifer delineation: a big data approach to synchronizing and analyzing variable-length groundwater time series
Abstract Groundwater is a vital global resource. However, mapping aquifers remains challenging, particularly in developing nations. This study proposes a novel methodology for aquifer delineation using time-series clustering of groundwater-level data. The modular clustering framework utilizes hierar...
Main Authors: | Khalid ElHaj, Dalal Alshamsi |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2025-02-01 |
Series: | Journal of Big Data |
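Subjects: | Groundwater management; Aquifer delineation; Machine learning; Time-series clustering; Hydrogeology; Texas |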
Online Access: | https://doi.org/10.1186/s40537-025-01060-6 |
_version_ | 1823861970256789504 |
---|---|
author | Khalid ElHaj; Dalal Alshamsi |
author_facet | Khalid ElHaj; Dalal Alshamsi |
author_sort | Khalid ElHaj |
collection | DOAJ |
description | Abstract: Groundwater is a vital global resource. However, mapping aquifers remains challenging, particularly in developing nations. This study proposes a novel methodology for aquifer delineation using time-series clustering of groundwater-level data. The modular clustering framework utilizes hierarchical agglomerative clustering and a custom hydrology-specific distance function. The distance function accounts for variability in the length, temporal position, and consistency of the time series, as well as gaps in records, and aligns the series temporally before comparison. Advantages over traditional techniques, such as dynamic time warping and Euclidean distance, are presented for analyzing real-world hydrological data. The algorithm was optimized on a synthetic Texas aquifer dataset to identify the minimum time series lengths required for accurate clustering (> 90% accuracy). Applied to real data from the Texas Groundwater Database (GWDB), which holds over one million readings from 60,000 wells, the method achieved ~73% accuracy in delineating the nine major Texan aquifers using a filtered set of 74 representative wells. The aquifer boundaries were geographically visualized using the GeoZ library. These findings suggest that groundwater characterization can be effective even with limited data. The optimized algorithm could provide inexpensive mapping capabilities in developing nations, requiring only historical data collected from existing wells over decades. The technique is adaptive and can be improved through ongoing monitoring. The algorithm components are modular and upgradable; future studies should therefore optimize them and test their generalizability using additional datasets. |
format | Article |
id | doaj-art-729c0e362392474dbbad5b1f5a077260 |
institution | Kabale University |
issn | 2196-1115 |
language | English |
publishDate | 2025-02-01 |
publisher | SpringerOpen |
record_format | Article |
series | Journal of Big Data |
spelling | doaj-art-729c0e362392474dbbad5b1f5a077260; 2025-02-09T12:41:16Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2025-02-01; 121137; 10.1186/s40537-025-01060-6; GeoTemporal clustering for aquifer delineation: a big data approach to synchronizing and analyzing variable-length groundwater time series; Khalid ElHaj (Department of Geosciences, United Arab Emirates University); Dalal Alshamsi (Department of Geosciences, United Arab Emirates University); https://doi.org/10.1186/s40537-025-01060-6; Groundwater management; Aquifer delineation; Machine learning; Time-series clustering; Hydrogeology; Texas |
title | GeoTemporal clustering for aquifer delineation: a big data approach to synchronizing and analyzing variable-length groundwater time series |
topic | Groundwater management Aquifer delineation Machine learning Time-series clustering Hydrogeology Texas |
url | https://doi.org/10.1186/s40537-025-01060-6 |
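
The abstract describes clustering groundwater-level series that differ in length, start date, and completeness by aligning them in time before comparison, then applying hierarchical agglomerative clustering. The record does not give the authors' distance function, so the following Python sketch is only an illustration under stated assumptions: `overlap_distance`, its `min_overlap` and `penalty` parameters, the mean-centering step, and the synthetic wells are all hypothetical and are not taken from the paper.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def overlap_distance(a, b, min_overlap=6, penalty=10.0):
    """Hypothetical distance: RMS difference of mean-centered water levels,
    computed only over the dates that both wells actually recorded."""
    # Inner join aligns the two series in time and drops unshared dates.
    joined = pd.concat([a, b], axis=1, join="inner").dropna()
    if len(joined) < min_overlap:
        return penalty  # assumption: too little shared record to compare
    # Centering removes each well's absolute level (e.g. elevation offset).
    centered = joined - joined.mean()
    diff = centered.iloc[:, 0] - centered.iloc[:, 1]
    return float(np.sqrt((diff ** 2).mean()))


def cluster_wells(series, n_clusters):
    """Average-linkage agglomerative clustering on the pairwise distances."""
    names = list(series)
    n = len(names)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = overlap_distance(series[names[i]], series[names[j]])
            dist[i, j] = dist[j, i] = d
    # linkage expects the condensed (upper-triangle) form of the matrix.
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(names, labels))


# Synthetic monthly records: two wells follow a seasonal signal, two a decline.
months = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
seasonal = pd.Series(5 * np.sin(2 * np.pi * t / 12), index=months)
declining = pd.Series(-0.5 * t, index=months)

wells = {
    "A1": 100 + seasonal.iloc[0:36],
    "A2": (250 + seasonal.iloc[12:48]).drop(months[20]),  # later start, one gap
    "B1": 80 + declining.iloc[0:40],
    "B2": 300 + declining.iloc[6:48],
}
labels = cluster_wells(wells, n_clusters=2)
print(labels)
```

Comparing only the shared dates is one simple way to handle unequal lengths and gaps without stretching the time axis, which is what dynamic time warping would do. Whether this resembles the published distance function is not something this record confirms.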