GeoTemporal clustering for aquifer delineation: a big data approach to synchronizing and analyzing variable-length groundwater time series

Bibliographic Details
Main Authors: Khalid ElHaj, Dalal Alshamsi
Format: Article
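Language: English
Published: SpringerOpen 2025-02-01
Series: Journal of Big Data
Subjects: Groundwater management; Aquifer delineation; Machine learning; Time-series clustering; Hydrogeology; Texas
Online Access: https://doi.org/10.1186/s40537-025-01060-6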
collection DOAJ
description Abstract Groundwater is a vital global resource. However, mapping aquifers remains challenging, particularly in developing nations. This study proposes a novel methodology for aquifer delineation using time-series clustering of groundwater-level data. The modular clustering framework uses hierarchical agglomerative clustering and a custom hydrology-specific distance function. This approach accounts for variability in the length, temporal position, and consistency of the time series, as well as gaps in the records, and aligns the series temporally before comparing them. Its advantages over traditional techniques such as dynamic time warping and Euclidean distance are presented for the analysis of real-world hydrological data. The algorithm was optimized on a synthetic Texas aquifer dataset to identify the minimum time-series lengths required for accurate clustering (>90% accuracy). Applied to real data from the Texas Groundwater Database (GWDB), which holds over one million readings from 60,000 wells, the model achieved ~73% accuracy in delineating the nine major Texan aquifers using a filtered set of 74 representative wells. The aquifer boundaries were visualized geographically using the GeoZ library. These findings suggest that groundwater can be characterized effectively even with limited data. The optimized algorithm could provide inexpensive mapping capabilities in developing nations, requiring only decades of historical data from existing wells. The technique is adaptive and can be improved through ongoing monitoring. The algorithm components are modular and upgradable; future studies should therefore optimize them and test their generalizability on additional datasets.
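The clustering idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' distance function, which this record does not specify. As a stand-in, it assumes that each well's readings are averaged to monthly values, that two wells are compared only over the months both have data for, and that levels are z-scored within that overlap. The names `overlap_distance`, `cluster_wells`, and `min_overlap` are hypothetical. The hierarchical agglomerative step uses SciPy's `linkage` and `fcluster`.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def overlap_distance(a: pd.Series, b: pd.Series, min_overlap: int = 12) -> float:
    """Illustrative distance: mean absolute difference of z-scored monthly
    levels over the months both wells cover (an assumption, not the paper's
    metric). Returns NaN when the shared record is too short."""
    # Put irregular readings on a common monthly calendar.
    a_m = a.resample("MS").mean()
    b_m = b.resample("MS").mean()
    # Keep only months present in both series; gaps drop out here.
    joined = pd.concat([a_m, b_m], axis=1, join="inner").dropna()
    if len(joined) < min_overlap:
        return float("nan")
    # Z-score within the overlap so offset and amplitude do not dominate.
    std = joined.std(ddof=0).replace(0, 1.0)
    z = (joined - joined.mean()) / std
    return float((z.iloc[:, 0] - z.iloc[:, 1]).abs().mean())


def cluster_wells(series_by_well: dict, n_clusters: int, min_overlap: int = 12) -> dict:
    """Average-linkage agglomerative clustering on the pairwise distances."""
    names = list(series_by_well)
    n = len(names)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = overlap_distance(
                series_by_well[names[i]], series_by_well[names[j]], min_overlap
            )
            dist[i, j] = dist[j, i] = d
    # Assumption: pairs with too little overlap get the largest observed distance.
    finite = dist[np.isfinite(dist)]
    dist[~np.isfinite(dist)] = finite.max()
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return {name: int(label) for name, label in zip(names, labels)}
```

Each well is passed as a `pandas.Series` of water levels indexed by a `DatetimeIndex`. The records may start and end at different dates and contain gaps, because only the shared months enter the comparison.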
id doaj-art-729c0e362392474dbbad5b1f5a077260
institution Kabale University
issn 2196-1115
spelling doaj-art-729c0e362392474dbbad5b1f5a077260; 2025-02-09T12:41:16Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2025-02-01; vol. 12, no. 1, pp. 1-37; 10.1186/s40537-025-01060-6; Khalid ElHaj (Department of Geosciences, United Arab Emirates University); Dalal Alshamsi (Department of Geosciences, United Arab Emirates University)
topic Groundwater management
Aquifer delineation
Machine learning
Time-series clustering
Hydrogeology
Texas