Advancing lakes algal chlorophyll estimation in the contiguous USA: A comparative study of machine learning models and satellite data

Bibliographic Details
Main Authors: Md Mamun, Xiao Yang
Format: Article
Language: English
Published: Elsevier, 2025-07-01
Series: Ecological Informatics
Online Access: http://www.sciencedirect.com/science/article/pii/S1574954125000962
Collection: DOAJ
Abstract: Algal blooms are ubiquitous in lentic ecosystems and pose a risk to the health of humans and other organisms. Accurate measurement of chlorophyll-a (CHL-a) in lakes at the macroscale is challenging because the optical complexity of individual water bodies hinders the optimization of conventional bio-optical algorithms. This study combines satellite remote sensing and machine learning (ML) to improve CHL-a quantification from space. Given the cost and logistical demands of in-situ CHL-a data collection, especially over vast areas, we explore the potential of the open-source AquaSat dataset for CHL-a estimation across the contiguous USA. We assess the performance of four ML algorithms (random forest, extra trees regressor, bagging regressor, and XGBoost), identify the most influential spectral bands and indices, and compare these methods with established remote sensing techniques for CHL-a prediction. The bagging regressor and random forest performed equally well, both on the full AquaSat dataset and on data from each sensor separately (R² = 0.35–0.54, RMSE = 20.48–23.90 μg/L). Model-agnostic SHAP summary plots were used to identify the indices most important for CHL-a estimation. Spatio-temporal validations demonstrated the models' reliability across diverse conditions, with better generalizability across spatial domains than across seasonal or yearly transitions. The accuracy of CHL-a estimation algorithms depends on the satellite sensor. By comparing remote sensing studies that used various atmospheric correction approaches, we found that the Landsat Collection 1 (LC1) surface reflectance product offers consistent CHL-a estimates throughout the USA. Overall, while acknowledging the existing limitations and challenges of such approaches, this research illustrates the potential of combining open-source data with ML to facilitate large-scale estimation of lake CHL-a.
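The evaluation logic summarized in the abstract (models scored with R² and RMSE, and spatial versus temporal hold-outs where whole lakes or whole years are withheld from training) can be illustrated with a small, dependency-free Python sketch. It is not the authors' code. The record fields (`lake_id`, `year`, `x`, `chla`), the single predictor, the bagged least-squares line standing in for the tree ensembles, and the synthetic data are all hypothetical, and R² is assumed to be the coefficient of determination rather than a squared correlation.

```python
# Sketch only: grouped (spatial vs temporal) hold-out evaluation with R² and RMSE.
# Hypothetical assumptions, not taken from the study: records are dicts with
# "lake_id", "year", one predictor "x" (e.g. a band ratio) and observed CHL-a
# "chla" in μg/L; a bagged least-squares line stands in for the tree ensembles.
import math
import random
from statistics import fmean


def rmse(obs, pred):
    """Root-mean-square error, in the units of obs (here μg/L)."""
    return math.sqrt(fmean((o - p) ** 2 for o, p in zip(obs, pred)))


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    mean_obs = fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x; returns a predict function."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: all x identical, so predict the mean
        return lambda x: my
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_bagged_lines(xs, ys, n_estimators=25, seed=0):
    """Bagging: fit each base learner on a bootstrap resample, average predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: fmean(m(x) for m in models)


def group_holdout(records, key, test_groups):
    """Split so that no group in test_groups (a lake, or a year) appears in training."""
    train = [r for r in records if r[key] not in test_groups]
    test = [r for r in records if r[key] in test_groups]
    return train, test


def evaluate(records, key, test_groups):
    """Train on the remaining groups, score on the held-out ones; returns (R², RMSE)."""
    train, test = group_holdout(records, key, test_groups)
    model = fit_bagged_lines([r["x"] for r in train], [r["chla"] for r in train])
    obs = [r["chla"] for r in test]
    pred = [model(r["x"]) for r in test]
    return r2(obs, pred), rmse(obs, pred)


def make_synthetic_records(seed=42):
    """Synthetic records only (10 lakes x 5 years x 20 samples); NOT AquaSat values."""
    rng = random.Random(seed)
    records = []
    for lake in range(10):
        for year in range(2000, 2005):
            for _ in range(20):
                x = rng.random()
                records.append({"lake_id": lake, "year": year, "x": x,
                                "chla": 5.0 + 40.0 * x + rng.gauss(0.0, 5.0)})
    return records


if __name__ == "__main__":
    records = make_synthetic_records()
    print("spatial hold-out  (R², RMSE):", evaluate(records, "lake_id", {8, 9}))
    print("temporal hold-out (R², RMSE):", evaluate(records, "year", {2004}))
```

Holding out whole lakes probes spatial transfer, while holding out a whole year probes temporal transfer; the synthetic numbers say nothing about the study's results. Swapping the bagged line for a tree-based learner such as scikit-learn's `BaggingRegressor` or `RandomForestRegressor` would bring the sketch closer to the models compared in the study.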
Affiliation (both authors; corresponding authors): Roy M. Huffington Department of Earth Sciences, Southern Methodist University, Dallas, Texas 75275–0395, USA
ISSN: 1574-9541
Volume: 87, article 103087
DOI: 10.1016/j.ecoinf.2025.103087
Keywords: AquaSat; Chlorophyll-a; Machine learning; Remote sensing; USA