Advancing lakes algal chlorophyll estimation in the contiguous USA: A comparative study of machine learning models and satellite data
Algal blooms are ubiquitous in lentic ecosystems and pose a risk to human and other organisms' health. Accurate measurement of chlorophyll-a (CHL-a) in lakes at a macroscale is challenging due to the optical complexity of individual water bodies, which hinders the optimization of conventional b...
| Main Authors: | Md Mamun, Xiao Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-07-01 |
| Series: | Ecological Informatics |
| Subjects: | AquaSat; Chlorophyll-a; Machine learning; Remote sensing; USA |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S1574954125000962 |
| _version_ | 1850182920465022976 |
|---|---|
| author | Md Mamun; Xiao Yang |
| collection | DOAJ |
| description | Algal blooms are ubiquitous in lentic ecosystems and pose a risk to the health of humans and other organisms. Accurate measurement of chlorophyll-a (CHL-a) in lakes at a macroscale is challenging due to the optical complexity of individual water bodies, which hinders the optimization of conventional bio-optical algorithms. This study harnesses the synergy of satellite remote sensing and machine learning (ML) to enhance CHL-a quantification from space. Given the cost and logistical demands of in-situ CHL-a data collection, especially over vast areas, we explore the potential of the open-source AquaSat dataset for CHL-a estimation across the contiguous USA. We assess the performance of four ML algorithms (random forest, extra tree regressor, bagging regressor, and XGBoost), discern the most influential spectral bands and indices, and compare these methods to established remote sensing techniques for CHL-a prediction. The bagging regressor and random forest performed equally well on all AquaSat data and on data from each sensor separately (R² = 0.35–0.54, RMSE = 20.48–23.90 μg/L). Model-agnostic SHAP summary plots were used to identify important indices in CHL-a estimation. Spatio-temporal validations demonstrated the models' reliability across diverse conditions, with better generalizability in spatial domains compared to seasonal or yearly transitions. The accuracy of algorithms for estimating CHL-a depends on the satellite sensor. By comparing remote sensing studies that used various atmospheric correction approaches, we found that the Landsat Collection 1 (LC1) surface reflectance product offers consistent CHL-a estimates throughout the USA. Overall, acknowledging the existing limitations and challenges of such approaches, this research illustrates the potential of utilizing open-source data with ML to facilitate large-scale estimation of lake CHL-a. |
| format | Article |
| id | doaj-art-268bee3f043d4394870aaaadb3538f69 |
| institution | OA Journals |
| issn | 1574-9541 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Ecological Informatics |
| spelling | doaj-art-268bee3f043d4394870aaaadb3538f69; 2025-08-20T02:17:29Z; eng; Elsevier; Ecological Informatics; 1574-9541; 2025-07-01; 87; 103087; 10.1016/j.ecoinf.2025.103087; Advancing lakes algal chlorophyll estimation in the contiguous USA: A comparative study of machine learning models and satellite data; Md Mamun and Xiao Yang (corresponding authors; Roy M. Huffington Department of Earth Sciences, Southern Methodist University, Dallas, Texas 75275–0395, USA); http://www.sciencedirect.com/science/article/pii/S1574954125000962; AquaSat; Chlorophyll-a; Machine learning; Remote sensing; USA |
| title | Advancing lakes algal chlorophyll estimation in the contiguous USA: A comparative study of machine learning models and satellite data |
| topic | AquaSat; Chlorophyll-a; Machine learning; Remote sensing; USA |
| url | http://www.sciencedirect.com/science/article/pii/S1574954125000962 |