A Machine Learning-Based Assessment of Proxies and Drivers of Harmful Algal Blooms in the Western Lake Erie Basin Using Satellite Remote Sensing

Bibliographic Details
Main Authors: Neha Joshi, Armeen Ghoorkhanian, Jongmin Park, Kaiguang Zhao, Sami Khanal
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Remote Sensing
Subjects: harmful algal blooms; Harmonized Landsat Sentinel-2; water quality; machine learning
Online Access: https://www.mdpi.com/2072-4292/17/13/2164
author Neha Joshi
Armeen Ghoorkhanian
Jongmin Park
Kaiguang Zhao
Sami Khanal
collection DOAJ
description The western region of Lake Erie has been experiencing severe water-quality problems, mainly from algal blooms, highlighting the urgent need for action. Understanding the drivers of algal blooms and the intricacies of bloom phenomena is important for developing effective water-quality remediation strategies. In this study, the influences of multiple bloom drivers were explored, together with Harmonized Landsat Sentinel-2 (HLS) images, using datasets collected in western Lake Erie from 2013 to 2022. Bloom drivers included a group of physicochemical and meteorological variables, and chlorophyll-a (Chl-a) served as a proxy for algal blooms. Various combinations of these datasets were used as predictor variables for three machine learning models: Support Vector Regression (SVR), Extreme Gradient Boosting (XGB), and Random Forest (RF). Each model was complemented with the SHapley Additive exPlanations (SHAP) method to understand the role of predictor variables in Chl-a estimation. A combination of physicochemical variables and optical spectral bands yielded the highest model performance (R² up to 0.76, RMSE as low as 8.04 µg/L). The models using only meteorological data and spectral bands performed poorly (R² < 0.40), indicating the limited standalone predictive power of meteorological variables. Although satellite-only models achieved only moderate performance (R² up to 0.48), they could still be useful for preliminary monitoring where field data are unavailable. Furthermore, using all 20 variables did not substantially improve model performance over models with only spectral and physicochemical inputs. While SVR achieved the highest R² in individual runs, XGB provided the most stable and consistently strong performance across input configurations, which could be an important consideration for operational use. These findings are highly relevant for harmful algal bloom (HAB) monitoring, where Chl-a serves as a critical proxy. By clarifying the contribution of diverse variables to Chl-a prediction and identifying robust modeling approaches, this study provides actionable insights to support data-driven management decisions aimed at mitigating HAB impacts in freshwater systems.
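The description reports model skill as R² and RMSE. As a reading aid, the following is a minimal sketch of how those two metrics are conventionally computed from observed and predicted Chl-a values. The function names and the sample values are hypothetical and are not taken from the article; the record does not state which software the authors used.

```python
import math


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the inputs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two equal-length, non-empty sequences")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


# Hypothetical Chl-a values in µg/L, invented purely for illustration.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

r2 = r2_score(observed, predicted)
err = rmse(observed, predicted)
print(f"R2 = {r2:.3f}, RMSE = {err:.2f} µg/L")  # prints: R2 = 0.948, RMSE = 2.55 µg/L
```

Because RMSE carries the units of the target variable, the reported 8.04 µg/L can be read directly as a typical Chl-a estimation error, whereas R² is unitless and describes the share of variance explained.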
format Article
id doaj-art-ddbb0212c8f644bf8ba115c75f954f3e
institution Kabale University
issn 2072-4292
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-ddbb0212c8f644bf8ba115c75f954f3e 2025-08-20T03:50:20Z eng MDPI AG Remote Sensing 2072-4292 2025-06-01 vol. 17, no. 13, art. 2164 doi:10.3390/rs17132164
A Machine Learning-Based Assessment of Proxies and Drivers of Harmful Algal Blooms in the Western Lake Erie Basin Using Satellite Remote Sensing
Neha Joshi (Department of Food, Agricultural and Biological Engineering, The Ohio State University, Columbus, OH 43210, USA)
Armeen Ghoorkhanian (Department of Food, Agricultural and Biological Engineering, The Ohio State University, Columbus, OH 43210, USA)
Jongmin Park (Department of Food, Agricultural and Biological Engineering, The Ohio State University, Columbus, OH 43210, USA)
Kaiguang Zhao (School of Environment and Natural Resources, The Ohio State University, Columbus, OH 43210, USA)
Sami Khanal (Department of Food, Agricultural and Biological Engineering, The Ohio State University, Columbus, OH 43210, USA)
title A Machine Learning-Based Assessment of Proxies and Drivers of Harmful Algal Blooms in the Western Lake Erie Basin Using Satellite Remote Sensing
topic harmful algal blooms
Harmonized Landsat Sentinel-2
water quality
machine learning
url https://www.mdpi.com/2072-4292/17/13/2164