Data fusion-based improvements in empirical regression and machine learning for global daily ∼ 8 km resolution sea surface nitrate estimation and interpretation

Assessing sea surface nitrate (SSN) concentrations and dynamics is crucial for understanding marine ecosystem health, yet optical remote sensing of SSN remains challenging because of the lack of distinct spectral features. While various global-scale SSN regression and machine learning algorithms bas...

Bibliographic Details
Main Authors: Aifen Zhong, Difeng Wang, Fang Gong, Jingjing Huang, Zhuoqi Zheng, Xianqiang He, Qing Zhang, Qiankun Zhu
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: International Journal of Applied Earth Observations and Geoinformation
Subjects: Sea surface nitrate; Remote sensing; Empirical regression; Explainable machine learning; Global ocean
Online Access: http://www.sciencedirect.com/science/article/pii/S1569843225004479
_version_ 1849233609431449600
author Aifen Zhong
Difeng Wang
Fang Gong
Jingjing Huang
Zhuoqi Zheng
Xianqiang He
Qing Zhang
Qiankun Zhu
author_sort Aifen Zhong
collection DOAJ
description Assessing sea surface nitrate (SSN) concentrations and dynamics is crucial for understanding marine ecosystem health, yet optical remote sensing of SSN remains challenging because of the lack of distinct spectral features. While various global-scale SSN regression and machine learning algorithms based on SSN-environment variable relationships have been developed, the prediction accuracy and spatiotemporal resolution of their applications continue to face limitations. Additionally, previous studies have reported relatively little on the interannual variability of global SSN. Here we aim to enhance the accuracy and spatial resolution of SSN retrievals by developing improved regression and machine learning models, enabling the generation of global daily ∼ 8 km SSN products from satellite and model data. To construct the empirical regression models, the global ocean was divided into five regions on the basis of the relationship between sea surface temperature (SST) and SSN: 80° S to 40° N, the North Pacific, the North Atlantic, the Arabian Sea, and the eastern equatorial Pacific. After adding SSN-related physical variables, high-accuracy regional empirical models were developed, with root mean square deviations (RMSDs) of 1.641, 2.701, 1.221, 1.298, and 2.379 μmol/kg for the studied regions. For the machine learning models, seven algorithms were tested: extremely randomized trees (ET), multilayer perceptron (MLP), stacking random forest (SRF), Gaussian process regression (GPR), support vector machine (SVM), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost). After modeling, validation, and extensive tests using an independent cruise dataset, the XGBoost model outperformed the others (RMSD = 1.189 μmol/kg) and bypassed the need for regional segmentation. Mechanistic analysis revealed the driving variables influencing SSN in both the regional empirical and XGBoost models, improving interpretability. Comparative validation confirmed that our models surpass traditional approaches in accuracy and applicability, demonstrating their potential to advance global SSN monitoring. Using XGBoost-derived products, we find a slight decreasing trend in SSN over 23 years. The proposed robust and explainable SSN retrieval models have the potential to assist in ocean environmental management.
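The region-wise evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a single-predictor linear fit of SSN on SST per region, whereas the paper adds further physical variables, and it scores the fit on the same synthetic samples it was trained on rather than on independent cruise data. The region keys, the sample values, and the helper names `rmsd` and `fit_line` are hypothetical and exist only for illustration.

```python
import math


def rmsd(predicted, observed):
    """Root mean square deviation between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def fit_line(x, y):
    """Ordinary least squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic, made-up samples: (region, SST in deg C, SSN in umol/kg).
# These numbers are NOT taken from the paper.
samples = [
    ("north_pacific", 4.0, 16.2),
    ("north_pacific", 8.0, 9.7),
    ("north_pacific", 12.0, 4.1),
    ("north_atlantic", 5.0, 11.3),
    ("north_atlantic", 9.0, 6.8),
    ("north_atlantic", 13.0, 3.0),
]

# Group the samples by region so that each region gets its own model.
by_region = {}
for region, sst, ssn in samples:
    xs, ys = by_region.setdefault(region, ([], []))
    xs.append(sst)
    ys.append(ssn)

# Fit one line per region and score it with RMSD on the same samples.
regional_rmsd = {}
for region, (xs, ys) in by_region.items():
    intercept, slope = fit_line(xs, ys)
    predicted = [intercept + slope * x for x in xs]
    regional_rmsd[region] = rmsd(predicted, ys)

for region, value in regional_rmsd.items():
    print(f"{region}: RMSD = {value:.3f} umol/kg")
```

A single global model, such as the XGBoost model the abstract reports, would replace the per-region loop with one fit over all samples. The same `rmsd` function could then score both approaches on a held-out set.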
format Article
id doaj-art-a05e68421f284ec4abdcafdbdc8b6a77
institution Kabale University
issn 1569-8432
language English
publishDate 2025-09-01
publisher Elsevier
record_format Article
series International Journal of Applied Earth Observations and Geoinformation
spelling doaj-art-a05e68421f284ec4abdcafdbdc8b6a77
2025-08-20T05:04:56Z
eng
Elsevier
International Journal of Applied Earth Observations and Geoinformation
1569-8432
2025-09-01
vol. 143, art. 104800
10.1016/j.jag.2025.104800
Data fusion-based improvements in empirical regression and machine learning for global daily ∼ 8 km resolution sea surface nitrate estimation and interpretation
Aifen Zhong (0), Difeng Wang (1), Fang Gong (2), Jingjing Huang (3), Zhuoqi Zheng (4), Xianqiang He (5), Qing Zhang (6), Qiankun Zhu (7)
0: State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China
1: State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; Observation and Research Station for Marine Risk and Hazard Management at Daya Bay, Ministry of Natural Resources, Huizhou 516081, China; Corresponding author at: State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China.
2: State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; Observation and Research Station for Marine Risk and Hazard Management at Daya Bay, Ministry of Natural Resources, Huizhou 516081, China
3: State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; Ocean College, Zhejiang University, Zhoushan 316021, China
4: State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; Geography and Ocean Science College, Nanjing University, Nanjing 210023, China
5: State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China
6: State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; Observation and Research Station for Marine Risk and Hazard Management at Daya Bay, Ministry of Natural Resources, Huizhou 516081, China
7: State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China
title Data fusion-based improvements in empirical regression and machine learning for global daily ∼ 8 km resolution sea surface nitrate estimation and interpretation
topic Sea surface nitrate
Remote sensing
Empirical regression
Explainable machine learning
Global ocean
url http://www.sciencedirect.com/science/article/pii/S1569843225004479