Data fusion-based improvements in empirical regression and machine learning for global daily ∼ 8 km resolution sea surface nitrate estimation and interpretation
Assessing sea surface nitrate (SSN) concentrations and dynamics is crucial for understanding marine ecosystem health, yet optical remote sensing of SSN remains challenging because of the lack of distinct spectral features. While various global-scale SSN regression and machine learning algorithms bas...
| Main Authors: | Aifen Zhong, Difeng Wang, Fang Gong, Jingjing Huang, Zhuoqi Zheng, Xianqiang He, Qing Zhang, Qiankun Zhu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-09-01 |
| Series: | International Journal of Applied Earth Observations and Geoinformation |
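| Subjects: | Sea surface nitrate; Remote sensing; Empirical regression; Explainable machine learning; Global ocean |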
| Online Access: | http://www.sciencedirect.com/science/article/pii/S1569843225004479 |
| _version_ | 1849233609431449600 |
|---|---|
| author | Aifen Zhong; Difeng Wang; Fang Gong; Jingjing Huang; Zhuoqi Zheng; Xianqiang He; Qing Zhang; Qiankun Zhu |
| author_facet | Aifen Zhong; Difeng Wang; Fang Gong; Jingjing Huang; Zhuoqi Zheng; Xianqiang He; Qing Zhang; Qiankun Zhu |
| author_sort | Aifen Zhong |
| collection | DOAJ |
| description | Assessing sea surface nitrate (SSN) concentrations and dynamics is crucial for understanding marine ecosystem health, yet optical remote sensing of SSN remains challenging because SSN lacks distinct spectral features. Although various global-scale SSN regression and machine learning algorithms based on relationships between SSN and environmental variables have been developed, their prediction accuracy and spatiotemporal resolution remain limited. In addition, previous studies have reported relatively little on the interannual variability of global SSN. Here we aim to enhance the accuracy and spatial resolution of SSN retrievals by developing improved regression and machine learning models, enabling the generation of global daily ∼ 8 km SSN products from satellite and model data. To construct the empirical regression models, the global ocean was divided into five regions on the basis of the relationship between sea surface temperature (SST) and SSN: 80° S to 40° N, the North Pacific, the North Atlantic, the Arabian Sea, and the eastern equatorial Pacific. After adding SSN-related physical variables, high-accuracy regional empirical models were developed, with root mean square deviations (RMSDs) of 1.641, 2.701, 1.221, 1.298, and 2.379 μmol/kg for the studied regions. For the machine learning models, seven algorithms were tested: extremely randomized trees (ET), multilayer perceptron (MLP), stacking random forest (SRF), Gaussian process regression (GPR), support vector machine (SVM), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost). After modeling, validation, and extensive tests using an independent cruise dataset, the XGBoost model outperformed the others (RMSD = 1.189 μmol/kg) and bypassed the need for regional segmentation. Mechanistic analysis revealed the driving variables influencing SSN in both the regional empirical and XGBoost models, improving interpretability. Comparative validation confirmed that our models surpass traditional approaches in accuracy and applicability, demonstrating their potential to advance global SSN monitoring. Using XGBoost-derived products, we find a slight decreasing trend in SSN over 23 years. The proposed robust and explainable SSN retrieval models have the potential to assist in ocean environmental management. |
| format | Article |
| id | doaj-art-a05e68421f284ec4abdcafdbdc8b6a77 |
| institution | Kabale University |
| issn | 1569-8432 |
| language | English |
| publishDate | 2025-09-01 |
| publisher | Elsevier |
| record_format | Article |
| series | International Journal of Applied Earth Observations and Geoinformation |
| spelling | doaj-art-a05e68421f284ec4abdcafdbdc8b6a77; 2025-08-20T05:04:56Z; eng; Elsevier; International Journal of Applied Earth Observations and Geoinformation; 1569-8432; 2025-09-01; 143; 104800; 10.1016/j.jag.2025.104800; Data fusion-based improvements in empirical regression and machine learning for global daily ∼ 8 km resolution sea surface nitrate estimation and interpretation; Aifen Zhong [0]; Difeng Wang [1]; Fang Gong [2]; Jingjing Huang [3]; Zhuoqi Zheng [4]; Xianqiang He [5]; Qing Zhang [6]; Qiankun Zhu [7]; [0] State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; [1] State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; Observation and Research Station for Marine Risk and Hazard Management at Daya Bay, Ministry of Natural Resources, Huizhou 516081, China; Corresponding author at: State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; [2] State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; Observation and Research Station for Marine Risk and Hazard Management at Daya Bay, Ministry of Natural Resources, Huizhou 516081, China; [3] State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; Ocean College, Zhejiang University, Zhoushan 316021, China; [4] State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; Geography and Ocean Science College, Nanjing University, Nanjing 210023, China; [5] State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; [6] State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; Observation and Research Station for Marine Risk and Hazard Management at Daya Bay, Ministry of Natural Resources, Huizhou 516081, China; [7] State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China; http://www.sciencedirect.com/science/article/pii/S1569843225004479; Sea surface nitrate; Remote sensing; Empirical regression; Explainable machine learning; Global ocean |
| spellingShingle | Aifen Zhong; Difeng Wang; Fang Gong; Jingjing Huang; Zhuoqi Zheng; Xianqiang He; Qing Zhang; Qiankun Zhu; Data fusion-based improvements in empirical regression and machine learning for global daily ∼ 8 km resolution sea surface nitrate estimation and interpretation; International Journal of Applied Earth Observations and Geoinformation; Sea surface nitrate; Remote sensing; Empirical regression; Explainable machine learning; Global ocean |
| title | Data fusion-based improvements in empirical regression and machine learning for global daily ∼ 8 km resolution sea surface nitrate estimation and interpretation |
| title_sort | data fusion based improvements in empirical regression and machine learning for global daily ∼ 8 km resolution sea surface nitrate estimation and interpretation |
| topic | Sea surface nitrate; Remote sensing; Empirical regression; Explainable machine learning; Global ocean |
| url | http://www.sciencedirect.com/science/article/pii/S1569843225004479 |
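
The description above reports model accuracy as root mean square deviation (RMSD) in μmol/kg. As a minimal sketch of how such a score is typically computed from paired predicted and observed values, the Python snippet below may help. The function name `rmsd` and the sample numbers are hypothetical and invented for illustration; they are not taken from the article or its datasets.

```python
import math


def rmsd(predicted, observed):
    """Root mean square deviation between paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each paired difference, average the squares, then take the root.
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical SSN values in μmol/kg, invented purely for illustration.
observed = [0.0, 4.0, 10.0, 20.0]
predicted = [1.0, 3.0, 12.0, 18.0]
score = rmsd(predicted, observed)
print(f"RMSD = {score:.3f} μmol/kg")
```

A lower RMSD means the predictions lie closer to the observations on average, which is the sense in which the record compares the regional regression models with XGBoost.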