Global de-trending significantly improves the accuracy of XGBoost-based county-level maize and soybean yield prediction in the Midwestern United States

Bibliographic Details
Main Authors: Yuanchao Li, Hongwei Zeng, Miao Zhang, Bingfang Wu, Xingli Qin
Format: Article
Language: English
Published: Taylor & Francis Group, 2024-12-01
Series: GIScience & Remote Sensing
Online Access: https://www.tandfonline.com/doi/10.1080/15481603.2024.2349341
ISSN: 1548-1603, 1943-7226
Volume: 61
Issue: 1
DOI: 10.1080/15481603.2024.2349341
Affiliation (all authors): State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
Keywords: Maize and soybean; Yield detrending; Yield prediction; XGBoost; SHAP; US Midwest

Abstract
The application of machine learning to crop yield prediction has gained considerable traction, yet it remains uncertain how yield trends affect these predictions and how detrending methods differ. In this study, we used extreme gradient boosting (XGBoost) to examine how five trend-handling strategies, namely no trend processing (NTP), input year as a feature (IYF), input average yield as a feature (IAYF), input linear yield as a feature (ILYF), and the global detrending method (GDT), affect maize and soybean yield prediction in the Midwestern United States. Compared with NTP, incorporating the yield trend as a predictor in XGBoost significantly improved accuracy and reduced the uncertainty of yield prediction. GDT performed best: relative to NTP, it significantly reduced the average yield prediction error by 0.091 t/ha for soybean and 0.158 t/ha for maize, and improved the coefficient of determination (R²) by 20.6% and 19.6% for soybean and maize, respectively. Relative to IYF, IAYF, and ILYF, GDT improved R² by 3.8% to 12.7% for soybean and by 3.6% to 12.7% for maize. The SHapley Additive exPlanations (SHAP) framework showed that the enhanced vegetation index (EVI), particularly during the soybean podding and maize dough-formation stages, played a crucial role in explaining interannual yield variability. These findings confirm the importance of GDT for machine-learning-based crop yield prediction and can inform future applications of machine learning to yield forecasting.
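The abstract names five trend-handling strategies (NTP, IYF, IAYF, ILYF, GDT) but this record does not define them precisely. The sketch below shows one plausible reading and is not the authors' code. It assumes: county records laid out as hypothetical `(county, year, yield_t_ha)` tuples; IAYF as each county's mean yield; ILYF as each county's own least-squares linear trend evaluated at the target year; and GDT as a single linear trend fit to all counties and years pooled, whose value is subtracted from yield so the model learns the residual. The helpers `fit_line` and `prepare` are illustrative names. The regressor itself is omitted; in practice the targets and extra features would be combined with crop predictors such as EVI and passed to a model like `xgboost.XGBRegressor`.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # requires at least two distinct x values
    return my - b * mx, b


def prepare(records, strategy):
    """Build trend features and regression targets for one strategy.

    records: list of (county, year, yield_t_ha) tuples (hypothetical layout).
    Returns (extra, targets, add_back):
      extra:    per-record dict of trend features, to be appended to the
                crop predictors (e.g. EVI) that this sketch omits;
      targets:  values a regressor such as XGBoost would be fit on;
      add_back: per-record offset added to a prediction to recover yield.
    """
    by_county = defaultdict(list)
    for county, year, y in records:
        by_county[county].append((year, y))

    if strategy == "IAYF":
        # Assumed: mean yield of each county over the supplied years.
        avg = {c: mean(y for _, y in v) for c, v in by_county.items()}
    elif strategy == "ILYF":
        # Assumed: a separate linear trend per county.
        lines = {c: fit_line([t for t, _ in v], [y for _, y in v])
                 for c, v in by_county.items()}
    elif strategy == "GDT":
        # Assumed: one linear trend pooled over all counties and years.
        a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    elif strategy not in ("NTP", "IYF"):
        raise ValueError(f"unknown strategy: {strategy}")

    extra, targets, add_back = [], [], []
    for county, year, y in records:
        feats, target, offset = {}, y, 0.0
        if strategy == "IYF":
            feats["year"] = year
        elif strategy == "IAYF":
            feats["avg_yield"] = avg[county]
        elif strategy == "ILYF":
            ac, bc = lines[county]
            feats["linear_yield"] = ac + bc * year
        elif strategy == "GDT":
            offset = a + b * year  # global trend value for this year
            target = y - offset    # model learns the de-trended residual
        extra.append(feats)
        targets.append(target)
        add_back.append(offset)
    return extra, targets, add_back


if __name__ == "__main__":
    # Made-up yields for two hypothetical counties, for illustration only.
    records = [("A", 2001, 10.0), ("A", 2002, 10.5), ("A", 2003, 11.0),
               ("B", 2001, 9.0), ("B", 2002, 9.5), ("B", 2003, 10.0)]
    _, targets, add_back = prepare(records, "GDT")
    print(targets)  # [0.5, 0.5, 0.5, -0.5, -0.5, -0.5]
```

Under this reading of GDT, the regressor predicts a residual, so each prediction is converted back to yield by adding its `add_back` value. In a real evaluation, the county means and trend lines would have to be fit on training years only, so that held-out years do not leak into the features.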