Mapping global yields of four major crops at 5-minute resolution from 1982 to 2015 using multi-source data and machine learning

Bibliographic Details
Main Authors: Juan Cao, Zhao Zhang, Xiangzhong Luo, Yuchuan Luo, Jialu Xu, Jun Xie, Jichong Han, Fulu Tao
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-04650-4

Abstract
Accurate, historical, and continuous global crop yield data are essential for assessing risks to the global food system. However, existing datasets often have limited spatial and temporal resolution. Here, we introduce GlobalCropYield5min, a novel gridded dataset of annual yields for four major crops (maize, rice, wheat, and soybean) from 1982 to 2015 at a spatial resolution of 5 arc-minutes. For each country and crop, we developed three machine learning (ML) models using crop statistics from approximately 12,000 administrative units together with satellite data, climate variables, soil properties, agricultural practices, and climate modes. The optimal predictors and ML model were then selected to estimate annual crop yield for each 5 × 5 arc-minute grid cell. The models perform well, with R² ranging from 0.70 to 0.95 and RMSE from 0.16 t/ha to 1.1 t/ha (NRMSE from 5% to 20%). GlobalCropYield5min outperforms other global yield datasets in spatial resolution, temporal coverage, and accuracy. This dataset is crucial for investigating climate-crop yield interactions and managing agricultural disaster risks.
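
The abstract reports accuracy as R², RMSE and NRMSE. The Python sketch below shows how such metrics are commonly computed from paired observed and predicted yields. It assumes NRMSE is RMSE divided by the mean observed yield and R² is 1 − SS_res/SS_tot; the paper may define either differently. The `select_best_model` helper, which picks the candidate with the lowest RMSE, is a hypothetical illustration of choosing among per-country models and is not the authors' selection procedure.

```python
import math
from typing import Mapping, Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the inputs (e.g. t/ha)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the mean observed yield (assumed normalization), as a fraction."""
    mean_obs = sum(observed) / len(observed)
    return rmse(observed, predicted) / mean_obs


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def select_best_model(
    observed: Sequence[float], candidates: Mapping[str, Sequence[float]]
) -> str:
    """Hypothetical criterion: return the candidate name with the lowest RMSE."""
    return min(candidates, key=lambda name: rmse(observed, candidates[name]))


if __name__ == "__main__":
    # Made-up yields (t/ha) for illustration only; not values from GlobalCropYield5min.
    obs = [3.0, 3.5, 4.0, 4.5]
    pred = [3.1, 3.4, 4.2, 4.3]
    print(round(rmse(obs, pred), 3))         # 0.158
    print(round(100 * nrmse(obs, pred), 1))  # 4.2 (percent)
    print(round(r_squared(obs, pred), 2))    # 0.92
```

Normalizing by the mean makes errors comparable across regions with very different yield levels, which is one reason a single RMSE range in t/ha is often reported alongside a percentage.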

ISSN: 2052-4463
Affiliations: School of National Safety and Emergency Management, Beijing Normal University; Department of Geography, National University of Singapore; Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences