AutoGEEval: A Multimodal and Automated Evaluation Framework for Geospatial Code Generation on GEE with Large Language Models

Geospatial code generation is emerging as a key direction in the integration of artificial intelligence and geoscientific analysis. However, there remains a lack of standardized tools for automatic evaluation in this domain. To address this gap, we propose AutoGEEval, the first multimodal, unit-level automated evaluation framework for geospatial code generation tasks on the Google Earth Engine (GEE) platform powered by large language models (LLMs). Built upon the GEE Python API, AutoGEEval establishes a benchmark suite (AutoGEEval-Bench) comprising 1325 test cases that span 26 GEE data types. The framework integrates both question generation and answer verification components to enable an end-to-end automated evaluation pipeline—from function invocation to execution validation. AutoGEEval supports multidimensional quantitative analysis of model outputs in terms of accuracy, resource consumption, execution efficiency, and error types. We evaluate 18 state-of-the-art LLMs—including general-purpose, reasoning-augmented, code-centric, and geoscience-specialized models—revealing their performance characteristics and potential optimization pathways in GEE code generation. This work provides a unified protocol and foundational resource for the development and assessment of geospatial code generation models, advancing the frontier of automated natural language to domain-specific code translation.
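
To make the abstract's evaluation setup concrete, the sketch below illustrates what a unit-level test case over the GEE Python API and an execution-based verifier could look like. It is our illustration only, not code from the AutoGEEval release; the test-case fields, the verify() helper, and the sample task are hypothetical, and the timing value stands in, loosely, for the paper's execution-efficiency dimension.

import time
import ee

ee.Initialize()  # assumes `earthengine authenticate` has been run beforehand

# Hypothetical unit-level test case targeting a simple GEE return type (a Python list).
test_case = {
    "prompt": "Create a constant image, rename its single band to 'ndvi', "
              "and return the list of band names.",
    "reference": lambda: ee.Image.constant(1).rename("ndvi").bandNames().getInfo(),
    "expected_type": list,
}

def verify(candidate_fn, case):
    """Run one candidate against one test case: execution, type, and value checks."""
    start = time.perf_counter()
    try:
        result = candidate_fn()
    except Exception as exc:  # execution failures are bucketed by error type
        return {"passed": False, "error": type(exc).__name__, "seconds": None}
    seconds = time.perf_counter() - start
    expected = case["reference"]()
    passed = isinstance(result, case["expected_type"]) and result == expected
    return {"passed": passed, "error": None, "seconds": round(seconds, 3)}

# Example run with a correct candidate, as an LLM might produce it:
candidate = lambda: ee.Image.constant(1).rename("ndvi").bandNames().getInfo()
print(verify(candidate, test_case))  # e.g. {'passed': True, 'error': None, 'seconds': ...}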

Bibliographic Details
Main Authors: Huayi Wu, Zhangxiao Shen, Shuyang Hou, Jianyuan Liang, Haoyue Jiao, Yaxian Qing, Xiaopu Zhang, Xu Li, Zhipeng Gui, Xuefeng Guan, Longgang Xiang
Format: Article
Language:English
Published: MDPI AG 2025-06-01
Series:ISPRS International Journal of Geo-Information
Subjects: geospatial code generation; large language models; Google Earth Engine; automated evaluation; unit test benchmark
Online Access:https://www.mdpi.com/2220-9964/14/7/256
author Huayi Wu
Zhangxiao Shen
Shuyang Hou
Jianyuan Liang
Haoyue Jiao
Yaxian Qing
Xiaopu Zhang
Xu Li
Zhipeng Gui
Xuefeng Guan
Longgang Xiang
collection DOAJ
description Geospatial code generation is emerging as a key direction in the integration of artificial intelligence and geoscientific analysis. However, there remains a lack of standardized tools for automatic evaluation in this domain. To address this gap, we propose AutoGEEval, the first multimodal, unit-level automated evaluation framework for geospatial code generation tasks on the Google Earth Engine (GEE) platform powered by large language models (LLMs). Built upon the GEE Python API, AutoGEEval establishes a benchmark suite (AutoGEEval-Bench) comprising 1325 test cases that span 26 GEE data types. The framework integrates both question generation and answer verification components to enable an end-to-end automated evaluation pipeline—from function invocation to execution validation. AutoGEEval supports multidimensional quantitative analysis of model outputs in terms of accuracy, resource consumption, execution efficiency, and error types. We evaluate 18 state-of-the-art LLMs—including general-purpose, reasoning-augmented, code-centric, and geoscience-specialized models—revealing their performance characteristics and potential optimization pathways in GEE code generation. This work provides a unified protocol and foundational resource for the development and assessment of geospatial code generation models, advancing the frontier of automated natural language to domain-specific code translation.
format Article
id doaj-art-7943443fbb034512bf295b9e311f03e6
institution DOAJ
issn 2220-9964
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series ISPRS International Journal of Geo-Information
doi 10.3390/ijgi14070256
volume 14
issue 7
article_number 256
affiliations State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China (Huayi Wu, Zhangxiao Shen, Shuyang Hou, Jianyuan Liang, Yaxian Qing, Xiaopu Zhang, Xu Li, Xuefeng Guan, Longgang Xiang); School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China (Haoyue Jiao); School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China (Zhipeng Gui)
title AutoGEEval: A Multimodal and Automated Evaluation Framework for Geospatial Code Generation on GEE with Large Language Models
topic geospatial code generation
large language models
Google Earth Engine
automated evaluation
unit test benchmark
url https://www.mdpi.com/2220-9964/14/7/256