AutoGEEval: A Multimodal and Automated Evaluation Framework for Geospatial Code Generation on GEE with Large Language Models
Geospatial code generation is emerging as a key direction in the integration of artificial intelligence and geoscientific analysis. However, there remains a lack of standardized tools for automatic evaluation in this domain. To address this gap, we propose AutoGEEval, the first multimodal, unit-level automated evaluation framework for geospatial code generation tasks on the Google Earth Engine (GEE) platform powered by large language models (LLMs). Built upon the GEE Python API, AutoGEEval establishes a benchmark suite (AutoGEEval-Bench) comprising 1325 test cases that span 26 GEE data types. The framework integrates both question generation and answer verification components to enable an end-to-end automated evaluation pipeline—from function invocation to execution validation. AutoGEEval supports multidimensional quantitative analysis of model outputs in terms of accuracy, resource consumption, execution efficiency, and error types. We evaluate 18 state-of-the-art LLMs—including general-purpose, reasoning-augmented, code-centric, and geoscience-specialized models—revealing their performance characteristics and potential optimization pathways in GEE code generation. This work provides a unified protocol and foundational resource for the development and assessment of geospatial code generation models, advancing the frontier of automated natural language to domain-specific code translation.
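The abstract describes an execution-based, unit-level evaluation pipeline built on the GEE Python API: each benchmark item is invoked, executed on GEE, and its output checked automatically against a reference. As a minimal sketch only, and not the authors' implementation, the snippet below illustrates what one such test case with automated answer verification could look like; the test-case fields, the candidate function, the asset ID, and the numeric tolerance are all assumptions made here for illustration.

```python
import ee

ee.Initialize()  # assumes prior authentication via `earthengine authenticate`

# Hypothetical AutoGEEval-Bench-style test case: a natural-language task, plus a
# reference result and tolerance for execution-based checking (values invented here).
test_case = {
    "task": "Compute the mean NDVI of a Landsat 8 surface-reflectance image over a region.",
    "reference": 0.41,   # assumed ground-truth value
    "tolerance": 1e-3,
}

def candidate_solution(image_id: str, region: ee.Geometry) -> float:
    """Stand-in for an LLM-generated answer under evaluation."""
    img = ee.Image(image_id)
    ndvi = img.normalizedDifference(["SR_B5", "SR_B4"])        # (NIR - Red) / (NIR + Red)
    stats = ndvi.reduceRegion(ee.Reducer.mean(), region, scale=30)
    return stats.get("nd").getInfo()                            # execute on GEE, pull scalar back

def verify(case: dict, predicted: float) -> bool:
    """Execution-based check: compare the executed output against the reference."""
    return abs(predicted - case["reference"]) <= case["tolerance"]

region = ee.Geometry.Rectangle([-122.5, 37.5, -122.0, 38.0])
# Hypothetical asset ID, used only to make the invocation concrete.
predicted = candidate_solution("LANDSAT/LC08/C02/T1_L2/LC08_044034_20210508", region)
print("pass" if verify(test_case, predicted) else "fail")
```

Since AutoGEEval-Bench spans 26 GEE data types, real verification logic would need type-aware comparisons (images, feature collections, dictionaries, and so on) rather than the single numeric tolerance assumed above.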
| Main Authors: | Huayi Wu, Zhangxiao Shen, Shuyang Hou, Jianyuan Liang, Haoyue Jiao, Yaxian Qing, Xiaopu Zhang, Xu Li, Zhipeng Gui, Xuefeng Guan, Longgang Xiang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | ISPRS International Journal of Geo-Information |
| Subjects: | geospatial code generation; large language models; Google Earth Engine; automated evaluation; unit test benchmark |
| Online Access: | https://www.mdpi.com/2220-9964/14/7/256 |
| _version_ | 1850078228261109760 |
|---|---|
| author | Huayi Wu, Zhangxiao Shen, Shuyang Hou, Jianyuan Liang, Haoyue Jiao, Yaxian Qing, Xiaopu Zhang, Xu Li, Zhipeng Gui, Xuefeng Guan, Longgang Xiang |
| author_sort | Huayi Wu |
| collection | DOAJ |
| description | Geospatial code generation is emerging as a key direction in the integration of artificial intelligence and geoscientific analysis. However, there remains a lack of standardized tools for automatic evaluation in this domain. To address this gap, we propose AutoGEEval, the first multimodal, unit-level automated evaluation framework for geospatial code generation tasks on the Google Earth Engine (GEE) platform powered by large language models (LLMs). Built upon the GEE Python API, AutoGEEval establishes a benchmark suite (AutoGEEval-Bench) comprising 1325 test cases that span 26 GEE data types. The framework integrates both question generation and answer verification components to enable an end-to-end automated evaluation pipeline—from function invocation to execution validation. AutoGEEval supports multidimensional quantitative analysis of model outputs in terms of accuracy, resource consumption, execution efficiency, and error types. We evaluate 18 state-of-the-art LLMs—including general-purpose, reasoning-augmented, code-centric, and geoscience-specialized models—revealing their performance characteristics and potential optimization pathways in GEE code generation. This work provides a unified protocol and foundational resource for the development and assessment of geospatial code generation models, advancing the frontier of automated natural language to domain-specific code translation. |
| format | Article |
| id | doaj-art-7943443fbb034512bf295b9e311f03e6 |
| institution | DOAJ |
| issn | 2220-9964 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | ISPRS International Journal of Geo-Information |
| spelling | doaj-art-7943443fbb034512bf295b9e311f03e6; indexed 2025-08-20T02:45:37Z; English; MDPI AG; ISPRS International Journal of Geo-Information; ISSN 2220-9964; published 2025-06-01; Vol. 14, Iss. 7, Art. 256; DOI 10.3390/ijgi14070256; AutoGEEval: A Multimodal and Automated Evaluation Framework for Geospatial Code Generation on GEE with Large Language Models. Huayi Wu, Zhangxiao Shen, Shuyang Hou, Jianyuan Liang, Yaxian Qing, Xiaopu Zhang, Xu Li, Xuefeng Guan, Longgang Xiang: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China. Haoyue Jiao: School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China. Zhipeng Gui: School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China. Abstract as in the description field. Online access: https://www.mdpi.com/2220-9964/14/7/256. Keywords: geospatial code generation; large language models; Google Earth Engine; automated evaluation; unit test benchmark. |
| title | AutoGEEval: A Multimodal and Automated Evaluation Framework for Geospatial Code Generation on GEE with Large Language Models |
| topic | geospatial code generation; large language models; Google Earth Engine; automated evaluation; unit test benchmark |
| url | https://www.mdpi.com/2220-9964/14/7/256 |