Design and application of a semantic-driven geospatial modeling knowledge graph based on large language models

While leveraging large language models (LLMs) for intelligent geospatial modeling has garnered significant attention, the limited domain-specific knowledge of LLMs often leads to inefficient or unreliable geo-analysis model generation. Crowdsourced geoprocessing scripts encapsulate extensive expert...

Bibliographic Details
Main Authors: Jianyuan Liang, Shuyang Hou, Anqi Zhao, Qingyang Xu, Longgang Xiang, Rui Li, Huayi Wu
Format: Article
Language: English
Published: Taylor & Francis Group 2025-04-01
Series: Geo-spatial Information Science
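Subjects: Large language model (LLM); domain knowledge; knowledge graph construction; geospatial modeling; intelligent generation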
Online Access: https://www.tandfonline.com/doi/10.1080/10095020.2025.2483884
_version_ 1850267313262034944
author Jianyuan Liang
Shuyang Hou
Anqi Zhao
Qingyang Xu
Longgang Xiang
Rui Li
Huayi Wu
collection DOAJ
description While leveraging large language models (LLMs) for intelligent geospatial modeling has garnered significant attention, the limited domain-specific knowledge of LLMs often leads to inefficient or unreliable geo-analysis model generation. Crowdsourced geoprocessing scripts encapsulate extensive expert knowledge for different geospatial modeling tasks, where code snippets are strategically combined into functional steps to build application-specific modeling processes. However, extracting these modeling processes from heterogeneous geoprocessing scripts and integrating them for reuse remains challenging due to the complexity of code interdependencies, the heterogeneity of scripting approaches, and the need for domain-specific customization. To address this, we propose S-GMKG, a knowledge graph that systematically extracts and integrates modeling processes from scripts as structured semantic units. Two strategies are introduced: a skeleton-based extraction method and a knowledge-enhanced chain of thought (CoT) approach, which facilitate automated modeling process extraction for S-GMKG via prompt engineering. Furthermore, a self-canonicalization and knowledge augmentation process is proposed to refine the S-GMKG. Consequently, S-GMKG serves as a robust external knowledge source to provide interpretable, graph-based modeling solutions and synergizes with LLMs for geospatial tasks. We implemented the S-GMKG using 4820 geoprocessing scripts and evaluated it across various LLMs. Results indicate that most scripts in the S-GMKG can be represented as modeling processes with 3–7 functional steps, with the proposed strategies achieving 3.2%–14.5% higher recall rates in relationship identification for these functional steps. Case studies in two distinct scenarios demonstrate the practicality of S-GMKG, particularly in collaborating with LLMs to generate code for geospatial modeling.
format Article
id doaj-art-3f5ce10d992b4ce1b53d82a73bd38e1d
institution OA Journals
issn 1009-5020
1993-5153
language English
publishDate 2025-04-01
publisher Taylor & Francis Group
record_format Article
series Geo-spatial Information Science
spelling doaj-art-3f5ce10d992b4ce1b53d82a73bd38e1d
2025-08-20T01:53:52Z
eng
Taylor & Francis Group
Geo-spatial Information Science
1009-5020
1993-5153
2025-04-01
120
10.1080/10095020.2025.2483884
Design and application of a semantic-driven geospatial modeling knowledge graph based on large language models
Jianyuan Liang (0)
Shuyang Hou (1)
Anqi Zhao (2)
Qingyang Xu (3)
Longgang Xiang (4)
Rui Li (5)
Huayi Wu (6)
(0)-(6) State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
https://www.tandfonline.com/doi/10.1080/10095020.2025.2483884
Large language model (LLM)
domain knowledge
knowledge graph construction
geospatial modeling
intelligent generation
title Design and application of a semantic-driven geospatial modeling knowledge graph based on large language models
topic Large language model (LLM)
domain knowledge
knowledge graph construction
geospatial modeling
intelligent generation
url https://www.tandfonline.com/doi/10.1080/10095020.2025.2483884