Optimizing context-based location extraction by tuning open-source LLMs with RAG

Text data, such as news from media, include different types of geographic information, represented by location, that indicates the whereabouts of events or phenomena. Extracting geographic locations from text within their contexts is challenging, even with Natural Language Processing (NLP) tools an...

Bibliographic Details
Main Authors: Zifu Wang, Yahya Masri, Anusha Srirenganathan Malarvizhi, Tayven Stover, Samir Ahmed, David Wong, Yongyao Jiang, Yun Li, Mathieu Bere, Daniel Rothbart, Dieter Pfoser, David Marshall, Chaowei Yang
Format: Article
Language: English
Published: Taylor & Francis Group 2025-08-01
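Series: International Journal of Digital Earth
Subjects: Context-based location extraction; large language model; retrieval augmented generation; natural language processing; Sudan conflict; media
Online Access: https://www.tandfonline.com/doi/10.1080/17538947.2025.2521786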
author Zifu Wang
Yahya Masri
Anusha Srirenganathan Malarvizhi
Tayven Stover
Samir Ahmed
David Wong
Yongyao Jiang
Yun Li
Mathieu Bere
Daniel Rothbart
Dieter Pfoser
David Marshall
Chaowei Yang
author_sort Zifu Wang
collection DOAJ
description Text data, such as news from media, include different types of geographic information, represented by location, that indicates the whereabouts of events or phenomena. Extracting geographic locations from text within their contexts is challenging, even with Natural Language Processing (NLP) tools and the latest Large Language Models (LLMs). We propose to optimize LLMs using Retrieval-Augmented Generation (RAG) and prompt-tuning methods, such as zero-shot and instruction-based prompting, to improve the precision of extracting location information from news. Using the Sudan conflict as an example, we extracted the corresponding locations and dates for conflict incidents. We compared the runtime and accuracy of various open-source LLMs under different hyperparameter settings, with and without RAG. Traditional Named Entity Recognition (NER), zero-shot prompting, instruction-based prompting, few-shot prompting, chain-of-thought (CoT) prompting, and RAG-based tuning were compared using an evaluation matrix. RAG-based tuning delivered the highest F1 score (>0.9) for extracting and associating location data with conflict incidents. This research highlights the benefits of using RAG for multi-incident context-based location extraction and provides insights into optimizing LLMs through prompt-tuning, hyperparameter adjustment, and model selection for location extraction tasks. The results can also be used to extract context-based locations or other relevant information from text-based documents in other applications.
format Article
id doaj-art-54abc2e2125f4c7a95f934a506bb7aec
institution Kabale University
issn 1753-8947
1753-8955
language English
publishDate 2025-08-01
publisher Taylor & Francis Group
record_format Article
series International Journal of Digital Earth
spelling doaj-art-54abc2e2125f4c7a95f934a506bb7aec
2025-08-25T11:28:46Z
eng
Taylor & Francis Group
International Journal of Digital Earth
1753-8947
1753-8955
2025-08-01
18
1
10.1080/17538947.2025.2521786
Optimizing context-based location extraction by tuning open-source LLMs with RAG
Zifu Wang (0), Department of Geography and Geoinformation Science, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, USA
Yahya Masri (1), Department of Geography and Geoinformation Science, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, USA
Anusha Srirenganathan Malarvizhi (2), Department of Geography and Geoinformation Science, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, USA
Tayven Stover (3), Department of Geography and Geoinformation Science, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, USA
Samir Ahmed (4), Department of Geography and Geoinformation Science, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, USA
David Wong (5), Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA, USA
Yongyao Jiang (6), Department of Geography and Geoinformation Science, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, USA
Yun Li (7), Department of Geography and Geoinformation Science, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, USA
Mathieu Bere (8), Carter School for Peace & Conflict Resolution, George Mason University, Fairfax, VA, USA
Daniel Rothbart (9), Carter School for Peace & Conflict Resolution, George Mason University, Fairfax, VA, USA
Dieter Pfoser (10), Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA, USA
David Marshall (11), Carter School for Peace & Conflict Resolution, George Mason University, Fairfax, VA, USA
Chaowei Yang (12), Department of Geography and Geoinformation Science, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, USA
Text data, such as news from media, include different types of geographic information, represented by location, that indicates the whereabouts of events or phenomena. Extracting geographic locations from text within their contexts is challenging, even with Natural Language Processing (NLP) tools and the latest Large Language Models (LLMs). We propose to optimize LLMs using Retrieval-Augmented Generation (RAG) and prompt-tuning methods, such as zero-shot and instruction-based prompting, to improve the precision of extracting location information from news. Using the Sudan conflict as an example, we extracted the corresponding locations and dates for conflict incidents. We compared the runtime and accuracy of various open-source LLMs under different hyperparameter settings, with and without RAG. Traditional Named Entity Recognition (NER), zero-shot prompting, instruction-based prompting, few-shot prompting, chain-of-thought (CoT) prompting, and RAG-based tuning were compared using an evaluation matrix. RAG-based tuning delivered the highest F1 score (>0.9) for extracting and associating location data with conflict incidents. This research highlights the benefits of using RAG for multi-incident context-based location extraction and provides insights into optimizing LLMs through prompt-tuning, hyperparameter adjustment, and model selection for location extraction tasks. The results can also be used to extract context-based locations or other relevant information from text-based documents in other applications.
https://www.tandfonline.com/doi/10.1080/17538947.2025.2521786
Context-based location extraction
large language model
retrieval augmented generation
natural language processing
Sudan conflict
media
title Optimizing context-based location extraction by tuning open-source LLMs with RAG
topic Context-based location extraction
large language model
retrieval augmented generation
natural language processing
Sudan conflict
media
url https://www.tandfonline.com/doi/10.1080/17538947.2025.2521786