Optimizing context-based location extraction by tuning open-source LLMs with RAG

Text data such as news from media include different types of geographic information, represented by location, that indicate the whereabouts of events or phenomena. Extracting the geographic locations from text within their contexts is challenging, even with Natural Language Processing (NLP) tools an...

Bibliographic Details
Main Authors:	Zifu Wang, Yahya Masri, Anusha Srirenganathan Malarvizhi, Tayven Stover, Samir Ahmed, David Wong, Yongyao Jiang, Yun Li, Mathieu Bere, Daniel Rothbart, Dieter Pfoser, David Marshall, Chaowei Yang
Format:	Article
Language:	English
Published:	Taylor & Francis Group 2025-08-01
Series:	International Journal of Digital Earth
Subjects:	Context-based location extraction; large language model; retrieval augmented generation; natural language processing; Sudan conflict; media
Online Access:	https://www.tandfonline.com/doi/10.1080/17538947.2025.2521786

collection	DOAJ
description	Text data such as news from media include different types of geographic information, represented by location, that indicate the whereabouts of events or phenomena. Extracting the geographic locations from text within their contexts is challenging, even with Natural Language Processing (NLP) tools and the latest Large Language Models (LLMs). We propose to optimize LLMs using Retrieval-Augmented Generation (RAG) and prompt-tuning methods, such as zero-shot and instruction-based prompting, to improve the precision of extracting location information from news. Using the Sudan conflict as an example, we extracted the corresponding locations and dates for conflict incidents. We compared the runtime and accuracy of various open-source LLMs under different hyperparameter settings, with and without RAG. Traditional Named Entity Recognition (NER), zero-shot prompting, instruction-based prompting, few-shot prompting, chain-of-thought (CoT) prompting, and RAG-based tuning were compared using an evaluation matrix. RAG-based tuning delivered the highest F1 score (>0.9) for extracting and associating location data with conflict incidents. This research highlights the benefits of using RAG for multi-incident context-based location extraction and provides insights into optimizing LLMs through prompt-tuning, hyperparameter adjustment, and model selection for location extraction tasks. The results can also be used to extract context-based locations or relevant information from text-based documents in other applications.
format	Article
id	doaj-art-54abc2e2125f4c7a95f934a506bb7aec
institution	Kabale University
issn	1753-8947 1753-8955
language	English
publishDate	2025-08-01
publisher	Taylor & Francis Group
record_format	Article
series	International Journal of Digital Earth
spelling	doaj-art-54abc2e2125f4c7a95f934a506bb7aec; 2025-08-25T11:28:46Z; eng; Taylor & Francis Group; International Journal of Digital Earth; ISSN 1753-8947, 1753-8955; 2025-08-01; vol. 18, no. 1; 10.1080/17538947.2025.2521786
	Zifu Wang, Yahya Masri, Anusha Srirenganathan Malarvizhi, Tayven Stover, Samir Ahmed, Yongyao Jiang, Yun Li, Chaowei Yang: Department of Geography and Geoinformation Science, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, USA
	David Wong, Dieter Pfoser: Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA, USA
	Mathieu Bere, Daniel Rothbart, David Marshall: Carter School for Peace & Conflict Resolution, George Mason University, Fairfax, VA, USA
title	Optimizing context-based location extraction by tuning open-source LLMs with RAG


Similar Items