Optimizing context-based location extraction by tuning open-source LLMs with RAG
Text data such as news from media include different types of geographic information, represented by location, that indicates the whereabouts of events or phenomena. Extracting the geographic locations from text within their contexts is challenging, even with Natural Language Processing (NLP) tools an...
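| Main Authors: | Zifu Wang, Yahya Masri, Anusha Srirenganathan Malarvizhi, Tayven Stover, Samir Ahmed, David Wong, Yongyao Jiang, Yun Li, Mathieu Bere, Daniel Rothbart, Dieter Pfoser, David Marshall, Chaowei Yang |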
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2025-08-01 |
| Series: | International Journal of Digital Earth |
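| Subjects: | Context-based location extraction; large language model; retrieval augmented generation; natural language processing; Sudan conflict; media |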
| Online Access: | https://www.tandfonline.com/doi/10.1080/17538947.2025.2521786 |
| _version_ | 1849224290089566208 |
|---|---|
| author | Zifu Wang; Yahya Masri; Anusha Srirenganathan Malarvizhi; Tayven Stover; Samir Ahmed; David Wong; Yongyao Jiang; Yun Li; Mathieu Bere; Daniel Rothbart; Dieter Pfoser; David Marshall; Chaowei Yang |
| author_sort | Zifu Wang |
| collection | DOAJ |
| description | Text data such as news from media include different types of geographic information, represented by location, that indicates the whereabouts of events or phenomena. Extracting the geographic locations from text within their contexts is challenging, even with Natural Language Processing (NLP) tools and the latest Large Language Models (LLMs). We propose to optimize LLMs using Retrieval-Augmented Generation (RAG) and prompt-tuning methods, such as zero-shot and instruction-based prompting, to improve the precision of extracting location information from news. Using the Sudan conflict as an example, we extracted the corresponding locations and dates for conflict incidents. We compared the runtime and accuracy of various open-source LLMs under different hyperparameter settings, with and without RAG. Traditional Named Entity Recognition (NER), zero-shot prompting, instruction-based prompting, few-shot prompting, chain-of-thought (CoT) prompting, and RAG-based tuning were compared using an evaluation matrix. RAG-based tuning delivered the highest F1 score (>0.9) for extracting and associating location data with conflict incidents. This research highlights the benefits of using RAG for multi-incident context-based location extraction and provides insights into optimizing LLMs through prompt-tuning, hyperparameter adjustment, and model selection for location extraction tasks. The results can also be used to extract context-based locations or relevant information from text-based documents in other applications. |
| format | Article |
| id | doaj-art-54abc2e2125f4c7a95f934a506bb7aec |
| institution | Kabale University |
| issn | 1753-8947 1753-8955 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Taylor & Francis Group |
| record_format | Article |
| series | International Journal of Digital Earth |
| spelling | doaj-art-54abc2e2125f4c7a95f934a506bb7aec; 2025-08-25T11:28:46Z; eng; Taylor & Francis Group; International Journal of Digital Earth; 1753-8947; 1753-8955; 2025-08-01; 18; 1; 10.1080/17538947.2025.2521786; Optimizing context-based location extraction by tuning open-source LLMs with RAG; Zifu Wang (0), Yahya Masri (1), Anusha Srirenganathan Malarvizhi (2), Tayven Stover (3), Samir Ahmed (4), David Wong (5), Yongyao Jiang (6), Yun Li (7), Mathieu Bere (8), Daniel Rothbart (9), Dieter Pfoser (10), David Marshall (11), Chaowei Yang (12); Affiliations 0-4, 6, 7, 12: Department of Geography and Geoinformation Science, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, USA; Affiliations 5, 10: Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA, USA; Affiliations 8, 9, 11: Carter School for Peace & Conflict Resolution, George Mason University, Fairfax, VA, USA; https://www.tandfonline.com/doi/10.1080/17538947.2025.2521786; Context-based location extraction; large language model; retrieval augmented generation; natural language processing; Sudan conflict; media |
| title | Optimizing context-based location extraction by tuning open-source LLMs with RAG |
| topic | Context-based location extraction; large language model; retrieval augmented generation; natural language processing; Sudan conflict; media |
| url | https://www.tandfonline.com/doi/10.1080/17538947.2025.2521786 |