Beyond extraction accuracy: addressing the quality of geographical named entity through advanced recognition and correction models using a modified BERT framework

In the realm of geospatial services and applications, the accuracy of address information is of utmost importance. Traditional methods of data collection, being both labor-intensive and costly, have prompted researchers to turn to Volunteered Geographic Information (VGI) for the extraction of Geogra...

Bibliographic Details
Main Authors: Liuchang Xu, Jiajun Zhang, Chengkun Zhang, Xinyu Zheng, Zhenhong Du, Xingyu Xue
Format: Article
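Language: English
Published: Taylor & Francis Group 2025-05-01
Series: Geo-spatial Information Science
Subjects: Volunteered Geographic Information (VGI); BERT; geographical named entity; geographical named entity recognition; incremental pre-training; geographical named entity error correction
Online Access: https://www.tandfonline.com/doi/10.1080/10095020.2024.2354229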
author Liuchang Xu
Jiajun Zhang
Chengkun Zhang
Xinyu Zheng
Zhenhong Du
Xingyu Xue
collection DOAJ
description Accurate address information is essential to geospatial services and applications. Because traditional data collection is labor-intensive and costly, researchers have turned to Volunteered Geographic Information (VGI) as a source for extracting Geographical Named Entities (GNEs). Prior studies, however, have concentrated on improving extraction accuracy and have often overlooked GNE quality. This study addresses that gap in three steps. First, a Geographical Named Entity Semantic Model (GNESM) was constructed by improving the BERT framework, and ablation experiments on multiple influencing factors verified its feasibility. Second, building on GNESM, a Geographical Named Entity Recognition Model (GNERM) was constructed through incremental pre-training on social media text followed by fine-tuning, achieving a recognition accuracy of 90.9%. Third, a Geographical Named Entity Error Correction Model (GNEECM) was constructed by training GNESM on standard GNE data and adding error detection and correction modules, achieving an accuracy of 96.6% on error detection and correction tasks. In the experiments, the proposed recognition and correction methods outperformed all compared methods. The recognition and correction process yielded high-quality GNE data, providing a reference for expanding standard address libraries and for subsequent research on geographical named entities.
format Article
id doaj-art-3071b78a29644f78a4b95d94cd979ba2
institution Kabale University
issn 1009-5020
1993-5153
language English
publishDate 2025-05-01
publisher Taylor & Francis Group
record_format Article
series Geo-spatial Information Science
spelling doaj-art-3071b78a29644f78a4b95d94cd979ba2; 2025-08-20T03:38:55Z; eng; Taylor & Francis Group; Geo-spatial Information Science; ISSN 1009-5020, 1993-5153; 2025-05-01; vol. 28, no. 3, pp. 1195-1213; DOI 10.1080/10095020.2024.2354229
Liuchang Xu (0), Jiajun Zhang (1), Chengkun Zhang (2), Xinyu Zheng (3), Zhenhong Du (4), Xingyu Xue (5)
(0) School of Mathematics and Computer Science, Zhejiang Agriculture and Forestry University, Hangzhou, China
(1) School of Mathematics and Computer Science, Zhejiang Agriculture and Forestry University, Hangzhou, China
(2) School of Earth Sciences, Zhejiang University, Hangzhou, China
(3) School of Mathematics and Computer Science, Zhejiang Agriculture and Forestry University, Hangzhou, China
(4) School of Earth Sciences, Zhejiang University, Hangzhou, China
(5) School of Mathematics and Computer Science, Zhejiang Agriculture and Forestry University, Hangzhou, China
title Beyond extraction accuracy: addressing the quality of geographical named entity through advanced recognition and correction models using a modified BERT framework
topic Volunteered Geographic Information (VGI)
BERT
geographical named entity
geographical named entity recognition
incremental pre-training
geographical named entity error correction
url https://www.tandfonline.com/doi/10.1080/10095020.2024.2354229
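
The description in this record outlines a two-stage workflow: recognise geographical named entities in user-contributed text, then detect and correct errors against standard GNE data. The sketch below only illustrates that control flow. It is not the authors' method: a capitalised-phrase regular expression stands in for the BERT-based recogniser, string similarity from Python's `difflib` stands in for the learned error detection and correction modules, and `STANDARD_GNES`, `recognise_candidates`, `correct_entity`, and `extract_clean_gnes` are hypothetical names introduced here.

```python
import difflib
import re

# Hypothetical standard address library (assumption, not from the paper).
STANDARD_GNES = [
    "West Lake District",
    "Zhejiang University",
    "Hangzhou East Railway Station",
]


def recognise_candidates(text):
    """Toy recogniser: runs of two or more capitalised words.

    A stand-in for a learned recognition model; it will miss many real
    entities and is only meant to feed the correction step below.
    """
    return re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)+", text)


def correct_entity(candidate, library, cutoff=0.8):
    """Return (entity, was_corrected).

    Detection: the candidate is flagged when its closest library entry
    differs from it. Correction: replace it with that closest entry.
    Candidates with no sufficiently similar entry are left unchanged.
    """
    matches = difflib.get_close_matches(candidate, library, n=1, cutoff=cutoff)
    if not matches:
        return candidate, False
    best = matches[0]
    return best, best != candidate


def extract_clean_gnes(text, library=STANDARD_GNES):
    """Run recognition, then detection and correction, on one text."""
    results = []
    for candidate in recognise_candidates(text):
        fixed, changed = correct_entity(candidate, library)
        results.append((candidate, fixed, changed))
    return results


if __name__ == "__main__":
    for raw, fixed, changed in extract_clean_gnes(
        "Meet me near Zhejang University after lunch."
    ):
        print(raw, "->", fixed, "(corrected)" if changed else "(unchanged)")
```

The similarity cutoff of 0.8 is an arbitrary illustrative value; a learned model would replace both the recogniser and the correction step.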