Robust multi-source geographic entities matching by maximizing geometric and semantic similarity

Bibliographic Details
Main Authors: YuHan Yan, PengDa Wu, Yong Yin, PeiPei Guo
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Reports
Subjects: Area entity; Line entity; Feature segment; Pattern recognition; Semantic similarity
Online Access: https://doi.org/10.1038/s41598-024-79812-2
collection DOAJ
description Abstract Geographic entity matching is an important means of fusing multi-source spatial data and of associating and sharing information. Existing studies have designed matching methods tailored to the characteristics of particular entity types, such as line and area entities. However, these approaches often generalize poorly when matching heterogeneous data from multiple sources and lose accuracy on complex matching patterns. To address these problems, a robust multi-source geographic entity matching method that maximizes geometric and semantic similarity is proposed. First, each entity is segmented based on shape features, and the resulting feature segments are extracted as matching primitives. Second, feature segments are grouped into patterns comprising three major categories and fourteen subcategories. Third, pattern matching is performed using spatial similarity metrics such as maximum projection distance. Finally, the spatial matches are checked and refined through semantic similarity calculation. The proposed method is tested on two datasets from regions in southeast and northwest China. The experimental results show that the method can be applied effectively to both area and line entity matching, with strong generalization and significantly improved matching accuracy. Specifically, nine feature segment matching patterns are used for area entities and six for line entities, and precision and recall are both nearly 90%.
format Article
id doaj-art-ae68a4c552d243c1aa3594ad610ae6f2
institution Kabale University
issn 2045-2322
language English
publishDate 2024-12-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
affiliation Department of Geographic Information System, Chinese Academy of Surveying and Mapping (YuHan Yan, PengDa Wu, Yong Yin, PeiPei Guo)
topic Area entity
Line entity
Feature segment
Pattern recognition
Semantic similarity
url https://doi.org/10.1038/s41598-024-79812-2
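
The description field of this record outlines a two-stage idea: candidate matches are found geometrically, then refined by semantic similarity, and the result is scored by precision and recall. The Python sketch below illustrates that flow only in broad strokes. Everything in it is an assumption made for illustration: the abstract does not define "maximum projection distance", so `max_projection_distance` here is a stand-in (the largest distance from a vertex of one polyline to the other polyline); `name_similarity` uses the standard library's `difflib.SequenceMatcher` rather than whatever semantic measure the authors use; and the thresholds, function names, and toy data are hypothetical. Shape-based segmentation and the pattern categories are not modeled.

```python
import math
from difflib import SequenceMatcher


def point_segment_distance(p, a, b):
    """Distance from point p to the closed segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a-b and clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def max_projection_distance(seg_a, seg_b):
    """ASSUMED stand-in metric, not the paper's definition:
    largest distance from any vertex of seg_a to the polyline seg_b."""
    return max(
        min(
            point_segment_distance(p, seg_b[i], seg_b[i + 1])
            for i in range(len(seg_b) - 1)
        )
        for p in seg_a
    )


def name_similarity(a, b):
    """Hypothetical semantic check: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_segments(source, target, max_dist=5.0, min_name_sim=0.6):
    """source/target map an id to (polyline, name). Thresholds are illustrative."""
    matches = []
    for sid, (sgeom, sname) in source.items():
        # Stage 1: keep targets that are geometrically close enough.
        candidates = [
            (max_projection_distance(sgeom, tgeom), tid, tname)
            for tid, (tgeom, tname) in target.items()
        ]
        candidates = [c for c in candidates if c[0] <= max_dist]
        # Stage 2: drop candidates whose names disagree.
        candidates = [
            c for c in candidates if name_similarity(sname, c[2]) >= min_name_sim
        ]
        if candidates:
            _, tid, _ = min(candidates)  # closest surviving candidate
            matches.append((sid, tid))
    return matches


def precision_recall(predicted, truth):
    """Precision and recall of predicted match pairs against reference pairs."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data (hypothetical): two source segments, three target segments.
source = {
    "s1": ([(0, 0), (10, 0)], "Main Street"),
    "s2": ([(0, 50), (10, 50)], "River Road"),
}
target = {
    "t1": ([(0, 1), (10, 1)], "Main St"),
    "t2": ([(0, 2), (10, 2)], "Oak Avenue"),
    "t3": ([(0, 100), (10, 100)], "River Road"),
}
matches = match_segments(source, target)
scores = precision_recall(matches, [("s1", "t1"), ("s2", "t3")])
```

In this toy run, `t2` passes the geometric filter for `s1` but is rejected by the name check, which is the kind of refinement the semantic stage is meant to provide; `s2` finds no candidate within the distance threshold, so recall drops while precision stays high.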