Research on graph-based heterogeneous data integration method

Bibliographic Details
Main Authors: HUANG Yuezhen, YANG Fen, TIAN Feng, ZHANG Chengye, LI Yuchan
Format: Article
Language: Chinese (zho)
Published: China InfoCom Media Group, 2025-01-01
Series: 大数据 (Big Data Research)
Online Access: http://www.j-bigdataresearch.com.cn/zh/article/doi/10.11959/j.issn.2096-0271.2025002/
Collection: DOAJ
Abstract: Departments within an enterprise often manage their data independently, and siloed ("chimney-style") system construction leaves data scattered across heterogeneous databases, which poses a series of challenges for data integration. To address the aggregation and fusion of data from heterogeneous enterprise systems, a graph-based end-to-end data integration framework was proposed. First, tables and fields were modeled as entities in a network graph built from the primary-key and foreign-key relationships of the relational data model, with table names and field names treated as distinct entity types. Next, the graph was fed into a graph neural network, and a vector representation of each node was obtained through graph convolution. From these node vectors, the node mapping between any two graphs to be matched can be computed. After the tables and fields of the two graphs were aligned, the field values were standardized, that is, the value of each cell was mapped to a standard value. Finally, these results were engineered into executable database query statements to achieve heterogeneous data fusion. Experiments on real enterprise data show that the framework improves the development efficiency of data integration, is not limited to a particular business domain, and is highly portable.
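As a rough illustration of the pipeline the abstract outlines (a typed table/field graph built from primary and foreign keys, graph-convolution node vectors, a cross-graph node mapping, value standardization, and query generation), here is a minimal Python sketch. It is not the authors' code: the schema dictionary format, the hashed character-trigram node features, the untrained (parameter-free) propagation used in place of the paper's trained graph neural network, the greedy cosine-similarity matching, the CASE-based value mapping, and all function names (`build_graph`, `graph_convolve`, `match`, `select_sql`) are hypothetical choices made for this example.

```python
# Hypothetical sketch of a graph-based schema-matching pipeline; not the paper's method.
import math
import zlib

DIM = 64  # number of hashed trigram buckets (an assumption, not from the paper)


def build_graph(schema):
    """Nodes are ("table", t) and ("field", t, f); edges link each table to its
    fields and each foreign-key field to the primary-key field it references."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for table, spec in schema.items():
        adj.setdefault(("table", table), set())
        for field in spec["fields"]:
            link(("table", table), ("field", table, field))
        for field, ref_table, ref_field in spec.get("fks", []):
            link(("field", table, field), ("field", ref_table, ref_field))
    return adj


def features(node):
    """Initial vector: hashed character trigrams of the name + one-hot node type."""
    vec = [0.0] * (DIM + 2)
    padded = f"#{node[-1].lower()}#"
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode()) % DIM] += 1.0
    vec[DIM if node[0] == "table" else DIM + 1] += 1.0
    return vec


def graph_convolve(adj, layers=2):
    """Parameter-free GCN propagation H <- D^-1/2 (A + I) D^-1/2 H (no trained weights)."""
    h = {n: features(n) for n in adj}
    deg = {n: len(nbrs) + 1 for n, nbrs in adj.items()}  # +1 for the self-loop
    for _ in range(layers):
        new_h = {}
        for n, nbrs in adj.items():
            acc = [x / deg[n] for x in h[n]]  # self-loop weight 1/sqrt(d_n * d_n)
            for m in nbrs:
                w = 1.0 / math.sqrt(deg[n] * deg[m])
                acc = [a + w * x for a, x in zip(acc, h[m])]
            new_h[n] = acc
        h = new_h
    return h


def cosine(u, v):
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def match(emb_a, emb_b):
    """Greedy one-to-one alignment of same-type nodes, highest cosine first."""
    pairs = sorted(((cosine(emb_a[a], emb_b[b]), a, b)
                    for a in emb_a for b in emb_b if a[0] == b[0]), reverse=True)
    used_b, mapping = set(), {}
    for _, a, b in pairs:
        if a not in mapping and b not in used_b:
            mapping[a] = b
            used_b.add(b)
    return mapping


def select_sql(src_table, field_map, value_maps=None):
    """Rename source fields to target names; map raw codes to standard values via CASE.
    No identifier quoting or value escaping: illustration only."""
    value_maps = value_maps or {}
    cols = []
    for src, tgt in field_map.items():
        if src in value_maps:
            cases = " ".join(f"WHEN '{raw}' THEN '{std}'" for raw, std in value_maps[src].items())
            cols.append(f"CASE {src} {cases} ELSE {src} END AS {tgt}")
        else:
            cols.append(f"{src} AS {tgt}")
    return f"SELECT {', '.join(cols)} FROM {src_table}"


# Two small hypothetical schemas describing the same business data with different names.
schema_a = {"customer": {"fields": ["cust_id", "cust_name"]},
            "orders": {"fields": ["order_id", "cust_id", "amount"],
                       "fks": [("cust_id", "customer", "cust_id")]}}
schema_b = {"client": {"fields": ["client_id", "client_name"]},
            "purchase": {"fields": ["purchase_id", "client_id", "amount"],
                         "fks": [("client_id", "client", "client_id")]}}
mapping = match(graph_convolve(build_graph(schema_a)), graph_convolve(build_graph(schema_b)))
for src, tgt in sorted(mapping.items()):
    print(src, "->", tgt)
```

With untrained propagation, the alignment rests on name similarity smoothed over each node's primary-key/foreign-key neighborhood; a trained graph neural network, as the abstract describes, would learn how to weigh these signals. A Hungarian assignment could replace the greedy loop when a globally optimal one-to-one mapping is wanted.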
Record ID: doaj-art-e55b71185e5d485d8f6c305dd4c6577b
ISSN: 2096-0271
Keywords: data integration; data fusion; heterogeneous data; schema matching; entity alignment; graph neural