Temporal record linkage for heterogeneous big data records

Temporal Record Linkage (TRL) or Temporal Entity Matching (TEM) is the process of identifying records/entities that refer to the same real-world object in different lifetime states. TRL is a well-known problem in different data engineering contexts e.g. data analysis, data warehousing, data mining,...

Bibliographic Details
Main Authors: Reham I. Abdel Monem, Ehab E. Hassanein, Ali Z. El Qutaany
Format: Article
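Language: English
Published: Elsevier 2025-06-01
Series: Egyptian Informatics Journal
Subjects: Lifetime states features; Feature engineering; Blocking; Temporal model; Data cleansing; Anomaly explanation
Online Access: http://www.sciencedirect.com/science/article/pii/S1110866525000350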
author Reham I. Abdel Monem
Ehab E. Hassanein
Ali Z. El Qutaany
collection DOAJ
description Temporal Record Linkage (TRL), also called Temporal Entity Matching (TEM), is the process of identifying records or entities that refer to the same real-world object in different lifetime states. TRL is a well-known problem in data engineering contexts such as data analysis, data warehousing, data mining, and machine learning, where entities denoting the same real-world object must be identified over time. Unlike traditional record linkage, which treats differences between records of the same entity as contradictions, temporal record linkage treats such differences as normal entity growth over time. Existing temporal record linkage frameworks, which are limited to No model, Decay, Disprob, Mixed, and Agreement First Dynamic Second (AFDS), achieve high accuracy but at a high computation cost. They also require the time dimension to be present in order to detect similar entities that refer to the same real-world object. In this research, we present a framework called Tracking Similar Entities in Heterogeneous Temporal Records (TSE-HTR) that tracks similar entities in heterogeneous, big, low-quality, temporal data regardless of whether the time dimension is present. It introduces data cleansing and state ranking modules to detect anomalies within similar entities, find the final, accurate set of them, and explain anomalies to users or domain experts in a comprehensible manner that not only offers increased business intelligence but also opens opportunities for improved solutions. It presents to the user the records of the different states of the same real-world object, ranked according to quality measures such as completeness, validity, and accuracy. Performance evaluation of the proposed framework against existing frameworks over real, big data shows a great improvement in both effectiveness and efficiency.
format Article
id doaj-art-af2d76adefc7419f93c06da681c26a7f
institution OA Journals
issn 1110-8665
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Egyptian Informatics Journal
spelling doaj-art-af2d76adefc7419f93c06da681c26a7f; record timestamp 2025-08-20T02:11:57Z; Egyptian Informatics Journal, ISSN 1110-8665, vol. 30, article 100642, 2025-06-01; DOI 10.1016/j.eij.2025.100642
Author affiliation (all three authors): Information Systems Department, Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt. Corresponding author: Reham I. Abdel Monem.
title Temporal record linkage for heterogeneous big data records
topic Lifetime states features
Feature engineering
Blocking
Temporal model
Data cleansing
Anomaly explanation
url http://www.sciencedirect.com/science/article/pii/S1110866525000350