HAML-IRL: Overcoming the Imbalanced Record Linkage Problem Using Hybrid Active Machine Learning

Traditional active machine learning (AML) methods employed in Record Linkage (RL) or Entity Resolution (ER) tasks often struggle with model stability, slow convergence, and handling imbalanced data. Our study introduces a novel hybrid Active Machine Learning approach to address RL, overcoming the ch...

Bibliographic Details
Main Authors: Mourad Jabrane, Mouad JBEL, Imad HAFIDI, Yassir ROCHD
Format: Article
Language: English
Published: Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT) 2025-04-01
Series: Jordanian Journal of Computers and Information Technology
Subjects: record linkage, entity resolution, active machine learning, hybrid query
Online Access: http://www.ejmanager.com/fulltextpdf.php?mno=220477
author Mourad Jabrane
Mouad JBEL
Imad HAFIDI
Yassir ROCHD
collection DOAJ
description Traditional active machine learning (AML) methods employed in Record Linkage (RL) or Entity Resolution (ER) tasks often struggle with model stability, slow convergence, and imbalanced data. Our study introduces a novel hybrid Active Machine Learning approach to RL that addresses the challenges of limited labeled data and imbalanced classes. The approach, called Hybrid Active Machine Learning for Imbalanced Record Linkage (HAML-IRL), combines and balances two criteria: informativeness, which selects record pairs that reduce model uncertainty, and representativeness, which ensures the chosen pairs reflect the overall patterns of the dataset. HAML-IRL achieves an average 12% improvement in F1-score across eleven real-world datasets, including structured, textual, and dirty data, compared to state-of-the-art AML methods. It also requires 60% to 85% fewer labeled samples, depending on the dataset, accelerates model convergence, and offers superior stability across iterations, making it a robust and efficient solution for real-world record linkage tasks. [JJCIT 2025; 11(2): 151-169]
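The idea of balancing informativeness against representativeness can be illustrated with a minimal sketch. This is not the authors' HAML-IRL algorithm, whose details are not given in this record. It assumes, purely for illustration, that informativeness is the binary entropy of a classifier's match probability, that representativeness is the mean cosine similarity of a candidate pair's feature vector to the other unlabeled pairs, and that the two are mixed with a hypothetical weight `alpha`. All function names, parameters, and data below are invented for the example.

```python
import math


def uncertainty(p):
    """Binary entropy (in bits) of a match probability; peaks at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def representativeness(i, vectors):
    """Mean cosine similarity of pair i to every other unlabeled pair."""
    sims = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
    return sum(sims) / len(sims) if sims else 0.0


def hybrid_query(vectors, match_probs, k=1, alpha=0.5):
    """Return indices of the k pairs with the highest combined score.

    alpha = 1.0 ranks by uncertainty only, alpha = 0.0 by
    representativeness only (alpha is an assumption of this sketch).
    """
    scores = [
        alpha * uncertainty(p) + (1 - alpha) * representativeness(i, vectors)
        for i, p in enumerate(match_probs)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical data: each vector holds per-attribute similarity scores
# for one candidate record pair; probs are a model's match probabilities.
vectors = [[0.9, 0.8], [0.85, 0.9], [0.1, 0.9], [0.5, 0.5]]
probs = [0.95, 0.55, 0.5, 0.1]

for a in (1.0, 0.0, 0.5):
    print(a, hybrid_query(vectors, probs, k=1, alpha=a))
```

With this toy data, the pair the model is least sure about is an outlier among the candidates, so a pure uncertainty ranking and a pure representativeness ranking disagree; the weighted score is one simple way to trade the two off when choosing which pair to send for labeling.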
format Article
id doaj-art-4cff365170dc49da8ddb77572b5274ef
institution DOAJ
issn 2413-9351
2415-1076
language English
publishDate 2025-04-01
publisher Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT)
record_format Article
series Jordanian Journal of Computers and Information Technology
spelling doaj-art-4cff365170dc49da8ddb77572b5274ef; 2025-08-20T03:22:04Z; eng; Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT); Jordanian Journal of Computers and Information Technology; 2413-9351; 2415-1076; 2025-04-01; vol. 11, no. 2, pp. 151-169; 10.5455/jjcit.71-1726277421; 220477; HAML-IRL: Overcoming the Imbalanced Record Linkage Problem Using Hybrid Active Machine Learning; Mourad Jabrane, Mouad JBEL, Imad HAFIDI, Yassir ROCHD; mourad.jabrane@usms.ac.ma, mouad.jbel@usms.ac.ma, i.hafidi@usms.ma, y.rochd@usms.ma; http://www.ejmanager.com/fulltextpdf.php?mno=220477; record linkage, entity resolution, active machine learning, hybrid query
topic record linkage
entity resolution
active machine learning
hybrid query