HAML-IRL: Overcoming the Imbalanced Record Linkage Problem Using Hybrid Active Machine Learning
| Main Authors: | Mourad Jabrane, Mouad JBEL, Imad HAFIDI, Yassir ROCHD |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT), 2025-04-01 |
| Series: | Jordanian Journal of Computers and Information Technology |
| Subjects: | record linkage, entity resolution, active machine learning, hybrid query |
| Online Access: | http://www.ejmanager.com/fulltextpdf.php?mno=220477 |

| field | value |
|---|---|
| author | Mourad Jabrane, Mouad JBEL, Imad HAFIDI, Yassir ROCHD |
| collection | DOAJ |
| description | Traditional active machine learning (AML) methods employed in Record Linkage (RL) or Entity Resolution (ER) tasks often struggle with model stability, slow convergence, and imbalanced data. Our study introduces a novel hybrid active machine learning approach to RL that overcomes the challenges of limited labeled data and imbalanced classes. By combining and balancing informativeness, which selects record pairs that reduce model uncertainty, and representativeness, which ensures the chosen pairs reflect overall dataset patterns, our hybrid approach, called Hybrid Active Machine Learning for Imbalanced Record Linkage (HAML-IRL), demonstrates significant advancements. HAML-IRL achieves an average 12% improvement in F1-score over state-of-the-art AML methods across eleven real-world datasets, including structured, textual, and dirty data. It also requires between 60% and 85% fewer labeled samples, depending on the dataset, accelerates model convergence, and offers superior stability across iterations, making it a robust and efficient solution for real-world record linkage tasks. [JJCIT 2025; 11(2): 151-169] |
| format | Article |
| id | doaj-art-4cff365170dc49da8ddb77572b5274ef |
| institution | DOAJ |
| issn | 2413-9351, 2415-1076 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT) |
| series | Jordanian Journal of Computers and Information Technology |
| doi | 10.5455/jjcit.71-1726277421 |
| volume (issue): pages | 11(2): 151-169 |
| author emails | mourad.jabrane@usms.ac.ma, mouad.jbel@usms.ac.ma, i.hafidi@usms.ma, y.rochd@usms.ma |
| title | HAML-IRL: Overcoming the Imbalanced Record Linkage Problem Using Hybrid Active Machine Learning |
| topic | record linkage, entity resolution, active machine learning, hybrid query |
| url | http://www.ejmanager.com/fulltextpdf.php?mno=220477 |
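
This record describes HAML-IRL only at the level of its abstract, so what follows is a minimal, generic sketch of the kind of hybrid query strategy the description outlines. It is not the authors' algorithm. It assumes a binary matcher that outputs a match probability for each unlabeled candidate record pair, and a numeric feature vector for each pair. Informativeness is taken to be prediction uncertainty. Representativeness is taken to be the mean similarity, defined as 1 / (1 + Euclidean distance), to the rest of the unlabeled pool. The two criteria are balanced by a hypothetical weight `beta`. All function names, the weighting scheme, and the toy data are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical helpers for illustration only; not taken from the HAML-IRL paper.


def informativeness(p: float) -> float:
    """Uncertainty of a binary match prediction: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity of two pair-feature vectors: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def representativeness(pool: Sequence[Sequence[float]]) -> List[float]:
    """Mean similarity of each candidate pair to every other pair in the unlabeled pool."""
    n = len(pool)
    if n < 2:
        return [0.0] * n
    return [
        sum(similarity(x, y) for j, y in enumerate(pool) if j != i) / (n - 1)
        for i, x in enumerate(pool)
    ]


def hybrid_query(probs: Sequence[float], pool: Sequence[Sequence[float]],
                 k: int, beta: float = 0.5) -> List[int]:
    """Indices of the k pairs with the highest score
    beta * informativeness + (1 - beta) * normalised representativeness.
    The weight beta is an assumed tuning parameter."""
    if len(probs) != len(pool):
        raise ValueError("probs and pool must have the same length")
    if not probs:
        return []
    info = [informativeness(p) for p in probs]
    rep = representativeness(pool)
    lo, hi = min(rep), max(rep)
    # Min-max normalise representativeness so both terms lie in [0, 1].
    rep = [(r - lo) / (hi - lo) if hi > lo else 0.0 for r in rep]
    scores = [beta * u + (1.0 - beta) * r for u, r in zip(info, rep)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy data: four candidate pairs, each with two illustrative similarity features
# (e.g. name and address similarity) and a model's predicted match probability.
probs = [0.50, 0.95, 0.45, 0.05]
pool = [[0.60, 0.50], [0.90, 0.95], [0.55, 0.60], [0.10, 0.05]]
print(hybrid_query(probs, pool, k=2))            # [0, 2]
print(hybrid_query(probs, pool, k=2, beta=0.0))  # [2, 0]
```

Setting `beta` to 1.0 reduces this sketch to plain uncertainty sampling, and 0.0 to purely density-based selection. The two calls show how the weighting changes which pairs are sent to the human labeler. How HAML-IRL itself balances the two criteria is not stated in this record.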