Use of Data Mining for Intelligent Evaluation of Imputation Methods

Bibliographic Details
Main Authors: David Red, Carlos R. Primorac
Format: Article
Language: English
Published: Universidad Internacional de La Rioja (UNIR), 2025-06-01
Series: International Journal of Interactive Multimedia and Artificial Intelligence
Subjects: computer science, imputation, data mining, interdisciplinary applications, performance evaluation
Online Access: https://www.ijimai.org/journal/bibcite/reference/3291
collection DOAJ
description In real-world situations, researchers frequently face the problem of missing values (MV), i.e., values not observed in a data set. Data imputation techniques estimate MV using different algorithms, so that important data can be filled in for a particular instance. Most of the literature in this field deals with individual imputation methods. However, few studies offer a comparative evaluation of the different methods that would provide better guidelines for selecting the method to apply when imputing data in specific situations. The objective of this work is to present a methodology for evaluating the performance of imputation methods by means of new metrics derived from the quality metrics of data mining models. We started from a complete dataset, which was amputated with different amputation mechanisms to generate 63 datasets with MV; these were then imputed using the Median, k-NN, k-Means and Hot-Deck imputation methods. The performance of the imputation methods was evaluated using new metrics derived from the quality metrics of the data mining processes performed with the original complete file and with the imputed files. This evaluation is not based on measuring the imputation error (the usual approach), but on the similarity between the values of the quality metrics of the data mining processes obtained with the original file and with the imputed files. The results show that, considered globally and according to the newly proposed metric, the imputation methods with the best performance were k-NN and k-Means. An additional advantage of the proposed methodology is that it provides predictive data mining models that can be used a posteriori.
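As a rough illustration of the pipeline the abstract describes (amputate a complete dataset, impute it, then compare a quality metric of a data mining model fitted on the original data with the same metric on the imputed data), here is a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' method: the toy dataset, the MCAR amputation at a 20% rate, the nearest-centroid classifier standing in for the "data mining process", and the hypothetical `metric_similarity` score. Only Median and k-NN imputation are shown; k-Means and Hot-Deck are omitted.

```python
import random
import statistics


def make_dataset(n_rows=60, n_cols=3, seed=0):
    """Hypothetical complete toy dataset: two Gaussian classes, numeric features only."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_rows):
        label = i % 2
        centre = 0.0 if label == 0 else 3.0
        rows.append([rng.gauss(centre, 1.0) for _ in range(n_cols)])
        labels.append(label)
    return rows, labels


def amputate_mcar(rows, rate, seed=1):
    """Assumed MCAR amputation: each cell is independently replaced by None with probability `rate`."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def column_medians(rows):
    """Median of the observed (non-None) values in each column; 0.0 if a column is fully missing."""
    medians = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(statistics.median(observed) if observed else 0.0)
    return medians


def impute_median(rows):
    """Median imputation: replace every missing cell with its column median."""
    medians = column_medians(rows)
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def impute_knn(rows, k=5):
    """Fill each gap with the mean of that column over the k nearest donor rows.

    Distance = mean squared difference over the features both rows observe;
    falls back to the column median when no donor shares an observed feature.
    """
    medians = column_medians(rows)
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if shared:
                    dist = sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                    donors.append((dist, other[j]))
            donors.sort(key=lambda d: d[0])
            nearest = [value for _, value in donors[:k]]
            new_row[j] = statistics.mean(nearest) if nearest else medians[j]
        filled.append(new_row)
    return filled


def centroid_accuracy(rows, labels):
    """Quality metric of a deliberately simple model: nearest-centroid accuracy on the training data."""
    centroids = {}
    for lab in set(labels):
        members = [r for r, y in zip(rows, labels) if y == lab]
        centroids[lab] = [statistics.mean(col) for col in zip(*members)]
    correct = 0
    for r, y in zip(rows, labels):
        pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centroids[c])))
        correct += pred == y
    return correct / len(rows)


def metric_similarity(q_original, q_imputed):
    """Hypothetical score: 1 minus the absolute gap between two quality values (1.0 = identical)."""
    return 1.0 - abs(q_original - q_imputed)


rows, labels = make_dataset()
q_original = centroid_accuracy(rows, labels)
amputated = amputate_mcar(rows, rate=0.2)
for name, imputer in [("Median", impute_median), ("k-NN", impute_knn)]:
    q_imputed = centroid_accuracy(imputer(amputated), labels)
    print(name, round(q_original, 3), round(q_imputed, 3),
          round(metric_similarity(q_original, q_imputed), 3))
```

The design choice mirrors the idea in the abstract: the imputed values are never compared cell by cell with the true ones. Instead, an imputation method scores well when the model built on the imputed file behaves like the model built on the complete file. In a fuller setup one would repeat this over several amputation mechanisms and rates and over several quality metrics of the model.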
issn 1989-1660
doi 10.9781/ijimai.2023.03.002
topic computer science
imputation
data mining
interdisciplinary applications
performance evaluation