A new hybrid method for data analysis when a significant percentage of data is missing

This article aims to compare the efficiency of different imputation methods for missing data. To this end, we use mean, median, Expectation-Maximization (EM), regression imputation (RI) and multiple imputation (MI) to replace missing data. In fact, we employ three proposed combination methods, namely EM...

Bibliographic Details
Main Authors: Behrouz Fathi-Vajargah, Ahmad Nouraldin
Format: Article
Language: English
Published: University of Mohaghegh Ardabili 2024-12-01
Series: Journal of Hyperstructures
Subjects: missing data; imputation; mean square error; mean; standard deviation
Online Access: https://jhs.uma.ac.ir/article_3534_e8b573ee79ad84dc2a9cd6f296b7afb8.pdf
author Behrouz Fathi-Vajargah
Ahmad Nouraldin
collection DOAJ
description This article aims to compare the efficiency of different imputation methods for missing data. To this end, we use mean, median, Expectation-Maximization (EM), regression imputation (RI) and multiple imputation (MI) to replace missing data. In fact, we employ three proposed combination methods, namely EM imputation with MI imputation (EMMI), EM imputation with regression imputation (EMR), and regression imputation with MI imputation (RMI). In this paper, we compare these methods using an example study of Waterborne Container Trade by the US Customs Port (2000-2017), where the methods are applied with different missing percentages. Several criteria are used to compare the efficiency of the estimates, such as the mean, Standard Deviation (SD), and Mean Squared Error (MSE). The results show that the composite imputation methods are efficient: in almost all situations, in terms of MSE, the RMI imputation method outperforms the other methods. Nevertheless, when the missing percentage is small, the EMR imputation method performs better. In terms of the SD criterion, we find that the MI method is better than the other methods, while the RMI method performs well when the missing percentage is large. When the missing percentage is in the range 40-50%, the EMR and RMI imputation methods give a better MSE.
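The kind of comparison the description outlines can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic (not the Waterborne Container Trade series), only the three simplest single-imputation methods are shown (mean, median, regression), the proposed hybrids (EMMI, EMR, RMI) are not reproduced because the record does not define them, and the MSE here is computed on the imputed entries only, which may differ from the article's definition. All function names are hypothetical.

```python
import numpy as np


def make_data(n=200, seed=0):
    """Synthetic linear data (an assumption; not the article's data set)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=n)
    y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)
    return x, y


def mask_missing(n, fraction, seed=1):
    """Boolean mask marking a random `fraction` of the n entries as missing."""
    rng = np.random.default_rng(seed)
    k = int(round(fraction * n))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=k, replace=False)] = True
    return mask


def impute_mean(y, mask):
    """Replace missing entries with the mean of the observed entries."""
    out = y.copy()
    out[mask] = y[~mask].mean()
    return out


def impute_median(y, mask):
    """Replace missing entries with the median of the observed entries."""
    out = y.copy()
    out[mask] = np.median(y[~mask])
    return out


def impute_regression(x, y, mask):
    """Fit y ~ x on observed rows, then predict y for the missing rows."""
    slope, intercept = np.polyfit(x[~mask], y[~mask], deg=1)
    out = y.copy()
    out[mask] = intercept + slope * x[mask]
    return out


def mse(y_true, y_imputed, mask):
    """Mean squared error over the imputed entries only (an assumption)."""
    return float(np.mean((y_true[mask] - y_imputed[mask]) ** 2))


def compare(fractions=(0.1, 0.3, 0.5)):
    """MSE of each method at several missing percentages."""
    x, y = make_data()
    results = {}
    for f in fractions:
        mask = mask_missing(len(y), f)
        results[f] = {
            "mean": mse(y, impute_mean(y, mask), mask),
            "median": mse(y, impute_median(y, mask), mask),
            "regression": mse(y, impute_regression(x, y, mask), mask),
        }
    return results


if __name__ == "__main__":
    for fraction, row in compare().items():
        print(f"missing={fraction:.0%}", row)
```

Because the true values are known before masking, each method can be scored directly against them; the same harness could in principle be extended with EM or multiple imputation, given a definition of those steps.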
format Article
id doaj-art-eb7eca76176842f79e7c40169f4d114c
institution Kabale University
issn 2251-8436
2322-1666
language English
publishDate 2024-12-01
publisher University of Mohaghegh Ardabili
record_format Article
series Journal of Hyperstructures
spelling doaj-art-eb7eca76176842f79e7c40169f4d114c; 2025-08-20T03:28:54Z; eng; University of Mohaghegh Ardabili; Journal of Hyperstructures; 2251-8436; 2322-1666; 2024-12-01; 13; 2; 297; 304; 10.22098/jhs.2024.15095.1015; 3534; A new hybrid method for data analysis when a significant percentage of data is missing; Behrouz Fathi-Vajargah (0); Ahmad Nouraldin (1); Department of Statistics, Faculty of Mathematical Sciences, University of Guilan, Rasht, Iran; Dep. of Applied Maths, University of Guilan, Rasht, Iran; https://jhs.uma.ac.ir/article_3534_e8b573ee79ad84dc2a9cd6f296b7afb8.pdf; missing data; imputation; mean square error; mean; standard deviation
title A new hybrid method for data analysis when a significant percentage of data is missing
topic missing data
imputation
mean square error
mean
standard deviation
url https://jhs.uma.ac.ir/article_3534_e8b573ee79ad84dc2a9cd6f296b7afb8.pdf