Missing-Values Adjustment for Mixed-Type Data

We propose a new method of single imputation, reconstruction, and estimation of nonreported, incorrect, implausible, or excluded values in more than one field of the record. In particular, we will be concerned with data sets involving a mixture of numeric, ordinal, binary, and categorical variables....

Bibliographic Details
Main Authors: Agostino Tarsitano, Marianna Falcone
Format: Article
Language: English
Published: Wiley 2011-01-01
Series: Journal of Probability and Statistics
Online Access: http://dx.doi.org/10.1155/2011/290380
author Agostino Tarsitano
Marianna Falcone
collection DOAJ
description We propose a new method of single imputation, reconstruction, and estimation of nonreported, incorrect, implausible, or excluded values in more than one field of the record. In particular, we will be concerned with data sets involving a mixture of numeric, ordinal, binary, and categorical variables. Our technique is a variation of the popular nearest neighbor hot deck imputation (NNHDI) where “nearest” is defined in terms of a global distance obtained as a convex combination of the distance matrices computed for the various types of variables. We address the problem of proper weighting of the partial distance matrices in order to reflect their significance, reliability, and statistical adequacy. Performance of several weighting schemes is compared under a variety of settings in coordination with imputation of the least power mean of the Box-Cox transformation applied to the values of the donors. Through analysis of simulated and actual data sets, we will show that this approach is appropriate. Our main contribution has been to demonstrate that mixed data may optimally be combined to allow the accurate reconstruction of missing values in the target variable even when some data are absent from the other fields of the record.
format Article
id doaj-art-346f4d96d64e44d4a390c91c0b0fbce8
institution OA Journals
issn 1687-952X
1687-9538
language English
publishDate 2011-01-01
publisher Wiley
record_format Article
series Journal of Probability and Statistics