Missing-Values Adjustment for Mixed-Type Data
We propose a new method of single imputation, reconstruction, and estimation of nonreported, incorrect, implausible, or excluded values in more than one field of the record. In particular, we will be concerned with data sets involving a mixture of numeric, ordinal, binary, and categorical variables....
| Main Authors: | Agostino Tarsitano, Marianna Falcone |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2011-01-01 |
| Series: | Journal of Probability and Statistics |
| Online Access: | http://dx.doi.org/10.1155/2011/290380 |
| _version_ | 1850157715241828352 |
|---|---|
| author | Agostino Tarsitano; Marianna Falcone |
| collection | DOAJ |
| description | We propose a new method of single imputation, reconstruction, and estimation of nonreported, incorrect, implausible, or excluded values in more than one field of the record. In particular, we will be concerned with data sets involving a mixture of numeric, ordinal, binary, and categorical variables. Our technique is a variation of the popular nearest neighbor hot deck imputation (NNHDI) where “nearest” is defined in terms of a global distance obtained as a convex combination of the distance matrices computed for the various types of variables. We address the problem of proper weighting of the partial distance matrices in order to reflect their significance, reliability, and statistical adequacy. Performance of several weighting schemes is compared under a variety of settings in coordination with imputation of the least power mean of the Box-Cox transformation applied to the values of the donors. Through analysis of simulated and actual data sets, we will show that this approach is appropriate. Our main contribution has been to demonstrate that mixed data may optimally be combined to allow the accurate reconstruction of missing values in the target variable even when some data are absent from the other fields of the record. |
| format | Article |
| id | doaj-art-346f4d96d64e44d4a390c91c0b0fbce8 |
| institution | OA Journals |
| issn | 1687-952X 1687-9538 |
| language | English |
| publishDate | 2011-01-01 |
| publisher | Wiley |
| record_format | Article |
| series | Journal of Probability and Statistics |
| title | Missing-Values Adjustment for Mixed-Type Data |
| url | http://dx.doi.org/10.1155/2011/290380 |
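
The abstract describes nearest neighbor hot deck imputation in which "nearest" is measured by a global distance formed as a convex combination of partial distances, one per variable type. The Python sketch below illustrates that general idea only and is not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the range-normalised absolute difference for numeric and ordinal fields, the simple mismatch for binary and categorical fields, the equal weights, and the toy records. It also copies the value of the single nearest donor rather than imputing the least power mean of the Box-Cox-transformed donor values mentioned in the abstract.

```python
# Illustrative sketch only: distance definitions, weights, and names are assumptions.

def partial_distance(a, b, cols, kind, ranges):
    """Mean per-variable distance over columns observed (not None) in both records."""
    total, count = 0.0, 0
    for c in cols:
        if a[c] is None or b[c] is None:
            continue  # skip fields missing in either record
        if kind in ("numeric", "ordinal"):
            total += abs(a[c] - b[c]) / ranges[c]  # range-normalised, in [0, 1]
        else:  # binary or categorical: simple mismatch
            total += 0.0 if a[c] == b[c] else 1.0
        count += 1
    return total / count if count else None


def global_distance(a, b, schema, ranges, weights):
    """Convex combination of the partial distances, one per variable type.

    Weights are renormalised over the types that could actually be compared.
    """
    num, den = 0.0, 0.0
    for kind, cols in schema.items():
        d = partial_distance(a, b, cols, kind, ranges)
        if d is not None:
            num += weights[kind] * d
            den += weights[kind]
    return num / den if den else float("inf")


def impute_from_nearest_donor(records, target, schema, ranges, weights):
    """Fill missing target values by copying the target of the nearest donor."""
    donors = [r for r in records if r[target] is not None]
    completed = []
    for r in records:
        r = dict(r)  # work on a copy so the input is left unchanged
        if r[target] is None:
            nearest = min(
                donors,
                key=lambda d: global_distance(r, d, schema, ranges, weights),
            )
            r[target] = nearest[target]
        completed.append(r)
    return completed


# Hypothetical toy data: "income" is the target; None marks a missing value.
records = [
    {"age": 30, "edu": 2, "smoker": 0, "region": "N", "income": 40.0},
    {"age": 32, "edu": 2, "smoker": 0, "region": "N", "income": None},
    {"age": 60, "edu": 1, "smoker": 1, "region": "S", "income": 20.0},
    {"age": None, "edu": 1, "smoker": 1, "region": "S", "income": None},
]
schema = {
    "numeric": ["age"],
    "ordinal": ["edu"],
    "binary": ["smoker"],
    "categorical": ["region"],
}
ranges = {"age": 30.0, "edu": 2.0}  # assumed value ranges used for normalisation
weights = {"numeric": 0.25, "ordinal": 0.25, "binary": 0.25, "categorical": 0.25}

completed = impute_from_nearest_donor(records, "income", schema, ranges, weights)
print([r["income"] for r in completed])  # [40.0, 40.0, 20.0, 20.0]
```

The fourth record has no age, so its distance to each donor is computed from the remaining three types only. This mirrors the abstract's point that the target can still be reconstructed when other fields of the record are missing. The choice of weights is exactly the question the article studies, so the equal weights here should be read as a placeholder.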