Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS


Bibliographic Details
Main Authors: Ranjit Lall, Thomas Robinson
Format: Article
Language: English
Published: Foundation for Open Access Statistics 2023-10-01
Series: Journal of Statistical Software
Subjects: missing data; multiple imputation; machine learning; Python; R
Online Access: https://www.jstatsoft.org/index.php/jss/article/view/4379
description This paper introduces software packages for efficiently imputing missing data using deep learning methods in Python (MIDASpy) and R (rMIDAS). The packages implement a recently developed approach to multiple imputation known as MIDAS, which involves introducing additional missing values into the dataset, attempting to reconstruct these values with a type of unsupervised neural network known as a denoising autoencoder, and using the resulting model to draw imputations of originally missing data. These steps are executed by a fast and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their particular application, MIDASpy and rMIDAS offer a host of user-friendly tools for calibrating and validating the imputation model. We provide a detailed guide to these functionalities and demonstrate their usage on a large real dataset.
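As a rough illustration of the first two steps in the description (introducing additional missing values, then scoring a reconstruction against the values that were actually observed), the sketch below uses plain NumPy. It is not the MIDASpy or rMIDAS API. The function names `corrupt` and `masked_mse`, the 0.5 drop rate, the zero-filling of masked cells, and the column-mean stand-in for the denoising autoencoder are all assumptions made for this example.

```python
import numpy as np

def corrupt(data, drop_rate, rng):
    """Return (inputs, observed); missing and randomly dropped cells are set to 0."""
    observed = ~np.isnan(data)                     # True where a value was originally present
    keep = rng.random(data.shape) >= drop_rate     # randomly drop a share of all cells
    inputs = np.where(observed & keep, data, 0.0)  # hypothetical choice: masked cells enter as 0
    return inputs, observed

def masked_mse(reconstruction, data, observed):
    """Mean squared error computed over originally observed cells only."""
    diff = np.where(observed, reconstruction - np.nan_to_num(data), 0.0)
    return float((diff ** 2).sum() / observed.sum())

rng = np.random.default_rng(0)
data = np.array([[1.0, 2.0],
                 [np.nan, 4.0],
                 [5.0, np.nan]])

inputs, observed = corrupt(data, drop_rate=0.5, rng=rng)

# Placeholder "model": predict each column's observed mean for every cell.
# A real denoising autoencoder would instead be trained on `inputs`.
col_means = np.nanmean(data, axis=0)
reconstruction = np.tile(col_means, (data.shape[0], 1))

loss = masked_mse(reconstruction, data, observed)
print(loss)
```

Restricting the loss to originally observed cells means a model is only rewarded for recovering values that can be checked. Originally missing cells are then filled from the trained model's output, and repeating that draw several times yields multiple completed datasets.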
id doaj-art-126f3c2b46284d60b714aedaea02dc19
institution DOAJ
issn 1548-7660
doi 10.18637/jss.v107.i09
volume 107
author_affiliation Ranjit Lall, University of Oxford
Thomas Robinson, London School of Economics and Political Science
topic missing data
multiple imputation
machine learning
Python
R