Unsupervised data imputation with multiple importance sampling variational autoencoders

Abstract: Recently, deep latent variable models have made significant progress in dealing with missing data problems, benefiting from their ability to capture intricate and non-linear relationships within the data. In this work, we further investigate the potential of Variational Autoencoders (VAEs)...


Bibliographic Details
Main Authors: Shenfen Kuang, Yewen Huang, Jie Song
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Subjects: Missing data; Variational autoencoders; Multiple importance sampling; Resampling
Online Access: https://doi.org/10.1038/s41598-025-87641-0
_version_ 1832571809679015936
author Shenfen Kuang
Yewen Huang
Jie Song
author_facet Shenfen Kuang
Yewen Huang
Jie Song
author_sort Shenfen Kuang
collection DOAJ
description Abstract: Recently, deep latent variable models have made significant progress in dealing with missing data problems, benefiting from their ability to capture intricate and non-linear relationships within the data. In this work, we further investigate the potential of Variational Autoencoders (VAEs) in addressing the uncertainty associated with missing data via a multiple importance sampling strategy. We propose a Missing data Multiple Importance Sampling Variational Auto-Encoder (MMISVAE) method to effectively model incomplete data. Our approach consists of a learning step and an imputation step. During the learning step, the mixture components are represented by multiple separate encoder networks, which are later combined through simple averaging to enhance the latent representation capabilities of VAEs when dealing with incomplete data. The statistical model and variational distributions are iteratively updated by maximizing the Multiple Importance Sampling Evidence Lower Bound (MISELBO) on the joint log-likelihood. In the imputation step, missing data are estimated by conditional expectation through multiple importance resampling. We propose an efficient imputation algorithm that broadens the scope of the Missing data Importance Weighted Auto-Encoder (MIWAE) by incorporating multiple proposal probability distributions and a resampling scheme. One notable characteristic of our method is the completely unsupervised nature of both the learning and imputation processes. Through comprehensive experimental analysis, we present evidence of the effectiveness of our method in improving the imputation accuracy of incomplete data compared to current state-of-the-art VAE-based methods.
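The two steps outlined in the description, learning by maximizing a multiple importance sampling bound over the observed entries and imputing by conditional expectation through importance resampling, can be illustrated with a short sketch. The code below is a hypothetical PyTorch illustration based only on the abstract, not the authors' implementation: the class and function names (SimpleEncoder, SimpleDecoder, miselbo_loss, impute_missing), the Gaussian observation model, and the equal-weight mixture of encoder proposals are all assumptions, and the paper's exact MISELBO objective and resampling schedule may differ.

```python
import math
import torch
import torch.nn as nn
import torch.distributions as td


class SimpleEncoder(nn.Module):
    """One Gaussian proposal q_m(z | x); missing inputs are zero-filled (an assumption)."""
    def __init__(self, d_in, d_z, d_h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, d_h), nn.Tanh(),
                                 nn.Linear(d_h, 2 * d_z))

    def forward(self, x):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        return td.Independent(td.Normal(mu, log_sigma.exp()), 1)


class SimpleDecoder(nn.Module):
    """Gaussian observation model p(x | z) with unit variance (an assumption)."""
    def __init__(self, d_z, d_out, d_h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_z, d_h), nn.Tanh(),
                                 nn.Linear(d_h, d_out))

    def forward(self, z):
        return td.Normal(self.net(z), torch.ones(1))


def log_mixture_q(qs, z):
    """Log density of the equal-weight mixture proposal (1/M) * sum_j q_j(z | x)."""
    return torch.logsumexp(torch.stack([q.log_prob(z) for q in qs]), dim=0) - math.log(len(qs))


def miselbo_loss(x, mask, encoders, decoder, prior, K=5):
    """Negative MIS-style bound: K samples from each of the M encoders,
    each weighted against the mixture proposal; the likelihood uses observed entries only."""
    x_in = x * mask
    qs = [enc(x_in) for enc in encoders]
    log_w = []
    for q_m in qs:
        z = q_m.rsample((K,))                                       # (K, B, d_z)
        log_p_xz = (decoder(z).log_prob(x) * mask).sum(-1) + prior.log_prob(z)
        log_w.append(log_p_xz - log_mixture_q(qs, z))               # (K, B)
    log_w = torch.cat(log_w, dim=0)                                 # (M*K, B)
    bound = torch.logsumexp(log_w, dim=0) - math.log(log_w.shape[0])
    return -bound.mean()


def impute_missing(x, mask, encoders, decoder, prior, K=50, L=20):
    """Estimate E[x_missing | x_observed] by importance resampling from the mixture."""
    with torch.no_grad():
        x_in = x * mask
        qs = [enc(x_in) for enc in encoders]
        zs, log_w = [], []
        for q_m in qs:
            z = q_m.sample((K,))
            lw = ((decoder(z).log_prob(x) * mask).sum(-1)
                  + prior.log_prob(z) - log_mixture_q(qs, z))
            zs.append(z)
            log_w.append(lw)
        zs, log_w = torch.cat(zs, dim=0), torch.cat(log_w, dim=0)   # (M*K, B, d_z), (M*K, B)
        w = torch.softmax(log_w, dim=0)                             # self-normalised weights
        idx = torch.multinomial(w.T, L, replacement=True)           # (B, L) resampled indices
        z_res = zs[idx, torch.arange(x.shape[0]).unsqueeze(-1)]     # (B, L, d_z)
        x_hat = decoder(z_res).mean.mean(dim=1)                     # average of decoder means
        return x * mask + x_hat * (1 - mask)                        # keep observed, fill missing
```

A toy invocation, with hypothetical dimensions, might look like this:

```python
# Toy usage (hypothetical sizes): M = 3 encoders, 2-D latent space, 30% missingness.
B, D, d_z, M = 8, 5, 2, 3
encoders = [SimpleEncoder(D, d_z) for _ in range(M)]
decoder = SimpleDecoder(d_z, D)
prior = td.Independent(td.Normal(torch.zeros(d_z), torch.ones(d_z)), 1)
x = torch.randn(B, D)
mask = (torch.rand(B, D) > 0.3).float()                 # 1 = observed, 0 = missing
loss = miselbo_loss(x, mask, encoders, decoder, prior)  # optimise with Adam in practice
x_imputed = impute_missing(x, mask, encoders, decoder, prior)
```

The key point of the sketch is that every encoder's samples are weighted against the pooled mixture density rather than against that encoder's own density alone, which is what distinguishes a multiple importance sampling bound from running several independent IWAE-style encoders.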
format Article
id doaj-art-a28e1acd67334d0b871b2a10db7baf8c
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-a28e1acd67334d0b871b2a10db7baf8c (indexed 2025-02-02T12:19:41Z)
Language: English
Publisher: Nature Portfolio
Series: Scientific Reports (ISSN 2045-2322)
Published: 2025-01-01; 15(1): 1-16
DOI: 10.1038/s41598-025-87641-0
Title: Unsupervised data imputation with multiple importance sampling variational autoencoders
Authors: Shenfen Kuang (School of Mathematics and Statistics, Shaoguan University); Yewen Huang (School of Electronics and Information, Guangdong Polytechnic Normal University); Jie Song (School of Mathematics and Statistics, Shaoguan University)
Abstract: see the description field above.
Subjects: Missing data; Variational autoencoders; Multiple importance sampling; Resampling
URL: https://doi.org/10.1038/s41598-025-87641-0
spellingShingle Shenfen Kuang
Yewen Huang
Jie Song
Unsupervised data imputation with multiple importance sampling variational autoencoders
Scientific Reports
Missing data
Variational autoencoders
Multiple importance sampling
Resampling
title Unsupervised data imputation with multiple importance sampling variational autoencoders
title_full Unsupervised data imputation with multiple importance sampling variational autoencoders
title_fullStr Unsupervised data imputation with multiple importance sampling variational autoencoders
title_full_unstemmed Unsupervised data imputation with multiple importance sampling variational autoencoders
title_short Unsupervised data imputation with multiple importance sampling variational autoencoders
title_sort unsupervised data imputation with multiple importance sampling variational autoencoders
topic Missing data
Variational autoencoders
Multiple importance sampling
Resampling
url https://doi.org/10.1038/s41598-025-87641-0
work_keys_str_mv AT shenfenkuang unsuperviseddataimputationwithmultipleimportancesamplingvariationalautoencoders
AT yewenhuang unsuperviseddataimputationwithmultipleimportancesamplingvariationalautoencoders
AT jiesong unsuperviseddataimputationwithmultipleimportancesamplingvariationalautoencoders