Biplot visualisations of the differences between multiple imputation techniques for simulated categorical data

Abstract Proper handling of missing data is a necessity for all data-driven research. Multiple imputation is considered a superior approach to handling missing data. This manuscript compares four ready-to-use R packages for multiple imputation of missing multivariate categorical data. The selected...

Bibliographic Details
Main Author: Johané Nienkemper-Swanepoel
Format: Article
Language: English
Published: Springer 2025-07-01
Series: Discover Data
Subjects:
Online Access: https://doi.org/10.1007/s44248-025-00063-1
author Johané Nienkemper-Swanepoel
collection DOAJ
description Abstract Proper handling of missing data is a necessity for all data-driven research. Multiple imputation is considered a superior approach to handling missing data. This manuscript compares four ready-to-use R packages for multiple imputation of missing multivariate categorical data. The selected methods provide a variety of approaches to investigate the possible effect of congenial imputation and analysis models when compared to other imputation methods. The focus is on the evaluation of multivariate visualisations of multiple imputation techniques, specifically by using multiple correspondence analysis biplots. Simulated multivariate categorical data sets are used to compare the visualisations of complete and incomplete biplot representations. An unbiased unified visualisation method, the GPAbin biplot, is used to obtain a combined multivariate visualisation of multiple imputed data sets. This visualisation approach combines configurations by aligning them with generalised orthogonal Procrustes analysis (GPA) and then applying Rubin's rules (-bin) to the aligned configurations. Biplot visualisation enables the investigation of associations between samples and variables by evaluating the discernible patterns that arise from the proximities of the coordinates. Differences between the visualisations of the various multiple imputation strategies can provide guidance on the suitability of the chosen imputation methods. Evaluation measures related to the distances between coordinates in the biplots are used to compare the visualisations and establish the performance of the four imputation methods. This manuscript shows how relevant visualisations can provide insight into, and intuition about, the appropriateness of the applied imputation approach. The findings will guide users in selecting an appropriate multiple imputation strategy based on the underlying data characteristics.
format Article
id doaj-art-1efab0f3d8ff4a86ae1c21bb2b93fdfb
institution Kabale University
issn 2731-6955
language English
publishDate 2025-07-01
publisher Springer
record_format Article
series Discover Data
spelling Johané Nienkemper-Swanepoel (Centre for Multi-Dimensional Data Visualisation (MuViSU), Department of Statistics and Actuarial Science, Stellenbosch University). Biplot visualisations of the differences between multiple imputation techniques for simulated categorical data. Discover Data (ISSN 2731-6955), Springer, 2025-07-01. https://doi.org/10.1007/s44248-025-00063-1
title Biplot visualisations of the differences between multiple imputation techniques for simulated categorical data
topic Biplots
GPAbin
Generalised orthogonal Procrustes analysis
Multiple imputation
Multiple correspondence analysis
url https://doi.org/10.1007/s44248-025-00063-1
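The abstract describes the GPAbin idea as two steps: align the biplot configurations of the multiply imputed data sets with generalised orthogonal Procrustes analysis, then pool the aligned coordinates with Rubin's rules. The following is only a minimal sketch of that idea under stated assumptions, not the article's implementation. It assumes two-dimensional configurations, aligns by translation and rotation only (no reflection or scaling, which a full orthogonal Procrustes fit would normally handle via a singular value decomposition), and uses the pooled point estimate from Rubin's rules, which is the mean across imputations. All function names (`centre`, `rotate_to`, `mean_config`, `gpa_then_pool`) are hypothetical.

```python
import math


def centre(config):
    """Translate a 2D configuration so that its centroid is at the origin."""
    n = len(config)
    mx = sum(x for x, _ in config) / n
    my = sum(y for _, y in config) / n
    return [(x - mx, y - my) for x, y in config]


def rotate_to(config, target):
    """Rotate config onto target with the least-squares 2D rotation.

    Assumption: rotation only. Reflections and scaling are not considered.
    """
    num = sum(x * ty - y * tx for (x, y), (tx, ty) in zip(config, target))
    den = sum(x * tx + y * ty for (x, y), (tx, ty) in zip(config, target))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in config]


def mean_config(configs):
    """Coordinate-wise mean of several configurations of equal size."""
    m = len(configs)
    return [
        (sum(cfg[i][0] for cfg in configs) / m,
         sum(cfg[i][1] for cfg in configs) / m)
        for i in range(len(configs[0]))
    ]


def gpa_then_pool(configs, iterations=20):
    """Iteratively align configurations to their consensus, then pool.

    The returned consensus is the mean of the aligned configurations,
    i.e. the pooled point estimate under Rubin's rules.
    """
    aligned = [centre(cfg) for cfg in configs]
    consensus = aligned[0]  # start from the first configuration
    for _ in range(iterations):
        aligned = [rotate_to(cfg, consensus) for cfg in aligned]
        consensus = mean_config(aligned)
    return aligned, consensus


# Illustrative data only: three made-up "imputed" configurations of 4 points.
imputed = [
    [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)],
    [(0.1, 0.0), (2.0, 0.1), (1.9, 1.0), (0.0, 0.9)],
    [(0.0, 0.1), (0.0, 2.1), (-1.0, 2.0), (-1.0, 0.0)],
]
_, pooled = gpa_then_pool(imputed)
for point in pooled:
    print(point)
```

The pooled configuration can then be drawn as a single biplot. Only the point estimate is pooled here; the between-imputation variability that Rubin's rules also quantify is not computed in this sketch.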