Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

Bibliographic Details
Main Authors: Sander MJ van Kuijk, Wolfgang Viechtbauer, Louis L Peeters, Luc Smits
Format: Article
Language: English
Published: Milano University Press 2016-03-01
Series: Epidemiology, Biostatistics and Public Health
Online Access: http://ebph.it/article/view/11598
author Sander MJ van Kuijk
Wolfgang Viechtbauer
Louis L Peeters
Luc Smits
collection DOAJ
description <p><strong>Background</strong></p><p>The purpose of this simulation study is to assess the performance of multiple imputation compared with complete case analysis when assumptions about the missing data mechanism are violated.</p><p><strong>Methods</strong></p><p>The authors performed a stochastic simulation study to assess the performance of complete case (CC) analysis and multiple imputation (MI) under different missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The study focused on the point estimation of regression coefficients and their standard errors.</p><p><strong>Results</strong></p><p>When data were MAR conditional on the outcome Y, CC analysis resulted in biased regression coefficients, all of which were underestimated in the authors' scenarios. In these scenarios, analysis after MI gave unbiased estimates. However, when data were MNAR, MI yielded biased regression coefficients, whereas CC analysis performed well.</p><p><strong>Conclusion</strong></p><p>The authors demonstrated that MI was superior to CC analysis only in the case of MCAR or MAR. In some scenarios, CC analysis may be superior to MI. Often it is not feasible to identify why data in a given dataset are missing. Therefore, emphasis should be put on reporting the extent of missing values, the method used to address them, and the assumptions made about the mechanism that caused the missing data.</p>
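The abstract compares CC analysis and MI under MCAR, MAR and MNAR without showing how such a comparison is set up. The Python sketch below is not the authors' simulation code; it is a minimal illustration under assumed settings: a single covariate x with outcome y = 1 + x + noise, missing values in x only, logistic missingness driven by y (MAR) or by x itself (MNAR), and a hypothetical `strength` parameter controlling how strongly missingness depends on the data. MI is implemented here as normal linear-model imputation of x given y with parameter draws, pooled with Rubin's rules; this is one common choice, not necessarily the one used in the study. All function names, sample sizes and the seed are illustrative.

```python
import numpy as np


def ols(design, response):
    """Return OLS coefficients, residual sum of squares, and (X'X)^-1."""
    xtx_inv = np.linalg.inv(design.T @ design)
    coef = xtx_inv @ design.T @ response
    resid = response - design @ coef
    return coef, resid @ resid, xtx_inv


def slope_and_se(x, y):
    """Slope of y regressed on x (with intercept) and its model-based SE."""
    design = np.column_stack([np.ones_like(x), x])
    coef, rss, xtx_inv = ols(design, y)
    sigma2 = rss / (len(y) - 2)
    return coef[1], np.sqrt(sigma2 * xtx_inv[1, 1])


def simulate(n, mechanism, rng, beta1=1.0, strength=2.0):
    """Generate y = 1 + beta1*x + noise and make some x values missing (NaN).

    The logistic mechanisms and `strength` are assumptions for illustration.
    """
    x = rng.normal(size=n)
    y = 1.0 + beta1 * x + rng.normal(size=n)
    if mechanism == "MCAR":  # unrelated to any data
        p_miss = np.full(n, 0.5)
    elif mechanism == "MAR":  # depends only on the fully observed outcome y
        p_miss = 1.0 / (1.0 + np.exp(-strength * (y - 1.0)))
    elif mechanism == "MNAR":  # depends on the (unobserved) value of x itself
        p_miss = 1.0 / (1.0 + np.exp(-strength * x))
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    x_obs = np.where(rng.random(n) < p_miss, np.nan, x)
    return x_obs, y


def complete_case(x_obs, y):
    """Drop rows with missing x, then fit y on x."""
    keep = ~np.isnan(x_obs)
    return slope_and_se(x_obs[keep], y[keep])


def multiple_imputation(x_obs, y, m, rng):
    """Impute x from a normal linear model of x given y; pool with Rubin's rules."""
    miss = np.isnan(x_obs)
    design = np.column_stack([np.ones_like(y), y])
    coef, rss, xtx_inv = ols(design[~miss], x_obs[~miss])
    df = int((~miss).sum()) - 2
    estimates, variances = [], []
    for _ in range(m):
        # Draw imputation-model parameters so imputations reflect their uncertainty.
        sigma2_star = rss / rng.chisquare(df)
        coef_star = rng.multivariate_normal(coef, sigma2_star * xtx_inv)
        x_imp = x_obs.copy()
        noise = rng.normal(scale=np.sqrt(sigma2_star), size=int(miss.sum()))
        x_imp[miss] = design[miss] @ coef_star + noise
        est, se = slope_and_se(x_imp, y)
        estimates.append(est)
        variances.append(se ** 2)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    total = within + (1.0 + 1.0 / m) * between
    return np.mean(estimates), np.sqrt(total)


def run(mechanism, reps=200, n=1000, m=10, seed=2016):
    """Average the CC and MI slope estimates over repeated simulated datasets."""
    rng = np.random.default_rng(seed)
    cc, mi = [], []
    for _ in range(reps):
        x_obs, y = simulate(n, mechanism, rng)
        cc.append(complete_case(x_obs, y)[0])
        mi.append(multiple_imputation(x_obs, y, m, rng)[0])
    return float(np.mean(cc)), float(np.mean(mi))


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        cc_mean, mi_mean = run(mech)
        print(f"{mech}: CC slope {cc_mean:.3f}, MI slope {mi_mean:.3f} (true 1.0)")
```

Under these assumptions one would expect the mean CC slope to fall below the true value of 1 in the MAR scenario (selection on the outcome attenuates the slope) and the mean MI slope to drift away from 1 in the MNAR scenario, which is the pattern the abstract reports; the exact numbers depend on the arbitrary settings above.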
format Article
id doaj-art-0734afceec1342f29aa429577b72778d
institution OA Journals
issn 2282-0930
language English
publishDate 2016-03-01
publisher Milano University Press
series Epidemiology, Biostatistics and Public Health
affiliation Sander MJ van Kuijk: Department of Clinical Epidemiology and Medical Technology Assessment, Maastricht University Medical Centre, and Department of Epidemiology, Maastricht University, Maastricht, the Netherlands.
Wolfgang Viechtbauer: Department of Statistics and Methodology, Maastricht University, Maastricht, the Netherlands.
Louis L Peeters: Department of Obstetrics & Gynaecology, University Medical Centre Utrecht, Utrecht, the Netherlands.
Luc Smits: Department of Epidemiology, Maastricht University, Maastricht, the Netherlands.
volume 13
issue 1
doi 10.2427/11598
title Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study
url http://ebph.it/article/view/11598