Effects of missing data imputation methods on univariate blood pressure time series data analysis and forecasting with ARIMA and LSTM

Bibliographic Details
Main Authors: Nicholas Niako, Jesus D. Melgarejo, Gladys E. Maestre, Kristina P. Vatcheva
Format: Article
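Language: English
Published: BMC 2024-12-01
Series: BMC Medical Research Methodology
Subjects: Univariate time series; Missing data imputation; ARIMA; LSTM; Forecasting; Ambulatory blood pressure
Online Access: https://doi.org/10.1186/s12874-024-02448-3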
_version_ 1846101080729976832
author Nicholas Niako
Jesus D. Melgarejo
Gladys E. Maestre
Kristina P. Vatcheva
collection DOAJ
description Abstract
Background: Missing observations are common in real-life univariate time series and disrupt the analysis. Imputing the missing values is therefore a necessary step for any incomplete univariate time series. Most existing studies focus on comparing the distributions of imputed data. Little is known about how different imputation methods for univariate time series affect the forecasting performance of time series models. We evaluated the prediction performance of autoregressive integrated moving average (ARIMA) and long short-term memory (LSTM) network models on time series data imputed with ten different imputation techniques.
Methods: Missing values were generated under the missing completely at random (MCAR) mechanism at 10%, 15%, 25%, and 35% rates of missingness from complete 24-h ambulatory diastolic blood pressure readings. The mean, Kalman filtering, linear, spline, and Stineman interpolation, exponentially weighted moving average (EWMA), simple moving average (SMA), k-nearest neighbors (KNN), and last-observation-carried-forward (LOCF) imputation techniques were compared in terms of their effect on the time series structure and on the prediction performance of the LSTM and ARIMA models fitted to imputed and original data.
Results: All imputation techniques either increased or decreased the autocorrelation of the data and thereby affected the forecasting performance of the ARIMA and LSTM algorithms. The best imputation technique did not guarantee better predictions on the imputed data. Mean imputation, LOCF, KNN, Stineman, and cubic spline interpolation performed better at low rates of missingness. Imputation with EWMA and Kalman filtering performed consistently across all missingness scenarios. Regardless of the imputation method, the LSTM achieved slightly better predictive accuracy among the best-performing ARIMA and LSTM models; otherwise, the results varied. In our small sample, ARIMA tended to perform better on data with higher autocorrelation.
Conclusions: We recommend that researchers consider Kalman smoothing, interpolation techniques (linear, spline, and Stineman), and moving average techniques (SMA and EWMA) for imputing univariate time series data, as they perform well with respect to both data distribution and forecasting with ARIMA and LSTM models. The LSTM slightly outperforms ARIMA models; however, for small samples, ARIMA is simpler and faster to execute.
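The abstract's setup (MCAR deletion at several rates, imputation, then a check of how autocorrelation shifts) can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration: the synthetic series is not blood pressure data from the study, the function names are hypothetical, and only three of the simpler techniques (mean, LOCF, linear interpolation) are shown. It is not the authors' code.

```python
import math
import random
import statistics


def make_mcar(series, rate, seed=0):
    """Copy `series` and set about `rate` of its interior points to None (MCAR)."""
    rng = random.Random(seed)
    n_missing = round(rate * len(series))
    # Assumption: keep the first and last points so every gap has two neighbours.
    missing_idx = rng.sample(range(1, len(series) - 1), n_missing)
    out = list(series)
    for i in missing_idx:
        out[i] = None
    return out


def impute_mean(series):
    """Replace each missing value with the mean of the observed values."""
    mean = statistics.fmean(v for v in series if v is not None)
    return [mean if v is None else v for v in series]


def impute_locf(series):
    """Last observation carried forward (assumes the first value is observed)."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out


def impute_linear(series):
    """Fill each gap on the straight line between its observed neighbours."""
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = series[left] + step * (i - left)
    return out


def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a complete series."""
    mean = statistics.fmean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


# Synthetic stand-in for a 24-h record: a slow cycle plus noise (illustrative only).
noise = random.Random(42)
complete = [75 + 8 * math.sin(2 * math.pi * t / 48) + noise.gauss(0, 3)
            for t in range(96)]

print(f"original lag-1 autocorrelation: {lag1_autocorr(complete):.3f}")
for rate in (0.10, 0.15, 0.25, 0.35):  # missingness rates named in the abstract
    gappy = make_mcar(complete, rate, seed=1)
    for name, impute in (("mean", impute_mean), ("LOCF", impute_locf),
                         ("linear", impute_linear)):
        print(f"rate={rate:.2f} {name:<6} "
              f"lag-1 autocorrelation: {lag1_autocorr(impute(gappy)):.3f}")
```

Comparing each printed value with the original gives a rough sense of how an imputation method can shift the autocorrelation structure that a forecasting model would later be fitted to.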
format Article
id doaj-art-d64214ffafd64f3988d465fa8fcf1807
institution Kabale University
issn 1471-2288
language English
publishDate 2024-12-01
publisher BMC
record_format Article
series BMC Medical Research Methodology
spelling doaj-art-d64214ffafd64f3988d465fa8fcf1807, 2024-12-29T12:37:13Z, eng
BMC Medical Research Methodology (BMC, ISSN 1471-2288), 2024-12-01, vol. 24, no. 1, pp. 1-32, https://doi.org/10.1186/s12874-024-02448-3
Author affiliations:
Nicholas Niako: School of Mathematical & Statistical Sciences, University of Texas Rio Grande Valley, One West University Boulevard
Jesus D. Melgarejo: Rio Grande Valley Alzheimer’s Disease Resource Center for Minority Aging Research (RGV AD-RCMAR), The University of Texas Rio Grande Valley School of Medicine
Gladys E. Maestre: Rio Grande Valley Alzheimer’s Disease Resource Center for Minority Aging Research (RGV AD-RCMAR), The University of Texas Rio Grande Valley School of Medicine
Kristina P. Vatcheva: School of Mathematical & Statistical Sciences, University of Texas Rio Grande Valley, One West University Boulevard
title Effects of missing data imputation methods on univariate blood pressure time series data analysis and forecasting with ARIMA and LSTM
topic Univariate time series
Missing data imputation
ARIMA
LSTM
Forecasting
Ambulatory blood pressure
url https://doi.org/10.1186/s12874-024-02448-3