Effects of missing data imputation methods on univariate blood pressure time series data analysis and forecasting with ARIMA and LSTM
Abstract. Background: Missing observations within univariate time series are common in real life and cause problems throughout the analysis. Imputation of missing values is an inevitable step in the analysis of any incomplete univariate time series. Most of the existing studies focus on comparing...
| Main Authors: | Nicholas Niako, Jesus D. Melgarejo, Gladys E. Maestre, Kristina P. Vatcheva |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2024-12-01 |
| Series: | BMC Medical Research Methodology |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s12874-024-02448-3 |
| author | Nicholas Niako, Jesus D. Melgarejo, Gladys E. Maestre, Kristina P. Vatcheva |
|---|---|
| collection | DOAJ |
| description | Background: Missing observations within univariate time series are common in real life and cause problems throughout the analysis. Imputation of missing values is an inevitable step in the analysis of any incomplete univariate time series. Most existing studies focus on comparing the distributions of imputed data, leaving a knowledge gap regarding how different imputation methods for univariate time series affect the forecasting performance of time series models. We evaluated the prediction performance of autoregressive integrated moving average (ARIMA) and long short-term memory (LSTM) network models on time series data imputed with ten different techniques. Methods: Missing values were generated under the missing completely at random (MCAR) mechanism at missingness rates of 10%, 15%, 25%, and 35%, using complete 24-h ambulatory diastolic blood pressure readings. The mean, Kalman filtering, linear, spline, and Stineman interpolation, exponentially weighted moving average (EWMA), simple moving average (SMA), k-nearest neighbors (KNN), and last-observation-carried-forward (LOCF) imputation techniques were compared on imputed and original data with respect to their effect on the time series structure and on the prediction performance of the LSTM and ARIMA models. Results: All imputation techniques either increased or decreased the autocorrelation of the data and thereby affected the forecasting performance of the ARIMA and LSTM algorithms. The best imputation technique did not guarantee better predictions on the imputed data. Mean imputation, LOCF, KNN, and Stineman and cubic spline interpolation performed better at low rates of missingness. Imputation with EWMA and Kalman filtering yielded consistent performance across all missingness scenarios. Regardless of the imputation method, the best-performing LSTM models achieved slightly better predictive accuracy than the best-performing ARIMA models; otherwise, the results varied. In our small sample, ARIMA tended to perform better on data with higher autocorrelation. Conclusions: We recommend that researchers consider Kalman smoothing, interpolation (linear, spline, and Stineman), and moving average (SMA and EWMA) techniques for imputing univariate time series data, as they perform well with respect to both the data distribution and forecasting with ARIMA and LSTM models. LSTM slightly outperforms ARIMA; however, for small samples, ARIMA is simpler and faster to execute. |
| format | Article |
| id | doaj-art-d64214ffafd64f3988d465fa8fcf1807 |
| institution | Kabale University |
| issn | 1471-2288 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Medical Research Methodology |
| affiliation | Nicholas Niako and Kristina P. Vatcheva: School of Mathematical & Statistical Sciences, University of Texas Rio Grande Valley, One West University Boulevard; Jesus D. Melgarejo and Gladys E. Maestre: Rio Grande Valley Alzheimer’s Disease Resource Center for Minority Aging Research (RGV AD-RCMAR), The University of Texas Rio Grande Valley School of Medicine |
| title | Effects of missing data imputation methods on univariate blood pressure time series data analysis and forecasting with ARIMA and LSTM |
| topic | Univariate time series; Missing data imputation; ARIMA; LSTM; Forecasting; Ambulatory blood pressure |
| url | https://doi.org/10.1186/s12874-024-02448-3 |
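
The abstract's central observation, that every imputation method shifts the autocorrelation of the series, can be illustrated with a small sketch. The Python code below is not the authors' implementation and does not reproduce their results. It generates a synthetic AR(1) series as a hypothetical stand-in for 24-h diastolic blood pressure readings (the coefficient 0.8, mean 75 mmHg, noise level, random seeds, and window size k=4 are all illustrative assumptions), removes values under MCAR at the four rates used in the study, fills the gaps with simplified versions of five of the named techniques (mean, LOCF, linear interpolation, SMA, and an EWMA-style weighting that halves with each step away from the gap), and reports the lag-1 autocorrelation. Kalman smoothing, spline and Stineman interpolation, KNN, and the ARIMA/LSTM forecasting step are omitted.

```python
import bisect
import random


def make_ar1_series(n, phi=0.8, mu=75.0, sigma=5.0, seed=1):
    """Hypothetical stand-in for a diastolic BP series: AR(1) noise around mu (mmHg)."""
    rng = random.Random(seed)
    x = [mu]
    for _ in range(n - 1):
        x.append(mu + phi * (x[-1] - mu) + rng.gauss(0.0, sigma))
    return x


def mask_mcar(series, rate, seed=2):
    """MCAR: positions are removed uniformly at random, independent of their values."""
    rng = random.Random(seed)
    missing = set(rng.sample(range(len(series)), round(rate * len(series))))
    return [None if i in missing else v for i, v in enumerate(series)]


def impute_mean(s):
    """Replace every gap with the mean of the observed values."""
    observed = [v for v in s if v is not None]
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in s]


def impute_locf(s):
    """Last observation carried forward; leading gaps take the first observed value."""
    last = next(v for v in s if v is not None)
    out = []
    for v in s:
        if v is not None:
            last = v
        out.append(last)
    return out


def impute_linear(s):
    """Straight-line interpolation between the nearest observed neighbours."""
    obs = [i for i, v in enumerate(s) if v is not None]
    out = list(s)
    for i, v in enumerate(s):
        if v is not None:
            continue
        pos = bisect.bisect_left(obs, i)  # obs[pos] is the first observed index after i
        left = obs[pos - 1] if pos > 0 else None
        right = obs[pos] if pos < len(obs) else None
        if left is None:
            out[i] = s[right]
        elif right is None:
            out[i] = s[left]
        else:
            w = (i - left) / (right - left)
            out[i] = s[left] + w * (s[right] - s[left])
    return out


def impute_moving_average(s, k=4, weighting="simple"):
    """Weighted mean of observed values within +/-k positions (equal or halving weights).
    Simplification: the window widens until it contains at least one observation."""
    if all(v is None for v in s):
        raise ValueError("series has no observed values")
    out, n = list(s), len(s)
    for i, v in enumerate(s):
        if v is not None:
            continue
        width, nbrs = k, []
        while not nbrs:
            nbrs = [(j, s[j]) for j in range(max(0, i - width), min(n, i + width + 1))
                    if s[j] is not None]
            width += 1
        if weighting == "exponential":
            weights = [0.5 ** abs(i - j) for j, _ in nbrs]  # weight halves per step away
        else:
            weights = [1.0] * len(nbrs)
        out[i] = sum(w * x for w, (_, x) in zip(weights, nbrs)) / sum(weights)
    return out


def acf1(x):
    """Sample lag-1 autocorrelation."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


if __name__ == "__main__":
    complete = make_ar1_series(500)
    methods = {"mean": impute_mean, "LOCF": impute_locf, "linear": impute_linear,
               "SMA": impute_moving_average,
               "EWMA": lambda s: impute_moving_average(s, weighting="exponential")}
    print(f"complete series: lag-1 ACF = {acf1(complete):.3f}")
    for rate in (0.10, 0.15, 0.25, 0.35):
        incomplete = mask_mcar(complete, rate)
        row = ", ".join(f"{name} {acf1(f(incomplete)):.3f}" for name, f in methods.items())
        print(f"{rate:.0%} missing -> {row}")
```

Running the script prints the lag-1 autocorrelation of the complete series and then one line per missingness rate. Under these assumptions one would generally expect mean imputation to pull the value toward zero and LOCF or interpolation to push it upward, but the exact numbers depend on the assumed seed and parameters.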