Predictive interpretable analytics models for forecasting healthcare costs using open healthcare data

Healthcare expenditure, a considerable proportion of national budgets, has risen rapidly. Consequently, considerable research is devoted to controlling healthcare costs. Many efforts are underway to improve medical price transparency. Price transparency will help patients become better informed, all...

Bibliographic Details
Main Authors: A. Ravishankar Rao, Raunak Jain, Mrityunjai Singh, Rahul Garg
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: Healthcare Analytics
Online Access: http://www.sciencedirect.com/science/article/pii/S2772442524000534
Collection: DOAJ
Description: Healthcare expenditure, a considerable proportion of national budgets, has risen rapidly. Consequently, considerable research is devoted to controlling healthcare costs. Many efforts are underway to improve medical price transparency. Price transparency will help patients become better informed, allowing them to shop for care they can afford, eventually leading to efficiency in healthcare markets. This first requires medical pricing data to be made available publicly. Since the raw pricing data can be large and cover multiple conditions, it is necessary to provide an engine to process the data to facilitate its usage and understanding. We recommend creating computational models that predict healthcare costs for various patient conditions and demographics. Patients and providers can interrogate the underlying data to understand how healthcare costs vary with medical conditions and demographic variables of interest, including age. We demonstrate our approach by creating predictive models using recent machine learning techniques. We analyzed anonymous patient data from the New York State Statewide Planning and Research Cooperative System, consisting of 2.34 million records from 2019. We built models to predict costs from over two dozen patient variables, including diagnosis codes, severity of illness, age, and other demographic variables. We investigated three models: regression, decision trees, and random forests. These models are explainable. We analyzed features to determine those that were predictive of total costs. We determined that the diagnosis code, severity of illness, and length of stay were good predictors of total costs, whereas race and gender were not useful in predicting total costs. We obtained the best performance using a CatBoost regressor, which yielded an R² score of 0.85, better than the values reported in the literature.
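The R² score quoted in the description is the coefficient of determination, which compares a model's squared prediction error with the variance of the observed values. A minimal sketch of that calculation follows; the function name `r2_score` and the cost figures are hypothetical illustrations and are not taken from the article or from the SPARCS data.

```python
from statistics import mean


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    avg = mean(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean cost.
    ss_tot = sum((a - avg) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are identical")
    return 1 - ss_res / ss_tot


# Made-up costs in dollars, for illustration only (not SPARCS data).
actual_costs = [1000.0, 2000.0, 3000.0, 4000.0]
predicted_costs = [1100.0, 1900.0, 3200.0, 3800.0]
score = r2_score(actual_costs, predicted_costs)
print(round(score, 3))
```

A score of 1 means the predictions match the observed costs exactly, while a score of 0 means the model does no better than always predicting the mean cost.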
ISSN: 2772-4425
DOI: 10.1016/j.health.2024.100351
Volume: 6, Article 100351
Author Affiliations: A. Ravishankar Rao (Fairleigh Dickinson University, Teaneck, NJ, USA; corresponding author); Raunak Jain, Mrityunjai Singh, Rahul Garg (Indian Institute of Technology, Delhi, India)
Subjects: Machine learning; Artificial intelligence; Predictive analytics; Health informatics; Regression; Catboost