Evaluating healthcare quality and inequities using generative AI: a simulation study of potentially inappropriate medication among older adults analyzed via the framework analysis of individual heterogeneity and discriminatory accuracy (AIHDA)

Abstract. Background: Regular monitoring of healthcare quality and equity is crucial for informing decision-makers and clinicians. This study explores the application of generative AI, more specifically large language models (LLMs), to facilitate standardized monitoring of healthcare quality using the...

Bibliographic Details
Main Authors: Johan Öberg, Raquel Perez-Vicente, Martin Lindström, Patrik Midlöv, Juan Merlo
Format: Article
Language: English
Published: Springer 2025-07-01
Series: Discover Artificial Intelligence
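Subjects: Social epidemiology; Health care quality assessment; Health services evaluation; Epidemiological methods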
Online Access: https://doi.org/10.1007/s44163-025-00444-0
author Johan Öberg
Raquel Perez-Vicente
Martin Lindström
Patrik Midlöv
Juan Merlo
collection DOAJ
description Abstract. Background: Regular monitoring of healthcare quality and equity is crucial for informing decision-makers and clinicians. This study explores the application of generative AI, more specifically large language models (LLMs), to facilitate standardized monitoring of healthcare quality using the established framework Analysis of Individual Heterogeneity and Discriminatory Accuracy (AIHDA). The study investigates whether a customized GPT can effectively apply the AIHDA framework to assess healthcare quality in a simulated dataset. Population and methods: Using simulated data modelled on real-world healthcare information, we evaluated the quality indicator of potentially inappropriate medication (PIM). A customized GPT built on ChatGPT 4o was prompted according to the TREF principle (Task, Requirement, Expectation, Format) to perform the analysis. Results were compared with a traditional analysis performed in Stata to evaluate accuracy and reliability. Results: The GPT successfully conducted the AIHDA analysis, producing results equal to those of the Stata analysis. The GPT provides useful visualizations and structured reports, as well as interactive dialog with the end user in real time. However, the results varied in some iterations of the analysis, highlighting potential issues with reliability. The analysis requires close supervision, as the GPT presents both errors and correct results with confidence. Conclusions: Generative AI and LLMs show promise in supporting standardized monitoring of healthcare quality and equity using the AIHDA framework. They enable accessible analysis but require oversight to address limitations such as occasional inaccuracies. Future, more reliable LLMs and local deployment on secure servers may further enhance their utility for routine healthcare monitoring.
format Article
id doaj-art-6e5d9f2052a84be3bcfd2a7fc2bd974d
institution DOAJ
issn 2731-0809
language English
publishDate 2025-07-01
publisher Springer
record_format Article
series Discover Artificial Intelligence
spelling doaj-art-6e5d9f2052a84be3bcfd2a7fc2bd974d
2025-08-20T03:05:10Z
eng
Springer
Discover Artificial Intelligence
2731-0809
2025-07-01
51120
10.1007/s44163-025-00444-0
Johan Öberg, Unit for Social Epidemiology, Faculty of Medicine, Lund University
Raquel Perez-Vicente, Unit for Social Epidemiology, Faculty of Medicine, Lund University
Martin Lindström, Centre for Primary Health Care Research, Region Skåne
Patrik Midlöv, Centre for Primary Health Care Research, Region Skåne
Juan Merlo, Unit for Social Epidemiology, Faculty of Medicine, Lund University
https://doi.org/10.1007/s44163-025-00444-0
Social epidemiology
Health care quality assessment
Health services evaluation
Epidemiological methods
title Evaluating healthcare quality and inequities using generative AI: a simulation study of potentially inappropriate medication among older adults analyzed via the framework analysis of individual heterogeneity and discriminatory accuracy (AIHDA)
topic Social epidemiology
Health care quality assessment
Health services evaluation
Epidemiological methods
url https://doi.org/10.1007/s44163-025-00444-0