Exploring the medical ethical limitations of GPT-4 in clinical decision-making scenarios: a pilot survey

Background: This study aims to examine GPT-4’s tendencies when confronted with ethical dilemmas and to ascertain its ethical limitations in clinical decision-making. Methods: Ethical dilemmas were synthesized and organized into 10 different scenarios. To assess the respon...

Bibliographic Details
Main Authors:	Yu-Tao Xiong, Yu-Min Zeng, Hao-Nan Liu, Ya-Nan Sun, Wei Tang, Chang Liu
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2025-05-01
Series:	Frontiers in Public Health
Subjects:	artificial intelligence; GPT-4; ethical medical; decision-making; LLMs
Online Access:	https://www.frontiersin.org/articles/10.3389/fpubh.2025.1582377/full

_version_	1849762269958766592
author	Yu-Tao Xiong Yu-Min Zeng Hao-Nan Liu Ya-Nan Sun Wei Tang Chang Liu
author_facet	Yu-Tao Xiong Yu-Min Zeng Hao-Nan Liu Ya-Nan Sun Wei Tang Chang Liu
author_sort	Yu-Tao Xiong
collection	DOAJ
description	Background: This study aims to examine GPT-4’s tendencies when confronted with ethical dilemmas and to ascertain its ethical limitations in clinical decision-making. Methods: Ethical dilemmas were synthesized and organized into 10 different scenarios. To assess GPT-4’s responses to these dilemmas, a series of iterative and constrained prompting methods was employed. A custom questionnaire analysis and a principle adherence analysis were used to evaluate the GPT-4-generated responses. The questionnaire analysis assessed GPT-4’s ability to provide clinical decision-making recommendations, while the principle adherence analysis evaluated its alignment with ethical principles. An error analysis was also conducted on the GPT-4-generated responses. Results: The questionnaire analysis (5-point Likert scale) showed GPT-4 achieving an average score of 4.49, with the highest score in the Physical Disability scenario (4.75) and the lowest in the Abortion/Surrogacy scenario (3.82). Furthermore, the principle adherence analysis showed that GPT-4 achieved an overall consistency rate of 86%, with slightly lower performance (60%) observed in a few specific scenarios. Conclusion: At the current stage, with appropriate prompting techniques, GPT-4 can offer proactive and comprehensible recommendations for clinical decision-making. However, GPT-4 exhibits certain errors during this process, leading to inconsistencies with ethical principles and thereby limiting its deeper application in clinical practice.
format	Article
id	doaj-art-bceb49de3cb54be595eef742d3ec0334
institution	DOAJ
issn	2296-2565
language	English
publishDate	2025-05-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Public Health
spelling	doaj-art-bceb49de3cb54be595eef742d3ec0334; 2025-08-20T03:05:46Z; eng; Frontiers Media S.A.; Frontiers in Public Health; 2296-2565; 2025-05-01; 13; 10.3389/fpubh.2025.1582377; 1582377; Exploring the medical ethical limitations of GPT-4 in clinical decision-making scenarios: a pilot survey; Yu-Tao Xiong (0); Yu-Min Zeng (1); Hao-Nan Liu (2); Ya-Nan Sun (3); Wei Tang (4); Chang Liu (5); (0, 1, 2, 4, 5) State Key Laboratory of Oral Diseases and National Center for Stomatology and National Clinical Research Center for Oral Diseases and Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China; (3) Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China; Background: This study aims to examine GPT-4’s tendencies when confronted with ethical dilemmas and to ascertain its ethical limitations in clinical decision-making. Methods: Ethical dilemmas were synthesized and organized into 10 different scenarios. To assess GPT-4’s responses to these dilemmas, a series of iterative and constrained prompting methods was employed. A custom questionnaire analysis and a principle adherence analysis were used to evaluate the GPT-4-generated responses. The questionnaire analysis assessed GPT-4’s ability to provide clinical decision-making recommendations, while the principle adherence analysis evaluated its alignment with ethical principles. An error analysis was also conducted on the GPT-4-generated responses. Results: The questionnaire analysis (5-point Likert scale) showed GPT-4 achieving an average score of 4.49, with the highest score in the Physical Disability scenario (4.75) and the lowest in the Abortion/Surrogacy scenario (3.82). Furthermore, the principle adherence analysis showed that GPT-4 achieved an overall consistency rate of 86%, with slightly lower performance (60%) observed in a few specific scenarios. Conclusion: At the current stage, with appropriate prompting techniques, GPT-4 can offer proactive and comprehensible recommendations for clinical decision-making. However, GPT-4 exhibits certain errors during this process, leading to inconsistencies with ethical principles and thereby limiting its deeper application in clinical practice.; https://www.frontiersin.org/articles/10.3389/fpubh.2025.1582377/full; artificial intelligence; GPT-4; ethical medical; decision-making; LLMs
spellingShingle	Yu-Tao Xiong Yu-Min Zeng Hao-Nan Liu Ya-Nan Sun Wei Tang Chang Liu Exploring the medical ethical limitations of GPT-4 in clinical decision-making scenarios: a pilot survey Frontiers in Public Health artificial intelligence GPT-4 ethical medical decision-making LLMs
title	Exploring the medical ethical limitations of GPT-4 in clinical decision-making scenarios: a pilot survey
title_full	Exploring the medical ethical limitations of GPT-4 in clinical decision-making scenarios: a pilot survey
title_fullStr	Exploring the medical ethical limitations of GPT-4 in clinical decision-making scenarios: a pilot survey
title_full_unstemmed	Exploring the medical ethical limitations of GPT-4 in clinical decision-making scenarios: a pilot survey
title_short	Exploring the medical ethical limitations of GPT-4 in clinical decision-making scenarios: a pilot survey
title_sort	exploring the medical ethical limitations of gpt 4 in clinical decision making scenarios a pilot survey
topic	artificial intelligence; GPT-4; ethical medical; decision-making; LLMs
url	https://www.frontiersin.org/articles/10.3389/fpubh.2025.1582377/full
work_keys_str_mv	AT yutaoxiong exploringthemedicalethicallimitationsofgpt4inclinicaldecisionmakingscenariosapilotsurvey AT yuminzeng exploringthemedicalethicallimitationsofgpt4inclinicaldecisionmakingscenariosapilotsurvey AT haonanliu exploringthemedicalethicallimitationsofgpt4inclinicaldecisionmakingscenariosapilotsurvey AT yanansun exploringthemedicalethicallimitationsofgpt4inclinicaldecisionmakingscenariosapilotsurvey AT weitang exploringthemedicalethicallimitationsofgpt4inclinicaldecisionmakingscenariosapilotsurvey AT changliu exploringthemedicalethicallimitationsofgpt4inclinicaldecisionmakingscenariosapilotsurvey
