Assessing Accuracy of Chat Generative Pre-Trained Transformer’s Responses to Common Patient Questions Regarding Congenital Upper Limb Differences

Purpose: The purpose was to assess the ability of Chat Generative Pre-Trained Transformer (ChatGPT) 4.0 to accurately and reliably answer patients’ frequently asked questions (FAQs) about congenital upper limb differences (CULDs) and their treatment options. Methods: Two pediatric hand surgeons were...

Bibliographic Details
Main Authors: Niklaus P. Zeller, BS, Ayush D. Shah, BA, Ann E. Van Heest, MD, Deborah C. Bohn, MD
Format: Article
Language: English
Published: Elsevier 2025-07-01
Series: Journal of Hand Surgery Global Online
Subjects: Congenital hand; Congenital upper limb differences; Patient education; Pediatric hand
Online Access: http://www.sciencedirect.com/science/article/pii/S2589514125000842
_version_ 1850237578115022848
author Niklaus P. Zeller, BS
Ayush D. Shah, BA
Ann E. Van Heest, MD
Deborah C. Bohn, MD
collection DOAJ
description Purpose: The purpose was to assess the ability of Chat Generative Pre-Trained Transformer (ChatGPT) 4.0 to accurately and reliably answer patients’ frequently asked questions (FAQs) about congenital upper limb differences (CULDs) and their treatment options. Methods: Two pediatric hand surgeons were queried regarding FAQs they receive from parents about CULDs. Sixteen FAQs were input to ChatGPT-4.0 for the following conditions: (1) syndactyly, (2) polydactyly, (3) radial longitudinal deficiency, (4) thumb hypoplasia, and (5) general congenital hand differences. Two additional psychosocial care questions were queried, and all responses were graded by the surgeons using a scale of 1–4, based on the quality of the response. Independent chats were used for each question to reduce memory–retention bias with no pretraining of the software application. Results: Overall, ChatGPT provided relatively reliable, evidence-based responses to the 16 queried FAQs. In total, 164 grades were assigned to the 82 ChatGPT responses: 83 (51%) did not require any clarification, 37 (23%) required minimal clarification, 32 (20%) required moderate clarification, and 13 (8%) received an unsatisfactory rating. However, there was considerable variability in the depth of many responses. When queried on medical associations with syndactyly and polydactyly, ChatGPT provided a detailed account of associated syndromes, although there was no mention that syndromic involvement is relatively rare. Furthermore, ChatGPT recommended that the patients consult a health care provider for individualized care 81 times in 49 responses. It commonly “referred” patients to genetic counselors (n = 26, 32%), followed by pediatric orthopedic surgeons and orthopedic surgeons (n = 16, 20%), and hand surgeons (n = 9, 11%). Conclusions: Chat Generative Pre-Trained Transformer provided evidence-based responses not requiring clarification to a majority of FAQs about CULDs. 
However, there was considerable variation across the responses, and it rarely “referred” patients to hand surgeons. As new tools for patient education, ChatGPT and similar large language models should be approached cautiously by those seeking information about CULDs. Responses do not consistently provide comprehensive, individualized information. Eight percent of responses were misleading. Type of study/level of evidence: Economic/decision analysis IIC.
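The percentages in the Results can be reproduced from the reported grade counts. The following is a minimal Python sketch, assuming each percentage is the category count divided by the 164 reported grades and rounded to the nearest whole number. The rounding rule, the short category labels, and the names `grade_counts` and `percent_shares` are illustrative assumptions, not taken from the article.

```python
# Grade counts as reported in the abstract.
# The labels are shorthand for this sketch, not the authors' wording.
reported_total = 164
grade_counts = {
    "no clarification": 83,
    "minimal clarification": 37,
    "moderate clarification": 32,
    "unsatisfactory": 13,
}


def percent_shares(counts, total):
    """Return each count as a whole-number percentage of total (assumed rounding)."""
    return {label: round(100 * n / total) for label, n in counts.items()}


shares = percent_shares(grade_counts, reported_total)
count_sum = sum(grade_counts.values())

for label, pct in shares.items():
    print(f"{label}: {grade_counts[label]} grades, about {pct}%")
print(f"sum of category counts: {count_sum} (reported total: {reported_total})")
```

Under this assumption the four shares come out to 51%, 23%, 20%, and 8%, matching the abstract. The four category counts add up to 165, one more than the stated total of 164; the abstract does not explain the difference.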
format Article
id doaj-art-bcbdf8025ec6412a9a817f79cbf07586
institution OA Journals
issn 2589-5141
language English
publishDate 2025-07-01
publisher Elsevier
record_format Article
series Journal of Hand Surgery Global Online
spelling doaj-art-bcbdf8025ec6412a9a817f79cbf07586
2025-08-20T02:01:42Z
eng
Elsevier
Journal of Hand Surgery Global Online
2589-5141
2025-07-01
7
4
100764
10.1016/j.jhsg.2025.100764
Assessing Accuracy of Chat Generative Pre-Trained Transformer’s Responses to Common Patient Questions Regarding Congenital Upper Limb Differences
Niklaus P. Zeller, BS (University of Minnesota Medical School, Minneapolis, MN)
Ayush D. Shah, BA (University of Minnesota Medical School, Minneapolis, MN)
Ann E. Van Heest, MD (University of Minnesota Medical School, Minneapolis, MN; Department of Orthopedic Surgery, University of Minnesota, Minneapolis, MN; Corresponding author: Ann E. Van Heest, MD, Department of Orthopedic Surgery, University of Minnesota Medical School, 2512 South 7th Street, Suite R200, Minneapolis, MN 55455.)
Deborah C. Bohn, MD (University of Minnesota Medical School, Minneapolis, MN; Department of Orthopedic Surgery, University of Minnesota, Minneapolis, MN)
http://www.sciencedirect.com/science/article/pii/S2589514125000842
Congenital hand
Congenital upper limb differences
Patient education
Pediatric hand
title Assessing Accuracy of Chat Generative Pre-Trained Transformer’s Responses to Common Patient Questions Regarding Congenital Upper Limb Differences
topic Congenital hand
Congenital upper limb differences
Patient education
Pediatric hand
url http://www.sciencedirect.com/science/article/pii/S2589514125000842