Evaluating the perspectives of ChatGPT and Gemini on glenohumeral osteoarthritis management

Background: Integrating machine learning and artificial intelligence (AI) technologies has revolutionized various sectors, including health care. However, their application in orthopedic health-care settings still needs to be improved. This study sought to evaluate Chat Generative Pre-Trained Transf...

Bibliographic Details
Main Authors: Michael Megafu, DO, MPH, Omar Guerrero, BS, Rafay Hasan, BS, Larry Hunt, MBA, Devri Langhelm, BS, Benning Le, MS, Xinning Li, MD, Robert Kelly, IV, MD, Robert L. Parisien, MD, Antonio Cusano, MD
Format: Article
Language: English
Published: Elsevier 2025-07-01
Series: JSES International
Subjects: Basic Science Study, Validation of AI Software
Online Access: http://www.sciencedirect.com/science/article/pii/S2666638325000933
_version_ 1849319576712511488
author Michael Megafu, DO, MPH
Omar Guerrero, BS
Rafay Hasan, BS
Larry Hunt, MBA
Devri Langhelm, BS
Benning Le, MS
Xinning Li, MD
Robert Kelly, IV, MD
Robert L. Parisien, MD
Antonio Cusano, MD
collection DOAJ
description Background: Integrating machine learning and artificial intelligence (AI) technologies has revolutionized various sectors, including health care. However, their application in orthopedic health-care settings still needs to be improved. This study sought to evaluate Chat Generative Pre-Trained Transformer (ChatGPT) and Gemini's capacity to make quality medical recommendations regarding glenohumeral osteoarthritis, weighing them against the recommendations established in the Evidence-Based Clinical Practice Guidelines (CPGs) of the American Academy of Orthopaedic Surgeons (AAOS).
Methods: The 2020 AAOS CPGs, a widely recognized and respected source, were the basis for determining recommended and nonrecommended treatments in this study. ChatGPT and Gemini were queried on 20 treatments based on these guidelines; 10 were recommended for managing glenohumeral joint osteoarthritis, five were not recommended for managing glenohumeral joint osteoarthritis, and five were reported as consensus statements. These responses were categorized as “Concordance” or “No Concordance” with the AAOS CPGs. A Cohen's Kappa coefficient was calculated to assess the interrater reliability.
Results: Among the 20 treatments examined, ChatGPT and Gemini showed concordance with the AAOS CPGs for 10 (100%) and 5 (50%) treatments, respectively. On the other hand, for treatments that AAOS CPGs did not recommend, ChatGPT had concordance for four out of the five treatments (80%), while Gemini had 100% concordance. The Cohen's Kappa coefficient to assess interrater reliability was found to be 0.90, indicating a very high level of agreement between the two raters in categorizing responses as “Concordance” or “No Concordance” with the AAOS CPGs.
Conclusion: The study findings reveal that ChatGPT and Gemini cannot solely recommend CPGs as outlined in AAOS CPGs. As patients increasingly utilize external resources such as AI platforms and the Internet for medical recommendations, providers should advise patients to exercise caution when seeking medical advice from these AI platforms for managing glenohumeral joint osteoarthritis.
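The interrater statistic named in the abstract can be illustrated with a minimal Python sketch. The ratings below are hypothetical and are not the study's data, and the function name `cohens_kappa` is an assumption made for illustration. The sketch computes kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two raters and p_e is the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of 20 responses ("C" = Concordance, "N" = No Concordance).
rater_1 = ["C"] * 15 + ["N"] * 5
rater_2 = ["C"] * 14 + ["N"] * 6

kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))  # about 0.875 for these made-up ratings
```

In this made-up example the raters disagree on one of 20 items, so p_o = 0.95 and p_e = 0.60, giving a kappa near 0.875. Values close to 1 indicate agreement well beyond chance.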
format Article
id doaj-art-e486b13c57044cf2ae2a38ba9c42575c
institution Kabale University
issn 2666-6383
language English
publishDate 2025-07-01
publisher Elsevier
record_format Article
series JSES International
spelling doaj-art-e486b13c57044cf2ae2a38ba9c42575c
2025-08-20T03:50:22Z
eng
Elsevier
JSES International
2666-6383
2025-07-01
vol. 9, no. 4, pp. 1365-1370
10.1016/j.jseint.2025.03.011
Evaluating the perspectives of ChatGPT and Gemini on glenohumeral osteoarthritis management
Michael Megafu, DO, MPH [0]
Omar Guerrero, BS [1]
Rafay Hasan, BS [2]
Larry Hunt, MBA [3]
Devri Langhelm, BS [4]
Benning Le, MS [5]
Xinning Li, MD [6]
Robert Kelly, IV, MD [7]
Robert L. Parisien, MD [8]
Antonio Cusano, MD [9]
[0] Department of Orthopaedic Surgery, University of Connecticut, Farmington, CT, USA; Corresponding author: Michael Megafu, DO, MPH, Department of Orthopaedic Surgery, University of Connecticut, 263 Farmington Ave, Farmington, CT 06032, USA.
[1] A.T. Still University School of Osteopathic Medicine in Arizona, Mesa, AZ, USA
[2] A.T. Still University School of Osteopathic Medicine in Arizona, Mesa, AZ, USA
[3] A.T. Still University School of Osteopathic Medicine in Arizona, Mesa, AZ, USA
[4] A.T. Still University School of Osteopathic Medicine in Arizona, Mesa, AZ, USA
[5] A.T. Still University School of Osteopathic Medicine in Arizona, Mesa, AZ, USA
[6] Department of Orthopedic Surgery, Boston University School of Medicine, Boston, MA, USA
[7] Department of Orthopedic Surgery, University of Pennsylvania, Philadelphia, PA, USA
[8] Department of Orthopedic Surgery, Mount Sinai, New York, NY, USA
[9] Department of Orthopaedic Surgery, University of Connecticut, Farmington, CT, USA
http://www.sciencedirect.com/science/article/pii/S2666638325000933
Basic Science Study
Validation of AI Software
title Evaluating the perspectives of ChatGPT and Gemini on glenohumeral osteoarthritis management
topic Basic Science Study
Validation of AI Software
url http://www.sciencedirect.com/science/article/pii/S2666638325000933