To Self-Treat or Not to Self-Treat: Evaluating the Diagnostic, Advisory and Referral Effectiveness of ChatGPT Responses to the Most Common Musculoskeletal Disorders


Bibliographic Details
Main Authors: Ufuk Arzu, Batuhan Gencer
Format: Article
Language: English
Published: MDPI AG, 2025-07-01
Series: Diagnostics
Subjects: ChatGPT; self-diagnosis; self-treatment; readability; Flesch–Kincaid Grade Level; trauma
Online Access: https://www.mdpi.com/2075-4418/15/14/1834
Collection: DOAJ
Description: <b>Background/Objectives</b>: The increased accessibility of information has resulted in a rise in patients trying to self-diagnose and opting for self-medication, either as a primary treatment or as a supplement to medical care. Our objective was to evaluate the reliability, comprehensibility, and readability of the responses provided by ChatGPT 4.0 when queried about the most prevalent orthopaedic problems, thus ascertaining the occurrence of misguidance and the necessity for an audit of the disseminated information. <b>Methods:</b> ChatGPT 4.0 was presented with 26 open-ended questions. The responses were evaluated by two observers using a Likert scale in the categories of diagnosis, recommendation, and referral. The scores from the responses were subjected to subgroup analysis according to the area of interest (AoI) and anatomical region. The readability and comprehensibility of the chatbot’s responses were analyzed using the Flesch–Kincaid Reading Ease Score (FRES) and Flesch–Kincaid Grade Level (FKGL). <b>Results:</b> The majority of the responses were rated as either ‘adequate’ or ‘excellent’. However, in the diagnosis category, a significant difference was found in the evaluation made according to the AoI (<i>p</i> = 0.007), which is attributed to trauma-related questions. No significant difference was identified in any other category. The mean FKGL score was 7.8 ± 1.267, and the mean FRES was 52.68 ± 8.6. The average reading level required to understand the text was estimated to be “high school”. <b>Conclusions:</b> ChatGPT 4.0 facilitates the self-diagnosis and self-treatment tendencies of patients with musculoskeletal disorders. However, it is imperative for patients to have a robust understanding of the limitations of chatbot-generated advice, particularly in trauma-related conditions.
Record ID: doaj-art-e0614b5719bf4f89a1e4b599f9788e92
ISSN: 2075-4418
DOI: 10.3390/diagnostics15141834
Author Affiliations: Ufuk Arzu; Batuhan Gencer — Department of Orthopedics and Traumatology, Marmara University Pendik Training and Research Hospital, 34890 Istanbul, Turkey (both authors)