Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment

The integration of Artificial Intelligence (AI) into healthcare has opened new avenues for clinical decision support, particularly in radiology. The aim of this study was to evaluate the accuracy and reproducibility of ChatGPT-4o in the radiographic image interpretation of orthopantomograms (OPGs) f...

Bibliographic Details
Main Authors: Ana Suárez, Stefania Arena, Alberto Herranz Calzada, Ana Isabel Castillo Varón, Victor Diaz-Flores García, Yolanda Freire
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Artificial Intelligence; ChatGPT; Oral surgery; Orthopantomography; Dentistry
Online Access: http://www.sciencedirect.com/science/article/pii/S200103702500131X
_version_ 1850187511025893376
author Ana Suárez
Stefania Arena
Alberto Herranz Calzada
Ana Isabel Castillo Varón
Victor Diaz-Flores García
Yolanda Freire
author_facet Ana Suárez
Stefania Arena
Alberto Herranz Calzada
Ana Isabel Castillo Varón
Victor Diaz-Flores García
Yolanda Freire
author_sort Ana Suárez
collection DOAJ
description The integration of Artificial Intelligence (AI) into healthcare has opened new avenues for clinical decision support, particularly in radiology. The aim of this study was to evaluate the accuracy and reproducibility of ChatGPT-4o in the radiographic image interpretation of orthopantomograms (OPGs) for assessment of lower third molars, simulating real patient requests for tooth extraction. Thirty OPGs were analyzed, each paired with a standardized prompt submitted to ChatGPT-4o, generating 900 responses (30 per radiograph). Two oral surgery experts independently evaluated the responses using a three-point Likert scale (correct, partially correct/incomplete, incorrect), with disagreements resolved by a third expert. ChatGPT-4o achieved an accuracy rate of 38.44 % (95 % CI: 35.27 %–41.62 %). The percentage agreement among repeated responses was 82.7 %, indicating high consistency, though Gwet’s coefficient of agreement (60.4 %) suggested only moderate repeatability. While the model correctly identified general features in some cases, it frequently provided incomplete or fabricated information, particularly in complex radiographs involving overlapping structures or underdeveloped roots. These findings highlight ChatGPT-4o’s current limitations in dental radiographic interpretation. Although it demonstrated some capability in analyzing OPGs, its accuracy and reliability remain insufficient for unsupervised clinical use. Professional oversight is essential to prevent diagnostic errors. Further refinement and specialized training of AI models are needed to enhance their performance and ensure safe integration into dental practice, especially in patient-facing applications.
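The reported accuracy and its confidence interval can be checked with a few lines of arithmetic. The sketch below is not the authors' analysis: it assumes that 346 of the 900 responses were rated correct (a count inferred from 38.44 % of 900, not stated in the record) and that a normal-approximation (Wald) interval was used, treating all responses as independent. The function name `wald_ci` is illustrative.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96):
    """Return (proportion, lower, upper) for a normal-approximation (Wald) interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# 30 radiographs x 30 repeated responses each, as stated in the abstract.
n_responses = 30 * 30

# Assumption: count of "correct" responses inferred from 38.44 % of 900.
n_correct = 346

p, lower, upper = wald_ci(n_correct, n_responses)
print(f"accuracy = {p:.2%} (95% CI: {lower:.2%} to {upper:.2%})")
```

Under these assumptions the result matches the reported 38.44 % (35.27 %–41.62 %). Because the 30 responses per radiograph are repeated measurements of the same image, an interval that accounts for clustering could be wider.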
format Article
id doaj-art-2155cf57ae1a480e8d2927d9e85315c5
institution OA Journals
issn 2001-0370
language English
publishDate 2025-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj-art-2155cf57ae1a480e8d2927d9e85315c5
2025-08-20T02:16:05Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2025-01-01
28
141
147
10.1016/j.csbj.2025.04.010
Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
Ana Suárez 0
Stefania Arena 1
Alberto Herranz Calzada 2
Ana Isabel Castillo Varón 3
Victor Diaz-Flores García 4
Yolanda Freire 5
Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain; Department of Pre-Clinic Dentistry I, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
Department of Medicine. Faculty of Medicine, Health and Sports. Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
Department of Pre-Clinic Dentistry I, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain; Correspondence to: Department of Pre-Clinic Dentistry I, School of Biomedical Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain.
Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
The integration of Artificial Intelligence (AI) into healthcare has opened new avenues for clinical decision support, particularly in radiology. The aim of this study was to evaluate the accuracy and reproducibility of ChatGPT-4o in the radiographic image interpretation of orthopantomograms (OPGs) for assessment of lower third molars, simulating real patient requests for tooth extraction. Thirty OPGs were analyzed, each paired with a standardized prompt submitted to ChatGPT-4o, generating 900 responses (30 per radiograph). Two oral surgery experts independently evaluated the responses using a three-point Likert scale (correct, partially correct/incomplete, incorrect), with disagreements resolved by a third expert. ChatGPT-4o achieved an accuracy rate of 38.44 % (95 % CI: 35.27 %–41.62 %). The percentage agreement among repeated responses was 82.7 %, indicating high consistency, though Gwet’s coefficient of agreement (60.4 %) suggested only moderate repeatability. While the model correctly identified general features in some cases, it frequently provided incomplete or fabricated information, particularly in complex radiographs involving overlapping structures or underdeveloped roots. These findings highlight ChatGPT-4o’s current limitations in dental radiographic interpretation. Although it demonstrated some capability in analyzing OPGs, its accuracy and reliability remain insufficient for unsupervised clinical use. Professional oversight is essential to prevent diagnostic errors. Further refinement and specialized training of AI models are needed to enhance their performance and ensure safe integration into dental practice, especially in patient-facing applications.
http://www.sciencedirect.com/science/article/pii/S200103702500131X
Artificial Intelligence
ChatGPT
Oral surgery
Orthopantomography
Dentistry
spellingShingle Ana Suárez
Stefania Arena
Alberto Herranz Calzada
Ana Isabel Castillo Varón
Victor Diaz-Flores García
Yolanda Freire
Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
Computational and Structural Biotechnology Journal
Artificial Intelligence
ChatGPT
Oral surgery
Orthopantomography
Dentistry
title Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
title_full Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
title_fullStr Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
title_full_unstemmed Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
title_short Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
title_sort decoding wisdom evaluating chatgpt s accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
topic Artificial Intelligence
ChatGPT
Oral surgery
Orthopantomography
Dentistry
url http://www.sciencedirect.com/science/article/pii/S200103702500131X
work_keys_str_mv AT anasuarez decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment
AT stefaniaarena decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment
AT albertoherranzcalzada decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment
AT anaisabelcastillovaron decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment
AT victordiazfloresgarcia decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment
AT yolandafreire decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment