Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment

The integration of Artificial Intelligence (AI) into healthcare has opened new avenues for clinical decision support, particularly in radiology. The aim of this study was to evaluate the accuracy and reproducibility of ChatGPT-4o in the radiographic image interpretation of orthopantomograms (OPGs) f...



Bibliographic Details
Main Authors:	Ana Suárez, Stefania Arena, Alberto Herranz Calzada, Ana Isabel Castillo Varón, Victor Diaz-Flores García, Yolanda Freire
Format:	Article
Language:	English
Published:	Elsevier 2025-01-01
Series:	Computational and Structural Biotechnology Journal
Subjects:	Artificial Intelligence ChatGPT Oral surgery Orthopantomography Dentistry
Online Access:	http://www.sciencedirect.com/science/article/pii/S200103702500131X

author	Ana Suárez Stefania Arena Alberto Herranz Calzada Ana Isabel Castillo Varón Victor Diaz-Flores García Yolanda Freire
collection	DOAJ
description	The integration of Artificial Intelligence (AI) into healthcare has opened new avenues for clinical decision support, particularly in radiology. The aim of this study was to evaluate the accuracy and reproducibility of ChatGPT-4o in the radiographic image interpretation of orthopantomograms (OPGs) for assessment of lower third molars, simulating real patient requests for tooth extraction. Thirty OPGs were analyzed, each paired with a standardized prompt submitted to ChatGPT-4o, generating 900 responses (30 per radiograph). Two oral surgery experts independently evaluated the responses using a three-point Likert scale (correct, partially correct/incomplete, incorrect), with disagreements resolved by a third expert. ChatGPT-4o achieved an accuracy rate of 38.44 % (95 % CI: 35.27 %–41.62 %). The percentage agreement among repeated responses was 82.7 %, indicating high consistency, though Gwet’s coefficient of agreement (60.4 %) suggested only moderate repeatability. While the model correctly identified general features in some cases, it frequently provided incomplete or fabricated information, particularly in complex radiographs involving overlapping structures or underdeveloped roots. These findings highlight ChatGPT-4o’s current limitations in dental radiographic interpretation. Although it demonstrated some capability in analyzing OPGs, its accuracy and reliability remain insufficient for unsupervised clinical use. Professional oversight is essential to prevent diagnostic errors. Further refinement and specialized training of AI models are needed to enhance their performance and ensure safe integration into dental practice, especially in patient-facing applications.
format	Article
id	doaj-art-2155cf57ae1a480e8d2927d9e85315c5
institution	OA Journals
issn	2001-0370
language	English
publishDate	2025-01-01
publisher	Elsevier
record_format	Article
series	Computational and Structural Biotechnology Journal
spelling	doaj-art-2155cf57ae1a480e8d2927d9e85315c5; 2025-08-20T02:16:05Z; eng; Elsevier; Computational and Structural Biotechnology Journal; ISSN 2001-0370; 2025-01-01; vol. 28, pp. 141–147; DOI 10.1016/j.csbj.2025.04.010; Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
	Ana Suárez (0): Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
	Stefania Arena (1): Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
	Alberto Herranz Calzada (2): Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain; Department of Pre-Clinic Dentistry I, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
	Ana Isabel Castillo Varón (3): Department of Medicine, Faculty of Medicine, Health and Sports, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
	Victor Diaz-Flores García (4): Department of Pre-Clinic Dentistry I, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain; correspondence to: Department of Pre-Clinic Dentistry I, School of Biomedical Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
	Yolanda Freire (5): Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain
title	Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
topic	Artificial Intelligence ChatGPT Oral surgery Orthopantomography Dentistry
url	http://www.sciencedirect.com/science/article/pii/S200103702500131X
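The description field above reports an accuracy of 38.44 % over 900 responses with a 95 % CI of 35.27 %–41.62 %, plus 82.7 % percentage agreement against a Gwet coefficient of 60.4 %. The Python sketch below is not the authors' analysis code. It assumes (a) 346 correct responses, inferred from 38.44 % × 900, with a normal-approximation (Wald) interval, which reproduces the reported bounds, and (b) the standard multi-rater form of Gwet's AC1, treating the 30 repeated responses per radiograph as repeated ratings of one item. The function names `wald_ci` and `gwet_ac1` and the toy ratings are hypothetical.

```python
import math
from collections import Counter


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (p, lower, upper): a proportion with its normal-approximation (Wald) CI."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


def gwet_ac1(items: list[list[str]], categories: list[str]) -> tuple[float, float]:
    """Return (percent agreement, Gwet's AC1) for items rated several times each.

    Each inner list holds the category given to every repeated response for one
    item (e.g. one radiograph). Every item needs at least two ratings.
    """
    n, q = len(items), len(categories)
    pa_total = 0.0
    pi_total = {k: 0.0 for k in categories}
    for ratings in items:
        r = len(ratings)
        counts = Counter(ratings)
        # Share of pairs of distinct ratings within this item that agree.
        pa_total += sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
        # Accumulate each category's within-item share for the chance term.
        for k in categories:
            pi_total[k] += counts[k] / r
    pa = pa_total / n
    # Gwet's chance agreement: sum of pi_k * (1 - pi_k) over categories, / (q - 1).
    pe = sum((pi_total[k] / n) * (1 - pi_total[k] / n) for k in categories) / (q - 1)
    return pa, (pa - pe) / (1 - pe)


# Assumed count: 346 of 900 responses rated correct (38.44 % x 900, rounded).
p, lower, upper = wald_ci(346, 900)
print(f"{p:.2%} (95% CI {lower:.2%} to {upper:.2%})")  # 38.44% (95% CI 35.27% to 41.62%)

# Hypothetical toy ratings, NOT study data: two items, four repeats each.
toy = [
    ["correct", "correct", "correct", "incorrect"],
    ["partial", "partial", "partial", "partial"],
]
pa, ac1 = gwet_ac1(toy, ["correct", "partial", "incorrect"])
print(f"agreement {pa:.3f}, AC1 {ac1:.3f}")  # agreement 0.750, AC1 0.644
```

AC1 comes out below raw agreement because it subtracts the agreement expected by chance across the three rating categories. That mechanism is one plausible reading of the gap between 82.7 % and 60.4 % reported above.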

