Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment
The integration of Artificial Intelligence (AI) into healthcare has opened new avenues for clinical decision support, particularly in radiology. The aim of this study was to evaluate the accuracy and reproducibility of ChatGPT-4o in the radiographic image interpretation of orthopantomograms (OPGs) f...
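| Main Authors: | Ana Suárez, Stefania Arena, Alberto Herranz Calzada, Ana Isabel Castillo Varón, Victor Diaz-Flores García, Yolanda Freire |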
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier 2025-01-01 |
| Series: | Computational and Structural Biotechnology Journal |
| Subjects: | Artificial Intelligence, ChatGPT, Oral surgery, Orthopantomography, Dentistry |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S200103702500131X |
| _version_ | 1850187511025893376 |
|---|---|
| author | Ana Suárez Stefania Arena Alberto Herranz Calzada Ana Isabel Castillo Varón Victor Diaz-Flores García Yolanda Freire |
| author_facet | Ana Suárez Stefania Arena Alberto Herranz Calzada Ana Isabel Castillo Varón Victor Diaz-Flores García Yolanda Freire |
| author_sort | Ana Suárez |
| collection | DOAJ |
| description | The integration of Artificial Intelligence (AI) into healthcare has opened new avenues for clinical decision support, particularly in radiology. The aim of this study was to evaluate the accuracy and reproducibility of ChatGPT-4o in the radiographic image interpretation of orthopantomograms (OPGs) for assessment of lower third molars, simulating real patient requests for tooth extraction. Thirty OPGs were analyzed, each paired with a standardized prompt submitted to ChatGPT-4o, generating 900 responses (30 per radiograph). Two oral surgery experts independently evaluated the responses using a three-point Likert scale (correct, partially correct/incomplete, incorrect), with disagreements resolved by a third expert. ChatGPT-4o achieved an accuracy rate of 38.44 % (95 % CI: 35.27 %–41.62 %). The percentage agreement among repeated responses was 82.7 %, indicating high consistency, though Gwet’s coefficient of agreement (60.4 %) suggested only moderate repeatability. While the model correctly identified general features in some cases, it frequently provided incomplete or fabricated information, particularly in complex radiographs involving overlapping structures or underdeveloped roots. These findings highlight ChatGPT-4o’s current limitations in dental radiographic interpretation. Although it demonstrated some capability in analyzing OPGs, its accuracy and reliability remain insufficient for unsupervised clinical use. Professional oversight is essential to prevent diagnostic errors. Further refinement and specialized training of AI models are needed to enhance their performance and ensure safe integration into dental practice, especially in patient-facing applications. |
| format | Article |
| id | doaj-art-2155cf57ae1a480e8d2927d9e85315c5 |
| institution | OA Journals |
| issn | 2001-0370 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Computational and Structural Biotechnology Journal |
| spelling | doaj-art-2155cf57ae1a480e8d2927d9e85315c5 2025-08-20T02:16:05Z eng Elsevier Computational and Structural Biotechnology Journal 2001-0370 2025-01-01 28 141 147 10.1016/j.csbj.2025.04.010 Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment Ana Suárez [0], Stefania Arena [1], Alberto Herranz Calzada [2], Ana Isabel Castillo Varón [3], Victor Diaz-Flores García [4], Yolanda Freire [5] [0] Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain [1] Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain [2] Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain; Department of Pre-Clinic Dentistry I, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain [3] Department of Medicine. Faculty of Medicine, Health and Sports. Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain [4] Department of Pre-Clinic Dentistry I, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain; Correspondence to: Department of Pre-Clinic Dentistry I, School of Biomedical Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain. [5] Department of Pre-Clinic Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, Madrid 28670, Spain The integration of Artificial Intelligence (AI) into healthcare has opened new avenues for clinical decision support, particularly in radiology. The aim of this study was to evaluate the accuracy and reproducibility of ChatGPT-4o in the radiographic image interpretation of orthopantomograms (OPGs) for assessment of lower third molars, simulating real patient requests for tooth extraction. Thirty OPGs were analyzed, each paired with a standardized prompt submitted to ChatGPT-4o, generating 900 responses (30 per radiograph). Two oral surgery experts independently evaluated the responses using a three-point Likert scale (correct, partially correct/incomplete, incorrect), with disagreements resolved by a third expert. ChatGPT-4o achieved an accuracy rate of 38.44 % (95 % CI: 35.27 %–41.62 %). The percentage agreement among repeated responses was 82.7 %, indicating high consistency, though Gwet’s coefficient of agreement (60.4 %) suggested only moderate repeatability. While the model correctly identified general features in some cases, it frequently provided incomplete or fabricated information, particularly in complex radiographs involving overlapping structures or underdeveloped roots. These findings highlight ChatGPT-4o’s current limitations in dental radiographic interpretation. Although it demonstrated some capability in analyzing OPGs, its accuracy and reliability remain insufficient for unsupervised clinical use. Professional oversight is essential to prevent diagnostic errors. Further refinement and specialized training of AI models are needed to enhance their performance and ensure safe integration into dental practice, especially in patient-facing applications. http://www.sciencedirect.com/science/article/pii/S200103702500131X Artificial Intelligence ChatGPT Oral surgery Orthopantomography Dentistry |
| spellingShingle | Ana Suárez Stefania Arena Alberto Herranz Calzada Ana Isabel Castillo Varón Victor Diaz-Flores García Yolanda Freire Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment Computational and Structural Biotechnology Journal Artificial Intelligence ChatGPT Oral surgery Orthopantomography Dentistry |
| title | Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment |
| title_full | Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment |
| title_fullStr | Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment |
| title_full_unstemmed | Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment |
| title_short | Decoding wisdom: Evaluating ChatGPT's accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment |
| title_sort | decoding wisdom evaluating chatgpt s accuracy and reproducibility in analyzing orthopantomographic images for third molar assessment |
| topic | Artificial Intelligence ChatGPT Oral surgery Orthopantomography Dentistry |
| url | http://www.sciencedirect.com/science/article/pii/S200103702500131X |
| work_keys_str_mv | AT anasuarez decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment AT stefaniaarena decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment AT albertoherranzcalzada decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment AT anaisabelcastillovaron decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment AT victordiazfloresgarcia decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment AT yolandafreire decodingwisdomevaluatingchatgptsaccuracyandreproducibilityinanalyzingorthopantomographicimagesforthirdmolarassessment |
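
The description field reports an accuracy of 38.44 % (95 % CI: 35.27 %–41.62 %) over 900 responses. The record does not say how the interval was computed, so the following is only a minimal sketch under two assumptions: that the count of correct responses was 346 (inferred from 38.44 % of 900, not stated in the record), and that a normal-approximation (Wald) interval was used. The function name `wald_interval` is hypothetical.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Assumption: 346 correct responses, inferred from 38.44 % of 900.
# The record itself gives only the percentage, not the raw count.
N_RESPONSES = 900
N_CORRECT = 346

low, high = wald_interval(N_CORRECT, N_RESPONSES)
print(f"accuracy = {N_CORRECT / N_RESPONSES:.2%}")
print(f"95% CI   = {low:.2%} to {high:.2%}")
```

Under these assumptions the sketch reproduces the reported figures (38.44 %, 35.27 % to 41.62 %). It treats all 900 responses as independent, although they come in groups of 30 per radiograph; an analysis that accounts for that grouping could give a different interval.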