Artificial intelligence-guided distal radius fracture detection on plain radiographs in comparison with human raters

Bibliographic Details
Main Authors: Nikolai Ramadanov, Patric John, Robert Hable, Andreas Georg Schreyer, Simon Shabo, Robert Prill, Mikhail Salzmann
Format: Article
Language: English
Published: BMC 2025-05-01
Series: Journal of Orthopaedic Surgery and Research
Online Access: https://doi.org/10.1186/s13018-025-05888-9
collection DOAJ
description Abstract
Background: The aim of this study was to compare the performance of artificial intelligence (AI) in detecting distal radius fractures (DRFs) on plain radiographs with the performance of human raters.
Methods: We retrospectively analysed all wrist radiographs taken in our hospital in the first year after the introduction of AI-guided fracture detection, from 11 September 2023 to 10 September 2024. The ground truth was defined by the radiological report of a board-certified radiologist based solely on conventional radiographs. The following parameters were calculated: true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), accuracy (%), Cohen’s kappa coefficient, F1 score, sensitivity (%), specificity (%), and Youden index (J statistic).
Results: In total, 1145 plain radiographs of the wrist were taken between 11 September 2023 and 10 September 2024. The mean age of the included patients was 46.6 years (± 27.3), ranging from 2 to 99 years, and 59.0% were female. According to the ground truth, 225 of the 556 anteroposterior (AP) radiographs (40.5%) and 240 of the 589 lateral view radiographs (40.7%) showed a DRF. On AP radiographs, the AI system achieved an accuracy of 95.90%, Cohen’s kappa of 0.913, F1 score of 0.947, sensitivity of 92.02%, specificity of 98.45%, and a Youden index of 90.47. The orthopedic surgeon achieved a sensitivity of 91.5%, specificity of 97.8%, overall accuracy of 95.1%, F1 score of 0.943, and Cohen’s kappa of 0.901. These results were comparable to those of the AI model.
Conclusion: AI-guided detection of DRFs demonstrated diagnostic performance nearly identical to that of an experienced orthopedic surgeon across all key metrics. The marginal differences observed in sensitivity and specificity suggest that AI can reliably support clinical fracture assessment based solely on conventional radiographs.
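The parameters listed in the abstract can all be derived from the four confusion-matrix counts. The following is a minimal sketch using the standard textbook definitions, not the authors' own analysis code; the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the study.

```python
from typing import Dict


def diagnostic_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute binary diagnostic metrics from confusion-matrix counts.

    All values are returned as fractions in [0, 1] (kappa and Youden's J
    can be negative); multiply by 100 for percentages.
    """
    n = tp + tn + fp + fn
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")

    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / n       # observed agreement with ground truth

    # F1 score: harmonic mean of precision and sensitivity.
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    # Youden index: J = sensitivity + specificity - 1.
    youden_j = sensitivity + specificity - 1.0

    # Cohen's kappa: agreement corrected for chance agreement p_e,
    # where p_e is computed from the marginal totals of both raters.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1.0 - p_e) if p_e < 1.0 else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "youden_j": youden_j,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the study's data).
    for name, value in diagnostic_metrics(tp=40, tn=50, fp=5, fn=5).items():
        print(f"{name}: {value:.4f}")
```

As a consistency check on the figures quoted in the abstract, the reported Youden index follows from the reported sensitivity and specificity when all three are expressed in percent: 92.02 + 98.45 - 100 = 90.47.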
id doaj-art-cf9d8a3419f94636b0ff1cbd8950cc8e
institution OA Journals
issn 1749-799X
spelling doaj-art-cf9d8a3419f94636b0ff1cbd8950cc8e 2025-08-20T01:51:39Z
Journal of Orthopaedic Surgery and Research (BMC), ISSN 1749-799X, 2025-05-01, volume 20, issue 1, pages 1-9
DOI 10.1186/s13018-025-05888-9
Author affiliations:
Nikolai Ramadanov: Center of Orthopaedics and Traumatology, Brandenburg Medical School, University Hospital Brandenburg/Havel
Patric John: Center of Orthopaedics and Traumatology, Brandenburg Medical School, University Hospital Brandenburg/Havel
Robert Hable: Faculty of Applied Computer Science, Deggendorf Institute of Technology
Andreas Georg Schreyer: Faculty of Health Science Brandenburg, Brandenburg Medical School Theodor Fontane
Simon Shabo: Faculty of Health Science Brandenburg, Brandenburg Medical School Theodor Fontane
Robert Prill: Center of Orthopaedics and Traumatology, Brandenburg Medical School, University Hospital Brandenburg/Havel
Mikhail Salzmann: Center of Orthopaedics and Traumatology, Brandenburg Medical School, University Hospital Brandenburg/Havel