Impact of Radiologist Experience on AI Annotation Quality in Chest Radiographs: A Comparative Analysis

Bibliographic Details
Main Authors: Malte Michel Multusch, Lasse Hansen, Mattias Paul Heinrich, Lennart Berkel, Axel Saalbach, Heinrich Schulz, Franz Wegner, Joerg Barkhausen, Malte Maria Sieren
Format: Article
Language: English
Published: MDPI AG, 2025-03-01
Series: Diagnostics
Subjects: annotation quality; interreader comparison; chest radiograph; AI research
Online Access: https://www.mdpi.com/2075-4418/15/6/777
Collection: DOAJ
Description: <b>Background/Objectives</b>: In the burgeoning field of medical imaging and Artificial Intelligence (AI), high-quality annotations for training AI models are crucial. However, large annotated datasets remain scarce, as segmentation is time-consuming and experts have limited time. This study investigates how the experience of radiologists affects the quality of annotations. <b>Methods</b>: We randomly collected 53 anonymized chest radiographs. Fifteen readers with varying levels of expertise annotated anatomical structures of different complexity, as well as pneumonic opacities and central venous catheters (CVCs) as examples of pathology and foreign material. The readers were divided into three groups of five: medical students (MS), junior professionals (JP) with less than five years of working experience, and senior professionals (SP) with more than five years of experience. Each annotation was compared to a gold standard consisting of a consensus annotation by three senior board-certified radiologists. We calculated the Dice coefficient (DSC) and Hausdorff distance (HD) to evaluate annotation quality. Inter- and intrareader variability and time dependencies were investigated using the Intraclass Correlation Coefficient (ICC) and Ordinary Least Squares (OLS) regression. <b>Results</b>: Senior professionals generally showed better performance, while medical students showed higher variability in their annotations. Significant differences were noted, especially for complex structures (DSC for pneumonic opacities as mean [standard deviation]: MS: 0.516 [0.246]; SP: 0.631 [0.211]). However, overall deviation and intraclass variance were higher for these structures even for seniors, highlighting the inherent limitations of conventional radiography. Experience showed a positive relationship with annotation quality for the VCS and lung but was not a significant factor for other structures.
<b>Conclusions</b>: Experience level significantly impacts annotation quality. Senior radiologists provided higher-quality annotations for complex structures, while less experienced readers could still annotate simpler structures with satisfactory accuracy. We suggest a mixed-expertise approach, enabling the most experienced readers to apply their knowledge where it matters most. With examination numbers rising, radiology will rely on AI support tools in the future. Economizing data acquisition and AI training, for example by integrating less experienced radiologists, will help to meet these coming challenges.
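The two evaluation metrics named in the Methods, the Dice coefficient (region overlap) and the Hausdorff distance (worst-case boundary disagreement), can be illustrated with a minimal NumPy sketch on toy binary masks. This is an illustrative implementation of the standard definitions, not the study's actual evaluation code; the mask shapes and values are invented for the example.

```python
import numpy as np

def dice_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks (1.0 = perfect overlap)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def hausdorff_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between the foreground pixels of two masks."""
    pa, pb = np.argwhere(a), np.argwhere(b)
    if len(pa) == 0 or len(pb) == 0:
        return float("inf")  # undefined when one mask has no foreground
    # pairwise Euclidean distances between all foreground pixel coordinates
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    # largest distance from any point in one mask to its nearest point in the other
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# toy example: two partially overlapping 3x3 squares in 5x5 masks
m1 = np.zeros((5, 5), dtype=int); m1[1:4, 1:4] = 1
m2 = np.zeros((5, 5), dtype=int); m2[2:5, 2:5] = 1
print(round(dice_coefficient(m1, m2), 3))   # → 0.444
print(round(hausdorff_distance(m1, m2), 3)) # → 1.414
```

The brute-force pairwise distance matrix is fine for small masks; for full-resolution radiographs one would typically use a distance-transform-based implementation instead.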
Record ID: doaj-art-a9999aa0380749ea87ee79a595c82e43
ISSN: 2075-4418
DOI: 10.3390/diagnostics15060777
Author Affiliations:
Malte Michel Multusch, Lennart Berkel, Franz Wegner, Joerg Barkhausen, Malte Maria Sieren: Department of Radiology and Nuclear Medicine, UKSH, 23538 Lübeck, Germany
Lasse Hansen: EchoScout GmbH, 23562 Lübeck, Germany
Mattias Paul Heinrich: Institute of Medical Informatics, University of Lübeck, 23538 Lübeck, Germany
Axel Saalbach, Heinrich Schulz: Philips Innovative Technologies, 22335 Hamburg, Germany