Evaluating Uncertainty Quantification in Medical Image Segmentation: A Multi-Dataset, Multi-Algorithm Study

Bibliographic Details
Main Authors: Nyaz Jalal, Małgorzata Śliwińska, Wadim Wojciechowski, Iwona Kucybała, Miłosz Rozynek, Kamil Krupa, Patrycja Matusik, Jarosław Jarczewski, Zbisław Tabor
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/14/21/10020
Description
Summary: Deep learning is revolutionizing various scientific fields, with medical applications at the forefront. One key focus is automating image segmentation, a process crucial in many clinical services. However, medical images are often ambiguous and challenging even for experts. To address this, reliable models need to quantify their uncertainty, allowing physicians to understand the model’s confidence in its segmentation. This paper explores how the performance and uncertainty of a model are influenced by the number of annotations per input sample. We examine the effects of both single and multiple manual annotations on various deep learning architectures. To tackle this question, we employ three widely recognized deep learning architectures and evaluate them across four publicly available datasets. Furthermore, we explore the effects of dropout rates on Monte Carlo models by examining uncertainty models with dropout rates of 20%, 40%, 60%, and 80%. Subsequently, we evaluate the models using various measurement metrics. The findings reveal that the influence of multiple annotations varies significantly depending on the datasets. Additionally, we observe that the dropout rate has minimal or no impact on the model’s performance unless there is a substantial loss of training data, primarily evident in the 80% dropout rate scenario.
ISSN: 2076-3417
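The Monte Carlo dropout procedure the summary refers to can be sketched as follows. This is a minimal illustration of the general technique (dropout kept active at inference, multiple stochastic forward passes averaged into a prediction and a per-pixel uncertainty map), not the paper's actual code; `mc_dropout_predict`, `toy_net`, and all parameter names here are hypothetical stand-ins.

```python
import numpy as np

def mc_dropout_predict(forward_pass, x, n_samples=20, rng=None):
    """Run n_samples stochastic forward passes (dropout active at inference)
    and return the mean foreground-probability map plus a per-pixel
    predictive-entropy map as the uncertainty estimate."""
    rng = rng or np.random.default_rng(0)
    probs = np.stack([forward_pass(x, rng) for _ in range(n_samples)])  # (T, H, W)
    mean = probs.mean(axis=0)
    eps = 1e-12  # guard against log(0)
    entropy = -(mean * np.log(mean + eps) + (1 - mean) * np.log(1 - mean + eps))
    return mean, entropy

def toy_net(x, rng, p_drop=0.4):
    """Stand-in for a dropout-enabled segmentation network: inverted
    Bernoulli dropout on the input, then a sigmoid foreground probability."""
    mask = rng.random(x.shape) > p_drop          # Bernoulli dropout mask
    logits = (x * mask) / (1.0 - p_drop)         # inverted-dropout scaling
    return 1.0 / (1.0 + np.exp(-logits))         # per-pixel sigmoid

# Example: uncertainty map for a toy 4x4 "image".
x = np.linspace(-3, 3, 16).reshape(4, 4)
mean, uncertainty = mc_dropout_predict(toy_net, x, n_samples=50)
```

Entropy is highest where the averaged prediction sits near 0.5, i.e. where the stochastic passes disagree; varying `p_drop` (20%–80%, as in the study) changes how much each pass perturbs the prediction.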