Which Augmentation Should I Use? An Empirical Investigation of Augmentations for Self-Supervised Phonocardiogram Representation Learning
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10804781/ |
| Summary: | Despite recent advancements in deep learning, its application in real-world medical settings, such as phonocardiogram (PCG) classification, remains limited. A significant barrier is the lack of high-quality annotated datasets, which hampers the development of robust, generalizable models that can perform well on newly collected, out-of-distribution (OOD) data. Self-Supervised Learning (SSL), particularly contrastive learning, has shown promise in mitigating the issue of data scarcity by leveraging unlabeled data to enhance model robustness and effectiveness. Even though SSL methods have been proposed and researched in other domains, work focusing on the impact of data augmentations on model robustness for PCG classification is limited. In particular, while augmentations are a key component of SSL, selecting the most suitable transformations during the training process is highly challenging and time-consuming. Improper augmentations can lead to substantial performance degradation and can even hinder the network's ability to learn meaningful representations. Addressing this gap, our research explores and evaluates a wide range of audio-based augmentations and uncovers combinations that enhance SSL model performance in PCG classification. We conduct a comprehensive comparative analysis across multiple datasets and downstream tasks, assessing the impact of various augmentations on model performance and generalization. Our findings reveal that, depending on the training distribution, augmentation choice significantly influences model robustness: fully supervised models experience up to a 32% drop in effectiveness when applied to unseen data, while SSL models demonstrate greater resilience, losing only 10% or even improving in some cases. This study also sheds light on the most promising and appropriate augmentations for robust PCG signal processing by calculating their effect size on model training. These insights equip researchers and practitioners with valuable guidelines for building more robust, reliable models in PCG signal processing. |
| ISSN: | 2169-3536 |
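The summary describes contrastive SSL in which each waveform is transformed into two augmented "views". As a rough illustration only (the specific transforms, parameters, and pipeline below are hypothetical and not taken from the paper), common audio augmentations such as additive noise, random gain, and time shift might be composed like this:

```python
import numpy as np

def add_noise(x, snr_db=20.0, rng=None):
    """Add Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def random_gain(x, low_db=-6.0, high_db=6.0, rng=None):
    """Scale amplitude by a random gain drawn in decibels."""
    rng = rng or np.random.default_rng()
    gain_db = rng.uniform(low_db, high_db)
    return x * 10 ** (gain_db / 20)

def time_shift(x, max_frac=0.1, rng=None):
    """Circularly shift the waveform by up to max_frac of its length."""
    rng = rng or np.random.default_rng()
    limit = int(len(x) * max_frac)
    shift = int(rng.integers(-limit, limit + 1))
    return np.roll(x, shift)

def two_views(x, rng=None):
    """Produce two independently augmented views for contrastive training."""
    rng = rng or np.random.default_rng()
    augment = lambda s: time_shift(random_gain(add_noise(s, rng=rng), rng=rng), rng=rng)
    return augment(x), augment(x)

# Example: a 3-second synthetic tone standing in for a PCG segment at 2 kHz
fs = 2000
x = np.sin(2 * np.pi * 40 * np.arange(3 * fs) / fs)
v1, v2 = two_views(x)
```

In a contrastive objective such as NT-Xent, `v1` and `v2` would form a positive pair; the paper's contribution is measuring which such transform choices help or hurt PCG robustness, not this particular pipeline.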