Improving Audio Recognition With Randomized Area Ratio Patch Masking: A Data Augmentation Perspective

In audio recognition, improving the accuracy and generalizability of Pretrained Audio Neural Networks (PANNs) remains challenging. This study introduces Randomized Area Ratio Patch Masking (RARPM), a novel data augmentation technique that applies random patches with varying transparency to log mel s...

Main Authors: Weichun Wong, Yachun Li, Shihan Li
Format: Article
Language: English
Published: IEEE 2024-01-01
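Series: IEEE Access
Subjects: Randomized area ratio patch masking; data augmentation; neural networks; audio classification; spectrogram analysis
Online Access: https://ieeexplore.ieee.org/document/10706845/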
author Weichun Wong
Yachun Li
Shihan Li
collection DOAJ
description In audio recognition, improving the accuracy and generalizability of Pretrained Audio Neural Networks (PANNs) remains challenging. This study introduces Randomized Area Ratio Patch Masking (RARPM), a novel data augmentation technique that applies random patches with varying transparency to log mel spectrograms during training. This method aims to enhance model learning by diversifying training data, optimized for the MobileNetV1 architecture. The study uses the AudioSet dataset, comprising over two million labeled sound clips, to validate the effectiveness of RARPM. The results show that RARPM achieves a mean average precision (mAP) of 0.385, surpassing the baseline SpecAugment’s mAP of 0.366. This research contributes a new strategy for data augmentation, demonstrating significant improvements in audio recognition tasks and paving the way for more robust models applicable across diverse architectures.
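The description above says only that RARPM applies random patches with varying transparency to log mel spectrograms during training; the record gives no further algorithmic detail. The following is therefore a minimal illustrative sketch of that general idea in Python with NumPy, not the authors' implementation. The function name `random_patch_mask`, the parameters `area_ratio`, `n_patches`, and `alpha_range`, the aspect-ratio range, and the use of the spectrogram mean as the patch fill value are all assumptions made for illustration.

```python
import numpy as np


def random_patch_mask(log_mel, area_ratio=0.2, n_patches=4,
                      alpha_range=(0.3, 1.0), rng=None):
    """Blend randomly placed rectangular patches into a (time, mel) array.

    Hypothetical sketch: every parameter and default here is an assumption,
    not a value taken from the RARPM paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(log_mel, dtype=np.float32, copy=True)  # leave input untouched
    n_time, n_mel = out.shape
    fill = out.mean()  # assumed fill value for the patches

    # Split the total masked-area budget evenly across the patches.
    patch_area = area_ratio * n_time * n_mel / n_patches

    for _ in range(n_patches):
        # Draw a random aspect ratio, then derive the side lengths from the area.
        aspect = rng.uniform(0.5, 2.0)
        h = int(np.clip(np.rint(np.sqrt(patch_area * aspect)), 1, n_time))
        w = int(np.clip(np.rint(patch_area / h), 1, n_mel))

        # Random top-left corner such that the patch fits inside the spectrogram.
        t0 = int(rng.integers(0, n_time - h + 1))
        f0 = int(rng.integers(0, n_mel - w + 1))

        # Transparency: alpha = 1 is fully opaque, smaller values let more
        # of the original spectrogram show through.
        alpha = rng.uniform(*alpha_range)
        region = out[t0:t0 + h, f0:f0 + w]
        out[t0:t0 + h, f0:f0 + w] = (1.0 - alpha) * region + alpha * fill

    return out
```

In a training pipeline, a function like this would typically be called on each log mel spectrogram before it is batched, with a fresh random draw per example; overlapping patches are allowed here, so the masked fraction can fall slightly below the nominal `area_ratio`.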
format Article
id doaj-art-e4a39f912d5e496a87e3713ef5aafb87
institution DOAJ
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-e4a39f912d5e496a87e3713ef5aafb87
2025-08-20T02:49:22Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
vol. 12, pp. 172548-172561
doi:10.1109/ACCESS.2024.3475508
IEEE Xplore document 10706845
Improving Audio Recognition With Randomized Area Ratio Patch Masking: A Data Augmentation Perspective
Weichun Wong (https://orcid.org/0009-0001-3711-4082), Department of Electrical and Computer Engineering, Tamkang University, New Taipei City, Taiwan
Yachun Li, Department of Electrical and Computer Engineering, Tamkang University, New Taipei City, Taiwan
Shihan Li (https://orcid.org/0000-0003-3055-8702), Department of Electrical and Computer Engineering, Tamkang University, New Taipei City, Taiwan
https://ieeexplore.ieee.org/document/10706845/
Randomized area ratio patch masking
data augmentation
neural networks
audio classification
spectrogram analysis
title Improving Audio Recognition With Randomized Area Ratio Patch Masking: A Data Augmentation Perspective
topic Randomized area ratio patch masking
data augmentation
neural networks
audio classification
spectrogram analysis
url https://ieeexplore.ieee.org/document/10706845/