Improving Audio Recognition With Randomized Area Ratio Patch Masking: A Data Augmentation Perspective
In audio recognition, improving the accuracy and generalizability of Pretrained Audio Neural Networks (PANNs) remains challenging. This study introduces Randomized Area Ratio Patch Masking (RARPM), a novel data augmentation technique that applies random patches with varying transparency to log mel s...
| Main Authors: | Weichun Wong, Yachun Li, Shihan Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
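| Subjects: | Randomized area ratio patch masking; data augmentation; neural networks; audio classification; spectrogram analysis |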
| Online Access: | https://ieeexplore.ieee.org/document/10706845/ |
| _version_ | 1850064140074221568 |
|---|---|
| author | Weichun Wong; Yachun Li; Shihan Li |
| author_sort | Weichun Wong |
| collection | DOAJ |
| description | In audio recognition, improving the accuracy and generalizability of Pretrained Audio Neural Networks (PANNs) remains challenging. This study introduces Randomized Area Ratio Patch Masking (RARPM), a novel data augmentation technique that applies random patches with varying transparency to log mel spectrograms during training. This method aims to enhance model learning by diversifying training data, optimized for the MobileNetV1 architecture. The study uses the AudioSet dataset, comprising over two million labeled sound clips, to validate the effectiveness of RARPM. The results show that RARPM achieves a mean average precision (mAP) of 0.385, surpassing the baseline SpecAugment’s mAP of 0.366. This research contributes a new strategy for data augmentation, demonstrating significant improvements in audio recognition tasks and paving the way for more robust models applicable across diverse architectures. |
| id | doaj-art-e4a39f912d5e496a87e3713ef5aafb87 |
| institution | DOAJ |
| issn | 2169-3536 |
| spelling | Record ID: doaj-art-e4a39f912d5e496a87e3713ef5aafb87; Timestamp: 2025-08-20T02:49:22Z; Language: eng; Publisher: IEEE; Journal: IEEE Access; ISSN: 2169-3536; Published: 2024-01-01; Volume: 12; Pages: 172548-172561; DOI: 10.1109/ACCESS.2024.3475508; Document ID: 10706845; Title: Improving Audio Recognition With Randomized Area Ratio Patch Masking: A Data Augmentation Perspective; Authors: Weichun Wong (https://orcid.org/0009-0001-3711-4082), Yachun Li, Shihan Li (https://orcid.org/0000-0003-3055-8702); Affiliation (all authors): Department of Electrical and Computer Engineering, Tamkang University, New Taipei City, Taiwan; URL: https://ieeexplore.ieee.org/document/10706845/; Keywords: Randomized area ratio patch masking, data augmentation, neural networks, audio classification, spectrogram analysis |
| title | Improving Audio Recognition With Randomized Area Ratio Patch Masking: A Data Augmentation Perspective |
| topic | Randomized area ratio patch masking; data augmentation; neural networks; audio classification; spectrogram analysis |