Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation

Bibliographic Details
Main Authors: Soyul Han, Jaejin Seo, Sunmook Choi, Taein Kang, Sanghyeok Chung, Seungeun Lee, Seoyoung Park, Seungsang Oh, Il-Youp Kwak
Format: Article
Language: English
Published: Elsevier, 2025-03-01
Series: Engineering Science and Technology, an International Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2215098625000278
Description
Summary: In the rapidly evolving landscape of voice-related technology, high-tech companies are developing multifaceted voice assistants tailored to their specific organizational goals. This technological evolution, however, introduces heightened security vulnerabilities such as voice spoofing attacks. To address voice spoofing challenges, various competitions such as ASVspoof 2015, 2017, 2019, 2021, and ADD 2022 have emerged. ADD 2022's Track 1 aimed to classify genuine and fake speech signals in the presence of noise. Our exploratory data analysis revealed that, for a given speech sample, noisy signals tend to occur within similar frequency bands. If a model relies heavily on data within frequency ranges that contain noise, its performance will be suboptimal. To address this issue, we propose a data augmentation technique called Frequency Feature Masking (FFM), which randomly masks frequency bands. FFM helps prevent overfitting and enhances the model's robustness by discouraging reliance on specific frequency bands. Furthermore, we propose a frequency band masking method using a bell-shaped filter. This allows smooth transitions between masked and unmasked frequencies, so that the augmented signals more naturally mimic frequency variations in real speech. We compare the performance of various data augmentation methods with FFM on two spoofing detection datasets, ASVspoof 2019 LA and ADD 2022. The proposed FFM augmentation achieves state-of-the-art results on both datasets. Applying FFM improved performance by approximately 51% on the ADD 2022 dataset and by 54% on the ASVspoof 2019 LA dataset. In addition, we have made the code and demo used in the experiments publicly available.
ISSN:2215-0986
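The summary describes FFM only at a high level. The following is a minimal NumPy sketch of the two masking variants it mentions, assuming a linear-magnitude spectrogram of shape (frequency bins, time frames). The function names (`ffm_hard`, `bell_weights`, `ffm_bell`), the uniform sampling of band width and position, and the use of the generalized bell function 1 / (1 + |(k - c)/a|^(2b)) as the "bell-shaped filter" are all assumptions made for illustration. This is not the authors' released code.

```python
import numpy as np


def ffm_hard(spec, max_width, rng):
    """Hard frequency mask (assumed variant): zero one randomly placed band of bins.

    spec: array of shape (n_freq, n_frames); max_width: largest band width in bins.
    """
    n_freq = spec.shape[0]
    max_width = min(max_width, n_freq)                 # a band cannot exceed the spectrum
    width = int(rng.integers(0, max_width + 1))        # width drawn uniformly from 0..max_width
    start = int(rng.integers(0, n_freq - width + 1))   # first masked bin
    out = spec.copy()
    out[start:start + width, :] = 0.0                  # same band masked in every frame
    return out


def bell_weights(n_freq, center, half_width, slope=2.0):
    """Per-bin gain 1 - g(k), with the generalized bell function (an assumption)
    g(k) = 1 / (1 + |(k - center) / half_width| ** (2 * slope)).

    The gain is 0 at `center`, 0.5 at center +/- half_width, and approaches 1 far away.
    """
    k = np.arange(n_freq, dtype=float)
    g = 1.0 / (1.0 + np.abs((k - center) / half_width) ** (2.0 * slope))
    return 1.0 - g


def ffm_bell(spec, max_half_width, rng, slope=2.0):
    """Soft frequency mask (assumed variant): attenuate a random band with smooth edges.

    Assumes max_half_width >= 1.
    """
    n_freq = spec.shape[0]
    center = rng.uniform(0.0, n_freq - 1)              # band center, in bins
    half_width = rng.uniform(1.0, max_half_width)      # distance at which the gain reaches 0.5
    w = bell_weights(n_freq, center, half_width, slope)
    return spec * w[:, None]                           # broadcast the gain over all time frames


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.uniform(0.1, 1.0, size=(257, 100))     # stand-in magnitude spectrogram
    hard = ffm_hard(spec, max_width=30, rng=rng)
    soft = ffm_bell(spec, max_half_width=15.0, rng=rng)
    print(hard.shape, soft.shape)
```

Multiplying by 1 - g(k) instead of zeroing a rectangular band means bins near the band edges are only partially attenuated. This is one way to obtain the smooth transition between masked and unmasked frequencies that the summary attributes to the bell-shaped filter.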