Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation
In the rapidly evolving landscape of voice-related technology, high-tech companies are developing multifaceted voice assistants, tailored to their specific organizational goals. This technological evolution, however, introduces heightened security vulnerabilities such as voice spoofing attacks. To a...
Main Authors: | Soyul Han, Jaejin Seo, Sunmook Choi, Taein Kang, Sanghyeok Chung, Seungeun Lee, Seoyoung Park, Seungsang Oh, Il-Youp Kwak |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-03-01 |
Series: | Engineering Science and Technology, an International Journal |
Subjects: | Voice spoofing detection, Fake audio detection, Data augmentation, Deep learning |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2215098625000278 |
_version_ | 1825206885418008576 |
---|---|
author | Soyul Han Jaejin Seo Sunmook Choi Taein Kang Sanghyeok Chung Seungeun Lee Seoyoung Park Seungsang Oh Il-Youp Kwak |
author_facet | Soyul Han Jaejin Seo Sunmook Choi Taein Kang Sanghyeok Chung Seungeun Lee Seoyoung Park Seungsang Oh Il-Youp Kwak |
author_sort | Soyul Han |
collection | DOAJ |
description | In the rapidly evolving landscape of voice-related technology, high-tech companies are developing multifaceted voice assistants, tailored to their specific organizational goals. This technological evolution, however, introduces heightened security vulnerabilities such as voice spoofing attacks. To address voice spoofing challenges, various competitions like ASVspoof 2015, 2017, 2019, 2021, and ADD 2022 have emerged. ADD 2022’s Track 1 aimed to classify genuine and fake speech signals in the presence of noise. Our exploratory data analysis revealed that for a given speech sample, noisy signals tend to occur within similar frequency bands. If a model is heavily reliant on data within frequency ranges that contain noise, its performance will be suboptimal. To address this issue, we propose a data augmentation technique called Frequency Feature Masking (FFM), which randomly masks frequency bands. FFM helps prevent overfitting and enhances the model’s robustness by avoiding reliance on specific frequency bands. Furthermore, we propose a frequency band masking method using a bell-shaped filter. This allows for smooth transitions between masked and unmasked frequencies, enabling the model to naturally mimic frequency variations in real speech signals. We compare the performance of various data augmentation methods with FFM in two spoofing detection datasets, ASVspoof 2019 LA and ADD 2022. The proposed FFM augmentation achieves state-of-the-art results in both datasets. The ADD 2022 dataset showed an improvement of approximately 51% after the application of FFM, while there was a 54% improvement in the ASVspoof 2019 LA dataset. In addition, we have made the code and demo used in the experiment publicly available. |
format | Article |
id | doaj-art-8f86703526234adaa0fdc674232a87aa |
institution | Kabale University |
issn | 2215-0986 |
language | English |
publishDate | 2025-03-01 |
publisher | Elsevier |
record_format | Article |
series | Engineering Science and Technology, an International Journal |
spelling | doaj-art-8f86703526234adaa0fdc674232a87aa; 2025-02-07T04:47:41Z; eng; Elsevier; Engineering Science and Technology, an International Journal; 2215-0986; 2025-03-01; 63; 101972; Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation. Authors: Soyul Han [0], Jaejin Seo [1], Sunmook Choi [2], Taein Kang [3], Sanghyeok Chung [4], Seungeun Lee [5], Seoyoung Park [6], Seungsang Oh [7], Il-Youp Kwak [8]. Affiliations: [0] Institute for Data Innovation in Science, Seoul National University, Seoul 08826, South Korea; Department of Statistics and Data Science, Chung-Ang University, Seoul 06974, South Korea. [1] Department of Statistics and Data Science, Chung-Ang University, Seoul 06974, South Korea. [2] Department of Mathematics, Korea University, Seoul 02841, South Korea; Center for Applied Mathematics, Cornell University, Ithaca, NY 14850, USA. [3] Department of Statistics and Data Science, Chung-Ang University, Seoul 06974, South Korea. [4] Department of Mathematics, Korea University, Seoul 02841, South Korea. [5] Department of Mathematics, Korea University, Seoul 02841, South Korea. [6] Department of Statistics and Data Science, Chung-Ang University, Seoul 06974, South Korea. [7] Department of Mathematics, Korea University, Seoul 02841, South Korea; Corresponding authors. [8] Department of Statistics and Data Science, Chung-Ang University, Seoul 06974, South Korea; Corresponding authors. Abstract: In the rapidly evolving landscape of voice-related technology, high-tech companies are developing multifaceted voice assistants, tailored to their specific organizational goals. This technological evolution, however, introduces heightened security vulnerabilities such as voice spoofing attacks. To address voice spoofing challenges, various competitions like ASVspoof 2015, 2017, 2019, 2021, and ADD 2022 have emerged. ADD 2022’s Track 1 aimed to classify genuine and fake speech signals in the presence of noise. Our exploratory data analysis revealed that for a given speech sample, noisy signals tend to occur within similar frequency bands. If a model is heavily reliant on data within frequency ranges that contain noise, its performance will be suboptimal. To address this issue, we propose a data augmentation technique called Frequency Feature Masking (FFM), which randomly masks frequency bands. FFM helps prevent overfitting and enhances the model’s robustness by avoiding reliance on specific frequency bands. Furthermore, we propose a frequency band masking method using a bell-shaped filter. This allows for smooth transitions between masked and unmasked frequencies, enabling the model to naturally mimic frequency variations in real speech signals. We compare the performance of various data augmentation methods with FFM in two spoofing detection datasets, ASVspoof 2019 LA and ADD 2022. The proposed FFM augmentation achieves state-of-the-art results in both datasets. The ADD 2022 dataset showed an improvement of approximately 51% after the application of FFM, while there was a 54% improvement in the ASVspoof 2019 LA dataset. In addition, we have made the code and demo used in the experiment publicly available. URL: http://www.sciencedirect.com/science/article/pii/S2215098625000278. Keywords: Voice spoofing detection; Fake audio detection; Data augmentation; Deep learning |
spellingShingle | Soyul Han Jaejin Seo Sunmook Choi Taein Kang Sanghyeok Chung Seungeun Lee Seoyoung Park Seungsang Oh Il-Youp Kwak Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation Engineering Science and Technology, an International Journal Voice spoofing detection Fake audio detection Data augmentation Deep learning |
title | Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation |
title_full | Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation |
title_fullStr | Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation |
title_full_unstemmed | Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation |
title_short | Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation |
title_sort | enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation |
topic | Voice spoofing detection Fake audio detection Data augmentation Deep learning |
url | http://www.sciencedirect.com/science/article/pii/S2215098625000278 |
work_keys_str_mv | AT soyulhan enhancingvoicespoofingdetectioninnoisyenvironmentsusingfrequencyfeaturemaskingaugmentation AT jaejinseo enhancingvoicespoofingdetectioninnoisyenvironmentsusingfrequencyfeaturemaskingaugmentation AT sunmookchoi enhancingvoicespoofingdetectioninnoisyenvironmentsusingfrequencyfeaturemaskingaugmentation AT taeinkang enhancingvoicespoofingdetectioninnoisyenvironmentsusingfrequencyfeaturemaskingaugmentation AT sanghyeokchung enhancingvoicespoofingdetectioninnoisyenvironmentsusingfrequencyfeaturemaskingaugmentation AT seungeunlee enhancingvoicespoofingdetectioninnoisyenvironmentsusingfrequencyfeaturemaskingaugmentation AT seoyoungpark enhancingvoicespoofingdetectioninnoisyenvironmentsusingfrequencyfeaturemaskingaugmentation AT seungsangoh enhancingvoicespoofingdetectioninnoisyenvironmentsusingfrequencyfeaturemaskingaugmentation AT ilyoupkwak enhancingvoicespoofingdetectioninnoisyenvironmentsusingfrequencyfeaturemaskingaugmentation |
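
The description above says that FFM randomly masks frequency bands and that a bell-shaped filter can give smooth transitions between masked and unmasked frequencies. The following is only a minimal sketch of that idea, not the authors' released code. It assumes a spectrogram-like array of shape (frequency bins, time frames), uses a Gaussian as the bell shape, and draws the band position and width uniformly; the function names and the `max_width` and `smooth` parameters are hypothetical choices made for illustration.

```python
import numpy as np


def hard_band_mask(n_freq, start, width):
    """Rectangular mask: 0 inside [start, start + width), 1 elsewhere."""
    mask = np.ones(n_freq)
    mask[start:start + width] = 0.0
    return mask


def bell_band_mask(n_freq, center, sigma):
    """Inverted Gaussian (assumed bell shape): 0 at `center`, rising smoothly to 1."""
    bins = np.arange(n_freq)
    return 1.0 - np.exp(-0.5 * ((bins - center) / sigma) ** 2)


def frequency_feature_masking(spec, max_width=20, smooth=True, rng=None):
    """Attenuate one randomly chosen frequency band of `spec` (freq x time).

    The uniform sampling of width and position and the sigma = width / 4
    rule are illustrative assumptions, not values taken from the record.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_freq = spec.shape[0]
    # Band width in bins, at least 1 and never wider than the spectrogram.
    width = int(rng.integers(1, min(max_width, n_freq) + 1))
    # First bin of the band, chosen so the band fits inside the array.
    start = int(rng.integers(0, n_freq - width + 1))
    if smooth:
        center = start + (width - 1) / 2.0
        mask = bell_band_mask(n_freq, center, sigma=width / 4.0)
    else:
        mask = hard_band_mask(n_freq, start, width)
    # Broadcast the per-bin mask over every time frame.
    return spec * mask[:, None]


if __name__ == "__main__":
    spec = np.ones((80, 50))  # stand-in for a real feature matrix
    out = frequency_feature_masking(spec, rng=np.random.default_rng(0))
    print(out.shape)
```

Applying the mask multiplicatively keeps the unmasked bins unchanged, and a fresh random band per call means a model trained on the augmented data cannot depend on any single band being present.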