Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation

In the rapidly evolving landscape of voice-related technology, high-tech companies are developing multifaceted voice assistants, tailored to their specific organizational goals. This technological evolution, however, introduces heightened security vulnerabilities such as voice spoofing attacks. To a...

Bibliographic Details
Main Authors: Soyul Han, Jaejin Seo, Sunmook Choi, Taein Kang, Sanghyeok Chung, Seungeun Lee, Seoyoung Park, Seungsang Oh, Il-Youp Kwak
Format: Article
Language: English
Published: Elsevier 2025-03-01
Series: Engineering Science and Technology, an International Journal
Subjects: Voice spoofing detection; Fake audio detection; Data augmentation; Deep learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2215098625000278
_version_ 1825206885418008576
author Soyul Han
Jaejin Seo
Sunmook Choi
Taein Kang
Sanghyeok Chung
Seungeun Lee
Seoyoung Park
Seungsang Oh
Il-Youp Kwak
author_sort Soyul Han
collection DOAJ
description In the rapidly evolving landscape of voice-related technology, high-tech companies are developing multifaceted voice assistants tailored to their specific organizational goals. This technological evolution, however, introduces heightened security vulnerabilities such as voice spoofing attacks. To address voice spoofing challenges, competitions such as ASVspoof 2015, 2017, 2019, 2021, and ADD 2022 have emerged. ADD 2022’s Track 1 aimed to classify genuine and fake speech signals in the presence of noise. Our exploratory data analysis revealed that, for a given speech sample, noisy signals tend to occur within similar frequency bands. If a model relies heavily on data within frequency ranges that contain noise, its performance will be suboptimal. To address this issue, we propose a data augmentation technique called Frequency Feature Masking (FFM), which randomly masks frequency bands. FFM helps prevent overfitting and enhances the model’s robustness by avoiding reliance on specific frequency bands. Furthermore, we propose a frequency band masking method using a bell-shaped filter. This allows for smooth transitions between masked and unmasked frequencies, enabling the model to naturally mimic frequency variations in real speech signals. We compare the performance of various data augmentation methods with FFM on two spoofing detection datasets, ASVspoof 2019 LA and ADD 2022. The proposed FFM augmentation achieves state-of-the-art results on both datasets. The ADD 2022 dataset showed an improvement of approximately 51% after the application of FFM, while there was a 54% improvement on the ASVspoof 2019 LA dataset. In addition, we have made the code and demo used in the experiment publicly available.
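The masking idea in the description can be illustrated with a short sketch. This is not the authors' released code: the function name `frequency_mask`, the `max_width` parameter, the use of a Gaussian as the "bell-shaped" filter, and the choice of a linear-magnitude spectrogram of shape (frequency, time) are all assumptions made here for illustration.

```python
import numpy as np

def frequency_mask(spec, max_width=20, soft=True, rng=None):
    """Attenuate one randomly chosen frequency band of a (freq, time) spectrogram.

    Hypothetical helper: returns the masked spectrogram and the per-bin gain.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_freq = spec.shape[0]
    width = int(rng.integers(1, max_width + 1))   # band width in bins (assumed range)
    center = int(rng.integers(0, n_freq))         # centre bin of the masked band
    bins = np.arange(n_freq)
    if soft:
        # Assumed bell shape: a Gaussian dip, gain 0 at the centre and
        # rising smoothly towards 1 away from it.
        sigma = width / 2.0
        gain = 1.0 - np.exp(-0.5 * ((bins - center) / sigma) ** 2)
    else:
        # Hard mask: bins inside the band are zeroed, all others untouched.
        gain = np.where(np.abs(bins - center) <= width // 2, 0.0, 1.0)
    return spec * gain[:, None], gain

# Example: mask a dummy spectrogram with 64 frequency bins and 10 frames.
spec = np.ones((64, 10))
masked, gain = frequency_mask(spec, rng=np.random.default_rng(0))
```

Applying the function with fresh randomness on each training batch would expose the model to a different suppressed band every time, which is the intuition behind avoiding reliance on any one frequency range; the soft variant avoids the sharp edge that a hard mask introduces.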
format Article
id doaj-art-8f86703526234adaa0fdc674232a87aa
institution Kabale University
issn 2215-0986
language English
publishDate 2025-03-01
publisher Elsevier
record_format Article
series Engineering Science and Technology, an International Journal
spelling doaj-art-8f86703526234adaa0fdc674232a87aa
2025-02-07T04:47:41Z
eng
Elsevier
Engineering Science and Technology, an International Journal
2215-0986
2025-03-01
63
101972
Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation
Soyul Han (Institute for Data Innovation in Science, Seoul National University, Seoul 08826, South Korea; Department of Statistics and Data Science, Chung-Ang University, Seoul 06974, South Korea)
Jaejin Seo (Department of Statistics and Data Science, Chung-Ang University, Seoul 06974, South Korea)
Sunmook Choi (Department of Mathematics, Korea University, Seoul 02841, South Korea; Center for Applied Mathematics, Cornell University, Ithaca, NY 14850, USA)
Taein Kang (Department of Statistics and Data Science, Chung-Ang University, Seoul 06974, South Korea)
Sanghyeok Chung (Department of Mathematics, Korea University, Seoul 02841, South Korea)
Seungeun Lee (Department of Mathematics, Korea University, Seoul 02841, South Korea)
Seoyoung Park (Department of Statistics and Data Science, Chung-Ang University, Seoul 06974, South Korea)
Seungsang Oh (Department of Mathematics, Korea University, Seoul 02841, South Korea; corresponding author)
Il-Youp Kwak (Department of Statistics and Data Science, Chung-Ang University, Seoul 06974, South Korea; corresponding author)
In the rapidly evolving landscape of voice-related technology, high-tech companies are developing multifaceted voice assistants tailored to their specific organizational goals. This technological evolution, however, introduces heightened security vulnerabilities such as voice spoofing attacks. To address voice spoofing challenges, competitions such as ASVspoof 2015, 2017, 2019, 2021, and ADD 2022 have emerged. ADD 2022’s Track 1 aimed to classify genuine and fake speech signals in the presence of noise. Our exploratory data analysis revealed that, for a given speech sample, noisy signals tend to occur within similar frequency bands. If a model relies heavily on data within frequency ranges that contain noise, its performance will be suboptimal. To address this issue, we propose a data augmentation technique called Frequency Feature Masking (FFM), which randomly masks frequency bands. FFM helps prevent overfitting and enhances the model’s robustness by avoiding reliance on specific frequency bands. Furthermore, we propose a frequency band masking method using a bell-shaped filter. This allows for smooth transitions between masked and unmasked frequencies, enabling the model to naturally mimic frequency variations in real speech signals. We compare the performance of various data augmentation methods with FFM on two spoofing detection datasets, ASVspoof 2019 LA and ADD 2022. The proposed FFM augmentation achieves state-of-the-art results on both datasets. The ADD 2022 dataset showed an improvement of approximately 51% after the application of FFM, while there was a 54% improvement on the ASVspoof 2019 LA dataset. In addition, we have made the code and demo used in the experiment publicly available.
http://www.sciencedirect.com/science/article/pii/S2215098625000278
Voice spoofing detection
Fake audio detection
Data augmentation
Deep learning
title Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation
topic Voice spoofing detection
Fake audio detection
Data augmentation
Deep learning
url http://www.sciencedirect.com/science/article/pii/S2215098625000278