DeepLASD countermeasure for logical access audio spoofing

Bibliographic Details
Main Authors: Hamed Al-Tairi, Ali Javed, Tasawer Khan, Abdul Khader Jilani Saudagar
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
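Subjects: Automatic speaker verification; Deep learning; Logical access attacks; Spoof detection; Text-to-speech synthesis; Voice conversion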
Online Access: https://doi.org/10.1038/s41598-025-04808-5
collection DOAJ
description Abstract: Voice-based authentication systems have become increasingly vulnerable to logical access (LA) spoofing through sophisticated voice conversion (VC) and text-to-speech (TTS) attacks. This paper proposes DeepLASD, an end-to-end deep learning approach that processes raw waveforms to detect spoofed speech without relying on handcrafted features. The model incorporates a SincConv layer for interpretable spectral processing, along with residual convolutional blocks that integrate attention for improved feature extraction. We introduce GeLU activation in the residual blocks to better capture the traits that distinguish real from spoofed samples. A gated recurrent unit is further employed to model temporal dynamics. Extensive experiments were conducted on the large-scale and diverse ASVspoof 2019 and 2021 datasets. An Equal Error Rate as low as 4.98% and a minimum Tandem Detection Cost Function of 0.1208, along with strong generalization to both VC and TTS spoof types, demonstrate the competency of the proposed method for LA spoofing detection. Although the results on the ASVspoof 2021 dataset underscore the challenges posed by next-generation synthetic speech, the proposed solution exhibits notable adaptability. These findings affirm that the proposed end-to-end anti-spoofing framework enhances security and detection capabilities in voice authentication systems.
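The abstract reports its headline result as an Equal Error Rate (EER), the operating point at which the rate of accepted spoofs equals the rate of rejected bona fide speech. As a rough illustration of how such a figure can be derived from detector scores, here is a minimal NumPy sketch. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for this example.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER from two score lists (illustrative helper).

    Assumed convention: a higher score means "more likely bona fide",
    and a trial is accepted when its score is >= the threshold.
    Returns (eer, threshold).
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide trials scoring below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoof trials scoring at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER lies where the two error curves cross; with finitely many
    # trials we take the threshold with the smallest gap and average.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores (made up for illustration, not taken from the paper).
toy_bonafide = [0.9, 0.8, 0.7, 0.4]
toy_spoof = [0.1, 0.2, 0.3, 0.6]
toy_eer, toy_threshold = equal_error_rate(toy_bonafide, toy_spoof)
print(f"EER = {toy_eer:.2%} at threshold {toy_threshold}")
```

In the toy data one bona fide trial and one spoof trial overlap, so a quarter of each class is misclassified at the crossing point. Published ASVspoof figures are normally produced with the challenge's own scoring tools, which interpolate between thresholds, so this sketch should be read only as an explanation of the metric.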
id doaj-art-a172e9dc5ce24e4cb616b8d446964545
institution Kabale University
issn 2045-2322
spelling doaj-art-a172e9dc5ce24e4cb616b8d446964545; 2025-08-20T03:45:28Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-07-01; volume 15, issue 1, pages 1-10; 10.1038/s41598-025-04808-5
affiliations (in author order)
Hamed Al-Tairi: School of Information Technology, Whitecliffe College
Ali Javed: Department of Software Engineering, University of Engineering and Technology
Tasawer Khan: James Watt School of Engineering, University of Glasgow
Abdul Khader Jilani Saudagar: Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU)
topic Automatic speaker verification
Deep learning
Logical access attacks
Spoof detection
Text-to-speech synthesis
Voice conversion