DeepLASD countermeasure for logical access audio spoofing
Abstract: Voice-based authentication systems have become increasingly vulnerable to logical access (LA) spoofing through sophisticated voice conversion (VC) and text-to-speech (TTS) attacks. This paper proposes an end-to-end deep learning approach, DeepLASD, that processes raw waveforms to detect spoo...
| Main Authors: | Hamed Al-Tairi, Ali Javed, Tasawer Khan, Abdul Khader Jilani Saudagar |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
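| Subjects: | Automatic speaker verification; Deep learning; Logical access attacks; Spoof detection; Text-to-speech synthesis; Voice conversion |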
| Online Access: | https://doi.org/10.1038/s41598-025-04808-5 |
| _version_ | 1849334788233625600 |
|---|---|
| author | Hamed Al-Tairi; Ali Javed; Tasawer Khan; Abdul Khader Jilani Saudagar |
| author_facet | Hamed Al-Tairi; Ali Javed; Tasawer Khan; Abdul Khader Jilani Saudagar |
| author_sort | Hamed Al-Tairi |
| collection | DOAJ |
| description | Abstract: Voice-based authentication systems have become increasingly vulnerable to logical access (LA) spoofing through sophisticated voice conversion (VC) and text-to-speech (TTS) attacks. This paper proposes DeepLASD, an end-to-end deep learning approach that processes raw waveforms to detect spoofed speech without relying on handcrafted features. The model incorporates a SincConv layer for interpretable spectral processing, along with residual convolutional blocks that integrate attention for improved feature extraction. We introduce GeLU activation in the residual blocks to better capture the traits that distinguish real from spoofed samples. A gated recurrent unit is further employed to model temporal dynamics. Extensive experiments were conducted on the large-scale and diverse ASVspoof 2019 and 2021 datasets. An Equal Error Rate as low as 4.98% and a minimum Tandem Detection Cost Function of 0.1208, along with strong generalization to both VC and TTS spoof types, demonstrate the competence of the proposed method for LA spoofing detection. Although the results on the ASVspoof 2021 dataset underscore the challenges posed by next-generation synthetic speech, the proposed solution exhibits notable adaptability. These findings affirm that the proposed end-to-end anti-spoofing framework enhances security and detection capabilities in voice authentication systems. |
| format | Article |
| id | doaj-art-a172e9dc5ce24e4cb616b8d446964545 |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-a172e9dc5ce24e4cb616b8d446964545; 2025-08-20T03:45:28Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-07-01; 15; 1; 1; 10; 10.1038/s41598-025-04808-5; DeepLASD countermeasure for logical access audio spoofing; Hamed Al-Tairi (School of Information Technology, Whitecliffe College); Ali Javed (Department of Software Engineering, University of Engineering and Technology); Tasawer Khan (James Watt School of Engineering, University of Glasgow); Abdul Khader Jilani Saudagar (Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU)); Abstract: Voice-based authentication systems have become increasingly vulnerable to logical access (LA) spoofing through sophisticated voice conversion (VC) and text-to-speech (TTS) attacks. This paper proposes DeepLASD, an end-to-end deep learning approach that processes raw waveforms to detect spoofed speech without relying on handcrafted features. The model incorporates a SincConv layer for interpretable spectral processing, along with residual convolutional blocks that integrate attention for improved feature extraction. We introduce GeLU activation in the residual blocks to better capture the traits that distinguish real from spoofed samples. A gated recurrent unit is further employed to model temporal dynamics. Extensive experiments were conducted on the large-scale and diverse ASVspoof 2019 and 2021 datasets. An Equal Error Rate as low as 4.98% and a minimum Tandem Detection Cost Function of 0.1208, along with strong generalization to both VC and TTS spoof types, demonstrate the competence of the proposed method for LA spoofing detection. Although the results on the ASVspoof 2021 dataset underscore the challenges posed by next-generation synthetic speech, the proposed solution exhibits notable adaptability. These findings affirm that the proposed end-to-end anti-spoofing framework enhances security and detection capabilities in voice authentication systems.; https://doi.org/10.1038/s41598-025-04808-5; Automatic speaker verification; Deep learning; Logical access attacks; Spoof detection; Text-to-speech synthesis; Voice conversion |
| spellingShingle | Hamed Al-Tairi; Ali Javed; Tasawer Khan; Abdul Khader Jilani Saudagar; DeepLASD countermeasure for logical access audio spoofing; Scientific Reports; Automatic speaker verification; Deep learning; Logical access attacks; Spoof detection; Text-to-speech synthesis; Voice conversion |
| title | DeepLASD countermeasure for logical access audio spoofing |
| title_full | DeepLASD countermeasure for logical access audio spoofing |
| title_fullStr | DeepLASD countermeasure for logical access audio spoofing |
| title_full_unstemmed | DeepLASD countermeasure for logical access audio spoofing |
| title_short | DeepLASD countermeasure for logical access audio spoofing |
| title_sort | deeplasd countermeasure for logical access audio spoofing |
| topic | Automatic speaker verification; Deep learning; Logical access attacks; Spoof detection; Text-to-speech synthesis; Voice conversion |
| url | https://doi.org/10.1038/s41598-025-04808-5 |
| work_keys_str_mv | AT hamedaltairi deeplasdcountermeasureforlogicalaccessaudiospoofing AT alijaved deeplasdcountermeasureforlogicalaccessaudiospoofing AT tasawerkhan deeplasdcountermeasureforlogicalaccessaudiospoofing AT abdulkhaderjilanisaudagar deeplasdcountermeasureforlogicalaccessaudiospoofing |
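
The abstract reports detection quality as an Equal Error Rate (EER), the operating point at which the false acceptance rate on spoofed speech equals the false rejection rate on bona fide speech. As a reading aid, the following is a minimal sketch of how an EER can be estimated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "more likely bona fide", and the simple threshold sweep are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping a decision threshold over observed scores.

    Assumption (illustrative): a trial is accepted as bona fide when its
    score is >= the threshold, so higher scores mean "more bona fide".
    Returns (eer, threshold) at the point where the false acceptance rate
    and false rejection rate are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above the
    # maximum so that "reject everything" is also considered.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best = None  # (gap, eer, threshold)
    for t in thresholds:
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    bonafide = [0.9, 0.8, 0.7, 0.3]
    spoof = [0.1, 0.2, 0.4, 0.6]
    eer, threshold = equal_error_rate(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

On real evaluation sets the two error rates rarely cross exactly at an observed score, so toolkits typically interpolate between neighbouring thresholds; the sketch instead averages the two rates at the closest point, which is adequate for illustration only.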