DDFNet: A Dual-Domain Fusion Network for Robust Synthetic Speech Detection
The detection of synthetic speech has become a pressing challenge due to the potential societal risks posed by synthetic speech technologies. Existing methods primarily focus on either the time or frequency domain of speech, limiting their ability to generalize to new and diverse speech synthesis al...
| Main Authors: | Jing Lu, Qiang Zhang, Jialu Cao, Hui Tian |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Big Data and Cognitive Computing |
| Subjects: | Dual-domain fusion; multi-task learning; synthetic speech detection; speech forensics |
| Online Access: | https://www.mdpi.com/2504-2289/9/3/58 |
| _version_ | 1850204899629858816 |
|---|---|
| author | Jing Lu; Qiang Zhang; Jialu Cao; Hui Tian |
| collection | DOAJ |
| description | The detection of synthetic speech has become a pressing challenge due to the potential societal risks posed by synthetic speech technologies. Existing methods primarily focus on either the time or frequency domain of speech, limiting their ability to generalize to new and diverse speech synthesis algorithms. In this work, we present a novel and scientifically grounded approach, the Dual-domain Fusion Network (DDFNet), which synergistically integrates features from both the time and frequency domains to capture complementary information. The architecture consists of two specialized single-domain feature extraction networks, each optimized for the unique characteristics of its respective domain, and a feature fusion network that effectively combines these features at a deep level. Moreover, we incorporate multi-task learning to simultaneously capture rich, multi-faceted representations, further enhancing the model’s generalization capability. Extensive experiments on the ASVspoof 2019 Logical Access corpus and ASVspoof 2021 tracks demonstrate that DDFNet achieves strong performance, maintaining competitive results despite the challenges posed by channel changes and compression coding, highlighting its robust generalization ability. |
| format | Article |
| id | doaj-art-dda2c72a24ea477eac7e7b38408ea11a |
| institution | OA Journals |
| issn | 2504-2289 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Big Data and Cognitive Computing |
| spelling | doaj-art-dda2c72a24ea477eac7e7b38408ea11a; 2025-08-20T02:11:12Z; eng; MDPI AG; Big Data and Cognitive Computing; ISSN 2504-2289; 2025-03-01; vol. 9, no. 3, art. 58; DOI 10.3390/bdcc9030058; DDFNet: A Dual-Domain Fusion Network for Robust Synthetic Speech Detection; Jing Lu, Qiang Zhang, Jialu Cao, Hui Tian (all: College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China); https://www.mdpi.com/2504-2289/9/3/58; Dual-domain fusion; multi-task learning; synthetic speech detection; speech forensics |
| title | DDFNet: A Dual-Domain Fusion Network for Robust Synthetic Speech Detection |
| topic | Dual-domain fusion; multi-task learning; synthetic speech detection; speech forensics |
| url | https://www.mdpi.com/2504-2289/9/3/58 |
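
The description in this record outlines a structure of two single-domain feature extractors, a fusion network, and multi-task learning. The sketch below is only a toy illustration of that general pattern and is not the authors' implementation. Everything in it is an assumption made for illustration: the hand-crafted features (frame energy, zero-crossing rate, band-averaged DFT magnitudes), fusion by concatenation, the random untrained weights, the hypothetical three-class auxiliary task, and the `aux_weight` loss weighting.

```python
import cmath
import math
import random


def split_frames(wave, frame):
    """Cut the waveform into non-overlapping frames (assumes len(wave) >= frame)."""
    return [wave[i:i + frame] for i in range(0, len(wave) - frame + 1, frame)]


def time_features(wave, frame=64):
    """Time-domain branch (assumed): mean frame energy and mean zero-crossing rate."""
    frames = split_frames(wave, frame)
    energy = [sum(x * x for x in f) / frame for f in frames]
    zcr = [sum(1 for a, b in zip(f, f[1:]) if a * b < 0) / (frame - 1) for f in frames]
    return [sum(energy) / len(frames), sum(zcr) / len(frames)]


def freq_features(wave, frame=64, bands=4):
    """Frequency-domain branch (assumed): mean DFT magnitude in equal-width bands."""
    frames = split_frames(wave, frame)
    half = frame // 2
    acc = [0.0] * bands
    for f in frames:
        for k in range(half):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame) for n, x in enumerate(f))
            acc[k * bands // half] += abs(coeff)
    return [a / (len(frames) * (half // bands)) for a in acc]


def init_layer(rows, cols, rng):
    """Random weights and zero biases; stands in for trained parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return weights, [0.0] * rows


def linear(x, weights, bias):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def build_params(rng, hidden=8, aux_classes=3):
    """Hypothetical layer sizes: 2 time features + 4 frequency features = 6 inputs."""
    return {
        "fusion": init_layer(hidden, 6, rng),
        "spoof_head": init_layer(1, hidden, rng),
        "aux_head": init_layer(aux_classes, hidden, rng),
    }


def forward(wave, params):
    """Fuse both branches by concatenation, then apply two task heads."""
    fused_in = time_features(wave) + freq_features(wave)
    hidden = [math.tanh(v) for v in linear(fused_in, *params["fusion"])]
    spoof_logit = linear(hidden, *params["spoof_head"])[0]
    spoof_prob = 1.0 / (1.0 + math.exp(-spoof_logit))
    aux_logits = linear(hidden, *params["aux_head"])
    top = max(aux_logits)
    exps = [math.exp(v - top) for v in aux_logits]
    aux_probs = [e / sum(exps) for e in exps]
    return spoof_prob, aux_probs


def multitask_loss(spoof_prob, aux_probs, is_spoof, aux_label, aux_weight=0.5):
    """Binary cross-entropy for spoof detection plus weighted auxiliary cross-entropy."""
    bce = -math.log(spoof_prob if is_spoof else 1.0 - spoof_prob)
    ce = -math.log(aux_probs[aux_label])
    return bce + aux_weight * ce


if __name__ == "__main__":
    rng = random.Random(0)
    params = build_params(rng)
    wave = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
    prob, aux = forward(wave, params)
    print("spoof probability:", prob)
    print("multi-task loss:", multitask_loss(prob, aux, True, 0))
```

In this toy version the two branches only meet at the concatenation step; the record's description speaks of combining features "at a deep level", which a real model would presumably realize with learned layers rather than the single hidden layer assumed here.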