DDFNet: A Dual-Domain Fusion Network for Robust Synthetic Speech Detection

Bibliographic Details
Main Authors: Jing Lu, Qiang Zhang, Jialu Cao, Hui Tian
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Big Data and Cognitive Computing
Subjects: Dual-domain fusion; multi-task learning; synthetic speech detection; speech forensics
Online Access: https://www.mdpi.com/2504-2289/9/3/58
author Jing Lu
Qiang Zhang
Jialu Cao
Hui Tian
collection DOAJ
description The detection of synthetic speech has become a pressing challenge due to the potential societal risks posed by synthetic speech technologies. Existing methods primarily focus on either the time or frequency domain of speech, limiting their ability to generalize to new and diverse speech synthesis algorithms. In this work, we present a novel and scientifically grounded approach, the Dual-domain Fusion Network (DDFNet), which synergistically integrates features from both the time and frequency domains to capture complementary information. The architecture consists of two specialized single-domain feature extraction networks, each optimized for the unique characteristics of its respective domain, and a feature fusion network that effectively combines these features at a deep level. Moreover, we incorporate multi-task learning to simultaneously capture rich, multi-faceted representations, further enhancing the model’s generalization capability. Extensive experiments on the ASVspoof 2019 Logical Access corpus and ASVspoof 2021 tracks demonstrate that DDFNet achieves strong performance, maintaining competitive results despite the challenges posed by channel changes and compression coding, highlighting its robust generalization ability.
format Article
id doaj-art-dda2c72a24ea477eac7e7b38408ea11a
institution OA Journals
issn 2504-2289
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Big Data and Cognitive Computing
spelling doaj-art-dda2c72a24ea477eac7e7b38408ea11a
2025-08-20T02:11:12Z
eng
MDPI AG
Big Data and Cognitive Computing
2504-2289
2025-03-01
Volume 9, Issue 3, Article 58
10.3390/bdcc9030058
DDFNet: A Dual-Domain Fusion Network for Robust Synthetic Speech Detection
Jing Lu (0), Qiang Zhang (1), Jialu Cao (2), Hui Tian (3)
(0) College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
(1) College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
(2) College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
(3) College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
https://www.mdpi.com/2504-2289/9/3/58
Dual-domain fusion; multi-task learning; synthetic speech detection; speech forensics
title DDFNet: A Dual-Domain Fusion Network for Robust Synthetic Speech Detection
topic Dual-domain fusion
multi-task learning
synthetic speech detection
speech forensics
url https://www.mdpi.com/2504-2289/9/3/58
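
Illustrative note: the description in this record says that DDFNet combines time-domain and frequency-domain features of speech. The sketch below only illustrates that general idea and is not the authors' method. The record does not specify the feature extractors or the fusion network, so the hand-crafted descriptors, the concatenation step, and all function names here are assumptions chosen for illustration.

```python
import cmath
import math


def time_domain_features(signal):
    """Toy time-domain descriptors (assumed for illustration):
    mean energy and zero-crossing rate of the waveform."""
    n = len(signal)
    energy = sum(x * x for x in signal) / n
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return [energy, crossings / (n - 1)]


def frequency_domain_features(signal, num_bins=4):
    """Toy frequency-domain descriptors (assumed for illustration):
    normalized DFT magnitudes of the first `num_bins` bins."""
    n = len(signal)
    magnitudes = []
    for k in range(num_bins):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


def fuse(time_feats, freq_feats):
    """Simplest possible fusion: concatenate the two feature vectors.
    A learned fusion network would replace this step."""
    return time_feats + freq_feats


# Usage: a sine wave with 2 cycles across 16 samples.
signal = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
fused = fuse(time_domain_features(signal), frequency_domain_features(signal))
print(len(fused))  # 2 time-domain values followed by 4 frequency-domain values
```

For this test signal the energy appears in the time-domain part and the spectral peak at bin 2 appears in the frequency-domain part, which is the sense in which the two views are complementary. In a learned model, each hand-crafted function would presumably be replaced by a trained network.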