WaViT-CDC: Wavelet Vision Transformer With Central Difference Convolutions for Spatial-Frequency Deepfake Detection
The increasing popularity of generative AI has led to a significant rise in deepfake content, creating an urgent need for generalized and reliable deepfake detection methods. Since existing approaches rely on either spatial-domain features or frequency-domain features, they struggle to generalize ac...
| Main Authors: | Nour Eldin Alaa Badr, Jean-Christophe Nebel, Darrel Greenhill, Xing Liang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Open Journal of Signal Processing |
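| Subjects: | Deepfake detection; central difference convolutions; vision transformer; spatial-frequency analysis; discrete wavelet transform; subtle perturbations |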
| Online Access: | https://ieeexplore.ieee.org/document/11007485/ |
| _version_ | 1849689833152184320 |
|---|---|
| author | Nour Eldin Alaa Badr; Jean-Christophe Nebel; Darrel Greenhill; Xing Liang |
| author_facet | Nour Eldin Alaa Badr; Jean-Christophe Nebel; Darrel Greenhill; Xing Liang |
| author_sort | Nour Eldin Alaa Badr |
| collection | DOAJ |
| description | The increasing popularity of generative AI has led to a significant rise in deepfake content, creating an urgent need for generalized and reliable deepfake detection methods. Since existing approaches rely on either spatial-domain features or frequency-domain features, they struggle to generalize across unseen datasets, especially those with subtle manipulations. To address these challenges, a novel end-to-end Wavelet Central Difference Convolutional Vision Transformer framework is designed to enhance spatial-frequency deepfake detection. Unlike previous methods, this approach applies the Discrete Wavelet Transform for multi-level frequency decomposition and Central Difference Convolution to capture local fine-grained discrepancies and focus on texture variances, while also incorporating Vision Transformers for global contextual understanding. The Frequency-Spatial Feature Fusion Attention module integrates these features, enabling the effective detection of fake artifacts. Moreover, in contrast to earlier work, subtle perturbations to both spatial and frequency domains are introduced to further improve generalization. Cross-dataset generalization evaluations demonstrate that WaViT-CDC outperforms state-of-the-art methods when trained on both low-quality and high-quality face images, achieving average performance increases of 2.5% and 4.5% on challenging high-resolution, real-world datasets such as Celeb-DF and WildDeepfake. |
| format | Article |
| id | doaj-art-a6d0fe0ee1ed4b12b32967554ec8bdff |
| institution | DOAJ |
| issn | 2644-1322 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Open Journal of Signal Processing |
| spelling | doaj-art-a6d0fe0ee1ed4b12b32967554ec8bdff; 2025-08-20T03:21:30Z; eng; IEEE; IEEE Open Journal of Signal Processing; 2644-1322; 2025-01-01; vol. 6, pp. 621-630; 10.1109/OJSP.2025.3571679; 11007485; WaViT-CDC: Wavelet Vision Transformer With Central Difference Convolutions for Spatial-Frequency Deepfake Detection; Nour Eldin Alaa Badr (0); Jean-Christophe Nebel (1), https://orcid.org/0000-0003-1812-5269; Darrel Greenhill (2); Xing Liang (3), https://orcid.org/0000-0002-6630-298X; all authors: School of Computer Science and Mathematics, Kingston University London, London, U.K.; https://ieeexplore.ieee.org/document/11007485/; Deepfake detection; central difference convolutions; vision transformer; spatial-frequency analysis; discrete wavelet transform; subtle perturbations |
| spellingShingle | Nour Eldin Alaa Badr; Jean-Christophe Nebel; Darrel Greenhill; Xing Liang; WaViT-CDC: Wavelet Vision Transformer With Central Difference Convolutions for Spatial-Frequency Deepfake Detection; IEEE Open Journal of Signal Processing; Deepfake detection; central difference convolutions; vision transformer; spatial-frequency analysis; discrete wavelet transform; subtle perturbations |
| title | WaViT-CDC: Wavelet Vision Transformer With Central Difference Convolutions for Spatial-Frequency Deepfake Detection |
| title_sort | wavit cdc wavelet vision transformer with central difference convolutions for spatial frequency deepfake detection |
| topic | Deepfake detection; central difference convolutions; vision transformer; spatial-frequency analysis; discrete wavelet transform; subtle perturbations |
| url | https://ieeexplore.ieee.org/document/11007485/ |
| work_keys_str_mv | AT noureldinalaabadr wavitcdcwaveletvisiontransformerwithcentraldifferenceconvolutionsforspatialfrequencydeepfakedetection AT jeanchristophenebel wavitcdcwaveletvisiontransformerwithcentraldifferenceconvolutionsforspatialfrequencydeepfakedetection AT darrelgreenhill wavitcdcwaveletvisiontransformerwithcentraldifferenceconvolutionsforspatialfrequencydeepfakedetection AT xingliang wavitcdcwaveletvisiontransformerwithcentraldifferenceconvolutionsforspatialfrequencydeepfakedetection |
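
The description above names two building blocks, multi-level Discrete Wavelet Transform decomposition and Central Difference Convolution. The following is a minimal NumPy sketch of both, intended only to illustrate the general techniques and not the authors' implementation. The Haar wavelet, the subband naming, the function names (`haar_dwt2`, `multilevel_haar`, `central_difference_conv2d`), and the default `theta=0.7` are all assumptions made for illustration; the record does not specify them.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT (assumed wavelet).

    Expects a 2D array with even height and width. Returns the
    approximation subband and three detail subbands. Subband naming
    conventions vary between libraries; the labels here are illustrative.
    """
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass in both directions
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def multilevel_haar(x, levels):
    """Apply haar_dwt2 repeatedly to the approximation subband."""
    ll = np.asarray(x, dtype=np.float64)
    details = []
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        details.append((lh, hl, hh))
    return ll, details


def central_difference_conv2d(x, w, theta=0.7):
    """Single-channel central difference convolution, 'valid' padding.

    Computes sum(w * patch) - theta * center * sum(w), which equals
    theta * sum(w * (patch - center)) + (1 - theta) * sum(w * patch).
    theta=0 gives an ordinary convolution (cross-correlation);
    theta=1 uses only differences from the central pixel.
    """
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    w_sum = np.sum(w)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw]
            center = patch[kh // 2, kw // 2]
            out[i, j] = np.sum(w * patch) - theta * center * w_sum
    return out
```

With `theta=1.0`, a constant image produces an all-zero response, which is why this operator emphasizes local texture variation rather than absolute intensity. In a real model the kernel `w` would be learned and applied across many channels, typically through a deep learning framework rather than explicit loops.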