WaViT-CDC: Wavelet Vision Transformer With Central Difference Convolutions for Spatial-Frequency Deepfake Detection

Bibliographic Details
Main Authors: Nour Eldin Alaa Badr, Jean-Christophe Nebel, Darrel Greenhill, Xing Liang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Journal of Signal Processing
Online Access: https://ieeexplore.ieee.org/document/11007485/
ISSN: 2644-1322
Collection: DOAJ
Volume: 6
Pages: 621-630
DOI: 10.1109/OJSP.2025.3571679
Affiliation: School of Computer Science and Mathematics, Kingston University London, London, U.K. (all four authors)
ORCID: Jean-Christophe Nebel, https://orcid.org/0000-0003-1812-5269; Xing Liang, https://orcid.org/0000-0002-6630-298X
Subjects: Deepfake detection; central difference convolutions; vision transformer; spatial-frequency analysis; discrete wavelet transform; subtle perturbations

Abstract: The increasing popularity of generative AI has led to a significant rise in deepfake content, creating an urgent need for generalizable and reliable deepfake detection methods. Because existing approaches rely on either spatial-domain or frequency-domain features, they struggle to generalize across unseen datasets, especially those with subtle manipulations. To address these challenges, a novel end-to-end Wavelet Central Difference Convolutional Vision Transformer (WaViT-CDC) framework is designed to enhance spatial-frequency deepfake detection. Unlike previous methods, this approach applies the Discrete Wavelet Transform for multi-level frequency decomposition and Central Difference Convolution to capture local fine-grained discrepancies and focus on texture variations, while also incorporating Vision Transformers for global contextual understanding. A Frequency-Spatial Feature Fusion Attention module integrates these features, enabling effective detection of fake artifacts. Moreover, in contrast to earlier work, subtle perturbations are introduced in both the spatial and frequency domains to further improve generalization. Cross-dataset generalization evaluations demonstrate that WaViT-CDC outperforms state-of-the-art methods when trained on both low-quality and high-quality face images, achieving average performance increases of 2.5% and 4.5% on challenging high-resolution, real-world datasets such as Celeb-DF and WildDeepfake.
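The abstract states that WaViT-CDC uses the Discrete Wavelet Transform for multi-level frequency decomposition, but it does not name the wavelet, the number of levels, or how the sub-bands are passed to the network. The sketch below is a minimal illustration under stated assumptions: an orthonormal 2D Haar wavelet, a single-channel image, and even height and width at every level. The function names `haar_dwt2` and `haar_wavedec2` are hypothetical. Each level splits the current approximation into a half-resolution low-pass band (LL) and three high-frequency detail bands (H, V, D), the kind of content in which fine blending or resampling traces would be expected to appear.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def haar_dwt2(x: Matrix) -> Dict[str, Matrix]:
    """One level of an orthonormal 2D Haar DWT (hypothetical helper).

    Each non-overlapping 2x2 block [[a, b], [c, d]] yields one coefficient
    per sub-band; the 1/2 scaling preserves the total energy of the input.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands: Dict[str, Matrix] = {"LL": [], "H": [], "V": [], "D": []}
    for i in range(0, h, 2):
        rows: Dict[str, List[float]] = {name: [] for name in bands}
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows["LL"].append((a + b + c + d) / 2)  # local average (low-pass)
            rows["H"].append((a + b - c - d) / 2)   # top minus bottom: horizontal edges
            rows["V"].append((a - b + c - d) / 2)   # left minus right: vertical edges
            rows["D"].append((a - b - c + d) / 2)   # diagonal detail
        for name in bands:
            bands[name].append(rows[name])
    return bands


def haar_wavedec2(x: Matrix, levels: int) -> Tuple[Matrix, List[Dict[str, Matrix]]]:
    """Multi-level decomposition: re-apply the DWT to the LL band at each level.

    Returns the final approximation and the detail bands, ordered from the
    finest level (level 1) to the coarsest.
    """
    approx = x
    details: List[Dict[str, Matrix]] = []
    for _ in range(levels):
        bands = haar_dwt2(approx)
        details.append({k: bands[k] for k in ("H", "V", "D")})
        approx = bands["LL"]
    return approx, details


if __name__ == "__main__":
    approx, details = haar_wavedec2([[1, 2], [3, 4]], levels=1)
    print(approx, details)  # [[5.0]] [{'H': [[-2.0]], 'V': [[-1.0]], 'D': [[0.0]]}]
```

On a perfectly flat image, every detail band is zero at every level, so any non-zero detail coefficient marks local intensity change at that scale.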
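The abstract credits Central Difference Convolution with capturing local fine-grained discrepancies and texture variations. A widely used formulation, introduced for face anti-spoofing by Yu et al. (CVPR 2020), blends a vanilla convolution with a convolution over differences from the centre pixel, which simplifies to y(p0) = Σ w(pn)·x(p0 + pn) − θ·x(p0)·Σ w(pn). The sketch below assumes that formulation for one channel and one odd-sized kernel, with stride 1 and no padding. The function name `central_difference_conv2d` and the default θ = 0.7 are illustrative assumptions, not details confirmed for WaViT-CDC.

```python
from typing import List

Matrix = List[List[float]]


def central_difference_conv2d(x: Matrix, w: Matrix, theta: float = 0.7) -> Matrix:
    """Single-channel central difference convolution (stride 1, no padding).

    For each output position p0:
        y(p0) = sum_n w(p_n) * x(p0 + p_n) - theta * x(p0) * sum_n w(p_n)
    theta = 0 gives a plain convolution (cross-correlation, as in CNN
    libraries); theta = 1 responds only to differences from the centre pixel.
    """
    kh, kw = len(w), len(w[0])
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError("kernel height and width must be odd")
    h, width = len(x), len(x[0])
    if h < kh or width < kw:
        raise ValueError("input is smaller than the kernel")
    ch, cw = kh // 2, kw // 2
    w_sum = sum(sum(row) for row in w)  # sum_n w(p_n)
    out: Matrix = []
    for i in range(h - kh + 1):
        out_row: List[float] = []
        for j in range(width - kw + 1):
            vanilla = sum(
                w[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw)
            )
            centre = x[i + ch][j + cw]  # x(p0), the pixel under the kernel centre
            out_row.append(vanilla - theta * centre * w_sum)
        out.append(out_row)
    return out


if __name__ == "__main__":
    ones = [[1.0] * 3 for _ in range(3)]
    flat = [[3.0] * 4 for _ in range(4)]
    print(central_difference_conv2d(flat, ones, theta=1.0))  # [[0.0, 0.0], [0.0, 0.0]]
```

With θ = 1, a flat patch produces zero output while a bright centre pixel on a flat background produces a strong response; intermediate θ values keep part of the ordinary intensity response alongside this difference-sensitive term.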