Unmasking the Fake: Machine Learning Approach for Deepfake Voice Detection

Deepfake voice refers to artificially generated or manipulated audio that mimics a person’s voice, often created using advanced AI techniques. These synthetic voices can be used to convincingly imitate someone, making them nearly indistinguishable from genuine recordings. We present an ad...

Bibliographic Details
Main Authors: Muhammad Usama Tanveer Gujjar, Kashif Munir, Madiha Amjad, Atiq Ur Rehman, Amine Bermak
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Deep fake voice; machine learning; MFCC-GNB XtractNet; transfer learning
Online Access: https://ieeexplore.ieee.org/document/10811921/
author Muhammad Usama Tanveer Gujjar
Kashif Munir
Madiha Amjad
Atiq Ur Rehman
Amine Bermak
author_sort Muhammad Usama Tanveer Gujjar
collection DOAJ
description Deepfake voice refers to artificially generated or manipulated audio that mimics a person’s voice, often created using advanced AI techniques. These synthetic voices can convincingly imitate someone, making them nearly indistinguishable from genuine recordings. We present an advanced method for deepfake voice detection, leveraging a custom model named MFCC-GNB XtractNet. We first extract Mel-Frequency Cepstral Coefficients (MFCC) from audio samples; these serve as the foundational features for distinguishing genuine from fake voices. The MFCC features are then enhanced through a transformation process that employs a Gaussian Naive Bayes (GNB) model in conjunction with Non-Negative Factorization, creating a more discriminative feature set for subsequent analysis. These features are fed to our developed model, MFCC-GNB XtractNet, to identify deepfake voices. To rigorously evaluate the effectiveness of our approach, we deployed a range of machine learning models, including Random Forest (RF), K-Nearest Neighbors Classifier (KNC), Logistic Regression (LR), and Gaussian Naive Bayes (GNB). Each model’s performance is assessed through k-fold cross-validation, ensuring a robust evaluation across multiple data splits. Additionally, we performed a computational cost analysis to measure the efficiency of the models in terms of training time and resource usage. The results of our experiments were highly promising, with our MFCC-GNB XtractNet + GNB model achieving an accuracy score of 99.93%. This performance underscores the model’s ability to distinguish effectively between real and deepfake voices, setting a new benchmark in the field of voice authentication.
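The evaluation protocol named in the description (a Gaussian Naive Bayes classifier scored with k-fold cross-validation, plus a timing measurement) can be sketched as below. This is a minimal illustration, not the authors' MFCC-GNB XtractNet: the feature vectors are synthetic stand-ins for MFCCs, the feature-transformation step is omitted, and every function name, the 13-feature dimension, and the choice of k = 5 are assumptions made for the example.

```python
import math
import random
import time


def fit_gnb(X, y):
    """Estimate per-class log-prior, feature means, and feature variances."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor on the variance avoids division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gnb(model, x):
    """Return the label with the highest log-posterior for one sample."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


def k_fold_accuracy(X, y, k=5, seed=0):
    """Shuffle once, split into k folds, and return the per-fold accuracies."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        test_idx = set(held_out)
        train = [i for i in indices if i not in test_idx]
        model = fit_gnb([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict_gnb(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores


def make_synthetic(n_per_class=100, n_features=13, seed=1):
    """Toy stand-in for MFCC vectors: two Gaussian clusters (0 = real, 1 = fake)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 1.5)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


X, y = make_synthetic()
start = time.perf_counter()
scores = k_fold_accuracy(X, y, k=5)
elapsed = time.perf_counter() - start
print(f"mean accuracy over {len(scores)} folds: {sum(scores) / len(scores):.3f}")
print(f"cross-validation time: {elapsed:.3f} s")
```

On real data, the synthetic generator would be replaced by MFCC vectors extracted from labelled audio, and the wall-clock measurement would need to be complemented by memory profiling to approximate the resource-usage analysis the description mentions.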
format Article
id doaj-art-3b434e6934aa4606899cbdfb16bfb6bc
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-3b434e6934aa4606899cbdfb16bfb6bc
2025-01-01T00:01:25Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
Volume 12, pages 197442-197453
DOI: 10.1109/ACCESS.2024.3521026
Article number: 10811921
Unmasking the Fake: Machine Learning Approach for Deepfake Voice Detection
Muhammad Usama Tanveer Gujjar (https://orcid.org/0009-0002-7374-9461), Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
Kashif Munir (https://orcid.org/0000-0001-5114-4213), Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
Madiha Amjad (https://orcid.org/0000-0002-1297-8757), Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
Atiq Ur Rehman (https://orcid.org/0000-0003-0248-7919), Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
Amine Bermak (https://orcid.org/0000-0003-4984-6093), Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
https://ieeexplore.ieee.org/document/10811921/
Deep fake voice
machine learning
MFCC-GNB XtractNet
transfer learning
title Unmasking the Fake: Machine Learning Approach for Deepfake Voice Detection
topic Deep fake voice
machine learning
MFCC-GNB XtractNet
transfer learning
url https://ieeexplore.ieee.org/document/10811921/