Robust audio authenticity detection based on One-Class learning


Bibliographic Details
Main Authors: LIANG Ziqi, ZHANG Xulong, WANG Jianzong, XIAO Jing
Format: Article
Language: Chinese (zho)
Published: China InfoCom Media Group 2025-05-01
Series: 大数据 (Big Data)
Online Access:http://www.j-bigdataresearch.com.cn/zh/article/doi/10.11959/j.issn.2096-0271.2025031/
Description
Summary: Deepfake technology poses a serious threat to the economy, political stability, and public security. In particular, voice forgery is widely used in harmful activities such as phone scams and public opinion manipulation. In recent years, driven by deep learning, speech synthesis and voice conversion have advanced rapidly and can now generate fake voices convincing enough to deceive both machines and humans. In response, many voice spoofing detection techniques have emerged to improve the reliability of speaker verification systems. However, existing methods often rely on prior knowledge of known attack types, and their generalization is limited when facing attack types absent from that prior knowledge. We built a voice spoofing detection system based on One-Class learning, which enhances the generalization ability of the model by establishing a strict decision boundary around real voices and judging samples outside the boundary as fake. In addition, to address the scarcity of fake voice data, the more versatile and robust self-supervised model Wav2vec2 was introduced for feature extraction, further improving recognition accuracy against unknown attacks. Experimental results show that the proposed method not only achieves good anti-spoofing performance but also reduces the potential interference of the countermeasure (CM) system with the downstream automatic speaker verification (ASV) system, effectively addressing the scarcity of fake voice data and the limited generalization ability of existing models.
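The one-class decision idea described in the summary can be sketched as follows. This is a minimal NumPy illustration of an OC-Softmax-style margin loss and scoring rule, not the paper's implementation: the center vector, margin values, scale factor, and decision threshold are all illustrative assumptions.

```python
import numpy as np

def cosine_scores(embeddings, center):
    # Cosine similarity between each utterance embedding and the
    # bona fide class center (the "real voice" direction).
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = center / np.linalg.norm(center)
    return e @ c

def oc_softmax_loss(scores, labels, m_real=0.9, m_fake=0.2, alpha=20.0):
    # One-class softmax-style loss: bona fide scores (label 0) are pushed
    # above the margin m_real, spoofed scores (label 1) below m_fake;
    # alpha is a scale factor. Margin/scale values here are illustrative.
    margins = np.where(labels == 0, m_real, m_fake)
    signs = np.where(labels == 0, 1.0, -1.0)
    return np.mean(np.log1p(np.exp(alpha * (margins - scores) * signs)))

def is_spoof(scores, threshold=0.5):
    # Decision rule: any sample falling outside the bona fide
    # boundary (low similarity to the center) is judged fake.
    return scores < threshold

# Toy example with 2-D "embeddings"; in practice these would come
# from a feature extractor such as Wav2vec2.
emb = np.array([[1.0, 0.05],    # close to the bona fide center
                [-1.0, 0.3]])   # far from it
center = np.array([1.0, 0.0])
scores = cosine_scores(emb, center)
print(is_spoof(scores))  # first sample accepted, second rejected
```

Note that only real voices shape the boundary at training time, which is why such a scheme can flag attack types never seen during training.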
ISSN:2096-0271