Data Augmentation for Voiceprint Recognition Using Generative Adversarial Networks

Bibliographic Details
Main Authors: Yao-San Lin, Hung-Yu Chen, Mei-Ling Huang, Tsung-Yu Hsieh
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Algorithms
Subjects:
Online Access: https://www.mdpi.com/1999-4893/17/12/583
Description
Summary: Voiceprint recognition systems often face challenges related to limited and diverse datasets, which hinder their performance and generalization capabilities. This study proposes a novel approach that integrates generative adversarial networks (GANs) for data augmentation and convolutional neural networks (CNNs) with mel-frequency cepstral coefficients (MFCCs) for voiceprint classification. Experimental results demonstrate that the proposed methodology improves recognition accuracy by up to 15% in low-resource scenarios. The optimal ratio of real-to-GAN-generated samples was determined to be 3:2, which balanced dataset diversity and model performance. In specific cases, the model achieved an accuracy of 96.6%, showcasing its effectiveness in capturing unique voice characteristics while mitigating overfitting. These results highlight the potential of combining GAN-augmented data and CNN-based classification to enhance voiceprint recognition in diverse and resource-constrained environments.
ISSN: 1999-4893
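
Illustration: The summary reports a 3:2 ratio of real to GAN-generated samples as the best balance. The sketch below shows only the arithmetic of assembling a training set at that ratio. It is not the authors' pipeline: the function names `synthetic_count` and `mix_training_set`, the sample placeholders, and the choice to keep every real sample and subsample the generated pool are all assumptions made for illustration.

```python
import random


def synthetic_count(n_real, real_parts=3, synth_parts=2):
    """Generated samples needed so that real:generated is real_parts:synth_parts.

    Rounds down when n_real is not a multiple of real_parts.
    """
    return (n_real * synth_parts) // real_parts


def mix_training_set(real, generated, real_parts=3, synth_parts=2, seed=0):
    """Keep all real samples and add a random subset of generated ones.

    Returns a shuffled list of (sample, source) pairs, where source is
    "real" or "generated". Assumes the generated pool is large enough.
    """
    needed = synthetic_count(len(real), real_parts, synth_parts)
    if needed > len(generated):
        raise ValueError("not enough generated samples for the requested ratio")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    chosen = rng.sample(generated, needed)
    mixed = [(x, "real") for x in real] + [(x, "generated") for x in chosen]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Placeholder identifiers stand in for MFCC feature arrays.
    real = [f"real_{i}" for i in range(300)]
    generated = [f"gan_{i}" for i in range(500)]
    mixed = mix_training_set(real, generated)
    n_real = sum(1 for _, source in mixed if source == "real")
    print(len(mixed), n_real, len(mixed) - n_real)
```

With 300 real samples, a 3:2 ratio calls for 200 generated samples, giving a mixed set of 500. In practice each placeholder would be replaced by the features fed to the classifier.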