Power Wavelet Cepstral Coefficients (PWCC): An Accurate Auditory Model-Based Feature Extraction Method for Robust Speaker Recognition

Bibliographic Details
Main Authors: Youssef Zouhir, Mohamed Zarka, Kais Ouni, Lilia El Amraoui
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11024023/
Description
Summary: Human capability for Speaker Recognition (SR) exceeds that of recent machine learning approaches, even in noisy environments. To bridge this gap, researchers study the human auditory system to improve the performance of machine learning algorithms. The paper introduces a novel feature extraction method, named “Power Wavelet Cepstral Coefficients” (PWCC), for enhancing SR accuracy. The method is derived from the “Normalized Wavelet FilterBank” (NWFB), which uses an “Equivalent Rectangular Bandwidth” rate (ERB-rate) scale, and additionally integrates a “Noise Suppression Module” (NSM). The NWFB imitates the cochlea’s frequency selectivity using “Morlet Wavelet filters” together with the ERB-rate scale. The NSM applies a medium-duration power analysis, an asymmetrical noise-suppression module incorporating a temporal masking component, and a spectral smoothing module to reduce the impact of noise on the signal. To assess the performance of the proposed PWCC method, experiments were conducted using clean speech signals from the TIMIT database, corrupted with various noises from the AURORA dataset. Using a “Gaussian Mixture Model-Universal Background Model” (GMM-UBM) classifier, the PWCC method demonstrated superior SR accuracy in noisy environments compared to traditional methods such as PNCC and MFCC. Furthermore, PWCC maintained higher precision, recall, and F1-scores than PNCC and MFCC across the noise conditions overall. For instance, with babble noise at 15 dB SNR, PWCC achieved a recognition rate of 92.06%, compared to 75.24% for PNCC and 68.33% for MFCC.
ISSN: 2169-3536
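
Illustrative note: the summary says the filterbank places its filters on an ERB-rate scale. The sketch below shows one common way such a scale could be computed, assuming the widely used Glasberg and Moore formulas; the record does not state which ERB formula the paper uses. The function names, the 100 Hz to 8000 Hz range, and the filter count of 32 are hypothetical choices for illustration, not values from the paper, and no Morlet filter design is attempted here.

```python
import math


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to ERB-rate (Glasberg-Moore form, assumed here)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate):
    """Inverse mapping: ERB-rate back to frequency in Hz."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz."""
    return 24.7 * (0.00437 * f_hz + 1.0)


def erb_center_frequencies(f_min, f_max, n_filters):
    """Return n_filters centre frequencies equally spaced on the ERB-rate scale."""
    if n_filters < 2:
        raise ValueError("n_filters must be at least 2")
    lo = hz_to_erb_rate(f_min)
    hi = hz_to_erb_rate(f_max)
    step = (hi - lo) / (n_filters - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_filters)]


if __name__ == "__main__":
    # Hypothetical settings: 32 filters between 100 Hz and 8000 Hz.
    centers = erb_center_frequencies(100.0, 8000.0, 32)
    for fc in centers:
        print(f"centre = {fc:8.1f} Hz, ERB = {erb_bandwidth(fc):6.1f} Hz")
```

Equal steps on the ERB-rate axis give narrowly spaced filters at low frequencies and widely spaced ones at high frequencies, which is the cochlea-like frequency selectivity the summary refers to.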