Spectral Mapping Using Kernel Principal Components Regression for Voice Conversion

Bibliographic Details
Main Authors:	Peng SONG, Li ZHAO, Yongqiang BAO
Format:	Article
Language:	English
Published:	Institute of Fundamental Technological Research Polish Academy of Sciences 2013-03-01
Series:	Archives of Acoustics
Subjects:	spectral mapping; overfitting; oversmoothing; discontinuity; kernel principal component regression
Online Access:	https://acoustics.ippt.pan.pl/index.php/aa/article/view/5

ISSN:	0137-5075, 2300-262X
Collection:	DOAJ
Affiliations:	Peng SONG, Li ZHAO: Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University; Yongqiang BAO: School of Communication Engineering, Nanjing Institute of Technology

Description:
The Gaussian mixture model (GMM) method is popular and efficient for voice conversion (VC), but it is prone to overfitting. In this paper, the principal component regression (PCR) method is adopted for spectral mapping between source and target speech, and the number of principal components is tuned to prevent overfitting. To better model the nonlinear relationship between source and target speech, a kernel principal component regression (KPCR) method is then proposed, and a method combining KPCR with GMM is further proposed to improve conversion accuracy.

The paper also addresses the discontinuity and oversmoothing problems of the traditional GMM method. To solve the discontinuity problem, an adaptive median filter is applied to smooth the posterior probabilities; to reduce oversmoothing, only the two mixture components with the highest posterior probabilities in each frame are used for conversion. Objective and subjective experiments show that the proposed approach performs substantially better than the GMM method: in the objective tests it yields lower cepstral distances and higher identification rates, and in the subjective tests it obtains higher preference and perceptual-quality scores.
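
The KPCR spectral mapping described in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes time-aligned source and target feature frames (for example, cepstral vectors), a Gaussian (RBF) kernel, and hypothetical parameters `gamma` and `n_components`. The record does not specify the paper's kernel, features, or how the number of components is chosen. Truncating to the leading `n_components` kernel principal components is the regularization the abstract relies on to prevent overfitting.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class KernelPCR:
    """Kernel principal component regression from source to target frames.

    Kernel PCA on the source frames, truncated to the leading n_components,
    followed by linear least squares from the component scores to the target
    frames. gamma and n_components are illustrative, hypothetical settings.
    """

    def __init__(self, n_components: int = 10, gamma: float = 0.1):
        self.n_components = n_components
        self.gamma = gamma

    def _center(self, k: np.ndarray) -> np.ndarray:
        # Center a kernel block (rows: any frames, columns: training frames)
        # in feature space, using statistics of the training kernel.
        return (k - k.mean(axis=1, keepdims=True)
                - self.k_col_mean[None, :] + self.k_mean)

    def _design(self, k: np.ndarray) -> np.ndarray:
        # Kernel PCA component scores plus an intercept column.
        scores = self._center(k) @ self.alphas
        return np.hstack([scores, np.ones((k.shape[0], 1))])

    def fit(self, x: np.ndarray, y: np.ndarray) -> "KernelPCR":
        self.x_train = x
        k = rbf_kernel(x, x, self.gamma)
        self.k_col_mean = k.mean(axis=0)
        self.k_mean = k.mean()
        kc = self._center(k)
        # eigh returns eigenvalues in ascending order; keep the largest ones.
        eigvals, eigvecs = np.linalg.eigh(kc)
        order = np.argsort(eigvals)[::-1][: self.n_components]
        lam, vecs = eigvals[order], eigvecs[:, order]
        keep = lam > 1e-10  # drop numerically null directions
        # Scale eigenvectors so projections equal kernel PCA component scores.
        self.alphas = vecs[:, keep] / np.sqrt(lam[keep])
        self.coef, *_ = np.linalg.lstsq(self._design(k), y, rcond=None)
        return self

    def predict(self, x_new: np.ndarray) -> np.ndarray:
        k_new = rbf_kernel(x_new, self.x_train, self.gamma)
        return self._design(k_new) @ self.coef
```

Training error never increases as `n_components` grows, because the smaller set of component scores is a subset of the larger one; choosing `n_components` on held-out frames is what lets the truncation act as a guard against overfitting.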
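
The two posterior-probability fixes that the abstract applies to GMM-based conversion (smoothing the posteriors against discontinuity, and keeping only the two most probable mixtures per frame against oversmoothing) can also be sketched with NumPy. Assumptions: a `(frames, mixtures)` posterior matrix from an already trained GMM, and hypothetical per-mixture converted frames passed to `convert`. A fixed-window median filter stands in for the paper's adaptive median filter, whose adaptation rule this record does not describe.

```python
import numpy as np


def median_smooth(posteriors: np.ndarray, window: int = 5) -> np.ndarray:
    """Median-filter each mixture's posterior trajectory over time.

    Fixed odd-length window with edge padding: a stand-in for the paper's
    adaptive median filter, whose adaptation rule is not given here.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(posteriors, ((half, half), (0, 0)), mode="edge")
    # Shape (frames, mixtures, window): one temporal window per frame and mixture.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window, axis=0)
    smoothed = np.median(windows, axis=-1)
    # Renormalize each frame; fall back to the raw posteriors if all medians are 0.
    totals = smoothed.sum(axis=1, keepdims=True)
    safe = np.where(totals > 0, totals, 1.0)
    return np.where(totals > 0, smoothed / safe, posteriors)


def keep_top_two(posteriors: np.ndarray) -> np.ndarray:
    """Zero all but the two most probable mixtures in each frame, then renormalize."""
    top2 = np.argsort(posteriors, axis=1)[:, -2:]
    mask = np.zeros(posteriors.shape, dtype=bool)
    np.put_along_axis(mask, top2, True, axis=1)
    kept = np.where(mask, posteriors, 0.0)
    return kept / kept.sum(axis=1, keepdims=True)


def convert(weights: np.ndarray, mixture_outputs: np.ndarray) -> np.ndarray:
    """Posterior-weighted sum of per-mixture converted frames.

    weights: (frames, mixtures); mixture_outputs: (frames, mixtures, dim),
    e.g. each mixture's regression output for that frame (hypothetical input).
    """
    return np.einsum("tm,tmd->td", weights, mixture_outputs)
```

The record does not say in which order the two steps are applied; smoothing first and then calling `keep_top_two` on the smoothed posteriors is an assumption of this sketch.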

Similar Items