Parameter optimisation for a physical model of the vocal system

Abstract: This study explores optimisation techniques for refining articulatory parameters in the Pink Trombone, a simplified physical speech synthesiser, to accurately emulate male and female vocal tract characteristics in non-speech sounds. We employ black-box and grey-box approaches, leveraging a...

Bibliographic Details
Main Authors:	Mateo Cámara, José Luis Blanco, Joshua D. Reiss
Format:	Article
Language:	English
Published:	SpringerOpen 2025-07-01
Series:	EURASIP Journal on Audio, Speech, and Music Processing
Subjects:	Analysis-by-synthesis; Acoustic-to-articulatory inversion; Articulatory copy synthesis; Pink Trombone; Procedural audio
Online Access:	https://doi.org/10.1186/s13636-025-00414-5

collection	DOAJ
description	Abstract: This study explores optimisation techniques for refining articulatory parameters in the Pink Trombone, a simplified physical speech synthesiser, to accurately emulate male and female vocal tract characteristics in non-speech sounds. We employ black-box and grey-box approaches, leveraging a genetic optimiser and Mel-spectrogram representations to infer articulatory configurations from human recordings via direct spectral comparison. Optimisation is performed over time windows to ensure temporal coherence, which requires modifications to state-of-the-art objective metrics. We integrate grey-box strategies, incorporating pYIN for fundamental frequency estimation and a ResNet-based neural network as a neural codebook to enhance the optimisation process. Our findings confirm the synthesiser’s ability to replicate human vocalisations, achieving superior performance over existing techniques in subjective evaluations. We refined the perceptual metric ViSQOL, providing a calibrated framework for future auditory assessments in physical speech synthesis. These contributions establish a methodology for articulatory parameter estimation, improving synthesis quality and expanding the applications of vocalisation modelling and analysis.
format	Article
id	doaj-art-c6c9924c3bbf4301b25a96cd1f2310fc
institution	DOAJ
issn	1687-4722
language	English
publishDate	2025-07-01
publisher	SpringerOpen
record_format	Article
series	EURASIP Journal on Audio, Speech, and Music Processing
affiliation	Mateo Cámara, José Luis Blanco: Grupo de Aplicaciones del Procesado de Señal, Universidad Politécnica de Madrid; Joshua D. Reiss: Centre for Digital Music, Queen Mary University of London

Similar Items