Parameter optimisation for a physical model of the vocal system

Abstract This study explores optimisation techniques for refining articulatory parameters in the Pink Trombone, a simplified physical speech synthesiser, to accurately emulate male and female vocal tract characteristics in non-speech sounds. We employ black-box and grey-box approaches, leveraging a...

Bibliographic Details
Main Authors: Mateo Cámara, José Luis Blanco, Joshua D. Reiss
Format: Article
Language: English
Published: SpringerOpen 2025-07-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Subjects:
Online Access: https://doi.org/10.1186/s13636-025-00414-5
author Mateo Cámara
José Luis Blanco
Joshua D. Reiss
collection DOAJ
description Abstract This study explores optimisation techniques for refining articulatory parameters in the Pink Trombone, a simplified physical speech synthesiser, to accurately emulate male and female vocal tract characteristics in non-speech sounds. We employ black-box and grey-box approaches, leveraging a genetic optimiser and Mel-spectrogram representations to infer articulatory configurations from human recordings via direct spectral comparison. Optimisation is performed over time windows to ensure temporal coherence, introducing modifications to SOTA objective metrics. We integrate grey-box strategies, incorporating pYIN for fundamental frequency estimation and a ResNet-based neural network as a neural codebook to enhance the optimisation process. Our findings confirm the synthesiser’s ability to replicate human vocalisations, achieving superior performance over existing techniques in subjective evaluations. We refined the perceptual metric ViSQOL, providing a calibrated framework for future auditory assessments in physical speech synthesis. These contributions establish a methodology for articulatory parameter estimation, improving synthesis quality and expanding vocalisation modelling and analysis applications.
format Article
id doaj-art-c6c9924c3bbf4301b25a96cd1f2310fc
institution DOAJ
issn 1687-4722
language English
publishDate 2025-07-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Audio, Speech, and Music Processing
spelling doaj-art-c6c9924c3bbf4301b25a96cd1f2310fc
2025-08-20T03:05:16Z
eng
SpringerOpen
EURASIP Journal on Audio, Speech, and Music Processing
1687-4722
2025-07-01
20251115
10.1186/s13636-025-00414-5
Parameter optimisation for a physical model of the vocal system
Mateo Cámara (Grupo de Aplicaciones del Procesado de Señal, Universidad Politécnica de Madrid)
José Luis Blanco (Grupo de Aplicaciones del Procesado de Señal, Universidad Politécnica de Madrid)
Joshua D. Reiss (Centre for Digital Music, Queen Mary University of London)
https://doi.org/10.1186/s13636-025-00414-5
Analysis-by-synthesis
Acoustic-to-articulatory inversion
Articulatory copy synthesis
Pink Trombone
Procedural audio
title Parameter optimisation for a physical model of the vocal system
topic Analysis-by-synthesis
Acoustic-to-articulatory inversion
Articulatory copy synthesis
Pink Trombone
Procedural audio
url https://doi.org/10.1186/s13636-025-00414-5