Enhancing Expressiveness in Vocal Conversational Agents through Large Language Model-Generated Speech Synthesis Markup Language

Bibliographic Details
Main Author:	Joseph Salisbury
Format:	Article
Language:	English
Published:	LibraryPress@UF 2025-05-01
Series:	Proceedings of the International Florida Artificial Intelligence Research Society Conference
Subjects:	Expressive Speech Synthesis; Large Language Models; Human-Robot Interaction; Behavior Generation; Generative AI
Online Access:	https://journals.flvc.org/FLAIRS/article/view/138814

Description
Summary:	Advancements in speech synthesis have enabled more natural and engaging conversational agents, including neural text-to-speech models that can adjust speech inflections to produce distinct vocal styles. For example, Azure Neural Voices can adjust speech using Speech Synthesis Markup Language (SSML) style tags, such as “affectionate,” “cheerful,” and “hopeful.” However, determining when to apply these tags in real-time interactions can be challenging and time-consuming. In this paper, we present a prompt-based approach that enables large language models (LLMs) to dynamically stylize their responses with appropriate SSML tags, enhancing synthesized speech expressiveness across 34 different styles. Using targeted probes designed to elicit specific speech styles, we demonstrate that LLM-generated responses are syntactically well-formed and correctly apply style tags to enhance expressiveness. This simple, customizable approach facilitates the rapid development of expressive vocal conversational agents.
ISSN:	2334-0754, 2334-0762
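
The summary states that LLM-generated responses were checked for syntactic well-formedness and correct use of style tags. The sketch below shows one way such a check could look; it is not the paper's evaluation code. It assumes the responses use the Azure `mstts:express-as` element with a `style` attribute. The function name `check_styled_ssml`, the sample voice name, and the allowed-style set (limited to the three styles the summary names, not the full 34) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in Azure SSML documents (assumed here, not taken from the paper).
SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"

# Only the three styles named in the summary; the full set of 34 is not listed there.
ALLOWED_STYLES = {"affectionate", "cheerful", "hopeful"}


def check_styled_ssml(ssml, allowed_styles=ALLOWED_STYLES):
    """Return the styles used in `ssml`.

    Raises ValueError if the text is not well-formed XML, if the root is not
    <speak>, or if an express-as element uses a style outside `allowed_styles`.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        raise ValueError(f"SSML is not well-formed XML: {exc}") from exc

    if root.tag != f"{{{SSML_NS}}}speak":
        raise ValueError("root element must be <speak>")

    styles = []
    for node in root.iter(f"{{{MSTTS_NS}}}express-as"):
        style = node.get("style")
        if style not in allowed_styles:
            raise ValueError(f"unknown or missing style: {style!r}")
        styles.append(style)
    return styles


# Hypothetical LLM response, written by hand for illustration.
example = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
    '<voice name="en-US-JennyNeural">'
    '<mstts:express-as style="cheerful">Great to see you again!</mstts:express-as>'
    "</voice></speak>"
)

print(check_styled_ssml(example))
```

A check of this kind could run on each LLM response before it is passed to the speech synthesizer, so that a malformed or unsupported response can be regenerated or replaced with unstyled text.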
