Recurring spoken term discovery in the zero-resource constraint using diagonal patterns

Bibliographic Details
Main Authors: Sudhakar Pandiarajan, Sreenivasa Rao K, Pabitra Mitra
Format: Article
Language: English
Published: Cambridge University Press 2025-01-01
Series: Data-Centric Engineering
Online Access: https://www.cambridge.org/core/product/identifier/S2632673624000480/type/journal_article
Description
Summary: Spoken term discovery (STD) is challenging when a large volume of spoken content is generated without annotations. Unsupervised approaches address this challenge by computing pattern matches directly from the acoustic feature representation of the speech signal. However, this produces many false alarms because of inherent speech variability, which degrades performance in the STD task. To overcome these challenges and improve performance, we propose a two-stage approach. First, we identify an acoustic feature representation that emphasizes spoken content regardless of speech variability. Second, we employ the proposed diagonal pattern search to capture spoken term matches in an unsupervised way, without any transcriptions. Validation on the Microsoft Speech Corpus for Low-Resource Languages shows an 18% gain in hit ratio and a 37% reduction in false alarm ratio compared with state-of-the-art methods.
ISSN: 2632-6736
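
Illustrative sketch: The summary names a diagonal pattern search but does not describe its details. The Python below is only a generic illustration of the idea that a repeated spoken term tends to show up as a diagonal run of high values in a frame-by-frame similarity matrix between two utterances. It is not the authors' algorithm. The cosine similarity measure, the `threshold` and `min_len` parameters, and the function names `cosine` and `diagonal_matches` are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diagonal_matches(x, y, threshold=0.9, min_len=3):
    """Find diagonal runs of similar frames between two feature sequences.

    x, y: lists of feature vectors (one vector per frame).
    Returns a list of (start_in_x, start_in_y, length) tuples, one per
    diagonal run whose similarity stays at or above `threshold` for at
    least `min_len` consecutive frames. Both parameters are assumed
    values for illustration, not taken from the article.
    """
    n, m = len(x), len(y)
    sim = [[cosine(x[i], y[j]) for j in range(m)] for i in range(n)]
    matches = []
    # Each diagonal is identified by its offset k = j - i.
    for k in range(-(n - 1), m):
        i, j = (0, k) if k >= 0 else (-k, 0)
        run = 0
        while i < n and j < m:
            if sim[i][j] >= threshold:
                run += 1
            else:
                # The run ended on the previous cell of this diagonal.
                if run >= min_len:
                    matches.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # Flush a run that reaches the end of the diagonal.
        if run >= min_len:
            matches.append((i - run, j - run, run))
    return matches


# Toy frames standing in for acoustic feature vectors.
a, b, c = [1, 0, 0], [0, 1, 0], [0, 0, 1]
query = [a, b, c]
utterance = [b, a, b, c, a]
print(diagonal_matches(query, utterance))
```

In this toy case the three query frames reappear at frames 1 to 3 of the second sequence, so a single run is reported on the diagonal with offset 1. A strictly diagonal run assumes both occurrences are spoken at the same rate; real speech varies in tempo, which is one reason plain threshold matching of this kind is prone to the false alarms the summary mentions.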