Efficient Chinese-Malay Speech-Text Translation via Layer-Freezing Adaptation of Multimodal Foundation Models

Bibliographic Details
Main Authors: Xiao Liang, Yen-Min Jasmina Khaw, Soung-Yue Liew, Tien-Ping Tan, Donghong Qin
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10994436/
Description
Summary: This paper addresses the challenge of Chinese-Malay speech-to-text translation (S2TT), a crucial yet under-resourced language pair in computational linguistics. We introduce Layer-Freezing Adaptive Fine-Tuning (LFAFT), a parameter-efficient strategy that selectively freezes and unfreezes Transformer layers to optimize model adaptation. LFAFT achieves an 11.8% relative improvement in BLEU-4 scores while reducing trainable parameters by 45% compared to full fine-tuning. Using our newly constructed Chinese-Malay parallel corpus, our approach improves BLEU scores from 1.86 to 9.30 (+7.44 points) compared to existing Chinese-Malay speech translation systems. This work not only establishes the first large-scale Chinese-Malay S2TT dataset but also presents an efficient adaptation method that makes low-resource speech translation more accessible and computationally feasible.
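The core idea in the abstract — selectively freezing a subset of Transformer layers so only the remainder is updated during fine-tuning — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the names (`Layer`, `freeze_layers`) and the choice of which layers stay trainable are assumptions for demonstration only.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """Stand-in for one Transformer layer; `trainable` plays the role of
    a framework's requires_grad flag on the layer's parameters."""
    index: int
    trainable: bool = True

def freeze_layers(layers, trainable_indices):
    """Freeze every layer whose index is NOT in trainable_indices,
    leaving only the selected subset to receive gradient updates."""
    for layer in layers:
        layer.trainable = layer.index in trainable_indices
    return layers

# Example: a 6-layer stack where only the top two layers are adapted,
# reducing the trainable-parameter count (analogous to LFAFT's goal).
model = [Layer(i) for i in range(6)]
freeze_layers(model, trainable_indices={4, 5})
frozen_count = sum(not layer.trainable for layer in model)
print(frozen_count)  # prints 4
```

In a real deep-learning framework the same effect is achieved by setting the frozen layers' parameters to not require gradients before constructing the optimizer; which layers to unfreeze is the adaptive choice the paper's method is concerned with.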
ISSN:2169-3536