A speech recognition method with enhanced transformer decoder

Abstract: To address the Transformer decoder's difficulty in capturing local features for monotonic alignment in speech recognition, and to incorporate language model cold fusion training into the decoder, an enhanced decoder-based speech recognition model is investigated. The...

Bibliographic Details
Main Authors:	Hengbo Hu, Tong Niu, Zhenhua He
Format:	Article
Language:	English
Published:	SpringerOpen 2025-02-01
Series:	EURASIP Journal on Audio, Speech, and Music Processing
Subjects:	Cross-attention; Transformer decoder; Language model cold fusion
Online Access:	https://doi.org/10.1186/s13636-025-00394-6

Similar Items

Comparison of Linear and Nonlinear Methods for Decoding Selective Attention to Speech From Ear-EEG Recordings
by: Mike D. Thornton, et al.
Published: (2025-01-01)

Enhanced Conformer-Based Speech Recognition via Model Fusion and Adaptive Decoding with Dynamic Rescoring
by: Junhao Geng, et al.
Published: (2024-12-01)

Integrating Pose Features and Cross-Relationship Learning for Human–Object Interaction Detection
by: Lang Wu, et al.
Published: (2025-03-01)

SDNet: Sandwich Decoder Network for Waterbody Segmentation in Remote Sensing Imagery
by: Hao Ni, et al.
Published: (2025-01-01)

Enhancing Emotion Recognition in Speech Based on Self-Supervised Learning: Cross-Attention Fusion of Acoustic and Semantic Features
by: Bashar M. Deeb, et al.
Published: (2025-01-01)

Lower Limb Motion Recognition Based on Surface Electromyography Decoding Using S-Transform Energy Concentration
by: Baoyu Li, et al.
Published: (2025-04-01)

CGFTNet: Content-Guided Frequency Domain Transform Network for Face Super-Resolution
by: Yeerlan Yekeben, et al.
Published: (2024-12-01)

Efficient guided inpainting of larger hole missing images based on hierarchical decoding network
by: Xiucheng Dong, et al.
Published: (2025-01-01)

Alternate encoder and dual decoder CNN-Transformer networks for medical image segmentation
by: Lin Zhang, et al.
Published: (2025-03-01)

AMEEGNet: attention-based multiscale EEGNet for effective motor imagery EEG decoding
by: Xuejian Wu, et al.
Published: (2025-01-01)

Deep Learning-Based Short Text Summarization: An Integrated BERT and Transformer Encoder–Decoder Approach
by: Fahd A. Ghanem, et al.
Published: (2025-04-01)

Beyond Granularity: Enhancing Continuous Sign Language Recognition with Granularity-Aware Feature Fusion and Attention Optimization
by: Yao Du, et al.
Published: (2024-10-01)

Vision Transformer and Language Model Based Radiology Report Generation
by: Mashood Mohammad Mohsan, et al.
Published: (2023-01-01)

SADNet: sustained attention decoding in a driving task by self-attention convolutional neural network
by: Shuzhong Lai, et al.
Published: (2024-12-01)

Auto-embedding transformer under multi-source information fusion for few-shot fault diagnosis
by: Bo Wang, et al.
Published: (2025-07-01)

Adaptive channel decoding method for polar codes
by: YE Maolin, et al.
Published: (2022-09-01)

LANet for medical image segmentation
by: Di Zhao, et al.
Published: (2025-04-01)

ResDecode: Accelerating Large Language Models Inference via Residual Decoding Heads
by: Ziqian Zeng, et al.
Published: (2025-06-01)

Multi-Class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum
by: Yuanming Zhang, et al.
Published: (2025-01-01)

EEG-powered cerebral transformer for athletic performance
by: Qikai Sun
Published: (2024-12-01)

Enhanced Panoramic Radiograph-Based Tooth Segmentation and Identification Using an Attention Gate-Based Encoder–Decoder Network
by: Salih Taha Alperen Özçelik, et al.
Published: (2024-12-01)

Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer
by: Mohanad Sameer, et al.
Published: (2023-03-01)

Research on Transformer Noise Suppression Based on Redundant Convolutional Encoder Decoder
by: Quanjin XIN, et al.
Published: (2023-04-01)

Multiscale fusion enhanced spiking neural network for invasive BCI neural signal decoding
by: Yu Song, et al.
Published: (2025-02-01)

SSCANL decoder based joint iterative detection and decoding algorithm
by: Chongyang LIU, et al.
Published: (2022-10-01)

Short-window EEG-based auditory attention decoding for neuroadaptive hearing support for smart healthcare
by: Ihtiram Raza Khan, et al.
Published: (2025-09-01)

Adaptive Transformer-Based Deep Learning Framework for Continuous Sign Language Recognition and Translation
by: Yahia Said, et al.
Published: (2025-03-01)

Hybrid Multi-Attention Network for Audio–Visual Emotion Recognition Through Multimodal Feature Fusion
by: Sathishkumar Moorthy, et al.
Published: (2025-03-01)

End-to-end feature fusion for jointly optimized speech enhancement and automatic speech recognition
by: Mohamed Medani, et al.
Published: (2025-07-01)

A comparative study of deep End-to-End Automatic Speech Recognition models for doctor-patient conversations in Polish in a real-life acoustic environment
by: Karolina Pondel-Sycz, et al.
Published: (2025-07-01)

Study on coded distributed fast Hadamard transform based non-binary LDPC code decoding algorithm
by: Rui LIU, et al.
Published: (2023-10-01)

Improving Windowed Decoding of SC LDPC Codes by Effective Decoding Termination, Message Reuse, and Amplification
by: Inayat Ali, et al.
Published: (2018-01-01)

Optical flow estimation based on global cross information and dynamic encoder–dynamic decoder
by: Haoxin Guo, et al.
Published: (2025-01-01)

Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval
by: Qiang Zou, et al.
Published: (2024-10-01)

AFT-SAM: Adaptive Fusion Transformer with a Sparse Attention Mechanism for Audio–Visual Speech Recognition
by: Na Che, et al.
Published: (2024-12-01)

Decoding covert visual attention to motion direction using graph theory features of EEG signals and quadratic discriminant analysis
by: Zeinab Rezaei, et al.
Published: (2024-12-01)

DRST-Net: A Dual-Branch Feature Fusion Network Combining ResNet50 and Swin Transformer for Welding Light Strip Recognition
by: Yuan Lu, et al.
Published: (2025-02-01)

Improved segmented CRC assisted puncturing Polar decoding
by: Yanhong NI, et al.
Published: (2019-03-01)