A Lightweight Tri-Stream Feature Fusion Network for Speech Emotion Recognition
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11075664/ |
| Summary: | Understanding and modeling emotions from speech is a fundamental challenge in speech processing and a key enabler of emotionally intelligent human-computer interaction. However, defining and extracting robust emotional features remains difficult due to the nuanced and context-dependent nature of human affect. Existing approaches, which focus on either prosodic features or deep representations from pre-trained models, often struggle to capture the full spectrum of emotional cues present in real-world speech. To address these limitations, we introduce Tri-Stream, a novel speech emotion recognition (SER) framework that concurrently leverages spectrogram and waveform modalities. Tri-Stream integrates three complementary feature streams: spectral patterns extracted via a Swin Transformer, deep acoustic representations from HuBERT, and engineered prosodic features capturing rhythmic information. These streams are fused and processed by a GRU-based classifier for final emotion prediction. Extensive evaluations on four benchmark datasets (IEMOCAP, SAVEE, RAVDESS, EMO-DB) demonstrate that Tri-Stream consistently outperforms state-of-the-art baselines, achieving 79.86% unweighted accuracy on IEMOCAP and the best performance on the remaining datasets, highlighting its effectiveness and robustness across diverse emotional speech corpora. |
| ISSN: | 2169-3536 |
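
The summary reports its headline result as unweighted accuracy. This record does not define the metric, so the sketch below assumes the usual SER convention: unweighted accuracy is the mean of per-class recalls, so every emotion class counts equally regardless of how many utterances it has. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition; each class counts equally)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = defaultdict(int)    # utterances per true class
    correct = defaultdict(int)  # correctly predicted utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true, y_pred):
    """Plain accuracy: fraction of all utterances predicted correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, deliberately imbalanced toy labels (not data from the paper).
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "angry"]

ua = unweighted_accuracy(y_true, y_pred)
wa = weighted_accuracy(y_true, y_pred)
print(f"UA = {ua:.2f}, WA = {wa:.2f}")
```

On imbalanced data the two metrics can differ noticeably, because a missed utterance from a rare class lowers unweighted accuracy more than it lowers plain accuracy. This is why SER results on corpora with uneven class sizes are often reported with the unweighted variant.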