Shuffling Augmented Decoupled Features for Multimodal Emotion Recognition
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11014057/ |
| Summary: | Multimodal emotion recognition (MER) aims to identify human emotions using data from multiple modalities. Despite promising advances in previous MER methods, their performance remains limited by the small size of available datasets, a result of the challenges in collecting multimodal data. While data augmentation can address this issue, generating augmented multimodal data without altering the underlying emotional meaning remains particularly challenging. To tackle this problem, we introduce a decoupled feature augmentation method that automatically learns multimodal feature variations in a decoupled feature space for MER. Specifically, we decompose multimodal features into modality-invariant and modality-specific components and then augment each component within the decoupled feature space across multiple modalities. Unlike existing unimodal augmentation approaches, our method preserves cross-modal semantic consistency by jointly augmenting the decoupled components. To enhance model generalization and stability, we propose a learning strategy that gradually incorporates more diverse information by using a combined set of original and augmented decoupled features. Comprehensive experiments on two MER benchmarks demonstrate that our method matches or outperforms several baseline methods. |
| ISSN: | 2169-3536 |
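
The summary describes a pipeline that is easier to grasp with a concrete sketch: per-modality features are split into a modality-invariant and a modality-specific component, and augmentation happens by recombining those components across samples in the decoupled space, using the same permutation for every modality so cross-modal consistency is preserved. The minimal PyTorch sketch below illustrates one plausible reading of that idea; all names in it (`DecoupledAugmenter`, the projection heads, the shared-permutation shuffle) are assumptions for illustration, not the paper's actual architecture, which is only available through the linked record.

```python
# Hypothetical sketch of decoupled feature augmentation, assuming:
#  - each modality's features are already encoded to a common dimension,
#  - a shared head extracts the modality-invariant (emotion-carrying) part,
#  - per-modality heads extract the modality-specific parts,
#  - augmentation shuffles the modality-specific parts across the batch
#    with ONE shared permutation, so every modality of a sample borrows
#    "style" from the same other sample (cross-modal consistency).
import torch
import torch.nn as nn


class DecoupledAugmenter(nn.Module):
    """Decompose per-modality features and shuffle the decoupled parts."""

    def __init__(self, feat_dim: int, num_modalities: int):
        super().__init__()
        # Shared projection for the modality-invariant component.
        self.invariant_proj = nn.Linear(feat_dim, feat_dim)
        # One private projection per modality for the specific component.
        self.specific_projs = nn.ModuleList(
            [nn.Linear(feat_dim, feat_dim) for _ in range(num_modalities)]
        )

    def forward(self, feats):
        # feats[m]: (batch, feat_dim) features for modality m.
        batch = feats[0].size(0)
        invariant = [self.invariant_proj(f) for f in feats]
        specific = [p(f) for p, f in zip(self.specific_projs, feats)]

        # Single permutation shared by all modalities: the invariant
        # (emotion) part stays with its sample, while the specific part
        # is swapped in jointly from one other sample.
        perm = torch.randperm(batch, device=feats[0].device)
        return [inv + spec[perm] for inv, spec in zip(invariant, specific)]


if __name__ == "__main__":
    torch.manual_seed(0)
    aug = DecoupledAugmenter(feat_dim=64, num_modalities=3)
    # Fake features for a batch of 8 samples (e.g. text / audio / video).
    feats = [torch.randn(8, 64) for _ in range(3)]
    out = aug(feats)
    print([tuple(o.shape) for o in out])  # three (8, 64) tensors
```

The abstract's gradual-incorporation strategy could then be approximated by ramping the fraction of augmented features mixed into each training batch over time, though the record does not specify the actual schedule or loss terms used.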