Knowledge enhancement for speech emotion recognition via multi-level acoustic feature
Speech emotion recognition (SER) has become an increasingly attractive machine learning task for domain applications. It aims to improve the discriminative capacity for speech emotion by utilising a single type of feature (e.g. MFCC, spectrograms, Wav2vec2) or a combination of multiple feature types. However,...
| Main Authors: | Huan Zhao; Nianxin Huang; Haijiao Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2024-12-01 |
| Series: | Connection Science |
| Subjects: | Cross-fusion; multi-level feature; multi-task learning; speech emotion recognition |
| Online Access: | https://www.tandfonline.com/doi/10.1080/09540091.2024.2312103 |
| _version_ | 1850214446002077696 |
|---|---|
| author | Huan Zhao; Nianxin Huang; Haijiao Chen |
| author_sort | Huan Zhao |
| collection | DOAJ |
| description | Speech emotion recognition (SER) has become an increasingly attractive machine learning task for domain applications. It aims to improve the discriminative capacity for speech emotion by utilising a single type of feature (e.g. MFCC, spectrograms, Wav2vec2) or a combination of multiple feature types. However, the potential of acoustic-related deep features is frequently overlooked in existing approaches that rely solely on a single type of feature or employ a basic combination of multiple feature types. To address this challenge, a multi-level acoustic feature cross-fusion approach is proposed, aiming to compensate for missing information between various features. It helps to enhance SER performance by integrating different types of knowledge through the cross-fusion mechanism. Moreover, multi-task learning is utilised to share useful information through gender recognition, which can also yield multiple common representations in a fine-grained space. Experimental results show that the fusion approach can capture the inner connections between multi-level acoustic features to refine the knowledge. State-of-the-art (SOTA) results were obtained under the same experimental conditions. |
| format | Article |
| id | doaj-art-856a9be7d7d74df7942ade9556c59a5b |
| institution | OA Journals |
| issn | 0954-0091 1360-0494 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Taylor & Francis Group |
| record_format | Article |
| series | Connection Science |
| spelling | Connection Science, vol. 36, no. 1, 2024-12-01, Taylor & Francis Group, ISSN 0954-0091 / 1360-0494, doi:10.1080/09540091.2024.2312103. Authors Huan Zhao, Nianxin Huang and Haijiao Chen are all with the College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, People's Republic of China. |
| title | Knowledge enhancement for speech emotion recognition via multi-level acoustic feature |
| topic | Cross-fusion; multi-level feature; multi-task learning; speech emotion recognition |
| url | https://www.tandfonline.com/doi/10.1080/09540091.2024.2312103 |
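The abstract names two ideas: cross-fusing features from different acoustic levels (e.g. hand-crafted MFCCs and deep Wav2vec2-style embeddings) so each stream compensates for the other, and a multi-task objective that adds gender recognition alongside emotion classification. The NumPy toy below sketches one plausible reading of that pipeline; it is not the authors' architecture, and every dimension, function name, and the single-head cross-attention form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query, context):
    """Scaled dot-product cross-attention: each frame of one feature
    stream (query) gathers information from the other stream (context)."""
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)   # (Tq, Tc) similarity grid
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ context                  # (Tq, d) fused frames

def fuse_and_classify(mfcc, deep, params):
    """Cross-fuse two acoustic feature levels, then apply two task
    heads (emotion + gender) to the shared pooled representation."""
    h1 = mfcc @ params["proj_mfcc"]           # project both streams
    h2 = deep @ params["proj_deep"]           # into a common space
    f1 = cross_attend(h1, h2)                 # MFCC enriched by deep
    f2 = cross_attend(h2, h1)                 # deep enriched by MFCC
    pooled = np.concatenate([f1.mean(0), f2.mean(0)])  # utterance vector
    emo_logits = pooled @ params["w_emotion"]
    gen_logits = pooled @ params["w_gender"]
    return emo_logits, gen_logits

# Hypothetical sizes: 50 frames, 39-dim MFCCs, 768-dim deep features.
T, d_mfcc, d_deep, d_common = 50, 39, 768, 64
params = {
    "proj_mfcc": rng.normal(size=(d_mfcc, d_common)) * 0.1,
    "proj_deep": rng.normal(size=(d_deep, d_common)) * 0.1,
    "w_emotion": rng.normal(size=(2 * d_common, 4)) * 0.1,  # 4 emotions
    "w_gender":  rng.normal(size=(2 * d_common, 2)) * 0.1,  # 2 genders
}
mfcc = rng.normal(size=(T, d_mfcc))
deep = rng.normal(size=(T, d_deep))
emo_logits, gen_logits = fuse_and_classify(mfcc, deep, params)

# Multi-task objective: emotion loss plus a down-weighted gender loss,
# so gender recognition shapes the shared representation.
def cross_entropy(logits, label):
    return -np.log(softmax(logits)[label])

loss = cross_entropy(emo_logits, 0) + 0.3 * cross_entropy(gen_logits, 1)
```

In this reading, "compensating for missing information between various features" is the pair of `cross_attend` calls (each stream attends over the other), and "sharing useful information through gender recognition" is the weighted-sum loss over the two heads; the 0.3 task weight is an arbitrary placeholder.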