Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
In this paper, we propose a method for designing a speech emotional state classification model based on the feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), applied to a Korean speech database. For this purpose, the proposed approach is com...
Main Authors: | A-Hyeon Jo, Keun-Chang Kwak |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Feature-map fusion; Korean speech data; speech emotion recognition; temporal convolutional network; pretrained convolutional neural networks |
Online Access: | https://ieeexplore.ieee.org/document/10854478/ |
_version_ | 1832575584829440000 |
author | A-Hyeon Jo Keun-Chang Kwak |
author_facet | A-Hyeon Jo Keun-Chang Kwak |
author_sort | A-Hyeon Jo |
collection | DOAJ |
description | In this paper, we propose a method for designing a speech emotional state classification model based on the feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), applied to a Korean speech database. The proposed approach comprises four main stages. In the first stage, we extract Mel-frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC) in the frequency domain, as well as the log-Mel spectrogram in the time-frequency domain. In the second stage, these features are used to train the TCN and the Yet Another Audio MobileNet network (YAMNet), respectively. In the third stage, we perform feature-map fusion using canonical correlation analysis (CCA), stationary wavelet transform (SWT), and fuzzy c-means-based principal component averaging (FCMPCA), respectively. Through these steps, the speech emotion recognition model is designed as a fusion of TCN and YAMNet combined with the feature-map fusion methods. Finally, we compare performance on five databases: the AI-Hub speech emotion dataset built in Korea, the Korean speech emotional state classification dataset built at Chosun University, and the Emo-DB, RAVDESS, and TESS datasets. The experimental results show that the proposed model performs well in comparison to previous works on most datasets. |
format | Article |
id | doaj-art-5ea0d76aece148b188b7ec161bf63edb |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-5ea0d76aece148b188b7ec161bf63edb; indexed 2025-01-31T23:04:42Z; eng; IEEE; IEEE Access, ISSN 2169-3536; published 2025-01-01; vol. 13, pp. 19947–19963; DOI 10.1109/ACCESS.2025.3534176; IEEE document 10854478. Title: Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data. Authors: A-Hyeon Jo (https://orcid.org/0000-0003-1909-5655), Department of Electronic Engineering, Interdisciplinary Program in IT-Bio Convergence System, Chosun University, Gwangju, South Korea; Keun-Chang Kwak (https://orcid.org/0000-0002-3821-0711), School of Electronic Engineering, Chosun University, Gwangju, South Korea. Abstract: as given in the description field above. URL: https://ieeexplore.ieee.org/document/10854478/. Keywords: Feature-map fusion; Korean speech data; speech emotion recognition; temporal convolutional network; pretrained convolutional neural networks |
spellingShingle | A-Hyeon Jo Keun-Chang Kwak Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data IEEE Access Feature-map fusion Korean speech data speech emotion recognition temporal convolutional network pretrained convolutional neural networks |
title | Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data |
title_full | Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data |
title_fullStr | Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data |
title_full_unstemmed | Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data |
title_short | Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data |
title_sort | classification of speech emotion state based on feature map fusion of tcn and pretrained cnn model from korean speech emotion data |
topic | Feature-map fusion Korean speech data speech emotion recognition temporal convolutional network pretrained convolutional neural networks |
url | https://ieeexplore.ieee.org/document/10854478/ |
work_keys_str_mv | AT ahyeonjo classificationofspeechemotionstatebasedonfeaturemapfusionoftcnandpretrainedcnnmodelfromkoreanspeechemotiondata AT keunchangkwak classificationofspeechemotionstatebasedonfeaturemapfusionoftcnandpretrainedcnnmodelfromkoreanspeechemotiondata |