Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data

In this paper, we propose a method for designing a classification model of speech emotional state based on the feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), using Korean speech databases. For this purpose, the proposed approach is com...

Bibliographic Details
Main Authors: A-Hyeon Jo, Keun-Chang Kwak
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10854478/
_version_ 1832575584829440000
author A-Hyeon Jo
Keun-Chang Kwak
author_facet A-Hyeon Jo
Keun-Chang Kwak
author_sort A-Hyeon Jo
collection DOAJ
description In this paper, we propose a method for designing a classification model of speech emotional state based on the feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), using Korean speech databases. For this purpose, the proposed approach is composed of four main stages. In the first stage, we extract Mel-frequency cepstral coefficient (MFCC) and gammatone frequency cepstral coefficient (GFCC) features in the frequency domain, as well as the log-Mel spectrogram in the time-frequency domain. In the second stage, these features are used to train the TCN and the pretrained Yet Another Audio MobileNet (YAMNet), respectively. In the third stage, we perform feature-map fusion using canonical correlation analysis (CCA), the stationary wavelet transform (SWT), and fuzzy c-means-based principal component averaging (FCMPCA), respectively. Through these steps, a speech emotion recognition model is designed from the fusion of TCN and YAMNet together with the feature-map fusion methods. Finally, we compare performance on five databases: the AI-Hub speech emotion dataset built in Korea, the Korean speech emotional state classification dataset built at Chosun University, and the Emo-DB, RAVDESS, and TESS datasets. The experimental results showed that the proposed model performed well in comparison with previous works on most datasets.
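The third-stage CCA fusion described in the abstract can be illustrated with a small numpy sketch. The two feature matrices below are random stand-ins for TCN and YAMNet embeddings; their shapes, the small ridge term, and the final fusion-by-concatenation of the canonical projections are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def cca_fuse(X, Y, k, eps=1e-6):
    """Project X and Y onto their top-k canonical directions and concatenate.

    Standard CCA via whitening + SVD; eps is a small ridge term for
    numerical stability (an assumption, not from the paper).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is symmetric PD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, _, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    A = Wx @ U[:, :k]    # canonical directions for X
    B = Wy @ Vt[:k].T    # canonical directions for Y
    # Fuse by concatenating the correlated projections of both views.
    return np.concatenate([Xc @ A, Yc @ B], axis=1)

rng = np.random.default_rng(0)
tcn_features = rng.standard_normal((200, 64))     # stand-in for TCN embeddings
yamnet_features = rng.standard_normal((200, 128)) # stand-in for YAMNet embeddings
fused = cca_fuse(tcn_features, yamnet_features, k=16)
print(fused.shape)  # (200, 32)
```

The fused representation would then feed the final emotion classifier; SWT- or FCMPCA-based fusion would replace `cca_fuse` with the corresponding transform.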
format Article
id doaj-art-5ea0d76aece148b188b7ec161bf63edb
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-5ea0d76aece148b188b7ec161bf63edb2025-01-31T23:04:42ZengIEEEIEEE Access2169-35362025-01-0113199471996310.1109/ACCESS.2025.353417610854478Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion DataA-Hyeon Jo0https://orcid.org/0000-0003-1909-5655Keun-Chang Kwak1https://orcid.org/0000-0002-3821-0711Department of Electronic Engineering, Interdisciplinary Program in IT-Bio Convergence System, Chosun University, Gwangju, South KoreaSchool of Electronic Engineering, Chosun University, Gwangju, South KoreaIn this paper, we propose a method for designing a classification model of speech emotional state based on the feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), using Korean speech databases. For this purpose, the proposed approach is composed of four main stages. In the first stage, we extract Mel-frequency cepstral coefficient (MFCC) and gammatone frequency cepstral coefficient (GFCC) features in the frequency domain, as well as the log-Mel spectrogram in the time-frequency domain. In the second stage, these features are used to train the TCN and the pretrained Yet Another Audio MobileNet (YAMNet), respectively. In the third stage, we perform feature-map fusion using canonical correlation analysis (CCA), the stationary wavelet transform (SWT), and fuzzy c-means-based principal component averaging (FCMPCA), respectively. Through these steps, a speech emotion recognition model is designed from the fusion of TCN and YAMNet together with the feature-map fusion methods. Finally, we compare performance on five databases: the AI-Hub speech emotion dataset built in Korea, the Korean speech emotional state classification dataset built at Chosun University, and the Emo-DB, RAVDESS, and TESS datasets.
The experimental results showed that the proposed model performed well in comparison with previous works on most datasets.https://ieeexplore.ieee.org/document/10854478/Feature-map fusionKorean speech dataspeech emotion recognitiontemporal convolutional networkpretrained convolutional neural networks
spellingShingle A-Hyeon Jo
Keun-Chang Kwak
Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
IEEE Access
Feature-map fusion
Korean speech data
speech emotion recognition
temporal convolutional network
pretrained convolutional neural networks
title Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
title_full Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
title_fullStr Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
title_full_unstemmed Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
title_short Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
title_sort classification of speech emotion state based on feature map fusion of tcn and pretrained cnn model from korean speech emotion data
topic Feature-map fusion
Korean speech data
speech emotion recognition
temporal convolutional network
pretrained convolutional neural networks
url https://ieeexplore.ieee.org/document/10854478/
work_keys_str_mv AT ahyeonjo classificationofspeechemotionstatebasedonfeaturemapfusionoftcnandpretrainedcnnmodelfromkoreanspeechemotiondata
AT keunchangkwak classificationofspeechemotionstatebasedonfeaturemapfusionoftcnandpretrainedcnnmodelfromkoreanspeechemotiondata