Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data

In this paper, we propose a method for designing a classification model of speech emotional state based on the feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), using Korean speech databases. For this purpose, the proposed approach is com...

Bibliographic Details
Main Authors: A-Hyeon Jo, Keun-Chang Kwak
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10854478/
_version_ 1832575584829440000
author A-Hyeon Jo
Keun-Chang Kwak
author_facet A-Hyeon Jo
Keun-Chang Kwak
author_sort A-Hyeon Jo
collection DOAJ
description In this paper, we propose a method for designing a classification model of speech emotional state based on the feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), using Korean speech databases. For this purpose, the proposed approach is composed of four main stages. In the first stage, we extract Mel-frequency cepstral coefficient (MFCC) and gammatone frequency cepstral coefficient (GFCC) features in the frequency domain, as well as the log-Mel spectrogram in the time-frequency domain. In the second stage, these features are used to train the TCN and the pretrained Yet Another Audio MobileNet (YAMNet), respectively. In the third stage, we perform feature-map fusion using canonical correlation analysis (CCA), the stationary wavelet transform (SWT), and fuzzy c-means-based principal component averaging (FCMPCA), respectively. Through these steps, a speech emotion recognition model is designed from the fusion of TCN and YAMNet together with the feature-map fusion methods. Finally, we compare performance on five databases: the AI-Hub speech emotion dataset built in Korea, the Korean speech emotional state classification dataset built at Chosun University, and the Emo-DB, RAVDESS, and TESS datasets. The experimental results showed that the proposed model performed well in comparison with previous works on most datasets.
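The third-stage CCA fusion described in the abstract can be illustrated with a small numpy sketch. The two feature matrices below are random stand-ins for TCN and YAMNet embeddings; their shapes, the small ridge term, and the final fusion-by-concatenation of the canonical projections are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def cca_fuse(X, Y, k, eps=1e-6):
    """Project X and Y onto their top-k canonical directions and concatenate.

    Standard CCA via whitening + SVD; eps is a small ridge term for
    numerical stability (an assumption, not from the paper).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is symmetric PD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, _, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    A = Wx @ U[:, :k]    # canonical directions for X
    B = Wy @ Vt[:k].T    # canonical directions for Y
    # Fuse by concatenating the correlated projections of both views.
    return np.concatenate([Xc @ A, Yc @ B], axis=1)

rng = np.random.default_rng(0)
tcn_features = rng.standard_normal((200, 64))     # stand-in for TCN embeddings
yamnet_features = rng.standard_normal((200, 128)) # stand-in for YAMNet embeddings
fused = cca_fuse(tcn_features, yamnet_features, k=16)
print(fused.shape)  # (200, 32)
```

The fused representation would then feed the final emotion classifier; SWT- or FCMPCA-based fusion would replace `cca_fuse` with the corresponding transform.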
format Article
id doaj-art-5ea0d76aece148b188b7ec161bf63edb
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-5ea0d76aece148b188b7ec161bf63edb2025-01-31T23:04:42ZengIEEEIEEE Access2169-35362025-01-0113199471996310.1109/ACCESS.2025.353417610854478Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion DataA-Hyeon Jo0https://orcid.org/0000-0003-1909-5655Keun-Chang Kwak1https://orcid.org/0000-0002-3821-0711Department of Electronic Engineering, Interdisciplinary Program in IT-Bio Convergence System, Chosun University, Gwangju, South KoreaSchool of Electronic Engineering, Chosun University, Gwangju, South KoreaIn this paper, we propose a method for designing a classification model of speech emotional state based on the feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), using Korean speech databases. For this purpose, the proposed approach is composed of four main stages. In the first stage, we extract Mel-frequency cepstral coefficient (MFCC) and gammatone frequency cepstral coefficient (GFCC) features in the frequency domain, as well as the log-Mel spectrogram in the time-frequency domain. In the second stage, these features are used to train the TCN and the pretrained Yet Another Audio MobileNet (YAMNet), respectively. In the third stage, we perform feature-map fusion using canonical correlation analysis (CCA), the stationary wavelet transform (SWT), and fuzzy c-means-based principal component averaging (FCMPCA), respectively. Through these steps, a speech emotion recognition model is designed from the fusion of TCN and YAMNet together with the feature-map fusion methods. Finally, we compare performance on five databases: the AI-Hub speech emotion dataset built in Korea, the Korean speech emotional state classification dataset built at Chosun University, and the Emo-DB, RAVDESS, and TESS datasets.
The experimental results showed that the proposed model performed well in comparison with previous works on most datasets.https://ieeexplore.ieee.org/document/10854478/Feature-map fusionKorean speech dataspeech emotion recognitiontemporal convolutional networkpretrained convolutional neural networks
spellingShingle A-Hyeon Jo
Keun-Chang Kwak
Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
IEEE Access
Feature-map fusion
Korean speech data
speech emotion recognition
temporal convolutional network
pretrained convolutional neural networks
title Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
title_full Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
title_fullStr Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
title_full_unstemmed Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
title_short Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
title_sort classification of speech emotion state based on feature map fusion of tcn and pretrained cnn model from korean speech emotion data
topic Feature-map fusion
Korean speech data
speech emotion recognition
temporal convolutional network
pretrained convolutional neural networks
url https://ieeexplore.ieee.org/document/10854478/
work_keys_str_mv AT ahyeonjo classificationofspeechemotionstatebasedonfeaturemapfusionoftcnandpretrainedcnnmodelfromkoreanspeechemotiondata
AT keunchangkwak classificationofspeechemotionstatebasedonfeaturemapfusionoftcnandpretrainedcnnmodelfromkoreanspeechemotiondata