A Methodical Framework Utilizing Transforms and Biomimetic Intelligence-Based Optimization with Machine Learning for Speech Emotion Recognition

Speech emotion recognition (SER) tasks are conducted to extract emotional features from speech signals. The characteristic parameters are analyzed, and the speech emotional states are judged. At present, SER is an important aspect of artificial psychology and artificial intelligence, as it is widely...

Bibliographic Details
Main Authors:	Sunil Kumar Prabhakar, Dong-Ok Won
Format:	Article
Language:	English
Published:	MDPI AG 2024-08-01
Series:	Biomimetics
Subjects:	SER; transforms; feature selection; classification; ELM
Online Access:	https://www.mdpi.com/2313-7673/9/9/513

_version_	1850258652595748864
author	Sunil Kumar Prabhakar, Dong-Ok Won
collection	DOAJ
description	Speech emotion recognition (SER) tasks are conducted to extract emotional features from speech signals. The characteristic parameters are analyzed, and the speech emotional states are judged. At present, SER is an important aspect of artificial psychology and artificial intelligence, as it is widely implemented in many applications in the human–computer interface, medical, and entertainment fields. In this work, six transforms, namely, the synchrosqueezing transform, fractional Stockwell transform (FST), K-sine transform-dependent integrated system (KSTDIS), flexible analytic wavelet transform (FAWT), chirplet transform, and superlet transform, are initially applied to speech emotion signals. Once the transforms are applied and the features are extracted, the essential features are selected using three techniques: the Overlapping Information Feature Selection (OIFS) technique followed by two biomimetic intelligence-based optimization techniques, namely, Harris Hawks Optimization (HHO) and the Chameleon Swarm Algorithm (CSA). The selected features are then classified with the help of ten basic machine learning classifiers, with special emphasis given to the extreme learning machine (ELM) and twin extreme learning machine (TELM) classifiers. An experiment is conducted on four publicly available datasets, namely, EMOVO, RAVDESS, SAVEE, and Berlin Emo-DB. The best results are obtained as follows: the Chirplet + CSA + TELM combination obtains a classification accuracy of 80.63% on the EMOVO dataset, the FAWT + HHO + TELM combination obtains a classification accuracy of 85.76% on the RAVDESS dataset, the Chirplet + OIFS + TELM combination obtains a classification accuracy of 83.94% on the SAVEE dataset, and, finally, the KSTDIS + CSA + TELM combination obtains a classification accuracy of 89.77% on the Berlin Emo-DB dataset.
format	Article
id	doaj-art-9e2b1494268a4a6891eaa4145c98e943
institution	OA Journals
issn	2313-7673
language	English
publishDate	2024-08-01
publisher	MDPI AG
record_format	Article
series	Biomimetics
spelling	doaj-art-9e2b1494268a4a6891eaa4145c98e943 | 2025-08-20T01:56:05Z | eng | MDPI AG | Biomimetics | 2313-7673 | 2024-08-01 | 9 | 9 | 513 | 10.3390/biomimetics9090513 | A Methodical Framework Utilizing Transforms and Biomimetic Intelligence-Based Optimization with Machine Learning for Speech Emotion Recognition | Sunil Kumar Prabhakar (0), Dong-Ok Won (1) | Department of Artificial Intelligence Convergence, Chuncheon 24252, Republic of Korea (both authors) | https://www.mdpi.com/2313-7673/9/9/513 | SER; transforms; feature selection; classification; ELM
title	A Methodical Framework Utilizing Transforms and Biomimetic Intelligence-Based Optimization with Machine Learning for Speech Emotion Recognition
topic	SER; transforms; feature selection; classification; ELM
url	https://www.mdpi.com/2313-7673/9/9/513
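
Illustrative note: the description above names the extreme learning machine (ELM) as one of the emphasized classifiers. As a rough sketch of the general idea only, a basic ELM draws its hidden-layer weights at random, leaves them fixed, and solves for the output weights by regularized least squares. The Python example below is not the authors' implementation and does not cover the twin ELM (TELM) variant; the class name `BasicELM`, its parameters, and the synthetic stand-in features are all assumptions made for illustration.

```python
import numpy as np


class BasicELM:
    """Hypothetical minimal ELM: random fixed hidden layer, ridge-solved output layer."""

    def __init__(self, n_hidden=64, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden      # number of random hidden units (assumed value)
        self.ridge = ridge            # small ridge term for numerical stability
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Nonlinear random projection; these weights are never trained.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # One-hot targets, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Output weights from the regularized normal equations.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


def make_toy_features(n_per_class=100, n_features=8, n_classes=3, seed=1):
    """Synthetic, well-separated clusters standing in for selected speech features."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=4.0, size=(n_classes, n_features))
    X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


X, y = make_toy_features()
model = BasicELM().fit(X, y)
# Accuracy on the training data itself, reported only as a sanity check.
train_accuracy = float(np.mean(model.predict(X) == y))
```

Because only the output weights are solved for, fitting reduces to a single linear solve. The accuracy figures in the description come from the authors' own transforms, feature selection, and datasets, none of which this toy example reproduces.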

Similar Items