A Methodical Framework Utilizing Transforms and Biomimetic Intelligence-Based Optimization with Machine Learning for Speech Emotion Recognition

Bibliographic Details
Main Authors: Sunil Kumar Prabhakar, Dong-Ok Won
Format: Article
Language: English
Published: MDPI AG 2024-08-01
Series: Biomimetics
Subjects: SER; transforms; feature selection; classification; ELM
Online Access: https://www.mdpi.com/2313-7673/9/9/513
collection DOAJ
description Speech emotion recognition (SER) tasks are conducted to extract emotional features from speech signals. The characteristic parameters are analyzed, and the speech emotional states are judged. At present, SER is an important aspect of artificial psychology and artificial intelligence, as it is widely implemented in many applications in the human–computer interface, medical, and entertainment fields. In this work, six transforms, namely, the synchrosqueezing transform, fractional Stockwell transform (FST), K-sine transform-dependent integrated system (KSTDIS), flexible analytic wavelet transform (FAWT), chirplet transform, and superlet transform, are initially applied to speech emotion signals. Once the transforms are applied and the features are extracted, the essential features are selected using three techniques: the Overlapping Information Feature Selection (OIFS) technique followed by two biomimetic intelligence-based optimization techniques, namely, Harris Hawks Optimization (HHO) and the Chameleon Swarm Algorithm (CSA). The selected features are then classified with the help of ten basic machine learning classifiers, with special emphasis given to the extreme learning machine (ELM) and twin extreme learning machine (TELM) classifiers. An experiment is conducted on four publicly available datasets, namely, EMOVO, RAVDESS, SAVEE, and Berlin Emo-DB. The best results are obtained as follows: the Chirplet + CSA + TELM combination obtains a classification accuracy of 80.63% on the EMOVO dataset, the FAWT + HHO + TELM combination obtains a classification accuracy of 85.76% on the RAVDESS dataset, the Chirplet + OIFS + TELM combination obtains a classification accuracy of 83.94% on the SAVEE dataset, and, finally, the KSTDIS + CSA + TELM combination obtains a classification accuracy of 89.77% on the Berlin Emo-DB dataset.
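The final classification stage named in the description can be illustrated with a minimal extreme learning machine (ELM) sketch. This is not the authors' implementation: it assumes the standard single-hidden-layer ELM formulation (random, fixed hidden weights and output weights fitted by ridge-regularized least squares), it does not cover the twin ELM (TELM) variant, and the class name `ELMClassifier`, its parameters, and the synthetic feature vectors are hypothetical stand-ins for the transform-derived, selected features.

```python
import numpy as np


class ELMClassifier:
    """Illustrative single-hidden-layer ELM (hypothetical helper, not from the paper)."""

    def __init__(self, n_hidden=64, reg=1e-3, seed=0):
        self.n_hidden = n_hidden          # number of random hidden units
        self.reg = reg                    # ridge regularization strength
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer activations; W and b are random and never trained.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y).ravel()
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        targets = np.eye(len(self.classes_))[y_idx]      # one-hot labels
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: beta = (H^T H + reg * I)^(-1) H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ targets)
        return self

    def predict(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        return self.classes_[np.argmax(H @ self.beta, axis=1)]


# Synthetic stand-in for selected features: two well-separated clusters.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(3.0, 1.0, (50, 4)),
                    rng.normal(-3.0, 1.0, (50, 4))])
y_demo = np.array(["happy"] * 50 + ["sad"] * 50)
demo_model = ELMClassifier(n_hidden=32).fit(X_demo, y_demo)
demo_accuracy = (demo_model.predict(X_demo) == y_demo).mean()
```

Only the output weights are learned, so training reduces to one linear solve; in a pipeline like the one described, the input rows would be the features retained by OIFS, HHO, or CSA rather than random clusters.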
id doaj-art-9e2b1494268a4a6891eaa4145c98e943
institution OA Journals
issn 2313-7673
spelling doaj-art-9e2b1494268a4a6891eaa4145c98e943
2025-08-20T01:56:05Z
Biomimetics, volume 9, issue 9, article 513
doi 10.3390/biomimetics9090513
Sunil Kumar Prabhakar, Dong-Ok Won: Department of Artificial Intelligence Convergence, Chuncheon 24252, Republic of Korea
topic SER
transforms
feature selection
classification
ELM