Audio Feature Space Analysis for Emotion Recognition from Spoken Sentences

An analysis of the low-level feature space for emotion recognition from speech is presented. The main goal was to determine how the statistical properties computed from the contours of low-level features influence emotion recognition from speech signals. We conducted several experiments to reduce and tune our initial feature set and to configure the classification stage. In the analysis of the audio feature space, we employed univariate feature selection using the chi-squared test. Then, in the first stage of classification, a default set of parameters was used for every classifier. For the classifier that obtained the best results with the default settings, hyperparameter tuning using cross-validation was performed. As a result, we compared the classification results for two different languages to examine how emotional states are expressed in spoken sentences. The results show that, from an initial feature set containing 3198 attributes, we obtained a dimensionality reduction of about 80% using the feature selection algorithm. The most dominant attributes selected at this stage were based on the mel and bark frequency scale filterbanks, with their variability described mainly by the variance, median absolute deviation, and standard and average deviations. Finally, the classification accuracy of the tuned SVM classifier was 72.5% and 88.27% for emotional spoken sentences in Polish and German, respectively.
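The pipeline the abstract describes (chi-squared univariate feature selection followed by an SVM tuned with cross-validated grid search) can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the feature matrix and emotion labels below are synthetic placeholders, and the parameter grid and fold count are assumptions; only the feature count (3198) and the roughly 80% reduction mirror the abstract.

```python
# Sketch of chi-squared feature selection + CV-tuned SVM, per the abstract.
# Synthetic stand-in data: the paper's acoustic features are not reproduced here.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((120, 3198))        # chi2 requires non-negative features
y = rng.integers(0, 4, size=120)   # placeholder: four emotion classes

pipe = Pipeline([
    # keep ~20% of the attributes, mirroring the ~80% reduction reported
    ("select", SelectKBest(chi2, k=640)),
    ("svm", SVC()),
])

# Hypothetical parameter grid; the paper does not list the tuned ranges.
grid = GridSearchCV(
    pipe,
    param_grid={"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
print(grid.best_estimator_["select"].transform(X).shape)  # reduced feature space
```

On real data, `y` would hold the emotion labels of the Polish and German corpora, and the selection step would be fit per language before comparing classification accuracy.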

Bibliographic Details
Main Authors: Lukasz SMIETANKA, Tomasz MAKA
Format: Article
Language: English
Published: Institute of Fundamental Technological Research Polish Academy of Sciences, 2021-06-01
Series: Archives of Acoustics
Subjects: speech analysis; classification; emotional speech
Online Access: https://acoustics.ippt.pan.pl/index.php/aa/article/view/2833
ISSN: 0137-5075, 2300-262X
DOI: 10.24425/aoa.2021.136581 (Archives of Acoustics, vol. 46, no. 2, 2021-06-01)
Author affiliation: West Pomeranian University of Technology (both authors)