Audio Feature Space Analysis for Emotion Recognition from Spoken Sentences
An analysis of the low-level feature space for emotion recognition from speech is presented. The main goal was to determine how statistical properties computed from the contours of low-level features influence emotion recognition from speech signals. We conducted several experiments to reduce and tune our initial feature set and to configure the classification stage. In the analysis of the audio feature space, we employed univariate feature selection using the chi-squared test. Then, in the first stage of classification, a default set of parameters was selected for every classifier. For the classifier that obtained the best results with the default settings, hyperparameter tuning using cross-validation was performed. As a result, we compared the classification results for two different languages to find the differences between emotional states expressed in spoken sentences. The results show that, from an initial feature set containing 3198 attributes, we obtained a dimensionality reduction of about 80% using the feature selection algorithm. The most dominant attributes selected at this stage were based on the mel and Bark frequency scale filterbanks, with their variability described mainly by the variance, median absolute deviation, and standard and average deviations. Finally, the classification accuracy using the tuned SVM classifier was 72.5% and 88.27% for emotional spoken sentences in Polish and German, respectively.
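The abstract reports that the dominant attributes were statistics (variance, median absolute deviation, standard and average deviations) computed over contours of mel- and Bark-scale filterbank energies. Below is a minimal Python sketch of that kind of feature, assuming a librosa-based mel filterbank with 26 bands and a hypothetical input file; this is an illustration, not the paper's exact extractor.

```python
# Sketch: per-band statistics over mel filterbank contours.
# "utterance.wav", sr=16000, and n_mels=26 are illustrative assumptions.
import numpy as np
import librosa

signal, sr = librosa.load("utterance.wav", sr=16000)              # hypothetical file
mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=26)  # shape: (26, frames)

med = np.median(mel, axis=1, keepdims=True)
stats = np.concatenate([
    mel.var(axis=1),                                   # variance per band
    np.median(np.abs(mel - med), axis=1),              # median absolute deviation
    mel.std(axis=1),                                   # standard deviation
    np.mean(np.abs(mel - mel.mean(axis=1, keepdims=True)), axis=1),  # average deviation
])
print(stats.shape)  # (104,) = 26 bands x 4 statistics
```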
| Main Authors: | Lukasz SMIETANKA, Tomasz MAKA |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Institute of Fundamental Technological Research Polish Academy of Sciences, 2021-06-01 |
| Series: | Archives of Acoustics |
| Subjects: | speech analysis; classification; emotional speech |
| Online Access: | https://acoustics.ippt.pan.pl/index.php/aa/article/view/2833 |
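The univariate selection step from the abstract, a chi-squared test ranking each attribute, can be sketched with scikit-learn as below. The data are random placeholders, and keeping 640 of the 3198 attributes is an assumption chosen to match the reported reduction of about 80%.

```python
# Sketch: univariate feature selection with the chi-squared test.
# X, y, and k=640 are placeholders, not the paper's data or exact settings.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3198))     # stand-in for the 3198-attribute feature set
y = rng.integers(0, 7, size=200)     # stand-in emotion labels

# chi2 requires non-negative inputs, so rescale each feature to [0, 1] first.
X_scaled = MinMaxScaler().fit_transform(X)

selector = SelectKBest(chi2, k=640)  # roughly 20% of the attributes retained
X_reduced = selector.fit_transform(X_scaled, y)
print(X_reduced.shape)               # (200, 640)
```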
| _version_ | 1849247855638740992 |
|---|---|
| author | Lukasz SMIETANKA; Tomasz MAKA |
| author_facet | Lukasz SMIETANKA; Tomasz MAKA |
| author_sort | Lukasz SMIETANKA |
| collection | DOAJ |
| description | An analysis of the low-level feature space for emotion recognition from speech is presented. The main goal was to determine how statistical properties computed from the contours of low-level features influence emotion recognition from speech signals. We conducted several experiments to reduce and tune our initial feature set and to configure the classification stage. In the analysis of the audio feature space, we employed univariate feature selection using the chi-squared test. Then, in the first stage of classification, a default set of parameters was selected for every classifier. For the classifier that obtained the best results with the default settings, hyperparameter tuning using cross-validation was performed. As a result, we compared the classification results for two different languages to find the differences between emotional states expressed in spoken sentences. The results show that, from an initial feature set containing 3198 attributes, we obtained a dimensionality reduction of about 80% using the feature selection algorithm. The most dominant attributes selected at this stage were based on the mel and Bark frequency scale filterbanks, with their variability described mainly by the variance, median absolute deviation, and standard and average deviations. Finally, the classification accuracy using the tuned SVM classifier was 72.5% and 88.27% for emotional spoken sentences in Polish and German, respectively. |
| format | Article |
| id | doaj-art-10fd9505735541fb8bab34e19cdfa13a |
| institution | Kabale University |
| issn | 0137-5075; 2300-262X |
| language | English |
| publishDate | 2021-06-01 |
| publisher | Institute of Fundamental Technological Research Polish Academy of Sciences |
| record_format | Article |
| series | Archives of Acoustics |
| spelling | doaj-art-10fd9505735541fb8bab34e19cdfa13a (indexed 2025-08-20T03:58:07Z); eng; Institute of Fundamental Technological Research Polish Academy of Sciences; Archives of Acoustics, ISSN 0137-5075 / 2300-262X; 2021-06-01, Vol. 46, No. 2; DOI: 10.24425/aoa.2021.136581; Audio Feature Space Analysis for Emotion Recognition from Spoken Sentences; Lukasz SMIETANKA, Tomasz MAKA (West Pomeranian University of Technology); abstract as in the description field above; https://acoustics.ippt.pan.pl/index.php/aa/article/view/2833; speech analysis; classification; emotional speech |
| spellingShingle | Lukasz SMIETANKA; Tomasz MAKA; Audio Feature Space Analysis for Emotion Recognition from Spoken Sentences; Archives of Acoustics; speech analysis; classification; emotional speech |
| title | Audio Feature Space Analysis for Emotion Recognition from Spoken Sentences |
| title_full | Audio Feature Space Analysis for Emotion Recognition from Spoken Sentences |
| title_fullStr | Audio Feature Space Analysis for Emotion Recognition from Spoken Sentences |
| title_full_unstemmed | Audio Feature Space Analysis for Emotion Recognition from Spoken Sentences |
| title_short | Audio Feature Space Analysis for Emotion Recognition from Spoken Sentences |
| title_sort | audio feature space analysis for emotion recognition from spoken sentences |
| topic | speech analysis; classification; emotional speech |
| url | https://acoustics.ippt.pan.pl/index.php/aa/article/view/2833 |
| work_keys_str_mv | AT lukaszsmietanka audiofeaturespaceanalysisforemotionrecognitionfromspokensentences AT tomaszmaka audiofeaturespaceanalysisforemotionrecognitionfromspokensentences |
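The final stage in the description, cross-validated hyperparameter tuning of the classifier that was best with default settings (an SVM), might look like the following sketch. The pipeline, parameter grid, and train/test split are assumptions; the record does not list the values searched.

```python
# Sketch: cross-validated grid search over SVM hyperparameters.
# The grid values and 5-fold CV are assumptions, not the paper's exact setup.
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def tune_svm(X, y):
    # Hold out a stratified test set, then tune on the training portion.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    pipe = make_pipeline(StandardScaler(), SVC())
    grid = {
        "svc__C": [0.1, 1, 10, 100],
        "svc__gamma": ["scale", 0.01, 0.001],
        "svc__kernel": ["rbf", "linear"],
    }
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_tr, y_tr)
    return search.best_params_, search.score(X_te, y_te)
```

With the reduced feature matrix from the selection sketch earlier, `tune_svm(X_reduced, y)` would return the chosen hyperparameters and a held-out accuracy.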