Speech Emotion Recognition Using a Multi-Time-Scale Approach to Feature Aggregation and an Ensemble of SVM Classifiers

Due to its relevant real-life applications, the recognition of emotions from speech signals constitutes a popular research topic. In the traditional methods applied for speech emotion recognition, audio features are typically aggregated using a fixed-duration time window, potentially discarding info...

Full description

Saved in:
Bibliographic Details
Main Authors: Antonina STEFANOWSKA, Sławomir Krzysztof ZIELIŃSKI
Format: Article
Language:English
Published: Institute of Fundamental Technological Research Polish Academy of Sciences 2024-03-01
Series:Archives of Acoustics
Subjects:
Online Access:https://acoustics.ippt.pan.pl/index.php/aa/article/view/3790
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Due to its relevant real-life applications, the recognition of emotions from speech signals constitutes a popular research topic. In the traditional methods applied for speech emotion recognition, audio features are typically aggregated using a fixed-duration time window, potentially discarding information conveyed by speech at various signal durations. By contrast, in the proposed method, audio features are aggregated simultaneously using time windows of different lengths (a multi-time-scale approach), hence, potentially better utilizing information carried at phonemic, syllabic, and prosodic levels compared to the traditional approach. A genetic algorithm is employed to optimize the feature extraction procedure. The features aggregated at different time windows are subsequently classified by an ensemble of support vector machine (SVM) classifiers. To enhance the generalization property of the method, a data augmentation technique based on pitch shifting and time stretching is applied. According to the obtained results, the developed method outperforms the traditional one for the selected datasets, demonstrating the benefits of using a multi-time-scale approach to feature aggregation.
ISSN:0137-5075
2300-262X