Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition
Speech-emotion recognition (SER) enables computers to engage with people in an emotionally intelligent way. The inability to adapt an existing model to a new domain is one of the significant limitations of SER methods. To overcome this challenge, domain adaptation techniques have been developed to t...
| Main Authors: | Thejan Rajapakshe, Rajib Rana, Sara Khalifa, Bjorn W. Schuller |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Reinforcement learning; speech emotion recognition; domain adaptation |
| Online Access: | https://ieeexplore.ieee.org/document/10806705/ |
| _version_ | 1850103949862895616 |
|---|---|
| author | Thejan Rajapakshe; Rajib Rana; Sara Khalifa; Bjorn W. Schuller |
| author_facet | Thejan Rajapakshe; Rajib Rana; Sara Khalifa; Bjorn W. Schuller |
| author_sort | Thejan Rajapakshe |
| collection | DOAJ |
| description | Speech-emotion recognition (SER) enables computers to engage with people in an emotionally intelligent way. The inability to adapt an existing model to a new domain is one of the significant limitations of SER methods. To overcome this challenge, domain adaptation techniques have been developed to transfer the knowledge learnt by a model across domains. Although existing domain adaptation techniques have improved the performance of SER models across domains, there is a need to improve their ability to adapt to real-world situations where models can self-tune while deployed. This paper presents a deep reinforcement learning-based strategy (RL-DA) for adapting a pre-trained SER model to a real-world setting by interacting with the environment and collecting continuous feedback. The proposed RL-DA technique is evaluated on SER tasks, including cross-corpus and cross-language domain adaptation scenarios. Our evaluation results show that RL-DA achieves significant improvements of 11% and 14% in testing accuracy over a fully supervised baseline for cross-corpus and cross-language scenarios, respectively, in the real-world setting. This technique also outperforms the baseline model’s performance for both speaker independent and speaker dependent SER tasks. |
| format | Article |
| id | doaj-art-42c28a74866e471fa00ed48b75d970e6 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-42c28a74866e471fa00ed48b75d970e6; 2025-08-20T02:39:25Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; vol. 12, pp. 193101-193114; DOI 10.1109/ACCESS.2024.3519761; 10806705; Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition; Thejan Rajapakshe (https://orcid.org/0000-0003-3156-3327), School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia; Rajib Rana, School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia; Sara Khalifa (https://orcid.org/0000-0002-3417-2834), School of Information Systems, Queensland University of Technology, Brisbane, QLD, Australia; Bjorn W. Schuller (https://orcid.org/0000-0002-6478-8699), Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Speech-emotion recognition (SER) enables computers to engage with people in an emotionally intelligent way. The inability to adapt an existing model to a new domain is one of the significant limitations of SER methods. To overcome this challenge, domain adaptation techniques have been developed to transfer the knowledge learnt by a model across domains. Although existing domain adaptation techniques have improved the performance of SER models across domains, there is a need to improve their ability to adapt to real-world situations where models can self-tune while deployed. This paper presents a deep reinforcement learning-based strategy (RL-DA) for adapting a pre-trained SER model to a real-world setting by interacting with the environment and collecting continuous feedback. The proposed RL-DA technique is evaluated on SER tasks, including cross-corpus and cross-language domain adaptation scenarios. Our evaluation results show that RL-DA achieves significant improvements of 11% and 14% in testing accuracy over a fully supervised baseline for cross-corpus and cross-language scenarios, respectively, in the real-world setting. This technique also outperforms the baseline model’s performance for both speaker independent and speaker dependent SER tasks.; https://ieeexplore.ieee.org/document/10806705/; Reinforcement learning; speech emotion recognition; domain adaptation |
| spellingShingle | Thejan Rajapakshe; Rajib Rana; Sara Khalifa; Bjorn W. Schuller; Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition; IEEE Access; Reinforcement learning; speech emotion recognition; domain adaptation |
| title | Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition |
| title_full | Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition |
| title_fullStr | Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition |
| title_full_unstemmed | Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition |
| title_short | Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition |
| title_sort | domain adapting deep reinforcement learning for real world speech emotion recognition |
| topic | Reinforcement learning; speech emotion recognition; domain adaptation |
| url | https://ieeexplore.ieee.org/document/10806705/ |
| work_keys_str_mv | AT thejanrajapakshe domainadaptingdeepreinforcementlearningforrealworldspeechemotionrecognition AT rajibrana domainadaptingdeepreinforcementlearningforrealworldspeechemotionrecognition AT sarakhalifa domainadaptingdeepreinforcementlearningforrealworldspeechemotionrecognition AT bjornwschuller domainadaptingdeepreinforcementlearningforrealworldspeechemotionrecognition |