Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition

Speech-emotion recognition (SER) enables computers to engage with people in an emotionally intelligent way. The inability to adapt an existing model to a new domain is one of the significant limitations of SER methods. To overcome this challenge, domain adaptation techniques have been developed to t...

Bibliographic Details
Main Authors: Thejan Rajapakshe, Rajib Rana, Sara Khalifa, Bjorn W. Schuller
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Reinforcement learning; speech emotion recognition; domain adaptation
Online Access: https://ieeexplore.ieee.org/document/10806705/
author Thejan Rajapakshe
Rajib Rana
Sara Khalifa
Bjorn W. Schuller
author_sort Thejan Rajapakshe
collection DOAJ
description Speech-emotion recognition (SER) enables computers to engage with people in an emotionally intelligent way. The inability to adapt an existing model to a new domain is one of the significant limitations of SER methods. To overcome this challenge, domain adaptation techniques have been developed to transfer the knowledge learnt by a model across domains. Although existing domain adaptation techniques have improved the performance of SER models across domains, they still need to adapt better to real-world situations in which models self-tune while deployed. This paper presents a deep reinforcement learning-based strategy (RL-DA) for adapting a pre-trained SER model to a real-world setting by interacting with the environment and collecting continuous feedback. The proposed RL-DA technique is evaluated on SER tasks, including cross-corpus and cross-language domain adaptation scenarios. The evaluation results show that, in the real-world setting, RL-DA improves testing accuracy over a fully supervised baseline by 11% and 14% for the cross-corpus and cross-language scenarios, respectively. The technique also outperforms the baseline model on both speaker-independent and speaker-dependent SER tasks.
format Article
id doaj-art-42c28a74866e471fa00ed48b75d970e6
institution DOAJ
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-42c28a74866e471fa00ed48b75d970e6, 2025-08-20T02:39:25Z
Language: eng
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536), 2024-01-01, vol. 12, pp. 193101-193114
DOI: 10.1109/ACCESS.2024.3519761
IEEE Xplore document: 10806705 (https://ieeexplore.ieee.org/document/10806705/)
Title: Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition
Authors and affiliations:
Thejan Rajapakshe (https://orcid.org/0000-0003-3156-3327), School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia
Rajib Rana, School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia
Sara Khalifa (https://orcid.org/0000-0002-3417-2834), School of Information Systems, Queensland University of Technology, Brisbane, QLD, Australia
Bjorn W. Schuller (https://orcid.org/0000-0002-6478-8699), Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany
Keywords: Reinforcement learning; speech emotion recognition; domain adaptation
title Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition
topic Reinforcement learning
speech emotion recognition
domain adaptation
url https://ieeexplore.ieee.org/document/10806705/