Mixture of Emotion Dependent Experts: Facial Expressions Recognition in Videos Through Stacked Expert Models


Bibliographic Details
Main Authors: Ali N. Salman, Karen Rosero, Lucas Goncalves, Carlos Busso
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Open Journal of Signal Processing
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10843404/
author Ali N. Salman
Karen Rosero
Lucas Goncalves
Carlos Busso
collection DOAJ
description Recent advancements in <italic>dynamic facial expression recognition</italic> (DFER) have predominantly utilized static features, which are theoretically inferior to dynamic features. However, models fully trained with dynamic features often suffer from over-fitting due to the limited size and diversity of the training data for fully <italic>supervised learning</italic> (SL) models. A significant challenge with existing models based on static features in recognizing emotions from videos is their tendency to form biased representations, often unbalanced or skewed towards more prevalent or basic emotional features present in the static domain, especially with posed expressions. Therefore, this approach under-represents the nuances present in the dynamic domain. To address this issue, our study introduces a novel approach that we refer to as <italic>mixture of emotion-dependent experts</italic> (MoEDE). This strategy relies on emotion-specific feature extractors to produce more diverse emotional static features to train DFER systems. Each emotion-dependent expert focuses exclusively on one emotional category, formulating the problem as a set of binary classifiers. Our DFER model combines these static representations with recurrent models, modeling their temporal relationships. The proposed MoEDE DFER approach achieves a macro F1-score of 74.5%, marking a significant improvement over the baseline, which presented a macro F1-score of 70.9%. The DFER baseline is similar to MoEDE, but it uses a single static feature extractor rather than stacked extractors. Additionally, our proposed approach shows consistent improvements compared to four other popular baselines.
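The abstract describes the MoEDE pipeline at a high level: one expert per emotional category, each trained as a binary (emotion vs. not-emotion) feature extractor; the per-frame expert features are stacked, and a recurrent model captures their temporal relationships. The sketch below illustrates only that data flow, not the authors' implementation: the dimensions, the random projections standing in for trained expert networks, and the plain Elman-style recurrence are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EMOTIONS = 4   # hypothetical: one expert per emotional category
FRAME_DIM = 128  # hypothetical per-frame input feature dimension
EXPERT_DIM = 16  # hypothetical per-expert output feature size
HIDDEN_DIM = 32  # hypothetical recurrent state size

# One "expert" per emotion. In the paper each expert is a feature
# extractor trained as a binary classifier for its emotion; here a
# random projection stands in for that trained network.
experts = [rng.standard_normal((FRAME_DIM, EXPERT_DIM)) for _ in range(N_EMOTIONS)]

def stacked_features(frame):
    """Concatenate every expert's emotion-specific features for one frame."""
    return np.concatenate([np.tanh(frame @ W) for W in experts])

# A simple recurrence over the stacked static features models their
# temporal relationships across the video.
W_in = rng.standard_normal((N_EMOTIONS * EXPERT_DIM, HIDDEN_DIM)) * 0.1
W_rec = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.1
W_out = rng.standard_normal((HIDDEN_DIM, N_EMOTIONS)) * 0.1

def classify_video(frames):
    """Run the recurrence over per-frame stacked expert features."""
    h = np.zeros(HIDDEN_DIM)
    for frame in frames:
        h = np.tanh(stacked_features(frame) @ W_in + h @ W_rec)
    return int(np.argmax(h @ W_out))  # predicted emotion index

video = rng.standard_normal((10, FRAME_DIM))  # 10 synthetic frames
pred = classify_video(video)
print(pred)
```

The single-extractor baseline mentioned in the abstract would correspond to replacing the list of experts with one shared projection, so the recurrent model sees one generic feature vector per frame instead of the emotion-stratified stack.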
format Article
id doaj-art-428cfca3a8894db3a5125249b520ea72
institution OA Journals
issn 2644-1322
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Open Journal of Signal Processing
spelling Published 2025-01-01 in IEEE Open Journal of Signal Processing (ISSN 2644-1322), vol. 6, pp. 323-332. DOI: 10.1109/OJSP.2025.3530793. IEEE article number 10843404.
Mixture of Emotion Dependent Experts: Facial Expressions Recognition in Videos Through Stacked Expert Models
Ali N. Salman (https://orcid.org/0000-0003-1395-0382), The University of Texas at Dallas, Richardson, TX, USA
Karen Rosero, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Lucas Goncalves (https://orcid.org/0000-0001-9613-1002), The University of Texas at Dallas, Richardson, TX, USA
Carlos Busso (https://orcid.org/0000-0002-4075-4072), The University of Texas at Dallas, Richardson, TX, USA
Online access: https://ieeexplore.ieee.org/document/10843404/
Keywords: Video facial expression recognition; emotion recognition; affective computing; ensemble model
title Mixture of Emotion Dependent Experts: Facial Expressions Recognition in Videos Through Stacked Expert Models
topic Video facial expression recognition
emotion recognition
affective computing
ensemble model
url https://ieeexplore.ieee.org/document/10843404/