Mixture of Emotion Dependent Experts: Facial Expressions Recognition in Videos Through Stacked Expert Models


Bibliographic Details
Main Authors: Ali N. Salman, Karen Rosero, Lucas Goncalves, Carlos Busso
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Open Journal of Signal Processing
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10843404/
author Ali N. Salman
Karen Rosero
Lucas Goncalves
Carlos Busso
collection DOAJ
description Recent advancements in <italic>dynamic facial expression recognition</italic> (DFER) have predominantly utilized static features, which are theoretically inferior to dynamic features. However, models fully trained with dynamic features often suffer from over-fitting due to the limited size and diversity of the training data for fully <italic>supervised learning</italic> (SL) models. A significant challenge with existing models based on static features in recognizing emotions from videos is their tendency to form biased representations, often unbalanced or skewed towards more prevalent or basic emotional features present in the static domain, especially with posed expressions. Therefore, this approach under-represents the nuances present in the dynamic domain. To address this issue, our study introduces a novel approach that we refer to as <italic>mixture of emotion-dependent experts</italic> (MoEDE). This strategy relies on emotion-specific feature extractors to produce more diverse emotional static features to train DFER systems. Each emotion-dependent expert focuses exclusively on one emotional category, formulating the problem as a set of binary classifiers. Our DFER model combines these static representations with recurrent models, modeling their temporal relationships. The proposed MoEDE DFER approach achieves a macro F1-score of 74.5%, marking a significant improvement over the baseline, which presented a macro F1-score of 70.9%. The DFER baseline is similar to MoEDE, but it uses a single static feature extractor rather than stacked extractors. Additionally, our proposed approach shows consistent improvements compared to four other popular baselines.
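The abstract describes the MoEDE pipeline at a high level: one expert per emotional category, each trained as a binary (emotion vs. not-emotion) feature extractor; the per-frame expert features are stacked, and a recurrent model captures their temporal relationships. The sketch below illustrates only that data flow, not the authors' implementation: the dimensions, the random projections standing in for trained expert networks, and the plain Elman-style recurrence are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EMOTIONS = 4   # hypothetical: one expert per emotional category
FRAME_DIM = 128  # hypothetical per-frame input feature dimension
EXPERT_DIM = 16  # hypothetical per-expert output feature size
HIDDEN_DIM = 32  # hypothetical recurrent state size

# One "expert" per emotion. In the paper each expert is a feature
# extractor trained as a binary classifier for its emotion; here a
# random projection stands in for that trained network.
experts = [rng.standard_normal((FRAME_DIM, EXPERT_DIM)) for _ in range(N_EMOTIONS)]

def stacked_features(frame):
    """Concatenate every expert's emotion-specific features for one frame."""
    return np.concatenate([np.tanh(frame @ W) for W in experts])

# A simple recurrence over the stacked static features models their
# temporal relationships across the video.
W_in = rng.standard_normal((N_EMOTIONS * EXPERT_DIM, HIDDEN_DIM)) * 0.1
W_rec = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.1
W_out = rng.standard_normal((HIDDEN_DIM, N_EMOTIONS)) * 0.1

def classify_video(frames):
    """Run the recurrence over per-frame stacked expert features."""
    h = np.zeros(HIDDEN_DIM)
    for frame in frames:
        h = np.tanh(stacked_features(frame) @ W_in + h @ W_rec)
    return int(np.argmax(h @ W_out))  # predicted emotion index

video = rng.standard_normal((10, FRAME_DIM))  # 10 synthetic frames
pred = classify_video(video)
print(pred)
```

The single-extractor baseline mentioned in the abstract would correspond to replacing the list of experts with one shared projection, so the recurrent model sees one generic feature vector per frame instead of the emotion-stratified stack.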
format Article
id doaj-art-428cfca3a8894db3a5125249b520ea72
institution OA Journals
issn 2644-1322
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Open Journal of Signal Processing
spelling Published 2025-01-01 in IEEE Open Journal of Signal Processing (ISSN 2644-1322), vol. 6, pp. 323-332. DOI: 10.1109/OJSP.2025.3530793. IEEE article number 10843404.
Mixture of Emotion Dependent Experts: Facial Expressions Recognition in Videos Through Stacked Expert Models
Ali N. Salman (https://orcid.org/0000-0003-1395-0382), The University of Texas at Dallas, Richardson, TX, USA
Karen Rosero, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Lucas Goncalves (https://orcid.org/0000-0001-9613-1002), The University of Texas at Dallas, Richardson, TX, USA
Carlos Busso (https://orcid.org/0000-0002-4075-4072), The University of Texas at Dallas, Richardson, TX, USA
Online access: https://ieeexplore.ieee.org/document/10843404/
Keywords: Video facial expression recognition; emotion recognition; affective computing; ensemble model
title Mixture of Emotion Dependent Experts: Facial Expressions Recognition in Videos Through Stacked Expert Models
topic Video facial expression recognition
emotion recognition
affective computing
ensemble model
url https://ieeexplore.ieee.org/document/10843404/