Mixture of Emotion Dependent Experts: Facial Expressions Recognition in Videos Through Stacked Expert Models

Bibliographic Details
Main Authors: Ali N. Salman, Karen Rosero, Lucas Goncalves, Carlos Busso
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Journal of Signal Processing
Online Access: https://ieeexplore.ieee.org/document/10843404/
Description
Summary: Recent advancements in dynamic facial expression recognition (DFER) have predominantly utilized static features, which are theoretically inferior to dynamic features. However, models fully trained with dynamic features often suffer from overfitting due to the limited size and diversity of the training data available for fully supervised learning (SL) models. A significant challenge for existing static-feature models in recognizing emotions from videos is their tendency to form biased representations, often unbalanced or skewed towards the more prevalent or basic emotional features present in the static domain, especially with posed expressions. As a result, this approach under-represents the nuances present in the dynamic domain. To address this issue, our study introduces a novel approach that we refer to as mixture of emotion-dependent experts (MoEDE). This strategy relies on emotion-specific feature extractors to produce more diverse static emotional features for training DFER systems. Each emotion-dependent expert focuses exclusively on one emotional category, formulating the problem as a set of binary classifiers. Our DFER model combines these static representations with recurrent models to capture their temporal relationships. The proposed MoEDE DFER approach achieves a macro F1-score of 74.5%, a significant improvement over the baseline, which presented a macro F1-score of 70.9%. The DFER baseline is similar to MoEDE, but it uses a single static feature extractor rather than stacked extractors. Additionally, our proposed approach shows consistent improvements over four other popular baselines.
ISSN: 2644-1322
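
For illustration, the following is a minimal, hypothetical PyTorch sketch of the MoEDE idea as described in the abstract: one emotion-dependent expert per emotional category, each pretrained as a one-vs-rest binary classifier, whose frame-level static features are stacked and modeled temporally with a recurrent network. All module names, dimensions, and the choice of a GRU are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

NUM_EMOTIONS = 7   # assumption: e.g., a set of basic emotion categories
FEAT_DIM = 128     # assumption: per-expert static feature size
HIDDEN_DIM = 256   # assumption: recurrent hidden size


class EmotionExpert(nn.Module):
    """One emotion-dependent expert: a small CNN pretrained as a
    binary (one-vs-rest) classifier for a single emotion; its
    penultimate features serve as the static representation."""

    def __init__(self, feat_dim=FEAT_DIM):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Binary head used only during per-emotion pretraining.
        self.binary_head = nn.Linear(feat_dim, 1)

    def forward(self, x):
        return self.backbone(x)  # frame-level static features


class MoEDE(nn.Module):
    """Stacks the per-emotion static features of every frame and
    models their temporal relationships with a recurrent network."""

    def __init__(self, num_emotions=NUM_EMOTIONS):
        super().__init__()
        self.experts = nn.ModuleList(
            [EmotionExpert() for _ in range(num_emotions)]
        )
        self.rnn = nn.GRU(num_emotions * FEAT_DIM, HIDDEN_DIM,
                          batch_first=True)
        self.classifier = nn.Linear(HIDDEN_DIM, num_emotions)

    def forward(self, frames):            # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        flat = frames.flatten(0, 1)       # (batch*time, 3, H, W)
        # Concatenate the static features produced by every expert.
        feats = torch.cat([e(flat) for e in self.experts], dim=-1)
        feats = feats.view(b, t, -1)      # (batch, time, experts*FEAT_DIM)
        _, h = self.rnn(feats)            # temporal modeling
        return self.classifier(h[-1])     # video-level emotion logits


# Example usage (assumed input: 8 videos of 16 RGB frames at 112x112):
# model = MoEDE()
# logits = model(torch.randn(8, 16, 3, 112, 112))  # -> (8, NUM_EMOTIONS)
```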