Mixture of Emotion Dependent Experts: Facial Expressions Recognition in Videos Through Stacked Expert Models
Recent advancements in *dynamic facial expression recognition* (DFER) have predominantly utilized static features, which are theoretically inferior to dynamic features. However, models fully trained with dynamic features often suffer from over-fitting due to the limited size and diversity of the training data available for fully *supervised learning* (SL) models. A significant challenge for existing static-feature models in recognizing emotions from videos is their tendency to form biased representations, often unbalanced or skewed toward the more prevalent or basic emotional features present in the static domain, especially with posed expressions. As a result, this approach under-represents the nuances present in the dynamic domain. To address this issue, our study introduces a novel approach that we refer to as *mixture of emotion-dependent experts* (MoEDE). This strategy relies on emotion-specific feature extractors to produce more diverse static emotional features for training DFER systems. Each emotion-dependent expert focuses exclusively on one emotional category, formulating the problem as a set of binary classifiers. Our DFER model combines these static representations with recurrent models to capture their temporal relationships. The proposed MoEDE DFER approach achieves a macro F1-score of 74.5%, a significant improvement over the baseline, which obtained a macro F1-score of 70.9%. The DFER baseline is similar to MoEDE, but it uses a single static feature extractor rather than stacked extractors. Additionally, our proposed approach shows consistent improvements over four other popular baselines.
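The core MoEDE idea from the abstract — one binary expert per emotion category whose per-frame outputs are stacked into an emotion-dependent feature vector, then aggregated over time — can be illustrated with a minimal toy sketch. This is NOT the authors' implementation: the label set, the random linear "experts", and the mean-pooling stand-in for the recurrent temporal model are all assumptions made purely for illustration.

```python
# Toy sketch of the MoEDE idea (illustrative only, not the paper's model).
# Each "expert" is a per-emotion binary scorer; its per-frame outputs are
# stacked into an emotion-dependent feature vector, then pooled over time
# (the paper uses trained extractors and a recurrent model instead).
import numpy as np

EMOTIONS = ["anger", "happiness", "sadness", "neutral"]  # hypothetical label set
rng = np.random.default_rng(0)

# One expert per emotion: here just a random linear binary scorer over
# 16-dimensional frame features; in the paper, a trained emotion-specific
# feature extractor.
experts = {e: rng.standard_normal(16) for e in EMOTIONS}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stacked_features(frames):
    """frames: (T, 16) array -> (T, len(EMOTIONS)) emotion-dependent features."""
    return np.stack([sigmoid(frames @ experts[e]) for e in EMOTIONS], axis=1)

def classify_video(frames):
    """Mean-pool the stacked per-frame features over time (a crude stand-in
    for the recurrent temporal model) and pick the highest-scoring emotion."""
    pooled = stacked_features(frames).mean(axis=0)
    return EMOTIONS[int(np.argmax(pooled))]

video = rng.standard_normal((20, 16))   # one toy video: 20 frames x 16 features
feats = stacked_features(video)
print(feats.shape)                      # one score per frame per emotion expert
print(classify_video(video))
```

The design point the sketch mirrors is that each expert only ever answers a one-vs-rest question for its own emotion, so the stacked vector carries one dedicated dimension per category instead of a single shared embedding.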
| Main Authors: | Ali N. Salman, Karen Rosero, Lucas Goncalves, Carlos Busso |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Open Journal of Signal Processing |
| Subjects: | Video facial expression recognition; emotion recognition; affective computing; ensemble model |
| Online Access: | https://ieeexplore.ieee.org/document/10843404/ |
| _version_ | 1850222539342610432 |
|---|---|
| author | Ali N. Salman; Karen Rosero; Lucas Goncalves; Carlos Busso |
| author_sort | Ali N. Salman |
| collection | DOAJ |
| description | Recent advancements in *dynamic facial expression recognition* (DFER) have predominantly utilized static features, which are theoretically inferior to dynamic features. However, models fully trained with dynamic features often suffer from over-fitting due to the limited size and diversity of the training data available for fully *supervised learning* (SL) models. A significant challenge for existing static-feature models in recognizing emotions from videos is their tendency to form biased representations, often unbalanced or skewed toward the more prevalent or basic emotional features present in the static domain, especially with posed expressions. As a result, this approach under-represents the nuances present in the dynamic domain. To address this issue, our study introduces a novel approach that we refer to as *mixture of emotion-dependent experts* (MoEDE). This strategy relies on emotion-specific feature extractors to produce more diverse static emotional features for training DFER systems. Each emotion-dependent expert focuses exclusively on one emotional category, formulating the problem as a set of binary classifiers. Our DFER model combines these static representations with recurrent models to capture their temporal relationships. The proposed MoEDE DFER approach achieves a macro F1-score of 74.5%, a significant improvement over the baseline, which obtained a macro F1-score of 70.9%. The DFER baseline is similar to MoEDE, but it uses a single static feature extractor rather than stacked extractors. Additionally, our proposed approach shows consistent improvements over four other popular baselines. |
| format | Article |
| id | doaj-art-428cfca3a8894db3a5125249b520ea72 |
| institution | OA Journals |
| issn | 2644-1322 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Open Journal of Signal Processing |
| spelling | doaj-art-428cfca3a8894db3a5125249b520ea72; 2025-08-20T02:06:17Z; eng; IEEE; IEEE Open Journal of Signal Processing; ISSN 2644-1322; 2025-01-01; vol. 6, pp. 323-332; DOI 10.1109/OJSP.2025.3530793; IEEE article 10843404; Mixture of Emotion Dependent Experts: Facial Expressions Recognition in Videos Through Stacked Expert Models; Ali N. Salman (https://orcid.org/0000-0003-1395-0382), The University of Texas at Dallas, Richardson, TX, USA; Karen Rosero, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Lucas Goncalves (https://orcid.org/0000-0001-9613-1002), The University of Texas at Dallas, Richardson, TX, USA; Carlos Busso (https://orcid.org/0000-0002-4075-4072), The University of Texas at Dallas, Richardson, TX, USA; https://ieeexplore.ieee.org/document/10843404/; Video facial expression recognition; emotion recognition; affective computing; ensemble model |
| title | Mixture of Emotion Dependent Experts: Facial Expressions Recognition in Videos Through Stacked Expert Models |
| topic | Video facial expression recognition; emotion recognition; affective computing; ensemble model |
| url | https://ieeexplore.ieee.org/document/10843404/ |