From One Domain to Another: The Pitfalls of Gender Recognition in Unseen Environments

Bibliographic Details
Main Authors: Nzakiese Mbongo, Kailash A. Hambarde, Hugo Proença
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/13/4161
Description
Summary: Gender recognition from pedestrian imagery is regarded by many as a quasi-solved problem, yet most existing approaches are evaluated in a within-domain setting, i.e., when the training and test data, though disjoint, closely resemble each other. This work provides the first exhaustive cross-domain assessment of six architectures considered to represent the state of the art (ALM, VAC, Rethinking, LML, YinYang-Net, and MAMBA) across three widely known benchmarks (PA-100K, PETA, and RAP). All train/test combinations of the datasets were evaluated, yielding 54 comparable experiments. The results reveal a performance split: median in-domain F1 approached 90% for most models, while the average drop under domain shift reached up to 16.4 percentage points, with the most recent approaches degrading the most. The adaptive-masking ALM achieved an F1 above 80% in most transfer scenarios, particularly those involving high-resolution or pose-stable domains, highlighting the importance of strong inductive biases over architectural novelty alone. To characterize robustness quantitatively, we introduce the Unified Robustness Metric (URM), which summarizes the average cross-domain performance degradation in a single score. A qualitative saliency analysis corroborates the numerical findings by exposing over-confidence and contextual bias in misclassifications. Overall, this study suggests that the challenges in gender recognition are much more evident in cross-domain settings than in the commonly reported within-domain context. Finally, we formalize an open evaluation protocol that can serve as a baseline for future work of this kind.
ISSN: 1424-8220
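
The experiment count in the summary can be checked with a short sketch. Enumerating every (model, training set, test set) triple for six models and three datasets gives 54 runs, of which 18 are in-domain and 36 are cross-domain. The helper `mean_cross_domain_drop` is hypothetical: it assumes the drop is measured as in-domain F1 minus cross-domain F1 for the same training set, which is one plausible reading and not necessarily the definition used in the article. The article's URM formula is not given in this record and is not reproduced here.

```python
from itertools import product
from statistics import mean

# Model and dataset names as listed in the summary.
MODELS = ["ALM", "VAC", "Rethinking", "LML", "YinYang-Net", "MAMBA"]
DATASETS = ["PA-100K", "PETA", "RAP"]


def experiment_grid(models=MODELS, datasets=DATASETS):
    """Return every (model, train_set, test_set) triple."""
    return list(product(models, datasets, datasets))


def is_cross_domain(experiment):
    """True when the training and test datasets differ."""
    _, train_set, test_set = experiment
    return train_set != test_set


def mean_cross_domain_drop(f1_scores, model, datasets=DATASETS):
    """Hypothetical helper: average F1 drop in percentage points.

    f1_scores maps (model, train_set, test_set) to F1 in percent.
    ASSUMPTION: the drop for a transfer is the in-domain F1 of the
    training set minus the F1 obtained on the other test set.
    """
    drops = [
        f1_scores[(model, train, train)] - f1_scores[(model, train, test)]
        for train, test in product(datasets, datasets)
        if train != test
    ]
    return mean(drops)


grid = experiment_grid()
cross = [e for e in grid if is_cross_domain(e)]
print(len(grid), len(cross), len(grid) - len(cross))  # 54 36 18
```

With a table of measured F1 scores keyed the same way, `mean_cross_domain_drop(scores, "ALM")` would return that model's average degradation under this assumed definition.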