ViTMa: A Novel Hybrid Vision Transformer and Mamba for Kinship Recognition in Indonesian Facial Micro-Expressions

Bibliographic Details
Main Authors: Ike Fibriani, Eko Mulyanto Yuniarno, Ronny Mardiyanto, Mauridhi Hery Purnomo
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10737083/
collection DOAJ
description Kinship recognition based on facial micro-expressions is a challenging problem that aims to determine whether multiple individuals belong to the same family. Previous approaches have been limited by model capacity and insufficient training data, resulting in low-level features and shallow model learning. Such hand-crafted features cannot capture information effectively, leading to suboptimal accuracy. In this paper, we propose a kinship recognition method that exploits facial micro-expressions using a hybrid Vision Transformer and Mamba (ViTMa) model with modified Deep Feature Fusion, which combines different backbone architectures and feature fusion strategies. The ViTMa model is pre-trained on a large dataset and adapted to Indonesian facial images. A Siamese architecture processes two input images, extracts features from each, fuses them, and passes the fused features to a classification network. Experiments on the FIW-Local Indonesia dataset demonstrate the effectiveness of this method: the best model, which uses the B16 backbone with quadratic features and multiplicative fusion, achieves an average accuracy of 85.18% across all kinship categories, outperforming previous approaches. B16, despite being the smallest backbone, outperforms the larger backbones L16 (average accuracy 67.99%), B32 (72.98%), and L32 (71.69%). Thus, the ViTMa model with the proposed B16 quadratic features and multiplicative fusion strategy achieves the best performance among the tested configurations.
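The Siamese pipeline described in the abstract (shared feature extractor, feature fusion, classifier) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not define "quadratic features" or the exact fusion formula, so the sketch assumes multiplicative fusion means an element-wise product and quadratic features mean an element-wise squared difference. The names `toy_embed`, `siamese_score`, and the weights are hypothetical stand-ins, and `toy_embed` replaces the ViTMa backbone with two trivial statistics.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def multiplicative_fusion(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Assumed reading of 'multiplicative fusion': element-wise product."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def quadratic_features(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Assumed reading of 'quadratic features': element-wise squared difference."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return [(x - y) ** 2 for x, y in zip(a, b)]


def toy_embed(pixels: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the backbone: mean and range of the pixels."""
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


def siamese_score(
    embed: Callable[[Sequence[float]], Vector],
    image_a: Sequence[float],
    image_b: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Return a kinship score in (0, 1) for a pair of images."""
    # The same embed function is applied to both inputs (shared weights).
    feat_a = embed(image_a)
    feat_b = embed(image_b)
    # Concatenate the two fused representations into one vector.
    fused = multiplicative_fusion(feat_a, feat_b) + quadratic_features(feat_a, feat_b)
    if len(weights) != len(fused):
        raise ValueError("one weight is required per fused feature")
    # Linear layer followed by a sigmoid, as a placeholder classifier.
    logit = sum(w * f for w, f in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    face_a = [0.2, 0.4, 0.6, 0.8]
    face_b = [0.1, 0.5, 0.5, 0.9]
    hypothetical_weights = [1.0, 1.0, -2.0, -2.0]
    print(siamese_score(toy_embed, face_a, face_b, hypothetical_weights, 0.0))
```

Both fusion operations in this sketch are symmetric in their arguments, so swapping the two input images leaves the score unchanged, which is a convenient property for a pairwise verification task.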
id doaj-art-c4066232d556401aa355f1d12a481e2d
institution Kabale University
issn 2169-3536
spelling doaj-art-c4066232d556401aa355f1d12a481e2d
2024-11-12T00:01:41Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
vol. 12, pp. 164002-164017
doi:10.1109/ACCESS.2024.3487180
10737083
Ike Fibriani
Eko Mulyanto Yuniarno (https://orcid.org/0000-0003-1243-3025)
Ronny Mardiyanto
Mauridhi Hery Purnomo (https://orcid.org/0000-0002-6221-7382)
Department of Electrical Engineering, Sepuluh Nopember Institute of Technology, Surabaya, Indonesia (all four authors)
title ViTMa: A Novel Hybrid Vision Transformer and Mamba for Kinship Recognition in Indonesian Facial Micro-Expressions
topic Vision transformers
Mamba
Siamese neural network
feature fusion
kinship recognition
micro-expressions