Robustifying Routers Against Input Perturbations for Sparse Mixture-of-Experts Vision Transformers

Mixture of experts with a sparse expert selection rule has been gaining much attention recently because of its scalability without compromising inference time. However, unlike standard neural networks, sparse mixture-of-experts models inherently exhibit discontinuities in the output space, which may...

Bibliographic Details
Main Authors: Masahiro Kada, Ryota Yoshihashi, Satoshi Ikehata, Rei Kawakami, Ikuro Sato
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Journal of Signal Processing
Subjects: Mixture of experts; dynamic neural network; image classification; vision transformer
Online Access: https://ieeexplore.ieee.org/document/10858379/
collection DOAJ
description Mixture-of-experts models with a sparse expert selection rule have been gaining much attention recently because they scale without compromising inference time. However, unlike standard neural networks, sparse mixture-of-experts models inherently exhibit discontinuities in the output space, which may impede the acquisition of appropriate invariance to input perturbations and thereby degrade model performance on tasks such as classification. To address this issue, we propose Pairwise Router Consistency (PRC), which penalizes the discontinuities occurring under natural deformations of input images. Used together with the supervised loss, the PRC loss empirically improves classification accuracy on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets compared to a baseline method. Notably, our method with 1-expert selection slightly outperforms the baseline method using 2-expert selection. We also confirmed that models trained with our method experience discontinuous changes less frequently under input perturbations.
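The discontinuity mentioned in the description can be illustrated with a toy example. The following is a minimal sketch and not the authors' model or the PRC loss: the scalar input, the two-expert linear router, and every name in it (`ROUTER_WEIGHTS`, `EXPERTS`, `top1_moe`) are assumptions made up for illustration. It shows only that, under top-1 selection, a tiny change in the input can flip the selected expert and make the output jump.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy setup: a scalar input, a linear router with one weight
# per expert, and two experts that disagree strongly with each other.
ROUTER_WEIGHTS = [1.0, -1.0]
EXPERTS = [lambda x: 10.0 + x, lambda x: -10.0 + x]

def top1_moe(x, weights=ROUTER_WEIGHTS, experts=EXPERTS):
    """Route x to the single highest-scoring expert (top-1 selection).

    Returns the index of the selected expert and the gated output,
    i.e. the router probability times the selected expert's output.
    """
    probs = softmax([w * x for w in weights])
    k = max(range(len(probs)), key=probs.__getitem__)
    return k, probs[k] * experts[k](x)

# Two inputs that differ by only 2e-3 land on different experts.
k_neg, y_neg = top1_moe(-1e-3)
k_pos, y_pos = top1_moe(+1e-3)
jump = abs(y_pos - y_neg)

print(f"x=-0.001 -> expert {k_neg}, output {y_neg:.4f}")
print(f"x=+0.001 -> expert {k_pos}, output {y_pos:.4f}")
print(f"output jump for an input change of 0.002: {jump:.4f}")
```

In this toy setting the router probabilities themselves change smoothly with the input; the jump comes entirely from the hard selection step, which is the kind of behavior the description says the proposed loss is meant to penalize.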
id doaj-art-8f43fc912aba471380b27a1fcef04246
institution DOAJ
issn 2644-1322
spelling doaj-art-8f43fc912aba471380b27a1fcef04246; 2025-08-20T02:40:40Z; eng; IEEE; IEEE Open Journal of Signal Processing; ISSN 2644-1322; 2025-01-01; vol. 6, pp. 276-283; DOI 10.1109/OJSP.2025.3536853; article 10858379
Masahiro Kada (https://orcid.org/0009-0004-8479-2849), Institute of Science Tokyo, Tokyo, Japan
Ryota Yoshihashi (https://orcid.org/0000-0002-1194-9663), Institute of Science Tokyo, Tokyo, Japan
Satoshi Ikehata, Institute of Science Tokyo, Tokyo, Japan
Rei Kawakami, Institute of Science Tokyo, Tokyo, Japan
Ikuro Sato (https://orcid.org/0000-0001-5234-3177), Institute of Science Tokyo, Tokyo, Japan
topic Mixture of experts
dynamic neural network
image classification
vision transformer