Dynamic Mixture of Experts for Adaptive Computation in Character-Level Transformers
This paper challenges the prevailing assumption that Mixture of Experts (MoE) consistently improves computational efficiency through a systematic evaluation of MoE variants in Transformer models. We implement and compare three approaches: basic MoE, top-*k* routing, and capacity-fac...
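For orientation, the sketch below shows what a top-*k* MoE routing layer inside a Transformer block typically looks like, in PyTorch style. It is an illustrative assumption, not the paper's implementation: the class name `TopKMoE`, the number of experts, the expert width of `4 * d_model`, and all dimensions are hypothetical.

```python
# Hypothetical sketch of top-k MoE routing; names and sizes are illustrative,
# not taken from the paper under review.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts feed-forward layer."""

    def __init__(self, d_model: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # token -> expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        logits = self.router(x)                         # (B, T, num_experts)
        weights, indices = logits.topk(self.k, dim=-1)  # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e          # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage example: route a batch of character embeddings through the layer.
layer = TopKMoE(d_model=64)
y = layer(torch.randn(2, 16, 64))  # -> shape (2, 16, 64)
```

A capacity-factored variant would additionally cap how many tokens each expert may accept per batch and drop or re-route the overflow; that bookkeeping is omitted here for brevity.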
| Main Authors: | Zhigao Huang, Musheng Chen, Shiyan Zheng |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Information |
| Online Access: | https://www.mdpi.com/2078-2489/16/6/483 |
Similar Items
- Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset, by Thanh-Dung Le, et al. (2025-01-01)
- Plant disease classification in the wild using vision transformers and mixture of experts, by Zafar Salman, et al. (2025-06-01)
- Robustifying Routers Against Input Perturbations for Sparse Mixture-of-Experts Vision Transformers, by Masahiro Kada, et al. (2025-01-01)
- Mixture of Expert Large Language Model for Legal Case Element Recognition, by YIN Hua, WU Zihao, LIU Tingting, ZHANG Jiajia, GAO Ziqian (2024-12-01)
- TabMoE: A General Framework for Diverse Table-Based Reasoning with Mixture-of-Experts, by Jie Wu, et al. (2024-09-01)