Dynamic Mixture of Experts for Adaptive Computation in Character-Level Transformers

Through a systematic evaluation of MoE variants in Transformer models, this paper challenges the prevailing assumption that Mixture of Experts (MoE) consistently improves computational efficiency. We implement and compare three approaches: basic MoE, top-k routing, and capacity-fac...
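
As a rough illustration of the top-k routing variant named in the abstract, the sketch below shows a minimal PyTorch-style top-k Mixture-of-Experts layer. It is not the authors' implementation; the class name, the feed-forward expert architecture, and the hyperparameters (d_model, d_ff, num_experts, k) are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        """Hypothetical top-k MoE layer: each token is routed to its k highest-scoring experts."""
        def __init__(self, d_model=256, d_ff=1024, num_experts=8, k=2):
            super().__init__()
            self.k = k
            self.router = nn.Linear(d_model, num_experts)  # token-to-expert scores
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            )

        def forward(self, x):                        # x: (tokens, d_model)
            logits = self.router(x)                  # (tokens, num_experts)
            weights, indices = logits.topk(self.k, dim=-1)
            weights = F.softmax(weights, dim=-1)     # renormalize over the chosen k experts
            out = torch.zeros_like(x)
            for slot in range(self.k):               # dispatch each token to its k experts
                for e in range(len(self.experts)):
                    mask = indices[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
            return out

    # usage: route a batch of 16 token embeddings through the layer
    y = TopKMoE()(torch.randn(16, 256))

Basic MoE corresponds to routing every token through a weighted mixture of all experts; capacity-factor variants additionally cap how many tokens each expert may process per batch.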


Bibliographic Details
Main Authors: Zhigao Huang, Musheng Chen, Shiyan Zheng
Format: Article
Language: English
Published: MDPI AG, 2025-06-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/6/483

Similar Items