Dynamic Mixture of Experts for Adaptive Computation in Character-Level Transformers

Through a systematic evaluation of MoE variants in Transformer models, this paper challenges the prevailing assumption that Mixture of Experts (MoE) consistently improves computational efficiency. We implement and compare three approaches: basic MoE, top-k routing, and capacity-fac...
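
As a rough illustration of the top-k routing variant named in the abstract, the sketch below shows a minimal PyTorch-style top-k Mixture-of-Experts layer. It is not the authors' implementation; the class name, the feed-forward expert architecture, and the hyperparameters (d_model, d_ff, num_experts, k) are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        """Hypothetical top-k MoE layer: each token is routed to its k highest-scoring experts."""
        def __init__(self, d_model=256, d_ff=1024, num_experts=8, k=2):
            super().__init__()
            self.k = k
            self.router = nn.Linear(d_model, num_experts)  # token-to-expert scores
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            )

        def forward(self, x):                        # x: (tokens, d_model)
            logits = self.router(x)                  # (tokens, num_experts)
            weights, indices = logits.topk(self.k, dim=-1)
            weights = F.softmax(weights, dim=-1)     # renormalize over the chosen k experts
            out = torch.zeros_like(x)
            for slot in range(self.k):               # dispatch each token to its k experts
                for e in range(len(self.experts)):
                    mask = indices[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
            return out

    # usage: route a batch of 16 token embeddings through the layer
    y = TopKMoE()(torch.randn(16, 256))

Basic MoE corresponds to routing every token through a weighted mixture of all experts; capacity-factor variants additionally cap how many tokens each expert may process per batch.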


Bibliographic Details
Main Authors: Zhigao Huang, Musheng Chen, Shiyan Zheng
Format: Article
Language: English
Published: MDPI AG, 2025-06-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/6/483

Similar Items