Diversifying Multi-Head Attention in the Transformer Model

Recent studies have shown that, due to redundancy, some heads of the Transformer model can be pruned without diminishing the efficiency of the model. In this paper, we propose a constrained optimization algorithm based on Hebbian learning, which trains specific layers in the Transformer architecture...
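Since the abstract is truncated, the authors' exact constrained Hebbian-learning algorithm is not reproduced here. As a rough illustration of the general idea the abstract names (encouraging diversity by penalizing redundancy between attention heads), the following is a minimal, hypothetical PyTorch sketch. The function name `head_diversity_penalty`, the tensor shapes, and the cosine-correlation penalty are all illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: discourage redundant attention heads by penalizing
# pairwise correlation between their outputs. NOT the paper's algorithm.
import torch


def head_diversity_penalty(head_outputs: torch.Tensor) -> torch.Tensor:
    """head_outputs: (num_heads, seq_len, d_head) for one example."""
    h = head_outputs.flatten(start_dim=1)          # (H, seq_len * d_head)
    h = torch.nn.functional.normalize(h, dim=1)    # unit-norm per head
    gram = h @ h.T                                 # (H, H) cosine similarities
    off_diag = gram - torch.eye(h.size(0))         # zero out self-similarity
    return off_diag.pow(2).sum()                   # large when heads are redundant


# Usage: add the penalty to the task loss so gradient descent pushes
# heads toward decorrelated (more diverse) representations.
heads = torch.randn(8, 16, 64, requires_grad=True)  # 8 heads, toy values
loss = head_diversity_penalty(heads)
loss.backward()
```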

Bibliographic Details
Main Authors: Nicholas Ampazis, Flora Sakketou
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Machine Learning and Knowledge Extraction
Online Access: https://www.mdpi.com/2504-4990/6/4/126