Diversifying Multi-Head Attention in the Transformer Model

Recent studies have shown that, due to redundancy, some heads of the Transformer model can be pruned without diminishing the efficiency of the model. In this paper, we propose a constrained optimization algorithm based on Hebbian learning, which trains specific layers in the Transformer architecture...
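Since the abstract is truncated, the authors' exact constrained Hebbian-learning algorithm is not reproduced here. As a rough illustration of the general idea the abstract names (encouraging diversity by penalizing redundancy between attention heads), the following is a minimal, hypothetical PyTorch sketch. The function name `head_diversity_penalty`, the tensor shapes, and the cosine-correlation penalty are all illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: discourage redundant attention heads by penalizing
# pairwise correlation between their outputs. NOT the paper's algorithm.
import torch


def head_diversity_penalty(head_outputs: torch.Tensor) -> torch.Tensor:
    """head_outputs: (num_heads, seq_len, d_head) for one example."""
    h = head_outputs.flatten(start_dim=1)          # (H, seq_len * d_head)
    h = torch.nn.functional.normalize(h, dim=1)    # unit-norm per head
    gram = h @ h.T                                 # (H, H) cosine similarities
    off_diag = gram - torch.eye(h.size(0))         # zero out self-similarity
    return off_diag.pow(2).sum()                   # large when heads are redundant


# Usage: add the penalty to the task loss so gradient descent pushes
# heads toward decorrelated (more diverse) representations.
heads = torch.randn(8, 16, 64, requires_grad=True)  # 8 heads, toy values
loss = head_diversity_penalty(heads)
loss.backward()
```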

Bibliographic Details
Main Authors: Nicholas Ampazis, Flora Sakketou
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Machine Learning and Knowledge Extraction
Online Access: https://www.mdpi.com/2504-4990/6/4/126