Diversifying Multi-Head Attention in the Transformer Model
Recent studies have shown that, due to redundancy, some attention heads of the Transformer model can be pruned without diminishing the model's performance. In this paper, we propose a constrained optimization algorithm based on Hebbian learning, which trains specific layers in the Transformer architecture...
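The abstract is truncated in this record, so the authors' actual constrained-optimization procedure is not reproduced here. As a loose illustration only, the sketch below pairs a textbook Hebbian weight update with a simple head-redundancy measure (mean pairwise cosine similarity between head outputs); the function names `hebbian_update` and `head_redundancy` and the redundancy metric are assumptions for this example, not taken from the paper.

```python
import numpy as np

# Plain Hebbian rule: strengthen each weight in proportion to the
# correlation between its input and output activations (dW = lr * y x^T).
# This is the generic textbook rule, not the paper's algorithm.
def hebbian_update(W, x, y, lr=0.01):
    return W + lr * np.outer(y, x)

# Redundancy proxy (assumed for illustration): mean pairwise cosine
# similarity between the flattened outputs of different attention heads.
# Lower values indicate more diverse heads.
def head_redundancy(head_outputs):
    H = np.stack([h.ravel() for h in head_outputs])      # (n_heads, d)
    H /= np.linalg.norm(H, axis=1, keepdims=True)        # unit-normalize rows
    sim = H @ H.T                                        # cosine similarities
    n = H.shape[0]
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))   # off-diagonal mean

# One Hebbian step on a toy weight matrix.
W = np.random.randn(3, 5) * 0.1
x = np.random.randn(5)
W = hebbian_update(W, x, W @ x)

# Appending a near-duplicate of an existing head raises the redundancy score.
heads = [np.random.randn(4, 8) for _ in range(3)]
heads.append(heads[0] + 1e-3 * np.random.randn(4, 8))
print(f"mean pairwise head similarity: {head_redundancy(heads):.3f}")
```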
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-11-01 |
| Series: | Machine Learning and Knowledge Extraction |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2504-4990/6/4/126 |