Head information bottleneck (HIB): leveraging information bottleneck for efficient transformer head attribution and pruning

Bibliographic Details
Main Authors: Yukun Qian, Xuyi Zhuang, Mingjiang Wang
Affiliation: Key Laboratory for Key Technologies of IoT Terminals, Harbin Institute of Technology
Format: Article
Language: English
Published: SpringerOpen, 2025-07-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
ISSN: 1687-4722
Subjects: Attribution; Informational bottleneck; Multi-head attention; Explainable AI
Online Access: https://doi.org/10.1186/s13636-025-00411-8

Abstract
Multi-head attention mechanisms have been widely applied in speech pre-training, but the roles and effectiveness of individual heads in downstream tasks have not been fully explored: a head's importance may vary with the downstream task. We assume that attention allocation acts like an information bottleneck, highlighting the parts of the input that are important for the task. We therefore introduce the information bottleneck into multi-head attention to estimate the mutual information between each attention head's output and its input, guiding each head to focus on useful information. Building on this, we propose a method to measure the contribution of attention heads to a task and prune heads according to their contributions, offering an interpretable direction for model pruning. Experiments comparing the pruning effectiveness of our method with the traditional Taylor expansion method and the integrated gradients method show that our approach significantly outperforms the former and achieves results comparable to the latter on multiple tasks.
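The record stops at the abstract and gives no implementation details. The sketch below only illustrates the general idea the abstract describes: gating each attention head's output through a variational information bottleneck and reading head importance off the learned gates. Everything in it (the HeadBottleneck module, the Gaussian-noise gate, the closed-form KL penalty, and the head_scores ranking) is an assumption made for illustration, not the authors' HIB method.

```python
import torch
import torch.nn as nn


class HeadBottleneck(nn.Module):
    """Per-head information bottleneck gate (illustrative sketch).

    Each head output is mixed with Gaussian noise:
        z_h = alpha_h * h + (1 - alpha_h) * eps,   eps ~ N(mu_h, sigma_h^2)
    where alpha_h = sigmoid(theta_h) is a learned per-head gate and
    (mu_h, sigma_h) are empirical statistics of that head's outputs.
    The KL term bounds how much information the gated output keeps
    about the head's input, and alpha doubles as a head-importance score.
    """

    def __init__(self, num_heads: int, beta: float = 1e-3):
        super().__init__()
        self.logit = nn.Parameter(torch.full((num_heads,), 5.0))  # gates start nearly open
        self.beta = beta

    def forward(self, head_out: torch.Tensor):
        # head_out: (batch, num_heads, seq_len, head_dim)
        alpha = torch.sigmoid(self.logit).view(1, -1, 1, 1)

        # Empirical per-head statistics serve as a fixed noise prior.
        mu = head_out.mean(dim=(0, 2, 3), keepdim=True).detach()
        std = head_out.std(dim=(0, 2, 3), keepdim=True).detach() + 1e-6

        eps = mu + std * torch.randn_like(head_out)
        z = alpha * head_out + (1.0 - alpha) * eps

        # KL( N(alpha*h + (1-alpha)*mu, ((1-alpha)*std)^2) || N(mu, std^2) ),
        # averaged over batch, positions, and channels -> per-head information cost.
        var_ratio = (1.0 - alpha) ** 2
        mean_term = (alpha * (head_out - mu) / std) ** 2
        kl = 0.5 * (var_ratio + mean_term - torch.log(var_ratio + 1e-9) - 1.0)
        kl_per_head = kl.mean(dim=(0, 2, 3))

        return z, self.beta * kl_per_head.sum(), kl_per_head

    def head_scores(self) -> torch.Tensor:
        # Heads whose gates stay open (alpha ~ 1) carry task-relevant information.
        return torch.sigmoid(self.logit).detach()


if __name__ == "__main__":
    # Usage: wrap one attention layer's per-head outputs, add the KL penalty
    # to the task loss, then treat the lowest-scoring heads as pruning candidates.
    bottleneck = HeadBottleneck(num_heads=12)
    heads = torch.randn(4, 12, 50, 64)           # dummy (batch, heads, seq, dim)
    gated, info_loss, kl_per_head = bottleneck(heads)
    task_loss = gated.pow(2).mean()              # stand-in for the real task loss
    (task_loss + info_loss).backward()
    prune_order = torch.argsort(bottleneck.head_scores())  # least important first
    print(prune_order[:4])
```

In such a setup, the KL penalty would be added to the task loss during a short calibration pass on the downstream data; heads whose gates collapse toward zero contribute little task-relevant information and are natural candidates for pruning.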