Frozen Weights as Prior for Parameter-Efficient Fine-Tuning

Bibliographic Details
Main Authors: Xiaolong Ma, Peishun Liu, Haojie Gao, Zikang Yan, Ningning Ma, Wenqiang Liu, Xuefang Wang, Ruichun Tang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10840174/
Description
Summary: In natural language processing and computer vision, the emergence of large pre-trained models has made fine-tuning them for downstream tasks an important paradigm. However, full fine-tuning often comes at a hefty cost that is infeasible for many researchers. In recent years, numerous methods have therefore been proposed to learn incremental updates of the pre-trained weights in a more parameter-efficient way (e.g., by employing low-rank increments or by introducing adapters that modify the network architecture). Yet most of these methods add a set of incremental parameters learned from scratch; viewed from the perspective of full fine-tuning, they often fail to fully exploit the connection between the incremental changes made during fine-tuning and the frozen weights of the pre-trained model. To investigate how the pre-trained weights can be harnessed more effectively while acquiring new knowledge during fine-tuning, we propose a novel parameter-efficient approach that reuses the Frozen Weights as a prior (FoWA). We adapt the incremental matrix to the unitary matrices obtained from the singular value decomposition of the frozen weights, and fine-tune the model by incorporating this prior information. Through the frozen-weight prior, FoWA can automatically select an appropriate rank and decouple the number of trainable parameters from the rank. Extensive experiments on a variety of tasks, including natural language processing, question answering, natural language generation, and visual classification, demonstrate the effectiveness of FoWA.
ISSN: 2169-3536
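The record's abstract does not include an implementation, but the SVD-based reparameterization it describes can be sketched concretely. Below is a minimal, hypothetical PyTorch sketch of one plausible reading: the increment dW is a trainable diagonal expressed in the singular-vector basis (U, V) of the frozen weight W0, so the parameter count is fixed at min(out, in) regardless of the rank the learned update ends up expressing. The class name `FoWALinearSketch` and this exact parameterization are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FoWALinearSketch(nn.Module):
    """Hypothetical sketch of an SVD-basis increment (not the authors' code).

    The frozen weight W0 is factored as U diag(S) Vh; the update is a
    trainable diagonal `delta` in that same basis, dW = U diag(delta) Vh.
    """

    def __init__(self, frozen_weight: torch.Tensor):
        super().__init__()
        # Frozen pre-trained weight W0, shape (out_features, in_features).
        self.register_buffer("W0", frozen_weight)
        # Unitary factors from the SVD of the frozen weight (the "prior").
        U, S, Vh = torch.linalg.svd(frozen_weight, full_matrices=False)
        self.register_buffer("U", U)    # (out, r), r = min(out, in)
        self.register_buffer("Vh", Vh)  # (r, in)
        # Trainable diagonal, initialized to zero so training starts from W0.
        # Its size is fixed at r, so the trainable parameter count is
        # decoupled from the effective rank: the rank of dW is simply the
        # number of entries of `delta` that training drives away from zero.
        self.delta = nn.Parameter(torch.zeros_like(S))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Increment in the frozen basis: dW = U diag(delta) Vh.
        dW = self.U @ torch.diag(self.delta) @ self.Vh
        return F.linear(x, self.W0 + dW)


# Usage: wrap the weight of a frozen pre-trained linear layer.
pretrained = nn.Linear(768, 768)
layer = FoWALinearSketch(pretrained.weight.detach().clone())
y = layer(torch.randn(4, 768))  # only `delta` (768 values) is trainable
```

Under this reading, keeping U and Vh frozen is what injects the prior, while the diagonal form keeps the update cheap; how FoWA actually parameterizes and regularizes the increment is specified in the paper itself.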