LoRA Fusion: Enhancing Image Generation

Recent advances in low-rank adaptation (LoRA) have demonstrated its effectiveness in fine-tuning diffusion models to generate images tailored to new downstream tasks. Research on integrating multiple LoRA modules to accommodate new tasks has also gained traction. One emerging approach constructs several LoRA modules, but combining more than three typically degrades the generation performance of the pre-trained model. Mixture-of-experts models resolve this performance issue, but they do not combine LoRA modules according to the text prompt, so the generated images do not dynamically reflect the user’s requirements. This paper proposes a LoRA fusion method that applies an attention mechanism to capture the user’s text-prompting intent. The method computes the cosine similarity between predefined keys and queries and uses the weighted sum of the corresponding values to generate task-specific LoRA modules without retraining. It remains stable when merging multiple LoRA modules and performs comparably to fully retrained LoRA models, offering a more efficient and scalable solution for adapting large pre-trained generative models to new tasks. In the experiments, the proposed method outperformed existing methods in text–image alignment and image similarity, achieving a text–image alignment score of 0.744 and surpassing SVDiff (0.724) and normalized linear arithmetic composition (0.698). Moreover, it generates images that are more semantically accurate and visually coherent.
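
The fusion mechanism described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function name `fuse_lora_deltas`, the embedding dimensions, and the softmax normalization are assumptions; the abstract only specifies cosine similarity between predefined keys and a query, followed by a weighted sum of the corresponding values (the LoRA updates).

```python
# Illustrative sketch of attention-based LoRA fusion (hypothetical names,
# not the paper's code): cosine similarity between a prompt-derived query
# and predefined per-module keys, normalized into weights for a weighted
# sum of each module's low-rank update.
import torch
import torch.nn.functional as F

def fuse_lora_deltas(query: torch.Tensor,
                     keys: torch.Tensor,
                     deltas: torch.Tensor,
                     temperature: float = 1.0) -> torch.Tensor:
    """Fuse per-module LoRA weight updates with attention-style weights.

    query:  (d,)          embedding of the user's text prompt
    keys:   (n, d)        one predefined key embedding per LoRA module
    deltas: (n, out, in)  each module's dense low-rank update, e.g. B @ A
    Returns the fused (out, in) update for one target layer.
    """
    # Cosine similarity between the prompt query and each module's key.
    sims = F.cosine_similarity(query.unsqueeze(0), keys, dim=-1)  # (n,)
    # Normalize similarities into fusion weights (softmax is one plausible choice).
    weights = torch.softmax(sims / temperature, dim=0)            # (n,)
    # Weighted sum of the modules' updates -> task-specific LoRA, no retraining.
    return torch.einsum("n,noi->oi", weights, deltas)

# Toy usage: three rank-4 LoRA modules on a 16x16 layer.
torch.manual_seed(0)
q = torch.randn(32)                      # prompt embedding (assumed dim 32)
K = torch.randn(3, 32)                   # predefined keys, one per module
B, A = torch.randn(3, 16, 4), torch.randn(3, 4, 16)
V = torch.einsum("nor,nri->noi", B, A)   # each module's dense update B @ A
fused = fuse_lora_deltas(q, K, V)
print(fused.shape)                       # torch.Size([16, 16])
```

Because the weights depend on the prompt-derived query, changing the text prompt changes the fused module at inference time, which is how such a scheme can reflect the user’s intent without retraining.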

Bibliographic Details
Main Authors: Dooho Choi, Jeonghyeon Im, Yunsick Sung
Affiliation: Department of Computer Science and Artificial Intelligence, Dongguk University-Seoul, Seoul 04620, Republic of Korea
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Mathematics, Vol. 12, Iss. 22, Article 3474
ISSN: 2227-7390
DOI: 10.3390/math12223474
Subjects: low-rank adaptation (LoRA); image generation; merging LoRA modules
Online Access: https://www.mdpi.com/2227-7390/12/22/3474