LoRA-Tuned Multimodal RAG System for Technical Manual QA: A Case Study on Hyundai Staria

Bibliographic Details
Main Authors: Yerin Nam, Hansun Choi, Jonggeun Choi, Hyukjin Kwon
Format: Article
Language: English
Published: MDPI AG, 2025-07-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/15/8387
Description
Summary: This study develops a domain-adaptive multimodal RAG (Retrieval-Augmented Generation) system to improve the accuracy and efficiency of technical question answering based on large-scale structured manuals. Using Hyundai Staria maintenance documents as a case study, we extracted text and images from PDF manuals and constructed QA, RAG, and Multi-Turn datasets to reflect realistic troubleshooting scenarios. To overcome limitations of baseline RAG models, we proposed an enhanced architecture that incorporates sentence-level similarity annotations and parameter-efficient fine-tuning via LoRA (Low-Rank Adaptation), using the bLLossom-8B language model and BAAI-bge-m3 embedding model. Experimental results show that the proposed system achieved improvements of 3.0%p in BERTScore, 3.0%p in cosine similarity, and 18.0%p in ROUGE-L compared to existing RAG systems, with notable gains in image-guided response accuracy. A qualitative evaluation by 20 domain experts yielded an average satisfaction score of 4.4 out of 5. This study presents a practical and extensible AI framework for multimodal document understanding, with broad applicability across automotive, industrial, and defense-related technical documentation.
ISSN:2076-3417
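The parameter-efficient fine-tuning described in the summary can be illustrated with a minimal NumPy sketch of the core LoRA idea: a frozen pretrained weight matrix W is augmented by a trainable low-rank update (alpha / r) * B @ A. This is not the paper's implementation (the authors fine-tune bLLossom-8B, presumably via a framework such as PEFT); the dimensions, scaling, and function name here are illustrative assumptions.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """Forward pass through a linear layer with a LoRA low-rank update.

    x : (batch, d_in)   input activations
    W : (d_out, d_in)   frozen pretrained weight (not updated during tuning)
    A : (r, d_in)       trainable down-projection factor
    B : (d_out, r)      trainable up-projection factor (zero-initialized,
                        so training starts from the pretrained behavior)
    The update B @ A is scaled by alpha / r, following the LoRA paper.
    """
    delta = (alpha / r) * (B @ A)      # low-rank weight update, rank <= r
    return x @ (W + delta).T

# Why this is "parameter-efficient": for d_in = d_out = 4096 and r = 4,
# full fine-tuning touches 4096 * 4096 ≈ 16.8M weights per layer, while
# LoRA trains only r * (d_in + d_out) = 32,768 weights for the same layer.
```

Because B is zero-initialized, the adapted model reproduces the base model exactly at the start of training; only the small A and B factors receive gradients, which is what makes 8B-scale adaptation tractable on modest hardware.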