Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering

Previous works employ Large Language Models (LLMs) such as GPT-3 for knowledge-based Visual Question Answering (VQA). We argue that the inferential capacity of an LLM can be enhanced through knowledge injection. Although methods that use knowledge graphs to enhance LLMs have been explored in various tasks, they may have limitations, such as failing to retrieve the required knowledge. In this paper, we introduce a novel framework for knowledge-based VQA titled “Prompting Large Language Models with Knowledge-Injection” (PLLMKI). We use a vanilla VQA model to inspire the LLM and further enhance the LLM with knowledge injection. Unlike earlier approaches, we adopt the LLM itself for knowledge enhancement instead of relying on knowledge graphs. Furthermore, we leverage open LLMs, incurring no additional cost. Compared with existing baselines, our approach improves accuracy by over 1.3 and 1.7 points on the two knowledge-based VQA datasets OK-VQA and A-OKVQA, respectively.
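
The abstract describes the pipeline only at a high level. As an illustration, and not the authors' implementation, the following minimal Python sketch shows one way candidate answers from a vanilla VQA model and LLM-generated background knowledge could be combined into a single prompt for an open LLM. Every function, model call, and prompt template here (vanilla_vqa_answers, open_llm, inject_knowledge) is a hypothetical placeholder.

# Hypothetical sketch of a knowledge-injection prompting pipeline for
# knowledge-based VQA. Model calls are stubbed; the actual PLLMKI prompt
# templates, in-context examples, and models are not specified here.

from typing import List


def vanilla_vqa_answers(image_path: str, question: str, k: int = 3) -> List[str]:
    """Stub for a vanilla VQA model returning top-k candidate answers."""
    return ["umbrella", "parasol", "canopy"][:k]  # placeholder output


def open_llm(prompt: str) -> str:
    """Stub for a call to an open (freely usable) LLM."""
    return "LLM output goes here"  # placeholder output


def inject_knowledge(question: str, candidates: List[str]) -> str:
    """Ask the LLM itself for background facts instead of a knowledge graph."""
    prompt = (
        "List background facts that help answer the question.\n"
        f"Question: {question}\n"
        f"Candidate answers: {', '.join(candidates)}\n"
        "Facts:"
    )
    return open_llm(prompt)


def answer_question(image_path: str, question: str) -> str:
    """Build the final knowledge-injected prompt and query the open LLM."""
    candidates = vanilla_vqa_answers(image_path, question)
    knowledge = inject_knowledge(question, candidates)
    prompt = (
        f"Context knowledge: {knowledge}\n"
        f"Candidate answers from a VQA model: {', '.join(candidates)}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return open_llm(prompt)


if __name__ == "__main__":
    print(answer_question("beach.jpg", "What is the object above the person used for?"))

The design point mirrored here is the one stated in the abstract: the background knowledge is requested from the LLM itself rather than retrieved from a knowledge graph, and the same open LLM answers the final knowledge-injected prompt.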

Bibliographic Details
Main Authors: Zhongjian Hu, Peng Yang, Fengyuan Liu, Yuan Meng, Xingyu Liu
Format: Article
Language: English
Published: Tsinghua University Press, 2024-09-01
Series: Big Data Mining and Analytics
Subjects: visual question answering; knowledge-based visual question answering; large language model; knowledge injection
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2024.9020026
ISSN: 2096-0654
DOI: 10.26599/BDMA.2024.9020026
Volume/Issue/Pages: Vol. 7, No. 3, pp. 843-857
Author Affiliations:
Zhongjian Hu, Peng Yang, and Yuan Meng: School of Computer Science and Engineering, Southeast University, and the Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education of the People’s Republic of China, Nanjing 211189, China
Fengyuan Liu and Xingyu Liu: Southeast University - Monash University Joint Graduate School (Suzhou), Southeast University, Suzhou 215125, China