Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering

Previous works employ Large Language Models (LLMs) such as GPT-3 for knowledge-based Visual Question Answering (VQA). We argue that the inferential capacity of an LLM can be enhanced through knowledge injection. Although methods that use knowledge graphs to enhance LLMs have been explored in various tasks, they may have limitations, such as failing to retrieve the required knowledge. In this paper, we introduce a novel framework for knowledge-based VQA titled “Prompting Large Language Models with Knowledge-Injection” (PLLMKI). We use a vanilla VQA model to inspire the LLM and further enhance the LLM with knowledge injection. Unlike earlier approaches, we adopt the LLM itself for knowledge enhancement instead of relying on knowledge graphs. Furthermore, we leverage open LLMs, incurring no additional cost. Compared with existing baselines, our approach improves accuracy by over 1.3 and 1.7 points on the two knowledge-based VQA datasets OK-VQA and A-OKVQA, respectively.
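
The abstract describes the pipeline only at a high level. As an illustration, and not the authors' implementation, the following minimal Python sketch shows one way candidate answers from a vanilla VQA model and LLM-generated background knowledge could be combined into a single prompt for an open LLM. Every function, model call, and prompt template here (vanilla_vqa_answers, open_llm, inject_knowledge) is a hypothetical placeholder.

# Hypothetical sketch of a knowledge-injection prompting pipeline for
# knowledge-based VQA. Model calls are stubbed; the actual PLLMKI prompt
# templates, in-context examples, and models are not specified here.

from typing import List


def vanilla_vqa_answers(image_path: str, question: str, k: int = 3) -> List[str]:
    """Stub for a vanilla VQA model returning top-k candidate answers."""
    return ["umbrella", "parasol", "canopy"][:k]  # placeholder output


def open_llm(prompt: str) -> str:
    """Stub for a call to an open (freely usable) LLM."""
    return "LLM output goes here"  # placeholder output


def inject_knowledge(question: str, candidates: List[str]) -> str:
    """Ask the LLM itself for background facts instead of a knowledge graph."""
    prompt = (
        "List background facts that help answer the question.\n"
        f"Question: {question}\n"
        f"Candidate answers: {', '.join(candidates)}\n"
        "Facts:"
    )
    return open_llm(prompt)


def answer_question(image_path: str, question: str) -> str:
    """Build the final knowledge-injected prompt and query the open LLM."""
    candidates = vanilla_vqa_answers(image_path, question)
    knowledge = inject_knowledge(question, candidates)
    prompt = (
        f"Context knowledge: {knowledge}\n"
        f"Candidate answers from a VQA model: {', '.join(candidates)}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return open_llm(prompt)


if __name__ == "__main__":
    print(answer_question("beach.jpg", "What is the object above the person used for?"))

The design point mirrored here is the one stated in the abstract: the background knowledge is requested from the LLM itself rather than retrieved from a knowledge graph, and the same open LLM answers the final knowledge-injected prompt.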

Bibliographic Details
Main Authors: Zhongjian Hu, Peng Yang, Fengyuan Liu, Yuan Meng, Xingyu Liu
Format: Article
Language: English
Published: Tsinghua University Press, 2024-09-01
Series: Big Data Mining and Analytics
Subjects: visual question answering; knowledge-based visual question answering; large language model; knowledge injection
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2024.9020026
ISSN: 2096-0654
DOI: 10.26599/BDMA.2024.9020026
Volume/Issue/Pages: Vol. 7, No. 3, pp. 843-857
Author Affiliations:
Zhongjian Hu, Peng Yang, and Yuan Meng: School of Computer Science and Engineering, Southeast University, and the Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education of the People’s Republic of China, Nanjing 211189, China
Fengyuan Liu and Xingyu Liu: Southeast University - Monash University Joint Graduate School (Suzhou), Southeast University, Suzhou 215125, China