Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering
Previous works employ Large Language Models (LLMs) like GPT-3 for knowledge-based Visual Question Answering (VQA). We argue that the inferential capacity of an LLM can be enhanced through knowledge injection. Although methods that utilize knowledge graphs to enhance LLMs have been explored in various tasks, they may have some limitations, such as the possibility of failing to retrieve the required knowledge. In this paper, we introduce a novel framework for knowledge-based VQA titled "Prompting Large Language Models with Knowledge-Injection" (PLLMKI). We use a vanilla VQA model to inspire the LLM and further enhance the LLM with knowledge injection. Unlike earlier approaches, we adopt the LLM for knowledge enhancement instead of relying on knowledge graphs. Furthermore, we leverage open LLMs, incurring no additional costs. In comparison to existing baselines, our approach achieves accuracy improvements of over 1.3 and 1.7 on two knowledge-based VQA datasets, namely OK-VQA and A-OKVQA, respectively.
Main Authors: | Zhongjian Hu, Peng Yang, Fengyuan Liu, Yuan Meng, Xingyu Liu |
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2024-09-01 |
Series: | Big Data Mining and Analytics |
Subjects: | visual question answering; knowledge-based visual question answering; large language model; knowledge injection |
Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2024.9020026 |
_version_ | 1832572872681324544 |
---|---|
author | Zhongjian Hu; Peng Yang; Fengyuan Liu; Yuan Meng; Xingyu Liu |
author_facet | Zhongjian Hu; Peng Yang; Fengyuan Liu; Yuan Meng; Xingyu Liu |
author_sort | Zhongjian Hu |
collection | DOAJ |
description | Previous works employ Large Language Models (LLMs) like GPT-3 for knowledge-based Visual Question Answering (VQA). We argue that the inferential capacity of an LLM can be enhanced through knowledge injection. Although methods that utilize knowledge graphs to enhance LLMs have been explored in various tasks, they may have some limitations, such as the possibility of failing to retrieve the required knowledge. In this paper, we introduce a novel framework for knowledge-based VQA titled "Prompting Large Language Models with Knowledge-Injection" (PLLMKI). We use a vanilla VQA model to inspire the LLM and further enhance the LLM with knowledge injection. Unlike earlier approaches, we adopt the LLM for knowledge enhancement instead of relying on knowledge graphs. Furthermore, we leverage open LLMs, incurring no additional costs. In comparison to existing baselines, our approach achieves accuracy improvements of over 1.3 and 1.7 on two knowledge-based VQA datasets, namely OK-VQA and A-OKVQA, respectively. |
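The abstract describes a two-step prompting pattern: first elicit background knowledge from an open LLM, then inject that knowledge, together with a vanilla VQA model's candidate answers, into the final answering prompt. The sketch below illustrates that pattern only; the function and parameter names (`build_knowledge_prompt`, `build_answer_prompt`, `caption`, `candidates`) are hypothetical and not taken from the paper.

```python
# Illustrative sketch of knowledge-injection prompting for knowledge-based
# VQA. All names here are assumptions for illustration, not the paper's API.

def build_knowledge_prompt(question: str, caption: str) -> str:
    # Step 1: ask an open LLM what background knowledge the question needs.
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        "What background knowledge is needed to answer this question?"
    )

def build_answer_prompt(question: str, caption: str,
                        candidates: list[str], knowledge: str) -> str:
    # Step 2: inject the retrieved knowledge and the vanilla VQA model's
    # candidate answers into the final answering prompt for the LLM.
    cand = ", ".join(candidates)
    return (
        f"Knowledge: {knowledge}\n"
        f"Context: {caption}\n"
        f"Candidates from a VQA model: {cand}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Example usage with made-up inputs (the knowledge string stands in for
# an open LLM's response to build_knowledge_prompt).
prompt = build_answer_prompt(
    "What sport is being played?",
    "A man swings a bat on a grass field.",
    ["baseball", "cricket"],
    "Baseball is played with a bat and ball on a diamond-shaped field.",
)
```

The design point the abstract emphasizes is that the knowledge source in step 1 is itself an open LLM rather than a retrieved knowledge-graph entry, which avoids retrieval misses and incurs no API cost.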
format | Article |
id | doaj-art-4a5c02e765b940a88f6ed65e4d31f4d3 |
institution | Kabale University |
issn | 2096-0654 |
language | English |
publishDate | 2024-09-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Big Data Mining and Analytics |
spelling | doaj-art-4a5c02e765b940a88f6ed65e4d31f4d3 | 2025-02-02T06:29:08Z | eng | Tsinghua University Press | Big Data Mining and Analytics | 2096-0654 | 2024-09-01 | Vol. 7, No. 3, pp. 843-857 | 10.26599/BDMA.2024.9020026 | Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering | Zhongjian Hu, Peng Yang, Yuan Meng: School of Computer Science and Engineering, Southeast University, and also with the Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education of the People’s Republic of China, Nanjing 211189, China; Fengyuan Liu, Xingyu Liu: Southeast University - Monash University Joint Graduate School (Suzhou), Southeast University, Suzhou 215125, China | Previous works employ Large Language Models (LLMs) like GPT-3 for knowledge-based Visual Question Answering (VQA). We argue that the inferential capacity of an LLM can be enhanced through knowledge injection. Although methods that utilize knowledge graphs to enhance LLMs have been explored in various tasks, they may have some limitations, such as the possibility of failing to retrieve the required knowledge. In this paper, we introduce a novel framework for knowledge-based VQA titled "Prompting Large Language Models with Knowledge-Injection" (PLLMKI). We use a vanilla VQA model to inspire the LLM and further enhance the LLM with knowledge injection. Unlike earlier approaches, we adopt the LLM for knowledge enhancement instead of relying on knowledge graphs. Furthermore, we leverage open LLMs, incurring no additional costs. In comparison to existing baselines, our approach achieves accuracy improvements of over 1.3 and 1.7 on two knowledge-based VQA datasets, namely OK-VQA and A-OKVQA, respectively. | https://www.sciopen.com/article/10.26599/BDMA.2024.9020026 | visual question answering; knowledge-based visual question answering; large language model; knowledge injection |
spellingShingle | Zhongjian Hu; Peng Yang; Fengyuan Liu; Yuan Meng; Xingyu Liu | Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering | Big Data Mining and Analytics | visual question answering; knowledge-based visual question answering; large language model; knowledge injection |
title | Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering |
title_full | Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering |
title_fullStr | Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering |
title_full_unstemmed | Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering |
title_short | Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering |
title_sort | prompting large language models with knowledge injection for knowledge based visual question answering |
topic | visual question answering; knowledge-based visual question answering; large language model; knowledge injection |
url | https://www.sciopen.com/article/10.26599/BDMA.2024.9020026 |
work_keys_str_mv | AT zhongjianhu promptinglargelanguagemodelswithknowledgeinjectionforknowledgebasedvisualquestionanswering AT pengyang promptinglargelanguagemodelswithknowledgeinjectionforknowledgebasedvisualquestionanswering AT fengyuanliu promptinglargelanguagemodelswithknowledgeinjectionforknowledgebasedvisualquestionanswering AT yuanmeng promptinglargelanguagemodelswithknowledgeinjectionforknowledgebasedvisualquestionanswering AT xingyuliu promptinglargelanguagemodelswithknowledgeinjectionforknowledgebasedvisualquestionanswering |