Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks

Is learning more knowledge always better for vision-and-language models? In this paper, we study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks, their overall performance improves. However,...

Bibliographic Details
Main Authors:	Tianwei Chen, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Hajime Nagahara
Format:	Article
Language:	English
Published:	MDPI AG 2024-11-01
Series:	Journal of Imaging
Subjects:	vision and language; knowledge transferability analysis; multi-modal learning
Online Access:	https://www.mdpi.com/2313-433X/10/12/300

collection	DOAJ
description	Is learning more knowledge always better for vision-and-language models? In this paper, we study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks, their overall performance improves. However, we show that not all knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conducted an exhaustive analysis based on hundreds of cross-experiments on twelve vision-and-language tasks categorized into four groups. While tasks in the same group are prone to improve each other, results show that this is not always the case. In addition, other factors, such as dataset size or the pre-training stage, may have a great impact on how well the knowledge is transferred.
id	doaj-art-398ad3dcd2614cacbebd404c8226ea2b
institution	OA Journals
issn	2313-433X
citation	Journal of Imaging 10(12), 300 (2024-11-01)
doi	10.3390/jimaging10120300
affiliations	Tianwei Chen: Institute for Datability Science, Osaka University, Osaka 565-0871, Japan; Noa Garcia: Institute for Datability Science, Osaka University, Osaka 565-0871, Japan; Mayu Otani: CyberAgent Inc., Tokyo 150-0042, Japan; Chenhui Chu: Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan; Yuta Nakashima: Institute for Datability Science, Osaka University, Osaka 565-0871, Japan; Hajime Nagahara: Institute for Datability Science, Osaka University, Osaka 565-0871, Japan

Similar Items