Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks

Bibliographic Details
Main Authors: Tianwei Chen, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Hajime Nagahara
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Journal of Imaging
Online Access: https://www.mdpi.com/2313-433X/10/12/300
Description: Is learning more knowledge always better for vision-and-language models? In this paper, we study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks, their overall performance improves. However, we show that not all knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conducted an exhaustive analysis based on hundreds of cross-experiments on twelve vision-and-language tasks categorized into four groups. While tasks in the same group are prone to improve each other, results show that this is not always the case. In addition, other factors, such as dataset size or the pre-training stage, may have a great impact on how well the knowledge is transferred.
Author Affiliations:
Tianwei Chen: Institute for Datability Science, Osaka University, Osaka 565-0871, Japan
Noa Garcia: Institute for Datability Science, Osaka University, Osaka 565-0871, Japan
Mayu Otani: CyberAgent Inc., Tokyo 150-0042, Japan
Chenhui Chu: Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
Yuta Nakashima: Institute for Datability Science, Osaka University, Osaka 565-0871, Japan
Hajime Nagahara: Institute for Datability Science, Osaka University, Osaka 565-0871, Japan
Keywords: vision and language; knowledge transferability analysis; multi-modal learning
ISSN: 2313-433X
Volume: 10, Issue: 12, Article: 300
DOI: 10.3390/jimaging10120300
Collection: DOAJ
Institution: OA Journals
Record ID: doaj-art-398ad3dcd2614cacbebd404c8226ea2b