Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks
Is learning more knowledge always better for vision-and-language models? In this paper, we study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks, their overall performance improves. However,...
| Main Authors: | Tianwei Chen, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Hajime Nagahara |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-11-01 |
| Series: | Journal of Imaging |
| Subjects: | vision and language; knowledge transferability analysis; multi-modal learning |
| Online Access: | https://www.mdpi.com/2313-433X/10/12/300 |
| _version_ | 1850241598351212544 |
|---|---|
| author | Tianwei Chen; Noa Garcia; Mayu Otani; Chenhui Chu; Yuta Nakashima; Hajime Nagahara |
| author_sort | Tianwei Chen |
| collection | DOAJ |
| description | Is learning more knowledge always better for vision-and-language models? In this paper, we study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks, their overall performance improves. However, we show that not all knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conducted an exhaustive analysis based on hundreds of cross-experiments on twelve vision-and-language tasks categorized into four groups. While tasks in the same group are prone to improve each other, results show that this is not always the case. In addition, other factors, such as dataset size or the pre-training stage, may have a great impact on how well the knowledge is transferred. |
| format | Article |
| id | doaj-art-398ad3dcd2614cacbebd404c8226ea2b |
| institution | OA Journals |
| issn | 2313-433X |
| language | English |
| publishDate | 2024-11-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Journal of Imaging |
| spelling | doaj-art-398ad3dcd2614cacbebd404c8226ea2b; 2025-08-20T02:00:34Z; eng; MDPI AG; Journal of Imaging; ISSN 2313-433X; 2024-11-01; vol. 10, no. 12, art. 300; DOI 10.3390/jimaging10120300; Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks; Tianwei Chen (Institute for Datability Science, Osaka University, Osaka 565-0871, Japan); Noa Garcia (Institute for Datability Science, Osaka University, Osaka 565-0871, Japan); Mayu Otani (CyberAgent Inc., Tokyo 150-0042, Japan); Chenhui Chu (Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan); Yuta Nakashima (Institute for Datability Science, Osaka University, Osaka 565-0871, Japan); Hajime Nagahara (Institute for Datability Science, Osaka University, Osaka 565-0871, Japan); https://www.mdpi.com/2313-433X/10/12/300; vision and language; knowledge transferability analysis; multi-modal learning |
| title | Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks |
| topic | vision and language; knowledge transferability analysis; multi-modal learning |
| url | https://www.mdpi.com/2313-433X/10/12/300 |