Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models
This study investigates the performance of eight large multimodal model (LMM)-based chatbots on the Test of Understanding Graphs in Kinematics (TUG-K), a research-based concept inventory. Graphs are a widely used representation in STEM and medical fields, making them a relevant topic for exploring LMM-based chatbots’ visual interpretation abilities.
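To make the evaluation setup concrete, the sketch below shows one way a kinematics-graph item could be posed to a vision-capable chatbot through an API. It is an illustration only, not the protocol used in the article: the model name, image file, and question text are placeholders (the item is a generic kinematics question written for this sketch, not an actual TUG-K item), and the example assumes the OpenAI Python SDK.

```python
# Illustrative sketch (not the authors' protocol): send a kinematics-graph
# image plus a multiple-choice question to a vision-capable chatbot API.
# Assumes the OpenAI Python SDK; model name, image path, and item text are
# placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode the graph image as a base64 data URL so it can be sent inline.
with open("velocity_vs_time_graph.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

question = (
    "The graph shows an object's velocity versus time. "
    "Which option gives the object's displacement between t = 0 s and t = 4 s?\n"
    "A) 2 m  B) 4 m  C) 8 m  D) 12 m  E) 16 m"
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable model could be substituted
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```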
| Main Authors: | Giulia Polverini, Bor Gregorcic |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2024-10-01 |
| Series: | Frontiers in Education |
| Subjects: | generative AI; large language models; large multimodal models; chatbots; vision; STEM education |
| Online Access: | https://www.frontiersin.org/articles/10.3389/feduc.2024.1452414/full |
| _version_ | 1850210204804710400 |
|---|---|
| author | Giulia Polverini, Bor Gregorcic |
| author_facet | Giulia Polverini, Bor Gregorcic |
| author_sort | Giulia Polverini |
| collection | DOAJ |
| description | This study investigates the performance of eight large multimodal model (LMM)-based chatbots on the Test of Understanding Graphs in Kinematics (TUG-K), a research-based concept inventory. Graphs are a widely used representation in STEM and medical fields, making them a relevant topic for exploring LMM-based chatbots’ visual interpretation abilities. We evaluated both freely available chatbots (Gemini 1.0 Pro, Claude 3 Sonnet, Microsoft Copilot, and ChatGPT-4o) and subscription-based ones (Gemini 1.0 Ultra, Gemini 1.5 Pro API, Claude 3 Opus, and ChatGPT-4). We found that OpenAI’s chatbots outperform all the others, with ChatGPT-4o showing the overall best performance. Contrary to expectations, we found no notable differences in the overall performance between freely available and subscription-based versions of Gemini and Claude 3 chatbots, with the exception of Gemini 1.5 Pro, available via API. In addition, we found that tasks relying more heavily on linguistic input were generally easier for chatbots than those requiring visual interpretation. The study provides a basis for considerations of LMM-based chatbot applications in STEM and medical education, and suggests directions for future research. |
| format | Article |
| id | doaj-art-959b974b2d3248f78de6f1ce16e610af |
| institution | OA Journals |
| issn | 2504-284X |
| language | English |
| publishDate | 2024-10-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Education |
| spellingShingle | Giulia Polverini Bor Gregorcic Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models Frontiers in Education generative AI large language models large multimodal models chatbots vision STEM education |
| title | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models |
| title_full | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models |
| title_fullStr | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models |
| title_full_unstemmed | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models |
| title_short | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models |
| title_sort | evaluating vision capable chatbots in interpreting kinematics graphs a comparative study of free and subscription based models |
| topic | generative AI large language models large multimodal models chatbots vision STEM education |
| url | https://www.frontiersin.org/articles/10.3389/feduc.2024.1452414/full |
| work_keys_str_mv | AT giuliapolverini evaluatingvisioncapablechatbotsininterpretingkinematicsgraphsacomparativestudyoffreeandsubscriptionbasedmodels AT borgregorcic evaluatingvisioncapablechatbotsininterpretingkinematicsgraphsacomparativestudyoffreeandsubscriptionbasedmodels |