Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models

This study investigates the performance of eight large multimodal model (LMM)-based chatbots on the Test of Understanding Graphs in Kinematics (TUG-K), a research-based concept inventory. Graphs are a widely used representation in STEM and medical fields, making them a relevant topic for exploring L...

Bibliographic Details
Main Authors: Giulia Polverini, Bor Gregorcic
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-10-01
Series: Frontiers in Education
Subjects: generative AI; large language models; large multimodal models; chatbots; vision; STEM education
Online Access: https://www.frontiersin.org/articles/10.3389/feduc.2024.1452414/full
author Giulia Polverini
Bor Gregorcic
collection DOAJ
description This study investigates the performance of eight large multimodal model (LMM)-based chatbots on the Test of Understanding Graphs in Kinematics (TUG-K), a research-based concept inventory. Graphs are a widely used representation in STEM and medical fields, making them a relevant topic for exploring LMM-based chatbots’ visual interpretation abilities. We evaluated both freely available chatbots (Gemini 1.0 Pro, Claude 3 Sonnet, Microsoft Copilot, and ChatGPT-4o) and subscription-based ones (Gemini 1.0 Ultra, Gemini 1.5 Pro API, Claude 3 Opus, and ChatGPT-4). We found that OpenAI’s chatbots outperform all the others, with ChatGPT-4o showing the overall best performance. Contrary to expectations, we found no notable differences in the overall performance between freely available and subscription-based versions of Gemini and Claude 3 chatbots, with the exception of Gemini 1.5 Pro, available via API. In addition, we found that tasks relying more heavily on linguistic input were generally easier for chatbots than those requiring visual interpretation. The study provides a basis for considerations of LMM-based chatbot applications in STEM and medical education, and suggests directions for future research.
format Article
id doaj-art-959b974b2d3248f78de6f1ce16e610af
institution OA Journals
issn 2504-284X
language English
publishDate 2024-10-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Education
spelling doaj-art-959b974b2d3248f78de6f1ce16e610af | 2025-08-20T02:09:48Z | eng | Frontiers Media S.A. | Frontiers in Education | 2504-284X | 2024-10-01 | 9 | 10.3389/feduc.2024.1452414 | 1452414
title Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models
topic generative AI
large language models
large multimodal models
chatbots
vision
STEM education
url https://www.frontiersin.org/articles/10.3389/feduc.2024.1452414/full