Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models
This study investigates the performance of eight large multimodal model (LMM)-based chatbots on the Test of Understanding Graphs in Kinematics (TUG-K), a research-based concept inventory. Graphs are a widely used representation in STEM and medical fields, making them a relevant topic for exploring LMM-based chatbots’ visual interpretation abilities.
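To make the evaluation setup concrete, the sketch below shows one way a kinematics-graph item could be posed to a vision-capable chatbot through an API. It is an illustration only, not the protocol used in the article: the model name, image file, and question text are placeholders (the item is a generic kinematics question written for this sketch, not an actual TUG-K item), and the example assumes the OpenAI Python SDK.

```python
# Illustrative sketch (not the authors' protocol): send a kinematics-graph
# image plus a multiple-choice question to a vision-capable chatbot API.
# Assumes the OpenAI Python SDK; model name, image path, and item text are
# placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode the graph image as a base64 data URL so it can be sent inline.
with open("velocity_vs_time_graph.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

question = (
    "The graph shows an object's velocity versus time. "
    "Which option gives the object's displacement between t = 0 s and t = 4 s?\n"
    "A) 2 m  B) 4 m  C) 8 m  D) 12 m  E) 16 m"
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable model could be substituted
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```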
| Main Authors: | Giulia Polverini, Bor Gregorcic |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2024-10-01 |
| Series: | Frontiers in Education |
| Subjects: | generative AI; large language models; large multimodal models; chatbots; vision; STEM education |
| Online Access: | https://www.frontiersin.org/articles/10.3389/feduc.2024.1452414/full |
| _version_ | 1850210204804710400 |
|---|---|
| author | Giulia Polverini, Bor Gregorcic |
| author_facet | Giulia Polverini, Bor Gregorcic |
| author_sort | Giulia Polverini |
| collection | DOAJ |
| description | This study investigates the performance of eight large multimodal model (LMM)-based chatbots on the Test of Understanding Graphs in Kinematics (TUG-K), a research-based concept inventory. Graphs are a widely used representation in STEM and medical fields, making them a relevant topic for exploring LMM-based chatbots’ visual interpretation abilities. We evaluated both freely available chatbots (Gemini 1.0 Pro, Claude 3 Sonnet, Microsoft Copilot, and ChatGPT-4o) and subscription-based ones (Gemini 1.0 Ultra, Gemini 1.5 Pro API, Claude 3 Opus, and ChatGPT-4). We found that OpenAI’s chatbots outperform all the others, with ChatGPT-4o showing the overall best performance. Contrary to expectations, we found no notable differences in the overall performance between freely available and subscription-based versions of Gemini and Claude 3 chatbots, with the exception of Gemini 1.5 Pro, available via API. In addition, we found that tasks relying more heavily on linguistic input were generally easier for chatbots than those requiring visual interpretation. The study provides a basis for considerations of LMM-based chatbot applications in STEM and medical education, and suggests directions for future research. |
| format | Article |
| id | doaj-art-959b974b2d3248f78de6f1ce16e610af |
| institution | OA Journals |
| issn | 2504-284X |
| language | English |
| publishDate | 2024-10-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Education |
| spellingShingle | Giulia Polverini Bor Gregorcic Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models Frontiers in Education generative AI large language models large multimodal models chatbots vision STEM education |
| title | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models |
| title_full | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models |
| title_fullStr | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models |
| title_full_unstemmed | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models |
| title_short | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models |
| title_sort | evaluating vision capable chatbots in interpreting kinematics graphs a comparative study of free and subscription based models |
| topic | generative AI large language models large multimodal models chatbots vision STEM education |
| url | https://www.frontiersin.org/articles/10.3389/feduc.2024.1452414/full |
| work_keys_str_mv | AT giuliapolverini evaluatingvisioncapablechatbotsininterpretingkinematicsgraphsacomparativestudyoffreeandsubscriptionbasedmodels AT borgregorcic evaluatingvisioncapablechatbotsininterpretingkinematicsgraphsacomparativestudyoffreeandsubscriptionbasedmodels |