Performance of ChatGPT on tasks involving physics visual representations: The case of the brief electricity and magnetism assessment

[This paper is part of the Focused Collection in Artificial Intelligence Tools in Physics Teaching and Physics Education Research.] Artificial intelligence-based chatbots are increasingly influencing physics education because of their ability to interpret and respond to textual and visual inputs. This study evaluates the performance of two large multimodal model-based chatbots, ChatGPT-4 and ChatGPT-4o, on the brief electricity and magnetism assessment (BEMA), a conceptual physics inventory rich in visual representations such as vector fields, circuit diagrams, and graphs. Quantitative analysis shows that ChatGPT-4o outperforms both ChatGPT-4 and a large sample of university students, and demonstrates improvements in ChatGPT-4o’s vision interpretation ability over its predecessor ChatGPT-4. However, qualitative analysis of ChatGPT-4o’s responses reveals persistent challenges. We identified three types of difficulties in the chatbot’s responses to tasks on BEMA: (i) difficulties with visual interpretation, (ii) difficulties in providing correct physics laws or rules, and (iii) difficulties with spatial coordination and application of physics representations. Spatial reasoning tasks, particularly those requiring the use of the right-hand rule, proved especially problematic. These findings highlight that the most broadly used large multimodal model-based chatbot, ChatGPT-4o, still exhibits significant difficulties in engaging with physics tasks involving visual representations. While the chatbot shows potential for educational applications, including personalized tutoring and accessibility support for students who are blind or have low vision, its limitations necessitate caution. On the other hand, our findings can also be leveraged to design assessments that are difficult for chatbots to solve.

Bibliographic Details
Main Authors: Giulia Polverini, Jakob Melin, Elias Önerud, Bor Gregorcic
Format: Article
Language: English
Published: American Physical Society, 2025-05-01
Series: Physical Review Physics Education Research
Online Access: http://doi.org/10.1103/PhysRevPhysEducRes.21.010154
Collection: DOAJ
ISSN: 2469-9896