Comparing large language models for supervised analysis of students’ lab notes


Bibliographic Details
Main Authors: Rebeckah K. Fussell, Megan Flynn, Anil Damle, Michael F. J. Fox, N. G. Holmes
Format: Article
Language: English
Published: American Physical Society 2025-03-01
Series: Physical Review Physics Education Research
Online Access: http://doi.org/10.1103/PhysRevPhysEducRes.21.010128
Collection: DOAJ
Description: Recent advancements in large language models (LLMs) hold significant promise for improving physics education research that uses machine learning. In this study, we compare the application of various models for conducting a large-scale analysis of written text grounded in a physics education research classification problem: identifying skills in students’ typed lab notes through sentence-level labeling. Specifically, we use training data to fine-tune two different LLMs, BERT and LLaMA, and compare the performance of these models to both a traditional bag-of-words approach and a few-shot LLM (without fine-tuning). We evaluate the models based on their resource use, performance metrics, and research outcomes when identifying skills in lab notes. We find that higher-resource models often, but not necessarily, perform better than lower-resource models. We also find that all models report similar trends in research outcomes, although the absolute values of the estimated measurements are not always within uncertainties of each other. We use the results to discuss relevant considerations for education researchers seeking to select a model type for use as a classifier.
ID: doaj-art-be2d25df70344e658eb38f2dda4aa8c4
Institution: Kabale University
ISSN: 2469-9896