Evaluating large language models for criterion-based grading from agreement to consistency
Abstract:
This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines the impact of prompt engineering with detailed criteria on grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs achieve criter...
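This record does not reproduce the authors' prompts or code. As a rough illustration of what "prompt engineering with detailed criteria" can look like in practice, below is a minimal Python sketch assuming an OpenAI-style chat API; the rubric text, model name, and sample essay are hypothetical placeholders, not material from the study.

```python
# Minimal sketch of criterion-based grading with an LLM prompt.
# Assumptions (not from the study): the OpenAI Python client (v1.x),
# a hypothetical rubric, and a placeholder model name.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical grading criteria; the study's actual rubric is not
# included in this record.
RUBRIC = """Grade the essay on a 0-10 scale using these criteria:
1. Thesis clarity (0-3): the argument is stated explicitly.
2. Evidence (0-4): claims are supported with specific examples.
3. Organization (0-3): paragraphs follow a logical sequence.
Return the three sub-scores, the total, and a one-sentence rationale."""

def grade_essay(essay: str) -> str:
    """Ask the model for a criterion-referenced grade of one essay."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,        # lower run-to-run variance, relevant to consistency checks
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": essay},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(grade_essay("Sample essay text goes here."))
```

Pinning the rubric in the system message and setting temperature to 0 mirrors the general idea the abstract describes: detailed criteria in the prompt, plus settings that make repeated gradings comparable.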
| Main Authors: | Da-Wei Zhang, Melissa Boey, Yan Yu Tan, Alexis Hoh Sheng Jia |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-12-01 |
| Series: | npj Science of Learning |
| Online Access: | https://doi.org/10.1038/s41539-024-00291-1 |
Similar Items
- Climate Warming in Response to Emission Reductions Consistent with the Paris Agreement
  by: Fang Wang, et al.
  Published: (2018-01-01)
- Is This Reliable Enough? Examining Classification Consistency and Accuracy in a Criterion-Referenced Test
  by: Susanne Alger
  Published: (2016-04-01)
- Reliability Analysis of Horizontal Curves Using Geometric Design Consistency Assessment Criterion
  by: Hossein Saedi, et al.
  Published: (2024-01-01)
- Is This Reliable Enough? Examining Classification Consistency and Accuracy in a Criterion-Referenced Test
  by: Susanne Alger
  Published: (2016-07-01)
- Agreement among the energy expenditure prediction equations with the criterion model in the exhaustive treadmill test protocols
  by: معرفت سیاه کوهیان, et al.
  Published: (2016-11-01)