Evaluating large language models for criterion-based grading: from agreement to consistency
Abstract: This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines the impact of prompt engineering with detailed criteria on grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs achieve criter...
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-12-01 |
| Series: | npj Science of Learning |
| Online Access: | https://doi.org/10.1038/s41539-024-00291-1 |