Assisting quality assurance of examination tasks: Using a GPT model and Bayesian testing for formative assessment
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-06-01 |
| Series: | Computers and Education: Artificial Intelligence |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2666920X24001462 |
| Summary: | Formative quality assurance in the creation of examination tasks has always been an extremely time-consuming process. Especially due to the changing and short-lived content of computer science, new questions have to be created regularly, which in turn requires quality assurance. With the emergence of artificial intelligence (AI) systems such as ChatGPT and their ability to solve a range of different tasks, the question arises as to what extent this ability can also be utilized as part of a quality assurance process. One aspect of the formative quality assurance of multiple-choice questions involves checking the correct classification of alternative answers into correct and incorrect answers. As AI systems inherently lack transparency and predictability in their output, we present a simplified approach using Bayesian hypothesis testing to estimate the tendencies of an AI towards the classification. To evaluate the approach, the process is implemented and connected to the OpenAI API to handle inconsistent responses and other aspects that contribute to the robustness and reliability. This research is concluded by an evaluation carried out by means of the gpt-3.5-turbo model, using the examination tasks of two programming courses. This provides insights into the response scheme of the AI in relation to the prompt pattern used and the usability of AI for the subsequent quality assurance process. |
| ISSN: | 2666-920X |