Multi-Examiner: A Knowledge Graph-Driven System for Generating Comprehensive IT Questions with Higher-Order Thinking
The question generation system (QGS) for information technology (IT) education, designed to create, evaluate, and improve Multiple-Choice Questions (MCQs) using knowledge graphs (KGs) and large language models (LLMs), encounters three major needs: ensuring the generation of contextually relevant and...
Saved in:
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2025-05-01
|
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/10/5719 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | The question generation system (QGS) for information technology (IT) education, designed to create, evaluate, and improve Multiple-Choice Questions (MCQs) using knowledge graphs (KGs) and large language models (LLMs), encounters three major needs: ensuring the generation of contextually relevant and accurate distractors, enhancing the diversity of generated questions, and balancing the higher-order thinking of questions to match various learning levels. To address these needs, we proposed a multi-agent system named Multi-Examiner, which integrates KGs, domain-specific search tools, and local knowledge bases, categorized according to Bloom’s taxonomy, to enhance the contextual relevance, diversity, and higher-order thinking of automatically generated information technology MCQs. Our methodology employed a mixed-methods approach combining system development with experimental evaluation. We first constructed a specialized architecture combining knowledge graphs with LLMs, then implemented a comparative study generating questions across six knowledge points from K-12 Computer Science Standard. We designed a multidimensional evaluation rubric to assess the semantic coherence, answer correctness, question validity, distractor relevance, question diversity, and higher-order thinking, and conducted a statistical analysis of ratings provided by 30 high school IT teachers. Results showed statistically significant improvements (<i>p</i> < 0.01) with Multi-Examiner outperforming GPT-4 by an average of 0.87 points (on a 5-point scale) for evaluation-level questions and 1.12 points for creation-level questions. The results demonstrated that: (i) overall, questions generated by the Multi-Examiner system outperformed those generated by GPT-4 across all dimensions and closely matched the quality of human-crafted questions in several dimensions; (ii) domain-specific search tools significantly enhanced the diversity of questions generated by Multi-Examiner; and (iii) GPT-4 generated better questions for knowledge points at the “remembering” and “understanding” levels, while Multi-Examiner significantly improved the higher-order thinking of questions for the “evaluating” and “creating” levels. This study contributes to the growing body of research on AI-supported educational assessment by demonstrating how specialized knowledge structures can enhance automated generation of higher-order thinking questions beyond what general-purpose language models can achieve. |
|---|---|
| ISSN: | 2076-3417 |