Multi-Examiner: A Knowledge Graph-Driven System for Generating Comprehensive IT Questions with Higher-Order Thinking
The question generation system (QGS) for information technology (IT) education, designed to create, evaluate, and improve Multiple-Choice Questions (MCQs) using knowledge graphs (KGs) and large language models (LLMs), encounters three major needs: ensuring the generation of contextually relevant and...
| Main Authors: | Yonggu Wang, Zeyu Yu, Zihan Wang, Zengyi Yu, Jue Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
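| Subjects: | question generation; multi-agent systems; knowledge graphs; large language models; information technology education; Bloom’s taxonomy |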
| Online Access: | https://www.mdpi.com/2076-3417/15/10/5719 |
| _version_ | 1849711071576719360 |
|---|---|
| author | Yonggu Wang; Zeyu Yu; Zihan Wang; Zengyi Yu; Jue Wang |
| author_facet | Yonggu Wang; Zeyu Yu; Zihan Wang; Zengyi Yu; Jue Wang |
| author_sort | Yonggu Wang |
| collection | DOAJ |
| description | The question generation system (QGS) for information technology (IT) education, designed to create, evaluate, and improve Multiple-Choice Questions (MCQs) using knowledge graphs (KGs) and large language models (LLMs), encounters three major needs: ensuring the generation of contextually relevant and accurate distractors, enhancing the diversity of generated questions, and balancing the higher-order thinking of questions to match various learning levels. To address these needs, we proposed a multi-agent system named Multi-Examiner, which integrates KGs, domain-specific search tools, and local knowledge bases, categorized according to Bloom’s taxonomy, to enhance the contextual relevance, diversity, and higher-order thinking of automatically generated information technology MCQs. Our methodology employed a mixed-methods approach combining system development with experimental evaluation. We first constructed a specialized architecture combining knowledge graphs with LLMs, then implemented a comparative study generating questions across six knowledge points from the K-12 Computer Science Standard. We designed a multidimensional evaluation rubric to assess semantic coherence, answer correctness, question validity, distractor relevance, question diversity, and higher-order thinking, and conducted a statistical analysis of ratings provided by 30 high school IT teachers. Results showed statistically significant improvements (p < 0.01), with Multi-Examiner outperforming GPT-4 by an average of 0.87 points (on a 5-point scale) for evaluation-level questions and 1.12 points for creation-level questions. The results demonstrated that: (i) overall, questions generated by the Multi-Examiner system outperformed those generated by GPT-4 across all dimensions and closely matched the quality of human-crafted questions in several dimensions; (ii) domain-specific search tools significantly enhanced the diversity of questions generated by Multi-Examiner; and (iii) GPT-4 generated better questions for knowledge points at the “remembering” and “understanding” levels, while Multi-Examiner significantly improved the higher-order thinking of questions for the “evaluating” and “creating” levels. This study contributes to the growing body of research on AI-supported educational assessment by demonstrating how specialized knowledge structures can enhance automated generation of higher-order thinking questions beyond what general-purpose language models can achieve. |
| format | Article |
| id | doaj-art-9a2b24e2603944dc8a553428cc6a983e |
| institution | DOAJ |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-9a2b24e2603944dc8a553428cc6a983e; 2025-08-20T03:14:42Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-05-01; volume 15, issue 10, article 5719; 10.3390/app15105719; Multi-Examiner: A Knowledge Graph-Driven System for Generating Comprehensive IT Questions with Higher-Order Thinking; Yonggu Wang (College of Education, Zhejiang University of Technology, Hangzhou 310023, China); Zeyu Yu (College of Education, Zhejiang University of Technology, Hangzhou 310023, China); Zihan Wang (College of Education, Zhejiang University of Technology, Hangzhou 310023, China); Zengyi Yu (College of Education, Zhejiang University of Technology, Hangzhou 310023, China); Jue Wang (Faculty of Applied Science and Engineering, University of Toronto, 35 St. George Street, Toronto, ON M5S 1A4, Canada); https://www.mdpi.com/2076-3417/15/10/5719; question generation; multi-agent systems; knowledge graphs; large language models; information technology education; Bloom’s taxonomy |
| spellingShingle | Yonggu Wang; Zeyu Yu; Zihan Wang; Zengyi Yu; Jue Wang; Multi-Examiner: A Knowledge Graph-Driven System for Generating Comprehensive IT Questions with Higher-Order Thinking; Applied Sciences; question generation; multi-agent systems; knowledge graphs; large language models; information technology education; Bloom’s taxonomy |
| title | Multi-Examiner: A Knowledge Graph-Driven System for Generating Comprehensive IT Questions with Higher-Order Thinking |
| title_full | Multi-Examiner: A Knowledge Graph-Driven System for Generating Comprehensive IT Questions with Higher-Order Thinking |
| title_fullStr | Multi-Examiner: A Knowledge Graph-Driven System for Generating Comprehensive IT Questions with Higher-Order Thinking |
| title_full_unstemmed | Multi-Examiner: A Knowledge Graph-Driven System for Generating Comprehensive IT Questions with Higher-Order Thinking |
| title_short | Multi-Examiner: A Knowledge Graph-Driven System for Generating Comprehensive IT Questions with Higher-Order Thinking |
| title_sort | multi examiner a knowledge graph driven system for generating comprehensive it questions with higher order thinking |
| topic | question generation; multi-agent systems; knowledge graphs; large language models; information technology education; Bloom’s taxonomy |
| url | https://www.mdpi.com/2076-3417/15/10/5719 |
| work_keys_str_mv | AT yongguwang multiexamineraknowledgegraphdrivensystemforgeneratingcomprehensiveitquestionswithhigherorderthinking AT zeyuyu multiexamineraknowledgegraphdrivensystemforgeneratingcomprehensiveitquestionswithhigherorderthinking AT zihanwang multiexamineraknowledgegraphdrivensystemforgeneratingcomprehensiveitquestionswithhigherorderthinking AT zengyiyu multiexamineraknowledgegraphdrivensystemforgeneratingcomprehensiveitquestionswithhigherorderthinking AT juewang multiexamineraknowledgegraphdrivensystemforgeneratingcomprehensiveitquestionswithhigherorderthinking |