Ensemble learning with RAG model to reduce redundant question topics in auto-generated exam questions

Bibliographic Details
Main Authors: R. Tharaniya Sairaj, S. R. Balasundaram
Format: Article
Language: English
Published: Springer 2025-08-01
Series: Discover Computing
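Subjects: Named entity selection; Ensemble pruning algorithm; Knowledge graphs; Retrieval augmented generation; Automatic question generation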
Online Access: https://doi.org/10.1007/s10791-025-09683-2
author R. Tharaniya Sairaj
S. R. Balasundaram
collection DOAJ
description Abstract: Reducing redundant question topics during automatic question generation (AQG) is essential for enhancing the quality of test sheets in assessment. Existing AQG models frequently generate repetitive questions because the named entities they select as question topics lack diversity. This study aims to reduce redundancy in auto-generated questions by improving the diversity of question topics. The methodology is organised in three main phases. First, a fuzzy ontology mapping technique with ensemble learning is applied to generate an expanded entity-relationship set from external Knowledge Graphs. Second, the generated entity-relationship set is integrated with a Retrieval Augmented Generation (RAG) model to expand the source text via augmentation. Third, T5-shd, a pretrained AQG model, is adopted to reduce repetition in the generated questions. Comparison against baselines such as T5-e2e and T5-ppl shows a substantial performance gain as well as a reduction in redundancy. Experimental results on various datasets show that the proposed RAG + T5 model reduces redundant question topics while improving ROUGE-2 (up to 5%) and BERTScore (up to 12%) over existing methods. Applying Ensemble Pruning, specifically with the m-EPIC algorithm, further enhances accuracy while reducing computational overhead (around 26%). These findings highlight the efficacy of combining ensemble learning with RAG-based transformers to refine AQG, ensuring improved question diversity and balanced relevance of generated questions. Additionally, this approach helps to reduce model complexity in AQG.
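The abstract reports gains in ROUGE-2 without spelling the metric out. As a rough illustration only, the sketch below computes bigram-overlap precision, recall, and F1 between a generated question and a reference question. It is not the authors' evaluation code: the whitespace tokenisation, the function names `bigram_counts` and `rouge2`, and the sample questions are all assumptions made for this example.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs after lowercasing and whitespace splitting (an assumed tokenisation)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(candidate, reference):
    """Return (precision, recall, f1) from clipped bigram overlap between candidate and reference."""
    cand = bigram_counts(candidate)
    ref = bigram_counts(reference)
    # Counter intersection keeps the minimum count per bigram, i.e. the clipped overlap.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical generated and reference questions, for illustration only.
generated = "what is the capital of france"
reference = "what is the capital city of france"
precision, recall, f1 = rouge2(generated, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Published evaluations usually rely on an established ROUGE implementation with its own tokenisation and stemming, so scores from this sketch would not be directly comparable with the figures in the abstract.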
format Article
id doaj-art-d3540d99962a4e7aa0c6f126fc589262
institution Kabale University
issn 2948-2992
language English
publishDate 2025-08-01
publisher Springer
record_format Article
series Discover Computing
spelling doaj-art-d3540d99962a4e7aa0c6f126fc589262
2025-08-20T04:03:06Z
eng
Springer
Discover Computing
2948-2992
2025-08-01
281117
10.1007/s10791-025-09683-2
Ensemble learning with RAG model to reduce redundant question topics in auto-generated exam questions
R. Tharaniya Sairaj (Department of Computer Science Engineering, Indian Institute of Information Technology)
S. R. Balasundaram (Department of Computer Applications, National Institute of Technology)
https://doi.org/10.1007/s10791-025-09683-2
Named entity selection
Ensemble pruning algorithm
Knowledge graphs
Retrieval augmented generation
Automatic question generation
title Ensemble learning with RAG model to reduce redundant question topics in auto-generated exam questions
topic Named entity selection
Ensemble pruning algorithm
Knowledge graphs
Retrieval augmented generation
Automatic question generation.
url https://doi.org/10.1007/s10791-025-09683-2