Dataset Construction Using Item Response Theory for Educational Machine Learning Competitions

Bibliographic Details
Main Authors: Takeaki Sakabe, Yuko Sakurai, Emiko Tsutsumi, Satoshi Oyama
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Item response theory; conditional VAE; data analysis competition; generating dataset
Online Access: https://ieeexplore.ieee.org/document/11045924/
author Takeaki Sakabe
Yuko Sakurai
Emiko Tsutsumi
Satoshi Oyama
collection DOAJ
description Machine learning has been integrated into numerous applications and has emerged as one of the most transformative technologies in our daily lives. In recent years, the number of individuals studying machine learning has grown substantially, leading to the emergence of numerous educational competitions focused on building expertise in machine learning. In these competitions, the participants are tasked with constructing machine learning (ML) models. However, the dataset used to compare the performances of competing models is often selected arbitrarily, causing discrepancies between the dataset and participants’ skill levels. This can result in competition outcomes that fail to accurately reflect the participants’ abilities. We have developed a framework for generating image datasets that enable the abilities of competition participants to be accurately assessed. Specifically, we introduce the use of item response theory (IRT), commonly used in test creation and ability assessment, to estimate parameters such as item discrimination and difficulty for each image in existing datasets. Additionally, we utilize a conditional variational autoencoder (CVAE) that generates images with specific parameter values. These parameter values are generated based on the ability distribution of the competition participants and used to generate a dataset aligned with their ability distribution. To evaluate the effectiveness of the proposed framework, we conduct experiments using 810 ML models automatically created using 6 parameters with multiple values. Comparison of their performances between the original and the generated dataset showed that the latter was more effective in differentiating model performance. Unlike conventional IRT-based methods, which require human effort for dataset generation, our proposed framework fully automates the dataset generation process. 
By automating dataset generation, our approach streamlines the organization of ML competitions and ensures that datasets are well-suited to participants’ skill levels. This automation reduces the challenges of hosting competitions, promoting their broader adoption in educational settings.
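The description names item discrimination and difficulty as the per-image parameters but does not state which IRT model the authors fit. As a minimal sketch, assuming the standard two-parameter logistic (2PL) model, the following shows how those two parameters relate a model's ability to its chance of classifying an image correctly, and how much an image helps separate models at a given ability. The function and variable names are illustrative, not taken from the paper.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL response probability (assumed model, not confirmed by the record).

    theta: ability of the respondent (here, an ML model)
    a:     item discrimination of the image
    b:     item difficulty of the image
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


if __name__ == "__main__":
    # Hypothetical image with discrimination 2.0 and difficulty 1.0.
    for theta in (-1.0, 0.0, 1.0, 2.0, 3.0):
        print(theta, p_correct(theta, 2.0, 1.0), item_information(theta, 2.0, 1.0))
```

Under the 2PL model, an item's information peaks where ability equals difficulty, which is one way to read the description's goal of aligning the generated dataset with the participants' ability distribution.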
format Article
id doaj-art-fa0753e41d6348d591fd600652313029
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-fa0753e41d6348d591fd600652313029 2025-08-20T02:42:11Z eng IEEE IEEE Access 2169-3536 2025-01-01, vol. 13, pp. 110226-110240, doi 10.1109/ACCESS.2025.3582016, document 11045924
Takeaki Sakabe (https://orcid.org/0009-0005-6312-2173), Department of Computer Science, Nagoya Institute of Technology, Nagoya, Aichi, Japan
Yuko Sakurai (https://orcid.org/0000-0002-0642-3878), Department of Computer Science, Nagoya Institute of Technology, Nagoya, Aichi, Japan
Emiko Tsutsumi (https://orcid.org/0000-0003-3338-8892), Faculty of Science and Engineering, Hosei University, Koganei, Tokyo, Japan
Satoshi Oyama (https://orcid.org/0000-0002-8124-3578), Graduate School of Data Science, Nagoya City University, Nagoya, Aichi, Japan
title Dataset Construction Using Item Response Theory for Educational Machine Learning Competitions
topic Item response theory
conditional VAE
data analysis competition
generating dataset
url https://ieeexplore.ieee.org/document/11045924/