Dataset Construction Using Item Response Theory for Educational Machine Learning Competitions
Machine learning has been integrated into numerous applications and has emerged as one of the most transformative technologies in our daily lives. In recent years, the number of individuals studying machine learning has grown substantially, leading to the emergence of numerous educational competitio...
| Main Authors: | Takeaki Sakabe, Yuko Sakurai, Emiko Tsutsumi, Satoshi Oyama |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Item response theory; conditional VAE; data analysis competition; generating dataset |
| Online Access: | https://ieeexplore.ieee.org/document/11045924/ |
| _version_ | 1850092201653043200 |
|---|---|
| author | Takeaki Sakabe; Yuko Sakurai; Emiko Tsutsumi; Satoshi Oyama |
| author_sort | Takeaki Sakabe |
| collection | DOAJ |
| description | Machine learning has been integrated into numerous applications and has emerged as one of the most transformative technologies in our daily lives. In recent years, the number of individuals studying machine learning has grown substantially, leading to the emergence of numerous educational competitions focused on building expertise in machine learning. In these competitions, the participants are tasked with constructing machine learning (ML) models. However, the dataset used to compare the performances of competing models is often selected arbitrarily, causing discrepancies between the dataset and participants’ skill levels. This can result in competition outcomes that fail to accurately reflect the participants’ abilities. We have developed a framework for generating image datasets that enable the abilities of competition participants to be accurately assessed. Specifically, we introduce the use of item response theory (IRT), commonly used in test creation and ability assessment, to estimate parameters such as item discrimination and difficulty for each image in existing datasets. Additionally, we utilize a conditional variational autoencoder (CVAE) that generates images with specific parameter values. These parameter values are generated based on the ability distribution of the competition participants and used to generate a dataset aligned with their ability distribution. To evaluate the effectiveness of the proposed framework, we conduct experiments using 810 ML models automatically created using 6 parameters with multiple values. Comparison of their performances between the original and the generated dataset showed that the latter was more effective in differentiating model performance. Unlike conventional IRT-based methods, which require human effort for dataset generation, our proposed framework fully automates the dataset generation process. By automating dataset generation, our approach streamlines the organization of ML competitions and ensures that datasets are well-suited to participants’ skill levels. This automation reduces the challenges of hosting competitions, promoting their broader adoption in educational settings. |
| format | Article |
| id | doaj-art-fa0753e41d6348d591fd600652313029 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-fa0753e41d6348d591fd600652313029; 2025-08-20T02:42:11Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 110226-110240; DOI 10.1109/ACCESS.2025.3582016; article no. 11045924; Dataset Construction Using Item Response Theory for Educational Machine Learning Competitions; Takeaki Sakabe (https://orcid.org/0009-0005-6312-2173; Department of Computer Science, Nagoya Institute of Technology, Nagoya, Aichi, Japan); Yuko Sakurai (https://orcid.org/0000-0002-0642-3878; Department of Computer Science, Nagoya Institute of Technology, Nagoya, Aichi, Japan); Emiko Tsutsumi (https://orcid.org/0000-0003-3338-8892; Faculty of Science and Engineering, Hosei University, Koganei, Tokyo, Japan); Satoshi Oyama (https://orcid.org/0000-0002-8124-3578; Graduate School of Data Science, Nagoya City University, Nagoya, Aichi, Japan); https://ieeexplore.ieee.org/document/11045924/; Item response theory; conditional VAE; data analysis competition; generating dataset |
| title | Dataset Construction Using Item Response Theory for Educational Machine Learning Competitions |
| topic | Item response theory; conditional VAE; data analysis competition; generating dataset |
| url | https://ieeexplore.ieee.org/document/11045924/ |
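
The abstract refers to per-image item discrimination and difficulty parameters estimated with IRT. As a rough illustration of what those two parameters mean, the sketch below assumes the standard two-parameter logistic (2PL) model; the record does not state which IRT model the authors used, and the function names `p_correct` and `item_information` are hypothetical.

```python
import math


def p_correct(ability: float, discrimination: float, difficulty: float) -> float:
    """Probability of a correct response under the 2PL model (an assumption here).

    ability:        latent skill of the respondent (here, an ML model)
    discrimination: how sharply the item separates low from high ability
    difficulty:     ability level at which the probability equals 0.5
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def item_information(ability: float, discrimination: float, difficulty: float) -> float:
    """Fisher information the item provides about a respondent at `ability` (2PL)."""
    p = p_correct(ability, discrimination, difficulty)
    return discrimination ** 2 * p * (1.0 - p)


if __name__ == "__main__":
    # Compare an item matched to the respondent with one that is far too hard.
    for difficulty in (0.0, 3.0):
        p = p_correct(0.0, 1.5, difficulty)
        info = item_information(0.0, 1.5, difficulty)
        print(f"difficulty={difficulty:+.1f}  P(correct)={p:.3f}  information={info:.3f}")
```

Under this model an item is most informative when its difficulty is close to the respondent's ability, which is consistent with the abstract's motivation for aligning dataset parameters with the participants' ability distribution.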