Rasch-based comparison of items created with and without generative AI


Bibliographic Details
Main Authors: Karla Karina Ruiz Mendoza, Luis Horacio Pedroza Zúñiga
Format: Article
Language: English
Published: OmniaScience 2025-06-01
Series: Journal of Technology and Science Education
Subjects: artificial intelligence, chatgpt, educational evaluation, test, digital technology
Online Access: https://www.jotse.org/index.php/jotse/article/view/3135
author Karla Karina Ruiz Mendoza
Luis Horacio Pedroza Zúñiga
author_sort Karla Karina Ruiz Mendoza
collection DOAJ
description This study explores the evolving interaction between Generative Artificial Intelligence (AI) and education, focusing on how technologies such as Natural Language Processing and specific models like OpenAI’s ChatGPT can be used on high-stakes examinations. The main objective is to evaluate the ability of ChatGPT version 4.0 to generate written language assessment items and compare them to those created by human experts. The pilot items were developed for the Higher Education Entrance Examination (ExIES, according to its Spanish initials) administered at the Autonomous University of Baja California. Item Response Theory (IRT) analyses were performed on responses from 2,263 test-takers. Results show that although ChatGPT-generated items tend to be more challenging, both sets exhibit a comparable Rasch model fit and discriminatory power across varying levels of student ability. This finding suggests that Generative AI can effectively complement exam developers in creating large-scale assessments. Furthermore, ChatGPT 4.0 demonstrates a slightly higher capacity to differentiate among students of varying skill levels. In conclusion, the study underscores the importance of continually exploring AI-driven item generation as a potential means to enhance educational assessment practices and improve pedagogical outcomes.
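The Rasch comparison described in the abstract can be illustrated with a minimal sketch. This is not code from the article: the response data are simulated, and the simple PROX log-odds approximation stands in for the authors' full IRT estimation. Under the Rasch model, the probability of a correct response is the logistic function of ability minus item difficulty, so harder item sets show up as higher difficulty logits.

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct) under the Rasch model: logistic of ability minus difficulty."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def prox_difficulty(responses):
    """Rough item-difficulty estimates (logits) via the PROX approximation:
    the negative log-odds of each item's proportion correct, centered at 0."""
    p = responses.mean(axis=0)   # proportion correct per item
    b = np.log((1 - p) / p)      # harder items -> higher logits
    return b - b.mean()          # center the difficulty scale

# Simulated example: 2,263 test-takers (matching the study's sample size) and
# two 5-item sets, where the second set (standing in for the AI-generated
# items) is 0.5 logits harder on average -- an assumed effect for illustration.
rng = np.random.default_rng(0)
theta = rng.normal(0, 1, size=2263)                  # abilities
b_human = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])      # human-written items
b_ai = b_human + 0.5                                 # AI-generated items
b_all = np.concatenate([b_human, b_ai])
X = (rng.random((2263, 10)) < rasch_prob(theta[:, None], b_all[None, :])).astype(int)

b_hat = prox_difficulty(X)
print("mean difficulty, human-written set:", round(b_hat[:5].mean(), 2))
print("mean difficulty, AI-generated set: ", round(b_hat[5:].mean(), 2))
```

With a sample of this size, the estimated mean difficulty of the simulated AI-generated set comes out higher than that of the human-written set, mirroring the direction of the study's finding; operational work would use full joint or marginal maximum-likelihood estimation and fit statistics rather than PROX.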
format Article
id doaj-art-da2ba1e09312470aa4861d1b5a602dcd
institution Kabale University
issn 2013-6374
language English
publishDate 2025-06-01
publisher OmniaScience
record_format Article
series Journal of Technology and Science Education
spelling doaj-art-da2ba1e09312470aa4861d1b5a602dcd 2025-08-20T10:41:09Z
citation Journal of Technology and Science Education, Vol. 15, No. 2 (2025-06-01), pp. 479-494. DOI: 10.3926/jotse.3135
affiliation Karla Karina Ruiz Mendoza: IIDE-UABC
affiliation Luis Horacio Pedroza Zúñiga: Doctor, Instituto de Investigación y Desarrollo Educativo (IIDE), México; full-time faculty member (PTC) of the IIDE, Universidad Autónoma de Baja California
title Rasch-based comparison of items created with and without generative AI
topic artificial intelligence, chatgpt, educational evaluation, test, digital technology
url https://www.jotse.org/index.php/jotse/article/view/3135