Rasch-based comparison of items created with and without generative AI
This study explores the evolving interaction between Generative Artificial Intelligence (AI) and education, focusing on how technologies such as Natural Language Processing and specific models like OpenAI’s ChatGPT can be used on high-stakes examinations. The main objective is to evaluate the ability of ChatGPT version 4.0 to generate written language assessment items and compare them to those created by human experts. The pilot items were developed for the Higher Education Entrance Examination (ExIES, according to its Spanish initials) administered at the Autonomous University of Baja California. Item Response Theory (IRT) analyses were performed on responses from 2,263 test-takers. Results show that although ChatGPT-generated items tend to be more challenging, both sets exhibit a comparable Rasch model fit and discriminatory power across varying levels of student ability. This finding suggests that Generative AI can effectively complement exam developers in creating large-scale assessments. Furthermore, ChatGPT 4.0 demonstrates a slightly higher capacity to differentiate among students of varying skill levels. In conclusion, the study underscores the importance of continually exploring AI-driven item generation as a potential means to enhance educational assessment practices and improve pedagogical outcomes.
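For context on the method named in the title: the dichotomous Rasch model used in such IRT analyses gives the probability that test-taker $i$ with ability $\theta_i$ answers item $j$ of difficulty $b_j$ correctly. This is the standard formulation of the model, shown here for orientation; the article's own model specification is not reproduced in this record.

$$P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{e^{\theta_i - b_j}}{1 + e^{\theta_i - b_j}}$$

Under this model, the finding that ChatGPT-generated items "tend to be more challenging" corresponds to higher estimated difficulties $b_j$, and "comparable Rasch model fit" typically means that fit statistics (e.g., infit and outfit mean squares) fall within accepted ranges for both item sets.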
Saved in:
| Main Authors: | Karla Karina Ruiz Mendoza; Luis Horacio Pedroza Zúñiga |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | OmniaScience, 2025-06-01 |
| Series: | Journal of Technology and Science Education |
| Subjects: | artificial intelligence, chatgpt, educational evaluation, test, digital technology |
| Online Access: | https://www.jotse.org/index.php/jotse/article/view/3135 |
| _version_ | 1849233285723455488 |
|---|---|
| author | Karla Karina Ruiz Mendoza; Luis Horacio Pedroza Zúñiga |
| author_facet | Karla Karina Ruiz Mendoza; Luis Horacio Pedroza Zúñiga |
| author_sort | Karla Karina Ruiz Mendoza |
| collection | DOAJ |
| description | This study explores the evolving interaction between Generative Artificial Intelligence (AI) and education, focusing on how technologies such as Natural Language Processing and specific models like OpenAI’s ChatGPT can be used on high-stakes examinations. The main objective is to evaluate the ability of ChatGPT version 4.0 to generate written language assessment items and compare them to those created by human experts. The pilot items were developed for the Higher Education Entrance Examination (ExIES, according to its Spanish initials) administered at the Autonomous University of Baja California. Item Response Theory (IRT) analyses were performed on responses from 2,263 test-takers. Results show that although ChatGPT-generated items tend to be more challenging, both sets exhibit a comparable Rasch model fit and discriminatory power across varying levels of student ability. This finding suggests that Generative AI can effectively complement exam developers in creating large-scale assessments. Furthermore, ChatGPT 4.0 demonstrates a slightly higher capacity to differentiate among students of varying skill levels. In conclusion, the study underscores the importance of continually exploring AI-driven item generation as a potential means to enhance educational assessment practices and improve pedagogical outcomes. |
| format | Article |
| id | doaj-art-da2ba1e09312470aa4861d1b5a602dcd |
| institution | Kabale University |
| issn | 2013-6374 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | OmniaScience |
| record_format | Article |
| series | Journal of Technology and Science Education |
| spelling | doaj-art-da2ba1e09312470aa4861d1b5a602dcd; indexed 2025-08-20T10:41:09Z; eng; OmniaScience; Journal of Technology and Science Education; ISSN 2013-6374; 2025-06-01; vol. 15, no. 2, pp. 479-494; DOI 10.3926/jotse.3135; article ID 483; Rasch-based comparison of items created with and without generative AI; Karla Karina Ruiz Mendoza (IIDE-UABC); Luis Horacio Pedroza Zúñiga (Doctor, Instituto de Investigación y Desarrollo Educativo (IIDE), Mexico; full-time professor (PTC) of the IIDE at the Universidad Autónoma de Baja California); abstract as in the description field above; https://www.jotse.org/index.php/jotse/article/view/3135; artificial intelligence, chatgpt, educational evaluation, test, digital technology |
| spellingShingle | Karla Karina Ruiz Mendoza; Luis Horacio Pedroza Zúñiga; Rasch-based comparison of items created with and without generative AI; Journal of Technology and Science Education; artificial intelligence, chatgpt, educational evaluation, test, digital technology |
| title | Rasch-based comparison of items created with and without generative AI |
| title_full | Rasch-based comparison of items created with and without generative AI |
| title_fullStr | Rasch-based comparison of items created with and without generative AI |
| title_full_unstemmed | Rasch-based comparison of items created with and without generative AI |
| title_short | Rasch-based comparison of items created with and without generative AI |
| title_sort | rasch based comparison of items created with and without generative ai |
| topic | artificial intelligence, chatgpt, educational evaluation, test, digital technology |
| url | https://www.jotse.org/index.php/jotse/article/view/3135 |
| work_keys_str_mv | AT karlakarinaruizmendoza raschbasedcomparisonofitemscreatedwithandwithoutgenerativeai AT luishoraciopedrozazuniga raschbasedcomparisonofitemscreatedwithandwithoutgenerativeai |