COMPARING HUMAN AND AI-BASED ESSAY EVALUATION IN THE CZECH HIGHER EDUCATION: CHALLENGES AND LIMITATIONS
Generative artificial intelligence (GenAI) tools offer innovative capabilities for addressing a wide array of tasks involving extensive datasets, both textual and non-textual. These tools have shown remarkable potential in the field of education, where their functionalities are increasingly leverage...
Main Authors: | Tomáš Kincl, Daria Gunina, Michal Novák, Jan Pospíšil |
---|---|
Format: | Article |
Language: | ces |
Published: | Vydavatelství ZČU v Plzni, 2024-12-01 |
Series: | Trendy v podnikání |
Subjects: | automated essay evaluation; generative AI; ChatGPT; tertiary education |
_version_ | 1825208903424540672 |
---|---|
author | Tomáš Kincl; Daria Gunina; Michal Novák; Jan Pospíšil |
author_sort | Tomáš Kincl |
collection | DOAJ |
description | Generative artificial intelligence (GenAI) tools offer innovative capabilities for addressing a wide array of tasks involving extensive datasets, both textual and non-textual. These tools have shown remarkable potential in education, where they are increasingly used not only by students but also by educators. This study investigates the extent to which a human evaluator's assessments align with automated evaluations produced by large language models, with a focus on a) the complexity of the evaluated texts (academic essays that encompass literature reviews, critical assessments of sources, and reflective insights within the context of societal or economic practices) and b) the unique challenges posed by the Czech language, in which the evaluated works were submitted. The research adopts a quantitative (cross-sectional) approach, analysing 30 essays submitted as an assignment for a foundational theoretical course at the master's level. These essays were evaluated by a human evaluator and subsequently by virtual assistants built on large language models, specifically ChatGPT (paid version 4.0) and Claude (paid version Sonnet 3.5). Statistical analysis revealed a statistically significant difference between the human evaluator's assessments and those of both automated systems. Moreover, the evaluations were not consistent in distinguishing stronger essays from weaker ones. We also discuss the challenges and limitations of using GenAI tools to evaluate submitted text assignments in tertiary education. |
format | Article |
id | doaj-art-f0eab9137c5541c3b3bf6818798da2a5 |
institution | Kabale University |
issn | 2788-0079 |
language | ces |
publishDate | 2024-12-01 |
publisher | Vydavatelství ZČU v Plzni |
record_format | Article |
series | Trendy v podnikání |
spelling | doaj-art-f0eab9137c5541c3b3bf6818798da2a5; 2025-02-06T17:42:43Z; ces; Vydavatelství ZČU v Plzni; Trendy v podnikání; ISSN 2788-0079; 2024-12-01; vol. 14, no. 2, pp. 25-34; https://doi.org/10.24132/jbt.2024.14.2.25_34; Tomáš Kincl (https://orcid.org/0000-0002-9738-3348); Daria Gunina (https://orcid.org/0000-0002-4149-4962); Michal Novák (https://orcid.org/0000-0001-7893-7774); Jan Pospíšil (https://orcid.org/0000-0003-2054-311X); keywords: automated essay evaluation; generative ai; chatgpt; tertiary education |
title | COMPARING HUMAN AND AI-BASED ESSAY EVALUATION IN THE CZECH HIGHER EDUCATION: CHALLENGES AND LIMITATIONS |
topic | automated essay evaluation; generative AI; ChatGPT; tertiary education |