COMPARING HUMAN AND AI-BASED ESSAY EVALUATION IN THE CZECH HIGHER EDUCATION: CHALLENGES AND LIMITATIONS

Bibliographic Details
Main Authors: Tomáš Kincl, Daria Gunina, Michal Novák, Jan Pospíšil
Format: Article
Language: ces
Published: Vydavatelství ZČU v Plzni 2024-12-01
Series: Trendy v podnikání
Subjects: automated essay evaluation, generative ai, chatgpt, tertiary education
_version_ 1825208903424540672
author Tomáš Kincl
Daria Gunina
Michal Novák
Jan Pospíšil
author_sort Tomáš Kincl
collection DOAJ
description Generative artificial intelligence (GenAI) tools offer new capabilities for tasks involving large textual and non-textual datasets. They have shown considerable potential in education, where they are increasingly used not only by students but also by educators. This study investigates the extent to which a human evaluator's assessments align with automated evaluations produced by large language models, focusing on (a) the complexity of the evaluated texts (academic essays that combine literature reviews, critical assessments of sources, and reflective insights in the context of societal or economic practice) and (b) the particular challenges posed by the Czech language, in which the essays were submitted. The research takes a quantitative, cross-sectional approach, analysing 30 essays submitted as an assignment in a foundational theoretical course at the master's level. The essays were evaluated by a human evaluator and subsequently by virtual assistants built on large language models, specifically ChatGPT (paid version 4.0) and Claude (paid version Sonnet 3.5). Statistical analysis revealed a statistically significant difference between the human evaluator's assessments and those of both automated systems. Moreover, the evaluations were not consistent in distinguishing stronger essays from weaker ones. The paper also discusses the challenges and limitations of using GenAI tools to evaluate submitted text assignments in tertiary education.
format Article
id doaj-art-f0eab9137c5541c3b3bf6818798da2a5
institution Kabale University
issn 2788-0079
language ces
publishDate 2024-12-01
publisher Vydavatelství ZČU v Plzni
record_format Article
series Trendy v podnikání
spelling doaj-art-f0eab9137c5541c3b3bf6818798da2a5
2025-02-06T17:42:43Z
ces
Vydavatelství ZČU v Plzni
Trendy v podnikání
2788-0079
2024-12-01
14
2
25
34
https://doi.org/10.24132/jbt.2024.14.2.25_34
COMPARING HUMAN AND AI-BASED ESSAY EVALUATION IN THE CZECH HIGHER EDUCATION: CHALLENGES AND LIMITATIONS
Tomáš Kincl https://orcid.org/0000-0002-9738-3348
Daria Gunina https://orcid.org/0000-0002-4149-4962
Michal Novák https://orcid.org/0000-0001-7893-7774
Jan Pospíšil https://orcid.org/0000-0003-2054-311X
automated essay evaluation
generative ai
chatgpt
tertiary education
title COMPARING HUMAN AND AI-BASED ESSAY EVALUATION IN THE CZECH HIGHER EDUCATION: CHALLENGES AND LIMITATIONS
topic automated essay evaluation
generative ai
chatgpt
tertiary education