COMPARING HUMAN AND AI-BASED ESSAY EVALUATION IN THE CZECH HIGHER EDUCATION: CHALLENGES AND LIMITATIONS

Generative artificial intelligence (GenAI) tools offer innovative capabilities for addressing a wide array of tasks involving extensive datasets, both textual and non-textual. These tools have shown remarkable potential in the field of education, where their functionalities are increasingly leverage...

Bibliographic Details
Main Authors:	Tomáš Kincl, Daria Gunina, Michal Novák, Jan Pospíšil
Format:	Article
Language:	Czech (ces)
Published:	Vydavatelství ZČU v Plzni 2024-12-01
Series:	Trendy v podnikání
Subjects:	automated essay evaluation; generative AI; ChatGPT; tertiary education

_version_	1825208903424540672
author	Tomáš Kincl Daria Gunina Michal Novák Jan Pospíšil
collection	DOAJ
description	Generative artificial intelligence (GenAI) tools offer innovative capabilities for addressing a wide array of tasks involving extensive datasets, both textual and non-textual. These tools have shown remarkable potential in the field of education, where their functionalities are increasingly leveraged not only by students but also by educators. This study investigates the extent to which a human evaluator's assessments align with automated evaluations produced by large language models, with a focus on a) the complexity of the evaluated texts (academic essays that encompass literature reviews, critical assessments of sources, and reflective insights within the context of societal or economic practices) and b) the unique challenges posed by the Czech language, in which the evaluated works are submitted. The research adopts a quantitative (cross-sectional) approach, analysing 30 essays submitted as an assignment for a foundational theoretical course at the master's level. These essays were evaluated by a human evaluator and subsequently by virtual assistants built on large language models, specifically ChatGPT (paid version 4.0) and Claude (paid version 3.5 Sonnet). Statistical analysis revealed a statistically significant difference between the human evaluator's assessments and those of both automated systems. Moreover, the evaluations were not consistent in distinguishing stronger essays from weaker ones. We also discuss the challenges and limitations of using GenAI tools to evaluate submitted text assignments in tertiary education.
format	Article
id	doaj-art-f0eab9137c5541c3b3bf6818798da2a5
institution	Kabale University
issn	2788-0079
language	ces
publishDate	2024-12-01
publisher	Vydavatelství ZČU v Plzni
record_format	Article
series	Trendy v podnikání
spelling	doaj-art-f0eab9137c5541c3b3bf6818798da2a5; 2025-02-06T17:42:43Z; ces; Vydavatelství ZČU v Plzni; Trendy v podnikání; ISSN 2788-0079; 2024-12-01; volume 14, issue 2, pages 25-34; https://doi.org/10.24132/jbt.2024.14.2.25_34; Tomáš Kincl (https://orcid.org/0000-0002-9738-3348); Daria Gunina (https://orcid.org/0000-0002-4149-4962); Michal Novák (https://orcid.org/0000-0001-7893-7774); Jan Pospíšil (https://orcid.org/0000-0003-2054-311X); automated essay evaluation; generative AI; ChatGPT; tertiary education
title	COMPARING HUMAN AND AI-BASED ESSAY EVALUATION IN THE CZECH HIGHER EDUCATION: CHALLENGES AND LIMITATIONS
topic	automated essay evaluation; generative AI; ChatGPT; tertiary education

Similar Items