The influence of Gen-AI tools application for text data augmentation: case of Lithuanian educational context data classification

Bibliographic Details
Main Authors: Pavel Stefanovič, Urtė Radvilaitė, Birutė Pliuskuvienė, Simona Ramanauskaitė
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-11877-z
Description
Summary: Gen-AI tools are now used for purposes ranging from everyday tasks, such as summarizing texts, to high-level solutions tailored to a company’s needs. Trustworthy, high-quality datasets are the most important component in building models for any artificial intelligence-based solution. In some specific areas, creating a large dataset manually is challenging, so various techniques are used to expand existing datasets. In this research, Gen-AI tools were used to augment an educational-context text dataset intended for detecting students who used text generators to answer open-ended questions. An experimental investigation evaluated the effectiveness of three Gen-AI tools in augmenting the existing dataset: OpenAI ChatGPT, Google Gemini, and Microsoft Copilot. During augmentation, the number of texts increased from 1079 to 7982. To assess the effectiveness of each Gen-AI tool and of their combinations, the dataset was divided into various subsets, and all subsets were used to train several machine-learning algorithms. The texts were converted into numerical data using two methods: bag-of-words and sBERT. In total, 15,296 models were trained, tested, and evaluated. The results show that text augmentation using Gen-AI tools increased the models’ accuracy.
ISSN: 2045-2322
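
Illustration: the summary mentions subsets built from single Gen-AI tools or their combinations, and a bag-of-words text representation. The sketch below shows one plausible reading of both ideas using only the Python standard library. The function names `tool_combinations` and `bag_of_words` are hypothetical, whitespace tokenization is an assumption, and nothing here is taken from the authors' code; the article's actual subset design and preprocessing may differ.

```python
from collections import Counter
from itertools import combinations

# The three Gen-AI tools named in the summary.
TOOLS = ["ChatGPT", "Gemini", "Copilot"]


def tool_combinations(tools):
    """Return every non-empty combination of tools, smallest first.

    Assumption: one augmented subset per combination; the article
    may define its subsets differently.
    """
    return [
        combo
        for size in range(1, len(tools) + 1)
        for combo in combinations(tools, size)
    ]


def bag_of_words(texts):
    """Map each text to a word-count vector over a shared, sorted vocabulary.

    Assumption: lowercase, whitespace-split tokens; real pipelines
    usually add language-specific preprocessing.
    """
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors
```

With three tools, `tool_combinations(TOOLS)` yields seven combinations (three singles, three pairs, one triple). Each vector from `bag_of_words` could then be passed to a classifier as a feature row, one per text.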