The influence of Gen-AI tools application for text data augmentation: case of Lithuanian educational context data classification
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-11877-z |
| Summary: | Today, Gen-AI tools are used for purposes ranging from everyday tasks, such as summarizing texts, to high-level solutions tailored to a company’s needs. Trustworthy, high-quality datasets are the most important component in building models for any artificial intelligence-based solution. In some specific areas, creating a large dataset manually can be challenging, so various techniques are used to expand existing datasets. In this research, Gen-AI tools were used to augment an educational-context text dataset intended for detecting students who used text generators to answer open-ended questions. An experimental investigation was performed to evaluate the effectiveness of three Gen-AI tools in augmenting the existing dataset: OpenAI ChatGPT, Google Gemini, and Microsoft Copilot. During augmentation, the number of texts increased from 1079 to 7982. To assess the effectiveness of each Gen-AI tool and of their combinations, the dataset was divided into various subsets, and all subsets were used to train several machine-learning algorithms. The texts were converted into numerical data using two methods: bag-of-words and sBERT. A total of 15,296 models were trained, tested, and evaluated. The results show that text augmentation using Gen-AI tools increased the models’ accuracy. |
|---|---|
| ISSN: | 2045-2322 |
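
The summary mentions two ideas that a short sketch may make more concrete: evaluating each Gen-AI tool and their combinations, and turning texts into bag-of-words vectors. The code below is an illustration only, not the authors' pipeline. The function names `tool_combinations` and `bag_of_words` are hypothetical, the whitespace tokenizer is an assumption, and the record does not say how the subsets were actually constructed.

```python
from collections import Counter
from itertools import combinations

# The three tools named in the summary.
TOOLS = ["ChatGPT", "Gemini", "Copilot"]


def tool_combinations(tools):
    """Return every non-empty combination of tools, smallest first.

    Assumption: each combination could label one augmented subset.
    """
    return [
        combo
        for size in range(1, len(tools) + 1)
        for combo in combinations(tools, size)
    ]


def bag_of_words(texts):
    """Map texts to word-count vectors over a shared, sorted vocabulary.

    Assumption: lowercasing plus whitespace splitting stands in for
    whatever tokenization the study used.
    """
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors
```

With three tools, `tool_combinations(TOOLS)` yields seven combinations (three single tools, three pairs, and the full set). The vectors returned by `bag_of_words` could then be passed to any classifier that accepts numeric features.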