Integrated Technique of Natural Language Texts and Source Codes Authorship Verification in the Academic Environment
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11059894/ |
| Summary: | The issue of text plagiarism in academic and educational environments is becoming increasingly relevant every year. The quality of research articles and works is declining due to students copying fragments of others’ works and using modern generative models for text and source code creation. The article proposes an integrated technique for authorship verification of both natural and programming language texts, based on a combination of statistical methods, machine learning, and deep neural networks. The presented technique addresses several related tasks: assessing text homogeneity, detecting plagiarism when solving closed-set authorship attribution problems, and identifying texts and fragments created by generative models. Experimental data include a multi-domain dataset of natural language texts consisting of research articles on natural and technical sciences, PhD dissertations, and artificially generated samples on related topics. To evaluate the effectiveness of the technique in relation to programming language texts, a multilingual program dataset was used, consisting of source codes for programs of technical students as well as artificially generated program codes. The experimental results demonstrate the effectiveness of the proposed technique for plagiarism detection and copyright protection in the educational process. The accuracy of identifying heterogeneous fragments in text or code is 93-94%, authorship attribution accuracy is 89-99% depending on the number of co-authors, and verification accuracy is 97.5-99.4%. |
|---|---|
| ISSN: | 2169-3536 |