Integrated Technique of Natural Language Texts and Source Codes Authorship Verification in the Academic Environment

Bibliographic Details
Main Authors: Aleksandr Romanov, Anna Kurtukova, Anastasiia Fedotova, Alexander Shelupanov
Format: Article
Language: English
Published: IEEE 2025-01-01
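Series: IEEE Access
Subjects: Plagiarism detection; educational process estimation; artificial generation; source code; authorship attribution; authorship verification
Online Access: https://ieeexplore.ieee.org/document/11059894/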
author Aleksandr Romanov
Anna Kurtukova
Anastasiia Fedotova
Alexander Shelupanov
collection DOAJ
description The issue of text plagiarism in academic and educational environments is becoming increasingly relevant every year. The quality of research articles and works is declining due to students copying fragments of others’ works and using modern generative models for text and source code creation. The article proposes an integrated technique for authorship verification of both natural and programming language texts, based on a combination of statistical methods, machine learning, and deep neural networks. The presented technique addresses several related tasks: assessing text homogeneity, detecting plagiarism when solving closed-set authorship attribution problems, and identifying texts and fragments created by generative models. Experimental data include a multi-domain dataset of natural language texts consisting of research articles on natural and technical sciences, PhD dissertations, and artificially generated samples on related topics. To evaluate the effectiveness of the technique in relation to programming language texts, a multilingual program dataset was used, consisting of source codes for programs of technical students as well as artificially generated program codes. The experimental results demonstrate the effectiveness of the proposed technique for plagiarism detection and copyright protection in the educational process. The accuracy of identifying heterogeneous fragments in text or code is 93-94%, authorship attribution accuracy is 89-99% depending on the number of co-authors, and verification accuracy is 97.5-99.4%.
format Article
id doaj-art-871af5bf87394f69a227fa1aec9ed200
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-871af5bf87394f69a227fa1aec9ed200
Record timestamp: 2025-08-20T03:30:19Z
Language: eng
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536)
Publication date: 2025-01-01
Volume: 13
Pages: 113274-113290
DOI: 10.1109/ACCESS.2025.3584616
IEEE Xplore document number: 11059894
Title: Integrated Technique of Natural Language Texts and Source Codes Authorship Verification in the Academic Environment
Authors:
Aleksandr Romanov (https://orcid.org/0000-0002-2587-2222)
Anna Kurtukova (https://orcid.org/0000-0001-5619-1836)
Anastasiia Fedotova (https://orcid.org/0000-0001-7844-4363)
Alexander Shelupanov
Affiliation (all authors): Department of Security, Tomsk State University of Control Systems and Radioelectronics, Tomsk, Russia
Online access: https://ieeexplore.ieee.org/document/11059894/
Keywords: Plagiarism detection; educational process estimation; artificial generation; source code; authorship attribution; authorship verification
title Integrated Technique of Natural Language Texts and Source Codes Authorship Verification in the Academic Environment
topic Plagiarism detection
educational process estimation
artificial generation
source code
authorship attribution
authorship verification
url https://ieeexplore.ieee.org/document/11059894/