Integrated Technique of Natural Language Texts and Source Codes Authorship Verification in the Academic Environment
| Main Authors: | Aleksandr Romanov, Anna Kurtukova, Anastasiia Fedotova, Alexander Shelupanov |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11059894/ |
| author | Aleksandr Romanov, Anna Kurtukova, Anastasiia Fedotova, Alexander Shelupanov |
|---|---|
| collection | DOAJ |
| description | The issue of text plagiarism in academic and educational environments is becoming increasingly relevant every year. The quality of research articles and works is declining due to students copying fragments of others’ works and using modern generative models for text and source code creation. The article proposes an integrated technique for authorship verification of both natural and programming language texts, based on a combination of statistical methods, machine learning, and deep neural networks. The presented technique addresses several related tasks: assessing text homogeneity, detecting plagiarism when solving closed-set authorship attribution problems, and identifying texts and fragments created by generative models. Experimental data include a multi-domain dataset of natural language texts consisting of research articles on natural and technical sciences, PhD dissertations, and artificially generated samples on related topics. To evaluate the effectiveness of the technique in relation to programming language texts, a multilingual program dataset was used, consisting of source codes for programs of technical students as well as artificially generated program codes. The experimental results demonstrate the effectiveness of the proposed technique for plagiarism detection and copyright protection in the educational process. The accuracy of identifying heterogeneous fragments in text or code is 93-94%, authorship attribution accuracy is 89-99% depending on the number of co-authors, and verification accuracy is 97.5-99.4%. |
| format | Article |
| id | doaj-art-871af5bf87394f69a227fa1aec9ed200 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| doi | 10.1109/ACCESS.2025.3584616 |
| volume | 13 |
| pages | 113274–113290 |
| affiliation | Department of Security, Tomsk State University of Control Systems and Radioelectronics, Tomsk, Russia (all authors) |
| orcid | Aleksandr Romanov: 0000-0002-2587-2222; Anna Kurtukova: 0000-0001-5619-1836; Anastasiia Fedotova: 0000-0001-7844-4363 |
| title | Integrated Technique of Natural Language Texts and Source Codes Authorship Verification in the Academic Environment |
| topic | Plagiarism detection, educational process estimation, artificial generation, source code, authorship attribution, authorship verification |
| url | https://ieeexplore.ieee.org/document/11059894/ |