Detecting Source Code Plagiarism in Student Assignment Submissions Using Clustering Techniques

In practical courses, graduate students are required to submit programming assignments, which are susceptible to various forms of plagiarism. Detecting plagiarized code in an academic setting is of paramount importance, given the prevalence of publications and papers. Plagiarism, defined as...

Bibliographic Details
Main Authors: Raddam Sami Mehsen, Majharoddin M. Kazi, Hiren Joshi
Format: Article
Language: English
Published: Middle Technical University 2024-06-01
Series: Journal of Techniques
Subjects: Source Code; C++ Programming Language; Python; Plagiarism; Machine Learning
Online Access: https://journal.mtu.edu.iq/index.php/MTU/article/view/1851
author Raddam Sami Mehsen
Majharoddin M. Kazi
Hiren Joshi
collection DOAJ
description In practical courses, graduate students are required to submit programming assignments, which are susceptible to various forms of plagiarism. Detecting plagiarized code in an academic setting is of paramount importance, given the prevalence of publications and papers. Plagiarism, defined as the unauthorized replication of written work without proper acknowledgment, has become a critical concern with the advent of information and communication technology (ICT) and the widespread availability of scholarly publications online. However, the extensive use of freeware text editors has posed challenges in detecting source code plagiarism. Numerous studies have investigated algorithms for revealing different types of plagiarism and detecting source code plagiarism. In this research, we propose an innovative strategy that combines TF-IDF (Term Frequency-Inverse Document Frequency) modifications with K-means clustering, achieving a remarkable precision rate of 99.2%. Additionally, we explore the hierarchical clustering method, which achieves an even higher precision rate of 99.5% compared to previous techniques. To implement our approach, we use the Python programming language along with relevant libraries, providing a robust and efficient system for source code plagiarism detection in student assignment submissions.
format Article
id doaj-art-ae610ac839c04acbb1afa6279680872e
institution Kabale University
issn 1818-653X
2708-8383
language English
publishDate 2024-06-01
publisher Middle Technical University
record_format Article
series Journal of Techniques
spelling doaj-art-ae610ac839c04acbb1afa6279680872e; 2025-01-19T10:58:54Z; eng; Journal of Techniques, vol. 6, no. 2 (2024-06-01); DOI 10.51173/jt.v6i2.1851
Raddam Sami Mehsen (Department of Computer Science, Gujarat University, Ahmedabad, Gujarat, India)
Majharoddin M. Kazi (Bill Gates College of Computer Science & Management, Osmanabad, Maharashtra, India)
Hiren Joshi (Department of Computer Science, Gujarat University, Ahmedabad, Gujarat, India)
title Detecting Source Code Plagiarism in Student Assignment Submissions Using Clustering Techniques
topic Source Code
C++ Programming Language
Python
Plagiarism
Machine Learning
url https://journal.mtu.edu.iq/index.php/MTU/article/view/1851
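The description above mentions TF-IDF weighting combined with clustering but does not specify the authors' TF-IDF modifications or the libraries they used. The following is only a minimal sketch of the general idea, under stated assumptions: it uses the Python standard library, a plain smoothed TF-IDF, cosine similarity, and average-linkage hierarchical clustering with an arbitrary similarity threshold of 0.8. All function names (`tokenize`, `tfidf_vectors`, `cosine`, `cluster`) and the sample submissions are hypothetical and do not come from the article.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, numbers, and single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_0-9]")


def tokenize(source):
    """Split source code into tokens, ignoring all whitespace."""
    return TOKEN_RE.findall(source)


def tfidf_vectors(documents):
    """Return one L2-normalised TF-IDF vector (as a dict) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many submissions each token appears.
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        # Smoothed IDF (an assumption; the article's variant is not specified).
        vec = {
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({tok: w / norm for tok, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def cluster(documents, threshold=0.8):
    """Average-linkage agglomerative clustering over document indices.

    Repeatedly merges the two most similar clusters until the best
    average similarity drops below `threshold` (an assumed cut-off).
    """
    vectors = tfidf_vectors(documents)
    clusters = [[i] for i in range(len(documents))]

    def linkage(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_score, (a, b) = max(
            (linkage(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best_score < threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical C++ submissions: the second is the first with only its layout changed.
submissions = [
    "int sum(int a, int b) { return a + b; }",
    "int sum(int a, int b) {\n    return a + b;\n}",
    '#include <iostream>\nint main() { std::cout << "hello"; return 0; }',
]

if __name__ == "__main__":
    for group in cluster(submissions):
        flag = "possible copy" if len(group) > 1 else "unique"
        print(group, flag)
```

Clusters containing more than one submission are candidates for manual review. Because this sketch weights raw tokens, it would catch re-indented or lightly reordered copies but not systematic identifier renaming; handling that would need token normalisation, which is outside what the description specifies.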