Detecting Source Code Plagiarism in Student Assignment Submissions Using Clustering Techniques
In pragmatic courses, graduate students are required to submit programming assignments, which have been susceptible to various forms of plagiarism. Detecting counterfeited code in an academic setting is of paramount importance, given the prevalence of publications and papers. Plagiarism, defined as...
Main Authors: | Raddam Sami Mehsen, Majharoddin M. Kazi, Hiren Joshi |
---|---|
Format: | Article |
Language: | English |
Published: | Middle Technical University, 2024-06-01 |
Series: | Journal of Techniques |
Subjects: | Source Code; C++ Programming Language; Python; Plagiarism; Machine Learning |
Online Access: | https://journal.mtu.edu.iq/index.php/MTU/article/view/1851 |
_version_ | 1832595208673427456 |
---|---|
author | Raddam Sami Mehsen; Majharoddin M. Kazi; Hiren Joshi |
collection | DOAJ |
description |
In pragmatic courses, graduate students are required to submit programming assignments, which have been susceptible to various forms of plagiarism. Detecting counterfeited code in an academic setting is of paramount importance, given the prevalence of publications and papers. Plagiarism, defined as the unauthorized replication of written work without proper acknowledgment, has become a critical concern with the advent of information and communication technology (ICT) and the widespread availability of scholarly publications online. However, the extensive use of freeware text editors has posed challenges in detecting source code plagiarism. Numerous studies have investigated algorithms for revealing different types of plagiarism and detecting source code plagiarism. In this research, we propose an innovative strategy that combines TF-IDF (Term Frequency-Inverse Document Frequency) modifications with K-means clustering, achieving a remarkable precision rate of 99.2%. Additionally, we explore the hierarchical clustering method, which estimates an even higher precision rate of 99.5% compared to previous techniques. To implement our approach, we utilize the Python programming language along with relevant libraries, providing a robust and efficient system for source code plagiarism detection in student assignment submissions.
|
format | Article |
id | doaj-art-ae610ac839c04acbb1afa6279680872e |
institution | Kabale University |
issn | 1818-653X 2708-8383 |
language | English |
publishDate | 2024-06-01 |
publisher | Middle Technical University |
record_format | Article |
series | Journal of Techniques |
spelling | doaj-art-ae610ac839c04acbb1afa6279680872e; 2025-01-19T10:58:54Z; eng; Middle Technical University; Journal of Techniques; ISSN 1818-653X, 2708-8383; 2024-06-01; vol. 6, no. 2; DOI 10.51173/jt.v6i2.1851; Detecting Source Code Plagiarism in Student Assignment Submissions Using Clustering Techniques; Raddam Sami Mehsen (Department of Computer Science, Gujarat University, Ahmedabad, Gujarat, India); Majharoddin M. Kazi (Bill Gates College of Computer Science & Management, Osmanabad, Maharashtra, India); Hiren Joshi (Department of Computer Science, Gujarat University, Ahmedabad, Gujarat, India); https://journal.mtu.edu.iq/index.php/MTU/article/view/1851; Source Code; C++ Programming Language; Python; Plagiarism; Machine Learning |
title | Detecting Source Code Plagiarism in Student Assignment Submissions Using Clustering Techniques |
topic | Source Code; C++ Programming Language; Python; Plagiarism; Machine Learning |
url | https://journal.mtu.edu.iq/index.php/MTU/article/view/1851 |
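
The description above names TF-IDF weighting combined with K-means clustering but does not spell out the authors' TF-IDF modifications or which Python libraries they used. The following is therefore only a minimal sketch of the general idea, not the authors' implementation. It relies on the standard library alone, and the tokenizer, the smoothed IDF formula, the farthest-first centroid initialisation, and the four toy C++ submissions are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Split source text into identifiers, digit runs, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many submissions each token occurs.
    doc_freq = Counter(token for c in counts for token in c)
    vocab = sorted(doc_freq)
    # Smoothed IDF (an assumption; the record does not state the variant used).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for c in counts:
        vec = [c[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means with deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest chosen centroid.
        centroids.append(
            max(vectors, key=lambda v: min(dist2(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: dist2(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical submissions: 0 and 1 differ only in identifier names, as do 2 and 3.
submissions = [
    "int sum(int n) { int total = 0; for (int i = 1; i <= n; i++) total += i; return total; }",
    "int sum(int m) { int result = 0; for (int j = 1; j <= m; j++) result += j; return result; }",
    "double area(double r) { return 3.14159 * r * r; }",
    "double area(double radius) { return 3.14159 * radius * radius; }",
]

labels = kmeans(tfidf_vectors(submissions), k=2)
print(labels)
```

Submissions that share a cluster label are candidates for manual review rather than confirmed cases. Because renamed identifiers still lower the similarity score in this sketch, a practical system would likely normalise identifiers before weighting.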