TriPlaNet: Enhancing machine-paraphrasing plagiarism detection through triplet network and contrastive learning

Bibliographic Details
Main Authors: Deyu Meng, Ziheng Wang, Tshewang Phuntsho, Tad Gonsalves
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Egyptian Informatics Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S1110866525001458
Description
Summary: Powerful large language models (LLMs) generate and paraphrase texts that are difficult for humans to distinguish from human-authored texts, raising concerns about their potential misuse. Previous studies on detecting LLM-paraphrased texts have either proposed ineffective solutions or failed to consider academic texts. To address these challenges, we propose a novel plagiarism detection framework called Triplet Plagiarism Network (TriPlaNet). The framework combines three distinct Style Representation Transformers for Authorship (SRTA), each with its own set of parameters, with a few-shot classifier. The three SRTA encoders operate independently during contrastive training to capture nuanced variations in writing style. Our approach reframes plagiarism detection as an authorship attribution problem. To diversify the data, we fine-tune the 11B-parameter T5-XXL model with Low-Rank Adaptation on a large-scale (more than 200k) plagiarism dataset to construct a controlled plagiarizer, thereby proposing a new, additional dataset. TriPlaNet outperformed existing models when tested on two datasets, with F1 scores of 99.37% and 99.48%, respectively. TriPlaNet also remained robust in cross-dataset evaluations, where the F1 scores stayed above 80.50% and 81.49% on the two datasets.
ISSN: 1110-8665
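
The summary describes three encoders with separate parameters trained contrastively, but it does not state the loss function or the encoder internals. The following is a minimal sketch of the general idea only, assuming a standard triplet margin loss and using random linear maps as hypothetical stand-ins for the SRTA encoders. The names `make_encoder` and `triplet_margin_loss`, the feature vectors, and the margin value are all illustrative assumptions, not taken from the article.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed):
    """Hypothetical stand-in for one style encoder.

    Each call builds a random linear map with its own weights, mirroring
    the idea that the three encoders do not share parameters.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(features):
        # Plain matrix-vector product: one output value per weight row.
        return [sum(w * x for w, x in zip(row, features)) for row in weights]

    return encode


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Assumed loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    # Three independent encoders, one per branch of the triplet (assumption).
    enc_anchor = make_encoder(in_dim=4, out_dim=3, seed=1)
    enc_positive = make_encoder(in_dim=4, out_dim=3, seed=2)
    enc_negative = make_encoder(in_dim=4, out_dim=3, seed=3)

    # Made-up style features: two texts by the same author, one by another.
    same_author_a = [0.9, 0.1, 0.3, 0.5]
    same_author_b = [0.8, 0.2, 0.3, 0.4]
    other_author = [0.1, 0.9, 0.7, 0.0]

    loss = triplet_margin_loss(
        enc_anchor(same_author_a),
        enc_positive(same_author_b),
        enc_negative(other_author),
    )
    print(f"triplet loss for this untrained example: {loss:.4f}")
```

In a real training loop the encoder weights would be updated to reduce this loss, so that texts by the same author end up closer together than texts by different authors. A deep-learning framework would replace the hand-written linear maps and distance function used here.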