Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights

The rapid advancements in large language models (LLMs) have led to the generation of sophisticated AI-produced texts, posing significant challenges in distinguishing machine-generated content from authentic human writing. This study presents a novel hybrid framework that effectively integrates strin...

Bibliographic Details
Main Authors: Nazar Zaki, Reem Alderei, Mahra Alketbi, Alia Alkaabi, Fatima Alneyadi, Nadeen Zaki
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: String kernels; transformer embeddings; text classification; semantic analysis; hybrid models
Online Access: https://ieeexplore.ieee.org/document/11021607/
author Nazar Zaki
Reem Alderei
Mahra Alketbi
Alia Alkaabi
Fatima Alneyadi
Nadeen Zaki
author_facet Nazar Zaki
Reem Alderei
Mahra Alketbi
Alia Alkaabi
Fatima Alneyadi
Nadeen Zaki
author_sort Nazar Zaki
collection DOAJ
description Rapid advances in large language models (LLMs) have produced sophisticated AI-generated text, making it difficult to distinguish machine-generated content from authentic human writing. This study presents a hybrid framework that integrates string kernel approaches with deep contextual embeddings from state-of-the-art transformers for robust AI-generated text detection. We propose and evaluate four kernel-based methods, namely Attention-Augmented Kernel, Error Pattern Analysis, Transformer-Guided N-gram Selection, and a Custom Kernel Function, each designed to capture distinct semantic and structural properties of text. Extensive experiments on eight diverse datasets comprising 2,501 samples in total, featuring texts generated and enhanced by leading LLMs including GPT-3.5, GPT-4, DeepSeek, and KIMI, demonstrate superior performance of the proposed methods. In particular, Transformer-Guided N-gram Selection and the Custom Kernel Function consistently outperform baseline models, achieving near-perfect detection accuracy with significantly reduced computational complexity. Comprehensive hyperparameter optimization further supports the methods' effectiveness and practical applicability. The publicly available datasets and empirical evaluations provide benchmarks for future research. This work sets a new standard in AI-text detection methodologies, enhancing reliability, efficiency, and scalability for real-world applications.
format Article
id doaj-art-e9c9e6dbe3184ac2a5e810e93387fbcc
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-e9c9e6dbe3184ac2a5e810e93387fbcc
2025-08-20T03:11:17Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
13
97779
97793
10.1109/ACCESS.2025.3576076
11021607
Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights
Nazar Zaki 0 https://orcid.org/0000-0002-6259-9843
Reem Alderei 1
Mahra Alketbi 2
Alia Alkaabi 3
Fatima Alneyadi 4 https://orcid.org/0009-0000-1944-198X
Nadeen Zaki 5
Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates
Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates
Department of Information Systems and Security, College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates
Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates
Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates
Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates
https://ieeexplore.ieee.org/document/11021607/
String kernels
transformer embeddings
text classification
semantic analysis
hybrid models
spellingShingle Nazar Zaki
Reem Alderei
Mahra Alketbi
Alia Alkaabi
Fatima Alneyadi
Nadeen Zaki
Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights
IEEE Access
String kernels
transformer embeddings
text classification
semantic analysis
hybrid models
title Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights
title_full Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights
title_fullStr Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights
title_full_unstemmed Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights
title_short Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights
title_sort beyond n grams enhancing string kernels with transformer guided semantic insights
topic String kernels
transformer embeddings
text classification
semantic analysis
hybrid models
url https://ieeexplore.ieee.org/document/11021607/
work_keys_str_mv AT nazarzaki beyondngramsenhancingstringkernelswithtransformerguidedsemanticinsights
AT reemalderei beyondngramsenhancingstringkernelswithtransformerguidedsemanticinsights
AT mahraalketbi beyondngramsenhancingstringkernelswithtransformerguidedsemanticinsights
AT aliaalkaabi beyondngramsenhancingstringkernelswithtransformerguidedsemanticinsights
AT fatimaalneyadi beyondngramsenhancingstringkernelswithtransformerguidedsemanticinsights
AT nadeenzaki beyondngramsenhancingstringkernelswithtransformerguidedsemanticinsights