Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights
| Main Authors: | Nazar Zaki, Reem Alderei, Mahra Alketbi, Alia Alkaabi, Fatima Alneyadi, Nadeen Zaki |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | String kernels; transformer embeddings; text classification; semantic analysis; hybrid models |
| Online Access: | https://ieeexplore.ieee.org/document/11021607/ |

| author | Nazar Zaki, Reem Alderei, Mahra Alketbi, Alia Alkaabi, Fatima Alneyadi, Nadeen Zaki |
|---|---|
| collection | DOAJ |
| description | The rapid advancements in large language models (LLMs) have led to the generation of sophisticated AI-produced texts, posing significant challenges in distinguishing machine-generated content from authentic human writing. This study presents a novel hybrid framework that integrates string kernel approaches with deep contextual embeddings from state-of-the-art transformers for robust AI-generated text detection. We propose and evaluate four kernel-based methods, namely Attention-Augmented Kernel, Error Pattern Analysis, Transformer-Guided N-gram Selection, and a Custom Kernel Function, each designed to capture semantic and structural distinctions in text. Extensive experiments on eight diverse datasets comprising 2,501 total samples, featuring texts generated and enhanced by leading LLMs including GPT-3.5, GPT-4, DeepSeek, and KIMI, demonstrate the superior performance of the proposed methods. In particular, Transformer-Guided N-gram Selection and the Custom Kernel Function consistently outperform baseline models, achieving near-perfect detection accuracy with significantly reduced computational complexity. Comprehensive hyperparameter optimization further supports our methods' effectiveness and practical applicability. The publicly available datasets and robust empirical evaluations contribute valuable benchmarks for future research. This work sets a new standard in AI-text detection methodologies, enhancing reliability, efficiency, and scalability for real-world applications. |
| format | Article |
| id | doaj-art-e9c9e6dbe3184ac2a5e810e93387fbcc |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| doi | 10.1109/ACCESS.2025.3576076 |
| volume | 13 |
| pages | 97779-97793 |
| affiliation | Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates (Nazar Zaki, Reem Alderei, Alia Alkaabi, Fatima Alneyadi, Nadeen Zaki); Department of Information Systems and Security, College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates (Mahra Alketbi) |
| orcid | Nazar Zaki: https://orcid.org/0000-0002-6259-9843; Fatima Alneyadi: https://orcid.org/0009-0000-1944-198X |
| title | Beyond N-Grams: Enhancing String Kernels With Transformer-Guided Semantic Insights |
| topic | String kernels; transformer embeddings; text classification; semantic analysis; hybrid models |
| url | https://ieeexplore.ieee.org/document/11021607/ |
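
The record's abstract describes combining string kernels with transformer embeddings but gives no formulas. As background only, the sketch below shows the classic character n-gram spectrum kernel that the "Beyond N-Grams" framing starts from, together with a simple weighted blend with embedding cosine similarity. This is an illustrative Python sketch under stated assumptions, not the authors' code. The function names, the character-level 3-grams, the weight `alpha`, and the plain-list vectors that stand in for transformer embeddings are all hypothetical. The blend is a generic kernel combination and is not the paper's Custom Kernel Function.

```python
from collections import Counter
from math import sqrt
from typing import Sequence


def ngram_counts(text: str, n: int) -> Counter:
    """Count every contiguous character n-gram of length n in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a: str, b: str, n: int = 3) -> float:
    """n-spectrum kernel: dot product of the two n-gram count vectors."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    # Counter returns 0 for missing keys, so unshared n-grams contribute nothing.
    return float(sum(count * cb[gram] for gram, count in ca.items()))


def normalized_spectrum_kernel(a: str, b: str, n: int = 3) -> float:
    """Cosine-normalized spectrum kernel; 1.0 for identical strings with n-grams."""
    kaa, kbb = spectrum_kernel(a, a, n), spectrum_kernel(b, b, n)
    if kaa == 0.0 or kbb == 0.0:
        return 0.0  # a string shorter than n has no n-grams
    return spectrum_kernel(a, b, n) / sqrt(kaa * kbb)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = sqrt(sum(x * x for x in u)), sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def combined_kernel(a: str, b: str, emb_a: Sequence[float], emb_b: Sequence[float],
                    alpha: float = 0.5, n: int = 3) -> float:
    """HYPOTHETICAL blend: alpha * string similarity + (1 - alpha) * embedding similarity.

    `emb_a` and `emb_b` stand in for transformer embeddings of `a` and `b`.
    With 0 <= alpha <= 1 the result is still a positive semi-definite kernel.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    string_part = normalized_spectrum_kernel(a, b, n)
    embedding_part = cosine_similarity(emb_a, emb_b)
    return alpha * string_part + (1 - alpha) * embedding_part


if __name__ == "__main__":
    print(spectrum_kernel("abcd", "bcde"))             # shared 3-gram "bcd" -> 1.0
    print(normalized_spectrum_kernel("abcd", "bcde"))  # 1 / sqrt(2 * 2) -> 0.5
    print(combined_kernel("abcd", "bcde", [1.0, 0.0], [1.0, 0.0]))  # 0.25 + 0.5 -> 0.75
```

A nonnegative weighted sum of valid kernels is itself a valid kernel. This means a function like `combined_kernel` could, in principle, be used to fill a precomputed Gram matrix for a kernel method such as an SVM. This record does not say whether the paper's methods take that form.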