“Scarlet Cloak and the Forest Adventure”: a preliminary study of the impact of AI on commonly used writing tools


Bibliographic Details
Main Authors: Barbara Bordalejo, Davide Pafumi, Frank Onuh, A. K. M. Iftekhar Khalid, Morgan Slayde Pearce, Daniel Paul O’Donnell
Format: Article
Language: English
Published: SpringerOpen 2025-02-01
Series: International Journal of Educational Technology in Higher Education
Online Access: https://doi.org/10.1186/s41239-025-00505-5
Description
Summary: This paper explores the growing complexity of detecting and differentiating generative AI from other AI interventions. Initially prompted by the observation that tools like Grammarly were being flagged by AI detection software, it examines how popular writing tools such as Grammarly, EditPad, and Writefull, and AI models such as ChatGPT and Microsoft Bing Copilot, affect human-generated texts, and how accurately current AI-detection systems, including Turnitin and GPTZero, can assess texts for use of these tools. The results highlight that widely used writing aids, even those not primarily generative, can trigger false positives in AI detection tools. To build a dataset, the authors applied different AI-enhanced tools to a number of texts of different styles written before the advent of consumer AI tools, and evaluated their impact through key metrics such as readability, perplexity, and burstiness. The findings reveal that tools like Grammarly that subtly enhance readability also trigger detection and increase false positives, especially for non-native speakers. In general, paraphrasing tools score low in AI detection software, allowing their changes to go mostly unnoticed. Similarly, Microsoft Bing Copilot and Writefull, applied to the selected texts, evaded AI detection fairly consistently. Compounding this problem, traditional AI detectors like Turnitin and GPTZero struggle to reliably differentiate between legitimate paraphrasing and AI generation, undermining their utility for enforcing academic integrity. The study concludes by urging educators to focus on managing interactions with AI in academic settings rather than banning its use outright. It calls for the creation of policies and guidelines that acknowledge the evolving role of AI in writing, emphasizing the need to interpret detection scores cautiously to avoid penalizing students unfairly.
In addition, encouraging openness about how AI is used in writing could ease concerns in the research and writing process for both students and academics. The paper recommends a shift toward teaching responsible AI use rather than pursuing rigid bans or relying on detection metrics that may not accurately capture misconduct.
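Two of the metrics named in the summary, perplexity and burstiness, can be illustrated with a minimal sketch. This is not the authors' implementation: the naive sentence splitting, the variance-based definition of burstiness, and the example token probabilities (which in practice would come from a language model) are all simplifying assumptions for illustration.

```python
import math
import re

def sentences(text):
    # Naive sentence split on terminal punctuation (an assumption;
    # real pipelines use proper sentence tokenizers).
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def words(text):
    return re.findall(r"[A-Za-z']+", text)

def burstiness(text):
    # One common proxy for burstiness: variance of sentence lengths
    # in words. Human writing tends to vary more than AI output.
    lens = [len(words(s)) for s in sentences(text)]
    mean = sum(lens) / len(lens)
    return sum((n - mean) ** 2 for n in lens) / len(lens)

def perplexity(token_probs):
    # Perplexity = exp of the mean negative log-probability of the
    # observed tokens; the probabilities are assumed to be supplied
    # by a language model.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

sample = "Short sentence. Then a much longer sentence follows here with many words."
print(burstiness(sample))                    # -> 16.0 (lengths 2 and 10 words)
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # -> 4.0 (uniform over 4 tokens)
```

Lower perplexity and lower burstiness are the signals detectors such as GPTZero associate with machine-generated text, which is why even light tool-assisted editing that smooths sentence length can shift these scores.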
ISSN: 2365-9440