Admissions in the age of AI: detecting AI-generated application materials in higher education

Abstract Recent advances in Artificial Intelligence (AI), such as the development of large language models like ChatGPT, have blurred the boundaries between human and AI-generated text. This has led to a pressing need for tools that can determine whether text has been created or revised using AI. A...

Bibliographic Details
Main Authors:	Yijun Zhao, Alexander Borelli, Fernando Martinez, Haoran Xue, Gary M. Weiss
Format:	Article
Language:	English
Published:	Nature Portfolio 2024-11-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-024-77847-z

_version_	1850179544991924224
author	Yijun Zhao, Alexander Borelli, Fernando Martinez, Haoran Xue, Gary M. Weiss
collection	DOAJ
description	Recent advances in Artificial Intelligence (AI), such as the development of large language models like ChatGPT, have blurred the boundaries between human and AI-generated text. This has led to a pressing need for tools that can determine whether text has been created or revised using AI. A general and universally effective detection model would be extremely useful, but appears to be beyond the reach of current technology and detection methods. The research described in this study adopts a domain- and task-specific approach and shows that specialized detection models can attain high accuracy. The study focuses on the higher education graduate admissions process, with the specific goal of identifying AI-generated and AI-revised Letters of Recommendation (LORs) and Statements of Intent (SOIs). Detecting such application materials is essential to ensure that applicants are evaluated on their true merits and abilities, and to foster an equitable and trustworthy admissions process. Our research is based on 3755 LORs and 1973 SOIs extracted from the application records of Fordham University’s Master’s programs in Computer Science and Data Science. To facilitate the construction and evaluation of detection models, we generated AI counterparts for each LOR and SOI using the GPT-3.5 Turbo API. The prompts for the AI-generated text were derived from the admission data of the respective applicants, and the AI-revised LORs and SOIs were generated directly from the human-authored versions. We also utilize an open-access GPT-wiki-intro dataset to further validate our hypothesis regarding the feasibility of constructing domain-specific AI content detectors. Our experiments yield promising results in developing classifiers tailored to a specific domain when provided with sufficient training samples. Additionally, we present a comparative analysis of the word frequency and statistical characteristics of the text, which provides convincing evidence that ChatGPT employs distinctive vocabulary and paragraph structure compared to human-authored text. The code for this study is available on GitHub, and the models can be executed on user-provided data via an interactive web interface.
format	Article
id	doaj-art-4dbc2ddc04474d2eba426256a1b6e537
institution	OA Journals
issn	2045-2322
language	English
publishDate	2024-11-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-4dbc2ddc04474d2eba426256a1b6e537; 2025-08-20T02:18:28Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-11-01; 14; 1; 1; 13; 10.1038/s41598-024-77847-z; Admissions in the age of AI: detecting AI-generated application materials in higher education; Yijun Zhao, Alexander Borelli, Fernando Martinez, Haoran Xue, Gary M. Weiss (all: Computer and Information Sciences Department, Fordham University); https://doi.org/10.1038/s41598-024-77847-z
title	Admissions in the age of AI: detecting AI-generated application materials in higher education
url	https://doi.org/10.1038/s41598-024-77847-z
