Evaluating a Large Language Model’s Ability to Synthesize a Health Science Master’s Thesis: Case Study

Bibliographic Details
Main Authors: Pål Joranger, Sara Rivenes Lafontan, Asgeir Brevik
Format: Article
Language:English
Published: JMIR Publications 2025-07-01
Series:JMIR Formative Research
Online Access:https://formative.jmir.org/2025/1/e73248
author Pål Joranger
Sara Rivenes Lafontan
Asgeir Brevik
collection DOAJ
description Abstract

Background: Large language models (LLMs) can help students master a new topic quickly, but for the educational institutions responsible for assessing and grading students, it can be difficult to discern whether a text originated from a student’s own cognition or was synthesized by an LLM. Universities have traditionally relied on a submitted written thesis as proof of higher-level learning, on which to grant grades and diplomas. But what happens when LLMs can mimic the academic writing of subject matter experts? This is now a real dilemma. The ubiquitous availability of LLMs challenges trust in the master’s thesis as evidence of subject matter comprehension and academic competence.

Objective: In this study, we aimed to assess the quality of rapidly machine-generated papers against the standards of the health science master’s program with which we are currently affiliated.

Methods: In an exploratory case study, we used ChatGPT (OpenAI) to generate 2 research papers as conceivable student submissions for graduation from a health science master’s program. One paper simulated a qualitative health science research project, and the other simulated a quantitative health science research project.

Results: Using a stepwise approach, we prompted ChatGPT to (1) synthesize 2 credible datasets and (2) generate 2 papers that, in our judgment, could have passed as credible medium-quality graduation research papers at the health science master’s program with which the authors are currently affiliated. It took 2.5 hours of iterative dialogue with ChatGPT to develop the qualitative paper and 3.5 hours to develop the quantitative paper. Creating the synthetic datasets that served as the starting point for our ChatGPT-driven paper development took 1.5 hours for the qualitative dataset and 16 hours for the quantitative dataset. This included learning and prompt optimization; for the quantitative dataset, it also included the time needed to create tables, estimate relevant bivariate correlation coefficients, and prepare these coefficients to be read by ChatGPT.

Conclusions: Our demonstration highlights the ease with which an LLM can synthesize research data, conduct scientific analyses, and produce the kind of credible research papers required for graduation from a master’s program. A clear and well-written master’s thesis, citing subject matter authorities and true to the expectations of academic writing, can no longer be regarded as solid proof of either extensive study or subject matter mastery. To uphold the integrity of academic standards and the value of university diplomas, we recommend that master’s programs prioritize oral examinations and school exams. This shift is now crucial to ensure a fair and rigorous assessment of higher-order learning and abilities at the master’s level.
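The abstract notes that preparing the quantitative dataset involved estimating bivariate correlation coefficients and preparing them to be read by ChatGPT. The paper does not publish its generation procedure; as a minimal illustrative sketch (the variable count, sample size, and coefficients below are hypothetical, not taken from the study), one way to obtain a synthetic dataset with chosen bivariate correlations is to sample from a multivariate normal distribution whose covariance matrix is the target correlation matrix:

```python
import numpy as np

# Hypothetical target bivariate correlations for three survey-style
# variables (unit variances, so covariance == correlation).
target_corr = np.array([
    [1.0, 0.4, 0.2],
    [0.4, 1.0, 0.3],
    [0.2, 0.3, 1.0],
])

rng = np.random.default_rng(seed=42)
n = 500  # hypothetical number of simulated respondents

# Draw n observations; sample correlations will approximate the targets.
data = rng.multivariate_normal(mean=np.zeros(3), cov=target_corr, size=n)

# Empirical correlation matrix of the synthetic dataset
# (rowvar=False: columns are variables, rows are observations).
empirical = np.corrcoef(data, rowvar=False)
```

The resulting table of empirical coefficients could then be formatted as plain text and pasted into a prompt, which is one plausible reading of "prepare these coefficients to be read by ChatGPT."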
format Article
id doaj-art-7ccdd1257e484a4a894f4796398ea4dd
institution Kabale University
issn 2561-326X
language English
publishDate 2025-07-01
publisher JMIR Publications
record_format Article
series JMIR Formative Research
spelling doaj-art-7ccdd1257e484a4a894f4796398ea4dd (indexed 2025-08-20T03:50:16Z); JMIR Publications; JMIR Formative Research; ISSN 2561-326X; published 2025-07-01; vol 9, article e73248; DOI 10.2196/73248. Authors: Pål Joranger (ORCID 0000-0001-6274-8039), Sara Rivenes Lafontan (ORCID 0000-0001-7382-3022), Asgeir Brevik (ORCID 0009-0009-8892-4673).