Assessment of Large Language Model Performance on Medical School Essay-Style Concept Appraisal Questions: Exploratory Study
| Main Authors: | , , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-06-01 |
| Series: | JMIR Medical Education |
| Online Access: | https://mededu.jmir.org/2025/1/e72034 |
| Summary: | Bing Chat (subsequently renamed Microsoft Copilot), a ChatGPT 4.0–based large language model, performed comparably to medical students in answering essay-style concept appraisal questions, and assessors struggled to distinguish artificial intelligence (AI) responses from human responses. These results highlight the need to prepare students and educators for a future shaped by AI by fostering reflective learning practices and critical thinking. |
| ISSN: | 2369-3762 |