Domain‐Specific Customization for Language Models in Otolaryngology: The ENT GPT Assistant

Bibliographic Details
Main Authors: Brenton T. Bicknell, Nicholas J. Rivers, Adam Skelton, Delaney Sheehan, Charis Hodges, Stevan C. Fairburn, Benjamin J. Greene, Bharat Panuganti
Format: Article
Language: English
Published: Wiley 2025-04-01
Series: OTO Open
Online Access: https://doi.org/10.1002/oto2.70125
Summary: Abstract
Objective: To develop and evaluate the effectiveness of domain‐specific customization in large language models (LLMs) by assessing the performance of the ENT GPT Assistant (E‐GPT‐A), a model specifically tailored for otolaryngology.
Study Design: Comparative analysis using multiple‐choice questions (MCQs) from established otolaryngology resources.
Setting: Tertiary care academic hospital.
Methods: Two hundred forty clinical‐vignette style MCQs were sourced from BoardVitals Otolaryngology and OTOQuest, covering a range of otolaryngology subspecialties (n = 40 for each). E‐GPT‐A was developed using targeted instructions customized to otolaryngology, and its performance was compared against top‐performing and widely used artificial intelligence (AI) LLMs, including GPT‐3.5, GPT‐4, Claude 2.0, and Claude 2.1. Accuracy was assessed across subspecialties, across question difficulty tiers, and on diagnostic and management questions.
Results: E‐GPT‐A achieved an overall accuracy of 74.6%, outperforming GPT‐3.5 (60.4%), Claude 2.0 (61.7%), Claude 2.1 (60.8%), and GPT‐4 (68.3%). The model performed best in allergy and rhinology (85.0%) and laryngology (82.5%), while showing lower accuracy in pediatrics (62.5%) and facial plastics/reconstructive surgery (67.5%). Accuracy also declined as question difficulty increased. For comparison, otolaryngologists and otolaryngology trainees averaged 71.1% correct on the same question set.
Conclusion: This pilot study of E‐GPT‐A demonstrates the potential benefits of domain‐specific customization of language models for otolaryngology. However, further development, continuous updates, and continued real‐world validation are needed to fully assess the capabilities of LLMs in otolaryngology.
ISSN: 2473-974X
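
Note: The abstract does not detail how E‐GPT‐A's targeted instructions were implemented. As an illustration only, the sketch below shows one common way to approximate this kind of domain customization: a fixed system prompt sent with each question, plus a simple MCQ accuracy scorer. The prompt text, model name, and helper names are assumptions for illustration, not drawn from the paper.

```python
# Hypothetical sketch only: approximating domain-specific customization
# with a fixed system prompt, then scoring multiple-choice accuracy.
# This is NOT the authors' implementation; the prompt text, model name,
# and function names are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Assumed domain instructions; the paper's actual instructions are not published here.
SYSTEM_PROMPT = (
    "You are an otolaryngology board-exam assistant. Answer each "
    "clinical-vignette multiple-choice question with a single letter (A-E)."
)

def answer_mcq(question: str, model: str = "gpt-4") -> str:
    """Return the model's single-letter answer for one MCQ."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    # Keep only the first character, e.g. "B) Tympanoplasty" -> "B".
    return response.choices[0].message.content.strip()[:1].upper()

def accuracy(questions: list[str], answer_key: list[str]) -> float:
    """Fraction of MCQs answered correctly against an answer key."""
    correct = sum(answer_mcq(q) == k for q, k in zip(questions, answer_key))
    return correct / len(questions)
```

Under a setup like this, per‐subspecialty and per‐difficulty accuracy figures of the kind reported above could be obtained by grouping the question list before calling accuracy().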