Domain‐Specific Customization for Language Models in Otolaryngology: The ENT GPT Assistant
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2025-04-01 |
| Series: | OTO Open |
| Subjects: | |
| Online Access: | https://doi.org/10.1002/oto2.70125 |
| Summary: | Abstract Objective To develop and evaluate the effectiveness of domain‐specific customization in large language models (LLMs) by assessing the performance of the ENT GPT Assistant (E‐GPT‐A), a model specifically tailored for otolaryngology. Study Design Comparative analysis using multiple‐choice questions (MCQs) from established otolaryngology resources. Setting Tertiary care academic hospital. Methods Two hundred forty clinical vignette‐style MCQs were sourced from BoardVitals Otolaryngology and OTOQuest, covering a range of otolaryngology subspecialties (n = 40 for each). The E‐GPT‐A was developed using targeted instructions and customized to otolaryngology. The performance of E‐GPT‐A was compared against top‐performing and widely used artificial intelligence (AI) LLMs, including GPT‐3.5, GPT‐4, Claude 2.0, and Claude 2.1. Accuracy was assessed across subspecialties, across question difficulty tiers, and in diagnostics and management. Results E‐GPT‐A achieved an overall accuracy of 74.6%, outperforming GPT‐3.5 (60.4%), Claude 2.0 (61.7%), Claude 2.1 (60.8%), and GPT‐4 (68.3%). The model performed best in allergy and rhinology (85.0%) and laryngology (82.5%), while showing lower accuracy in pediatrics (62.5%) and facial plastics/reconstructive surgery (67.5%). Accuracy also declined as question difficulty increased. The average correct response percentage among otolaryngologists and otolaryngology trainees on the question set was 71.1%. Conclusion This pilot study of the E‐GPT‐A demonstrates the potential benefits of domain‐specific customization of language models for otolaryngology. However, further development, continuous updates, and ongoing real‐world validation are needed to fully assess the capabilities of LLMs in otolaryngology. |
|---|---|
| ISSN: | 2473-974X |