Image Recognition Performance of GPT-4V(ision) and GPT-4o in Ophthalmology: Use of Images in Clinical Questions
Kosei Tomita,1 Takashi Nishida,2 Yoshiyuki Kitaguchi,3 Koji Kitazawa,4,* Masahiro Miyake5,* 1Department of Ophthalmology, Kawasaki Medical School, Okayama, Japan; 2Hamilton Glaucoma Center, Shiley Eye Institute, Viterbi Family Department of Ophthalmology, University of Califo...
| Main Authors: | Kosei Tomita, Takashi Nishida, Yoshiyuki Kitaguchi, Koji Kitazawa, Masahiro Miyake |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Dove Medical Press, 2025-05-01 |
| Series: | Clinical Ophthalmology |
| Subjects: | |
| Online Access: | https://www.dovepress.com/image-recognition-performance-of-gpt-4vision-and-gpt-4o-in-ophthalmolo-peer-reviewed-fulltext-article-OPTH |
| Summary: | Kosei Tomita,1 Takashi Nishida,2 Yoshiyuki Kitaguchi,3 Koji Kitazawa,4,* Masahiro Miyake5,* 1Department of Ophthalmology, Kawasaki Medical School, Okayama, Japan; 2Hamilton Glaucoma Center, Shiley Eye Institute, Viterbi Family Department of Ophthalmology, University of California, San Diego, La Jolla, CA, USA; 3Department of Ophthalmology, Osaka University Graduate School of Medicine, Osaka, Japan; 4Department of Ophthalmology, Kyoto Prefectural University of Medicine, Kyoto, Japan; 5Department of Ophthalmology and Visual Sciences, Kyoto University Graduate School of Medicine, Kyoto, Japan. *These authors contributed equally to this work. Correspondence: Takashi Nishida, University of California, 9415 Campus Point Drive, San Diego, La Jolla, CA, 92093-0946, USA, Email t.nishida.opt@gmail.com. Purpose: To compare the diagnostic accuracy of Generative Pre-trained Transformer (GPT)-4, GPT-4 with Vision (GPT-4V), and GPT-4o for clinical questions in ophthalmology. Patients and Methods: The questions were collected from the “Diagnose This” section on the American Academy of Ophthalmology website. We tested 580 questions and presented ChatGPT with the same questions under two conditions: 1) multimodal model, incorporating both the question text and associated images, and 2) text-only model. We then compared the difference in accuracy using McNemar tests among multimodal (GPT-4o and GPT-4V) and text-only (GPT-4V) models. The overall percentage of correct answers reported on the website was also collected. Results: Multimodal GPT-4o achieved the highest accuracy (77.1%), followed by multimodal GPT-4V (71.0%), and then text-only GPT-4V (68.7%); (P values < 0.001, 0.012, and 0.001, respectively). All GPT-4 models showed higher accuracy than the overall correct-answer rate reported on the website (64.6%). Conclusion: The addition of information from images enhances the performance of GPT-4V in answering clinical questions in ophthalmology. This suggests that integrating multimodal data could be crucial in developing more effective and reliable diagnostic tools in medical fields. Keywords: ChatGPT, large language model, GPT-4o, ophthalmology |
| ISSN: | 1177-5483 |
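
The Methods in the summary compare model accuracy on the same set of questions using McNemar tests. As a rough illustration of how such a paired comparison works, the sketch below computes an exact (binomial) McNemar p-value from per-question correctness flags. The record does not say which variant of the test the authors used, and it does not report the discordant-pair counts, so the function names and the example data here are hypothetical and are not taken from the study.

```python
from math import comb


def discordant_counts(model_a_correct, model_b_correct):
    """Count questions where exactly one of the two models was correct.

    Both arguments are equal-length sequences of booleans, one entry per
    question, in the same question order (paired design).
    """
    pairs = list(zip(model_a_correct, model_b_correct))
    only_a = sum(1 for a, b in pairs if a and not b)
    only_b = sum(1 for a, b in pairs if b and not a)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value.

    Under the null hypothesis, each discordant pair is equally likely to
    favour either model, so the smaller count follows Binomial(n, 0.5).
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical correctness flags for six questions (illustrative only).
    multimodal = [True, True, True, False, True, True]
    text_only = [True, False, True, False, False, True]
    only_a, only_b = discordant_counts(multimodal, text_only)
    print(only_a, only_b, mcnemar_exact(only_a, only_b))
```

Only the discordant questions (those that one model answers correctly and the other does not) influence the result, which is why the test suits comparisons where every model sees the identical question set.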