Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study

Background: Wasp stings are a significant public health concern in many parts of the world, particularly in tropical and subtropical regions. The venom of wasps contains a variety of bioactive compounds that can lead to a wide range of clinical effects, from mild localized pain...

Bibliographic Details
Main Authors:	Wei Pan, Shuman Zhang, Yonghong Wang, Zhenglin Quan, Yanxia Zhu, Zhicheng Fang, Xianyi Yang
Format:	Article
Language:	English
Published:	JMIR Publications 2025-06-01
Series:	Journal of Medical Internet Research
Online Access:	https://www.jmir.org/2025/1/e67489

_version_	1849734776492130304
author	Wei Pan Shuman Zhang Yonghong Wang Zhenglin Quan Yanxia Zhu Zhicheng Fang Xianyi Yang
author_facet	Wei Pan Shuman Zhang Yonghong Wang Zhenglin Quan Yanxia Zhu Zhicheng Fang Xianyi Yang
author_sort	Wei Pan
collection	DOAJ
description	Background: Wasp stings are a significant public health concern in many parts of the world, particularly in tropical and subtropical regions. Wasp venom contains a variety of bioactive compounds that can cause a wide range of clinical effects, from mild localized pain and swelling to severe, life-threatening allergic reactions, such as anaphylaxis. With the rapid development of artificial intelligence (AI) technologies, large language models (LLMs) are increasingly being used in health care, including emergency medicine and toxicology. These models have the potential to assist health care professionals in making fast and informed clinical decisions. This study aimed to assess the performance of 4 leading LLMs, namely ERNIE Bot 3.5 (Baidu), ERNIE Bot 4.0 (Baidu), Claude Pro (Anthropic), and ChatGPT 4.0, in managing wasp sting cases, with a focus on their accuracy, comprehensiveness, and decision-making abilities. Objective: The objective of this research was to systematically evaluate and compare the capabilities of the 4 LLMs in the context of wasp sting management. This involved analyzing their responses to a series of standardized questions and real-world clinical scenarios. The study aimed to determine which LLMs provided the most accurate, complete, and clinically relevant information for the management of wasp stings. Methods: This study used a cross-sectional design, creating 50 standardized questions that covered 10 key domains in the management of wasp stings, along with 20 real-world clinical case scenarios. Responses from the 4 LLMs were independently evaluated by 8 domain experts, who rated them on a 5-point Likert scale based on accuracy, completeness, and usefulness in clinical decision-making. Statistical comparisons between the models were made using the Wilcoxon signed-rank test, and the consistency of expert ratings was assessed using the Kendall coefficient of concordance.
Results: Claude Pro achieved the highest average score of 4.7 (SD 0.603) out of 5, followed closely by ChatGPT 4.0 with a score of 4.5. ERNIE Bot 4.0 and ERNIE Bot 3.5 received average scores of 4 (SD 0.600) and 3.8, respectively. In analyzing the 20 complex clinical cases, Claude Pro significantly outperformed ERNIE Bot 3.5, particularly in areas such as managing complications and assessing the severity of reactions (P<.001). The expert ratings showed moderate agreement (Kendall W=0.67), indicating that the assessments were consistently reliable. Conclusions: The results of this study suggest that Claude Pro and ChatGPT 4.0 are highly capable of providing accurate and comprehensive support for the clinical management of wasp stings, particularly in complex decision-making scenarios. These findings support the increasing role of AI in emergency and toxicological medicine and suggest that the choice of AI tool should be based on the specific needs of the clinical situation, ensuring that the most appropriate model is selected for different health care applications.
format	Article
id	doaj-art-e3caed7729a04e0199d0266a40b42812
institution	DOAJ
issn	1438-8871
language	English
publishDate	2025-06-01
publisher	JMIR Publications
record_format	Article
series	Journal of Medical Internet Research
spelling	doaj-art-e3caed7729a04e0199d0266a40b42812 | 2025-08-20T03:07:43Z | eng | JMIR Publications | Journal of Medical Internet Research | 1438-8871 | 2025-06-01 | 27 | e67489 | 10.2196/67489 | Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study | Wei Pan https://orcid.org/0009-0004-7401-9836 | Shuman Zhang https://orcid.org/0009-0005-1615-9862 | Yonghong Wang https://orcid.org/0009-0004-5413-8627 | Zhenglin Quan https://orcid.org/0000-0001-9255-3849 | Yanxia Zhu https://orcid.org/0009-0008-0597-2761 | Zhicheng Fang https://orcid.org/0000-0003-2800-0806 | Xianyi Yang https://orcid.org/0000-0001-5343-5815 | https://www.jmir.org/2025/1/e67489
spellingShingle	Wei Pan Shuman Zhang Yonghong Wang Zhenglin Quan Yanxia Zhu Zhicheng Fang Xianyi Yang Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study Journal of Medical Internet Research
title	Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study
title_full	Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study
title_fullStr	Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study
title_full_unstemmed	Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study
title_short	Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study
title_sort	clinical management of wasp stings using large language models cross sectional evaluation study
url	https://www.jmir.org/2025/1/e67489
work_keys_str_mv	AT weipan clinicalmanagementofwaspstingsusinglargelanguagemodelscrosssectionalevaluationstudy AT shumanzhang clinicalmanagementofwaspstingsusinglargelanguagemodelscrosssectionalevaluationstudy AT yonghongwang clinicalmanagementofwaspstingsusinglargelanguagemodelscrosssectionalevaluationstudy AT zhenglinquan clinicalmanagementofwaspstingsusinglargelanguagemodelscrosssectionalevaluationstudy AT yanxiazhu clinicalmanagementofwaspstingsusinglargelanguagemodelscrosssectionalevaluationstudy AT zhichengfang clinicalmanagementofwaspstingsusinglargelanguagemodelscrosssectionalevaluationstudy AT xianyiyang clinicalmanagementofwaspstingsusinglargelanguagemodelscrosssectionalevaluationstudy
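
The Methods summarized in the description field name the Kendall coefficient of concordance (W) as the measure of agreement among the 8 expert raters. The record does not show how the authors computed it, so the following is only a minimal Python sketch of the standard formula with a tie correction, which matters for 5-point Likert ratings where tied scores are common. The function names `average_ranks` and `kendall_w` and the example ratings are illustrative assumptions, not study data or the authors' code.

```python
from collections import Counter


def average_ranks(scores):
    """Rank one rater's scores (1 = lowest); tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendall_w(ratings):
    """Kendall's W for ratings[r][i] = score given by rater r to item i."""
    m = len(ratings)      # number of raters
    n = len(ratings[0])   # number of rated items
    ranked = [average_ranks(row) for row in ratings]
    totals = [sum(ranked[r][i] for r in range(m)) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie correction: sum of (t^3 - t) over every group of tied scores per rater.
    ties = sum(c ** 3 - c for row in ratings for c in Counter(row).values())
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater ties all items")
    return 12 * s / denom


# Hypothetical ratings: 3 raters scoring 4 responses on a 5-point scale.
example = [
    [5, 4, 4, 3],
    [5, 5, 4, 3],
    [4, 5, 3, 3],
]
w = kendall_w(example)
```

W ranges from 0 (no agreement) to 1 (identical rankings by all raters); in a setup like the one described, each row would hold one expert's scores for the same set of model responses.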
