Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer

Bibliographic Details
Main Authors: Ahmed Bashah, Abdulkhaleq Salem, Ali Al-waqeerah, Eslam Ghaleb, Natheer Wahan, Ahmed Awad, Omran Al-tos, Gang Chen
Format: Article
Language:English
Published: BMC 2025-08-01
Series:BMC Oral Health
Subjects: Artificial intelligence; LLMs; DeepSeek; Gemini; ChatGPT; Perplexity
Online Access:https://doi.org/10.1186/s12903-025-06726-4
collection DOAJ
description Abstract Background Artificial intelligence (AI) platforms, such as Gemini, ChatGPT, DeepSeek, and Perplexity, are increasingly used to support clinical decision-making, yet their accuracy in specific medical domains remains variable. This study assessed the performance of these AI chatbots in responding to clinical questions commonly posed by surgeons in the context of salivary gland cancer, a field closely related to oral and maxillofacial oncology. Methods Thirty clinical questions on salivary gland malignancies were created according to the ASCO 2021 guidelines. Two researchers posted these questions on four AI chatbot platforms: ChatGPT-4o, DeepSeek, Gemini, and Perplexity. The questions were queried three times daily over ten days, yielding a total of 2700 responses, each coded as correct or incorrect. The accuracy of each response was statistically analyzed, and an overall accuracy rate was calculated for each platform. Results DeepSeek achieved the highest accuracy rate at 86.9%, followed by Gemini at 78.9%, ChatGPT-4o at 72.8%, and Perplexity at 71.6%. Conclusion Despite demonstrating substantial potential, current AI chatbots have not yet achieved sufficient accuracy for standalone clinical use in salivary gland cancer. Further enhancements in AI capabilities and rigorous clinical validation are necessary to ensure patient safety and effectiveness in clinical practice.
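The per-platform accuracy rates described in the Methods can be sketched as follows; the function name and sample data are illustrative assumptions, not taken from the study itself.

```python
from collections import defaultdict

def accuracy_rates(responses):
    """Compute per-platform accuracy from (platform, is_correct) pairs,
    where is_correct reflects the reviewers' correct/incorrect coding."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for platform, is_correct in responses:
        total[platform] += 1
        correct[platform] += int(is_correct)
    # Accuracy = correct responses / total responses, per platform
    return {p: correct[p] / total[p] for p in total}

# Illustrative coded responses (hypothetical, not the study's data)
coded = [("DeepSeek", True), ("DeepSeek", True), ("DeepSeek", False),
         ("Gemini", True), ("Gemini", False)]
print(accuracy_rates(coded))
# → {'DeepSeek': 0.6666666666666666, 'Gemini': 0.5}
```

In the study's design, each platform would contribute its own set of coded responses accumulated over the repeated daily queries, and the resulting ratios correspond to the reported percentages.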
format Article
id doaj-art-2427fa8aa7764dcba5bafd646f270e3f
institution Kabale University
issn 1472-6831
language English
publishDate 2025-08-01
publisher BMC
record_format Article
series BMC Oral Health
spelling doaj-art-2427fa8aa7764dcba5bafd646f270e3f (indexed 2025-08-24T11:54:57Z). Published 2025-08-01 in BMC Oral Health 25(1); doi:10.1186/s12903-025-06726-4.
Author affiliations:
Ahmed Bashah: Department of Stomatology, The First Affiliated Hospital of Dalian Medical University
Abdulkhaleq Salem: Department of Pharmacology, Dalian Medical University
Ali Al-waqeerah: Department of Respiratory, The First Affiliated Hospital of Dalian Medical University
Eslam Ghaleb: Department of Biochemistry and Molecular Biology, Dalian Medical University
Natheer Wahan: Department of Pharmacology, Dalian Medical University
Ahmed Awad: Department of Stomatology, Dalian Medical University
Omran Al-tos: Department of Stomatology, The First Affiliated Hospital of Dalian Medical University
Gang Chen: Department of Stomatology, The First Affiliated Hospital of Dalian Medical University
title Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer
topic Artificial intelligence
LLMs
DeepSeek
Gemini
ChatGPT
Perplexity
url https://doi.org/10.1186/s12903-025-06726-4