Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer

Bibliographic Details
Main Authors: Ahmed Bashah, Abdulkhaleq Salem, Ali Al-waqeerah, Eslam Ghaleb, Natheer Wahan, Ahmed Awad, Omran Al-tos, Gang Chen
Format: Article
Language:English
Published: BMC 2025-08-01
Series:BMC Oral Health
Subjects: Artificial intelligence; LLMs; DeepSeek; Gemini; ChatGPT; Perplexity
Online Access:https://doi.org/10.1186/s12903-025-06726-4
collection DOAJ
description Abstract Background Artificial intelligence (AI) platforms, such as Gemini, ChatGPT, DeepSeek, and Perplexity, are increasingly used to support clinical decision-making, yet their accuracy in specific medical domains remains variable. This study assessed the performance of these AI chatbots in responding to clinical questions commonly posed by surgeons in the context of salivary gland cancer, a field closely related to oral and maxillofacial oncology. Methods Thirty clinical questions on salivary gland malignancies were created according to the ASCO 2021 guidelines. Two researchers posted these questions on four AI chatbot platforms: ChatGPT-4o, DeepSeek, Gemini, and Perplexity. The questions were queried three times daily over ten days, yielding a total of 2700 responses, each coded as correct or incorrect. The accuracy of each response was statistically analyzed, and an overall accuracy rate was calculated for each platform. Results DeepSeek achieved the highest accuracy rate at 86.9%, followed by Gemini at 78.9%, ChatGPT-4o at 72.8%, and Perplexity at 71.6%. Conclusion Despite demonstrating substantial potential, current AI chatbots have not yet achieved sufficient accuracy for standalone clinical use in salivary gland cancer. Further enhancements in AI capabilities and rigorous clinical validation are necessary to ensure patient safety and effectiveness in clinical practice.
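The per-platform accuracy rates described in the Methods can be sketched as follows; the function name and sample data are illustrative assumptions, not taken from the study itself.

```python
from collections import defaultdict

def accuracy_rates(responses):
    """Compute per-platform accuracy from (platform, is_correct) pairs,
    where is_correct reflects the reviewers' correct/incorrect coding."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for platform, is_correct in responses:
        total[platform] += 1
        correct[platform] += int(is_correct)
    # Accuracy = correct responses / total responses, per platform
    return {p: correct[p] / total[p] for p in total}

# Illustrative coded responses (hypothetical, not the study's data)
coded = [("DeepSeek", True), ("DeepSeek", True), ("DeepSeek", False),
         ("Gemini", True), ("Gemini", False)]
print(accuracy_rates(coded))
# → {'DeepSeek': 0.6666666666666666, 'Gemini': 0.5}
```

In the study's design, each platform would contribute its own set of coded responses accumulated over the repeated daily queries, and the resulting ratios correspond to the reported percentages.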
format Article
id doaj-art-2427fa8aa7764dcba5bafd646f270e3f
institution Kabale University
issn 1472-6831
language English
publishDate 2025-08-01
publisher BMC
record_format Article
series BMC Oral Health
spelling doaj-art-2427fa8aa7764dcba5bafd646f270e3f (indexed 2025-08-24T11:54:57Z). Published 2025-08-01 in BMC Oral Health 25(1); doi:10.1186/s12903-025-06726-4.
Author affiliations:
Ahmed Bashah: Department of Stomatology, The First Affiliated Hospital of Dalian Medical University
Abdulkhaleq Salem: Department of Pharmacology, Dalian Medical University
Ali Al-waqeerah: Department of Respiratory, The First Affiliated Hospital of Dalian Medical University
Eslam Ghaleb: Department of Biochemistry and Molecular Biology, Dalian Medical University
Natheer Wahan: Department of Pharmacology, Dalian Medical University
Ahmed Awad: Department of Stomatology, Dalian Medical University
Omran Al-tos: Department of Stomatology, The First Affiliated Hospital of Dalian Medical University
Gang Chen: Department of Stomatology, The First Affiliated Hospital of Dalian Medical University
title Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer
topic Artificial intelligence
LLMs
DeepSeek
Gemini
ChatGPT
Perplexity
url https://doi.org/10.1186/s12903-025-06726-4