Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer
Abstract Background Artificial intelligence (AI) platforms, such as Gemini, ChatGPT, DeepSeek, and Perplexity, are increasingly utilized to support clinical decision-making, yet their accuracy in specific medical domains remains variable. This study assessed the performance of these AI chatbots in res...
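| Main Authors: | Ahmed Bashah, Abdulkhaleq Salem, Ali Al-waqeerah, Eslam Ghaleb, Natheer Wahan, Ahmed Awad, Omran Al-tos, Gang Chen |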
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-08-01 |
| Series: | BMC Oral Health |
| Subjects: | Artificial intelligence; LLMs; DeepSeek; Gemini; ChatGPT; Perplexity |
| Online Access: | https://doi.org/10.1186/s12903-025-06726-4 |
| _version_ | 1849225822626381824 |
|---|---|
| author | Ahmed Bashah; Abdulkhaleq Salem; Ali Al-waqeerah; Eslam Ghaleb; Natheer Wahan; Ahmed Awad; Omran Al-tos; Gang Chen |
| author_facet | Ahmed Bashah; Abdulkhaleq Salem; Ali Al-waqeerah; Eslam Ghaleb; Natheer Wahan; Ahmed Awad; Omran Al-tos; Gang Chen |
| author_sort | Ahmed Bashah |
| collection | DOAJ |
| description | Abstract. Background: Artificial intelligence (AI) platforms, such as Gemini, ChatGPT, DeepSeek, and Perplexity, are increasingly utilized to support clinical decision-making, yet their accuracy in specific medical domains remains variable. This study assessed the performance of these AI chatbots in responding to clinical questions commonly posed by surgeons in the context of salivary gland cancer, a field closely related to oral and maxillofacial oncology. Methods: Thirty clinical questions related to salivary gland malignancies were created according to the ASCO 2021 guidelines. Two researchers posed the questions to four AI chatbot platforms: ChatGPT-4o, DeepSeek, Gemini, and Perplexity. The questions were submitted three times daily over ten days, yielding a total of 2700 responses that were coded as correct or incorrect. The accuracy of each response was statistically analyzed, and overall accuracy rates for each AI platform were calculated. Results: DeepSeek achieved the highest accuracy rate at 86.9%, followed by Gemini at 78.9%, ChatGPT-4o at 72.8%, and Perplexity at 71.6%. Conclusion: Despite demonstrating substantial potential, current AI chatbots have not yet achieved sufficient accuracy for standalone clinical use in salivary gland cancer. Enhancements in AI capabilities and rigorous clinical validation are necessary to ensure patient safety and effectiveness in clinical practice. |
| format | Article |
| id | doaj-art-2427fa8aa7764dcba5bafd646f270e3f |
| institution | Kabale University |
| issn | 1472-6831 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Oral Health |
| spelling | doaj-art-2427fa8aa7764dcba5bafd646f270e3f; 2025-08-24T11:54:57Z; eng; BMC; BMC Oral Health; 1472-6831; 2025-08-01; 25; 1; 1; 15; 10.1186/s12903-025-06726-4; Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer; Ahmed Bashah (0); Abdulkhaleq Salem (1); Ali Al-waqeerah (2); Eslam Ghaleb (3); Natheer Wahan (4); Ahmed Awad (5); Omran Al-tos (6); Gang Chen (7); (0) Department of Stomatology, The First Affiliated Hospital of Dalian Medical University; (1) Department of Pharmacology, Dalian Medical University; (2) Department of Respiratory, The First Affiliated Hospital of Dalian Medical University; (3) Department of Biochemistry and Molecular Biology, Dalian Medical University; (4) Department of Pharmacology, Dalian Medical University; (5) Department of Stomatology, Dalian Medical University; (6) Department of Stomatology, The First Affiliated Hospital of Dalian Medical University; (7) Department of Stomatology, The First Affiliated Hospital of Dalian Medical University; Abstract. Background: Artificial intelligence (AI) platforms, such as Gemini, ChatGPT, DeepSeek, and Perplexity, are increasingly utilized to support clinical decision-making, yet their accuracy in specific medical domains remains variable. This study assessed the performance of these AI chatbots in responding to clinical questions commonly posed by surgeons in the context of salivary gland cancer, a field closely related to oral and maxillofacial oncology. Methods: Thirty clinical questions related to salivary gland malignancies were created according to the ASCO 2021 guidelines. Two researchers posed the questions to four AI chatbot platforms: ChatGPT-4o, DeepSeek, Gemini, and Perplexity. The questions were submitted three times daily over ten days, yielding a total of 2700 responses that were coded as correct or incorrect. The accuracy of each response was statistically analyzed, and overall accuracy rates for each AI platform were calculated. Results: DeepSeek achieved the highest accuracy rate at 86.9%, followed by Gemini at 78.9%, ChatGPT-4o at 72.8%, and Perplexity at 71.6%. Conclusion: Despite demonstrating substantial potential, current AI chatbots have not yet achieved sufficient accuracy for standalone clinical use in salivary gland cancer. Enhancements in AI capabilities and rigorous clinical validation are necessary to ensure patient safety and effectiveness in clinical practice.; https://doi.org/10.1186/s12903-025-06726-4; Artificial intelligence; LLMs; DeepSeek; Gemini; ChatGPT; Perplexity |
| spellingShingle | Ahmed Bashah; Abdulkhaleq Salem; Ali Al-waqeerah; Eslam Ghaleb; Natheer Wahan; Ahmed Awad; Omran Al-tos; Gang Chen; Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer; BMC Oral Health; Artificial intelligence; LLMs; DeepSeek; Gemini; ChatGPT; Perplexity |
| title | Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer |
| title_full | Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer |
| title_fullStr | Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer |
| title_full_unstemmed | Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer |
| title_short | Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer |
| title_sort | evaluation of deepseek gemini chatgpt 4o and perplexity in responding to salivary gland cancer |
| topic | Artificial intelligence; LLMs; DeepSeek; Gemini; ChatGPT; Perplexity |
| url | https://doi.org/10.1186/s12903-025-06726-4 |
| work_keys_str_mv | AT ahmedbashah evaluationofdeepseekgeminichatgpt4oandperplexityinrespondingtosalivaryglandcancer AT abdulkhaleqsalem evaluationofdeepseekgeminichatgpt4oandperplexityinrespondingtosalivaryglandcancer AT alialwaqeerah evaluationofdeepseekgeminichatgpt4oandperplexityinrespondingtosalivaryglandcancer AT eslamghaleb evaluationofdeepseekgeminichatgpt4oandperplexityinrespondingtosalivaryglandcancer AT natheerwahan evaluationofdeepseekgeminichatgpt4oandperplexityinrespondingtosalivaryglandcancer AT ahmedawad evaluationofdeepseekgeminichatgpt4oandperplexityinrespondingtosalivaryglandcancer AT omranaltos evaluationofdeepseekgeminichatgpt4oandperplexityinrespondingtosalivaryglandcancer AT gangchen evaluationofdeepseekgeminichatgpt4oandperplexityinrespondingtosalivaryglandcancer |
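
The abstract reports per-platform accuracy as the share of responses coded correct. A minimal sketch of that calculation follows, assuming each response is stored as a (platform, is_correct) pair. The function name `accuracy_by_platform` and the toy data are hypothetical illustrations, not the study's code or its responses.

```python
from collections import defaultdict


def accuracy_by_platform(coded):
    """Return {platform: percent of responses coded correct, rounded to 1 decimal}.

    `coded` is an iterable of (platform, is_correct) pairs. This layout is an
    assumption made for illustration; the record does not describe the
    study's actual data format.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for platform, is_correct in coded:
        totals[platform] += 1
        if is_correct:
            correct[platform] += 1
    return {p: round(100.0 * correct[p] / totals[p], 1) for p in totals}


# Hypothetical toy data, NOT the study's responses.
toy = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False),
]
rates = accuracy_by_platform(toy)
print(rates)
```

Counting per platform keeps the denominators separate, so platforms with different numbers of coded responses are still compared on the same percentage scale.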