Responses of Artificial Intelligence Chatbots to Testosterone Replacement Therapy: Patients Beware!

Bibliographic Details
Main Authors: Herleen Pabla, Alyssa Lange, Nagalakshmi Nadiminty, Puneet Sindhwani
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: Société Internationale d’Urologie Journal
Online Access: https://www.mdpi.com/2563-6499/6/1/13
Description
Summary: Background/Objectives: Using chatbots to seek healthcare information is becoming more popular, and misinformation and gaps in knowledge exist regarding the risks and benefits of testosterone replacement therapy (TRT). We aimed to assess and compare the quality and readability of responses generated by four AI chatbots. Methods: ChatGPT, Google Bard, Bing Chat, and Perplexity AI were asked the same eleven questions regarding TRT. The responses were evaluated by four reviewers using the DISCERN and Patient Education Materials Assessment Tool (PEMAT) questionnaires. Readability was assessed with the Readability Scoring System v2.0, which calculates the Flesch–Kincaid Reading Ease Score (FRES) and the Flesch–Kincaid Grade Level (FKGL). Kruskal–Wallis tests were performed using GraphPad Prism V10.1.0. Results: Google Bard received the highest DISCERN score (56.5) and the highest PEMAT ratings (96% understandability, 74% actionability), demonstrating the highest quality. Readability scores ranged from eleventh-grade to college level, with Perplexity outperforming the other chatbots. Significant differences were found in understandability and DISCERN scores between Bing and Google Bard, and in FRES and FKGL scores between ChatGPT and Perplexity AI. Conclusions: ChatGPT and Google Bard were the top performers in quality, understandability, and actionability. Although Perplexity scored highest in readability, its generated text still demanded eleventh-grade reading ability. Perplexity stood out for its extensive use of citations but gave repetitive answers despite the diversity of questions posed. Google Bard provided highly detailed answers and added value through visual aids. As the underlying technology advances, these chatbots may improve; until then, patients and providers should be aware of the strengths and shortcomings of each.
ISSN: 2563-6499
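The abstract cites two standard readability formulas (FRES and FKGL) and a Kruskal–Wallis comparison across the four chatbots. The sketch below shows how those computations work in Python; it is not the authors' actual pipeline (the study used the Readability Scoring System v2.0 and GraphPad Prism), and the word/syllable counts and per-chatbot scores are invented placeholders for illustration.

```python
# Sketch of the Flesch-Kincaid formulas and the Kruskal-Wallis test
# referenced in the abstract. All numeric inputs are hypothetical.
from scipy.stats import kruskal

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Reading Ease Score (FRES); higher = easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL); approximates a U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Example counts for one hypothetical chatbot response.
print(flesch_reading_ease(words=250, sentences=12, syllables=420))   # ~43.6 (college level)
print(flesch_kincaid_grade(words=250, sentences=12, syllables=420))  # ~12.4 (about 12th grade)

# Kruskal-Wallis H-test across the four chatbots (placeholder FKGL scores,
# one per question); a small p-value suggests at least one chatbot's
# readability distribution differs from the others.
chatgpt    = [13.1, 12.8, 14.0, 13.5]
bard       = [12.2, 11.9, 12.5, 12.0]
bing       = [12.9, 13.3, 12.7, 13.0]
perplexity = [11.0, 10.8, 11.4, 11.1]
H, p = kruskal(chatgpt, bard, bing, perplexity)
print(f"H = {H:.2f}, p = {p:.4f}")
```

The Kruskal–Wallis test is a sensible choice here because the reviewer-assigned quality scores and small per-chatbot samples cannot be assumed normally distributed, so a rank-based nonparametric test avoids that assumption.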