Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Digital |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2673-6470/5/2/10 |
| Summary: | Background: As digital health resources become increasingly prevalent, assessing the quality of information provided by publicly available AI tools is vital for evidence-based patient education. Objective: This study evaluates the accuracy and readability of responses from four large language models—ChatGPT 4.0, ChatGPT 3.5, Google Bard, and Microsoft Bing—in providing contraceptive counseling. Methods: A cross-sectional analysis was conducted using standardized contraception questions, established readability indices, and a panel of blinded OB/GYN physician reviewers comparing model responses to an AAFP benchmark. Results: The models varied in readability and evidence adherence; notably, ChatGPT 3.5 provided more evidence-based responses than GPT-4.0, although all outputs exceeded the recommended 6th-grade reading level. Conclusions: Our findings underscore the need for further refinement of LLMs to balance clinical accuracy with patient-friendly language, supporting their role as a supplement to clinician counseling. |
| ISSN: | 2673-6470 |