Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling

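Background: As digital health resources become increasingly prevalent, assessing the quality of information provided by publicly available AI tools is vital for evidence-based patient education. Objective: This study evaluates the accuracy and readability of responses from four large language models (ChatGPT 4.0, ChatGPT 3.5, Google Bard, and Microsoft Bing) in providing contraceptive counseling. Methods: A cross-sectional analysis was conducted using standardized contraception questions, established readability indices, and a panel of blinded OB/GYN physician reviewers comparing model responses to an AAFP benchmark. Results: The models varied in readability and evidence adherence; notably, ChatGPT 3.5 provided more evidence-based responses than ChatGPT 4.0, although all outputs exceeded the recommended 6th-grade reading level. Conclusions: Our findings underscore the need for further refinement of LLMs to balance clinical accuracy with patient-friendly language, supporting their role as a supplement to clinician counseling.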

Bibliographic Details
Main Authors: Anisha V. Patel, Sona Jasani, Abdelrahman AlAshqar, Rushabh H. Doshi, Kanhai Amin, Aisvarya Panakam, Ankita Patil, Sangini S. Sheth
Format: Article
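Language: English
Published: MDPI AG 2025-03-01
Series: Digital
Subjects: contraception; contraceptive counseling; reproductive health; artificial intelligence; large language models; digital health
Online Access: https://www.mdpi.com/2673-6470/5/2/10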
author Anisha V. Patel
Sona Jasani
Abdelrahman AlAshqar
Rushabh H. Doshi
Kanhai Amin
Aisvarya Panakam
Ankita Patil
Sangini S. Sheth
author_sort Anisha V. Patel
collection DOAJ
description Background: As digital health resources become increasingly prevalent, assessing the quality of information provided by publicly available AI tools is vital for evidence-based patient education. Objective: This study evaluates the accuracy and readability of responses from four large language models (ChatGPT 4.0, ChatGPT 3.5, Google Bard, and Microsoft Bing) in providing contraceptive counseling. Methods: A cross-sectional analysis was conducted using standardized contraception questions, established readability indices, and a panel of blinded OB/GYN physician reviewers comparing model responses to an AAFP benchmark. Results: The models varied in readability and evidence adherence; notably, ChatGPT 3.5 provided more evidence-based responses than ChatGPT 4.0, although all outputs exceeded the recommended 6th-grade reading level. Conclusions: Our findings underscore the need for further refinement of LLMs to balance clinical accuracy with patient-friendly language, supporting their role as a supplement to clinician counseling.
format Article
id doaj-art-81cbac19625f42ae9cfa8a2559da4401
institution Kabale University
issn 2673-6470
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Digital
spelling doaj-art-81cbac19625f42ae9cfa8a2559da4401
2025-08-20T03:27:18Z
eng
MDPI AG
Digital
2673-6470
2025-03-01
Volume 5, Issue 2, Article 10
10.3390/digital5020010
Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling
Anisha V. Patel [0]
Sona Jasani [1]
Abdelrahman AlAshqar [2]
Rushabh H. Doshi [3]
Kanhai Amin [4]
Aisvarya Panakam [5]
Ankita Patil [6]
Sangini S. Sheth [7]
[0] Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT 06510, USA
[1] Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT 06510, USA
[2] Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT 06510, USA
[3] Department of Internal Medicine, Yale School of Medicine, New Haven, CT 06510, USA
[4] Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA
[5] Department of Obstetrics and Gynecology, University of Pittsburgh Medical Center, Pittsburgh, PA 15219, USA
[6] Department of Medicine, Division of Women’s Health, Brigham and Women’s Hospital, Boston, MA 02115, USA
[7] Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT 06510, USA
Background: As digital health resources become increasingly prevalent, assessing the quality of information provided by publicly available AI tools is vital for evidence-based patient education. Objective: This study evaluates the accuracy and readability of responses from four large language models (ChatGPT 4.0, ChatGPT 3.5, Google Bard, and Microsoft Bing) in providing contraceptive counseling. Methods: A cross-sectional analysis was conducted using standardized contraception questions, established readability indices, and a panel of blinded OB/GYN physician reviewers comparing model responses to an AAFP benchmark. Results: The models varied in readability and evidence adherence; notably, ChatGPT 3.5 provided more evidence-based responses than ChatGPT 4.0, although all outputs exceeded the recommended 6th-grade reading level. Conclusions: Our findings underscore the need for further refinement of LLMs to balance clinical accuracy with patient-friendly language, supporting their role as a supplement to clinician counseling.
https://www.mdpi.com/2673-6470/5/2/10
contraception
contraceptive counseling
reproductive health
artificial intelligence
large language models
digital health
title Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling
title_sort comparative evaluation of artificial intelligence models for contraceptive counseling
topic contraception
contraceptive counseling
reproductive health
artificial intelligence
large language models
digital health
url https://www.mdpi.com/2673-6470/5/2/10