Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling
Background: As digital health resources become increasingly prevalent, assessing the quality of information provided by publicly available AI tools is vital for evidence-based patient education. Objective: This study evaluates the accuracy and readability of responses from four large language models...
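| Main Authors: | Anisha V. Patel, Sona Jasani, Abdelrahman AlAshqar, Rushabh H. Doshi, Kanhai Amin, Aisvarya Panakam, Ankita Patil, Sangini S. Sheth |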
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Digital |
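| Subjects: | contraception; contraceptive counseling; reproductive health; artificial intelligence; large language models; digital health |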
| Online Access: | https://www.mdpi.com/2673-6470/5/2/10 |
| _version_ | 1849432659269255168 |
|---|---|
| author | Anisha V. Patel, Sona Jasani, Abdelrahman AlAshqar, Rushabh H. Doshi, Kanhai Amin, Aisvarya Panakam, Ankita Patil, Sangini S. Sheth |
| author_facet | Anisha V. Patel, Sona Jasani, Abdelrahman AlAshqar, Rushabh H. Doshi, Kanhai Amin, Aisvarya Panakam, Ankita Patil, Sangini S. Sheth |
| author_sort | Anisha V. Patel |
| collection | DOAJ |
| description | Background: As digital health resources become increasingly prevalent, assessing the quality of information provided by publicly available AI tools is vital for evidence-based patient education. Objective: This study evaluates the accuracy and readability of responses from four large language models—ChatGPT 4.0, ChatGPT 3.5, Google Bard, and Microsoft Bing—in providing contraceptive counseling. Methods: A cross-sectional analysis was conducted using standardized contraception questions, established readability indices, and a panel of blinded OB/GYN physician reviewers comparing model responses to an AAFP benchmark. Results: The models varied in readability and evidence adherence; notably, ChatGPT 3.5 provided more evidence-based responses than GPT-4.0, although all outputs exceeded the recommended 6th-grade reading level. Conclusions: Our findings underscore the need for the further refinement of LLMs to balance clinical accuracy with patient-friendly language, supporting their role as a supplement to clinician counseling. |
| format | Article |
| id | doaj-art-81cbac19625f42ae9cfa8a2559da4401 |
| institution | Kabale University |
| issn | 2673-6470 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Digital |
| spelling | doaj-art-81cbac19625f42ae9cfa8a2559da4401; 2025-08-20T03:27:18Z; eng; MDPI AG; Digital; 2673-6470; 2025-03-01; 5; 2; 10; 10.3390/digital5020010; Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling; Authors: Anisha V. Patel [0], Sona Jasani [1], Abdelrahman AlAshqar [2], Rushabh H. Doshi [3], Kanhai Amin [4], Aisvarya Panakam [5], Ankita Patil [6], Sangini S. Sheth [7]; Affiliations: [0] Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT 06510, USA; [1] Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT 06510, USA; [2] Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT 06510, USA; [3] Department of Internal Medicine, Yale School of Medicine, New Haven, CT 06510, USA; [4] Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA; [5] Department of Obstetrics and Gynecology, University of Pittsburgh Medical Center, Pittsburgh, PA 15219, USA; [6] Department of Medicine, Division of Women’s Health, Brigham and Women’s Hospital, Boston, MA 02115, USA; [7] Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT 06510, USA; Abstract: Background: As digital health resources become increasingly prevalent, assessing the quality of information provided by publicly available AI tools is vital for evidence-based patient education. Objective: This study evaluates the accuracy and readability of responses from four large language models (ChatGPT 4.0, ChatGPT 3.5, Google Bard, and Microsoft Bing) in providing contraceptive counseling. Methods: A cross-sectional analysis was conducted using standardized contraception questions, established readability indices, and a panel of blinded OB/GYN physician reviewers comparing model responses to an AAFP benchmark. Results: The models varied in readability and evidence adherence; notably, ChatGPT 3.5 provided more evidence-based responses than GPT-4.0, although all outputs exceeded the recommended 6th-grade reading level. Conclusions: Our findings underscore the need for the further refinement of LLMs to balance clinical accuracy with patient-friendly language, supporting their role as a supplement to clinician counseling.; https://www.mdpi.com/2673-6470/5/2/10; Keywords: contraception, contraceptive counseling, reproductive health, artificial intelligence, large language models, digital health |
| spellingShingle | Anisha V. Patel, Sona Jasani, Abdelrahman AlAshqar, Rushabh H. Doshi, Kanhai Amin, Aisvarya Panakam, Ankita Patil, Sangini S. Sheth; Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling; Digital; contraception, contraceptive counseling, reproductive health, artificial intelligence, large language models, digital health |
| title | Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling |
| title_full | Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling |
| title_fullStr | Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling |
| title_full_unstemmed | Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling |
| title_short | Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling |
| title_sort | comparative evaluation of artificial intelligence models for contraceptive counseling |
| topic | contraception; contraceptive counseling; reproductive health; artificial intelligence; large language models; digital health |
| url | https://www.mdpi.com/2673-6470/5/2/10 |