Menstrual Health Education Using a Specialized Large Language Model in India: Development and Evaluation Study of MenstLLaMA

Abstract Background: The quality and accessibility of menstrual health education (MHE) in low- and middle-income countries, including India, remain inadequate due to persistent challenges (eg, poverty, social stigma, and gender inequality). While community-driven initiatives hav...

Bibliographic Details
Main Authors: Prottay Kumar Adhikary, Isha Motiyani, Gayatri Oke, Maithili Joshi, Kanupriya Pathak, Salam Michael Singh, Tanmoy Chakraborty
Format: Article
Language: English
Published: JMIR Publications 2025-07-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2025/1/e71977
_version_ 1849712800422690816
author Prottay Kumar Adhikary
Isha Motiyani
Gayatri Oke
Maithili Joshi
Kanupriya Pathak
Salam Michael Singh
Tanmoy Chakraborty
author_sort Prottay Kumar Adhikary
collection DOAJ
description Abstract
Background: The quality and accessibility of menstrual health education (MHE) in low- and middle-income countries, including India, remain inadequate due to persistent challenges (eg, poverty, social stigma, and gender inequality). While community-driven initiatives have sought to raise awareness, artificial intelligence offers a scalable and efficient solution for disseminating accurate information. However, existing general-purpose large language models (LLMs) are often ill-suited for this task, tending to exhibit low accuracy, cultural insensitivity, and overly complex responses. To address these limitations, we developed MenstLLaMA, a specialized LLM tailored to the Indian context and designed to deliver MHE empathetically, supportively, and accessibly.
Objective: We aimed to develop and evaluate MenstLLaMA, a specialized LLM tailored to deliver accurate, culturally sensitive MHE, and to assess its effectiveness in comparison to existing general-purpose models.
Methods: We curated MENST, a novel, domain-specific dataset comprising 23,820 question-answer pairs aggregated from medical websites, government portals, and health education resources. This dataset was systematically annotated with metadata capturing age groups, regions, topics, and sociocultural contexts. MenstLLaMA was developed by fine-tuning Meta-LLaMA-3-8B-Instruct, using parameter-efficient fine-tuning with low-rank adaptation to achieve domain alignment while minimizing computational overhead. We benchmarked MenstLLaMA against 9 state-of-the-art general-purpose LLMs, including GPT-4o, Claude-3, Gemini 1.5 Pro, and Mistral. The evaluation followed a multilayered framework: (1) automatic evaluation using standard natural language processing metrics (BLEU [Bilingual Evaluation Understudy], METEOR [Metric for Evaluation of Translation with Explicit Ordering], ROUGE-L [Recall-Oriented Understudy for Gisting Evaluation-Longest Common Subsequence], and BERTScore [Bidirectional Encoder Representations from Transformers Score]); (2) evaluation by clinical experts (N=18), who rated 200 expert-curated queries for accuracy and appropriateness; and (3) medical practitioner interaction through the ISHA (Intelligent System for Menstrual Health Assistance) interactive chatbot, assessing qualitative dimensions (eg, relevance, understandability, preciseness, correctness, context sensitivity).
Results: MenstLLaMA achieved the highest scores in BLEU (0.059) and BERTScore (0.911), outperforming GPT-4o (BLEU: 0.052, BERTScore: 0.896) and Claude-3 (BERTScore: 0.888). Clinical experts preferred MenstLLaMA’s responses over gold-standard answers in several culturally sensitive cases. In medical practitioners’ evaluations using ISHA, the chat interface powered by MenstLLaMA, the model scored 3.5 in relevance [scores for understandability, preciseness, correctness, tone, flow, and context sensitivity are missing from this record].
Conclusions: MenstLLaMA demonstrates exceptional accuracy, empathy, and user satisfaction within the domain of MHE, bridging critical gaps left by general-purpose LLMs. Its potential for integration into broader health education platforms positions it as a transformative tool for menstrual well-being. Future research could explore its long-term impact on public perception and menstrual hygiene practices, while expanding demographic representation, enhancing context sensitivity, and integrating multimodal and voice-based interactions to improve accessibility across diverse user groups.
format Article
id doaj-art-017b374ffc8e4dfeb08382cb271ec094
institution DOAJ
issn 1438-8871
language English
publishDate 2025-07-01
publisher JMIR Publications
record_format Article
series Journal of Medical Internet Research
spelling doaj-art-017b374ffc8e4dfeb08382cb271ec094
2025-08-20T03:14:09Z
eng
JMIR Publications
Journal of Medical Internet Research
1438-8871
2025-07-01
27
e71977
e71977
10.2196/71977
Menstrual Health Education Using a Specialized Large Language Model in India: Development and Evaluation Study of MenstLLaMA
Prottay Kumar Adhikary http://orcid.org/0000-0002-3025-9721
Isha Motiyani http://orcid.org/0009-0006-3578-4644
Gayatri Oke http://orcid.org/0009-0007-5369-9387
Maithili Joshi http://orcid.org/0009-0000-9603-0762
Kanupriya Pathak http://orcid.org/0009-0005-9746-8922
Salam Michael Singh http://orcid.org/0000-0002-2249-6081
Tanmoy Chakraborty http://orcid.org/0000-0002-0210-0369
https://www.jmir.org/2025/1/e71977
title Menstrual Health Education Using a Specialized Large Language Model in India: Development and Evaluation Study of MenstLLaMA
url https://www.jmir.org/2025/1/e71977