Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation
Abstract The increasing demand for multilingual capabilities in healthcare technology highlights the critical need for AI solutions capable of handling underrepresented languages, such as Arabic, in clinical documentation. Arabic’s unique linguistic complexities—morphological richness, syntactic variations, and diglossia—present significant challenges for foundational large language models (LLMs), especially in domain-specific tasks like medical summarization. This study introduces AraSum, a domain-specific AI agent built using a novel knowledge distillation framework that transforms large multilingual LLMs into lightweight, task-optimized small language models (SLMs). Leveraging a synthetic dataset of Arabic medical dialogues, AraSum demonstrates superior performance over JAIS-30B, a foundational Arabic LLM, across key evaluation metrics, including BLEU and ROUGE scores. AraSum also outperforms JAIS in Arabic-speaking evaluator assessments of accuracy, comprehensiveness, and clinical utility while maintaining comparable linguistic performance as measured by a modified PDQI-9 inventory. Beyond accuracy, AraSum achieves these results with significantly lower computational and environmental costs, demonstrating the feasibility of deploying resource-efficient AI models in low-resource settings for domain-specific tasks. This work underscores the potential of SLM-based agentic architectures for advancing multilingual healthcare, encouraging sustainable artificial intelligence, and fostering equity in access to care.
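The record contains no code, but for readers unfamiliar with the technique named in the title: the framework described above rests on standard soft-label knowledge distillation, in which a small student model is trained to match a large teacher model's output distribution. Below is a minimal PyTorch sketch of that generic objective; the function name, temperature `T`, and mixing weight `alpha` are illustrative assumptions, not details taken from the paper.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic soft-label distillation loss (a sketch, not the paper's implementation).

    student_logits, teacher_logits: (batch, seq_len, vocab) token logits
    labels: (batch, seq_len) gold token ids of the reference summary
    """
    # Soft-target term: KL divergence between temperature-scaled teacher and student
    # distributions, scaled by T^2 so gradient magnitudes stay comparable across T.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Hard-target term: ordinary cross-entropy against the reference tokens.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )

    # Weighted combination: alpha balances imitating the teacher vs. fitting the data.
    return alpha * soft + (1.0 - alpha) * hard
```

Whether a distilled student actually preserves the teacher's summarization quality is what the study's BLEU and ROUGE scores and blinded Arabic-speaking evaluator ratings are measuring.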
| Main Authors: | Chanseo Lee, Sonu Kumar, Kimon A. Vogt, Muhammad Munshi, Panindhra Tallapudi, Antonia Vogt, Hamzeh Awad, Wasim Khan |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Subjects: | Artificial Intelligence; Small Language Models (SLMs); Clinical Documentation; Knowledge Distillation; Sustainability in AI; AI Agents |
| Online Access: | https://doi.org/10.1038/s41598-025-10451-x |
| collection | DOAJ |
|---|---|
| id | doaj-art-70c2d0122850448e9a38bf76554e4b47 |
| institution | Kabale University |
| issn | 2045-2322 |
| author affiliations | Chanseo Lee, Sonu Kumar, Kimon A. Vogt, Panindhra Tallapudi: Sporo Health. Muhammad Munshi: Department of Anesthesiology, Yale School of Medicine. Antonia Vogt: Girton College, University of Cambridge. Hamzeh Awad: Faculty of Allied Medical Sciences, Middle East University. Wasim Khan: Department of Trauma and Orthopedic Surgery, Addenbrooke’s Hospital, University of Cambridge |
| citation | Scientific Reports, Vol. 15, Iss. 1, pp. 1-10 (2025-07-01), https://doi.org/10.1038/s41598-025-10451-x |