Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation

Abstract The increasing demand for multilingual capabilities in healthcare technology highlights the critical need for AI solutions capable of handling underrepresented languages, such as Arabic, in clinical documentation. Arabic’s unique linguistic complexities—morphological richness, syntactic variations, and diglossia—present significant challenges for foundational large language models (LLMs), especially in domain-specific tasks like medical summarization. This study introduces AraSum, a domain-specific AI agent built using a novel knowledge distillation framework that transforms large multilingual LLMs into lightweight, task-optimized small language models (SLMs). Leveraging a synthetic dataset of Arabic medical dialogues, AraSum demonstrates superior performance over JAIS-30B, a foundational Arabic LLM, across key evaluation metrics, including BLEU and ROUGE scores. AraSum also outperforms JAIS in Arabic-speaking evaluator assessments of accuracy, comprehensiveness, and clinical utility while maintaining comparable linguistic performance as measured by a modified PDQI-9 inventory. Beyond accuracy, AraSum achieves these results with significantly lower computational and environmental costs, demonstrating the feasibility of deploying resource-efficient AI models in low-resource settings for domain-specific tasks. This work underscores the potential of SLM-based agentic architectures for advancing multilingual healthcare, encouraging sustainable artificial intelligence, and fostering equity in access to care.
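
The abstract names knowledge distillation as the technique that compresses a large multilingual LLM into a task-specific small language model. As a minimal illustration of that general idea only (this record does not describe the paper's actual AraSum training pipeline), the PyTorch sketch below blends a soft-target loss against teacher logits with a hard-label cross-entropy; the temperature, mixing weight, and toy tensors are illustrative assumptions, not values from the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled teacher and student
    # distributions; multiplying by T^2 keeps gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the reference token labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random tensors: a batch of 4 positions over a 100-token vocabulary.
student = torch.randn(4, 100)
teacher = torch.randn(4, 100)
labels = torch.randint(0, 100, (4,))
print(distillation_loss(student, teacher, labels).item())

In practice the temperature and mixing weight are tuned per task; the sketch shows only the shape of the loss, not data preparation, model selection, or the agentic components the study describes.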

Bibliographic Details
Main Authors: Chanseo Lee, Sonu Kumar, Kimon A. Vogt, Muhammad Munshi, Panindhra Tallapudi, Antonia Vogt, Hamzeh Awad, Wasim Khan
Format: Article
Language: English
Published: Nature Portfolio, 2025-07-01
Series: Scientific Reports
Subjects: Artificial Intelligence; Small Language Models (SLMs); Clinical Documentation; Knowledge Distillation; Sustainability in AI; AI Agents
Online Access: https://doi.org/10.1038/s41598-025-10451-x
author Chanseo Lee
Sonu Kumar
Kimon A. Vogt
Muhammad Munshi
Panindhra Tallapudi
Antonia Vogt
Hamzeh Awad
Wasim Khan
author_facet Chanseo Lee
Sonu Kumar
Kimon A. Vogt
Muhammad Munshi
Panindhra Tallapudi
Antonia Vogt
Hamzeh Awad
Wasim Khan
author_sort Chanseo Lee
collection DOAJ
description Abstract The increasing demand for multilingual capabilities in healthcare technology highlights the critical need for AI solutions capable of handling underrepresented languages, such as Arabic, in clinical documentation. Arabic’s unique linguistic complexities—morphological richness, syntactic variations, and diglossia—present significant challenges for foundational large language models (LLMs), especially in domain-specific tasks like medical summarization. This study introduces AraSum, a domain-specific AI agent built using a novel knowledge distillation framework that transforms large multilingual LLMs into lightweight, task-optimized small language models (SLMs). Leveraging a synthetic dataset of Arabic medical dialogues, AraSum demonstrates superior performance over JAIS-30B, a foundational Arabic LLM, across key evaluation metrics, including BLEU and ROUGE scores. AraSum also outperforms JAIS in Arabic-speaking evaluator assessments of accuracy, comprehensiveness, and clinical utility while maintaining comparable linguistic performance as measured by a modified PDQI-9 inventory. Beyond accuracy, AraSum achieves these results with significantly lower computational and environmental costs, demonstrating the feasibility of deploying resource-efficient AI models in low-resource settings for domain-specific tasks. This work underscores the potential of SLM-based agentic architectures for advancing multilingual healthcare, encouraging sustainable artificial intelligence, and fostering equity in access to care.
format Article
id doaj-art-70c2d0122850448e9a38bf76554e4b47
institution Kabale University
issn 2045-2322
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-70c2d0122850448e9a38bf76554e4b47 (2025-08-20T03:45:53Z); eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2025-07-01; 15 (1): 1–10; https://doi.org/10.1038/s41598-025-10451-x
Title: Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation
Authors: Chanseo Lee (Sporo Health); Sonu Kumar (Sporo Health); Kimon A. Vogt (Sporo Health); Muhammad Munshi (Department of Anesthesiology, Yale School of Medicine); Panindhra Tallapudi (Sporo Health); Antonia Vogt (Girton College, University of Cambridge); Hamzeh Awad (Faculty of Allied Medical Sciences, Middle East University); Wasim Khan (Department of Trauma and Orthopedic Surgery, Addenbrooke’s Hospital, University of Cambridge)
Abstract: as given in the description field above
Online access: https://doi.org/10.1038/s41598-025-10451-x
Keywords: Artificial Intelligence; Small Language Models (SLMs); Clinical Documentation; Knowledge Distillation; Sustainability in AI; AI Agents
spellingShingle Chanseo Lee
Sonu Kumar
Kimon A. Vogt
Muhammad Munshi
Panindhra Tallapudi
Antonia Vogt
Hamzeh Awad
Wasim Khan
Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation
Scientific Reports
Artificial Intelligence
Small Language Models (SLMs)
Clinical Documentation
Knowledge Distillation
Sustainability in AI
AI Agents
title Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation
title_full Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation
title_fullStr Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation
title_full_unstemmed Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation
title_short Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation
title_sort democratizing cost effective agentic artificial intelligence to multilingual medical summarization through knowledge distillation
topic Artificial Intelligence
Small Language Models (SLMs)
Clinical Documentation
Knowledge Distillation
Sustainability in AI
AI Agents
url https://doi.org/10.1038/s41598-025-10451-x
work_keys_str_mv AT chanseolee democratizingcosteffectiveagenticartificialintelligencetomultilingualmedicalsummarizationthroughknowledgedistillation
AT sonukumar democratizingcosteffectiveagenticartificialintelligencetomultilingualmedicalsummarizationthroughknowledgedistillation
AT kimonavogt democratizingcosteffectiveagenticartificialintelligencetomultilingualmedicalsummarizationthroughknowledgedistillation
AT muhammadmunshi democratizingcosteffectiveagenticartificialintelligencetomultilingualmedicalsummarizationthroughknowledgedistillation
AT panindhratallapudi democratizingcosteffectiveagenticartificialintelligencetomultilingualmedicalsummarizationthroughknowledgedistillation
AT antoniavogt democratizingcosteffectiveagenticartificialintelligencetomultilingualmedicalsummarizationthroughknowledgedistillation
AT hamzehawad democratizingcosteffectiveagenticartificialintelligencetomultilingualmedicalsummarizationthroughknowledgedistillation
AT wasimkhan democratizingcosteffectiveagenticartificialintelligencetomultilingualmedicalsummarizationthroughknowledgedistillation