Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning

Abstract Purpose Online consumer health forums offer an alternative source of health-related information for internet users seeking specific details that may not be readily available through articles or other one-way communication channels. However, the effectiveness of these forums can be constrain...

Bibliographic Details
Main Authors:	Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono
Format:	Article
Language:	English
Published:	BMC 2025-05-01
Series:	Journal of Biomedical Semantics
Subjects:	Consumer health question-answering system, Sentence recognition, Medical entity recognition, Keyphrase extraction, Multi-task learning
Online Access:	https://doi.org/10.1186/s13326-025-00329-2

_version_	1849311890987024384
author	Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono
collection	DOAJ
description	Abstract Purpose: Online consumer health forums offer an alternative source of health-related information for internet users seeking specific details that may not be readily available through articles or other one-way communication channels. However, the effectiveness of these forums can be constrained by the limited number of healthcare professionals actively participating, which can impact response times to user inquiries. One potential solution to this issue is the integration of a semi-automatic system. A critical component of such a system is question processing, which often involves sentence recognition (SR), medical entity recognition (MER), and keyphrase extraction (KE) modules. We posit that the development of these three modules would enable the system to identify critical components of the question, thereby facilitating a deeper understanding of the question and allowing for the re-formulation of more effective questions with extracted key information. Methods: This work contributes to two key aspects related to these three tasks. First, we expand and publicly release an Indonesian dataset for each task. Second, we establish a baseline for all three tasks within the Indonesian language domain by employing transformer-based models with nine distinct encoder variations. Our feature studies revealed an interdependence among these three tasks. Consequently, we propose several multi-task learning (MTL) models, both in pairwise and three-way configurations, incorporating parallel and hierarchical architectures. Results: Using F1-score at the chunk level, the inter-annotator agreements for the SR, MER, and KE tasks were 88.61%, 64.83%, and 35.01%, respectively. In single-task learning (STL) settings, the best performance for each task was achieved by a different model, with $\textrm{IndoNLU}_{\textrm{LARGE}}$ obtaining the highest average score. These results suggested that a larger model did not always perform better. We also found no indication of whether Indonesian or multilingual language models generally performed better for our tasks. In pairwise MTL settings, we found that pairing tasks could outperform the STL baseline for all three tasks. Despite varying loss weights across our three-way MTL models, we did not identify a consistent pattern. While some configurations improved MER and KE performance, none surpassed the best pairwise MTL model for the SR task. Conclusion: We extended an Indonesian dataset for the SR, MER, and KE tasks, resulting in 1,173 labeled data points split into 773 training instances, 200 validation instances, and 200 testing instances. We then used transformer-based models to set a baseline for all three tasks. Our MTL experiments suggested that additional information regarding the other two tasks could help the learning process for the MER and KE tasks, while having only a small effect on the SR task.
format	Article
id	doaj-art-62b6ed8befa9401ea66e6111123f3119
institution	Kabale University
issn	2041-1480
language	English
publishDate	2025-05-01
publisher	BMC
record_format	Article
series	Journal of Biomedical Semantics
spelling	doaj-art-62b6ed8befa9401ea66e6111123f3119; 2025-08-20T03:53:16Z; eng; BMC; Journal of Biomedical Semantics; 2041-1480; 2025-05-01; 161118; 10.1186/s13326-025-00329-2; Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning; Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono (each listed with Faculty of Computer Science, Universitas Indonesia); https://doi.org/10.1186/s13326-025-00329-2
title	Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning
topic	Consumer health question-answering system, Sentence recognition, Medical entity recognition, Keyphrase extraction, Multi-task learning
url	https://doi.org/10.1186/s13326-025-00329-2
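
The abstract in the description field reports inter-annotator agreement as F1-score at the chunk level. This record does not say how chunks were encoded or matched, so the following is only a minimal sketch under stated assumptions: annotations are BIO tag sequences, a chunk counts as agreed only when its label and both boundaries match exactly, and one annotator is treated as the reference. The function names and the `DISEASE` and `DRUG` labels are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Chunk = Tuple[str, int, int]  # (label, start index, end index exclusive)


def bio_to_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect labeled spans from a BIO tag sequence (assumed encoding)."""
    chunks: Set[Chunk] = set()
    label, start = None, 0
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            chunks.add((label, start, i))
            label = None
        # Open a chunk on "B-"; an orphan "I-" is leniently treated as a start.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return chunks


def chunk_f1(reference: List[str], other: List[str]) -> float:
    """Exact-match chunk-level F1 between two annotations of the same tokens."""
    ref_chunks, other_chunks = bio_to_chunks(reference), bio_to_chunks(other)
    if not ref_chunks and not other_chunks:
        return 1.0  # both annotators marked nothing: full agreement
    matched = len(ref_chunks & other_chunks)
    precision = matched / len(other_chunks) if other_chunks else 0.0
    recall = matched / len(ref_chunks) if ref_chunks else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations of one five-token sentence.
annotator_a = ["B-DISEASE", "I-DISEASE", "O", "B-DRUG", "O"]
annotator_b = ["B-DISEASE", "I-DISEASE", "O", "O", "B-DRUG"]
print(chunk_f1(annotator_a, annotator_b))  # one of two chunks agrees
```

Under exact matching, a boundary shifted by a single token counts as a full disagreement. That strictness is one plausible reason agreement could be much lower for a task with loosely bounded spans, though the record itself does not give the cause of the KE figure.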


Similar Items