Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning
| Main Authors: | Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-05-01 |
| Series: | Journal of Biomedical Semantics |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s13326-025-00329-2 |
| _version_ | 1849311890987024384 |
|---|---|
| author | Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono |
| author_facet | Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono |
| author_sort | Tsaqif Naufal |
| collection | DOAJ |
| description | Abstract. Purpose: Online consumer health forums offer an alternative source of health-related information for internet users seeking specific details that may not be readily available through articles or other one-way communication channels. However, the effectiveness of these forums can be constrained by the limited number of healthcare professionals actively participating, which can slow responses to user inquiries. One potential solution to this issue is the integration of a semi-automatic system. A critical component of such a system is question processing, which often involves sentence recognition (SR), medical entity recognition (MER), and keyphrase extraction (KE) modules. We posit that developing these three modules would enable the system to identify the critical components of a question, thereby facilitating a deeper understanding of it and allowing more effective questions to be re-formulated from the extracted key information. Methods: This work makes two contributions related to these three tasks. First, we expand and publicly release an Indonesian dataset for each task. Second, we establish a baseline for all three tasks in the Indonesian language domain using transformer-based models with nine distinct encoder variants. Our feature studies revealed an interdependence among the three tasks. Consequently, we propose several multi-task learning (MTL) models, in both pairwise and three-way configurations, incorporating parallel and hierarchical architectures. Results: Using chunk-level F1-score, the inter-annotator agreements for the SR, MER, and KE tasks were 88.61%, 64.83%, and 35.01%, respectively. In single-task learning (STL) settings, the best performance for each task was achieved by a different model, with $\textrm{IndoNLU}_{\textrm{LARGE}}$ obtaining the highest average score. These results suggest that a larger model does not always perform better. We also found no indication that either Indonesian or multilingual language models generally performed better on our tasks. In pairwise MTL settings, we found that pairing tasks could outperform the STL baseline for all three tasks. Although we varied the loss weights in our three-way MTL models, we did not identify a consistent pattern. While some configurations improved MER and KE performance, none surpassed the best pairwise MTL model on the SR task. Conclusion: We extended an Indonesian dataset for the SR, MER, and KE tasks, resulting in 1,173 labeled data points split into 773 training, 200 validation, and 200 test instances. We then used transformer-based models to set a baseline for all three tasks. Our MTL experiments suggest that additional information from the other two tasks can help the learning process for the MER and KE tasks, while having only a small effect on the SR task. |
| format | Article |
| id | doaj-art-62b6ed8befa9401ea66e6111123f3119 |
| institution | Kabale University |
| issn | 2041-1480 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | BMC |
| record_format | Article |
| series | Journal of Biomedical Semantics |
| spelling | doaj-art-62b6ed8befa9401ea66e6111123f3119 2025-08-20T03:53:16Z eng BMC Journal of Biomedical Semantics 2041-1480 2025-05-01 16 1 1 18 10.1186/s13326-025-00329-2 Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono (all: Faculty of Computer Science, Universitas Indonesia) https://doi.org/10.1186/s13326-025-00329-2 Consumer health question-answering system; Sentence recognition; Medical entity recognition; Keyphrase extraction; Multi-task learning |
| spellingShingle | Tsaqif Naufal Rahmad Mahendra Alfan Farizki Wicaksono Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning Journal of Biomedical Semantics Consumer health question-answering system Sentence recognition Medical entity recognition Keyphrase extraction Multi-task learning |
| title | Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning |
| title_full | Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning |
| title_fullStr | Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning |
| title_full_unstemmed | Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning |
| title_short | Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning |
| title_sort | sentences entities and keyphrases extraction from consumer health forums using multi task learning |
| topic | Consumer health question-answering system; Sentence recognition; Medical entity recognition; Keyphrase extraction; Multi-task learning |
| url | https://doi.org/10.1186/s13326-025-00329-2 |
| work_keys_str_mv | AT tsaqifnaufal sentencesentitiesandkeyphrasesextractionfromconsumerhealthforumsusingmultitasklearning AT rahmadmahendra sentencesentitiesandkeyphrasesextractionfromconsumerhealthforumsusingmultitasklearning AT alfanfarizkiwicaksono sentencesentitiesandkeyphrasesextractionfromconsumerhealthforumsusingmultitasklearning |
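The abstract reports inter-annotator agreement as chunk-level F1. The following is a minimal sketch of how such a score can be computed, assuming both annotators' labels are encoded as BIO tag sequences over the same tokens (an assumption; the record does not state the encoding). One annotator is treated as the reference, and a chunk counts as agreed only when its label and both boundaries match exactly. The helper names `bio_chunks` and `chunk_f1` and the tag labels `DIS` and `DRUG` are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# A chunk is (label, start token index, end token index exclusive).
Chunk = Tuple[str, int, int]


def bio_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect labeled spans from a BIO tag sequence.

    An I- tag that does not continue an open chunk of the same label is
    treated as the start of a new chunk (a lenient, commonly used convention).
    """
    chunks: Set[Chunk] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open chunk
        prefix, _, kind = tag.partition("-")
        continues = prefix == "I" and kind == label
        if label is not None and not continues:
            chunks.add((label, start, i))  # close the open chunk before token i
            label = None
        if prefix == "B" or (prefix == "I" and not continues):
            label, start = kind, i  # open a new chunk at token i
    return chunks


def chunk_f1(reference: List[str], predicted: List[str]) -> float:
    """Chunk-level F1: a chunk counts only if its label and both boundaries match."""
    ref, pred = bio_chunks(reference), bio_chunks(predicted)
    matched = len(ref & pred)
    if matched == 0:
        return 0.0  # convention: no matching chunks (including two empty sets) scores 0
    precision = matched / len(pred)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations of the same five tokens by two annotators.
annotator_a = ["B-DIS", "I-DIS", "O", "B-DRUG", "O"]
annotator_b = ["B-DIS", "I-DIS", "O", "O", "B-DRUG"]
print(chunk_f1(annotator_a, annotator_b))  # 0.5: one of two chunks agrees exactly
```

Swapping the reference and the prediction only swaps precision and recall, so this F1 is symmetric and it does not matter which annotator is treated as the reference. Under exact matching, a span that is shifted by even one token (as with the `DRUG` chunk above) earns no credit.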