BeliN: A novel corpus for Bengali religious news headline generation using contextual feature fusion
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-06-01 |
| Series: | Natural Language Processing Journal |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2949719125000147 |
| Summary: | Automatic text summarization, particularly headline generation, remains a critical yet under-explored area for Bengali religious news. Existing approaches to headline generation typically rely solely on the article content, overlooking crucial contextual features such as sentiment, category, and aspect. This limitation significantly hinders their effectiveness and overall performance. This study addresses this limitation by introducing a novel corpus, BeliN (Bengali Religious News) – comprising religious news articles from prominent Bangladeshi online newspapers, and MultiGen – a contextual multi-input feature fusion headline generation approach. Leveraging transformer-based pre-trained language models such as BanglaT5, mBART, mT5, and mT0, MultiGen integrates additional contextual features – including category, aspect, and sentiment – with the news content. This fusion enables the model to capture critical contextual information often overlooked by traditional methods. Experimental results demonstrate the superiority of MultiGen over the baseline approach that uses only news content, achieving a BLEU score of 18.61 and ROUGE-L score of 24.19, compared to baseline approach scores of 16.08 and 23.08, respectively. These findings underscore the importance of incorporating contextual features in headline generation for low-resource languages. By bridging linguistic and cultural gaps, this research advances natural language processing for Bengali and other under-represented languages. To promote reproducibility and further exploration, the dataset and implementation code are publicly accessible at https://github.com/akabircs/BeliN. |
|---|---|
| ISSN: | 2949-7191 |
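
The summary describes MultiGen as fusing category, aspect, and sentiment with the news content before headline generation, but this record does not say how the fusion is done. The sketch below shows one simple possibility, assuming the features are concatenated as labeled text fields into a single source string for a sequence-to-sequence model. The names `NewsExample`, `fuse_inputs`, and `content_only`, the field labels, and the ` | ` separator are all hypothetical and are not taken from the BeliN repository.

```python
from dataclasses import dataclass


@dataclass
class NewsExample:
    """One news article with its contextual features (hypothetical schema)."""
    content: str
    category: str
    aspect: str
    sentiment: str


def fuse_inputs(example: NewsExample, sep: str = " | ") -> str:
    """Join contextual features and article text into one source string.

    Assumption: fusion is plain text concatenation with labeled fields.
    The actual MultiGen fusion mechanism may differ.
    """
    fields = [
        ("category", example.category),
        ("aspect", example.aspect),
        ("sentiment", example.sentiment),
        ("content", example.content),
    ]
    # Strip stray whitespace so the separator is the only field delimiter.
    return sep.join(f"{name}: {value.strip()}" for name, value in fields)


def content_only(example: NewsExample) -> str:
    """Baseline-style source string that uses the article text alone."""
    return f"content: {example.content.strip()}"
```

Under this assumption, the fused string would be tokenized and passed to a model such as BanglaT5 or mT5 in place of the raw article text, while `content_only` corresponds to the content-only baseline the summary compares against.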