PEGASUS-XL with saliency-guided scoring and long-input encoding for multi-document abstractive summarization

Abstract With the exponential growth of digital content, Multi-Document Summarization (MDS) has become increasingly critical for synthesizing dispersed information into coherent and contextually relevant summaries. This paper presents PEGASUS-XL, an enhanced abstractive summarization framework that...

Bibliographic Details
Main Authors:	Rawan Alsultan, Alaa Sagheer, Hala Hamdoun, Lamya Alshamlan, Latifah Alfadhli
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-07-01
Series:	Scientific Reports
Subjects:	Natural language processing; Abstractive summarization; Multi-document summarization; Saliency modeling; TF-IDF; SBERT embeddings
Online Access:	https://doi.org/10.1038/s41598-025-11062-2

author	Rawan Alsultan; Alaa Sagheer; Hala Hamdoun; Lamya Alshamlan; Latifah Alfadhli
collection	DOAJ
description	Abstract With the exponential growth of digital content, Multi-Document Summarization (MDS) has become increasingly critical for synthesizing dispersed information into coherent and contextually relevant summaries. This paper presents PEGASUS-XL, an enhanced abstractive summarization framework that addresses key challenges in MDS, including salient content selection, redundancy reduction, factual consistency, and input length limitations. PEGASUS-XL is developed through a structured enhancement pipeline that integrates lexical-semantic saliency modeling with long-input encoding. It employs a hybrid scoring mechanism that combines TF-IDF and SBERT representations, modulated by a document-aware adaptive weighting scheme to dynamically balance lexical and semantic importance. To promote diversity and reduce redundancy, Maximal Marginal Relevance (MMR) is applied during content selection. To overcome the 1024-token limitation of standard Transformer models, Longformer is incorporated to enable efficient sparse attention over extended contexts. The vanilla PEGASUS model serves as the decoder and is fine-tuned on saliency-ranked, Longformer-encoded inputs to generate abstractive summaries. Extensive experiments on the Multi-News and XSum datasets demonstrate that PEGASUS-XL consistently outperforms strong baselines, including BART and PRIMERA, across multiple evaluation metrics (ROUGE, METEOR, BERTScore, and SBERT similarity). Ablation studies quantify the contribution of each component, and detailed error analysis identifies remaining issues such as factual drift and residual redundancy. Human evaluations further confirm that PEGASUS-XL produces summaries that are more coherent, informative, and faithful. Efficiency profiling shows that the framework achieves substantial quality gains without incurring disproportionate computational costs. Together, these contributions position PEGASUS-XL as a robust, scalable, and extensible solution for high-quality abstractive summarization in real-world multi-document scenarios.
format	Article
id	doaj-art-e2ead9ce80184e8c91974fd807dec30e
institution	DOAJ
issn	2045-2322
language	English
publishDate	2025-07-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-e2ead9ce80184e8c91974fd807dec30e; 2025-08-20T03:04:34Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-07-01; 151128; 10.1038/s41598-025-11062-2; PEGASUS-XL with saliency-guided scoring and long-input encoding for multi-document abstractive summarization; Rawan Alsultan (0), Alaa Sagheer (1), Hala Hamdoun (2), Lamya Alshamlan (3), Latifah Alfadhli (4); affiliations 0-4: Department of Computer Science, College of Computer Sciences and Information Technology, King Faisal University; https://doi.org/10.1038/s41598-025-11062-2; Natural language processing; Abstractive summarization; Multi-document summarization; Saliency modeling; TF-IDF; SBERT embeddings
title	PEGASUS-XL with saliency-guided scoring and long-input encoding for multi-document abstractive summarization
topic	Natural language processing; Abstractive summarization; Multi-document summarization; Saliency modeling; TF-IDF; SBERT embeddings
url	https://doi.org/10.1038/s41598-025-11062-2
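
The content-selection stage named in the abstract (a hybrid lexical-semantic score followed by Maximal Marginal Relevance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed mixing weight `alpha` (standing in for the paper's document-aware adaptive weighting, which this record does not specify), the trade-off `lam`, and the toy vectors used in place of real SBERT embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_scores(lexical, semantic, alpha=0.5):
    """Blend per-sentence lexical (e.g. TF-IDF) and semantic (e.g. SBERT) scores.

    alpha is a hypothetical fixed weight; the paper describes an adaptive,
    document-aware scheme whose exact form is not given in this record.
    """
    return [alpha * l + (1.0 - alpha) * s for l, s in zip(lexical, semantic)]


def mmr_select(scores, vectors, k, lam=0.7):
    """Greedy MMR: pick up to k sentence indices, trading saliency against redundancy.

    scores  -- saliency score per sentence
    vectors -- sentence embeddings (assumed precomputed, e.g. by an SBERT model)
    lam     -- hypothetical trade-off: 1.0 ranks by saliency only, lower values
               penalise similarity to already selected sentences more strongly
    """
    selected = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:

        def mmr(i):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * scores[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
toy_vectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
toy_scores = hybrid_scores([0.9, 0.8, 0.5], [0.9, 0.9, 0.5], alpha=0.5)
chosen = mmr_select(toy_scores, toy_vectors, k=2, lam=0.5)
```

In the toy example the second-highest-scoring sentence is skipped because it nearly duplicates the first pick, so the less salient but distinct sentence is selected instead. In a real pipeline the selected sentences would then be passed on to the long-input encoder.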

Similar Items