Learning-Free Unsupervised Extractive Summarization Model

Text summarization is an information condensation technique that abbreviates a source document to a few representative sentences, with the intention of creating a coherent summary containing the relevant information of the source corpora. This promising subject has developed rapidly since the advent of deep learning.

Bibliographic Details
Main Authors: Myeongjun Jang, Pilsung Kang
Format: Article
Language:English
Published: IEEE 2021-01-01
Series:IEEE Access
Subjects: Text summarization; natural language processing; sentence representation vector; integer linear programming
Online Access:https://ieeexplore.ieee.org/document/9321308/
author Myeongjun Jang
Pilsung Kang
collection DOAJ
description Text summarization is an information condensation technique that abbreviates a source document to a few representative sentences, with the intention of creating a coherent summary containing the relevant information of the source corpora. This promising subject has developed rapidly since the advent of deep learning. However, summarization models based on deep neural networks have several critical shortcomings. First, a large amount of labeled training data is necessary. This problem is especially acute for low-resource languages, for which publicly available labeled data do not exist. In addition, significant computational power is required to train neural models with enormous numbers of network parameters. In this study, we propose the Learning-Free Integer Programming Summarizer (LFIP-SUM), an unsupervised extractive summarization model. The advantage of our approach is that parameter training is unnecessary because the model does not require any labeled training data. To achieve this, we formulate an integer programming problem based on pre-trained sentence embedding vectors. We also use principal component analysis to automatically determine the number of sentences to be extracted and to evaluate the importance of each sentence. Experimental results demonstrate that the proposed model achieves generally acceptable performance compared with deep learning summarization models, even though it learns no parameters during model construction.
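The description above outlines the core idea: score sentences with principal component analysis over pre-trained sentence embeddings, use the variance structure to decide how many sentences to extract, and select sentences via integer programming. The following is a minimal, illustrative sketch only, not the paper's formulation: it assumes embeddings are already computed, derives the summary length from an assumed cumulative-variance threshold, and replaces the integer program with a simple top-k selection for brevity. The function name and threshold rule are illustrative assumptions.

```python
import numpy as np

def lfip_sum_sketch(sentence_vectors, variance_threshold=0.8):
    """Illustrative sketch of PCA-based extractive summarization.

    sentence_vectors: (n_sentences, dim) array of pre-trained sentence
    embeddings. Returns indices of selected sentences in document order.
    """
    # Center the embeddings, then run PCA via SVD.
    X = sentence_vectors - sentence_vectors.mean(axis=0)
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    var_ratio = (S ** 2) / np.sum(S ** 2)

    # Assumed rule: extract as many sentences as principal components
    # needed to reach the cumulative variance threshold.
    k = int(np.searchsorted(np.cumsum(var_ratio), variance_threshold) + 1)

    # Sentence importance: total projection magnitude onto the
    # top-k principal directions.
    importance = np.abs(X @ Vt[:k].T).sum(axis=1)

    # Simplified selection (top-k by importance); the paper instead
    # solves an integer program over these scores.
    return sorted(np.argsort(importance)[::-1][:k])
```

Usage: call with a matrix of sentence embeddings from any encoder; the returned indices are the extracted summary sentences in their original order.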
format Article
id doaj-art-e4ac439b04f749eab4b8eb09602bfd3e
institution Kabale University
issn 2169-3536
language English
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Learning-Free Unsupervised Extractive Summarization Model. Myeongjun Jang (ORCID: 0000-0002-9352-4799), Department of Computer Science, University of Oxford, Oxford, U.K.; Pilsung Kang (ORCID: 0000-0001-7663-3937), School of Industrial Management Engineering, Korea University, Seoul, South Korea. IEEE Access, vol. 9, pp. 14358-14368, 2021-01-01. DOI: 10.1109/ACCESS.2021.3051237 (IEEE document 9321308). https://ieeexplore.ieee.org/document/9321308/
title Learning-Free Unsupervised Extractive Summarization Model
topic Text summarization
natural language processing
sentence representation vector
integer linear programming
url https://ieeexplore.ieee.org/document/9321308/