Performance of Machine Learning Algorithms on Automatic Summarization of Indonesian Language Texts

Automatic text summarization (ATS) has become an essential task for processing huge amounts of information efficiently. ATS has been extensively studied in resource-rich languages like English, but research on summarization for under-resourced languages, such as Bahasa Indonesia, is still limited. I...

Bibliographic Details
Main Authors: Galih Wiratmoko, Husni Thamrin, Endang Wahyu Pamungkas
Format: Article
Language: English
Published: Department of Informatics, UIN Sunan Gunung Djati Bandung 2025-05-01
Series: JOIN: Jurnal Online Informatika
Subjects: abstractive algorithms; bahasa indonesia; hybrid model; t5-model; text summarization
Online Access: https://join.if.uinsgd.ac.id/index.php/join/article/view/1506
_version_ 1849330260684832768
author Galih Wiratmoko
Husni Thamrin
Endang Wahyu Pamungkas
author_facet Galih Wiratmoko
Husni Thamrin
Endang Wahyu Pamungkas
author_sort Galih Wiratmoko
collection DOAJ
description Automatic text summarization (ATS) has become an essential task for processing large amounts of information efficiently. ATS has been studied extensively for resource-rich languages such as English, but research on summarization for under-resourced languages such as Bahasa Indonesia remains limited. Indonesian presents unique linguistic challenges, including its agglutinative structure, borrowed vocabulary, and limited availability of high-quality training data. This study conducts a comparative evaluation of extractive, abstractive, and hybrid models for Indonesian text summarization using the IndoSum dataset, which contains 20,000 text-summary pairs. We tested several models, including LSA (Latent Semantic Analysis), LexRank, T5, and BART, to assess their effectiveness in generating summaries. The results show that the LexRank+BERT hybrid model outperforms traditional extractive methods, achieving better ROUGE precision, recall, and F-measure scores. Among the abstractive methods, the T5-Large model performed best, producing more coherent and semantically rich summaries than the other models. These findings suggest that hybrid and abstractive approaches are better suited to Indonesian text summarization, especially when leveraging large-scale pre-trained language models.
format Article
id doaj-art-db3aaab93ec54dd7abc2db6192876b57
institution Kabale University
issn 2528-1682
2527-9165
language English
publishDate 2025-05-01
publisher Department of Informatics, UIN Sunan Gunung Djati Bandung
record_format Article
series JOIN: Jurnal Online Informatika
spelling doaj-art-db3aaab93ec54dd7abc2db6192876b57
2025-08-20T03:46:58Z
eng
Department of Informatics, UIN Sunan Gunung Djati Bandung
JOIN: Jurnal Online Informatika
2528-1682
2527-9165
2025-05-01
10
1
196
204
10.15575/join.v10i1.1506
1511
Performance of Machine Learning Algorithms on Automatic Summarization of Indonesian Language Texts
Galih Wiratmoko 0
Husni Thamrin 1 https://orcid.org/0000-0001-5865-9113
Endang Wahyu Pamungkas 2
Department of Informatics, Universitas Muhammadiyah Madiun
Department of Informatic, Universitas Muhammadiyah Surakarta
Department of Informatic, Universitas Muhammadiyah Surakarta
Automatic text summarization (ATS) has become an essential task for processing large amounts of information efficiently. ATS has been studied extensively for resource-rich languages such as English, but research on summarization for under-resourced languages such as Bahasa Indonesia remains limited. Indonesian presents unique linguistic challenges, including its agglutinative structure, borrowed vocabulary, and limited availability of high-quality training data. This study conducts a comparative evaluation of extractive, abstractive, and hybrid models for Indonesian text summarization using the IndoSum dataset, which contains 20,000 text-summary pairs. We tested several models, including LSA (Latent Semantic Analysis), LexRank, T5, and BART, to assess their effectiveness in generating summaries. The results show that the LexRank+BERT hybrid model outperforms traditional extractive methods, achieving better ROUGE precision, recall, and F-measure scores. Among the abstractive methods, the T5-Large model performed best, producing more coherent and semantically rich summaries than the other models. These findings suggest that hybrid and abstractive approaches are better suited to Indonesian text summarization, especially when leveraging large-scale pre-trained language models.
https://join.if.uinsgd.ac.id/index.php/join/article/view/1506
abstractive algorithms
bahasa indonesia
hybrid model
t5-model
text summarization
spellingShingle Galih Wiratmoko
Husni Thamrin
Endang Wahyu Pamungkas
Performance of Machine Learning Algorithms on Automatic Summarization of Indonesian Language Texts
JOIN: Jurnal Online Informatika
abstractive algorithms
bahasa indonesia
hybrid model
t5-model
text summarization
title Performance of Machine Learning Algorithms on Automatic Summarization of Indonesian Language Texts
title_full Performance of Machine Learning Algorithms on Automatic Summarization of Indonesian Language Texts
title_fullStr Performance of Machine Learning Algorithms on Automatic Summarization of Indonesian Language Texts
title_full_unstemmed Performance of Machine Learning Algorithms on Automatic Summarization of Indonesian Language Texts
title_short Performance of Machine Learning Algorithms on Automatic Summarization of Indonesian Language Texts
title_sort performance of machine learning algorithms on automatic summarization of indonesian language texts
topic abstractive algorithms
bahasa indonesia
hybrid model
t5-model
text summarization
url https://join.if.uinsgd.ac.id/index.php/join/article/view/1506
work_keys_str_mv AT galihwiratmoko performanceofmachinelearningalgorithmsonautomaticsummarizationofindonesianlanguagetexts
AT husnithamrin performanceofmachinelearningalgorithmsonautomaticsummarizationofindonesianlanguagetexts
AT endangwahyupamungkas performanceofmachinelearningalgorithmsonautomaticsummarizationofindonesianlanguagetexts