Classification of Toraja, Batak and Ambon Languages using Decision Tree and Gradient Boost methods

Bibliographic Details
Main Authors: Bileam Mangalla, Suharyadi Suharyadi
Format: Article
Language: Indonesian
Published: Islamic University of Indragiri 2025-05-01
Series: Sistemasi: Jurnal Sistem Informasi
Online Access: https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/5100
Description
Summary: Indonesia, with its rich diversity of ethnicities, cultures, races, and religions, has one of the highest numbers of regional languages in the world. This linguistic diversity often creates communication challenges, particularly when conveying information or holding text-based conversations. This study aims to identify and classify the Toraja, Batak, and Ambon languages using machine learning methods. It applies the Decision Tree and Gradient Boost algorithms and evaluates the accuracy of each model. Both are effective at language identification, with accuracy above 77%. Based on the confusion matrix analysis, Gradient Boost is the more effective of the two, reaching 81.06% accuracy against 78.39% for the Decision Tree. These findings suggest that Gradient Boost performs better for classifying these regional languages.
ISSN: 2302-8149, 2540-9719
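
The summary reports accuracy figures derived from a confusion matrix. As a minimal sketch of how such a figure is typically computed, the snippet below divides the diagonal (correct predictions) by the total number of samples. The function name `accuracy_from_confusion` and the counts in `example` are hypothetical and chosen only for illustration; they are not taken from the article, whose actual matrices are not shown in this record.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correctly classified samples (diagonal) / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    return correct / total


# Hypothetical counts, NOT the article's data.
# Rows are true labels, columns are predicted labels,
# both in the assumed order: Toraja, Batak, Ambon.
example = [
    [8, 1, 1],
    [2, 7, 1],
    [1, 1, 8],
]

acc = accuracy_from_confusion(example)
print(f"accuracy: {acc:.2%}")
```

Comparing two classifiers, as the study does, would amount to calling the function once per model's confusion matrix and comparing the two results.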