CABLE NEWS NETWORK (CNN) ARTICLES CLASSIFICATION USING RANDOM FOREST ALGORITHM WITH HYPERPARAMETER OPTIMIZATION

Bibliographic Details
Main Authors: Dewi Retno Sari Saputro, Krisna Sidiq
Format: Article
Language: English
Published: Universitas Pattimura 2023-06-01
Series: Barekeng
Subjects: classification, hyperparameter, random forest
Online Access: https://ojs3.unpatti.ac.id/index.php/barekeng/article/view/7819
collection DOAJ
description News articles accumulate on the internet rapidly and in large volumes, so they need to be grouped into categories for easy access. Classification is one method for grouping news articles, and random forest, which is built on decision trees, is one such classification method. This research applies random forest to classify news articles into six categories: business, entertainment, health, politics, sport, and news. The data are Cable News Network (CNN) articles from 2011 to 2022. The data are text and large in volume, so careful handling is needed to avoid overfitting and underfitting. Random forest suits the data because the algorithm works very well on large amounts of data. However, random forest is difficult to interpret if the combination of parameters is not appropriate for the data processing. Therefore, hyperparameter optimization is needed to find the best combination of parameters for the random forest. This research uses the search cross-validation (SearchCV) method to optimize the random forest hyperparameters by testing the combinations one by one and validating each. The resulting classification of news articles into six categories has an accuracy of 0.81 on training and 0.76 on testing.
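The search procedure described in the abstract (try each hyperparameter combination in turn and validate it) can be illustrated with a minimal sketch. This is not the authors' code: the record does not say which implementation, grid, or number of folds they used. The corpus `DOCS` is made up, the "forest" is a toy ensemble of one-split trees over word presence, and every name here (`fit_stump`, `fit_forest`, `grid_search`, `GRID`, `n_trees`, `max_features`) is hypothetical and chosen only for illustration.

```python
import itertools
import random
from collections import Counter

# Made-up toy corpus (NOT the CNN data): (text, label) pairs.
DOCS = [
    ("team wins match goal", "sport"),
    ("coach praises team after match", "sport"),
    ("striker scores late goal", "sport"),
    ("league match ends in draw", "sport"),
    ("player injured before final match", "sport"),
    ("team signs new striker", "sport"),
    ("market shares fall sharply", "business"),
    ("bank raises interest rates", "business"),
    ("company profit beats forecast", "business"),
    ("shares rise as market rallies", "business"),
    ("bank reports record profit", "business"),
    ("company shares hit new high", "business"),
]

def majority(labels):
    """Most frequent label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(sample, max_features, rng):
    """One-split tree: pick the best word among a random subset of the vocabulary."""
    vocab = sorted({w for text, _ in sample for w in text.split()})
    candidates = rng.sample(vocab, min(max_features, len(vocab)))
    default = majority([y for _, y in sample])
    best = None
    for word in candidates:
        with_w = [y for text, y in sample if word in text.split()]
        without = [y for text, y in sample if word not in text.split()]
        if not with_w or not without:
            continue  # this word does not split the sample
        yes, no = majority(with_w), majority(without)
        hits = with_w.count(yes) + without.count(no)
        if best is None or hits > best[0]:
            best = (hits, word, yes, no)
    if best is None:
        return (None, default, default)  # no usable split: always predict default
    return best[1:]

def fit_forest(train, n_trees, max_features, seed=0):
    """Train each stump on a bootstrap sample of the training documents."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(train) for _ in train]
        forest.append(fit_stump(sample, max_features, rng))
    return forest

def predict(forest, text):
    """Majority vote over the stumps."""
    words = text.split()
    return majority([yes if word in words else no for word, yes, no in forest])

def cross_val_accuracy(docs, k, **params):
    """Mean accuracy over k folds for one hyperparameter combination."""
    folds = [docs[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        forest = fit_forest(train, **params)
        scores.append(sum(predict(forest, t) == y for t, y in test) / len(test))
    return sum(scores) / k

def grid_search(docs, grid, k=3):
    """Evaluate every combination in the grid and return the best one."""
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((cross_val_accuracy(docs, k, **params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_score, best_params, results

# Hypothetical grid: 3 x 2 = 6 combinations, each validated with 3 folds.
GRID = {"n_trees": [1, 5, 15], "max_features": [2, 8]}
best_score, best_params, results = grid_search(DOCS, GRID)
print(best_params, round(best_score, 2))
```

The cost of an exhaustive search grows with the product of the grid sizes times the number of folds, which is why the grid here is kept small. A real text classifier would also need a proper feature representation and full decision trees rather than single splits.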
format Article
id doaj-art-3e952a296cc84f0bb6d6f7cd9a2c077b
institution Kabale University
issn 1978-7227
2615-3017
spelling doaj-art-3e952a296cc84f0bb6d6f7cd9a2c077b; 2025-08-20T03:37:33Z; eng; Universitas Pattimura; Barekeng; ISSN 1978-7227, 2615-3017; 2023-06-01; vol. 17, no. 2, pp. 847-854; DOI 10.30598/barekengvol17iss2pp0847-0854; CABLE NEWS NETWORK (CNN) ARTICLES CLASSIFICATION USING RANDOM FOREST ALGORITHM WITH HYPERPARAMETER OPTIMIZATION; Dewi Retno Sari Saputro (Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Sebelas Maret, Indonesia); Krisna Sidiq (Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Sebelas Maret, Indonesia); https://ojs3.unpatti.ac.id/index.php/barekeng/article/view/7819; classification; hyperparameter; random forest
topic classification
hyperparameter
random forest