CABLE NEWS NETWORK (CNN) ARTICLES CLASSIFICATION USING RANDOM FOREST ALGORITHM WITH HYPERPARAMETER OPTIMIZATION
| Main Authors: | Dewi Retno Sari Saputro, Krisna Sidiq |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Universitas Pattimura, 2023-06-01 |
| Series: | Barekeng |
| Subjects: | classification, hyperparameter, random forest |
| Online Access: | https://ojs3.unpatti.ac.id/index.php/barekeng/article/view/7819 |
| author | Dewi Retno Sari Saputro, Krisna Sidiq |
|---|---|
| collection | DOAJ |
| description | News articles on the internet accumulate rapidly and in large volumes, so they need to be grouped into categories for easy access. One way to group news articles is classification, and one classification method is random forest, which is built from decision trees. This research applies random forest to classify news articles into six categories: business, entertainment, health, politics, sport, and news. The data are Cable News Network (CNN) articles from 2011 to 2022. Because the data are textual and large in volume, careful handling is needed to avoid overfitting and underfitting. Random forest is well suited to these data because the algorithm performs well on large datasets. However, a random forest is difficult to interpret when its combination of parameters is not appropriate for the data being processed. Therefore, hyperparameter optimization is needed to find the best combination of parameters for the random forest. This research uses the search cross-validation (SearchCV) method to optimize the random forest's hyperparameters by testing the combinations one by one and validating each of them. The resulting classifier assigns news articles to the six categories with an accuracy of 0.81 on the training data and 0.76 on the testing data. |
| format | Article |
| id | doaj-art-3e952a296cc84f0bb6d6f7cd9a2c077b |
| institution | Kabale University |
| issn | 1978-7227, 2615-3017 |
| language | English |
| publishDate | 2023-06-01 |
| publisher | Universitas Pattimura |
| record_format | Article |
| series | Barekeng |
| doi | 10.30598/barekengvol17iss2pp0847-0854 |
| volume / issue / pages | 17 / 2 / 847-854 |
| affiliation | Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Sebelas Maret, Indonesia (both authors) |
| title | CABLE NEWS NETWORK (CNN) ARTICLES CLASSIFICATION USING RANDOM FOREST ALGORITHM WITH HYPERPARAMETER OPTIMIZATION |
| topic | classification, hyperparameter, random forest |
| url | https://ojs3.unpatti.ac.id/index.php/barekeng/article/view/7819 |
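
The optimization step described in the abstract (trying hyperparameter combinations one by one and validating each with cross-validation) can be sketched as follows. This is a minimal, hedged sketch, not the authors' code: it assumes scikit-learn's `GridSearchCV` as the "SearchCV" method, a TF-IDF text representation, and a tiny hypothetical corpus (`TOPIC_WORDS`, `make_toy_corpus`) standing in for the CNN articles. The parameter grid is illustrative only; the record does not state which parameters or values were searched.

```python
# Hedged sketch (assumption): scikit-learn GridSearchCV over a
# TF-IDF + RandomForestClassifier pipeline. The corpus is hypothetical
# and only stands in for the CNN articles used in the paper.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical keywords for the six categories named in the abstract.
TOPIC_WORDS = {
    "business": "market stocks profit company shares economy",
    "entertainment": "film movie actor music concert celebrity",
    "health": "doctor vaccine hospital disease patients medicine",
    "politics": "election senate president vote congress policy",
    "sport": "football match goal team coach league",
    "news": "police storm city report officials weather",
}


def make_toy_corpus(docs_per_class=6):
    """Build short synthetic 'articles' by rotating each category's keywords."""
    texts, labels = [], []
    for label, words in TOPIC_WORDS.items():
        tokens = words.split()
        for i in range(docs_per_class):
            shift = i % len(tokens)
            rotated = tokens[shift:] + tokens[:shift]
            texts.append(" ".join(rotated[:4]))
            labels.append(label)
    return texts, labels


texts, labels = make_toy_corpus()
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Text -> TF-IDF features -> random forest (an ensemble of decision trees).
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Illustrative candidate values (hypothetical, not the paper's grid).
param_grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 10],
    "rf__min_samples_split": [2, 4],
}

# Every combination is scored with stratified 3-fold cross-validation;
# with refit=True (the default) the best one is retrained on all training data.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

train_acc = search.score(X_train, y_train)
test_acc = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

With real data, `texts` and `labels` would come from the article corpus, and the grid would list the values actually being compared. Comparing training accuracy with testing accuracy, as the abstract does (0.81 vs. 0.76), is one simple way to check for the overfitting it mentions.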