IMPROVING ACCURACY OF PREDICTION INTERVALS OF HOUSEHOLD INCOME USING QUANTILE REGRESSION FOREST AND SELECTION OF EXPLANATORY VARIABLES

Bibliographic Details
Main Authors: Asrirawan Asrirawan, Khairil Anwar Notodiputro, Bagus Sartono
Format: Article
Language: English
Published: Universitas Pattimura 2023-12-01
Series: Barekeng
Subjects: household income; quantile regression forest; random forest; prediction interval
Online Access: https://ojs3.unpatti.ac.id/index.php/barekeng/article/view/8974
_version_ 1849405635670573056
author Asrirawan Asrirawan
Khairil Anwar Notodiputro
Bagus Sartono
collection DOAJ
description Quantile regression forest (QRF) is a non-parametric method that uses the random forest algorithm to estimate the conditional distribution function of the response and to construct prediction intervals from its conditional quantiles. However, when the explanatory variables (covariates) are highly correlated, the performance of QRF deteriorates, and the resulting prediction intervals for the response are less accurate. This paper investigates the selection of explanatory variables for QRF under five scenarios (the full model, forward selection, LASSO, ridge regression, and random forest) in order to improve the accuracy of prediction intervals for household income. The data were obtained from the 2021 National Labour Force Survey. The results indicate that, in terms of RMSE, the random forest method outperforms the other methods of explanatory variable selection. Judged by an average coverage just above the 95% target and by the statistical test results, the RF-QRF and Forward-QRF methods outperform the QRF, LASSO-QRF, and Ridge-QRF methods for constructing prediction intervals.
format Article
id doaj-art-fa83d5d06cb9420886143e4f4c87c6fd
institution Kabale University
issn 1978-7227
2615-3017
language English
publishDate 2023-12-01
publisher Universitas Pattimura
record_format Article
series Barekeng
spelling doaj-art-fa83d5d06cb9420886143e4f4c87c6fd
2025-08-20T03:36:37Z
eng
Universitas Pattimura
Barekeng
ISSN 1978-7227, 2615-3017
2023-12-01
Vol. 17, No. 4, pp. 1915-1926
DOI 10.30598/barekengvol17iss4pp1915-1926
8974
IMPROVING ACCURACY OF PREDICTION INTERVALS OF HOUSEHOLD INCOME USING QUANTILE REGRESSION FOREST AND SELECTION OF EXPLANATORY VARIABLES
Asrirawan Asrirawan (Department of Statistics, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia)
Khairil Anwar Notodiputro (Department of Statistics and Data Science, Faculty of Mathematics and Natural Sciences, IPB University)
Bagus Sartono (Department of Statistics and Data Science, Faculty of Mathematics and Natural Sciences, IPB University)
title IMPROVING ACCURACY OF PREDICTION INTERVALS OF HOUSEHOLD INCOME USING QUANTILE REGRESSION FOREST AND SELECTION OF EXPLANATORY VARIABLES
topic household income
quantile regression forest
random forest
prediction interval
url https://ojs3.unpatti.ac.id/index.php/barekeng/article/view/8974
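
The description field of this record names two evaluation criteria (RMSE, and average coverage of 95% prediction intervals) and relies on conditional quantiles from a quantile regression forest. The following is a minimal sketch of those quantities, not the authors' implementation. It assumes that the per-observation forest weights for a query point are already available; the function names (weighted_quantile, prediction_interval, rmse, average_coverage) and the toy numbers are hypothetical and do not come from the article.

```python
import numpy as np


def weighted_quantile(y_train, weights, q):
    """Smallest training response y whose weighted ECDF reaches q.

    In a quantile regression forest the weights would come from how often
    each training observation shares a leaf with the query point; here they
    are simply passed in (assumption).
    """
    y = np.asarray(y_train, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(y)
    y_sorted = y[order]
    cdf = np.cumsum(w[order]) / np.sum(w)
    # First index at which the weighted ECDF is >= q.
    idx = np.searchsorted(cdf, q, side="left")
    return float(y_sorted[min(idx, len(y_sorted) - 1)])


def prediction_interval(y_train, weights, level=0.95):
    """Central prediction interval built from two conditional quantiles."""
    alpha = 1.0 - level
    lower = weighted_quantile(y_train, weights, alpha / 2.0)
    upper = weighted_quantile(y_train, weights, 1.0 - alpha / 2.0)
    return lower, upper


def rmse(y_true, y_pred):
    """Root mean squared error of point predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def average_coverage(y_true, lower, upper):
    """Share of observed responses that fall inside their intervals."""
    y_true = np.asarray(y_true, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (y_true >= lower) & (y_true <= upper)
    return float(np.mean(inside))


if __name__ == "__main__":
    # Toy data, for illustration only.
    y_train = [10.0, 20.0, 30.0, 40.0]
    weights = [0.25, 0.25, 0.25, 0.25]
    print(prediction_interval(y_train, weights, level=0.95))
    print(average_coverage([1, 2, 3, 4], [0, 0, 4, 0], [2, 1, 5, 5]))
```

Under this sketch, an average coverage close to (and slightly above) the nominal level on held-out data is the behaviour the description treats as desirable; comparing coverage across variable-selection scenarios would require fitting a separate forest for each scenario, which is not shown here.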