How to Handle Data Imbalance and Feature Selection Problems in CNN-Based Stock Price Forecasting

Bibliographic Details
Main Authors:	Zinnet Duygu Aksehir, Erdal Kilic
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	CNN model; feature selection; labeling; stock prediction
Online Access:	https://ieeexplore.ieee.org/document/9738619/

Collection:	DOAJ
Institution:	OA Journals
Record ID:	doaj-art-75ab1270b7fb48259b8e79cdf72aeb41
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2022.3160797
Volume:	10
Pages:	31297-31305
Author Details:	Zinnet Duygu Aksehir (https://orcid.org/0000-0002-6834-6847) and Erdal Kilic (https://orcid.org/0000-0003-1585-0991), both at the Department of Computer Engineering, Ondokuz Mayıs University, Samsun, Turkey
Description:	Stock market forecasting is a time series problem that aims to predict future prices or price directions of an index or stock. Stock data is highly uncertain and influenced by many factors, so this goal is hard to achieve with traditional time series methods. In the literature, convolutional neural network (CNN) models have been used for stock market forecasting with successful results. However, these models suffer from data imbalance caused by labeling and from feature selection problems. This study therefore proposes a new rule-based labeling algorithm and a new feature selection approach to address these issues. To check the effectiveness of both, a CNN-based model was constructed to predict the next day’s trade action for stocks in the Dow30 index. Different image-based input variable sets were created from technical indicators and from gold and oil price data to feed the CNN model. The prediction performance of the CNN models was compared with other studies in the literature. The experimental results showed that the CNN prediction model using the proposed feature selection and labeling approaches achieves 3-22% higher accuracy than the CNN-based models in other studies. The proposed labeling approach also handles the stock data imbalance problem more successfully than Chen and Huang’s data weighting approach: it reduced the ratio between labeled data classes from 15 times to 1.8 times.
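
The description reports that the labeling algorithm reduced the ratio between labeled data from 15 times to 1.8 times. As a rough illustration of what such a ratio measures, the Python sketch below divides the largest class count by the smallest. It assumes three trade-action classes (Buy, Sell, Hold) and uses made-up label counts chosen only to reproduce the two reported ratios; the class names, the counts, and the helper `imbalance_ratio` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the largest class count divided by the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical label counts, chosen only to reproduce the reported ratios.
before = ["Hold"] * 1500 + ["Buy"] * 100 + ["Sell"] * 100
after = ["Hold"] * 900 + ["Buy"] * 500 + ["Sell"] * 600

print(imbalance_ratio(before))  # 15.0
print(imbalance_ratio(after))   # 1.8
```

A ratio close to 1 means the classes are nearly balanced, so a drop from 15 to 1.8 indicates that the majority class dominates the training set far less after relabeling.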

Similar Items