A Multi-Modal Approach Using a Hybrid Vision Transformer and Temporal Fusion Transformer Model for Stock Price Movement Classification

Bibliographic Details
Main Authors: Ibanga Kpereobong Friday, Sarada Prasanna Pati, Debahuti Mishra
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Candlestick; histogram of oriented gradients (HOG); stock price movement; multi-modal; temporal fusion transformer (TFT); vision transformer (ViT)
Online Access: https://ieeexplore.ieee.org/document/11080418/
author Ibanga Kpereobong Friday
Sarada Prasanna Pati
Debahuti Mishra
collection DOAJ
description Stock price movement classification primarily focuses on accurately identifying buy and sell signals, enabling traders to maximize profits through well-timed market entry and exit positions. This study presents and implements a multi-modal deep learning approach to classifying stock price movement. Our approach captures potential price reversals or continuations by utilizing two modalities: candlestick chart patterns and historical price data. Specifically, the proposed framework converts the historical data into candlestick charts rendered as $256\times 256$-pixel images, within which both modalities are integrated and processed. A key innovation is the application of the histogram of oriented gradients (HOG) to extract relevant descriptors, including candlestick colour, body-to-wick proportions, and wick size. Concurrently, the vision transformer (ViT) model processes the images by splitting them into non-overlapping $16\times 16$-pixel patches, which are treated as input tokens, and applying an embedded projection and multi-head self-attention to extract salient spatial features. The temporal fusion transformer (TFT) model then processes the historical features, candlestick chart features, and the extracted HOG features via a decision-level (late feature fusion) strategy that concatenates these inputs to predict short-term price movements over different horizons (1, 3, 7, and 10 days ahead). We systematically evaluate model performance using a time series cross-validation split to demonstrate the proposed model's efficacy and generalization across eight indices (BSE, IXIC, N225, NIFTY-50, NSE-30, NYSE, S&P 500, and SSE). The results demonstrate the superior performance of our multi-modal approach, achieving average accuracy, precision, recall, and Matthews correlation coefficient (MCC) of 96.17%, 96.24%, 96.15%, and 0.9367, respectively, across all evaluated indices. Furthermore, using a real-time trading simulation, the study assesses the practical implications of different window sizes (5, 10, and 15 days). A paired t-test is also conducted to statistically validate the proposed model against benchmarks. The analysis provides valuable insights into how short- and long-term traders can most effectively leverage the proposed model, highlighting its adaptability for real-world applications.
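The fusion pipeline described above can be illustrated with a minimal sketch. It assumes scikit-image for the HOG descriptors and PyTorch for the ViT-style patch tokenization and late-fusion classifier; the module names (PatchEmbedding, LateFusionHead), the embedding width, the attention head count, and the HOG cell and block sizes are illustrative assumptions rather than the authors' published configuration, and the temporal fusion transformer branch is abstracted here as a precomputed historical feature vector.

import torch
import torch.nn as nn
from skimage.feature import hog

def hog_descriptors(chart_gray):
    # HOG features from a 256x256 grayscale candlestick chart (cell/block sizes are assumptions).
    return hog(chart_gray, orientations=9, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2), feature_vector=True)

class PatchEmbedding(nn.Module):
    # ViT-style tokenizer: a 256x256 RGB chart is split into non-overlapping 16x16 patches,
    # giving 256 tokens that are projected and passed through multi-head self-attention.
    def __init__(self, embed_dim=128, num_heads=4):
        super().__init__()
        self.proj = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, img):                                 # img: (B, 3, 256, 256)
        tokens = self.proj(img).flatten(2).transpose(1, 2)  # (B, 256, embed_dim)
        attended, _ = self.attn(tokens, tokens, tokens)     # multi-head self-attention
        return attended.mean(dim=1)                         # pooled spatial feature (B, embed_dim)

class LateFusionHead(nn.Module):
    # Decision-level (late) fusion: concatenate historical, ViT, and HOG features, then classify.
    def __init__(self, hist_dim, vit_dim, hog_dim):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hist_dim + vit_dim + hog_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 2))                               # logits for downward / upward movement

    def forward(self, hist_feat, vit_feat, hog_feat):
        fused = torch.cat([hist_feat, vit_feat, hog_feat], dim=-1)
        return self.classifier(fused)

# Shape check with random inputs (8100 is the HOG length for the settings above).
vit = PatchEmbedding()
head = LateFusionHead(hist_dim=16, vit_dim=128, hog_dim=8100)
logits = head(torch.randn(4, 16), vit(torch.randn(4, 3, 256, 256)), torch.randn(4, 8100))

In the study itself, one such classifier would be trained per horizon (1, 3, 7, and 10 days ahead) and evaluated with time-series cross-validation, accuracy, precision, recall, and MCC, with a paired t-test against the benchmark models.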
format Article
id doaj-art-cc13de7a643d4d03beab6e1136874af0
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
Published in IEEE Access, vol. 13, pp. 127221-127239, 2025, Art. no. 11080418. DOI: 10.1109/ACCESS.2025.3589063. Author affiliations: Department of Computer Science and Engineering, Siksha 'O' Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India (all three authors). ORCIDs: Ibanga Kpereobong Friday, 0000-0002-9012-1679; Debahuti Mishra, 0000-0002-6827-6121.
topic Candlestick
histogram of oriented gradients (HOG)
stock price movement
multi-modal
temporal fusion transformer (TFT)
vision transformer (ViT)
url https://ieeexplore.ieee.org/document/11080418/