A Multi-Modal Approach Using a Hybrid Vision Transformer and Temporal Fusion Transformer Model for Stock Price Movement Classification
Stock market price movement prediction primarily focuses on accurately classifying buy and sell signals, which enables traders to maximize profits through well-timed market entry and exit positions. This study presents and implements a multi-modal deep learning approach to classifying stock price movement...
| Main Authors: | Ibanga Kpereobong Friday, Sarada Prasanna Pati, Debahuti Mishra |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Candlestick; histogram of oriented gradients (HOG); stock price movement; multi-modal; temporal fusion transformer (TFT); vision transformer (ViT) |
| Online Access: | https://ieeexplore.ieee.org/document/11080418/ |
| Field | Value |
|---|---|
| _version_ | 1850071031641800704 |
| author | Ibanga Kpereobong Friday; Sarada Prasanna Pati; Debahuti Mishra |
| author_facet | Ibanga Kpereobong Friday; Sarada Prasanna Pati; Debahuti Mishra |
| author_sort | Ibanga Kpereobong Friday |
| collection | DOAJ |
| description | Stock market price movement prediction primarily focuses on accurately classifying buy and sell signals, which enables traders to maximize profits through well-timed market entry and exit positions. This study presents and implements a multi-modal deep learning approach to classifying stock price movement. Our approach captures potential price reversals or continuations by utilizing two modalities: candlestick chart patterns and historical price data. Specifically, the proposed framework converts the historical data into 256×256-pixel candlestick chart images, so that both modalities can be integrated and processed together. A key innovation is the application of the histogram of oriented gradients (HOG) to extract relevant descriptors, including candlestick colour, body-to-wick proportions, and wick size. Concurrently, a vision transformer (ViT) processes the images using an embedding projection and multi-head self-attention, extracting salient spatial features from non-overlapping 16×16-pixel patches that are treated as input tokens for the model. The temporal fusion transformer (TFT) model then processes the historical features, candlestick chart features, and extracted HOG features via a decision-level (late feature fusion) strategy that concatenates these inputs to predict short-term price movements over different horizons (1, 3, 7, and 10 days ahead). We systematically evaluate model performance using a time-series cross-validation split to demonstrate the proposed model’s efficacy and generalization across eight indices (BSE, IXIC, N225, NIFTY-50, NSE-30, NYSE, S&P 500, and SSE). The results demonstrate the superior performance of our multi-modal approach, achieving average accuracy, precision, recall, and Matthews correlation coefficient (MCC) of 96.17%, 96.24%, 96.15%, and 0.9367, respectively, across all evaluated indices. Furthermore, using a real-time trading simulation, the study assesses the practical implications of different window sizes (5, 10, and 15 days). A paired t-test is also conducted to statistically validate the proposed model against benchmarks. The analysis provides valuable insights into how short- and long-term traders can effectively leverage the proposed model, highlighting its adaptability for real-world applications. (Illustrative sketches of the preprocessing, fusion, and evaluation steps described here appear after the record below.) |
| format | Article |
| id | doaj-art-cc13de7a643d4d03beab6e1136874af0 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-cc13de7a643d4d03beab6e1136874af0; 2025-08-20T02:47:24Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 127221–127239; DOI 10.1109/ACCESS.2025.3589063; article 11080418; "A Multi-Modal Approach Using a Hybrid Vision Transformer and Temporal Fusion Transformer Model for Stock Price Movement Classification"; Ibanga Kpereobong Friday (https://orcid.org/0000-0002-9012-1679), Sarada Prasanna Pati, and Debahuti Mishra (https://orcid.org/0000-0002-6827-6121), all of the Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India; abstract as in the description field above; https://ieeexplore.ieee.org/document/11080418/; keywords: Candlestick; histogram of oriented gradients (HOG); stock price movement; multi-modal; temporal fusion transformer (TFT); vision transformer (ViT) |
| spellingShingle | Ibanga Kpereobong Friday; Sarada Prasanna Pati; Debahuti Mishra; A Multi-Modal Approach Using a Hybrid Vision Transformer and Temporal Fusion Transformer Model for Stock Price Movement Classification; IEEE Access; Candlestick; histogram of oriented gradients (HOG); stock price movement; multi-modal; temporal fusion transformer (TFT); vision transformer (ViT) |
| title | A Multi-Modal Approach Using a Hybrid Vision Transformer and Temporal Fusion Transformer Model for Stock Price Movement Classification |
| title_full | A Multi-Modal Approach Using a Hybrid Vision Transformer and Temporal Fusion Transformer Model for Stock Price Movement Classification |
| title_fullStr | A Multi-Modal Approach Using a Hybrid Vision Transformer and Temporal Fusion Transformer Model for Stock Price Movement Classification |
| title_full_unstemmed | A Multi-Modal Approach Using a Hybrid Vision Transformer and Temporal Fusion Transformer Model for Stock Price Movement Classification |
| title_short | A Multi-Modal Approach Using a Hybrid Vision Transformer and Temporal Fusion Transformer Model for Stock Price Movement Classification |
| title_sort | multi modal approach using a hybrid vision transformer and temporal fusion transformer model for stock price movement classification |
| topic | Candlestick; histogram of oriented gradients (HOG); stock price movement; multi-modal; temporal fusion transformer (TFT); vision transformer (ViT) |
| url | https://ieeexplore.ieee.org/document/11080418/ |
| work_keys_str_mv | AT ibangakpereobongfriday amultimodalapproachusingahybridvisiontransformerandtemporalfusiontransformermodelforstockpricemovementclassification AT saradaprasannapati amultimodalapproachusingahybridvisiontransformerandtemporalfusiontransformermodelforstockpricemovementclassification AT debahutimishra amultimodalapproachusingahybridvisiontransformerandtemporalfusiontransformermodelforstockpricemovementclassification AT ibangakpereobongfriday multimodalapproachusingahybridvisiontransformerandtemporalfusiontransformermodelforstockpricemovementclassification AT saradaprasannapati multimodalapproachusingahybridvisiontransformerandtemporalfusiontransformermodelforstockpricemovementclassification AT debahutimishra multimodalapproachusingahybridvisiontransformerandtemporalfusiontransformermodelforstockpricemovementclassification |
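
The description above outlines a concrete preprocessing pipeline: windows of historical OHLC data are rendered as 256×256-pixel candlestick chart images, and HOG descriptors (candle colour, body-to-wick proportions, wick size) are extracted from them. The record does not name the authors' tooling, so the following is only a minimal sketch assuming mplfinance for chart rendering and scikit-image for HOG; the function names and HOG parameters are illustrative placeholders, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): render an OHLC window as a candlestick
# chart image and compute HOG descriptors from it. Library choices and HOG
# parameters are assumptions; the record does not specify the actual setup.
import mplfinance as mpf          # candlestick rendering
import numpy as np
from PIL import Image
from skimage.color import rgb2gray
from skimage.feature import hog


def window_to_chart(ohlc_window, path="window.png"):
    """Render one sliding window (a DataFrame with a DatetimeIndex and
    Open/High/Low/Close columns) as a candlestick image resized to 256x256."""
    mpf.plot(ohlc_window, type="candle", style="charles",
             savefig=dict(fname=path, dpi=100))
    img = Image.open(path).convert("RGB").resize((256, 256))
    return np.asarray(img)


def hog_descriptors(chart_rgb):
    """HOG features over the grayscale chart; cell/block sizes are illustrative."""
    gray = rgb2gray(chart_rgb)
    return hog(gray, orientations=9, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2), feature_vector=True)
```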
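
The record also states that a ViT tokenizes each chart into non-overlapping 16×16-pixel patches via an embedding projection and multi-head self-attention, and that the image features, HOG descriptors, and historical price features are concatenated (late fusion) before classification. Below is a hedged PyTorch sketch of that fusion step; the small transformer encoder and linear head merely stand in for the paper's ViT/TFT configuration, which the record does not give.

```python
# Late-fusion sketch: ViT-style patch features + HOG descriptors + price features
# are concatenated and passed to a small classification head. The real model uses
# a temporal fusion transformer here; this stand-in is only illustrative.
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Split a 256x256 image into non-overlapping 16x16 patches and project them."""
    def __init__(self, patch=16, chans=3, dim=128):
        super().__init__()
        self.proj = nn.Conv2d(chans, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                      # x: (B, 3, 256, 256)
        x = self.proj(x)                       # (B, dim, 16, 16)
        return x.flatten(2).transpose(1, 2)    # (B, 256 tokens, dim)


class LateFusionClassifier(nn.Module):
    def __init__(self, hog_dim, price_dim, dim=128, n_heads=4, n_classes=2):
        super().__init__()
        self.patches = PatchEmbed(dim=dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(dim + hog_dim + price_dim, n_classes)

    def forward(self, image, hog_feat, price_feat):
        tokens = self.encoder(self.patches(image))    # self-attention over patch tokens
        img_feat = tokens.mean(dim=1)                 # pooled spatial feature
        fused = torch.cat([img_feat, hog_feat, price_feat], dim=-1)  # late fusion
        return self.head(fused)                       # buy/sell logits


# Example shapes only: 8100-dim HOG vectors (matching the HOG sketch above) and
# 20 placeholder price features per sample.
model = LateFusionClassifier(hog_dim=8100, price_dim=20)
logits = model(torch.randn(4, 3, 256, 256), torch.randn(4, 8100), torch.randn(4, 20))
```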
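
Finally, the abstract reports evaluation with a time-series cross-validation split, the Matthews correlation coefficient alongside accuracy, precision, and recall, and a paired t-test against benchmark models. The sketch below illustrates that evaluation protocol with scikit-learn and SciPy; the gradient-boosting classifier and the per-fold score arrays are placeholders, not the paper's models or results.

```python
# Evaluation sketch: expanding-window time-series CV, MCC scoring, and a paired
# t-test comparing two models on matched per-fold scores (placeholder values).
import numpy as np
from scipy.stats import ttest_rel
from sklearn.ensemble import GradientBoostingClassifier   # stand-in classifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import TimeSeriesSplit


def evaluate(X, y, n_splits=5):
    """Per-fold accuracy and MCC using chronological splits (no shuffling).
    X is a 2-D numpy feature array, y the binary up/down labels."""
    accs, mccs = [], []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        clf = GradientBoostingClassifier().fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        accs.append(accuracy_score(y[test_idx], pred))
        mccs.append(matthews_corrcoef(y[test_idx], pred))
    return np.array(accs), np.array(mccs)


# Paired t-test on matched per-fold scores of two models (illustrative numbers).
model_scores = np.array([0.95, 0.96, 0.97, 0.96, 0.96])
benchmark_scores = np.array([0.90, 0.91, 0.92, 0.90, 0.91])
t_stat, p_value = ttest_rel(model_scores, benchmark_scores)
print(f"paired t-test: t={t_stat:.3f}, p={p_value:.4f}")
```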