Presenting an Order-Aware Multimodal Fusion Framework for Financial Advisory Summarization With an Exclusive Video Dataset
Amidst the current digital era, global entrepreneurship and financial awareness dissemination has surged via online podcasts and videos showcasing insightful expertise from diverse financial domain professionals. However, existing financial summarization techniques predominantly focus on textual and...
| Main Authors: | Sarmistha Das, Samrat Ghosh, Abhisek Tiwari, R. E. Zera Marveen Lynghoi, Sriparna Saha, Zak Murad, Alka Maurya |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
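| Subjects: | Financial dataset; financial advisory videos; multimodality; summary generation; order-aware fusion; LLMs in finance |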
| Online Access: | https://ieeexplore.ieee.org/document/10925332/ |
| _version_ | 1850099214914158592 |
|---|---|
| author | Sarmistha Das; Samrat Ghosh; Abhisek Tiwari; R. E. Zera Marveen Lynghoi; Sriparna Saha; Zak Murad; Alka Maurya |
| author_facet | Sarmistha Das; Samrat Ghosh; Abhisek Tiwari; R. E. Zera Marveen Lynghoi; Sriparna Saha; Zak Murad; Alka Maurya |
| author_sort | Sarmistha Das |
| collection | DOAJ |
| description | In the current digital era, the dissemination of global entrepreneurship and financial awareness has surged through online podcasts and videos that showcase the expertise of professionals from diverse financial domains. However, existing financial summarization techniques predominantly focus on textual and numerical data, neglecting the potential of time-saving multimodal data. Addressing this gap, we introduce FinMSG, a pioneering framework for generating concise and informative summaries from lengthy financial expert videos. Leveraging a multimodal transformer-based architecture and an order-aware fusion algorithm, FinMSG processes text, audio, and video features to distil key opinions and insights from diverse financial domains. We also present FAV, a one-of-its-kind multimodal financial advice video corpus comprising 420 videos across diverse domains, with gold-standard summary annotations. Through extensive experimentation and human evaluation, we demonstrate the efficacy of FinMSG in producing high-quality financial summaries while also investigating the interplay between different modalities. In addition, we investigate the capabilities of widely recognized small language models, such as BART and T5, alongside advanced large language models, including LLaMA-2 and GPT-3.5, to evaluate their proficiency in handling financial tasks within a multimodal configuration. By offering a transparent and time-efficient means for laypeople to access and comprehend financial insights, our work represents a significant advancement in multimodal financial summarization. Code and dataset are available at https://github.com/sarmistha-D/Fin-OAF. |
| format | Article |
| id | doaj-art-57ebed97a8da4e82830b3c92b353d1c0 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-57ebed97a8da4e82830b3c92b353d1c0; 2025-08-20T02:40:32Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 48367-48379; DOI 10.1109/ACCESS.2025.3551124; document 10925332; Presenting an Order-Aware Multimodal Fusion Framework for Financial Advisory Summarization With an Exclusive Video Dataset; Sarmistha Das (https://orcid.org/0009-0002-3429-306X; CSE Department, IIT Patna, Patna, India); Samrat Ghosh (Ramakrishna Mission Vivekananda Educational and Research Institute, Belur, West Bengal, India); Abhisek Tiwari (CSE Department, IIT Patna, Patna, India); R. E. Zera Marveen Lynghoi (CSE Department, IIT Patna, Patna, India); Sriparna Saha (https://orcid.org/0000-0001-5494-9391; CSE Department, IIT Patna, Patna, India); Zak Murad (CRISIL Ltd., Mumbai, India); Alka Maurya (CRISIL Ltd., Mumbai, India); https://ieeexplore.ieee.org/document/10925332/; Keywords: Financial dataset, financial advisory videos, multimodality, summary generation, order-aware fusion, LLMs in finance |
| title | Presenting an Order-Aware Multimodal Fusion Framework for Financial Advisory Summarization With an Exclusive Video Dataset |
| topic | Financial dataset; financial advisory videos; multimodality; summary generation; order-aware fusion; LLMs in finance |
| url | https://ieeexplore.ieee.org/document/10925332/ |