Presenting an Order-Aware Multimodal Fusion Framework for Financial Advisory Summarization With an Exclusive Video Dataset

Bibliographic Details
Main Authors: Sarmistha Das, Samrat Ghosh, Abhisek Tiwari, R. E. Zera Marveen Lynghoi, Sriparna Saha, Zak Murad, Alka Maurya
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10925332/
author Sarmistha Das
Samrat Ghosh
Abhisek Tiwari
R. E. Zera Marveen Lynghoi
Sriparna Saha
Zak Murad
Alka Maurya
collection DOAJ
description In the current digital era, the global dissemination of entrepreneurship and financial awareness has surged through online podcasts and videos that showcase the expertise of professionals from diverse financial domains. However, existing financial summarization techniques predominantly focus on textual and numerical data, neglecting the time-saving potential of multimodal data. To address this gap, we introduce FinMSG, a pioneering framework for generating concise and informative summaries from lengthy financial expert videos. Leveraging a multimodal transformer-based architecture and an order-aware fusion algorithm, FinMSG processes text, audio, and video features to distill key opinions and insights from diverse financial domains. We also present FAV, a one-of-its-kind multimodal financial advice video corpus comprising 420 videos across diverse domains, with gold-standard summary annotations. Through extensive experimentation and human evaluation, we demonstrate the efficacy of FinMSG in producing high-quality financial summaries and investigate the interplay between the different modalities. In addition, we evaluate how well widely recognized small language models, such as BART and T5, and advanced large language models, including LLaMA-2 and GPT-3.5, handle financial tasks in a multimodal configuration. By offering a transparent and time-efficient means for laypeople to access and comprehend financial insights, our work represents a significant advancement in multimodal financial summarization. Code and dataset are available at https://github.com/sarmistha-D/Fin-OAF.
format Article
id doaj-art-57ebed97a8da4e82830b3c92b353d1c0
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-57ebed97a8da4e82830b3c92b353d1c0; 2025-08-20T02:40:32Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 48367-48379; DOI 10.1109/ACCESS.2025.3551124; article 10925332
Presenting an Order-Aware Multimodal Fusion Framework for Financial Advisory Summarization With an Exclusive Video Dataset
Sarmistha Das (https://orcid.org/0009-0002-3429-306X), CSE Department, IIT Patna, Patna, India
Samrat Ghosh, Ramakrishna Mission Vivekananda Educational and Research Institute, Belur, West Bengal, India
Abhisek Tiwari, CSE Department, IIT Patna, Patna, India
R. E. Zera Marveen Lynghoi, CSE Department, IIT Patna, Patna, India
Sriparna Saha (https://orcid.org/0000-0001-5494-9391), CSE Department, IIT Patna, Patna, India
Zak Murad, CRISIL Ltd., Mumbai, India
Alka Maurya, CRISIL Ltd., Mumbai, India
https://ieeexplore.ieee.org/document/10925332/
Financial dataset; financial advisory videos; multimodality; summary generation; order-aware fusion; LLMs in finance
title Presenting an Order-Aware Multimodal Fusion Framework for Financial Advisory Summarization With an Exclusive Video Dataset
topic Financial dataset
financial advisory videos
multimodality
summary generation
order-aware fusion
LLMs in finance
url https://ieeexplore.ieee.org/document/10925332/