Enterprise chart question and answer method based on multi modal cross fusion

Bibliographic Details
Main Authors: Xinxin Wang, Liang Chen, Changhong Liu, Jinyu Liu
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-83652-5
collection DOAJ
description Abstract To enhance enterprises’ interactive exploration capabilities for unstructured chart data, this paper proposes a multimodal chart question-answering method. Facing the challenge of recognizing curved and irregular text in charts, we introduce Gaussian heatmap encoding technology to achieve character-level precise text annotation. Additionally, we combine a key point detection algorithm to extract numerical information from the charts and convert it into structured table data. Finally, by employing a multimodal cross-fusion model, we deeply integrate the queried charts, user questions, and generated table data to ensure that the model can comprehensively capture chart information and accurately answer user questions. Experimental validation has demonstrated that our method achieves a precision of 91.58% in chart information extraction and a chart question-answering accuracy of 82.24%, fully proving the significant advantages of our proposed method in enhancing chart text recognition and question-answering capabilities. Through practical enterprise application cases, our method has shown its ability to answer four types of chart questions, exhibiting mathematical reasoning capabilities and providing robust support for enterprise data analysis and decision-making.
id doaj-art-45c9202efcf54312846b4c63980b20d5
institution Kabale University
issn 2045-2322
affiliations Xinxin Wang: School of Economics and Management, Shangluo University
Liang Chen: The Shaanxi Key Laboratory of Clothing Intelligence, Xi’an Polytechnic University
Changhong Liu: China Tobacco Chongqing Industrial Co., Ltd., Qianjiang Cigarette Factory
Jinyu Liu: Chongqing Vocational Institute of Tourism