Enterprise chart question and answer method based on multi modal cross fusion
Abstract To enhance enterprises’ interactive exploration capabilities for unstructured chart data, this paper proposes a multimodal chart question-answering method. Facing the challenge of recognizing curved and irregular text in charts, we introduce Gaussian heatmap encoding technology to achieve c...
Main Authors: | Xinxin Wang, Liang Chen, Changhong Liu, Jinyu Liu |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-024-83652-5 |
_version_ | 1841544580791533568 |
---|---|
author | Xinxin Wang Liang Chen Changhong Liu Jinyu Liu |
author_sort | Xinxin Wang |
collection | DOAJ |
description | To enhance enterprises’ interactive exploration of unstructured chart data, this paper proposes a multimodal chart question-answering method. To address the challenge of recognizing curved and irregular text in charts, we introduce Gaussian heatmap encoding to achieve precise character-level text annotation. We also use a key point detection algorithm to extract numerical information from the charts and convert it into structured table data. Finally, a multimodal cross-fusion model integrates the queried chart, the user question, and the generated table data so that the model can capture chart information comprehensively and answer user questions accurately. In experiments, the method achieves a precision of 91.58% in chart information extraction and a chart question-answering accuracy of 82.24%, indicating clear advantages in chart text recognition and question answering. In practical enterprise application cases, the method answers four types of chart questions, exhibits mathematical reasoning capabilities, and supports enterprise data analysis and decision-making. |
format | Article |
id | doaj-art-45c9202efcf54312846b4c63980b20d5 |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-45c9202efcf54312846b4c63980b20d5; 2025-01-12T12:24:10Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 15; 1; 1; 16; 10.1038/s41598-024-83652-5; Enterprise chart question and answer method based on multi modal cross fusion; Xinxin Wang (School of Economics and Management, Shangluo University); Liang Chen (The Shannxi Key Laboratory of Clothing Intelligence, Xi’an Polytechnic University); Changhong Liu (China Tobacco Chongqing Industrial Co.Ltd Qianjiang Cigarette Factory); Jinyu Liu (Chongqing Vocational Institute of Tourism); https://doi.org/10.1038/s41598-024-83652-5 |
title | Enterprise chart question and answer method based on multi modal cross fusion |
url | https://doi.org/10.1038/s41598-024-83652-5 |
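
The abstract mentions Gaussian heatmap encoding for character-level text annotation but the record gives no detail. The following is a minimal sketch of the general idea only, not the authors' implementation: each character center is rendered as a 2D Gaussian, and a detector would later recover characters as peaks in the map. The function names, the `sigma` value, the image size, and the character centers are all hypothetical choices made for illustration.

```python
import math


def gaussian_heatmap(height, width, centers, sigma=2.0):
    """Render one 2D Gaussian per character center (x, y).

    Where Gaussians overlap, the larger score is kept, so each
    center remains a distinct peak with value 1.0.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                if value > heatmap[y][x]:
                    heatmap[y][x] = value
    return heatmap


def peak_locations(heatmap, threshold=0.99):
    """Return the (x, y) pixels whose score reaches the threshold."""
    return [
        (x, y)
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
        if value >= threshold
    ]


# Hypothetical character centers along a slightly curved chart label.
centers = [(4, 10), (10, 7), (16, 6)]
hm = gaussian_heatmap(16, 24, centers)
print(sorted(peak_locations(hm)))
```

Because the centers do not sit on a straight baseline, the example hints at why a per-character map suits curved or irregular text: nothing in the encoding assumes a rectangular word box.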