Visual Commonsense Causal Reasoning From a Still Image
Even from a still image, humans exhibit the ability to ratiocinate diverse visual cause-and-effect relationships of events preceding, succeeding, and extending beyond the given image scope. Previous work on commonsense causal reasoning (CCR) aimed at understanding general causal dependencies among c...
| Main Authors: | Xiaojing Wu, Rui Guo, Qin Li, Ning Zhu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
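| Subjects: | Visual commonsense reasoning; commonsense reasoning; visual event reasoning; multimodal large language model |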
| Online Access: | https://ieeexplore.ieee.org/document/10950140/ |
| _version_ | 1849328740629217280 |
|---|---|
| author | Xiaojing Wu; Rui Guo; Qin Li; Ning Zhu |
| collection | DOAJ |
| description | Even from a still image, humans exhibit the ability to ratiocinate diverse visual cause-and-effect relationships of events preceding, succeeding, and extending beyond the given image scope. Previous work on commonsense causal reasoning (CCR) aimed at understanding general causal dependencies among common events in natural language descriptions. However, in real-world scenarios, CCR is fundamentally a multisensory task and is more susceptible to spurious correlations, given that commonsense causal relationships manifest in various modalities and involve multiple sources of confounders. In this work, to the best of our knowledge, we present the first comprehensive study focusing on visual commonsense causal reasoning (VCCR) within the potential outcomes framework. By drawing parallels between vision-language data and human subjects in the observational study, we tailor a foundational framework, VCC-Reasoner, for detecting implicit visual commonsense causation. It combines inverse propensity score weighting and outcome regression, offering dual robust estimates of the average treatment effect. Empirical evidence underscores the efficacy and superiority of VCC-Reasoner, showcasing its outstanding VCCR capabilities. |
| format | Article |
| id | doaj-art-99064251dfd64449885571d292f44f2f |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record ID: doaj-art-99064251dfd64449885571d292f44f2f. Timestamp: 2025-08-20T03:47:28Z. Language: eng. Publisher: IEEE. Journal: IEEE Access (ISSN 2169-3536). Published: 2025-01-01, vol. 13, pp. 85084-85097. DOI: 10.1109/ACCESS.2025.3558429. IEEE document: 10950140. Title: Visual Commonsense Causal Reasoning From a Still Image. Authors: Xiaojing Wu (https://orcid.org/0009-0002-4064-9720), Rui Guo, Qin Li, Ning Zhu. Affiliations, in the order listed: School of Undergraduate Education, Shenzhen Polytechnic University, Shenzhen, China; School of Information Engineering, Beijing Polytechnic College, Beijing, China; School of Information Engineering, Weifang Engineering Vocational College, Qingzhou, Shandong, China; Shangyu Technology (Beijing) Company Ltd., Beijing, China. URL: https://ieeexplore.ieee.org/document/10950140/. Keywords: Visual commonsense reasoning; commonsense reasoning; visual event reasoning; multimodal large language model |
| title | Visual Commonsense Causal Reasoning From a Still Image |
| topic | Visual commonsense reasoning; commonsense reasoning; visual event reasoning; multimodal large language model |
| url | https://ieeexplore.ieee.org/document/10950140/ |
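
The description above says that VCC-Reasoner combines inverse propensity score weighting with outcome regression to estimate the average treatment effect. The record does not give the paper's exact formulation, so the following is only a minimal sketch of the textbook augmented inverse propensity weighting (AIPW) estimator, which combines the two ingredients in this way. The function name `aipw_ate`, its parameter names, and the `clip` threshold are illustrative assumptions, not taken from the paper; the propensity scores and outcome-model predictions are assumed to have been fitted elsewhere.

```python
from typing import Sequence


def aipw_ate(
    treated: Sequence[int],
    outcome: Sequence[float],
    propensity: Sequence[float],
    mu1: Sequence[float],
    mu0: Sequence[float],
    clip: float = 1e-3,  # hypothetical default, not from the paper
) -> float:
    """Textbook AIPW (doubly robust) estimate of the average treatment effect.

    treated[i]:    1 if unit i received the treatment, else 0
    outcome[i]:    observed outcome of unit i
    propensity[i]: estimated P(treated = 1 | covariates of unit i)
    mu1[i]/mu0[i]: outcome-model predictions under treatment / control
    """
    n = len(treated)
    if n == 0 or not all(len(s) == n for s in (outcome, propensity, mu1, mu0)):
        raise ValueError("all inputs must be non-empty and of equal length")

    total = 0.0
    for t, y, e, m1, m0 in zip(treated, outcome, propensity, mu1, mu0):
        # Clip the propensity so the inverse weights stay finite.
        e = min(max(e, clip), 1.0 - clip)
        # Outcome-regression contrast plus inverse-propensity-weighted residuals.
        total += (m1 - m0) + t * (y - m1) / e - (1 - t) * (y - m0) / (1.0 - e)
    return total / n


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    estimate = aipw_ate(
        treated=[1, 0, 1, 0],
        outcome=[3.0, 1.0, 5.0, 2.0],
        propensity=[0.5, 0.5, 0.5, 0.5],
        mu1=[3.0, 2.0, 5.0, 4.0],
        mu0=[1.0, 1.0, 2.0, 2.0],
    )
    print(f"AIPW estimate of the ATE: {estimate:.2f}")
```

The estimate stays consistent if either the propensity model or the outcome model is correctly specified, which is the sense in which such estimators are usually called doubly robust.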