Causal contextual bandits with one-shot data integration
We study a contextual bandit setting where the agent has access to causal side information, in addition to the ability to perform multiple targeted experiments corresponding to potentially different context-action pairs, simultaneously in one shot within a budget. This new formalism provides a natural model for several real-world scenarios where parallel targeted experiments can be conducted and where some domain knowledge of causal relationships is available. We propose a new algorithm that utilizes a novel entropy-like measure that we introduce. We perform several experiments, both using purely synthetic data and using a real-world dataset. In addition, we study the sensitivity of our algorithm's performance to various aspects of the problem setting. The results show that our algorithm performs better than the baselines in all of the experiments. We also show that the algorithm is sound; that is, as the budget increases, the learned policy eventually converges to an optimal policy. Further, we theoretically bound our algorithm's regret under additional assumptions. Finally, we provide ways to achieve two popular notions of fairness, namely counterfactual fairness and demographic parity, with our algorithm.
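The abstract describes spreading a one-shot experimentation budget across context-action pairs, guided by an entropy-like measure. The paper's actual measure and algorithm are not reproduced in this record, so the sketch below is purely illustrative: it allocates a budget in proportion to a simple Bernoulli-entropy uncertainty score, a hypothetical stand-in for the authors' measure, with made-up context and action names.

```python
import math

def bernoulli_entropy(p):
    """Entropy of a Bernoulli(p) reward estimate: one simple
    uncertainty proxy (the paper's measure is different)."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def allocate_budget(uncertainty, budget):
    """Split a one-shot experiment budget across context-action
    pairs in proportion to their uncertainty scores."""
    total = sum(uncertainty.values())
    alloc = {pair: int(budget * u / total) for pair, u in uncertainty.items()}
    # Hand out the rounding remainder to the most uncertain pairs.
    leftover = budget - sum(alloc.values())
    for pair, _ in sorted(uncertainty.items(), key=lambda kv: -kv[1]):
        if leftover == 0:
            break
        alloc[pair] += 1
        leftover -= 1
    return alloc

# Toy example: two contexts x two actions, with current reward estimates.
estimates = {("c0", "a0"): 0.5, ("c0", "a1"): 0.9,
             ("c1", "a0"): 0.7, ("c1", "a1"): 0.5}
scores = {pair: bernoulli_entropy(p) for pair, p in estimates.items()}
plan = allocate_budget(scores, budget=100)
```

Pairs whose reward estimate is near 0.5 (maximal uncertainty) receive more of the budget than pairs whose estimate is already close to 0 or 1, which is the intuition behind experimenting where the model is least certain.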
| Main Authors: | Chandrasekar Subramanian, Balaraman Ravindran |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2024-12-01 |
| Series: | Frontiers in Artificial Intelligence |
| Subjects: | causality; fairness; causal contextual bandits; causal bandits; contextual bandit algorithm |
| Online Access: | https://www.frontiersin.org/articles/10.3389/frai.2024.1346700/full |
| Author Affiliations: | Robert Bosch Center for Data Science and Artificial Intelligence, Indian Institute of Technology Madras, Chennai, India; Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India |
|---|---|
| ISSN: | 2624-8212 |
| DOI: | 10.3389/frai.2024.1346700 |