Causal contextual bandits with one-shot data integration

We study a contextual bandit setting where the agent has access to causal side information, in addition to the ability to perform multiple targeted experiments corresponding to potentially different context-action pairs—simultaneously in one-shot within a budget. This new formalism provides a natural model for several real-world scenarios where parallel targeted experiments can be conducted and where some domain knowledge of causal relationships is available. We propose a new algorithm that utilizes a novel entropy-like measure that we introduce. We perform several experiments, both using purely synthetic data and using a real-world dataset. In addition, we study sensitivity of our algorithm's performance to various aspects of the problem setting. The results show that our algorithm performs better than baselines in all of the experiments. We also show that the algorithm is sound; that is, as budget increases, the learned policy eventually converges to an optimal policy. Further, we theoretically bound our algorithm's regret under additional assumptions. Finally, we provide ways to achieve two popular notions of fairness, namely counterfactual fairness and demographic parity, with our algorithm.
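To make the setting concrete, here is a minimal sketch of the one-shot budgeted formalism the abstract describes. This is not the authors' algorithm: the per-pair scores below are a hypothetical stand-in for the entropy-like measure the article introduces, and the greedy top-k allocation is only one simple way an agent might spend a one-shot budget of parallel experiments over context-action pairs.

```python
import random
from itertools import product

random.seed(0)

n_contexts, n_actions = 3, 4
budget = 6  # number of targeted experiments run in parallel, one shot

# Hypothetical per-(context, action) uncertainty scores; a stand-in for
# the entropy-like measure defined in the article.
scores = {(c, a): random.random()
          for c, a in product(range(n_contexts), range(n_actions))}

# Spend the whole budget at once: choose the highest-scoring pairs.
chosen = sorted(scores, key=scores.get, reverse=True)[:budget]

# The chosen experiments would then be run simultaneously; here we
# simulate their observed rewards and form a greedy policy per context
# from the resulting estimates.
estimates = {pair: 0.0 for pair in scores}
for pair in chosen:
    estimates[pair] = random.random()  # simulated observed reward
policy = {c: max(range(n_actions), key=lambda a: estimates[(c, a)])
          for c in range(n_contexts)}
```

The point of the sketch is only the shape of the interaction: all experiments are selected before any outcome is seen, so the score used to rank pairs has to be computed from prior (here, causal side) information alone.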

Bibliographic Details
Main Authors: Chandrasekar Subramanian, Balaraman Ravindran
Format: Article
Language: English
Published: Frontiers Media S.A., 2024-12-01
Series: Frontiers in Artificial Intelligence
Subjects: causality; fairness; causal contextual bandits; causal bandits; contextual bandit algorithm
Online Access: https://www.frontiersin.org/articles/10.3389/frai.2024.1346700/full
Collection: DOAJ
ISSN: 2624-8212
DOI: 10.3389/frai.2024.1346700
Author Affiliations: Robert Bosch Center for Data Science and Artificial Intelligence, Indian Institute of Technology Madras, Chennai, India; Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India