Bias Adaptive Statistical Inference Learning Agents for Learning from Human Feedback

We present a novel technique for learning behaviors from a human-provided feedback signal that is distorted by systematic bias. Our technique, which we refer to as BASIL, models the feedback signal as being separable into a heuristic evaluation of the utility of an action and a bias value that is dr...

Bibliographic Details
Main Author: Jonathan I Watson
Format: Article
Language: English
Published: LibraryPress@UF 2021-04-01
Series: Proceedings of the International Florida Artificial Intelligence Research Society Conference
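Subjects: interactive machine learning; iml; interactive reinforcement learning; irl; bias; human factors; bayesian; tamer; tetris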
Online Access: https://journals.flvc.org/FLAIRS/article/view/128471
_version_ 1850271137021296640
author Jonathan I Watson
collection DOAJ
description We present a novel technique for learning behaviors from a human-provided feedback signal that is distorted by systematic bias. Our technique, which we refer to as BASIL, models the feedback signal as being separable into a heuristic evaluation of the utility of an action and a bias value that is drawn from a parametric distribution probabilistically, where the distribution is defined by unknown parameters. We present the general form of the technique as well as a specific algorithm for integrating the technique with the TAMER algorithm for bias values drawn from a normal distribution. We test our algorithm against standard TAMER in the domain of Tetris using a synthetic oracle that provides feedback under varying levels of distortion. We find our algorithm can learn very quickly under bias distortions that entirely stymie the learning of classic TAMER.
format Article
id doaj-art-777708c61ce94946a7028a593ecae4e6
institution OA Journals
issn 2334-0754
2334-0762
language English
publishDate 2021-04-01
publisher LibraryPress@UF
record_format Article
series Proceedings of the International Florida Artificial Intelligence Research Society Conference
spelling doaj-art-777708c61ce94946a7028a593ecae4e6 | 2025-08-20T01:52:19Z | eng | LibraryPress@UF | Proceedings of the International Florida Artificial Intelligence Research Society Conference | 2334-0754 | 2334-0762 | 2021-04-01 | 34 | 10.32473/flairs.v34i1.128471 | 62865 | Bias Adaptive Statistical Inference Learning Agents for Learning from Human Feedback | Jonathan I Watson | 0 | University of Kentucky | We present a novel technique for learning behaviors from a human-provided feedback signal that is distorted by systematic bias. Our technique, which we refer to as BASIL, models the feedback signal as being separable into a heuristic evaluation of the utility of an action and a bias value that is drawn from a parametric distribution probabilistically, where the distribution is defined by unknown parameters. We present the general form of the technique as well as a specific algorithm for integrating the technique with the TAMER algorithm for bias values drawn from a normal distribution. We test our algorithm against standard TAMER in the domain of Tetris using a synthetic oracle that provides feedback under varying levels of distortion. We find our algorithm can learn very quickly under bias distortions that entirely stymie the learning of classic TAMER. | https://journals.flvc.org/FLAIRS/article/view/128471 | interactive machine learning | iml | interactive reinforcement learning | irl | bias | human factors | bayesian | tamer | tetris
title Bias Adaptive Statistical Inference Learning Agents for Learning from Human Feedback
topic interactive machine learning
iml
interactive reinforcement learning
irl
bias
human factors
bayesian
tamer
tetris
url https://journals.flvc.org/FLAIRS/article/view/128471