RL Perceptron: Generalization Dynamics of Policy Learning in High Dimensions

Reinforcement learning (RL) algorithms have transformed many domains of machine learning. To tackle real-world problems, RL often relies on neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, many theories of RL have focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here, we propose a solvable high-dimensional RL model that can capture a variety of learning protocols, and we derive its typical policy learning dynamics as a set of closed-form ordinary differential equations. We obtain optimal schedules for the learning rates and task difficulty—analogous to annealing schemes and curricula during training in RL—and show that the model exhibits rich behavior, including delayed learning under sparse rewards, a variety of learning regimes depending on reward baselines, and a speed-accuracy trade-off driven by reward stringency. Experiments on variants of the Procgen game “Bossfight” and Arcade Learning Environment game “Pong” also show such a speed-accuracy trade-off in practice. Together, these results take a step toward closing the gap between theory and practice in high-dimensional RL.
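The abstract's setting can be illustrated with a minimal sketch: a perceptron "student" policy learning a linear "teacher" decision rule in high dimension from binary rewards, in the spirit of teacher-student analyses. The update rule, learning rate, and task below are hypothetical choices for illustration only, not the paper's actual model or derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                                   # input dimension (high-dimensional regime)
teacher = rng.standard_normal(N)
teacher /= np.linalg.norm(teacher)         # unit-norm teacher defining the "correct" action
w = rng.standard_normal(N) / np.sqrt(N)    # student policy weights, O(1) norm at init
eta = 0.5                                  # learning rate (illustrative value)

for step in range(5000):
    x = rng.standard_normal(N)             # high-dimensional observation
    action = np.sign(w @ x)                # deterministic perceptron policy
    reward = 1.0 if action == np.sign(teacher @ x) else -1.0
    # Reward-modulated Hebbian update: reinforce the taken action when
    # rewarded, suppress it when punished.
    w += (eta / N) * reward * action * x

# Overlap with the teacher measures how well the learned policy generalizes.
overlap = (w @ teacher) / np.linalg.norm(w)
print(f"teacher-student overlap after training: {overlap:.2f}")
```

The overlap grows toward 1 as training proceeds; tracking such low-dimensional order parameters instead of the full weight vector is what makes closed-form learning dynamics tractable in models of this kind.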

Bibliographic Details
Main Authors: Nishil Patel, Sebastian Lee, Stefano Sarao Mannelli, Sebastian Goldt, Andrew Saxe
Format: Article
Language: English
Published: American Physical Society, 2025-05-01
Series: Physical Review X
ISSN: 2160-3308
Online Access: http://doi.org/10.1103/PhysRevX.15.021051