Q-learning with temporal memory to navigate turbulence

We consider the problem of olfactory search in a turbulent environment. We focus on agents that respond solely to odor stimuli, with no access to spatial perception or prior information about the odor. We ask whether navigation to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized into a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent odors. An optimal memory exists that ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy and show that it is mostly casting crosswind, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting that minor parameter tuning may be sufficient to adapt to different environments.

Bibliographic Details
Main Authors: Marco Rando, Martin James, Alessandro Verri, Lorenzo Rosasco, Agnese Seminara
Format: Article
Language:English
Published: eLife Sciences Publications Ltd 2025-07-01
Series:eLife
Subjects: olfactory navigation; turbulence; reinforcement learning; time series; memory
Online Access:https://elifesciences.org/articles/102906
collection DOAJ
description We consider the problem of olfactory search in a turbulent environment. We focus on agents that respond solely to odor stimuli, with no access to spatial perception or prior information about the odor. We ask whether navigation to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized into a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent odors. An optimal memory exists that ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy and show that it is mostly casting crosswind, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting that minor parameter tuning may be sufficient to adapt to different environments.
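As a rough, self-contained illustration of the idea in the abstract, the sketch below runs tabular Q-learning on a toy one-dimensional plume with three discretized olfactory states (hit now, blank within memory, lost beyond memory) and a temporal-memory parameter that decides when blanks trigger the "lost" state. Everything here — the environment, the detection model, the rewards, and all parameter values — is invented for illustration and is not the authors' implementation.

```python
import random

ACTIONS = [-1, +1]  # move toward (-1) or away from (+1) the source at pos 0

def sense(pos, rng):
    # Bernoulli odor hit: detections get sparser farther from the source
    # (hypothetical detection model for this sketch)
    return rng.random() < max(0.05, 1.0 - 0.02 * pos)

def state(hit, blanks, memory):
    # Three olfactory states: 0 = hit now, 1 = blank but within memory,
    # 2 = lost (blank streak longer than the temporal memory)
    if hit:
        return 0
    return 1 if blanks <= memory else 2

def train(memory=5, episodes=4000, alpha=0.2, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(3)]  # Q-table: 3 states x 2 actions
    for _ in range(episodes):
        pos = rng.randint(5, 30)
        blanks = memory + 1  # start with no recent detection
        hit = sense(pos, rng)
        if hit:
            blanks = 0
        s = state(hit, blanks, memory)
        for _ in range(200):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] >= q[s][1] else 1
            pos = max(0, pos + ACTIONS[a])
            done = pos == 0
            r = 10.0 if done else -0.1  # reward only at the source
            hit = sense(pos, rng)
            blanks = 0 if hit else blanks + 1
            s2 = state(hit, blanks, memory)
            # standard Q-learning update
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

q = train()
```

After training, the greedy action in the "hit" state is to move toward the source, since detections become denser as the agent approaches it. The paper's setting is far richer (2-D turbulent plumes, a learned casting recovery strategy), but the state discretization plus memory counter above conveys the basic mechanism.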
id doaj-art-bcbfce71297c403b8b464a699d6488e4
issn 2050-084X
doi 10.7554/eLife.102906
Marco Rando (ORCID 0009-0008-3839-1429): MaLGa, Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genova, Genoa, Italy
Martin James: MaLGa, Department of Civil, Chemical and Environmental Engineering, University of Genoa, Genova, Italy
Alessandro Verri: MaLGa, Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genova, Genoa, Italy
Lorenzo Rosasco: MaLGa, Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genova, Genoa, Italy
Agnese Seminara (ORCID 0000-0001-5633-8180): MaLGa, Department of Civil, Chemical and Environmental Engineering, University of Genoa, Genova, Italy
topic olfactory navigation
turbulence
reinforcement learning
time series
memory