Q-learning with temporal memory to navigate turbulence
We consider the problem of olfactory search in a turbulent environment. We focus on agents that respond solely to odor stimuli, with no access to spatial perception or prior information about the odor. We ask whether navigation to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized into a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent odors. An optimal memory exists that ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy and show that it is mostly casting crosswind, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting that minor parameter tuning may be sufficient to adapt to different environments.
| Main Authors: | Marco Rando, Martin James, Alessandro Verri, Lorenzo Rosasco, Agnese Seminara |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | eLife Sciences Publications Ltd, 2025-07-01 |
| Series: | eLife |
| ISSN: | 2050-084X |
| DOI: | 10.7554/eLife.102906 |
| Subjects: | olfactory navigation; turbulence; reinforcement learning; time series; memory |
| Online Access: | https://elifesciences.org/articles/102906 |
| Author | ORCID | Affiliation |
|---|---|---|
| Marco Rando | https://orcid.org/0009-0008-3839-1429 | MaLGa, Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genova, Genoa, Italy |
| Martin James | | MaLGa, Department of Civil, Chemical and Environmental Engineering, University of Genoa, Genova, Italy |
| Alessandro Verri | | MaLGa, Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genova, Genoa, Italy |
| Lorenzo Rosasco | | MaLGa, Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genova, Genoa, Italy |
| Agnese Seminara | https://orcid.org/0000-0001-5633-8180 | MaLGa, Department of Civil, Chemical and Environmental Engineering, University of Genoa, Genova, Italy |
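The abstract describes an algorithm concrete enough to outline in code: tabular Q-learning over a few discretized olfactory states, with a temporal memory that tolerates short in-plume blanks and switches to a learned recovery policy after longer gaps. Below is a minimal, hypothetical Python sketch of that structure. The names (`MemoryQLearner`, `discretize`), the state count, the action set, the intensity threshold, and all numeric values are placeholder assumptions for illustration; the paper's actual trace features, turbulent plume simulation, and learned recovery strategy are not reproduced here.

```python
import numpy as np

# Assumed coarse olfactory states (the paper's two trace features are not
# reproduced here): 0 = weak whiff, 1 = strong whiff, 2 = recent blank while
# still "inside" the plume, 3 = VOID (lost the plume; recovery policy applies).
N_OLFACTORY_STATES = 3
VOID_STATE = N_OLFACTORY_STATES
N_STATES = N_OLFACTORY_STATES + 1
N_ACTIONS = 4  # assumed action set: upwind, downwind, crosswind-left, crosswind-right


def discretize(intensity, threshold=0.5):
    """Map an instantaneous odor intensity to a coarse whiff state.
    The threshold is a placeholder value, not taken from the paper."""
    return 1 if intensity >= threshold else 0


class MemoryQLearner:
    """Tabular Q-learning with a temporal memory over olfactory states."""

    def __init__(self, memory=10, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
        self.memory = memory  # blank steps tolerated before declaring VOID
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = np.random.default_rng(seed)
        self.Q = np.zeros((N_STATES, N_ACTIONS))
        self.blank_steps = 0
        self.state = VOID_STATE

    def observe(self, intensity):
        """Temporal memory: blanks shorter than `memory` keep the agent in an
        in-plume state; longer gaps switch it to VOID, whose learned policy
        plays the role of the recovery strategy (e.g. crosswind casting)."""
        if intensity is None or intensity <= 0.0:
            self.blank_steps += 1
            if self.blank_steps > self.memory:
                self.state = VOID_STATE
            elif self.state != VOID_STATE:
                self.state = 2  # recent blank, still treated as in-plume
        else:
            self.blank_steps = 0
            self.state = discretize(intensity)
        return self.state

    def act(self, state):
        """Epsilon-greedy action selection from the Q-table."""
        if self.rng.random() < self.eps:
            return int(self.rng.integers(N_ACTIONS))
        return int(np.argmax(self.Q[state]))

    def update(self, s, a, reward, s_next):
        """Standard one-step Q-learning backup."""
        target = reward + self.gamma * np.max(self.Q[s_next])
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])
```

In a training loop, the agent would call `observe()` on each new odor sample from a plume simulator, `act()` to pick a move, and `update()` once the environment returns a reward (for instance, a terminal reward on reaching the source). The `memory` parameter realizes the trade-off the abstract reports: too short, and ordinary in-plume blanks spuriously trigger recovery; too long, and an agent that has truly exited the plume delays its casting behavior.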