Literature Collection and Sharing Platform

DeDe宝 (2023-09-23 08:50):

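#paper https://www.nature.com/articles/s41598-022-18245-1: A comparison of reinforcement learning models of human spatial navigation, Scientific Reports, 2022. Reinforcement learning (RL) is a subfield of machine learning in which an agent learns by updating its estimates of states and actions so as to maximize long-term reward. RL is widely used in areas such as decision making and value learning, but few studies have used it to describe human spatial navigation, and fewer still have used it to quantify navigation strategies and how consistently those strategies are used. This paper compares how well five RL models, drawn from three classes, quantitatively describe learning strategies in human spatial navigation. The results show that the hybrid model, a linear weighting of model-based RL and model-free RL, performs best.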
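To make the "linear weighting" idea concrete, here is a minimal sketch of how a hybrid model can combine model-based and model-free action values and turn them into choice probabilities. This is only an illustration, not the paper's implementation: the function names, the weight `w`, the inverse temperature `beta`, the softmax choice rule, and the example values are all assumptions made for this sketch.

```python
import math


def hybrid_values(q_mb, q_mf, w):
    """Mix model-based and model-free action values linearly.

    w is the (assumed) weight on the model-based values, in [0, 1]:
    w = 1 is purely model-based, w = 0 is purely model-free.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_mb, q_mf)]


def softmax_policy(q, beta):
    """Convert action values into choice probabilities.

    beta is an (assumed) inverse temperature: larger beta means more
    exploitation of the best-valued action, beta = 0 means uniform
    random exploration.
    """
    m = max(q)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values for three candidate moves at one location.
q_mb = [1.0, 0.0, 0.5]  # values from a learned map of the environment
q_mf = [0.0, 1.0, 0.5]  # values from cached route-following experience

q = hybrid_values(q_mb, q_mf, w=0.7)
probs = softmax_policy(q, beta=3.0)
```

In this sketch, `w` plays the role of the navigation strategy (how much the navigator leans on a cognitive map) and `beta` plays the role of how consistently that strategy is followed, which are the two quantities the abstract reports as correlated.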

IF: 3.800 (Q1). Scientific Reports, 2022-08-17. DOI: 10.1038/s41598-022-18245-1 PMID: 35978035

A comparison of reinforcement learning models of human spatial navigation

Qiliang He, Jancy Ling Liu, Lou Eschapasse, Elizabeth H Beveridge, Thackery I Brown

Abstract:

Reinforcement learning (RL) models have been influential in characterizing human learning and decision making, but few studies apply them to characterizing human spatial navigation and even fewer systematically compare RL models under different navigation requirements. Because RL can characterize one's learning strategies quantitatively and in a continuous manner, and one's consistency of using such strategies, it can provide a novel and important perspective for understanding the marked individual differences in human navigation and disentangle navigation strategies from navigation performance. One-hundred and fourteen participants completed wayfinding tasks in a virtual environment where different phases manipulated navigation requirements. We compared performance of five RL models (3 model-free, 1 model-based and 1 "hybrid") at fitting navigation behaviors in different phases. Supporting implications from prior literature, the hybrid model provided the best fit regardless of navigation requirements, suggesting the majority of participants rely on a blend of model-free (route-following) and model-based (cognitive mapping) learning in such navigation scenarios. Furthermore, consistent with a key prediction, there was a correlation in the hybrid model between the weight on model-based learning (i.e., navigation strategy) and the navigator's exploration vs. exploitation tendency (i.e., consistency of using such navigation strategy), which was modulated by navigation task requirements. Together, we not only show how computational findings from RL align with the spatial navigation literature, but also reveal how the relationship between navigation strategy and a person's consistency using such strategies changes as navigation requirements change.
