DeDe宝
(2026-02-05 02:37):
#paper, https://doi.org/10.1371/journal.pcbi.1013879 Information uncertainty influences learning strategy from sequentially delayed rewards. PLOS Computational Biology.
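This study examines how information uncertainty affects the strategy people use to learn from delayed rewards. When a reward arrives after a delay, how do humans link it to earlier events? The researchers manipulated the uncertainty of the reward information with two conditions: a separated condition, in which the immediate and delayed rewards were presented separately, and an integrated condition, in which only the sum of the immediate and delayed rewards was presented. Three models were compared: a retrospective model (Elg), which updates the values of previous choices as a function of the temporal sequence; a prospective model (Tab), which updates only past choices that are systematically related to the reward; and a hybrid model (Hybrid), which weights the two models via a β parameter. Behavioral analyses showed that participants learned the delayed-reward contingencies and that low information uncertainty improved learning performance. Initial learning in the low-uncertainty environment produced a “cognitive priming” effect that continued to shape strategy use in the subsequent high-uncertainty environment. Model comparison showed that both models captured participants' core behavioral signatures: under low information uncertainty the prospective model explained behavior better, while under high uncertainty the retrospective model became an effective complement. Together, these results suggest that humans flexibly adjust a hybrid learning strategy according to information uncertainty.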
PLOS Computational Biology,
2026-2-2.
DOI: 10.1371/journal.pcbi.1013879
Information uncertainty influences learning strategy from sequentially delayed rewards
Abstract:
When receiving a reward after a sequence of multiple events, how do we determine which event caused the reward? This problem, known as temporal credit assignment, can be difficult for humans to solve given the temporal uncertainty in the environment. Research to date has attempted to isolate dimensions of delay and reward during decision-making, but algorithmic solutions to temporal learning problems and the effect of uncertainty on learning remain underexplored. To further our understanding, we adapted a reward learning task that creates a temporal credit assignment problem by combining sequentially delayed rewards, intervening events, and varying uncertainty via the amount of information presented during feedback. Using computational modeling, two learning strategies were developed: an eligibility trace, whereby previously selected actions are updated as a function of the temporal sequence, and a tabular update, whereby only systematically related past actions (rather than unrelated intervening events) are updated. We hypothesized that reduced information uncertainty would correlate with increased use of the tabular strategy, given the model’s capacity to incorporate additional feedback information. Both models effectively learned the task, and predicted choices made by participants (N = 142) as well as specific behavioral signatures of credit assignment. Consistent with our hypothesis, the tabular model outperformed the eligibility model under low information uncertainty, as evidenced by more accurate predictions of participants’ behavior and an increase in tabular weight. These findings provide new insights into the mechanisms implemented by humans to solve temporal credit assignment and adapt their strategy in varying environments.
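To make the two strategies in the abstract concrete, here is a rough Python sketch of how an eligibility-trace update and a tabular update could differ, plus a weighted mix of their values. This is not the paper's implementation: the function names, the learning rate `alpha`, the trace decay `lam`, the fixed `delay` of one trial, and the use of `beta` as a linear mixing weight are all assumptions made for illustration.

```python
def eligibility_update(q, trace, action, reward, alpha=0.5, lam=0.5):
    """Credit every recently chosen action, weighted by a decaying trace.

    alpha (learning rate) and lam (trace decay) are illustrative assumptions.
    """
    trace = [lam * e for e in trace]   # older choices fade with time
    trace[action] = 1.0                # the current choice is fully eligible
    delta = reward - q[action]         # prediction error for the current choice
    q = [v + alpha * delta * e for v, e in zip(q, trace)]
    return q, trace


def tabular_update(q, history, delayed_reward, delay=1, alpha=0.5):
    """Credit only the action chosen `delay` trials ago; skip intervening ones.

    Assumes the learner knows the delay structure (hypothetical simplification).
    """
    q = list(q)
    if len(history) > delay:
        source = history[-1 - delay]   # the systematically related past action
        q[source] += alpha * (delayed_reward - q[source])
    return q


def hybrid_values(q_tab, q_elig, beta):
    """Mix the two value tables; beta near 1 leans on the tabular strategy."""
    return [beta * t + (1.0 - beta) * e for t, e in zip(q_tab, q_elig)]


# Two trials: action 0 is chosen, then action 1, and a reward of 1.0 arrives
# that was actually caused by action 0.
q_elig, trace = eligibility_update([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0, 0.0)
q_elig, trace = eligibility_update(q_elig, trace, 1, 1.0)
q_tab = tabular_update([0.0, 0.0, 0.0], history=[0, 1], delayed_reward=1.0)
mixed = hybrid_values(q_tab, q_elig, beta=0.75)
print(q_elig, q_tab, mixed)
```

In this toy run the eligibility trace spreads credit over both actions, with most of it going to the intervening action 1, while the tabular update credits only action 0. That is the kind of behavioral difference the model comparison relies on.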
Related Links: