刘昊辰 (2024-09-06 09:51):
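#paper arXiv:2012.11045v1 [cs.AI] 20 Dec 2020, Monte-Carlo Graph Search for AlphaZero. This research paper studies how to improve the AlphaZero algorithm. AlphaZero has achieved notable results in board games, but conventional MCTS does not share information between different subtrees, which limits its efficiency. The paper generalizes AlphaZero's search tree from a directed tree to a directed acyclic graph, which allows information to flow between subtrees and significantly reduces memory consumption. Alongside Monte-Carlo Graph Search (MCGS), it proposes a set of further improvements, including ϵ-greedy exploration, a revised terminal solver, and the integration of domain knowledge as constraints. Evaluations with the CrazyAra engine on chess and crazyhouse show that these changes bring significant improvements to AlphaZero. Download: https://arxiv.org/pdf/2012.11045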
arXiv, 2020-12-20T22:51:38Z. DOI: 10.48550/arXiv.2012.11045
Monte-Carlo Graph Search for AlphaZero
Abstract:
The AlphaZero algorithm has been successfully applied in a range of discrete domains, most notably board games. It utilizes a neural network that learns a value and policy function to guide the exploration in a Monte-Carlo Tree Search. Although many search improvements have been proposed for Monte-Carlo Tree Search in the past, most of them refer to an older variant of the Upper Confidence bounds for Trees algorithm that does not use a policy for planning. We introduce a new, improved search algorithm for AlphaZero which generalizes the search tree to a directed acyclic graph. This enables information flow across different subtrees and greatly reduces memory consumption. Along with Monte-Carlo Graph Search, we propose a number of further extensions, such as the inclusion of Epsilon-greedy exploration, a revised terminal solver and the integration of domain knowledge as constraints. In our evaluations, we use the CrazyAra engine on chess and crazyhouse as examples to show that these changes bring significant improvements to AlphaZero.
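To illustrate why moving from a tree to a directed acyclic graph saves memory, here is a minimal sketch. It is not the paper's MCGS algorithm and it is not CrazyAra code. It assumes a hypothetical toy game in which a state is the set of items picked so far, so different move orders reach the same state (a transposition). The function names and the `table` dictionary are illustrative assumptions.

```python
def legal_moves(state, n_items):
    """Hypothetical toy game: any item not yet picked is a legal move."""
    return [i for i in range(n_items) if i not in state]


def count_tree_nodes(state, depth, n_items):
    """Tree expansion: every distinct move sequence gets its own node."""
    if depth == 0:
        return 1
    return 1 + sum(
        count_tree_nodes(state | {move}, depth - 1, n_items)
        for move in legal_moves(state, n_items)
    )


def build_dag(state, depth, n_items, table):
    """Graph expansion: nodes are keyed by state, so transpositions share a node."""
    if state in table:
        return table[state]  # reuse the node already created via another move order
    node = {"children": {}}
    table[state] = node
    if depth > 0:
        for move in legal_moves(state, n_items):
            node["children"][move] = build_dag(state | {move}, depth - 1, n_items, table)
    return node


root = frozenset()
tree_nodes = count_tree_nodes(root, depth=3, n_items=4)

table = {}
build_dag(root, depth=3, n_items=4, table=table)
dag_nodes = len(table)

print(tree_nodes, dag_nodes)  # 41 15
```

With 4 items and depth 3, the tree stores one node per ordered move sequence (41 nodes), while the graph stores one node per reachable state (15 nodes). Any statistics attached to a shared node are then visible from every path that reaches it, which is the kind of information flow across subtrees that the abstract describes.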