Literature shared by user 刘昊辰.
7 shared papers found.
1.
刘昊辰
(2025-02-25 22:38):
#paper Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks. This paper studies how to play hex-and-counter wargames using Reinforcement Learning and Recurrent Neural Networks (RNNs). It proposes a new system that combines the AlphaZero reinforcement learning algorithm with recurrent neural networks to handle the strategic complexity of hex-and-counter wargames. The system generalizes across different terrain and tactical situations, and the authors explore its ability to scale to larger maps. With limited training resources and compute, the proposed system performs well on complex hex-and-counter wargames, demonstrating generalization in complex scenarios. Download: https://arxiv.org/abs/2502.13918
arXiv,
2025-02-19T17:52:45Z.
DOI: 10.48550/arXiv.2502.13918
Abstract:
Hex and Counter Wargames are adversarial two-player simulations of real military conflicts requiring complex strategic decision-making. Unlike classical board games, these games feature intricate terrain/unit interactions, unit stacking, large maps of varying sizes, and simultaneous move and combat decisions involving hundreds of units. This paper introduces a novel system designed to address the strategic complexity of Hex and Counter Wargames by integrating cutting-edge advancements in Recurrent Neural Networks with AlphaZero, a reliable modern Reinforcement Learning algorithm. The system utilizes a new Neural Network architecture developed from existing research, incorporating innovative state and action representations tailored to these specific game environments. With minimal training, our solution has shown promising results in typical scenarios, demonstrating the ability to generalize across different terrain and tactical situations. Additionally, we explore the system's potential to scale to larger map sizes. The developed system is openly accessible, facilitating continued research and exploration within this challenging domain.
2.
刘昊辰
(2025-01-24 14:04):
#paper Proof Number Based Monte-Carlo Tree Search. This paper proposes the PN-MCTS algorithm, which combines Monte-Carlo Tree Search (MCTS) with Proof-Number Search (PNS). Experiments across several game domains show that the algorithm outperforms conventional MCTS in some games, pointing to a new direction for improving game-search algorithms. Download: https://arxiv.org/pdf/2303.09449
arXiv,
2023-03-16T16:27:07Z.
DOI: 10.48550/arXiv.2303.09449
Abstract:
This paper proposes a new game-search algorithm, PN-MCTS, which combines Monte-Carlo Tree Search (MCTS) and Proof-Number Search (PNS). These two algorithms have been successfully applied for decision making in a range of domains. We define three areas where the additional knowledge provided by the proof and disproof numbers gathered in MCTS trees might be used: final move selection, solving subtrees, and the UCB1 selection mechanism. We test all possible combinations on different time settings, playing against vanilla UCT on several games: Lines of Action (7×7 and 8×8 board sizes), MiniShogi, Knightthrough, and Awari. Furthermore, we extend this new algorithm to properly address games with draws, like Awari, by adding an additional layer of PNS on top of the MCTS tree. The experiments show that PN-MCTS is able to outperform MCTS in all tested game domains, achieving win rates up to 96.2% for Lines of Action.
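The PNS bookkeeping that PN-MCTS carries alongside the usual visit counts can be sketched as follows. This is a minimal illustration of the classic proof/disproof-number update rule at OR and AND nodes, using a `Node` class of my own; the paper's actual integration with UCB1 and final move selection is not shown:

```python
# Proof/disproof-number propagation (the PNS rule that PN-MCTS
# maintains alongside MCTS statistics). Node fields are illustrative
# assumptions, not the paper's code.

INF = float("inf")

class Node:
    def __init__(self, is_or, children=None, pn=1, dn=1):
        self.is_or = is_or          # OR node: the prover picks the move
        self.children = children or []
        self.pn, self.dn = pn, dn   # proof number, disproof number

def update_pn(node):
    """Recompute proof/disproof numbers from the children (PNS rule)."""
    if not node.children:
        return
    if node.is_or:   # proving any single child proves the node
        node.pn = min(c.pn for c in node.children)
        node.dn = sum(c.dn for c in node.children)
    else:            # all children must be proved
        node.pn = sum(c.pn for c in node.children)
        node.dn = min(c.dn for c in node.children)
```

A node with `pn == 0` is proved and `dn == 0` is disproved; the paper's three proposed uses (final move selection, solved subtrees, UCB1 bias) all read off these two numbers.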
3.
刘昊辰
(2024-12-01 20:41):
#paper Connect6 Opening Leveraging AlphaZero Algorithm and Job-Level Computing. This paper presents a method for building a Connect6 opening book based on the AlphaZero algorithm and job-level computing. Opening books strengthen game-playing programs and give a clear edge in time-limited matches. Previous approaches relied on game-specific knowledge, so book quality depended on human expertise and the methods did not generalize. This paper proposes an AlphaZero-based method for building high-quality opening books without domain knowledge. Experiments and tournament results show that the resulting opening book strengthens a Connect6 program, performs well in common opening positions, and helped the program win a real tournament. The method offers a new approach to building high-quality opening books and could be applied to other board games. Download: https://www.jstage.jst.go.jp/article/pjsai/JSAI2021/0/JSAI2021_4N4IS1c05/_pdf/-char/ja
Abstract:
No abstract available.
4.
刘昊辰
(2024-11-21 16:22):
#paper The *-Minimax Search Procedure for Trees Containing Chance Nodes. This paper studies tree-search models. It extends the alpha-beta pruning strategy to game trees containing "chance" nodes (* nodes), whose values are defined as the (possibly weighted) average of their successors' values. Such trees, called *-minimax trees, model games that involve chance but no hidden information. Several algorithms for searching *-minimax trees are formulated and analyzed. First, a left-to-right depth-first algorithm is developed that reduces the complexity of exhaustive search by 25-30%. An improved algorithm is then formulated that "probes" beneath the chance nodes of "regular" *-minimax trees, in which the players alternate moves with chance events interspersed. With randomly ordered successors, the improved algorithm cuts the search by more than 50%, and with optimal ordering it reduces search complexity by an order of magnitude. After examining the savings of the first two algorithms on deeper trees, two further algorithms are proposed and analyzed. Download: https://www.cs.uleth.ca/~benkoczi/3750/data/ballard83-star_alpha_beta.pdf
Artificial Intelligence,
1983.
Abstract:
No abstract available.
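Ballard's first improvement over exhaustive expectation at chance nodes is usually presented as Star1-style pruning: maintain running bounds on the node's expected value and cut off once the bounds fall outside the (alpha, beta) window. A minimal sketch, assuming child values lie in a known interval [L, U] and a generic `evaluate` callback; the names and interface are illustrative, not the paper's notation:

```python
# Star1-style pruning at a chance node: compute the probability-weighted
# expectation, cutting off when the running bounds prove the true value
# lies outside (alpha, beta). Assumes every child value is in [L, U].

def star1(children, probs, alpha, beta, evaluate, L=-1.0, U=1.0):
    """Expected value over a chance node with alpha-beta style cutoffs."""
    total, remaining = 0.0, 1.0
    for child, p in zip(children, probs):
        remaining -= p
        # Tightest window for this child that could still affect the result.
        a = (alpha - (total + remaining * U)) / p
        b = (beta - (total + remaining * L)) / p
        v = evaluate(child, max(a, L), min(b, U))
        total += p * v
        if total + remaining * U <= alpha:
            return alpha   # fail low: expectation cannot rise above alpha
        if total + remaining * L >= beta:
            return beta    # fail high: expectation cannot fall below beta
    return total
```

With a trivial `evaluate` that ignores its window, the function reduces to a plain expectation; the savings the paper measures come from the early returns and the narrowed child windows.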
5.
刘昊辰
(2024-10-12 10:09):
#paper arXiv:2409.12272v1 [cs.LG] 18 Sep 2024, Mastering Chess with a Transformer Model. This paper studies the application of transformer models to chess. It shows that the effectiveness of transformers in chess depends strongly on the choice of position encoding in the attention mechanism. Based on this observation, the authors adopt the general position encoding scheme of Shaw et al. and train models at scale with this technique and other enhancements, calling the resulting architecture ChessFormer. The architecture significantly outperforms prior work in playing strength and puzzle-solving ability at a fraction of the computational cost. Download: https://arxiv.org/pdf/2409.12272
arXiv,
2024-09-18T19:05:21Z.
DOI: 10.48550/arXiv.2409.12272
Abstract:
Transformer models have demonstrated impressive capabilities when trained at scale, excelling at difficult cognitive tasks requiring complex reasoning and rational decision-making. In this paper, we explore the application of transformer models to chess, focusing on the critical role of the position encoding within the attention mechanism. We show that in chess, transformers endowed with a sufficiently versatile position encoding can match existing chess-playing models at a fraction of the computational cost. Our architecture significantly outperforms AlphaZero at 8x fewer FLOPS and matches prior grandmaster-level transformer-based agents at 30x fewer FLOPS.
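Shaw et al.'s relative position encoding, which the summary credits as the key ingredient, adds learned embeddings indexed by the clipped query-key distance into the attention logits. A minimal single-head NumPy sketch; the head dimension, the clipping distance, and how ChessFormer actually applies the scheme to the 8x8 board are assumptions, not details from this paper:

```python
# Single-head attention with Shaw-style relative position keys:
# a learned embedding rel_k[j - i] (clipped to +/- max_dist) is added
# into the attention logits. Illustrative sketch, not the paper's model.

import numpy as np

def attention_with_relative_keys(Q, K, V, rel_k, max_dist):
    """Q, K, V: (seq, d). rel_k: (2*max_dist + 1, d) learned embeddings
    indexed by the clipped relative offset j - i."""
    seq, d = Q.shape
    logits = Q @ K.T / np.sqrt(d)
    for i in range(seq):
        for j in range(seq):
            idx = int(np.clip(j - i, -max_dist, max_dist)) + max_dist
            logits[i, j] += Q[i] @ rel_k[idx] / np.sqrt(d)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Because the bias depends only on the offset j - i, the same table is reused at every position, which is what makes the encoding "sufficiently versatile" for board geometries without fixing absolute positions.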
6.
刘昊辰
(2024-09-06 09:51):
#paper arXiv:2012.11045v1 [cs.AI] 20 Dec 2020, Monte-Carlo Graph Search for AlphaZero. This paper studies how to improve the AlphaZero algorithm. AlphaZero has achieved remarkable results in board games, but conventional MCTS does not share information between different subtrees, which limits its efficiency. The paper generalizes AlphaZero's search tree from a directed tree to a directed acyclic graph, allowing information to flow between subtrees and significantly reducing memory consumption, and proposes a set of further improvements built on Monte-Carlo Graph Search (MCGS), including ε-greedy exploration, a revised terminal solver, and the integration of domain knowledge as constraints. Evaluations with the CrazyAra engine on chess and crazyhouse show that these changes bring significant improvements to AlphaZero. Download: https://arxiv.org/pdf/2012.11045
arXiv,
2020-12-20T22:51:38Z.
DOI: 10.48550/arXiv.2012.11045
Abstract:
The AlphaZero algorithm has been successfully applied in a range of discrete domains, most notably board games. It utilizes a neural network that learns a value and policy function to guide the exploration in a Monte-Carlo Tree Search. Although many search improvements have been proposed for Monte-Carlo Tree Search in the past, most of them refer to an older variant of the Upper Confidence bounds for Trees algorithm that does not use a policy for planning. We introduce a new, improved search algorithm for AlphaZero which generalizes the search tree to a directed acyclic graph. This enables information flow across different subtrees and greatly reduces memory consumption. Along with Monte-Carlo Graph Search, we propose a number of further extensions, such as the inclusion of Epsilon-greedy exploration, a revised terminal solver and the integration of domain knowledge as constraints. In our evaluations, we use the CrazyAra engine on chess and crazyhouse as examples to show that these changes bring significant improvements to AlphaZero.
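The central idea, sharing one node among all move orders that reach the same position, reduces to a transposition table keyed by the position, which turns the search tree into a DAG. A minimal sketch with illustrative class and field names; the paper's Q/N backup rules, epsilon-greedy exploration, and terminal solver are not shown:

```python
# Core of Monte-Carlo Graph Search: a transposition table keyed by the
# position, so different move orders share a single node (tree -> DAG).
# Class and field names are illustrative assumptions, not the paper's code.

class GraphNode:
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0
        self.edges = {}   # move -> state key of the child position

class MCGS:
    def __init__(self):
        self.table = {}   # state key -> unique shared GraphNode

    def node_for(self, state_key):
        """Return the unique node for this position, creating it on first visit."""
        if state_key not in self.table:
            self.table[state_key] = GraphNode()
        return self.table[state_key]
```

In plain MCTS, transposed positions (e.g. 1.e4 d5 2.d4 vs. 1.d4 d5 2.e4) get independent subtrees; here both paths resolve to the same node, which is what enables the information sharing and memory savings the abstract describes.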
7.
刘昊辰
(2024-08-20 15:24):
#paper arXiv:2406.00741v1 [cs.AI] 2 Jun 2024, Learning to Play 7 Wonders Duel Without Human Supervision. This paper introduces ZeusAI, an AI program that plays the board game 7 Wonders Duel. Inspired by the AlphaZero reinforcement learning algorithm, ZeusAI combines MCTS with a Transformer and learns the game without human supervision. In matches against human players, ZeusAI reached a very high competitive level, winning 26 of 38 games. The paper also uses ZeusAI to study the game's balance: the community generally believes the first player has a significant advantage, and ZeusAI's self-play games confirm this. The authors propose rule variants to reduce the imbalance, such as changing the initial amount of coins or modifying the wonder-selection phase. Download: https://arxiv.org/pdf/2406.00741
arXiv,
2024-06-02T13:28:57Z.
DOI: 10.48550/arXiv.2406.00741
Abstract:
This paper introduces ZeusAI, an artificial intelligence system developed to play the board game 7 Wonders Duel. Inspired by the AlphaZero reinforcement learning algorithm, ZeusAI relies on a combination of Monte Carlo Tree Search and a Transformer Neural Network to learn the game without human supervision. ZeusAI competes at the level of top human players, develops both known and novel strategies, and allows us to test rule variants to improve the game's balance. This work demonstrates how AI can help in understanding and enhancing board games.