Literature Collection and Sharing Platform

刘昊辰 (2025-06-03 16:34):

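#paper AlphaZero-Edu: Making AlphaZero Accessible to Everyone. AlphaZero-Edu is a lightweight reinforcement learning framework built on the mathematical framework of AlphaZero and designed for educational use and Gomoku. It features a modular architecture (decoupling Monte Carlo tree search, self-play training, and the policy-value network), resource-efficient training (it runs on a single NVIDIA RTX 3090 GPU), and highly parallel self-play data generation (a 3.2x speedup with 8 processes). Its state features form a 21-plane tensor (the current state plus 20 planes of history), its outputs are a policy probability distribution and a scalar value estimate, and rotation/flip data augmentation is used to improve generalization. Training uses a cyclic learning rate scheduler, under which both the policy loss and the value loss converge. In matches against 4 human players, it achieved a highest win rate of 100% and a lowest win rate of 60% (with 20% draws). The framework is open source and provides an accessible benchmark for academic research and industrial applications. Download: https://arxiv.org/pdf/2504.14636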
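
The rotation/flip augmentation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AlphaZero-Edu code: the function names (`rotate90`, `flip_horizontal`, `symmetries`) and the single-plane board layout are hypothetical. The idea is that a square board has 8 symmetries (4 rotations, each with an optional mirror), and the policy target has to be transformed in exactly the same way as the board so that move probabilities stay attached to the right squares.

```python
from typing import List, Tuple

Grid = List[List[float]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a square grid 90 degrees counterclockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror a grid left to right."""
    return [list(reversed(row)) for row in grid]


def symmetries(board: Grid, policy: Grid) -> List[Tuple[Grid, Grid]]:
    """Return the 8 rotated/flipped variants of a (board, policy) pair.

    Assumption: `policy` is laid out as a grid with the same shape as
    `board`, one probability per square.
    """
    variants = []
    b, p = board, policy
    for _ in range(4):
        variants.append((b, p))
        variants.append((flip_horizontal(b), flip_horizontal(p)))
        # Apply the same rotation to both so they stay aligned.
        b, p = rotate90(b), rotate90(p)
    return variants


if __name__ == "__main__":
    board = [[1, 2, 0], [0, 0, 0], [0, 0, 0]]
    policy = [[0.6, 0.4, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    for b, p in symmetries(board, policy):
        print(b, p)
```

With a multi-plane state such as the 21-plane tensor described in the post, the same transform would presumably be applied to every plane; each self-play position then yields up to 8 training samples.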

arXiv, 2025-04-20T14:29:39Z. DOI: 10.48550/arXiv.2504.14636

AlphaZero-Edu: Making AlphaZero Accessible to Everyone

Binjie Guo, Hanyu Zheng, Guowei Su, Ru Zhang, Haohan Jiang, Xurong Lin, Hongyan Wei, Aisheng Mo, Jie Li, Zhiyuan Qian, ...

Abstract:

Recent years have witnessed significant progress in reinforcement learning, especially with Zero-like paradigms, which have greatly boosted the generalization and reasoning abilities of large-scale language models. Nevertheless, existing frameworks are often plagued by high implementation complexity and poor reproducibility. To tackle these challenges, we present AlphaZero-Edu, a lightweight, education-focused implementation built upon the mathematical framework of AlphaZero. It boasts a modular architecture that disentangles key components, enabling transparent visualization of the algorithmic processes. Additionally, it is optimized for resource-efficient training on a single NVIDIA RTX 3090 GPU and features highly parallelized self-play data generation, achieving a 3.2-fold speedup with 8 processes. In Gomoku matches, the framework has demonstrated exceptional performance, achieving a consistently high win rate against human opponents. AlphaZero-Edu has been open-sourced at https://github.com/StarLight1212/AlphaZero_Edu, providing an accessible and practical benchmark for both academic research and industrial applications.
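
To put the reported 3.2-fold speedup with 8 processes in context, the short calculation below derives the parallel efficiency and, as a rough reading, the serial fraction implied by Amdahl's law. The helper names are hypothetical, and treating the workload with Amdahl's model is an assumption of this sketch, not something the abstract states.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / workers


def amdahl_serial_fraction(speedup: float, workers: int) -> float:
    """Solve speedup = 1 / (s + (1 - s) / workers) for the serial fraction s.

    Assumption: the workload follows Amdahl's law, which ignores
    communication and scheduling overhead.
    """
    return (workers / speedup - 1) / (workers - 1)


if __name__ == "__main__":
    print(parallel_efficiency(3.2, 8))      # about 0.4
    print(amdahl_serial_fraction(3.2, 8))   # about 0.214
```

Under that assumption, roughly 40% of the ideal 8-fold speedup is realized, which would correspond to about a fifth of the self-play pipeline behaving as if it were serial.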

Related Links:

https://doi.org/10.48550/arXiv.2504.14636