Papers from the journal arXiv.
169 shared papers found in total; this page shows items 1 - 20.
1.
刘昊辰
(2026-05-01 09:17):
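#paper Not all Chess960 positions are equally complex. Using Stockfish 17.1 as the analysis tool, this paper quantitatively studies all 960 Chess960 starting positions and confirms that they are not equally complex. It finds that White's first-move advantage is an inherent structural feature of chess (mean +0.33±0.12 pawns) and proposes an information-cost measure S(n) to quantify decision complexity; total complexity ranges from 2.6 to 17.2 bits and decision asymmetry from -4.5 to +4.2 bits. The standard chess starting position, #518, is about average in both complexity and balance; it holds no special optimal status and is simply a historical and cultural choice. Download: https://arxiv.org/pdf/2512.14319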
arXiv,
16 Dec 2025.
Marc Barthelemy
Abstract:
We analyze strategic complexity across all 960 Chess960 (Fischer Random Chess) starting positions. Stockfish evaluations reveal a near-universal first-move advantage for White (mean +0.33 ± 0.12 pawns), indicating that the initiative is a robust structural feature of the game. To quantify decision difficulty, we introduce an information-based measure S(n) that captures the cumulative information required to identify optimal moves over the first n plies. This measure decomposes into White and Black contributions, defining a total opening complexity and a decision asymmetry. Across the ensemble, the total complexity ranges from 2.6 to 17.2 bits, while the asymmetry spans from -4.5 to +4.2 bits, showing that openings are nearly evenly split between those that burden White and those that burden Black, with a slight average excess complexity for White. Standard chess (position #518, RNBQKBNR) exhibits near-average total complexity and asymmetry, yet lies far from the configuration that jointly minimizes evaluation imbalance and decision asymmetry. These results reveal a highly heterogeneous Chess960 landscape in which small rearrangements of back-rank pieces can substantially alter strategic depth and competitive balance. The classical starting position--despite centuries of refinement--appears not as an extremum, but as one configuration among many in a broad statistical ensemble.
2.
符毓
(2026-04-30 22:46):
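#paper doi: arXiv:2604.26509v1, 2026, 3D Generation for Embodied AI and Robotic Simulation: A Survey. This simulation-centric survey reviews 3D generation for embodied AI in three parts: data generators, which produce simulation-ready assets; simulation environments, which build interactive worlds; and Sim2Real bridges, which support transfer to the real world. Within each part it traces the progression from appearance-oriented generation to physics-aware, simulator-compatible output: data generators increasingly produce assets with physical annotations and kinematic structure; scene-level methods integrate physical and semantic constraints into layout synthesis; and Sim2Real methods use generative models to narrow the domain gap in appearance and dynamics.
Across all three parts one consistent trend emerges: the goal of 3D generation has shifted from visual realism to simulation readiness, which makes generation a core infrastructure layer for embodied learning. Key challenges remain, however, including scarce physical annotations, the gap between geometric fidelity and simulator deployability, limited support for deformable and dynamic assets, fragmented evaluation standards, and the persistent sim-to-real gap.
Fundamentally, the current ecosystem remains modular and disconnected: generative models, physics engines, and robot learning systems are each optimized independently and connected through brittle conversion pipelines.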
arXiv,
2026-04-29T10:17:55Z.
DOI: 10.48550/arXiv.2604.26509
Tianwei Ye,
Yifan Mao,
Minwen Liao,
Jian Liu,
Chunchao Guo,
Dazhao Du,
Quanxin Shou,
Fangqi Zhu,
Song Guo
Abstract:
Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has advanced rapidly, embodied applications impose requirements far beyond visual realism: generated objects must carry kinematic structure and material properties, scenes must support interaction and task execution, and the resulting content must bridge the gap between simulation and reality. This paper presents the first survey of 3D generation for embodied AI and organizes the literature around three roles that 3D generation plays in embodied systems. In Data Generator, 3D generation produces simulation-ready objects and assets, including articulated, physically grounded, and deformable content for downstream interaction; in Simulation Environments, it constructs interactive and task-oriented worlds, spanning structure-aware, controllable, and agentic scene generation; and in Sim2Real Bridge, it supports digital twin reconstruction, data augmentation, and synthetic demonstrations for downstream robot learning and real-world transfer. We also show that the field is shifting from visual realism toward interaction readiness, and we identify the main bottlenecks, including limited physical annotations, the gap between geometric quality and physical validity, fragmented evaluation, and the persistent sim-to-real divide, that must be addressed for 3D generation to become a dependable foundation for embodied intelligence. Our project page is at https://3dgen4robot.github.io.
3.
林海onrush
(2026-04-30 01:30):
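#paper, "DFT: A Dual-branch Framework of Fluctuation and Trend for Stock Price Prediction", DOI: https://arxiv.org/abs/2411.06065. This paper proposes DFT (Dual-branch Framework of Fluctuation and Trend) for stock price and return prediction. The authors argue that conventional models tend to mix a stock's long-term trend with its short-term fluctuations and model cross-time causal relationships inadequately, so they decompose the stock representation into a trend branch and a fluctuation branch, which model temporal correlation and inter-stock correlation in different orders. Temporal modeling uses RWKV to preserve temporal causality, while inter-stock correlation is captured with self-attention. Experiments on CSI300, CSI800, and S&P500 show that DFT significantly outperforms baselines such as LSTM, Informer, StockMixer, and MASTER on IC, RankIC, annualized return (AR), and information ratio (IR); ablations also indicate that the trend/fluctuation decomposition, the dual-branch structure, and temporal causal modeling are all key to the gains.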
arXiv,
9 Nov 2024.
Chengqi Dong, Zhiyuan Cao, S Kevin Zhou, Jia Liu
Abstract:
Stock price prediction is of significant importance in quantitative investment. Existing approaches encounter two primary issues: First, they often overlook the crucial role of capturing short-term stock fluctuations for predicting high-volatility returns. Second, mainstream methods, relying on graphs or attention mechanisms, inadequately explore the temporal relationships among stocks, often blurring distinctions in their characteristics over time and the causal relationships before and after. However, the high volatility of stocks and the intricate market correlations are crucial to accurately predicting stock prices. To address these challenges, we propose a Dual-branch Framework of Fluctuation and Trend (DFT), which decomposes stocks into trend and fluctuation components. By employing a carefully designed decomposition module, DFT effectively extracts short-term fluctuations and trend information from stocks while explicitly modeling temporal variations and causal correlations. Our extensive experiments demonstrate that DFT outperforms existing methods across multiple metrics, including a 300% improvement in ranking metrics and a 400% improvement in portfolio-based indicators. Through detailed experiments, we provide valuable insights into different roles of trends and fluctuations in stock price prediction.
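To make the trend/fluctuation split concrete, here is a minimal sketch. The abstract does not say how DFT's decomposition module works, so the trailing moving average below is an assumption swapped in for illustration, and `decompose` and `window` are hypothetical names. A trailing window is used so that the trend at time t never looks at future prices.

```python
def decompose(prices, window=3):
    """Split a price series into a trend and a fluctuation component.

    Assumption for illustration: the trend is a trailing moving average
    (it only uses prices up to and including time t), and the fluctuation
    is whatever is left over. This is not DFT's actual module.
    """
    trend = []
    for t in range(len(prices)):
        start = max(0, t - window + 1)
        chunk = prices[start:t + 1]
        trend.append(sum(chunk) / len(chunk))
    fluctuation = [p - tr for p, tr in zip(prices, trend)]
    return trend, fluctuation


trend, fluctuation = decompose([1.0, 2.0, 3.0, 4.0], window=2)
print(trend)        # [1.0, 1.5, 2.5, 3.5]
print(fluctuation)  # [0.0, 0.5, 0.5, 0.5]
```

By construction the two components add back up to the original series, so a model can process each branch separately without losing information.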
4.
刘昊辰
(2026-04-01 15:10):
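#paper THE CDE METHOD A TECHNIQUE IN FUNCTIONAL EQUATIONS. This paper presents a new and fairly general method (the CDE method) for solving functional-equation problems in secondary-school mathematics competitions, along with 3 related lemmas, 28 worked examples, and a number of exercises. The method has been used in competitions in recent years and has been discussed on the AoPS forum; it is worth studying for anyone who follows trends in secondary-school mathematics competitions. Download: https://arxiv.org/abs/1901.11131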
arXiv,
2019-01-30T22:42:29Z.
DOI: 10.48550/arXiv.1901.11131
Athanasios Kontogeorgis,
Rafail Tsiamis
Abstract:
In this article we present an extremely effective and relatively unknown approach to solving functional equations that appear in mathematical competitions. We aim to explain the philosophy of this novel method through numerous examples, which also highlight how this idea can be paired with other useful techniques to crack challenging problems.
5.
尹志
(2026-03-31 23:30):
#paper, Quantum-HPC hybrid computation of biomolecular excited-state energies, DOI: 10.48550/arXiv.2601.15677.
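Using the ONIOM framework combined with the TE-QSCI algorithm, the authors compute the S0, S1, and T0 energies for the photoisomerization of retinal on trapped-ion hardware. A very good example of hybrid quantum + HPC computation.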
arXiv,
2026-01-22T05:57:54Z.
DOI: 10.48550/arXiv.2601.15677
Abstract:
We develop a workflow within the ONIOM framework and demonstrate it on the hybrid computing system consisting of the supercomputer Fugaku and the Quantinuum Reimei trapped-ion quantum computer. This hybrid platform extends the layered approach for biomolecular chemical reactions to accurately treat the active site, such as a protein, and the large and often weakly correlated molecular environment. Our result marks a significant milestone in enabling scalable and accurate simulation of complex biomolecular reactions.
6.
林海onrush
(2026-03-31 20:08):
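#paper, Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation, DOI: 10.48550/arXiv.2412.08139. The paper proposes replacing the KL divergence, long the mainstream choice in knowledge distillation, with the Wasserstein distance (WD). The authors argue that KL is only good at same-category-to-same-category probability alignment, cannot explicitly exploit similarity between categories, and is ill-suited to intermediate-layer feature distillation, where the data are high-dimensional, sparse, and have non-overlapping distributions. They therefore design WKD-L, based on discrete WD, for logit distillation and WKD-F, based on continuous WD, for feature distillation. Both outperform a range of KL-based methods and strong baselines on ImageNet, CIFAR-100, Self-KD, and MS-COCO, suggesting that WD is not only usable for knowledge distillation but in many settings better than KL divergence.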
arXiv,
2024/12/11.
Jiaming Lv, Haoyuan Yang, Peihua Li
Abstract:
Since pioneering work of Hinton et al., knowledge distillation based on Kullback-Leibler Divergence (KL-Div) has been predominant, and recently its variants have achieved compelling performance. However, KL-Div only compares probabilities of the corresponding category between the teacher and student while lacking a mechanism for cross-category comparison. Besides, KL-Div is problematic when applied to intermediate layers, as it cannot handle non-overlapping distributions and is unaware of geometry of the underlying manifold. To address these downsides, we propose a methodology of Wasserstein Distance (WD) based knowledge distillation. Specifically, we propose a logit distillation method called WKD-L based on discrete WD, which performs cross-category comparison of probabilities and thus can explicitly leverage rich interrelations among categories. Moreover, we introduce a feature distillation method called WKD-F, which uses a parametric method for modeling feature distributions and adopts continuous WD for transferring knowledge from intermediate layers. Comprehensive evaluations on image classification and object detection have shown (1) for logit distillation WKD-L outperforms very strong KL-Div variants; (2) for feature distillation WKD-F is superior to the KL-Div counterparts and state-of-the-art competitors. The source code is available at https://peihuali.org/WKD
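The claim that KL-Div cannot handle non-overlapping distributions while WD can is easy to see on a toy case. The sketch below is not WKD-L or WKD-F; it assumes categories laid out on a line with unit spacing (a hypothetical ground distance chosen for illustration), in which case the 1-Wasserstein distance reduces to the summed absolute difference of the two CDFs.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same categories."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def wasserstein_1d(p, q):
    """1-Wasserstein distance on ordered bins 0, 1, 2, ... with unit spacing.

    Assumption for illustration: neighbouring categories are one unit apart,
    so the distance is the sum of absolute CDF differences.
    """
    cdf_gap, distance = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        distance += abs(cdf_gap)
    return distance


teacher = [1.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0]  # all mass one category away
far = [0.0, 0.0, 0.0, 1.0]   # all mass three categories away

print(kl_divergence(teacher, near), kl_divergence(teacher, far))    # inf inf
print(wasserstein_1d(teacher, near), wasserstein_1d(teacher, far))  # 1.0 3.0
```

KL returns the same infinite value for both students, so it gives no signal about which one is closer; the Wasserstein distance ranks them by how far the probability mass has to move.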
7.
刘昊辰
(2026-03-02 09:15):
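#paper Resource-Efficient Model-Free Reinforcement Learning for Board Games. This paper introduces KLENT (Kullback-Leibler and Entropy Regularized Policy Optimization), a new model-free reinforcement learning algorithm that targets the heavy compute cost of traditional search-based board-game AI such as AlphaZero. KLENT shows that a sensible combination of existing RL techniques (KL regularization, entropy regularization, and λ-returns) can substantially lower the training barrier for board-game AI without sacrificing performance. Download: https://arxiv.org/pdf/2602.10894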
arXiv,
2026-02-11T14:25:38Z.
DOI: 10.48550/arXiv.2602.10894
Kazuki Ota,
Takayuki Osa,
Motoki Omura,
Tatsuya Harada
Abstract:
Board games have long served as complex decision-making benchmarks in artificial intelligence. In this field, search-based reinforcement learning methods such as AlphaZero have achieved remarkable success. However, their significant computational demands have been pointed out as barriers to their reproducibility. In this study, we propose a model-free reinforcement learning algorithm designed for board games to achieve more efficient learning. To validate the efficiency of the proposed method, we conducted comprehensive experiments on five board games: Animal Shogi, Gardner Chess, Go, Hex, and Othello. The results demonstrate that the proposed method achieves more efficient learning than existing methods across these environments. In addition, our extensive ablation study shows the importance of core techniques used in the proposed method. We believe that our efficient algorithm shows the potential of model-free reinforcement learning in domains traditionally dominated by search-based methods.
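Of the three ingredients named in the comment above, λ-returns are the easiest to show in isolation. The sketch below uses the standard textbook recursion G_t = r_t + γ[(1 - λ)V(s_{t+1}) + λ G_{t+1}]; how KLENT plugs it into its loss is not stated here, so the function name, argument layout, and default values are assumptions for illustration.

```python
def lambda_returns(rewards, values, gamma=0.99, lam=0.9):
    """Compute λ-returns for one trajectory.

    rewards[t] is the reward received after step t.
    values[t] is the value estimate V(s_t); values has one extra entry,
    the bootstrap value of the state after the final step (0.0 if terminal).
    """
    horizon = len(rewards)
    returns = [0.0] * horizon
    next_return = values[horizon]
    # Work backwards so each return can reuse the one after it.
    for t in reversed(range(horizon)):
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns


# A win at the end of a three-move game, with an untrained (all-zero) value function.
rewards = [0.0, 0.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]
print(lambda_returns(rewards, values, gamma=1.0, lam=1.0))  # [1.0, 1.0, 1.0]
print(lambda_returns(rewards, values, gamma=1.0, lam=0.0))  # [0.0, 0.0, 1.0]
```

With λ = 1 the final result propagates to every earlier move (a Monte Carlo return); with λ = 0 each target relies only on the next value estimate (a one-step TD target). Intermediate values of λ blend the two.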
8.
尹志
(2026-02-28 23:14):
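#paper, DOI: arXiv:2601.10144, Bridging Superconducting and Neutral-Atom Platforms for Efficient Fault-Tolerant Quantum Architectures.
This paper proposes a hybrid, fault-tolerance-oriented quantum computing architecture that integrates superconducting and neutral-atom platforms. It is both inspiring and forward-looking. Given the distinct characteristics of different quantum computing platforms, hybrid schemes do have a chance of bringing valuable change in the future. This year we will also explore hybrid architectures from a problem-domain perspective.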
arXiv,
2026-01-15T07:39:05Z.
DOI: 10.48550/arXiv.2601.10144
Xiang Fang,
Jixuan Ruan,
Sharanya Prabhu,
Ang Li,
Travis Humble,
Dean Tullsen,
Yufei Ding
Abstract:
The transition to the fault-tolerant era exposes the limitations of homogeneous quantum systems, where no single qubit modality simultaneously offers optimal operation speed, connectivity, and scalability. In this work, we propose a strategic approach to Heterogeneous Quantum Architectures (HQA) that synthesizes the distinct advantages of the superconducting (SC) and neutral atom (NA) platforms. We explore two architectural role assignment strategies based on hardware characteristics: (1) We offload the latency-critical Magic State Factory (MSF) to fast SC devices while performing computation on scalable NA arrays, a design we term MagicAcc, which effectively mitigates the resource-preparation bottleneck. (2) We explore a Memory-Compute Separation (MCSep) paradigm that utilizes NA arrays for high-density qLDPC memory storage and SC devices for fast surface-code processing. Our evaluation, based on a comprehensive end-to-end cost model, demonstrates that principled heterogeneity yields significant performance gains. Specifically, our designs achieve 752× speedup over NA-only baselines on average and reduce the physical qubit footprint by over 10× compared to SC-only systems. These results chart a clear pathway for leveraging cross-modality interconnects to optimize the space-time efficiency of future fault-tolerant quantum computers.
9.
刘昊辰
(2026-02-02 09:27):
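#paper Particle Builder A Board Game for the Teaching of the Standard Model of Particle Physics at a Secondary Level. Particle Builder is a board game developed in 2016 by an international team of physics teachers and later released as an online browser version (with play against a basic AI). Designed for high-school teaching, it conveys core concepts of the Standard Model of particle physics (quarks, leptons, antimatter, and so on) through interactive gameplay across 7 levels of increasing difficulty. It was tested with 281 Australian high-school students, 225 of whom completed both the pre-test and the post-test; the average learning gain was 0.16, comparable to 1.5 weeks (about 7 hours) of conventional teaching. 94% of students found it more fun than a regular science class and 88% found it more engaging. Both the physical and online versions are free for teachers. Download: https://arxiv.org/pdf/2511.21116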
arXiv,
2025-11-26T07:02:18Z.
DOI: 10.48550/arXiv.2511.21116
Abstract:
We present Particle Builder, an online board game which teaches students about concepts from the Standard Model of Particle Physics at a high school level. This short activity resulted in a gain of 0.16, indicating that students learned a significant amount of particle physics knowledge. Students found the activity was more engaging and less difficult than a normal classroom lesson.
10.
林海onrush
(2026-01-31 23:55):
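#paper, DOI: arXiv:2406.03816, ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search. This paper proposes ReST-MCTS*, an LLM self-training framework that combines process rewards with a modified Monte Carlo tree search (MCTS). It targets a weakness of existing self-training methods, which rely only on the final correct answer and readily admit low-quality intermediate reasoning. Given only the correct final answer, the method uses multiple rollouts within the tree search to infer, for each intermediate reasoning step, the probability that it contributes to reaching the correct solution. This yields high-quality process reward signals that train both the policy model and the process reward model. Experiments show that under the same search budget ReST-MCTS* achieves higher reasoning accuracy than Best-of-N and Tree-of-Thought, keeps improving the model over multiple self-training iterations, and clearly outperforms existing self-training paradigms such as ReST^EM and Self-Rewarding, supporting its effectiveness in obtaining high-quality reasoning traces and stable self-improvement.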
arXiv,
2024-06-06T07:40:00Z.
DOI: 10.48550/arXiv.2406.03816
Dan Zhang,
Sining Zhoubian,
Ziniu Hu,
Yisong Yue,
Yuxiao Dong,
Jie Tang
Abstract:
Recent methodologies in LLM self-training mostly rely on LLM generating responses and filtering those with correct output answers as training data. This approach often yields a low-quality fine-tuning training set (e.g., incorrect plans or intermediate reasoning). In this paper, we develop a reinforced self-training approach, called ReST-MCTS*, based on integrating process reward guidance with tree search MCTS* for collecting higher-quality reasoning traces as well as per-step value to train policy and reward models. ReST-MCTS* circumvents the per-step manual annotation typically used to train process rewards by tree-search-based reinforcement learning: Given oracle final correct answers, ReST-MCTS* is able to infer the correct process rewards by estimating the probability this step can help lead to the correct answer. These inferred rewards serve dual purposes: they act as value targets for further refining the process reward model and also facilitate the selection of high-quality traces for policy model self-training. We first show that the tree-search policy in ReST-MCTS* achieves higher accuracy compared with prior LLM reasoning baselines such as Best-of-N and Tree-of-Thought, within the same search budget. We then show that by using traces searched by this tree-search policy as training data, we can continuously enhance the three language models for multiple iterations, and outperform other self-training algorithms such as ReST^EM and Self-Rewarding LM. We release all code at https://github.com/THUDM/ReST-MCTS.
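The core idea of scoring a step by "the probability this step can help lead to the correct answer" can be sketched with plain Monte Carlo rollouts. This is not the MCTS* procedure from the paper: there is no tree, no language model, and no reward model. `estimate_step_value`, `toy_rollout`, and `toy_is_correct` are hypothetical names, and the toy task (three steps of 1 or 2 that must sum to 4) merely stands in for a reasoning problem with a known final answer.

```python
import random


def estimate_step_value(prefix, rollout, is_correct, n_rollouts=2000, seed=0):
    """Estimate how likely a partial solution is to end in a correct answer.

    prefix     -- the reasoning steps taken so far
    rollout    -- completes a prefix into a full solution (a random policy here)
    is_correct -- checks only the FINAL answer, as an oracle would
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rollouts):
        if is_correct(rollout(prefix, rng)):
            hits += 1
    return hits / n_rollouts


# Hypothetical toy task: a solution is three steps, each 1 or 2,
# and it counts as correct when the steps sum to 4.
def toy_rollout(prefix, rng):
    steps = list(prefix)
    while len(steps) < 3:
        steps.append(rng.choice([1, 2]))
    return steps


def toy_is_correct(steps):
    return sum(steps) == 4


for prefix in ([1], [2], [2, 2]):
    print(prefix, estimate_step_value(prefix, toy_rollout, toy_is_correct))
```

Starting with 1 leaves two ways to finish correctly, starting with 2 leaves one, and [2, 2] can no longer succeed, so the estimates fall in that order. Per-step values of this kind are what the abstract describes as targets for the process reward model.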
11.
尹志
(2026-01-31 23:53):
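#paper https://arxiv.org/abs/2601.21571. arxiv 2026. Shaping capabilities with token-level data filtering. Moving from document-level filtering to token-level filtering is a fairly direct idea, but drawing real insight out of solid engineering is very much Alec's style.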
arXiv,
2026-01-29T11:34:01Z.
DOI: 10.48550/arXiv.2601.21571
Neil Rathi,
Alec Radford
Abstract:
Current approaches to reducing undesired capabilities in language models are largely post hoc, and can thus be easily bypassed by adversaries. A natural alternative is to shape capabilities during pretraining itself. On the proxy task of removing medical capabilities, we show that the simple intervention of filtering pretraining data is highly effective, robust, and inexpensive at scale. Inspired by work on data attribution, we show that filtering tokens is more effective than filtering documents, achieving the same hit to undesired capabilities at a lower cost to benign ones. Training models spanning two orders of magnitude, we then demonstrate that filtering gets more effective with scale: for our largest models, token filtering leads to a 7000x compute slowdown on the forget domain. We also show that models trained with token filtering can still be aligned on the forget domain. Along the way, we introduce a methodology for labeling tokens with sparse autoencoders and distilling cheap, high-quality classifiers. We also demonstrate that filtering can be robust to noisy labels with sufficient pretraining compute.
12.
Vincent
(2026-01-31 17:31):
#paper https://arxiv.org/abs/2201.11903
arxiv 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.
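This paper first proposed the Chain-of-Thought (CoT) idea: explicitly providing intermediate natural-language reasoning steps in few-shot prompts can significantly improve large language models on complex reasoning tasks. The authors demonstrate clear gains from CoT on several reasoning benchmarks, and the ability appears to emerge with scale, particularly in models with 100B+ parameters. Ablations indicate that the gains do not come merely from "extra computation"; the sequential, readable reasoning process itself is doing the work. The method requires no additional training or fine-tuning and works through prompting alone, which has led to wide adoption and opened a new direction for research on interpretable reasoning in large models.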
arXiv,
2022-01-28T02:33:07Z.
DOI: 10.48550/arXiv.2201.11903
Jason Wei,
Xuezhi Wang,
Dale Schuurmans,
Maarten Bosma,
Brian Ichter,
Fei Xia,
Ed Chi,
Quoc Le,
Denny Zhou
Abstract:
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
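Since the method is purely a matter of how the prompt is written, a small prompt builder shows what "a few chain of thought demonstrations as exemplars" looks like in practice. The exemplar below is made up for illustration (it is not taken from the paper), `build_cot_prompt` is a hypothetical helper, and the exact "Q:/A:" layout is an assumption rather than a required format.

```python
def build_cot_prompt(exemplars, question):
    """Assemble a few-shot prompt whose answers spell out their reasoning.

    Each exemplar is a dict with 'question', 'rationale' (the intermediate
    reasoning steps), and 'answer'. The new question is appended with an
    open 'A:' so the model continues with its own reasoning.
    """
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['rationale']} The answer is {ex['answer']}."
        )
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Illustrative exemplar, written for this sketch.
exemplars = [
    {
        "question": "A shelf holds 3 boxes with 4 books each. "
                    "5 books are removed. How many books remain?",
        "rationale": "3 boxes of 4 books is 12 books. 12 - 5 = 7.",
        "answer": "7",
    }
]

print(build_cot_prompt(exemplars, "A train has 6 cars with 20 seats each. "
                                  "35 seats are taken. How many are free?"))
```

A standard few-shot prompt would contain only "The answer is 7."; the chain-of-thought variant differs solely in that the rationale precedes the answer.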
13.
刘昊辰
(2026-01-04 09:37):
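#paper Collapsi is strongly solved. Collapsi, a two-player perfect-information game released by Mark S. Ball in June 2025, is played on a 4×4 toroidal board made of 16 cards (4 Aces, 4 twos, 4 threes, 2 fours, and 2 Jokers). Players take turns moving their piece according to the value of the card it stands on; after a move the starting card is flipped over, and a player with no legal move loses. Michael Young uses symmetry breaking to reduce the initial 16! (about 2.1×10¹³) deals and builds a solver based on minimax search with α-β pruning that finds an optimal move within 20 milliseconds. On a 13th-generation Intel Core i5-13500, analysing the 47,297,250 equivalent deals took 7 hours 29 minutes. The first player (Red) has a forced win in only 37.5% of deals and the second player (Blue) in 62.5%; the shortest forced win takes 7 turns, and in 6.4% of deals the losing side can prolong the game to the maximum of 14 turns. This establishes that the game is strongly solved. Download: https://arxiv.org/pdf/2507.16823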
arXiv,
4 Jul 2025.
DOI: 10.48550/arXiv.2507.16823
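Minimax search with α-β pruning, the technique named in the summary above, can be sketched on a much smaller game that shares Collapsi's losing condition (a player with no legal move loses). The sketch uses the negamax formulation of minimax. The subtraction game below (take 1 or 2 stones from a pile) is a hypothetical stand-in; the actual solver's board encoding and symmetry breaking are not reproduced here.

```python
def solve(state, legal_moves, apply_move, alpha=-1, beta=1):
    """Negamax with alpha-beta pruning for win/loss games.

    Returns +1 if the player to move can force a win from `state`,
    and -1 otherwise.
    """
    moves = legal_moves(state)
    if not moves:
        return -1  # no legal move: the player to move loses
    best = -1
    for move in moves:
        # The opponent's best result, negated, is our result for this move.
        score = -solve(apply_move(state, move), legal_moves, apply_move,
                       -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # a winning move is found; the remaining moves are pruned
    return best


# Hypothetical toy game: the state is a pile size, a move removes 1 or 2 stones.
def pile_moves(n):
    return [k for k in (1, 2) if k <= n]


def pile_apply(n, k):
    return n - k


print([solve(n, pile_moves, pile_apply) for n in range(7)])
# [-1, 1, 1, -1, 1, 1, -1]
```

Piles that are multiples of 3 are lost for the player to move, which matches the known theory of this subtraction game. "Strongly solved" means the same kind of answer is available for every reachable position, not just the starting one.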
14.
林海onrush
(2025-12-31 21:49):
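#paper, Superposition Yields Robust Neural Scaling, DOI: 10.48550/arXiv.2505.10465. Winner of a runner-up paper award at NeurIPS 2025, this is AI work from a team with an MIT physics background. The paper proposes that the power-law scaling of neural networks (wider models or larger dimension give lower loss) may stem mainly from superposition in the representation layer: when the number of features to represent far exceeds the hidden dimension, the model packs many features into the same set of dimensions, so representation vectors overlap and interfere. As the dimension m grows, random geometry makes the average strength of this overlap fall off naturally as ~1/m, yielding a robust L ∝ 1/m power law. Using a controllable toy model, the authors compare weak and strong superposition: under weak superposition the scaling depends more on the power-law tail of the feature frequencies, whereas under strong superposition a scaling exponent close to 1 arises more generally. They further measure, on several real LLMs, that the overlap of token output weight vectors falls off approximately as 1/m with width, with a width exponent of about 0.9, supporting the interpretation that large models operate under strong superposition and that geometric interference drives scaling.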
arXiv,
2025-05-15T16:18:13Z.
DOI: 10.48550/arXiv.2505.10465
Yizhou Liu,
Ziming Liu,
Jeff Gore
Abstract:
The success of today's large language models (LLMs) depends on the observation that larger models perform better. However, the origin of this neural scaling law, that loss decreases as a power law with model size, remains unclear. We propose that representation superposition, meaning that LLMs represent more features than they have dimensions, can be a key contributor to loss and cause neural scaling. Based on Anthropic's toy model, we use weight decay to control the degree of superposition, allowing us to systematically study how loss scales with model size. When superposition is weak, the loss follows a power law only if data feature frequencies are power-law distributed. In contrast, under strong superposition, the loss generically scales inversely with model dimension across a broad class of frequency distributions, due to geometric overlaps between representation vectors. We confirmed that open-sourced LLMs operate in the strong superposition regime and have loss scaling inversely with model dimension, and that the Chinchilla scaling laws are also consistent with this behavior. Our results identify representation superposition as a central driver of neural scaling laws, providing insights into questions like when neural scaling laws can be improved and when they will break down.
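The geometric fact behind the 1/m scaling can be checked numerically: for random unit vectors in m dimensions, the expected squared dot product of two distinct vectors is exactly 1/m. The sketch below only illustrates that fact with random directions; it does not train the toy model or measure any LLM, and the function names and sample sizes are assumptions chosen for illustration.

```python
import math
import random


def random_unit_vector(m, rng):
    """Draw a direction uniformly at random on the unit sphere in m dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_squared_overlap(n_features, m, seed=0):
    """Average squared dot product over all pairs of n_features random unit vectors."""
    rng = random.Random(seed)
    vecs = [random_unit_vector(m, rng) for _ in range(n_features)]
    total, pairs = 0.0, 0
    for i in range(n_features):
        for j in range(i + 1, n_features):
            dot = sum(a * b for a, b in zip(vecs[i], vecs[j]))
            total += dot * dot
            pairs += 1
    return total / pairs


# Many more features (200) than dimensions, as in the strong superposition regime.
for m in (16, 32, 64):
    overlap = mean_squared_overlap(200, m)
    print(m, overlap, overlap * m)  # overlap * m should stay close to 1
```

Doubling the dimension roughly halves the interference between feature directions, which is the behaviour the paper connects to loss scaling inversely with model dimension.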
15.
Vincent
(2025-12-31 20:29):
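#paper https://arxiv.org/abs/1706.03762 arxiv 2017. Attention Is All You Need. This classic paper introduced the Transformer, a newly designed sequence transduction model based entirely on attention, with no recurrent neural networks (RNNs) or convolutional neural networks (CNNs). Self-attention and multi-head attention model dependencies between positions in a sequence, so training can be parallelized at scale instead of being bound by sequential computation. The Transformer uses a standard encoder-decoder architecture in which both encoder and decoder are stacks of attention layers and feed-forward layers, with positional encodings injecting position information to compensate for the order information lost without a sequential structure. Experiments show that the model significantly outperforms recurrent and convolutional baselines on both the WMT 2014 English-German and English-French translation tasks while training faster; it shows strong long-range dependency modeling and laid the foundation for later large language models and multimodal Transformer architectures.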
arXiv,
2017-06-12T17:57:34Z.
DOI: 10.48550/arXiv.1706.03762
Ashish Vaswani,
Noam Shazeer,
Niki Parmar,
Jakob Uszkoreit,
Llion Jones,
Aidan N. Gomez,
Lukasz Kaiser,
Illia Polosukhin
Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
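The attention mechanism at the heart of the model is scaled dot-product attention, softmax(QKᵀ/√d_k)V. The sketch below implements a single head in plain Python on lists, with no learned projections, masking, or multi-head splitting; those omissions and the function names are simplifications made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of numbers."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k (one per source position)
    values:  list of vectors (one per source position)
    Returns one output vector per query.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Two identical keys get equal weight, so the output is the mean of the values.
print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]]))
# [[2.0, 4.0]]
```

Every query attends to every position in one step, with no recurrence between positions, which is why the computation parallelizes across the sequence.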
16.
符毓
(2025-12-31 17:21):
#paper doi: 10.48550/arXiv.2512.16907, 2025, Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos
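Meta has released the EgoMAN dataset, a large-scale egocentric benchmark dataset for 6DoF hand trajectory prediction, together with a matching prediction model: a modular reasoning-to-motion framework that aligns high-level intent with physically grounded 6DoF trajectories through a trajectory-token interface and progressive training. Experiments show that the EgoMAN model has a clear advantage over motion-only and VLM-based baselines: flow matching produces smoother, more stable trajectories; VLM-driven reasoning improves semantic alignment and generalization to new scenes and intents; and the trajectory-token interface enables efficient inference, combining intent-based, stage-aware reasoning with precise low-level motion generation. Overall, EgoMAN is a practical step toward in-context action prediction, supporting applications such as robot manipulation, language-aware motion synthesis, and intent-aware assistive systems.
A major bottleneck of earlier datasets is the lack of large-scale, high-quality 3D trajectory data. Some datasets provide accurate annotations but limited diversity, while large-scale egocentric video datasets contain rich real-world interactions but have noisy trajectories, weak goal-directedness, and no temporal structure. Crucially, they lack explicit interaction stages, such as approach and manipulation, which are essential for separating purposeful motion from background motion and for linking trajectories to intent. Models trained on such raw video often generalize poorly because the links between intent, spatial relations, and motion dynamics are missing.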
arXiv,
2025-12-18T18:59:01Z.
DOI: 10.48550/arXiv.2512.16907
Abstract:
Prior works on 3D hand trajectory prediction are constrained by datasets that decouple motion from semantic supervision and by models that weakly link reasoning and action. To address these, we first present the EgoMAN dataset, a large-scale egocentric dataset for interaction stage-aware 3D hand trajectory prediction with 219K 6DoF trajectories and 3M structured QA pairs for semantic, spatial, and motion reasoning. We then introduce the EgoMAN model, a reasoning-to-motion framework that links vision-language reasoning and motion generation via a trajectory-token interface. Trained progressively to align reasoning with motion dynamics, our approach yields accurate and stage-aware trajectories with generalization across real-world scenes.
17.
刘昊辰
(2025-12-01 09:56):
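#paper Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search. The team developed Ataraxos, a superhuman Stratego AI that overcomes the game's massive hidden information using self-play reinforcement learning and test-time search. At a cost of only a few thousand dollars (16 H100s for 1 week plus 4 H100s for 4 days, under $8,000), it beat Pim Niemeijer, the most accomplished Stratego player in history, with 15 wins, 1 loss, and 4 draws over 20 games (an 85% effective win rate), and it achieved a 95% effective win rate against ordinary players in a demonstration at the 2025 Stratego World Championship. Its core innovations are dynamically damped self-play reinforcement learning (coordinating regularization strength, policy update size, and policy strength), separate setup and move networks (both Transformer-based), and belief-network-based test-time search. A GPU-accelerated simulator (about 10 million state updates per second) and data-handling optimizations (such as the bfloat16 data type and "zero-retrieval" data transfer) make training cheap and efficient, far exceeding earlier approaches such as DeepNash in both performance and cost. Download: https://arxiv.org/pdf/2511.07312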
arXiv,
2025-11-10T17:13:41Z.
DOI: 10.48550/arXiv.2511.07312
Samuel Sokota,
Eugene Vinitsky,
Hengyuan Hu,
J. Zico Kolter,
Gabriele Farina
Abstract:
Few classical games have been regarded as such significant benchmarks of artificial intelligence as to have justified training costs in the millions of dollars. Among these, Stratego -- a board wargame exemplifying the challenge of strategic decision making under massive amounts of hidden information -- stands apart as a case where such efforts failed to produce performance at the level of top humans. This work establishes a step change in both performance and cost for Stratego, showing that it is now possible not only to reach the level of top humans, but to achieve vastly superhuman level -- and that doing so requires not an industrial budget, but merely a few thousand dollars. We achieved this result by developing general approaches for self-play reinforcement learning and test-time search under imperfect information.
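The 85% effective win rate quoted in the note for this entry matches the usual convention of counting a draw as half a win. That convention is an assumption here, as is the helper name `effective_win_rate`; neither is stated in the abstract. A minimal sketch of the arithmetic:

```python
def effective_win_rate(wins: int, draws: int, losses: int) -> float:
    """Score fraction with each draw counted as half a win (assumed convention)."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("at least one game is required")
    return (wins + 0.5 * draws) / games


# Record reported in the note: 15 wins, 1 loss, 4 draws over 20 games.
rate = effective_win_rate(wins=15, draws=4, losses=1)
print(f"{rate:.0%}")
```

With these inputs the score is (15 + 0.5 × 4) / 20 = 17 / 20, which agrees with the reported figure.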
18.
Vincent
(2025-11-30 21:07):
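#paper https://arxiv.org/abs/2104.09864 arXiv. 2021. RoFormer: Enhanced Transformer with Rotary Position Embedding
This paper proposes RoFormer, a new method that strengthens the Transformer's reasoning ability through Rotary Position Embedding (RoPE). Conventional Transformers rely on adding absolute or relative position vectors to token representations; RoPE instead applies a position-dependent rotation to the query and key, so that relative position information emerges naturally in the dot-product step of self-attention. The method is mathematically more elegant, lightweight to implement, models long-range dependencies better, and is fully compatible with efficient variants such as linear attention. Experiments show that RoFormer clearly outperforms conventional position encoding schemes on several long-text tasks, delivering stronger representations at no extra training cost, which points to broad potential in larger language models and complex sequence tasks.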
arXiv,
2021-04-20T09:54:06Z.
DOI: 10.48550/arXiv.2104.09864
Jianlin Su,
Yu Lu,
Shengfeng Pan,
Ahmed Murtadha,
Bo Wen,
Yunfeng Liu
Abstract:
Position encoding recently has shown effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various methods to integrate positional information into the learning process of transformer-based language models. Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information. Specifically, the proposed RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in self-attention formulation. Notably, RoPE enables valuable properties, including the flexibility of sequence length, decaying inter-token dependency with increasing relative distances, and the capability of equipping the linear self-attention with relative position encoding. Finally, we evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets. Our experiments show that it consistently overcomes its alternatives. Furthermore, we provide a theoretical analysis to explain some experimental results. RoFormer is already integrated into Huggingface: https://huggingface.co/docs/transformers/model_doc/roformer.
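The rotation described in this entry can be illustrated with a small, dependency-free sketch. It rotates each consecutive pair of query/key components by an angle proportional to the token position, then checks that the dot product depends only on the position offset. The function names, the toy vectors, and the pure-Python style are illustrative assumptions, not the authors' implementation; the frequency schedule `base ** (-2i/d)` with `base = 10000` follows the form given in the RoFormer paper.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to one vector (illustrative sketch).

    Each consecutive pair (vec[2i], vec[2i+1]) is rotated by the angle
    position * base ** (-2i / d), where d is the vector dimension.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy query and key vectors.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same offset (3) at different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_far = dot(rope_rotate(q, 105), rope_rotate(k, 102))
print(abs(score_near - score_far) < 1e-9)
```

Because each pair is rotated by a pure rotation, vector norms are preserved, and the attention score between positions m and n depends only on m - n, which is the relative-position property the abstract refers to.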
19.
符毓
(2025-11-28 00:14):
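#paper doi: 10.48550/arXiv.2511.21366, 2025, Hybrid Control for Robotic Nut Tightening Task
The proposed robotic nut tightening system has two parts: (1) a motion-primitive-based planning framework that operates in task space, and (2) a hybrid controller that uses sensed interaction forces to execute the contact-rich portions of the planned trajectory more efficiently. Experimental evaluation shows that the system completes the task 14.5% faster than the baseline, and that it is safer and more efficient because the contact forces applied to the manipulator are two orders of magnitude smaller than the baseline's.
Both the planning and control components are computationally cheap; compared with the simulation software that runs them, their CPU usage is negligible.
The system is highly robust to variation in the initial configuration, and the work points out directions for further improvement. One current robustness bottleneck is the retraction motion primitive in the planning framework; tighter coupling between planning and control would mitigate it.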
arXiv,
2025/11/26.
DOI: 10.48550/arXiv.2511.21366
Dmitri Kovalenko
Abstract:
An autonomous robotic nut tightening system for a serial manipulator equipped with a parallel gripper is proposed. The system features a hierarchical motion-primitive-based planner and a control-switching scheme that alternates between force and position control. Extensive simulations demonstrate the system's robustness to variance in initial conditions. Additionally, the proposed controller tightens threaded screws 14% faster than the baseline while applying 40 times less contact force on manipulands. For the benefit of the research community, the system's implementation is open-sourced.
20.
刘昊辰
(2025-11-01 14:44):
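#paper Generating Creative Chess Puzzles. In October 2025, Google DeepMind proposed a method for generating creative chess puzzles. It first benchmarks several generative AI architectures (such as autoregressive Transformers and latent diffusion models), then introduces a reinforcement learning (RL) framework based on chess engine search statistics, with reward functions designed to improve puzzle uniqueness, counter-intuitiveness, diversity, and realism. The RL method raises the counter-intuitive puzzle generation rate tenfold, from 0.22% under supervised learning to 2.5%, exceeding existing datasets (2.1%) and the best Lichess-trained model (0.4%). The generated puzzles meet novelty and diversity benchmarks and retain aesthetic themes; human experts rated them as more creative, enjoyable, and counter-intuitive than book puzzles, and the resulting curated booklet was acknowledged by three world-renowned experts. Download: https://arxiv.org/pdf/2510.23881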
arXiv,
2025-10-27T21:43:39Z.
DOI: 10.48550/arXiv.2510.23881
Abstract:
While Generative AI rapidly advances in various domains, generating truly creative, aesthetic, and counter-intuitive outputs remains a challenge. This paper presents an approach to tackle these difficulties in the domain of chess puzzles. We start by benchmarking Generative AI architectures, and then introduce an RL framework with novel rewards based on chess engine search statistics to overcome some of those shortcomings. The rewards are designed to enhance a puzzle's uniqueness, counter-intuitiveness, diversity, and realism. Our RL approach dramatically increases counter-intuitive puzzle generation by 10x, from 0.22% (supervised) to 2.5%, surpassing existing dataset rates (2.1%) and the best Lichess-trained model (0.4%). Our puzzles meet novelty and diversity benchmarks, retain aesthetic themes, and are rated by human experts as more creative, enjoyable, and counter-intuitive than composed book puzzles, even approaching classic compositions. Our final outcome is a curated booklet of these AI-generated puzzles, which is acknowledged for creativity by three world-renowned experts.