Papers from the journal arXiv.
157 shared papers found in total; this page shows items 1 - 20.
1.
刘昊辰 (2026-01-04 09:37):
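#paper Collapsi is strongly solved. Collapsi is a two-player perfect-information game released by Mark S. Ball in June 2025. It is played on a 4×4 toroidal board made of 16 cards (4 aces, 4 twos, 4 threes, 2 fours, and 2 jokers). Players take turns moving their piece according to the value of the card it stands on; after each move the starting card is flipped over, and a player with no legal move loses. Michael Young used symmetry breaking to reduce the 16! (about 2.1×10¹³) initial deals, and built a solver based on minimax search with α-β pruning that finds an optimal move within 20 milliseconds. On a 13th-generation Intel Core i5-13500 processor, analyzing all 47,297,250 equivalent deals took 7 hours 29 minutes. The first player (Red) has a forced win in only 37.5% of deals, while the second player (Blue) has a forced win in 62.5%. The shortest forced win takes 7 turns, and in 6.4% of deals the losing side can drag the game out to the maximum of 14 turns. The game is thereby shown to be strongly solved. Download: https://arxiv.org/pdf/2507.16823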
arXiv, 4 Jul 2025. DOI: 10.48550/arXiv.2507.16823
Abstract: No abstract available.
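The counts quoted in the commentary for this entry can be reproduced with a short calculation. The sketch below rests on two assumptions that the post does not state: cards of equal rank are interchangeable except for the two jokers, which are kept distinct, and only the 16 translations of the 4×4 torus are factored out as symmetries. Under those assumptions the arithmetic lands exactly on 47,297,250.

```python
from math import factorial

# Card multiset from the post: 4 aces, 4 twos, 4 threes, 2 fours, 2 jokers.
raw_orderings = factorial(16)  # every ordering of 16 labelled cards

# Assumption: equal-rank cards are interchangeable, but the two jokers
# are treated as distinct, so only 4! * 4! * 4! * 2! is divided out.
distinct_layouts = raw_orderings // (factorial(4) ** 3 * factorial(2))

# Assumption: the only symmetry removed is translation on the 4x4 torus.
torus_translations = 4 * 4
equivalent_deals = distinct_layouts // torus_translations

print(raw_orderings)     # 20922789888000, about 2.1e13
print(distinct_layouts)  # 756756000
print(equivalent_deals)  # 47297250
```

Because each joker is unique in this model, no non-trivial translation maps a layout onto itself, so dividing by 16 gives an exact orbit count.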
2.
林海onrush (2025-12-31 21:49):
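#paper, Superposition Yields Robust Neural Scaling, DOI: 10.48550/arXiv.2505.10465. A best paper runner-up at NeurIPS 2025, from an AI team with an MIT physics background. The paper proposes that the power-law scaling of neural networks (the wider the model or the larger its dimension, the lower the loss) may stem mainly from superposition in the representation layer: when the number of features to represent far exceeds the hidden dimension, the model packs many features into the same set of dimensions, so that representation vectors overlap and interfere. As the dimension m grows, random geometry makes the average strength of this overlap fall off naturally as ~1/m, which yields a robust L ∝ 1/m power law. Using a controllable toy model, the authors compare weak and strong superposition: under weak superposition, scaling depends more on the power-law tail of the data's feature frequencies, whereas strong superposition produces scaling with an exponent close to 1 much more generically. They further measure, on a range of real LLMs, that the overlap between token output weight vectors falls roughly as 1/m with width, with a width exponent of about 0.9, supporting the interpretation that large models operate under strong superposition and that geometric interference drives scaling.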
arXiv, 2025-05-15T16:18:13Z. DOI: 10.48550/arXiv.2505.10465
Abstract:
The success of today's large language models (LLMs) depends on the observation that larger models perform better. However, the origin of this neural scaling law, that loss decreases as a power law with model size, remains unclear. We propose that representation superposition, meaning that LLMs represent more features than they have dimensions, can be a key contributor to loss and cause neural scaling. Based on Anthropic's toy model, we use weight decay to control the degree of superposition, allowing us to systematically study how loss scales with model size. When superposition is weak, the loss follows a power law only if data feature frequencies are power-law distributed. In contrast, under strong superposition, the loss generically scales inversely with model dimension across a broad class of frequency distributions, due to geometric overlaps between representation vectors. We confirmed that open-sourced LLMs operate in the strong superposition regime and have loss scaling inversely with model dimension, and that the Chinchilla scaling laws are also consistent with this behavior. Our results identify representation superposition as a central driver of neural scaling laws, providing insights into questions like when neural scaling laws can be improved and when they will break down.
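The 1/m behaviour of geometric overlap described in this entry can be checked numerically. The sketch below is not the paper's experiment; it only uses the standard fact that two independent random unit vectors in m dimensions have an expected squared dot product of 1/m. The function names, sample count, and seed are illustrative choices.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere via normalized Gaussians."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def mean_squared_overlap(dim, pairs=2000, seed=0):
    """Monte Carlo estimate of E[(a . b)^2] for random unit vectors a, b."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = random_unit_vector(dim, rng)
        b = random_unit_vector(dim, rng)
        dot = sum(x * y for x, y in zip(a, b))
        total += dot * dot
    return total / pairs

for m in (16, 64, 256):
    estimate = mean_squared_overlap(m)
    # estimate * m should stay close to 1 if the overlap scales as 1/m.
    print(m, estimate, estimate * m)
```

The last printed column staying near 1 across widths is the toy counterpart of the "overlap falls as 1/m" observation.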
3.
Vincent (2025-12-31 20:29):
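#paper https://arxiv.org/abs/1706.03762 arXiv 2017. Attention Is All You Need. This classic paper introduces the Transformer, a newly designed sequence transduction model built entirely on attention mechanisms, with no recurrent neural networks (RNNs) or convolutional neural networks (CNNs). Self-attention and multi-head attention model the dependencies between different positions in a sequence, so training can be parallelized at scale instead of being limited by step-by-step sequential computation. The Transformer uses a standard encoder-decoder architecture in which both the encoder and the decoder are stacks of attention layers and feed-forward layers, and positional encodings inject position information to compensate for the order information that is lost without a recurrent or convolutional structure. Experiments show that the model clearly outperforms recurrent and convolutional baselines on both the WMT 2014 English-to-German and English-to-French translation tasks while training faster, demonstrating strong long-range dependency modeling and laying the foundation for later large language models and multimodal Transformer architectures.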
arXiv, 2017-06-12T17:57:34Z. DOI: 10.48550/arXiv.1706.03762
Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
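For readers who want to see the attention mechanism from this entry concretely, here is a minimal single-head sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)V, in plain Python. It leaves out masking, multiple heads, and learned projections, and the function names are illustrative rather than taken from any library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

keys = [[1.0, 0.0], [1.0, 0.0]]      # identical keys give uniform weights
values = [[0.0, 2.0], [4.0, 6.0]]
print(attention([[3.0, 1.0]], keys, values))  # [[2.0, 4.0]]
```

Every query row is processed independently of the others, which is what makes the computation easy to parallelize across positions.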
4.
符毓 (2025-12-31 17:21):
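#paper doi: 10.48550/arXiv.2512.16907, 2025, Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos. Meta introduces the EgoMAN dataset, a large-scale egocentric benchmark dataset for 6DoF hand trajectory prediction, together with the corresponding prediction model, a modular reasoning-to-motion framework that aligns high-level intent with physically grounded 6DoF trajectories through a trajectory-token interface and progressive training. Experiments show that the EgoMAN model has a clear advantage over motion-only and VLM-based baselines: flow matching produces smoother, more stable trajectories; VLM-driven reasoning improves semantic alignment and generalization to new scenes and intents; and the trajectory-token interface enables efficient inference, combining intent-based, stage-aware reasoning with precise low-level motion generation. In short, EgoMAN is a practical step toward context-aware action prediction, supporting applications such as robot manipulation, language-aware motion synthesis, and intent-aware assistive systems. A major bottleneck of earlier datasets is the lack of large-scale, high-quality 3D trajectory data. Some datasets provide accurate annotations but limited diversity, while large-scale egocentric video datasets contain rich real-world interactions but have noisy trajectories, weak goal-directedness, and no temporal structure. Crucially, they lack explicit interaction stages such as approach and manipulation, which are essential for separating purposeful motion from background and for linking trajectories to intent. Models trained on such raw video usually generalize poorly, because the connection between intent, spatial relations, and motion dynamics is missing.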
arXiv, 2025-12-18T18:59:01Z. DOI: 10.48550/arXiv.2512.16907
Abstract:
Prior works on 3D hand trajectory prediction are constrained by datasets that decouple motion from semantic supervision and by models that weakly link reasoning and action. To address these, we first present the EgoMAN dataset, a large-scale egocentric dataset for interaction stage-aware 3D hand trajectory prediction with 219K 6DoF trajectories and 3M structured QA pairs for semantic, spatial, and motion reasoning. We then introduce the EgoMAN model, a reasoning-to-motion framework that links vision-language reasoning and motion generation via a trajectory-token interface. Trained progressively to align reasoning with motion dynamics, our approach yields accurate and stage-aware trajectories with generalization across real-world scenes.
5.
刘昊辰 (2025-12-01 09:56):
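#paper Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search. The team developed a superhuman Stratego AI named Ataraxos, using self-play reinforcement learning and test-time search to overcome the game's massive amount of hidden information. At a cost of only a few thousand dollars (16 H100s for 1 week of training plus 4 H100s for 4 days, under $8,000), it beat Pim Niemeijer, the most accomplished Stratego player in history, with 15 wins, 1 loss, and 4 draws over 20 games (an 85% effective win rate), and scored a 95% effective win rate against ordinary players in a demonstration at the 2025 Stratego World Championship. Its core innovations are dynamically damped self-play reinforcement learning (coordinating regularization strength, policy update size, and policy strength), separate setup and move networks (both Transformer-based), and test-time search based on a belief network. A GPU-accelerated simulator (about 10 million state updates per second) and data-handling optimizations (such as the bfloat16 data type and zero-retrieval data transfer) keep training cheap and efficient, far surpassing earlier approaches such as DeepNash in both performance and cost. Download: https://arxiv.org/pdf/2511.07312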
arXiv, 2025-11-10T17:13:41Z. DOI: 10.48550/arXiv.2511.07312
Abstract:
Few classical games have been regarded as such significant benchmarks of artificial intelligence as to have justified training costs in the millions of dollars. Among these, Stratego -- a board wargame exemplifying the challenge of strategic decision making under massive amounts of hidden information -- stands apart as a case where such efforts failed to produce performance at the level of top humans. This work establishes a step change in both performance and cost for Stratego, showing that it is now possible not only to reach the level of top humans, but to achieve vastly superhuman level -- and that doing so requires not an industrial budget, but merely a few thousand dollars. We achieved this result by developing general approaches for self-play reinforcement learning and test-time search under imperfect information.
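The 85% figure in the commentary for this entry is consistent with counting each draw as half a win. That convention is an assumption here, since the post does not define "effective win rate"; the helper name below is likewise illustrative.

```python
def effective_win_rate(wins, losses, draws):
    """Score fraction with a draw counted as half a win (assumed convention)."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games

# Match record reported in the post: 15 wins, 1 loss, 4 draws over 20 games.
print(effective_win_rate(15, 1, 4))  # 0.85
```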
6.
Vincent (2025-11-30 21:07):
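#paper https://arxiv.org/abs/2104.09864 arXiv. 2021. RoFormer: Enhanced Transformer with Rotary Position Embedding. This paper proposes RoFormer, a new method that strengthens the Transformer through Rotary Position Embedding (RoPE). A conventional Transformer adds absolute or relative position vectors to the token representations, whereas RoPE takes a different route: it applies a position-dependent rotation to the query and key, so that relative position information emerges naturally in the dot-product step of self-attention. The method is mathematically more elegant and lightweight to implement, models long-range dependencies better, and is fully compatible with efficient variants such as linear attention. Experiments show that RoFormer clearly outperforms conventional position encoding schemes on several long-text tasks, delivering stronger representations at no extra training cost and showing broad potential for larger language models and complex sequence tasks.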
arXiv, 2021-04-20T09:54:06Z. DOI: 10.48550/arXiv.2104.09864
Abstract:
Position encoding recently has shown effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various methods to integrate positional information into the learning process of transformer-based language models. Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information. Specifically, the proposed RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in self-attention formulation. Notably, RoPE enables valuable properties, including the flexibility of sequence length, decaying inter-token dependency with increasing relative distances, and the capability of equipping the linear self-attention with relative position encoding. Finally, we evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets. Our experiments show that it consistently overcomes its alternatives. Furthermore, we provide a theoretical analysis to explain some experimental results. RoFormer is already integrated into Huggingface: https://huggingface.co/docs/transformers/model_doc/roformer.
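The key property in this entry, that rotating the query and key by their absolute positions makes the dot product depend only on the relative offset, can be verified with a few lines of plain Python. This is a minimal sketch, not the RoFormer implementation: it rotates consecutive coordinate pairs with frequencies base^(-i/dim), and the function names and sample vectors are illustrative.

```python
import math

def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to an even-length vector."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each coordinate pair rotates at its own frequency.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Same relative offset (3) at two different absolute positions.
near = dot(rotate(q, 5), rotate(k, 2))
far = dot(rotate(q, 13), rotate(k, 10))
print(near, far)  # the two scores agree up to floating-point error
```

Because only the offset matters, the attention score between two tokens does not change when the whole sequence is shifted.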
7.
符毓 (2025-11-28 00:14):
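#paper doi: 10.48550/arXiv.2511.21366, 2025, Hybrid Control for Robotic Nut Tightening Task. The proposed robotic nut-tightening system has two parts: (1) a motion-primitive-based planning framework that operates in task space, and (2) a hybrid controller that uses sensed interaction forces to execute the contact-rich parts of the planned trajectory more efficiently. The experimental evaluation shows that the system completes the task 14.5% faster than the baseline, and is safer and more efficient because the contact forces applied to the manipulator are two orders of magnitude smaller than with the baseline. Both the planning and the control components are computationally cheap, consuming negligible CPU resources compared with the simulation software that runs them. The system is highly robust to variation in the initial configuration and points to directions for further improvement. One current robustness bottleneck is the retraction motion primitive in the planning framework; tighter coupling between planning and control would alleviate it.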
arXiv, 2025/11/26. DOI: 10.48550/arXiv.2511.21366
Abstract:
An autonomous robotic nut tightening system for a serial manipulator equipped with a parallel gripper is proposed. The system features a hierarchical motion-primitive-based planner and a control-switching scheme that alternates between force and position control. Extensive simulations demonstrate the system's robustness to variance in initial conditions. Additionally, the proposed controller tightens threaded screws 14% faster than the baseline while applying 40 times less contact force on manipulands. For the benefit of the research community, the system's implementation is open-sourced.
8.
刘昊辰 (2025-11-01 14:44):
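#paper Generating Creative Chess Puzzles. In October 2025, Google DeepMind proposed a method for generating creative chess puzzles. It first benchmarks several generative AI architectures (such as autoregressive Transformers and latent diffusion models), then introduces a reinforcement learning (RL) framework based on chess-engine search statistics, with reward functions designed to improve the puzzles' uniqueness, counter-intuitiveness, diversity, and realism. The RL method raises the rate of counter-intuitive puzzles about 10x, from 0.22% under supervised learning to 2.5%, exceeding existing datasets (2.1%) and the best Lichess-trained model (0.4%). The generated puzzles meet novelty and diversity benchmarks and retain aesthetic themes; human experts rate them as more creative, enjoyable, and counter-intuitive than book puzzles, and the resulting curated puzzle booklet is endorsed by three world-renowned experts. Download: https://arxiv.org/pdf/2510.23881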
arXiv, 2025-10-27T21:43:39Z. DOI: 10.48550/arXiv.2510.23881
Abstract:
While Generative AI rapidly advances in various domains, generating truly creative, aesthetic, and counter-intuitive outputs remains a challenge. This paper presents an approach to tackle these difficulties in the domain of chess puzzles. We start by benchmarking Generative AI architectures, and then introduce an RL framework with novel rewards based on chess engine search statistics to overcome some of those shortcomings. The rewards are designed to enhance a puzzle's uniqueness, counter-intuitiveness, diversity, and realism. Our RL approach dramatically increases counter-intuitive puzzle generation by 10x, from 0.22% (supervised) to 2.5%, surpassing existing dataset rates (2.1%) and the best Lichess-trained model (0.4%). Our puzzles meet novelty and diversity benchmarks, retain aesthetic themes, and are rated by human experts as more creative, enjoyable, and counter-intuitive than composed book puzzles, even approaching classic compositions. Our final outcome is a curated booklet of these AI-generated puzzles, which is acknowledged for creativity by three world-renowned experts.
9.
林海onrush (2025-10-31 23:18):
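#paper, PALQO: Physics-informed Model for Accelerating Large-scale Quantum Optimization, DOI: 10.48550/arXiv.2509.20733. This paper proposes PALQO, a new method based on physics-informed neural networks (PINNs) for accelerating the training of large-scale variational quantum algorithms (VQAs). The authors recast the VQA parameter-update process as a nonlinear partial differential equation (PDE) problem and use a PINN to learn the optimization dynamics on a classical computer; only a small amount of quantum measurement data is needed to predict subsequent parameter updates, which sharply reduces calls to the quantum device. Theoretical analysis shows that PALQO generalizes well, with the number of training samples required growing polynomially in the number of parameters. Experiments on the transverse-field Ising model, the Heisenberg model, and molecular systems (such as LiH and BeH₂) show that PALQO maintains energy accuracy (error about 10^{-3}) while cutting quantum measurement overhead by about 90% and achieving up to a 30x speedup. The method scales well for many-body systems and quantum chemistry calculations, and offers a new approach to large-scale quantum optimization under limited quantum resources.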
arXiv, 2025-09-25T04:26:02Z. DOI: 10.48550/arXiv.2509.20733
Abstract:
Variational quantum algorithms (VQAs) are leading strategies to reach practical utilities of near-term quantum devices. However, the no-cloning theorem in quantum mechanics precludes standard backpropagation, leading to prohibitive quantum resource costs when applying VQAs to large-scale tasks. To address this challenge, we reformulate the training dynamics of VQAs as a nonlinear partial differential equation and propose a novel protocol that leverages physics-informed neural networks (PINNs) to model this dynamical system efficiently. Given a small amount of training trajectory data collected from quantum devices, our protocol predicts the parameter updates of VQAs over multiple iterations on the classical side, dramatically reducing quantum resource costs. Through systematic numerical experiments, we demonstrate that our method achieves up to a 30x speedup compared to conventional methods and reduces quantum resource costs by as much as 90% for tasks involving up to 40 qubits, including ground state preparation of different quantum systems, while maintaining competitive accuracy. Our approach complements existing techniques aimed at improving the efficiency of VQAs and further strengthens their potential for practical applications.
10.
符毓 (2025-10-31 22:50):
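#paper doi: 10.48550/arXiv.2510.10903, 2025, Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey. A comprehensive, panoramic survey of robot manipulation. With more than 1,000 references, it systematically maps the field, covering hardware and control foundations, tasks and data, high-level and low-level control frameworks, and cross-embodiment and cross-modal generalization, and proposes a unified framework for understanding how robots move from "executing tasks" to "understanding and learning tasks".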
arXiv, 2025-10-13T01:59:27Z. DOI: 10.48550/arXiv.2510.10903
Abstract:
Embodied intelligence has witnessed remarkable progress in recent years, driven by advances in computer vision, natural language processing, and the rise of large-scale multimodal models. Among its core challenges, robot manipulation stands out as a fundamental yet intricate problem, requiring the seamless integration of perception, planning, and control to enable interaction within diverse and unstructured environments. This survey presents a comprehensive overview of robotic manipulation, encompassing foundational background, task-organized benchmarks and datasets, and a unified taxonomy of existing methods. We extend the classical division between high-level planning and low-level control by broadening high-level planning to include language, code, motion, affordance, and 3D representations, while introducing a new taxonomy of low-level learning-based control grounded in training paradigms such as input modeling, latent learning, and policy learning. Furthermore, we provide the first dedicated taxonomy of key bottlenecks, focusing on data collection, utilization, and generalization, and conclude with an extensive review of real-world applications. Compared with prior surveys, our work offers both a broader scope and deeper insight, serving as an accessible roadmap for newcomers and a structured reference for experienced researchers. All related resources, including research papers, open-source datasets, and projects, are curated for the community at https://github.com/BaiShuanghao/Awesome-Robotics-Manipulation.
11.
Vincent (2025-10-31 16:28):
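#paper https://doi.org/10.48550/arXiv.2510.14901 arXiv. 2025. Reasoning with Sampling: Your Base Model is Smarter Than You Think. Large language models (LLMs) combined with reinforcement learning (RL) have shown strong reasoning ability in many domains, and prior work has mostly asked how RL gives a base model abilities it did not originally have. This paper takes a different route and poses a thought-provoking question: can sampling alone, without extra training, make a base model reason as well as an RL-trained policy? Based on the model's own likelihoods, it proposes a simple iterative sampling method built on Markov chain Monte Carlo (MCMC). Experiments show that the method matches or even beats RL algorithms across several base models. More importantly, it avoids the loss of diversity common in RL and needs no additional data or training, which suggests broad applicability across domains.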
arXiv, 2025-10-16T17:18:11Z. DOI: 10.48550/arXiv.2510.14901
Abstract:
Frontier reasoning models have exhibited incredible capabilities across a wide array of disciplines, driven by posttraining large language models (LLMs) with reinforcement learning (RL). However, despite the widespread success of this paradigm, much of the literature has been devoted to disentangling truly novel behaviors that emerge during RL but are not present in the base models. In our work, we approach this question from a different angle, instead asking whether comparable reasoning capabilities can be elicited from base models at inference time by pure sampling, without any additional training. Inspired by Markov chain Monte Carlo (MCMC) techniques for sampling from sharpened distributions, we propose a simple iterative sampling algorithm leveraging the base models' own likelihoods. Over different base models, we show that our algorithm offers substantial boosts in reasoning that nearly match and even outperform those from RL on a wide variety of single-shot tasks, including MATH500, HumanEval, and GPQA. Moreover, our sampler avoids the collapse in diversity over multiple samples that is characteristic of RL-posttraining. Crucially, our method does not require training, curated datasets, or a verifier, suggesting broad applicability beyond easily verifiable domains.
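The idea of sampling from a sharpened distribution using only the base model's own likelihoods can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it replaces the language model with a three-outcome categorical distribution `probs`, targets the sharpened distribution proportional to `probs ** alpha`, and uses a Metropolis-Hastings independence sampler whose proposals come from `probs` itself. All names, the value of `alpha`, and the step count are illustrative assumptions.

```python
import random

def sharpened_target(probs, alpha):
    """Normalize probs**alpha, the distribution the chain should converge to."""
    powered = [p ** alpha for p in probs]
    total = sum(powered)
    return [w / total for w in powered]

def metropolis_hastings(probs, alpha, steps=20000, seed=0):
    """Sample from probs**alpha using proposals drawn from probs itself."""
    rng = random.Random(seed)
    states = list(range(len(probs)))
    current = rng.choices(states, weights=probs)[0]
    counts = [0] * len(probs)
    for _ in range(steps):
        proposal = rng.choices(states, weights=probs)[0]
        # Target ratio (p'/p)**alpha times proposal ratio (p/p')
        # simplifies to (p'/p)**(alpha - 1).
        accept = min(1.0, (probs[proposal] / probs[current]) ** (alpha - 1))
        if rng.random() < accept:
            current = proposal
        counts[current] += 1
    return [c / steps for c in counts]

base = [0.5, 0.3, 0.2]
print(sharpened_target(base, 4.0))     # exact sharpened distribution
print(metropolis_hastings(base, 4.0))  # empirical frequencies from the chain
```

Only likelihood ratios under the base distribution are needed, so no reward model or extra training enters the loop; the chain simply spends more time on outcomes the base distribution already rates as likely.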
12.
刘昊辰 (2025-10-27 14:21):
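#paper Strongly Solving 2048 4×3. Researchers at the University of Tokyo, Japan, strongly solve the 4×3 variant of 2048 (2048₄ₓ₃). The key technique is to partition the state space by "age", defined as the sum of all tile numbers on the board: a state and the afterstate reached by the following action have the same age, and going from an afterstate to a new state increases the age by 2 or 4 because of the newly added tile (2 or 4). States can therefore be enumerated stage by stage while keeping memory use under control. Elias-Fano encoding is used to store states compactly, compressing roughly 4.4 TiB of raw storage to 1.4 TiB (storage for optimal play alone needs only 300 GiB). For the most common initial state (two 2-tiles, age 4), the optimal strategy has an expected score of 50724.26; the numbers of reachable states and afterstates are 1.15×10¹² and 7.40×10¹¹ respectively; and the results confirm player intuitions such as "difficulty rises sharply when creating large tiles (such as 2048)". Download: https://arxiv.org/pdf/2510.04580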
arXiv, 2025-10-06T08:31:59Z. DOI: 10.48550/arXiv.2510.04580
Abstract:
2048 is a stochastic single-player game involving 16 cells on a 4 by 4 grid, where a player chooses a direction among up, down, left, and right to obtain a score by merging two tiles with the same number located in neighboring cells along the chosen direction. This paper presents that a variant 2048-4x3 12 cells on a 4 by 3 board, one row smaller than the original, has been strongly solved. In this variant, the expected score achieved by an optimal strategy is about $50724.26$ for the most common initial states: ones with two tiles of number 2. The numbers of reachable states and afterstates are identified to be $1,152,817,492,752$ and $739,648,886,170$, respectively. The key technique is to partition state space by the sum of tile numbers on a board, which we call the age of a state. An age is invariant between a state and its successive afterstate after any valid action and is increased two or four by stochastic response from the environment. Therefore, we can partition state space by ages and enumerate all (after)states of an age depending only on states with the recent ages. Similarly, we can identify (after)state values by going along with ages in decreasing order.
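The age invariant described in this entry is easy to check on a small board. The sketch below implements only a "move left" for a board with 3 rows of 4 cells, following the usual 2048 merge rule (each tile merges at most once per move); the function names and the sample board are illustrative, not taken from the paper.

```python
def slide_row_left(row):
    """Slide and merge one row to the left; return (new_row, score_gained)."""
    tiles = [t for t in row if t != 0]
    merged, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)   # two equal tiles become one
            score += tiles[i] * 2
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), score

def move_left(board):
    """Apply the left move to every row; return (afterstate, total score)."""
    rows, total = [], 0
    for row in board:
        new_row, gained = slide_row_left(row)
        rows.append(new_row)
        total += gained
    return rows, total

def age(board):
    """Age of a state: the sum of all tile numbers on the board."""
    return sum(sum(row) for row in board)

board = [[2, 2, 4, 0],
         [0, 4, 4, 8],
         [2, 0, 2, 2]]
afterstate, score = move_left(board)
print(afterstate)                   # [[4, 4, 0, 0], [8, 8, 0, 0], [4, 2, 0, 0]]
print(score)                        # 16
print(age(board), age(afterstate))  # 30 30
```

Merging replaces two tiles of value v with one tile of value 2v, so the sum is unchanged by a move; only the random spawn of a 2 or 4 afterwards raises the age, here to 32 or 34.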
13.
符毓 (2025-09-30 23:42):
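#paper doi: 10.48550/arXiv.2509.13311, 2025, Towards General Agentic Intelligence via Environment Scaling. The main bottleneck in training this kind of agentic intelligence has been the lack of high-quality, large-scale, diverse interaction data. Human annotation is extremely expensive, while purely model-generated data is often unrealistic or hard to verify. This paper from Alibaba's Tongyi Lab proposes a new path: programmatically and automatically constructing massive, heterogeneous, verifiable simulated environments in which language models can interact autonomously, gather experience, and learn. The AgentScaler model family trained this way, with only billions of parameters, matches trillion-parameter models and closed-source commercial systems on several authoritative benchmarks, opening new possibilities for efficient, lightweight agentic intelligence.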
arXiv, 2025-09-16T17:57:20Z. DOI: 10.48550/arXiv.2509.13311
Abstract:
Advanced agentic intelligence is a prerequisite for deploying Large Language Models in practical, real-world applications. Diverse real-world APIs demand precise, robust function-calling intelligence, which needs agents to develop these capabilities through interaction in varied environments. The breadth of function-calling competence is closely tied to the diversity of environments in which agents are trained. In this work, we scale up environments as a step towards advancing general agentic intelligence. This gives rise to two central challenges: (i) how to scale environments in a principled manner, and (ii) how to effectively train agentic capabilities from experiences derived through interactions with these environments. To address these, we design a scalable framework that automatically constructs heterogeneous environments that are fully simulated, systematically broadening the space of function-calling scenarios. We further adapt a two-phase agent fine-tuning strategy: first endowing agents with fundamental agentic capabilities, then specializing them for domain-specific contexts. Extensive experiments on agentic benchmarks, tau-bench, tau2-Bench, and ACEBench, demonstrate that our trained model, AgentScaler, significantly enhances the function-calling capability of models.
14.
尹志 (2025-09-30 22:39):
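#paper Quantum computing and artificial intelligence: status and perspectives. doi: 10.48550/arXiv.2505.23860 A fairly recent survey of quantum AI (QAI). It covers Quantum for AI and AI for Quantum in some detail, along with foundational questions, and closes with the challenges the field currently faces. Two features are worth noting: first, it really is up to date and touches on essentially all the basic QAI problems; second, it is a QAI survey written by an all-European group of researchers, and the paper states its position at the very start, which gives one something to think about.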
arXiv, 2025-05-29T08:15:23Z. DOI: 10.48550/arXiv.2505.23860
Abstract:
This white paper discusses and explores the various points of intersection between quantum computing and artificial intelligence (AI). It describes how quantum computing could support the development of innovative AI solutions. It also examines use cases of classical AI that can empower research and development in quantum technologies, with a focus on quantum computing and quantum sensing. The purpose of this white paper is to provide a long-term research agenda aimed at addressing foundational questions about how AI and quantum computing interact and benefit one another. It concludes with a set of recommendations and challenges, including how to orchestrate the proposed theoretical work, align quantum AI developments with quantum hardware roadmaps, estimate both classical and quantum resources - especially with the goal of mitigating and optimizing energy consumption - advance this emerging hybrid software engineering discipline, and enhance European industrial competitiveness while considering societal implications.
15.
刘昊辰 (2025-09-08 15:13):
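#paper Gemini 2.5 Pro Capable of Winning Gold at IMO 2025. The team built a self-verification pipeline (initial solution, self-improvement, verification and correction, among other steps) with careful prompt design, and used Google's Gemini 2.5 Pro to solve 5 of the 6 problems of the 2025 International Mathematical Olympiad (IMO 2025); to avoid data contamination, only the newly released IMO 2025 problems were used as the test set. The study also compares solving with hints (such as mathematical induction or analytic geometry) against solving without hints, and finds that hints mainly improve efficiency rather than create new capability. It notes that the model failed on Problem 6 because of a wrong assumption. Overall it shows that a strong LLM combined with a sound strategy can achieve high-level mathematical reasoning, close to human gold-medal level. Download: https://arxiv.org/pdf/2507.15855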
arXiv, 2025-07-21T17:59:49Z. DOI: 10.48550/arXiv.2507.15855
Abstract:
The International Mathematical Olympiad (IMO) poses uniquely challenging problems requiring deep insight, creativity, and formal reasoning. While Large Language Models (LLMs) perform well on mathematical benchmarks like AIME, they struggle with Olympiad-level tasks. We use Google's Gemini 2.5 Pro on the newly released IMO 2025 problems, avoiding data contamination. Using a self-verification pipeline with careful prompt design, 5 (out of 6) problems are solved correctly. This result underscores the importance of developing optimal strategies to harness the full potential of powerful LLMs for complex reasoning tasks.
16.
符毓 (2025-08-31 23:27):
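#paper doi: 10.48550/arXiv.2507.21046, 2025, A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence. This survey gives the first systematic and comprehensive review of self-evolving agents, organized around three basic dimensions: what to evolve, when to evolve, and how to evolve. Large language models (LLMs) remain essentially static, unable to adjust their internal parameters to new tasks, evolving knowledge domains, or dynamic interaction environments. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a key bottleneck. The paper examines evolution mechanisms across agent components (e.g., models, memory, tools, architecture), classifies adaptation methods by stage (e.g., intra-test-time, inter-test-time), and analyzes the algorithmic and architectural designs that guide evolutionary adaptation (e.g., scalar rewards, textual feedback, single-agent and multi-agent systems).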
arXiv, 2025/8/1. DOI: 10.48550/arXiv.2507.21046
Abstract:
Large Language Models (LLMs) have demonstrated strong capabilities but remain fundamentally static, unable to adapt their internal parameters to novel tasks, evolving knowledge domains, or dynamic interaction contexts. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a critical bottleneck, necessitating agents that can adaptively reason, act, and evolve in real time. This paradigm shift -- from scaling static models to developing self-evolving agents -- has sparked growing interest in architectures and methods enabling continual learning and adaptation from data, interactions, and experiences. This survey provides the first systematic and comprehensive review of self-evolving agents, organized around three foundational dimensions -- what to evolve, when to evolve, and how to evolve. We examine evolutionary mechanisms across agent components (e.g., models, memory, tools, architecture), categorize adaptation methods by stages (e.g., intra-test-time, inter-test-time), and analyze the algorithmic and architectural designs that guide evolutionary adaptation (e.g., scalar rewards, textual feedback, single-agent and multi-agent systems). Additionally, we analyze evaluation metrics and benchmarks tailored for self-evolving agents, highlight applications in domains such as coding, education, and healthcare, and identify critical challenges and research directions in safety, scalability, and co-evolutionary dynamics. By providing a structured framework for understanding and designing self-evolving agents, this survey establishes a roadmap for advancing adaptive agentic systems in both research and real-world deployments, ultimately shedding lights to pave the way for the realization of Artificial Super Intelligence (ASI), where agents evolve autonomously, performing at or beyond human-level intelligence across a wide array of tasks.
17.
尹志 (2025-08-31 12:56):
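#paper doi:10.48550/arXiv.2505.13683, ISCA, 2025, Genesis: A Compiler Framework for Hamiltonian Simulation on Hybrid CV-DV Quantum Computers. The authors introduce the first quantum compilation framework for Hamiltonian simulation on hybrid continuous-variable/discrete-variable quantum computing systems, a very interesting piece of work. The framework consists of an initial Hamiltonian decomposition followed by mapping and routing, and is validated on several common physical models. Quantum compilation is an important part of the quantum computing stack and deserves more attention and technical breakthroughs.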
arXiv, 2025-05-19T19:32:06Z. DOI: 10.48550/arXiv.2505.13683
Abstract:
This paper introduces Genesis, the first compiler designed to support Hamiltonian Simulation on hybrid continuous-variable (CV) and discrete-variable (DV) quantum computing systems. Genesis is a two-level compilation system. At the first level, it decomposes an input Hamiltonian into basis gates using the native instruction set of the target hybrid CV-DV quantum computer. At the second level, it tackles the mapping and routing of qumodes/qubits to implement long-range interactions for the gates decomposed from the first level. Rather than a typical implementation that relies on SWAP primitives similar to qubit-based (or DV-only) systems, we propose an integrated design of connectivity-aware gate synthesis and beamsplitter SWAP insertion tailored for hybrid CV-DV systems. We also introduce an OpenQASM-like domain-specific language (DSL) named CVDV-QASM to represent Hamiltonian in terms of Pauli-exponentials and basic gate sequences from the hybrid CV-DV gate set. Genesis has successfully compiled several important Hamiltonians, including the Bose-Hubbard model, $\mathbb{Z}_2$-Higgs model, Hubbard-Holstein model, Heisenberg model and Electron-vibration coupling Hamiltonians, which are critical in domains like quantum field theory, condensed matter physics, and quantum chemistry. Our implementation is available at Genesis-CVDV-Compiler (https://github.com/ruadapt/Genesis-CVDV-Compiler).
18.
刘昊辰 (2025-08-19 13:25):
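#paper Search-contempt: a hybrid MCTS algorithm for training AlphaZero-like engines with better computational efficiency. The paper proposes search-contempt, a hybrid MCTS algorithm that combines PUCT with Thompson Sampling (TS) and uses a new parameter, Nscl, to shape the distribution of games generated in self-play so that it favors "challenging" positions. In regular chess, it produces higher-quality training games, raising engine strength by about 70 Elo while cutting the number of training games needed from tens of millions to hundreds of thousands and the compute cost from tens of millions of dollars to tens of thousands. In Odds Chess (where one side starts at a disadvantage), strength rises by about 150 Elo and the system's adversarial robustness improves, which may make training from scratch feasible on consumer GPUs. Download: https://arxiv.org/pdf/2504.07757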
19.
尹志 (2025-07-31 23:59):
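#paper doi: 10.48550/arXiv.2507.06216 Unitary designs in nearly optimal depth. The paper designs a new quantum circuit that efficiently constructs unitary k-designs at close to the theoretically optimal depth. If the scheme proves effective enough, it will certainly help the design of later quantum algorithms.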
arXiv, 2025-07-08T17:48:33Z. DOI: 10.48550/arXiv.2507.06216
Abstract:
We construct $\varepsilon$-approximate unitary $k$-designs on $n$ qubits in circuit depth $O(\log k \log \log n k / \varepsilon)$. The depth is exponentially improved over all known results in all three parameters $n$, $k$, $\varepsilon$. We further show that each dependence is optimal up to exponentially smaller factors. Our construction uses $\tilde{O}(nk)$ ancilla qubits and $O(nk)$ bits of randomness, which are also optimal up to $\log(nk)$ factors. An alternative construction achieves a smaller ancilla count $\tilde{O}(n)$ with circuit depth $O(k \log \log nk/\varepsilon)$. To achieve these efficient unitary designs, we introduce a highly-structured random unitary ensemble that leverages long-range two-qubit gates and low-depth implementations of random classical hash functions. We also develop a new analytical framework for bounding errors in quantum experiments involving many queries to random unitaries. As an illustration of this framework's versatility, we provide a succinct alternative proof of the existence of pseudorandom unitaries.
20.
林海onrush (2025-07-31 23:19):
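#paper, Efficient Qudit Circuit for Quench Dynamics of 2+1D Quantum Link Electrodynamics, 10.48550/arXiv.2507.12589. This work proposes an efficient quantum circuit framework based on multi-level quantum systems (qudits) for simulating the quench dynamics of 2+1-dimensional U(1) lattice gauge electrodynamics. By using Gauss's law to integrate out the matter fields and keeping only the gauge degrees of freedom, the authors build a compact circuit design that needs no ancillary qubits, and numerical simulations verify that it still captures highly coherent dynamics under realistic noise. The method greatly reduces quantum resource consumption, applies to arbitrary spin representations and higher-dimensional lattices, and scales well. Compared with conventional qubit encodings, the qudit implementation is closer to the hardware, suits current and near-term quantum processors, and offers a practical quantum computing route to simulating nonequilibrium phenomena in high-energy physics.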
arXiv, 2025-07-16T19:16:49Z. DOI: 10.48550/arXiv.2507.12589
Abstract:
A major challenge in the burgeoning field of quantum simulation for high-energy physics is the realization of scalable $2+1$D lattice gauge theories on state-of-the-art quantum hardware, which is an essential step towards the overarching goal of probing $3+1$D quantum chromodynamics on a quantum computer. Despite great progress, current experimental implementations of $2+1$D lattice gauge theories are mostly restricted to relatively small system sizes and two-level representations of the gauge and electric fields. Here, we propose a resource-efficient method for quantum simulating $2+1$D spin-$S$ $\mathrm{U}(1)$ quantum link lattice gauge theories with dynamical matter using qudit-based quantum processors. By integrating out the matter fields through Gauss's law, we reformulate the quantum link model in a purely spin picture compatible with qudit encoding across arbitrary spatial dimensions, eliminating the need for ancillary qubits and reducing resource overhead. Focusing first on the spin-$1/2$ case, we construct explicit circuits for the full Hamiltonian and demonstrate through numerical simulations that the first-order Trotterized circuits accurately capture the quench dynamics even in the presence of realistic noise levels. Additionally, we introduce a general method for constructing coupling-term circuits for higher-spin representations $S>1/2$. Compared to conventional qubit encodings, our framework significantly reduces the number of quantum resources and gate count. Our approach significantly enhances scalability and fidelity for probing nonequilibrium phenomena in higher-dimensional lattice gauge theories, and is readily amenable to implementation on state-of-the-art qudit platforms.