Literature Collection and Sharing Platform

21.

林海onrush (2025-10-31 23:18):

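#paper, PALQO: Physics-informed Model for Accelerating Large-scale Quantum Optimization, DOI: 10.48550/arXiv.2509.20733. This paper proposes PALQO, a new method based on physics-informed neural networks (PINNs) for accelerating the training of large-scale variational quantum algorithms (VQAs). The authors reformulate the VQA parameter-update process as a nonlinear partial differential equation (PDE) problem and use a PINN on a classical computer to learn the optimization dynamics. Only a small amount of quantum measurement data is needed to predict subsequent parameter updates, which significantly reduces calls to the quantum device. Theoretical analysis shows that PALQO generalizes well, with the number of training samples required growing polynomially in the number of parameters. In experiments on the transverse-field Ising model, the Heisenberg model, and molecular systems (such as LiH and BeH₂), PALQO maintains energy accuracy (error of about $10^{-3}$) while cutting quantum measurement overhead by about 90% and achieving up to a 30x speedup. The method scales well on many-body systems and quantum chemistry calculations, and offers a new approach to large-scale quantum optimization under limited quantum resources.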

arXiv, 2025-09-25T04:26:02Z. DOI: 10.48550/arXiv.2509.20733

PALQO: Physics-informed Model for Accelerating Large-scale Quantum Optimization

Yiming Huang, Yajie Hao, Jing Zhou, Xiao Yuan, Xiaoting Wang, Yuxuan Du

Abstract:

Variational quantum algorithms (VQAs) are leading strategies to reach
practical utilities of near-term quantum devices. However, the no-cloning
theorem in quantum mechanics precludes standard backpropagation, leading to
prohibitive quantum resource costs when applying VQAs to large-scale tasks. To
address this challenge, we reformulate the training dynamics of VQAs as a
nonlinear partial differential equation and propose a novel protocol that
leverages physics-informed neural networks (PINNs) to model this dynamical
system efficiently. Given a small amount of training tra…

22.

符毓 (2025-10-31 22:50):

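#paper doi: 10.48550/arXiv.2510.10903, 2025, Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey. A comprehensive survey that gives a panoramic view of the robot manipulation field. Drawing on more than 1,000 references, it systematically maps the field, covering hardware and control foundations, tasks and data, high-level and low-level control frameworks, and generalization across embodiments and modalities. It also proposes a unified framework for understanding the field, showing how robots move from "executing tasks" to "understanding and learning tasks".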

arXiv, 2025-10-13T01:59:27Z. DOI: 10.48550/arXiv.2510.10903

Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey

Shuanghao Bai, Wenxuan Song, Jiayi Chen, Yuheng Ji, Zhide Zhong, Jin Yang, Han Zhao, Wanqi Zhou, Wei Zhao, Zhe Li ...

Abstract:

Embodied intelligence has witnessed remarkable progress in recent years,
driven by advances in computer vision, natural language processing, and the
rise of large-scale multimodal models. Among its core challenges, robot
manipulation stands out as a fundamental yet intricate problem, requiring the
seamless integration of perception, planning, and control to enable interaction
within diverse and unstructured environments. This survey presents a
comprehensive overview of robotic manipulation, encompassing foundational
background, task-organized benchmarks and datasets, and …

23.

Vincent (2025-10-31 16:28):

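#paper https://doi.org/10.48550/arXiv.2510.14901 arXiv. 2025. Reasoning with Sampling: Your Base Model is Smarter Than You Think. Large language models (LLMs) combined with reinforcement learning (RL) have shown strong reasoning ability in many domains, and prior work has mostly examined how RL gives base models capabilities they did not originally have. This paper takes a different route and asks a thought-provoking question: can sampling alone, with no additional training, make a base model reason as well as an RL-trained policy? Using the model's own likelihoods, the paper proposes a simple iterative sampling method based on Markov chain Monte Carlo (MCMC). Experiments show that the method matches or even outperforms RL algorithms across several base models. More importantly, it avoids the loss of diversity that is common in RL and needs no extra data or training, which suggests broad applicability across domains.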
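As a rough illustration of likelihood-driven MCMC sampling, the sketch below runs Metropolis-Hastings on a toy categorical distribution standing in for a base model. The sharpened target p(x)^alpha, the value of alpha, the independence proposal, and all function names are assumptions made for this example; none of them are taken from the summary above.

```python
import random

def acceptance(p_cur, p_new, alpha):
    """Metropolis-Hastings acceptance for target p(x)**alpha with proposals drawn from p.

    target ratio * proposal ratio = (p_new / p_cur)**alpha * (p_cur / p_new).
    """
    return min(1.0, (p_new / p_cur) ** (alpha - 1.0))

def mh_sample(probs, alpha, steps, rng):
    """Run an independence sampler and return the list of visited states."""
    states = list(probs)
    weights = [probs[s] for s in states]
    cur = rng.choices(states, weights)[0]
    chain = []
    for _ in range(steps):
        cand = rng.choices(states, weights)[0]  # propose from the base distribution
        if rng.random() < acceptance(probs[cur], probs[cand], alpha):
            cur = cand
        chain.append(cur)
    return chain

# Toy "base model" over three answers (hypothetical numbers).
toy = {"A": 0.5, "B": 0.3, "C": 0.2}
chain = mh_sample(toy, alpha=4.0, steps=20000, rng=random.Random(0))
freq_a = chain.count("A") / len(chain)
```

With alpha above 1 the chain spends more time on the high-likelihood answer than plain sampling from `toy` would, without any change to the underlying probabilities.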

arXiv, 2025-10-16T17:18:11Z. DOI: 10.48550/arXiv.2510.14901

Reasoning with Sampling: Your Base Model is Smarter Than You Think

Aayush Karan, Yilun Du

Abstract:

Frontier reasoning models have exhibited incredible capabilities across a
wide array of disciplines, driven by posttraining large language models (LLMs)
with reinforcement learning (RL). However, despite the widespread success of
this paradigm, much of the literature has been devoted to disentangling truly
novel behaviors that emerge during RL but are not present in the base models.
In our work, we approach this question from a different angle, instead asking
whether comparable reasoning capabilities can be elicited from base models at
inference time by pure sampling, with…

24.

刘昊辰 (2025-10-27 14:21):

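#paper Strongly Solving 2048 4×3. This paper, by researchers at the University of Tokyo, Japan, strongly solves the 4×3 variant of the game 2048 (2048₄ₓ₃). The key technique is to partition the state space by "age", defined as the sum of all tile numbers on the board. A state and the afterstate that follows a move have the same age, and going from an afterstate to a new state raises the age by 2 or 4 because a new tile (2 or 4) is added. States can therefore be enumerated stage by stage with memory use kept under control. The authors also use Elias-Fano encoding to store states compactly, compressing a raw storage requirement of about 4.4 TiB to 1.4 TiB (storage dedicated to optimal play needs only 300 GiB). The results show that the expected score of the optimal strategy for the most common initial state (two 2 tiles, age 4) is 50724.26, and that the numbers of reachable states and afterstates are 1.15×10¹² and 7.40×10¹¹, respectively. They also confirm player intuitions such as "difficulty rises sharply when a large tile (such as 2048) is created". Download: https://arxiv.org/pdf/2510.04580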
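The age invariant is easy to check by hand. The sketch below is a minimal illustration on a 3-row by 4-column board: a move (slide and merge) leaves the tile sum unchanged, and spawning a tile raises it by 2 or 4. The helper names and the example position are made up for the illustration and are not the authors' code.

```python
def slide_left(row):
    """Slide and merge one row to the left, as in 2048; returns the new row."""
    tiles = [t for t in row if t != 0]
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)  # two equal tiles become one tile of double value
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))

def age(board):
    """Age = sum of all tile numbers on the board."""
    return sum(sum(row) for row in board)

def move_left(board):
    """State -> afterstate for the 'left' action."""
    return [slide_left(row) for row in board]

def spawn(board, r, c, value):
    """Afterstate -> next state: place a new 2 or 4 on an empty cell."""
    assert board[r][c] == 0 and value in (2, 4)
    new = [row[:] for row in board]
    new[r][c] = value
    return new

state = [[2, 2, 4, 0],
         [0, 4, 4, 8],
         [2, 0, 2, 0]]
afterstate = move_left(state)            # merging preserves the sum
next_state = spawn(afterstate, 0, 3, 4)  # spawning adds 2 or 4
```

Because age never decreases and only grows when a tile spawns, states can be grouped by age and processed one group at a time, which is what makes a staged enumeration possible.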

arXiv, 2025-10-06T08:31:59Z. DOI: 10.48550/arXiv.2510.04580

Strongly Solving 2048 4x3

Tomoyuki Kaneko, Shuhei Yamashita

Abstract:

2048 is a stochastic single-player game involving 16 cells on a 4 by 4 grid,
where a player chooses a direction among up, down, left, and right to obtain a
score by merging two tiles with the same number located in neighboring cells
along the chosen direction. This paper presents that a variant 2048-4x3 12
cells on a 4 by 3 board, one row smaller than the original, has been strongly
solved. In this variant, the expected score achieved by an optimal strategy is
about $50724.26$ for the most common initial states: ones with two tiles of
number 2. The numbers of reachable st…

25.

符毓 (2025-09-30 23:42):

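#paper doi: 10.48550/arXiv.2509.13311, 2025, Towards General Agentic Intelligence via Environment Scaling. The main bottleneck in training this kind of "agentic intelligence" has been the lack of high-quality, large-scale, diverse interaction data. Human annotation is extremely expensive, and purely model-generated data is often unrealistic or hard to verify. This paper from Alibaba's Tongyi Lab proposes a new path: programmatically and automatically building large numbers of heterogeneous, verifiable simulated environments in which language models can interact autonomously, gather experience, and learn. The AgentScaler model family trained with this approach reaches, with only billions of parameters, performance comparable to trillion-parameter models or closed-source commercial systems on several authoritative benchmarks, opening new possibilities for efficient, lightweight agentic intelligence.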

arXiv, 2025-09-16T17:57:20Z. DOI: 10.48550/arXiv.2509.13311

Towards General Agentic Intelligence via Environment Scaling

Runnan Fang, Shihao Cai, Baixuan Li, Jialong Wu, Guangyu Li, Wenbiao Yin, Xinyu Wang, Xiaobin Wang, Liangcai Su, Zhen Zhang ...

Abstract:

Advanced agentic intelligence is a prerequisite for deploying Large Language
Models in practical, real-world applications. Diverse real-world APIs demand
precise, robust function-calling intelligence, which needs agents to develop
these capabilities through interaction in varied environments. The breadth of
function-calling competence is closely tied to the diversity of environments in
which agents are trained. In this work, we scale up environments as a step
towards advancing general agentic intelligence. This gives rise to two central
challenges: (i) how to scale enviro…

26.

尹志 (2025-09-30 22:39):

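#paper Quantum computing and artificial intelligence: status and perspectives. doi: 10.48550/arXiv.2505.23860. A fairly recent survey of QAI. It covers Quantum for AI and AI for Quantum in some detail, along with foundational questions, and closes with the challenges the field currently faces. Two features are worth noting. First, it is genuinely up to date and touches on essentially all the basic QAI problems. Second, it is a QAI survey written by an all-European team of researchers, and the paper states its position at the very start, which is worth pondering.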

arXiv, 2025-05-29T08:15:23Z. DOI: 10.48550/arXiv.2505.23860

Quantum computing and artificial intelligence: status and perspectives

Giovanni Acampora, Andris Ambainis, Natalia Ares, Leonardo Banchi, Pallavi Bhardwaj, Daniele Binosi, G. Andrew D. Briggs, Tommaso Calarco, Vedran Dunjko, Jens Eisert ...

Abstract:

This white paper discusses and explores the various points of intersection
between quantum computing and artificial intelligence (AI). It describes how
quantum computing could support the development of innovative AI solutions. It
also examines use cases of classical AI that can empower research and
development in quantum technologies, with a focus on quantum computing and
quantum sensing. The purpose of this white paper is to provide a long-term
research agenda aimed at addressing foundational questions about how AI and
quantum computing interact and benefit one another.…

27.

刘昊辰 (2025-09-08 15:13):

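#paper Gemini 2.5 Pro Capable of Winning Gold at IMO 2025. By building a self-verification pipeline (with steps including initial solving, self-improvement, and verification with error correction) and refining the prompt design, the research team used Google's Gemini 2.5 Pro to solve 5 of the 6 problems of the 2025 International Mathematical Olympiad (IMO 2025). To avoid data contamination, only the newly released IMO 2025 problems were used as the test set. The study also compares solving with hints (such as mathematical induction or analytic geometry) against solving without hints, and finds that hints mainly improve efficiency rather than create new capabilities. It notes that the model failed on Problem 6 because of an incorrect assumption. Overall, the work shows that a strong LLM combined with a sound strategy can achieve high-level mathematical reasoning, close to human gold-medal level. Download: https://arxiv.org/pdf/2507.15855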
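The control flow of such a self-verification pipeline can be sketched as a small loop: draft, refine, then alternate verification and correction until the verifier accepts or a budget runs out. The function names, the three callables, and the round budget below are hypothetical placeholders; in practice each callable would wrap a model call, and the paper's actual prompts and stopping rules are not reproduced here.

```python
def solve_with_verification(problem, generate, improve, verify, max_rounds=5):
    """Generate a draft, refine it once, then loop verify -> correct until accepted."""
    solution = improve(problem, generate(problem), None)  # initial solve + self-improvement
    for _ in range(max_rounds):
        accepted, report = verify(problem, solution)
        if accepted:
            return solution
        solution = improve(problem, solution, report)     # correct using the verifier's report
    return None  # no verified solution within the round budget

# Toy stand-ins so the loop can be run without a model (purely illustrative).
def toy_generate(problem):
    return 0

def toy_improve(problem, solution, report):
    return solution + 1

def toy_verify(problem, solution):
    return solution == problem, "solution is too small"

result = solve_with_verification(3, toy_generate, toy_improve, toy_verify)
```

Returning `None` when the budget is exhausted keeps an unverified answer from being reported as solved, which mirrors the idea of only counting solutions that pass verification.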

arXiv, 2025-07-21T17:59:49Z. DOI: 10.48550/arXiv.2507.15855

Gemini 2.5 Pro Capable of Winning Gold at IMO 2025

Yichen Huang, Lin F. Yang

Abstract:

The International Mathematical Olympiad (IMO) poses uniquely challenging
problems requiring deep insight, creativity, and formal reasoning. While Large
Language Models (LLMs) perform well on mathematical benchmarks like AIME, they
struggle with Olympiad-level tasks. We use Google's Gemini 2.5 Pro on the newly
released IMO 2025 problems, avoiding data contamination. Using a
self-verification pipeline with careful prompt design, 5 (out of 6) problems
are solved correctly. This result underscores the importance of developing
optimal strategies to harness the full potential o…

28.

符毓 (2025-08-31 23:27):

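#paper doi: 10.48550/arXiv.2507.21046, 2025, A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence. This survey is the first systematic and comprehensive review of self-evolving agents, organized around three basic dimensions: what to evolve, when to evolve, and how to evolve. Large language models (LLMs) remain fundamentally static: they cannot adjust their internal parameters to adapt to new tasks, evolving knowledge domains, or dynamic interaction environments. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a key bottleneck. The paper examines evolution mechanisms across agent components (for example models, memory, tools, and architecture), classifies adaptation methods by stage (for example intra-test-time and inter-test-time), and analyzes the algorithmic and architectural designs that guide evolutionary adaptation (for example scalar rewards, textual feedback, and single-agent and multi-agent systems).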

arXiv, 2025-08-01. DOI: 10.48550/arXiv.2507.21046

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Abstract:

Large Language Models (LLMs) have demonstrated strong capabilities but remain fundamentally static, unable to adapt their internal parameters to novel tasks, evolving knowledge domains, or dynamic interaction contexts. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a critical bottleneck, necessitating agents that can adaptively reason, act, and evolve in real time. This paradigm shift -- from scaling static models to developing self-evolving agents -- has sparked growing interest in architectures and methods enabling continual learning…

29.

尹志 (2025-08-31 12:56):

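#paper doi:10.48550/arXiv.2505.13683, ISCA, 2025, Genesis: A Compiler Framework for Hamiltonian Simulation on Hybrid CV-DV Quantum Computers. The authors introduce the first quantum compilation framework for Hamiltonian simulation on hybrid continuous-variable/discrete-variable quantum computing systems, which is very interesting work. The framework has two levels: an initial decomposition of the Hamiltonian, followed by mapping and routing. It is also validated on several common physical models. Quantum compilation is an important part of the quantum computing stack and deserves more attention and technical breakthroughs.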

arXiv, 2025-05-19T19:32:06Z. DOI: 10.48550/arXiv.2505.13683

Genesis: A Compiler Framework for Hamiltonian Simulation on Hybrid CV-DV Quantum Computers

Zihan Chen, Jiakang Li, Minghao Guo, Henry Chen, Zirui Li, Joel Bierman, Yipeng Huang, Huiyang Zhou, Yuan Liu, Eddy Z. Zhang

Abstract:

This paper introduces Genesis, the first compiler designed to support
Hamiltonian Simulation on hybrid continuous-variable (CV) and discrete-variable
(DV) quantum computing systems. Genesis is a two-level compilation system. At
the first level, it decomposes an input Hamiltonian into basis gates using the
native instruction set of the target hybrid CV-DV quantum computer. At the
second level, it tackles the mapping and routing of qumodes/qubits to implement
long-range interactions for the gates decomposed from the first level. Rather
than a typical implementation that rel…

30.

刘昊辰 (2025-08-19 13:25):

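#paper Search-contempt: a hybrid MCTS algorithm for training AlphaZero-like engines with better computational efficiency. The paper proposes search-contempt, a hybrid MCTS algorithm that combines PUCT with Thompson Sampling (TS). A new parameter, Nscl, controls the distribution of games generated in self-play so that it favors "challenging" positions. In standard chess, the training games it generates are of higher quality, raising engine strength by about 70 Elo, while the number of games needed for training drops from tens of millions to hundreds of thousands and the compute cost from tens of millions of dollars to tens of thousands of dollars. In Odds Chess (where one side starts at a disadvantage), strength rises by about 150 Elo and the system's adversarial robustness improves. The method may make training from scratch feasible on consumer-grade GPUs. Download: https://arxiv.org/pdf/2504.07757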
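To make the two ingredients concrete, the sketch below shows the standard AlphaZero-style PUCT selection rule next to a sampling step that draws a child in proportion to its visit count, used here as a stand-in for the Thompson Sampling component. The rule for switching between them (PUCT while the node has fewer than `n_scl` visits, sampling afterwards) is an assumption made for this example, as are the data layout and the constant `c_puct`; the paper should be consulted for the actual algorithm.

```python
import math
import random

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + c_puct * P * sqrt(total visits) / (1 + N)."""
    total = sum(ch["N"] for ch in children)

    def score(ch):
        q = ch["W"] / ch["N"] if ch["N"] else 0.0  # mean value of the child
        u = c_puct * ch["P"] * math.sqrt(total) / (1 + ch["N"])  # exploration bonus
        return q + u

    return max(range(len(children)), key=lambda i: score(children[i]))

def visit_sample(children, rng):
    """Sample a child index with probability proportional to its visit count."""
    weights = [ch["N"] for ch in children]
    return rng.choices(range(len(children)), weights)[0]

def hybrid_select(children, n_scl, rng):
    """Assumed switch rule: deterministic PUCT below n_scl visits, sampling at or above it."""
    total = sum(ch["N"] for ch in children)
    if total < n_scl:
        return puct_select(children)
    return visit_sample(children, rng)

# P = prior, N = visit count, W = total value (hypothetical numbers).
kids = [{"P": 0.2, "N": 10, "W": 5.0},
        {"P": 0.8, "N": 2, "W": 0.6}]
choice = hybrid_select(kids, n_scl=100, rng=random.Random(0))
```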

arXiv, 2025-04-10. DOI: 10.48550/arXiv.2504.07757

Search-contempt: a hybrid MCTS algorithm for training AlphaZero-like engines with better computational efficiency

31.

尹志 (2025-07-31 23:59):

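#paper doi: 10.48550/arXiv.2507.06216 Unitary designs in nearly optimal depth. The paper designs a new quantum circuit that efficiently constructs unitary k-designs at close to the theoretically optimal depth. If the scheme proves effective enough, it will undoubtedly be very helpful for designing later quantum algorithms.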

arXiv, 2025-07-08T17:48:33Z. DOI: 10.48550/arXiv.2507.06216

Unitary designs in nearly optimal depth

Laura Cui, Thomas Schuster, Fernando Brandao, Hsin-Yuan Huang

Abstract:

We construct $\varepsilon$-approximate unitary $k$-designs on $n$ qubits in
circuit depth $O(\log k \log \log n k / \varepsilon)$. The depth is
exponentially improved over all known results in all three parameters $n$, $k$,
$\varepsilon$. We further show that each dependence is optimal up to
exponentially smaller factors. Our construction uses $\tilde{{O}}(nk)$ ancilla
qubits and ${O}(nk)$ bits of randomness, which are also optimal up to $\log(n
k)$ factors. An alternative construction achieves a smaller ancilla count
$\tilde{{O}}(n)$ with circuit depth ${O}(k \log \log n…

32.

林海onrush (2025-07-31 23:19):

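#paper, "Efficient Qudit Circuit for Quench Dynamics of 2+1D Quantum Link Electrodynamics", DOI: 10.48550/arXiv.2507.12589. This study proposes an efficient quantum circuit framework based on multi-level quantum units (qudits) for simulating the quench dynamics of 2+1D U(1) lattice gauge electrodynamics. By using Gauss's law to integrate out the matter fields and keeping only the gauge degrees of freedom, the authors build a compact circuit design that needs no ancilla qubits, and numerical simulations verify that it maintains highly coherent dynamics under realistic noise. The method greatly reduces quantum resource consumption and also applies to arbitrary spin representations and higher-dimensional lattice systems, so it scales well. Compared with conventional qubit encodings, the qudit implementation is closer to the hardware and suits current and near-term quantum processors, offering a practical quantum computing route for simulating non-equilibrium phenomena in high-energy physics.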

arXiv, 2025-07-16T19:16:49Z. DOI: 10.48550/arXiv.2507.12589

Efficient Qudit Circuit for Quench Dynamics of $2+1$D Quantum Link Electrodynamics

Rohan Joshi, Michael Meth, Jan C. Louw, Jesse J. Osborne, Kevin Mato, Martin Ringbauer, Jad C. Halimeh

Abstract:

A major challenge in the burgeoning field of quantum simulation for
high-energy physics is the realization of scalable $2+1$D lattice gauge
theories on state-of-the-art quantum hardware, which is an essential step
towards the overarching goal of probing $3+1$D quantum chromodynamics on a
quantum computer. Despite great progress, current experimental implementations
of $2+1$D lattice gauge theories are mostly restricted to relatively small
system sizes and two-level representations of the gauge and electric fields.
Here, we propose a resource-efficient method for quantum s…

33.

刘昊辰 (2025-07-09 14:59):

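#paper Rapfi: Distilling Efficient Neural Network for the Game of Gomoku. This paper presents Rapfi, an efficient Gomoku agent that outperforms CNN-based agents in limited computation environments. Rapfi uses a compact neural network with a pattern-based codebook distilled from CNNs, together with an incremental update scheme that minimizes computation when the input changes only slightly. The new network uses orders of magnitude less computation while reaching accuracy similar to much larger networks such as ResNet. Thanks to the incremental update scheme, depth-first search methods (such as α-β search) can be sped up significantly. With carefully tuned evaluation and search, Rapfi surpasses Katagomo, the strongest open-source Gomoku AI based on the AlphaZero algorithm, when computing resources are limited and accelerators such as GPUs are unavailable. Rapfi ranks first among 520 Gomoku agents on Botzone and won GomoCup 2024. Download: https://arxiv.org/pdf/2503.13178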
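The idea behind a codebook with incremental updates can be shown on a deliberately simplified toy: a 1-D board, length-3 windows encoded as base-3 pattern codes, and a table that maps each code to a scalar feature. When one stone is placed, only the windows covering that cell are recomputed. The board shape, window length, scalar features, and codebook values are all assumptions made for this illustration and do not reflect Rapfi's actual network.

```python
def window_code(board, start, k=3):
    """Encode k consecutive cells (0 empty, 1 black, 2 white) as a base-3 integer."""
    code = 0
    for cell in board[start:start + k]:
        code = code * 3 + cell
    return code

def full_eval(board, codebook, k=3):
    """Sum codebook features over every window: the from-scratch evaluation."""
    return sum(codebook[window_code(board, s, k)] for s in range(len(board) - k + 1))

def incremental_eval(board, acc, pos, stone, codebook, k=3):
    """Place `stone` at `pos` in place and update the accumulator using only affected windows."""
    first = max(0, pos - k + 1)
    last = min(pos, len(board) - k)
    starts = range(first, last + 1)  # windows that contain `pos`
    acc -= sum(codebook[window_code(board, s, k)] for s in starts)  # remove old contributions
    board[pos] = stone
    acc += sum(codebook[window_code(board, s, k)] for s in starts)  # add new contributions
    return acc

codebook = [(7 * c) % 11 - 5 for c in range(27)]  # arbitrary toy features for 3**3 patterns
board = [0] * 9
acc = full_eval(board, codebook)
acc = incremental_eval(board, acc, 4, 1, codebook)
acc = incremental_eval(board, acc, 0, 2, codebook)
```

Each move touches at most `k` windows instead of all of them, and undoing a move is the same operation with the stone set back to 0, which is why this pairs well with depth-first searches that make and unmake moves constantly.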

arXiv, 2025-03-17T13:53:57Z. DOI: 10.48550/arXiv.2503.13178

Rapfi: Distilling Efficient Neural Network for the Game of Gomoku

Zhanggen Jin, Haobin Duan, Zhiyang Hang

Abstract:

Games have played a pivotal role in advancing artificial intelligence, with
AI agents using sophisticated techniques to compete. Despite the success of
neural network based game AIs, their performance often requires significant
computational resources. In this paper, we present Rapfi, an efficient Gomoku
agent that outperforms CNN-based agents in limited computation environments.
Rapfi leverages a compact neural network with a pattern-based codebook
distilled from CNNs, and an incremental update scheme that minimizes
computation when input changes are minor. This new netw…

34.

尹志 (2025-06-30 23:17):

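#paper arXiv:2411.09131; Artificial Intelligence for Quantum Computing; 2024. A survey led by Yuri Alexeev that analyzes and looks ahead at several ways AI is being applied to quantum computing. It is not especially detailed, but if you hope quantum computing will demonstrate an advantage on practical problems sooner, this survey is clearly not to be missed.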

arXiv, 2024-11-14T02:11:16Z. DOI: 10.48550/arXiv.2411.09131

Artificial Intelligence for Quantum Computing

Yuri Alexeev, Marwa H. Farag, Taylor L. Patti, Mark E. Wolf, Natalia Ares, Alán Aspuru-Guzik, Simon C. Benjamin, Zhenyu Cai, Zohim Chandani, Federico Fedele ...

Abstract:

Artificial intelligence (AI) advancements over the past few years have had an
unprecedented and revolutionary impact across everyday application areas. Its
significance also extends to technical challenges within science and
engineering, including the nascent field of quantum computing (QC). The
counterintuitive nature and high-dimensional mathematics of QC make it a prime
candidate for AI's data-driven learning capabilities, and in fact, many of QC's
biggest scaling challenges may ultimately rest on developments in AI. However,
bringing leading techniques from AI to QC r…

35.

刘馨云 (2025-06-30 20:34):

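#paper arXiv:2505.03729; Visual Imitation Enables Contextual Humanoid Control; UC Berkeley, 2025; link: https://videomimic.net. VIDEOMIMIC is a humanoid robot control method that learns context-aware skills from real-world videos. The paper proposes a real-to-sim-to-real training pipeline that, for the first time, trains and deploys a general control policy capable of climbing and descending stairs, sitting down, standing up, and crossing obstacles using only everyday videos, with no task labels, reward functions, or MoCap. Core contributions: (1) It is the first to extract 4D human-scene geometry from monocular everyday videos for robot control learning: it jointly reconstructs human motion and environment geometry (mesh), and uses a human-height prior to resolve scale ambiguity, producing environments and motion data usable in physics simulation. (2) It designs a multi-stage RL training pipeline that goes from video to a general policy: pretraining on MoCap data; adding a height map as environment input for terrain awareness; and using DAgger distillation to remove the dependence on target angles, so that a single policy handles multiple tasks such as sitting, standing, and stair climbing. (3) The learned policy runs on a real robot using only the robot's own state and a LiDAR height map: it is deployed on a Unitree G1 and performs across indoor and outdoor stairs, grass, and chairs; in unseen environments it needs no task labels, and "terrain + direction" naturally triggers the appropriate behavior. (4) Compared with baseline methods, VIDEOMIMIC substantially improves reconstruction accuracy and generalization.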

arXiv, 2025-05-06T17:57:12Z. DOI: 10.48550/arXiv.2505.03729

Visual Imitation Enables Contextual Humanoid Control

Arthur Allshire, Hongsuk Choi, Junyi Zhang, David McAllister, Anthony Zhang, Chung Min Kim, Trevor Darrell, Pieter Abbeel, Jitendra Malik, Angjoo Kanazawa

Abstract:

How can we teach humanoids to climb staircases and sit on chairs using the
surrounding environment context? Arguably, the simplest way is to just show
them: casually capture a human motion video and feed it to humanoids. We
introduce VIDEOMIMIC, a real-to-sim-to-real pipeline that mines everyday
videos, jointly reconstructs the humans and the environment, and produces
whole-body control policies for humanoid robots that perform the corresponding
skills. We demonstrate the results of our pipeline on real humanoid robots,
showing robust, repeatable contextual control such as…

36.

林海onrush (2025-06-07 13:27):

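#paper, Token-Importance Guided Direct Preference Optimization, https://arxiv.org/abs/2505.19653. Sharing my own latest work on fine-tuning algorithms for large models. We propose TI-DPO, a new method for better aligning large language models (LLMs) with human preferences. The widely used DPO (Direct Preference Optimization) method removes the explicit reward model and optimizes the model directly on human preference data, but it ignores how much individual tokens differ in importance within the generated content. This can cause the model to err on key tokens and produce outputs that do not match human values. TI-DPO addresses this with two innovations: 1. At the token level, it introduces token-importance weights based on gradient attribution, which dynamically identify and prioritize the tokens most critical to human preference. 2. It adds a contrastive-learning-based triplet loss that not only separates "good" from "bad" samples but also introduces an "intermediate" output, making optimization finer-grained and helping the model generate content closer to human expectations and further from undesirable responses. Experiments show that TI-DPO performs well on multiple tasks (such as TruthfulQA and IFEval), exceeding DPO and other alignment methods in both accuracy and generation diversity. Ablation studies further confirm that the token-importance mechanism and the triplet loss are both necessary and effective. Theoretical analysis also proves that TI-DPO has a tighter lower bound on the loss and a more stable training process. By focusing on key tokens at a fine granularity and combining this with the triplet alignment structure, TI-DPO effectively improves the alignment and output quality of large models, offering a new solution for AI safety and helpfulness in human-computer interaction.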
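A minimal sketch of how per-token weights could enter a DPO-style objective is given below. With all weights equal to 1 it reduces to the standard DPO loss, -log σ(β · margin), where the margin is the difference in policy-versus-reference log-ratios between the chosen and rejected responses. The specific weighted-sum form, the function names, and the numbers are assumptions for illustration; the gradient-attribution weights and the triplet loss described above are not implemented here.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def weighted_logratio(policy_logps, ref_logps, weights):
    """Sum over tokens of w_t * (log pi(y_t) - log pi_ref(y_t))."""
    return sum(w * (p - r) for p, r, w in zip(policy_logps, ref_logps, weights))

def dpo_loss(chosen, rejected, beta=0.1):
    """chosen / rejected are (policy_logps, ref_logps, token_weights) triples."""
    margin = weighted_logratio(*chosen) - weighted_logratio(*rejected)
    return -log_sigmoid(beta * margin)

# Hypothetical per-token log-probabilities for a two-token response pair.
rejected = ([-2.0, -2.0], [-2.0, -2.0], [1.0, 1.0])
uniform = dpo_loss(([-1.0, -0.5], [-1.0, -1.5], [1.0, 1.0]), rejected)
weighted = dpo_loss(([-1.0, -0.5], [-1.0, -1.5], [0.5, 1.5]), rejected)
```

In this toy case the second token is where the policy improves on the reference, so shifting weight onto it enlarges the margin and lowers the loss relative to uniform weighting.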

arXiv, 2025-05-26T08:11:24Z. DOI: 10.48550/arXiv.2505.19653

Token-Importance Guided Direct Preference Optimization

Ning Yang, Hai Lin, Yibo Liu, Baoliang Tian, Guoqing Liu, Haijun Zhang

Abstract:

Ensuring that large language models (LLMs) generate outputs aligned with
human preferences is important for safe and effective AI interactions. While
Direct Preference Optimization (DPO) employs an implicit reward function to
optimize the policy model, however, it and its related variants overlook the
differential importance of individual tokens and are sensitive to judgment
noise in preference datasets during generation. Although recent methods attempt
to assess the important weight of tokens via probability prediction or
simplistic weighting schemes, these evaluation me…

37.

刘昊辰 (2025-06-03 16:34):

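#paper AlphaZero-Edu: Making AlphaZero Accessible to Everyone. AlphaZero-Edu is a lightweight reinforcement learning framework built on the mathematical framework of AlphaZero and designed for education and for Gomoku. It features a modular architecture (decoupling Monte Carlo tree search, self-play training, and the policy-value network), resource-efficient training (it runs on a single NVIDIA RTX 3090 GPU), and highly parallel self-play data generation (a 3.2x speedup with 8 processes). Its state features are a 21-plane tensor (the current state plus 20 planes of history), its outputs are a policy probability distribution and a scalar value estimate, and rotation/flip data augmentation is used to improve generalization. Training uses a cyclic learning-rate scheduler, under which both the policy loss and the value loss converge. In matches against 4 human players, it achieved a win rate of up to 100% and at least 60% (with 20% draws). The framework is open source and provides an accessible baseline for academic research and industrial applications. Download: https://arxiv.org/pdf/2504.14636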
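Rotation/flip augmentation relies on the eight symmetries of a square board: four rotations, each with an optional mirror. The board and the policy target must be transformed together so that each move probability stays attached to the same cell. The sketch below uses plain nested lists and made-up helper names; it illustrates the general technique rather than the framework's own code.

```python
def rotate90(grid):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_lr(grid):
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]

def symmetries(board, policy):
    """Return the 8 (board, policy) pairs under rotations and mirror images."""
    out = []
    b, p = board, policy
    for _ in range(4):
        out.append((b, p))
        out.append((flip_lr(b), flip_lr(p)))
        b, p = rotate90(b), rotate90(p)
    return out

# Tiny 3x3 example: stones 1 and 2 on the top row, policy mass on the same cells.
board = [[1, 2, 0],
         [0, 0, 0],
         [0, 0, 0]]
policy = [[0.7, 0.3, 0.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
augmented = symmetries(board, policy)
```

The scalar value target is unchanged by these transformations, so each self-play position yields eight training samples at no extra search cost.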

arXiv, 2025-04-20T14:29:39Z. DOI: 10.48550/arXiv.2504.14636

AlphaZero-Edu: Making AlphaZero Accessible to Everyone

Binjie Guo, Hanyu Zheng, Guowei Su, Ru Zhang, Haohan Jiang, Xurong Lin, Hongyan Wei, Aisheng Mo, Jie Li, Zhiyuan Qian ...

Abstract:

Recent years have witnessed significant progress in reinforcement learning,
especially with Zero-like paradigms, which have greatly boosted the
generalization and reasoning abilities of large-scale language models.
Nevertheless, existing frameworks are often plagued by high implementation
complexity and poor reproducibility. To tackle these challenges, we present
AlphaZero-Edu, a lightweight, education-focused implementation built upon the
mathematical framework of AlphaZero. It boasts a modular architecture that
disentangles key components, enabling transparent visualiza…

38.

符毓 (2025-05-31 22:59):

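#paper doi: 10.48550/arXiv.2505.21906, 2025, Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge. Vision-language-action (VLA) models have emerged as the next generation of models in robotics. However, although existing end-to-end VLA systems build on powerful pretrained vision-language models (VLMs), they often lose key capabilities during fine-tuning as the model adapts to specific robot tasks. The authors argue that a generalizable VLA model should retain and extend the VLM's core competencies: 1) open-world embodied reasoning: the VLA should inherit the VLM's knowledge, that is, recognize anything the VLM can recognize, solve math problems, and have visual-spatial intelligence; 2) reasoning following: effectively turning open-world reasoning into steps the robot can execute. The paper introduces ChatVLA-2, which gives a VLA model the ability to perform a variety of tasks by using, end to end, the innate reasoning and understanding abilities of a pretrained vision-language model. The core contribution is a dynamic Mixture-of-Experts (MoE) module integrated on top of the pretrained vision-language backbone. The module manages differing task demands effectively: some experts share general multimodal features, while others specialize in task-specific representations. A two-stage training strategy is also proposed: first, the VLA model is guided to connect pretrained multimodal knowledge with robot actions; then a reasoning-following stage is introduced so that the model can understand its reasoning outputs and translate them into the corresponding actions.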

arXiv, 2025-05-28T02:48:42Z. DOI: 10.48550/arXiv.2505.21906

Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge

Zhongyi Zhou, Yichen Zhu, Junjie Wen, Chaomin Shen, Yi Xu

Abstract:

Vision-language-action (VLA) models have emerged as the next generation of
models in robotics. However, despite leveraging powerful pre-trained
Vision-Language Models (VLMs), existing end-to-end VLA systems often lose key
capabilities during fine-tuning as the model adapts to specific robotic tasks.
We argue that a generalizable VLA model should retain and expand upon the VLM's
core competencies: 1) Open-world embodied reasoning - the VLA should inherit
the knowledge from VLM, i.e., recognize anything that the VLM can recognize,
capable of solving math problems, possessin…

39.

刘馨云 (2025-05-31 21:32):

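#paper https://arxiv.org/pdf/2505.20290 Humans learn new tasks by watching others. Inspired by this, the authors propose EgoZero, a framework that learns closed-loop robot policies from first-person (egocentric) videos recorded by humans wearing smart glasses. Smart glasses capture a rich, multimodal first-person view of human interaction: RGB video records the surrounding scene, the IMU (inertial measurement unit) provides head-motion information, and the microphone records conversation and ambient sound. The method learns how to act purely from observing these first-person videos, with no robot demonstrations. Given a video of a human completing a task, EgoZero predicts a sequence of intermediate goals and language subgoals and uses them to execute the task in closed loop on a real robot. EgoZero compresses human observations into morphology-agnostic state representations that can be used for decision-making and closed-loop control. The learned policies generalize well across robot morphologies, environments, and tasks. In validation on a real Franka Panda arm, EgoZero completes a range of challenging manipulation tasks with a 70% zero-shot success rate, with only 20 minutes of data collection per task.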

arXiv, 2025-05-26T17:59:17Z. DOI: 10.48550/arXiv.2505.20290

EgoZero: Robot Learning from Smart Glasses

Vincent Liu, Ademi Adeniji, Haotian Zhan, Raunaq Bhirangi, Pieter Abbeel, Lerrel Pinto

Abstract:

Despite recent progress in general purpose robotics, robot policies still lag
far behind basic human capabilities in the real world. Humans interact
constantly with the physical world, yet this rich data resource remains largely
untapped in robot learning. We propose EgoZero, a minimal system that learns
robust manipulation policies from human demonstrations captured with Project
Aria smart glasses, $\textbf{and zero robot data}$. EgoZero enables: (1)
extraction of complete, robot-executable actions from in-the-wild, egocentric,
human demonstrations, (2) compression of hu…

40.

尹志 (2025-05-31 21:23):

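#paper https://doi.org/10.48550/arXiv.2012.07436 Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. A classic AAAI 2021 paper on long-sequence time-series modeling. It improves on the standard Transformer with a new model, Informer: through an improved self-attention mechanism with distilling and a generative-style decoder, it addresses the standard Transformer's problems in both time and space complexity. The work achieves good performance on multiple datasets. These ideas have been reused frequently in later time-series modeling and are very instructive.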
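One way to picture the self-attention improvement is as query selection: score each query by how far its attention logits are from uniform, then compute full attention only for the top-scoring queries. The sketch below uses a max-minus-mean score over scaled dot products, which is how I recall the paper's ProbSparse measurement; treat that form, the exact (unsampled) computation, and the function names as assumptions to be checked against the paper.

```python
import math

def sparsity_scores(Q, K):
    """Score each query by max minus mean of its scaled dot products with all keys."""
    d = len(Q[0])
    scores = []
    for q in Q:
        dots = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        scores.append(max(dots) - sum(dots) / len(dots))
    return scores

def top_u_queries(Q, K, u):
    """Indices of the u queries with the largest sparsity score, best first."""
    s = sparsity_scores(Q, K)
    return sorted(range(len(Q)), key=lambda i: s[i], reverse=True)[:u]

# Toy queries and keys (hypothetical values).
Q = [[0.0, 0.0], [4.0, 0.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
active = top_u_queries(Q, K, u=2)
```

A query whose logits are all equal gets a score of 0 and contributes nothing beyond an average of the values, which is why skipping full attention for such queries costs little accuracy.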

arXiv, 2020-12-14T11:43:09Z. DOI: 10.48550/arXiv.2012.07436

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang

Abstract:

Many real-world applications require the prediction of long sequence
time-series, such as electricity consumption planning. Long sequence
time-series forecasting (LSTF) demands a high prediction capacity of the model,
which is the ability to capture precise long-range dependency coupling between
output and input efficiently. Recent studies have shown the potential of
Transformer to increase the prediction capacity. However, there are several
severe issues with Transformer that prevent it from being directly applicable
to LSTF, including quadratic time complexity, high mem…