刘馨云 (2025-06-30 20:34):
#paper arXiv:2505.03729; Visual Imitation Enables Contextual Humanoid Control; UC Berkeley, 2025; link: https://videomimic.net

VIDEOMIMIC is a humanoid control method that learns context-aware skills from real-world videos. The paper proposes a real-to-sim-to-real training pipeline that, for the first time, trains and deploys a general control policy for climbing stairs, sitting down, standing up, and clearing obstacles purely from everyday videos, with no task labels, no hand-designed per-task rewards, and no motion capture of the demonstrated scenes.

Core contributions:
- First to extract 4D human-scene geometry from casual monocular video for robot control learning: jointly reconstructs the human motion and the environment geometry (mesh), and uses a human-height prior to resolve scale ambiguity, yielding environments and motion data usable in physics simulation (a sketch of the height-prior rescaling follows below).
- A multi-stage RL training pipeline from video to a general policy: pretraining on MoCap data; a heightmap as environment input for terrain awareness; and DAgger distillation to remove the dependence on target joint angles, so a single policy handles sitting/standing, stair climbing, and other tasks (see the distillation sketch below).
- The learned policy runs on the real robot from the robot's own state and a LiDAR heightmap alone: deployed on a Unitree G1, it performs on stairs, grass, and chairs both indoors and outdoors; in unseen environments, the appropriate behavior is triggered naturally by "terrain + heading" with no task labels.
- Compared with baseline methods, VIDEOMIMIC substantially improves reconstruction accuracy and generalization.
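A minimal sketch of the height-prior scale fix mentioned above, assuming z-up 3D joints and a nominal standing height; the function names and the 1.7 m default are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def estimate_person_height(joints: np.ndarray) -> float:
    """Approximate standing height from (J, 3) keypoints, assuming z is up."""
    return joints[:, 2].max() - joints[:, 2].min()

def rescale_scene(scene_vertices: np.ndarray,
                  person_joints: np.ndarray,
                  nominal_height_m: float = 1.7) -> tuple[np.ndarray, np.ndarray]:
    """Monocular reconstruction is only defined up to scale; scale both the
    environment mesh and the human motion so the person's height matches a
    nominal prior, making the geometry metrically plausible for simulation."""
    scale = nominal_height_m / estimate_person_height(person_joints)
    return scene_vertices * scale, person_joints * scale
```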
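And a hedged sketch of the DAgger-style distillation step: a teacher that sees privileged reference targets labels the states visited by a student that sees only deployable observations (proprioception + heightmap). All names here (teacher, student, env, split_obs) are placeholder interfaces, not the authors' API:

```python
import torch

def dagger_distill(teacher, student, env, split_obs, optimizer,
                   iters=1000, horizon=256):
    """Distill a privileged teacher into a deployable student with DAgger."""
    mse = torch.nn.MSELoss()
    for _ in range(iters):
        obs = env.reset()
        obs_buf, act_buf = [], []
        for _ in range(horizon):
            # split_obs (hypothetical) separates deployable inputs from
            # privileged reference targets available only in simulation.
            deploy_obs, privileged = split_obs(obs)
            with torch.no_grad():
                expert_action = teacher(deploy_obs, privileged)
                step_action = student(deploy_obs)
            obs_buf.append(deploy_obs)
            act_buf.append(expert_action)
            # Step with the *student's* action so the dataset covers the
            # states the student will actually visit at deployment.
            obs = env.step(step_action)
        # Regress the student onto the teacher's labels on those states.
        loss = mse(student(torch.stack(obs_buf)), torch.stack(act_buf))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```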
arXiv, 2025-05-06T17:57:12Z. DOI: 10.48550/arXiv.2505.03729
Visual Imitation Enables Contextual Humanoid Control
Arthur Allshire, Hongsuk Choi, Junyi Zhang, David McAllister, Anthony Zhang, Chung Min Kim, Trevor Darrell, Pieter Abbeel, Jitendra Malik, Angjoo Kanazawa
Abstract:
How can we teach humanoids to climb staircases and sit on chairs using the surrounding environment context? Arguably, the simplest way is to just show them: casually capture a human motion video and feed it to humanoids. We introduce VIDEOMIMIC, a real-to-sim-to-real pipeline that mines everyday videos, jointly reconstructs the humans and the environment, and produces whole-body control policies for humanoid robots that perform the corresponding skills. We demonstrate the results of our pipeline on real humanoid robots, showing robust, repeatable contextual control such as staircase ascents and descents, sitting and standing from chairs and benches, as well as other dynamic whole-body skills, all from a single policy, conditioned on the environment and global root commands. VIDEOMIMIC offers a scalable path towards teaching humanoids to operate in diverse real-world environments.
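To make the "conditioned on the environment and global root commands" interface concrete, here is a minimal sketch of assembling the deployment-time observation from proprioception, an egocentric heightmap (e.g. built from LiDAR), and a root command; grid size, resolution, and function names are assumptions, not the paper's configuration:

```python
import numpy as np

def egocentric_heightmap(elevation, base_xy, yaw, size=11, res=0.1):
    """Sample a size x size grid of terrain heights around the robot base,
    rotated into the robot's heading frame. `elevation(x, y)` is any
    callable returning terrain height at a world-frame point."""
    half = (size - 1) / 2 * res
    xs = np.linspace(-half, half, size)
    gx, gy = np.meshgrid(xs, xs)
    c, s = np.cos(yaw), np.sin(yaw)
    wx = base_xy[0] + c * gx - s * gy
    wy = base_xy[1] + s * gx + c * gy
    return np.vectorize(elevation)(wx, wy)

def build_observation(proprio, elevation, base_xy, yaw, root_cmd):
    """Concatenate proprioception, the flattened heightmap, and the global
    root command into a single policy input vector."""
    hm = egocentric_heightmap(elevation, base_xy, yaw)
    return np.concatenate([proprio, hm.ravel(), root_cmd])
```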