刘馨云 (2025-06-30 20:34):
#paper arXiv:2505.03729; Visual Imitation Enables Contextual Humanoid Control; UC Berkeley, 2025; link: https://videomimic.net

VIDEOMIMIC is a humanoid control method that learns context-aware skills from real-world videos. The paper proposes a real-to-sim-to-real training pipeline that, for the first time, trains and deploys a general control policy for climbing stairs, sitting down, standing up, and clearing obstacles purely from everyday videos, with no task labels, no hand-designed per-task rewards, and no motion capture of the demonstrated scenes.

Core contributions:
- First to extract 4D human-scene geometry from casual monocular video for robot control learning: jointly reconstructs the human motion and the environment geometry (mesh), and uses a human-height prior to resolve scale ambiguity, yielding environments and motion data usable in physics simulation (a sketch of the height-prior rescaling follows below).
- A multi-stage RL training pipeline from video to a general policy: pretraining on MoCap data; a heightmap as environment input for terrain awareness; and DAgger distillation to remove the dependence on target joint angles, so a single policy handles sitting/standing, stair climbing, and other tasks (see the distillation sketch below).
- The learned policy runs on the real robot from the robot's own state and a LiDAR heightmap alone: deployed on a Unitree G1, it performs on stairs, grass, and chairs both indoors and outdoors; in unseen environments, the appropriate behavior is triggered naturally by "terrain + heading" with no task labels.
- Compared with baseline methods, VIDEOMIMIC substantially improves reconstruction accuracy and generalization.
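A minimal sketch of the height-prior scale fix mentioned above, assuming z-up 3D joints and a nominal standing height; the function names and the 1.7 m default are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def estimate_person_height(joints: np.ndarray) -> float:
    """Approximate standing height from (J, 3) keypoints, assuming z is up."""
    return joints[:, 2].max() - joints[:, 2].min()

def rescale_scene(scene_vertices: np.ndarray,
                  person_joints: np.ndarray,
                  nominal_height_m: float = 1.7) -> tuple[np.ndarray, np.ndarray]:
    """Monocular reconstruction is only defined up to scale; scale both the
    environment mesh and the human motion so the person's height matches a
    nominal prior, making the geometry metrically plausible for simulation."""
    scale = nominal_height_m / estimate_person_height(person_joints)
    return scene_vertices * scale, person_joints * scale
```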
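And a hedged sketch of the DAgger-style distillation step: a teacher that sees privileged reference targets labels the states visited by a student that sees only deployable observations (proprioception + heightmap). All names here (teacher, student, env, split_obs) are placeholder interfaces, not the authors' API:

```python
import torch

def dagger_distill(teacher, student, env, split_obs, optimizer,
                   iters=1000, horizon=256):
    """Distill a privileged teacher into a deployable student with DAgger."""
    mse = torch.nn.MSELoss()
    for _ in range(iters):
        obs = env.reset()
        obs_buf, act_buf = [], []
        for _ in range(horizon):
            # split_obs (hypothetical) separates deployable inputs from
            # privileged reference targets available only in simulation.
            deploy_obs, privileged = split_obs(obs)
            with torch.no_grad():
                expert_action = teacher(deploy_obs, privileged)
                step_action = student(deploy_obs)
            obs_buf.append(deploy_obs)
            act_buf.append(expert_action)
            # Step with the *student's* action so the dataset covers the
            # states the student will actually visit at deployment.
            obs = env.step(step_action)
        # Regress the student onto the teacher's labels on those states.
        loss = mse(student(torch.stack(obs_buf)), torch.stack(act_buf))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```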
arXiv, 2025-05-06T17:57:12Z. DOI: 10.48550/arXiv.2505.03729
Visual Imitation Enables Contextual Humanoid Control
Arthur Allshire, Hongsuk Choi, Junyi Zhang, David McAllister, Anthony Zhang, Chung Min Kim, Trevor Darrell, Pieter Abbeel, Jitendra Malik, Angjoo Kanazawa
Abstract:
How can we teach humanoids to climb staircases and sit on chairs using the surrounding environment context? Arguably, the simplest way is to just show them: casually capture a human motion video and feed it to humanoids. We introduce VIDEOMIMIC, a real-to-sim-to-real pipeline that mines everyday videos, jointly reconstructs the humans and the environment, and produces whole-body control policies for humanoid robots that perform the corresponding skills. We demonstrate the results of our pipeline on real humanoid robots, showing robust, repeatable contextual control such as staircase ascents and descents, sitting and standing from chairs and benches, as well as other dynamic whole-body skills, all from a single policy, conditioned on the environment and global root commands. VIDEOMIMIC offers a scalable path towards teaching humanoids to operate in diverse real-world environments.
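To make the "conditioned on the environment and global root commands" interface concrete, here is a minimal sketch of assembling the deployment-time observation from proprioception, an egocentric heightmap (e.g. built from LiDAR), and a root command; grid size, resolution, and function names are assumptions, not the paper's configuration:

```python
import numpy as np

def egocentric_heightmap(elevation, base_xy, yaw, size=11, res=0.1):
    """Sample a size x size grid of terrain heights around the robot base,
    rotated into the robot's heading frame. `elevation(x, y)` is any
    callable returning terrain height at a world-frame point."""
    half = (size - 1) / 2 * res
    xs = np.linspace(-half, half, size)
    gx, gy = np.meshgrid(xs, xs)
    c, s = np.cos(yaw), np.sin(yaw)
    wx = base_xy[0] + c * gx - s * gy
    wy = base_xy[1] + s * gx + c * gy
    return np.vectorize(elevation)(wx, wy)

def build_observation(proprio, elevation, base_xy, yaw, root_cmd):
    """Concatenate proprioception, the flattened heightmap, and the global
    root command into a single policy input vector."""
    hm = egocentric_heightmap(elevation, base_xy, yaw)
    return np.concatenate([proprio, hm.ravel(), root_cmd])
```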