Papers shared by user 刘馨云.
2 shared papers found.
1.
刘馨云 (2025-06-30 20:34):
#paper arXiv:2505.03729; Visual Imitation Enables Contextual Humanoid Control; UC Berkeley, 2025; link: https://videomimic.net
VIDEOMIMIC is a humanoid control method that learns context-aware skills from real-world videos. The paper proposes a real-to-sim-to-real training pipeline that, for the first time, trains and deploys a general control policy able to climb and descend stairs, sit down, stand up, and step over obstacles from everyday videos alone, without task labels, reward functions, or MoCap.
Core contributions: (1) First to extract 4D human-scene geometry from monocular everyday videos for robot control learning: human motion and environment geometry (mesh) are reconstructed jointly, and a human-height prior resolves scale ambiguity, yielding environment and motion data usable in physics simulation. (2) A multi-stage RL training pipeline from video to a general policy: pre-training on MoCap data; a heightmap is used as the environment input for terrain awareness; DAgger distillation removes the dependence on target joint angles, so a single policy handles sitting/standing, stair climbing, and the other tasks (a minimal sketch of this distillation step follows the abstract below). (3) The learned policy runs on the real robot from only the robot's own state and a LiDAR heightmap: deployed on a Unitree G1, it acts on a variety of indoor and outdoor stairs, grass, and chairs, and in unseen environments the appropriate behavior is triggered naturally by "terrain + heading" with no task labels. Compared to baseline methods, VIDEOMIMIC substantially improves reconstruction accuracy and generalization.
arXiv, 2025-05-06T17:57:12Z. DOI: 10.48550/arXiv.2505.03729
Abstract:
How can we teach humanoids to climb staircases and sit on chairs using the surrounding environment context? Arguably, the simplest way is to just show them - casually capture a human motion video and feed it to humanoids. We introduce VIDEOMIMIC, a real-to-sim-to-real pipeline that mines everyday videos, jointly reconstructs the humans and the environment, and produces whole-body control policies for humanoid robots that perform the corresponding skills. We demonstrate the results of our pipeline on real humanoid robots, showing robust, repeatable contextual control such as staircase ascents and descents, sitting and standing from chairs and benches, as well as other dynamic whole-body skills - all from a single policy, conditioned on the environment and global root commands. VIDEOMIMIC offers a scalable path towards teaching humanoids to operate in diverse real-world environments.
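Note: a minimal sketch of the DAgger distillation step described in the summary above (not the authors' code). A student policy that observes only proprioception and a LiDAR heightmap is trained to match a privileged teacher that additionally sees the reference target joint angles; all dimensions, network sizes, and the env.rollout interface are hypothetical placeholders.

import torch
import torch.nn as nn

# Hypothetical observation/action sizes for a humanoid such as the Unitree G1.
PROPRIO_DIM, HEIGHTMAP_DIM, TARGET_DIM, ACT_DIM = 48, 121, 29, 29

class MLPPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(),
            nn.Linear(256, 256), nn.ELU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# The teacher is conditioned on privileged reference targets; the student is not,
# so the deployed policy needs only proprioception + heightmap.
teacher = MLPPolicy(PROPRIO_DIM + HEIGHTMAP_DIM + TARGET_DIM, ACT_DIM)
student = MLPPolicy(PROPRIO_DIM + HEIGHTMAP_DIM, ACT_DIM)
optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)

def dagger_iteration(env, batch_size: int = 4096) -> float:
    """One DAgger round: roll out the *student*, relabel with the *teacher*.

    env.rollout(policy, n) -> (proprio, heightmap, targets) tensors  [hypothetical]
    """
    proprio, heightmap, targets = env.rollout(student, batch_size)
    student_obs = torch.cat([proprio, heightmap], dim=-1)
    teacher_obs = torch.cat([proprio, heightmap, targets], dim=-1)
    with torch.no_grad():
        expert_actions = teacher(teacher_obs)
    loss = nn.functional.mse_loss(student(student_obs), expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()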
2.
刘馨云 (2025-05-31 21:32):
#paper https://arxiv.org/pdf/2505.20290
Humans learn new tasks by watching others. Inspired by this, we propose EgoZero, a framework that learns closed-loop robot policies from egocentric (first-person) videos captured by humans wearing smart glasses. Smart glasses capture a rich, multimodal first-person view of human interaction: RGB video records the surrounding scene, the IMU provides head-motion information, and the microphone records speech and ambient sound. Our method learns how to act purely by observing these first-person videos, without any robot demonstrations. Given a video of a human completing a task, EgoZero predicts a sequence of intermediate goals and language subgoals and uses them to execute the task on a real robot in closed loop. EgoZero compresses human observations into morphology-agnostic state representations that can be used for decision making and closed-loop control (a minimal sketch of such a closed-loop step follows the abstract below). The learned policies generalize well across robot embodiments, environments, and tasks. We validate on a real Franka Panda arm and show that EgoZero completes a range of challenging manipulation tasks with a 70% zero-shot success rate, using only 20 minutes of data collection per task.
arXiv, 2025-05-26T17:59:17Z. DOI: 10.48550/arXiv.2505.20290
Abstract:
Despite recent progress in general purpose robotics, robot policies still lag far behind basic human capabilities in the real world. Humans interact constantly with the physical world, yet this rich data resource remains largely untapped in robot learning. We propose EgoZero, a minimal system that learns robust manipulation policies from human demonstrations captured with Project Aria smart glasses, and zero robot data. EgoZero enables: (1) extraction of complete, robot-executable actions from in-the-wild, egocentric, human demonstrations, (2) compression of human visual observations into morphology-agnostic state representations, and (3) closed-loop policy learning that generalizes morphologically, spatially, and semantically. We deploy EgoZero policies on a gripper Franka Panda robot and demonstrate zero-shot transfer with 70% success rate over 7 manipulation tasks and only 20 minutes of data collection per task. Our results suggest that in-the-wild human data can serve as a scalable foundation for real-world robot learning - paving the way toward a future of abundant, diverse, and naturalistic training data for robots. Code and videos are available at https://egozero-robot.github.io.
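Note: a minimal sketch of the closed-loop control idea described in the summary above (not the EgoZero implementation). Egocentric observations are compressed into a morphology-agnostic, point-based state, and a learned policy maps that state to the robot's next end-effector target; the perception, policy, and robot interfaces used here are hypothetical placeholders.

import numpy as np

def build_state(object_points: np.ndarray, ee_point: np.ndarray) -> np.ndarray:
    """Flatten tracked 3D object points and the end-effector point (both in a
    shared world frame) into a morphology-agnostic state vector."""
    return np.concatenate([object_points.ravel(), ee_point.ravel()])

def closed_loop_rollout(policy, perception, robot, horizon: int = 100) -> None:
    """Re-observe, re-plan, and act at every step (closed loop).

    Hypothetical interfaces:
      perception.object_points()    -> (N, 3) tracked 3D object points
      robot.end_effector_position() -> (3,) current gripper position
      policy.predict(state)         -> (next_ee_xyz, gripper_open)
    """
    for _ in range(horizon):
        object_points = perception.object_points()
        ee_point = robot.end_effector_position()
        state = build_state(object_points, ee_point)
        next_ee_xyz, gripper_open = policy.predict(state)
        robot.move_to(next_ee_xyz)       # Cartesian waypoint command
        robot.set_gripper(gripper_open)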