Paper-Hub

2023, arXiv. DOI: 10.48550/arXiv.2307.05973. arXiv ID: 2307.05973

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei

Abstract:

Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to carry out the physical interactions with the environment, which remains a major bottleneck. In this work, we aim to synthesize robot trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects. We achieve this by first observing that LLMs excel at inferring affordances and constraints given a free-form language instruction. More importantly, by leveraging their code-writing capabilities, they can interact with a visual-language model (VLM) to compose 3D value maps to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations. We further demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions. We present a large-scale study of the proposed method in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language.
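
To make the idea of composing value maps and planning over them more concrete, here is a minimal toy sketch. It is not the authors' implementation. In the paper the maps are produced by LLM-written code that queries a VLM, whereas this sketch hard-codes a target voxel and an obstacle voxel. The names `compose_cost_map` and `greedy_plan`, the grid size, and the weights are all hypothetical, and the map is treated as a cost (lower is better) for simplicity.

```python
import math
from itertools import product


def compose_cost_map(size, target, obstacle, avoid_weight=4.0, avoid_radius=2.0):
    """Build a voxel cost map: attraction to the target plus a penalty near the obstacle.

    Assumption: a simple sum of two hand-written terms stands in for the
    LLM/VLM-composed maps described in the abstract.
    """
    cost = {}
    for voxel in product(range(size), repeat=3):
        attract = math.dist(voxel, target)  # "affordance" term: closer is cheaper
        repel = avoid_weight * max(0.0, avoid_radius - math.dist(voxel, obstacle))
        cost[voxel] = attract + repel  # "constraint" term added on top
    return cost


def greedy_plan(cost, start, max_steps=100):
    """Follow the steepest cost decrease among the 26 neighbouring voxels."""
    path = [start]
    current = start
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    for _ in range(max_steps):
        neighbours = [tuple(c + d for c, d in zip(current, o)) for o in offsets]
        candidates = [n for n in neighbours if n in cost]  # stay inside the grid
        best = min(candidates, key=cost.__getitem__)
        if cost[best] >= cost[current]:
            break  # local minimum reached (ideally the target)
        current = best
        path.append(current)
    return path


# Hypothetical scene: move from one corner to the other, avoiding the centre.
start, target, obstacle = (0, 0, 0), (7, 7, 7), (4, 4, 4)
cost_map = compose_cost_map(size=8, target=target, obstacle=obstacle)
path = greedy_plan(cost_map, start)
print(path)
```

A greedy descent like this can get stuck in local minima. Re-running it on each new observation would be one simple way to imitate closed-loop replanning, but that is a simplification of the model-based planning the abstract describes.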

2023-07-31 16:41:00

符毓 Yu:

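#paper doi: 10.48550/arXiv.2307.05973 2023, Composable 3D Value Maps for Robotic Manipulation with Language Models. The latest paper from Fei-Fei Li's team, combining language models with robotic manipulation. Pairing the robot with a large language model makes human-robot interaction more efficient and enables vision-based, real-time trajectory planning. Judging by eye, the arm moves at roughly one-eighth the working speed of a typical robot arm, and reliability would need to improve further for real-world use (the error rate is above 25%).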