王昊
(2022-08-10 11:27):
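#paper 10.48550/arXiv.2109.07872 TAN S, GE M, GUO D, et al. Knowledge-based Embodied Question Answering[J/OL]. 2021[2022-08-09]. https://arxiv.org/abs/2109.07872v1. A paper from Sun Fuchun's group at Tsinghua University. It presents an embodied agent in the AI2-THOR simulator that answers questions about its surroundings, where the questions can only be answered with the help of an external knowledge base.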
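Problems with prior work: existing embodied question answering (EQA) cannot answer questions that require an external knowledge graph (although this has already been done in the knowledge-based VQA (KB-VQA) field), and it lacks reasoning ability (although what counts as "reasoning" is hard to define). Multi-hop question answering in particular remains difficult. In addition, current EQA systems cannot reuse memory from earlier exploration, so the agent wastes time re-exploring the scene.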
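Contributions:
1. Proposes the knowledge-based EQA (K-EQA) task, built on the AI2-THOR virtual environment;
2. Builds a dataset (the question types are all fairly simple, not very difficult);
3. Proposes a method that combines neural program synthesis, 3D scene graphs, 3D reconstruction, translation of questions into SQL statements, and Monte Carlo tree search to solve the problems above.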
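To make the "question to SQL" idea in item 3 concrete, here is a minimal sketch under loudly stated assumptions: the table names `objects` and `knowledge`, the `UsedFor` relation, the object categories, and the helper `objects_used_for` are all hypothetical and not taken from the paper. The paper synthesizes programs from the question with a neural model; here the SQL template is hard-coded, so this only illustrates what a joint query over the agent's scene memory and an external knowledge base could look like.

```python
import sqlite3

# Hypothetical schema (not the paper's):
#   objects   - objects the agent has observed (a flattened scene graph)
#   knowledge - external commonsense triples (subject, relation, object)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, category TEXT, room TEXT);
CREATE TABLE knowledge (subject TEXT, relation TEXT, object TEXT);
""")
conn.executemany("INSERT INTO objects (category, room) VALUES (?, ?)", [
    ("Knife", "Kitchen"),
    ("Apple", "Kitchen"),
    ("ButterKnife", "Kitchen"),
    ("Sofa", "LivingRoom"),
])
conn.executemany("INSERT INTO knowledge VALUES (?, ?, ?)", [
    ("Knife", "UsedFor", "cut food"),
    ("ButterKnife", "UsedFor", "cut food"),
    ("Sofa", "UsedFor", "sit"),
])

def objects_used_for(purpose, room):
    """Answer 'what objects in <room> are used to <purpose>?' with one SQL join.

    The hard-coded template stands in for a learned question-to-SQL program.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT o.category
        FROM objects AS o
        JOIN knowledge AS k ON k.subject = o.category
        WHERE k.relation = 'UsedFor' AND k.object = ? AND o.room = ?
        ORDER BY o.category
        """,
        (purpose, room),
    ).fetchall()
    return [r[0] for r in rows]

print(objects_used_for("cut food", "Kitchen"))  # ['ButterKnife', 'Knife']
```

The join is what lets the answer depend on knowledge ("a knife is used for cutting food") that never appears in the question itself, which is the gap the note says plain EQA cannot bridge.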
arXiv, 2021.
DOI: 10.48550/arXiv.2109.07872
Knowledge-based Embodied Question Answering
Abstract:
In this paper, we propose a novel Knowledge-based Embodied Question Answering (K-EQA) task, in which the agent intelligently explores the environment to answer various questions with the knowledge. Different from explicitly specifying the target object in the question as existing EQA work, the agent can resort to external knowledge to understand more complicated questions such as "Please tell me what are objects used to cut food in the room?", in which the agent must know the knowledge such as "knife is used for cutting food". To address this K-EQA problem, a novel framework based on neural program synthesis reasoning is proposed, where the joint reasoning of the external knowledge and 3D scene graph is performed to realize navigation and question answering. Especially, the 3D scene graph can provide the memory to store the visual information of visited scenes, which significantly improves the efficiency for the multi-turn question answering. Experimental results have demonstrated that the proposed framework is capable of answering more complicated and realistic questions in the embodied environment. The proposed method is also applicable to multi-agent scenarios.
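The abstract's point that the 3D scene graph serves as memory for multi-turn question answering can be illustrated with a toy sketch. Everything here is an assumption for illustration: `SceneGraphMemory`, `explore`, and `answer` are hypothetical names, "exploration" just reads a hand-written dictionary instead of navigating AI2-THOR, and the whole scene is explored in one step rather than incrementally. It shows only that a follow-up question can be answered from stored memory without re-exploring.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraphMemory:
    """Toy scene-graph memory: nodes are objects, edges are spatial relations."""
    nodes: dict = field(default_factory=dict)  # obj_id -> (category, (x, y, z))
    edges: set = field(default_factory=set)    # (obj_id, relation, other_obj_id)
    explored: bool = False

    def add_observation(self, obj_id, category, position, relations=()):
        self.nodes[obj_id] = (category, position)
        for rel, other in relations:
            self.edges.add((obj_id, rel, other))

def explore(memory, environment):
    """Stand-in for navigation: 'sees' every object and writes it into memory."""
    for obj_id, (category, pos, rels) in environment.items():
        memory.add_observation(obj_id, category, pos, rels)
    memory.explored = True
    return len(environment)  # objects observed, used as a crude cost proxy

def answer(memory, environment, categories):
    """Return matching obj_ids and the exploration cost paid for this question."""
    cost = 0 if memory.explored else explore(memory, environment)
    hits = sorted(o for o, (c, _) in memory.nodes.items() if c in categories)
    return hits, cost

# Hypothetical scene: obj_id -> (category, position, [(relation, other_id)])
env = {
    "knife_1": ("Knife", (1.0, 0.9, 2.0), [("on", "counter_1")]),
    "apple_1": ("Apple", (1.2, 0.9, 2.1), [("on", "counter_1")]),
    "counter_1": ("CounterTop", (1.1, 0.0, 2.0), []),
}
mem = SceneGraphMemory()
print(answer(mem, env, {"Knife"}))  # (['knife_1'], 3): first turn explores
print(answer(mem, env, {"Apple"}))  # (['apple_1'], 0): answered from memory
```

The second call pays no exploration cost because the objects are already stored as graph nodes, which is the efficiency gain the abstract attributes to keeping the scene graph between turns.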