Literature shared by user 林海onrush.
43 papers shared in total; this page shows entries 41–43.
41.
林海onrush (2022-10-29 13:25):
#paper, CAUSAL DISCOVERY WITH REINFORCEMENT LEARNING. Paper: https://arxiv.org/pdf/1906.04477.pdf; official video presentation: https://iclr.cc/virtual_2020/poster_S1g2skStPB.html. Causal research, a potential next hot topic, has drawn wide attention in the machine learning / deep learning community. A classic problem in the field is causal discovery: recovering the underlying causal graph structure from passively observed data. This paper, from Huawei's Noah's Ark Lab, was accepted to ICLR 2020 with full-score reviews. In it, the lab's causal research team applies reinforcement learning to score-based causal discovery: an encoder-decoder network built on self-attention explores the relationships in the data, the acyclicity condition on causal structures is incorporated, and the network parameters are trained with a policy-gradient RL algorithm, producing a causal graph as the final output. On data models commonly used in the academic literature, the method outperforms the alternatives on moderately sized graphs, including traditional causal discovery algorithms and recent gradient-based ones. The method is also highly flexible and can be combined with an arbitrary score function.
Abstract:
Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based causal discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model assumptions, they are usually less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use Reinforcement Learning (RL) to search for the DAG with the best scoring. Our encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy and our final output would be the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has an improved search ability but also allows a flexible score function under the acyclicity constraint.
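The reward described in the abstract combines a predefined score with penalty terms that push the generated adjacency matrices toward acyclicity. A minimal sketch of such an acyclicity measure, using the polynomial variant from the NOTEARS line of work (closely related to the tr(exp(A∘A)) − d term; the function name and example matrices are illustrative, not from the paper):

```python
import numpy as np

def acyclicity(adj: np.ndarray) -> float:
    """Polynomial acyclicity measure h(A) = tr((I + A∘A/d)^d) - d.

    Nonnegative, and zero exactly when the weighted graph encoded by
    `adj` is a DAG. The paper's reward uses the closely related
    tr(exp(A∘A)) - d; this polynomial form avoids a matrix exponential.
    """
    d = adj.shape[0]
    m = np.eye(d) + (adj * adj) / d          # I + A∘A/d
    return float(np.trace(np.linalg.matrix_power(m, d)) - d)

dag = np.array([[0.0, 1.0],
                [0.0, 0.0]])                 # edge 0 -> 1: acyclic
cyc = np.array([[0.0, 1.0],
                [1.0, 0.0]])                 # 0 <-> 1: a 2-cycle
```

Here `acyclicity(dag)` evaluates to 0 while `acyclicity(cyc)` is strictly positive, so subtracting a weighted penalty of this form from the score-based reward steers the RL search toward DAGs.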
42.
林海onrush (2022-09-30 22:25):
#paper arXiv, 2209.00796 (2022), Diffusion Models: A Comprehensive Survey of Methods and Applications. Diffusion models perform impressively across many domains, and because applications in different fields have produced different variants, this paper systematically surveys diffusion-model research in the following areas: computer vision, NLP, waveform signal processing, multi-modal modeling, molecular graph modeling, time series modeling, and adversarial purification. The main contributions are: (1) a new taxonomy: a new, systematic classification of diffusion models and their applications, grouping the models into three types (sampling-speed enhancement, maximum-likelihood enhancement, and data-generalization enhancement) and the applications into the seven areas above; (2) a comprehensive overview of modern diffusion models and their applications that presents each model's main improvements, draws the necessary comparisons with the original model, and summarizes the corresponding papers. The basic idea of a diffusion model is a forward diffusion process that systematically perturbs the data distribution, paired with a learned reverse diffusion process that restores it, yielding a generative model that is highly flexible and tractable to compute.
Abstract:
Diffusion models are a class of deep generative models that have shown impressive results on various tasks with a solid theoretical foundation. Despite demonstrated success that often surpasses state-of-the-art approaches, diffusion models often entail costly sampling procedures and sub-optimal likelihood estimation. Significant efforts have been made to improve the performance of diffusion models in various aspects. In this article, we present a comprehensive review of existing variants of diffusion models. Specifically, we provide the taxonomy of diffusion models and categorize them into three types: sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. We also introduce the other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flow, autoregressive models, and energy-based models) and discuss the connections between diffusion models and these generative models. Then we review the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of generative models. Github: this https URL.
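The forward process mentioned in the summary can be sketched in a few lines: with a Gaussian noise schedule, x_t can be sampled directly from x_0 in closed form, without iterating through the intermediate steps. A minimal sketch (the linear β schedule, shapes, and function name are illustrative assumptions, not from the survey):

```python
import numpy as np

def forward_diffuse(x0: np.ndarray, t: int, betas: np.ndarray, seed: int = 0):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    rng = np.random.default_rng(seed)
    alpha_bar = np.cumprod(1.0 - betas)[t]   # abar_t = prod_{s<=t} (1 - beta_s)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, alpha_bar

betas = np.linspace(1e-4, 0.02, 1000)        # DDPM-style linear schedule
x0 = np.ones(4)
x_early, abar_early = forward_diffuse(x0, 0, betas)    # almost no noise yet
x_late, abar_late = forward_diffuse(x0, 999, betas)    # signal essentially destroyed
```

A trained model would then run the learned reverse process, denoising step by step from pure noise; speeding up and sharpening that reverse process is what most of the surveyed variants improve on.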
43.
林海onrush (2022-08-07 22:47):
#paper arXiv:2207.03530v1 [cs.RO] 7 Jul 2022, VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning, https://deepai.org/publication/vmas-a-vectorized-multi-agent-simulator-for-collective-robot-learning. Researchers at the University of Cambridge propose the multi-agent reinforcement-learning framework VMAS. Although many multi-robot coordination problems can be solved optimally by exact algorithms, the solutions often do not scale in the number of robots. Multi-agent reinforcement learning (MARL) is drawing growing attention in the robotics community as a promising way to tackle such problems, yet tools that can quickly and efficiently find solutions to large-scale collective learning tasks are still lacking. This work introduces VMAS, an open-source framework designed for efficient MARL benchmarking. It consists of a vectorized 2D physics engine written in PyTorch and a suite of 12 challenging multi-robot scenarios; additional scenarios can be implemented through a simple modular interface. The paper shows how vectorization enables parallel simulation on accelerated hardware without added complexity: compared against OpenAI MPE, the current state-of-the-art framework, VMAS executes 30,000 parallel simulations in under 10 seconds, more than 100x faster. Using VMAS's RLlib interface, the authors benchmark the multi-robot scenarios with various Proximal Policy Optimization (PPO)-based MARL algorithms and show that the scenarios challenge state-of-the-art MARL algorithms in orthogonal ways. The VMAS framework is available, with reproducible experiments, at: https://github.com/proroklab/VectorizedMultiAgentSimulator
arXiv, 2022. arXiv:2207.03530
Abstract:
While many multi-robot coordination problems can be solved optimally by exact algorithms, solutions are often not scalable in the number of robots. Multi-Agent Reinforcement Learning (MARL) is gaining increasing attention in the robotics community as a promising solution to tackle such problems. Nevertheless, we still lack the tools that allow us to quickly and efficiently find solutions to large-scale collective learning tasks. In this work, we introduce the Vectorized Multi-Agent Simulator (VMAS). VMAS is an open-source framework designed for efficient MARL benchmarking. It is comprised of a vectorized 2D physics engine written in PyTorch and a set of twelve challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface. We demonstrate how vectorization enables parallel simulation on accelerated hardware without added complexity. When comparing VMAS to OpenAI MPE, we show how MPE's execution time increases linearly in the number of simulations while VMAS is able to execute 30,000 parallel simulations in under 10s, proving more than 100x faster. Using VMAS's RLlib interface, we benchmark our multi-robot scenarios using various Proximal Policy Optimization (PPO)-based MARL algorithms. VMAS's scenarios prove challenging in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available at this https URL. A video of VMAS scenarios and experiments is available at this https URL.
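The speedup quoted above comes from vectorization: instead of stepping thousands of environment copies in a Python loop, all parallel worlds are stored in one batched array and advanced with a single tensor operation. A toy illustration of that idea, assuming a simple point-mass dynamics model (this is not the VMAS API; the class, shapes, and reward are invented for the sketch):

```python
import numpy as np

class ToyVecEnv:
    """Toy vectorized point-mass environment.

    All parallel worlds advance in one batched operation, which is the
    core of the vectorization idea; VMAS does the same with PyTorch
    tensors so the batch can live on an accelerator.
    """
    def __init__(self, num_envs: int, num_agents: int, dt: float = 0.1):
        self.dt = dt
        self.pos = np.zeros((num_envs, num_agents, 2))  # 2D positions
        self.vel = np.zeros((num_envs, num_agents, 2))  # 2D velocities

    def step(self, actions: np.ndarray):
        # actions: (num_envs, num_agents, 2) accelerations.
        # One vectorized Euler update covers every world at once.
        self.vel += actions * self.dt
        self.pos += self.vel * self.dt
        # Toy reward: negative distance of each agent from the origin.
        reward = -np.linalg.norm(self.pos, axis=-1)     # (num_envs, num_agents)
        return self.pos.copy(), reward

env = ToyVecEnv(num_envs=30000, num_agents=4)
obs, rew = env.step(np.ones((30000, 4, 2)))
```

A single `step` here simulates all 30,000 worlds at once; the per-step cost is one batched array update rather than 30,000 sequential ones, which is why VMAS's execution time stays flat where MPE's grows linearly.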