林海onrush (2022-10-29 13:25):
#paper, CAUSAL DISCOVERY WITH REINFORCEMENT LEARNING. Paper: https://arxiv.org/pdf/1906.04477.pdf. Official video presentation: https://iclr.cc/virtual_2020/poster_S1g2skStPB.html. Causal research, widely seen as a potential next frontier, has attracted broad attention in the machine learning/deep learning community. A classic problem in this area is "causal discovery": uncovering the underlying causal graph structure from passively observed data. This paper, from Huawei Noah's Ark Lab, was accepted at ICLR 2020 with full-score reviews. In it, the lab's causal research team applies reinforcement learning to score-based causal discovery: an encoder-decoder neural network built on self-attention explores the relationships in the data, acyclicity conditions on the causal structure are incorporated into the reward, and the network parameters are trained with a policy-gradient reinforcement learning algorithm, ultimately producing a causal graph. On data models commonly used in the academic literature, the method outperforms alternatives on medium-sized graphs, including both traditional causal discovery algorithms and recent gradient-based ones. The method is also very flexible and can be combined with an arbitrary score function.
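To make the training objective concrete, here is a minimal sketch of the penalized reward the paper optimizes, assuming NumPy/SciPy; `score_fn` is a placeholder for whatever predefined score (e.g., BIC) is plugged in. Acyclicity is enforced through the NOTEARS-style smooth measure h(A) = tr(e^{A∘A}) − d together with an indicator penalty for cyclic graphs:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(adj: np.ndarray) -> float:
    """Smooth acyclicity measure h(A) = tr(e^{A∘A}) - d, where A∘A is the
    elementwise square; h(A) = 0 exactly when A encodes a DAG."""
    d = adj.shape[0]
    return float(np.trace(expm(adj * adj)) - d)

def reward(adj: np.ndarray, score_fn, lambda1: float = 1.0, lambda2: float = 1.0) -> float:
    """Episode reward: the negative of the predefined score plus two
    acyclicity penalties, an indicator for cyclic graphs and h(A) itself."""
    h = acyclicity(adj)
    return -(score_fn(adj) + lambda1 * float(h > 1e-8) + lambda2 * h)
```

Because the score function only enters through `score_fn`, this is also where the flexibility claimed above comes from: any score computable from an adjacency matrix and the data can be dropped in.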
Causal Discovery with Reinforcement Learning
Abstract:
Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based causal discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model assumptions, they are usually less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use Reinforcement Learning (RL) to search for the DAG with the best scoring. Our encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy and our final output would be the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has an improved search ability but also allows a flexible score function under the acyclicity constraint.
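The abstract's key point is that RL serves as a search strategy: the output is the best-rewarded graph seen during training, not a learned policy. The toy loop below illustrates that idea. It is a deliberately simplified sketch: the paper samples graphs from an attention-based encoder-decoder conditioned on the observed data, whereas here a hypothetical Bernoulli-over-edges policy with learnable logits and a placeholder score stand in, trained with a REINFORCE update and a moving-average baseline:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d = 5                      # toy number of variables (hypothetical)
mask = 1.0 - np.eye(d)     # disallow self-loops

def penalized_reward(adj, lambda1=1.0, lambda2=1.0):
    # Placeholder score standing in for a data-dependent score such as BIC;
    # the reward shape follows -[score(G) + lambda1*1{G not a DAG} + lambda2*h(A)].
    score = float(adj.sum())
    h = float(np.trace(expm(adj * adj)) - d)   # h(A) = tr(e^{A∘A}) - d
    return -(score + lambda1 * float(h > 1e-8) + lambda2 * h)

logits = np.zeros((d, d))  # Bernoulli edge logits: a stand-in for the paper's
                           # attention-based encoder-decoder policy
baseline, lr = 0.0, 0.1
best_reward, best_adj = -np.inf, None

for step in range(500):
    p = 1.0 / (1.0 + np.exp(-logits))
    adj = (rng.random((d, d)) < p).astype(float) * mask
    r = penalized_reward(adj)
    if r > best_reward:                 # RL as search: the final output is the
        best_reward, best_adj = r, adj  # best graph sampled during training
    baseline = 0.9 * baseline + 0.1 * r
    logits += lr * (r - baseline) * (adj - p) * mask  # REINFORCE update

print("best reward:", best_reward)
```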