响马读paper (Xiangma Reads Papers)

An academic exchange community that requires members to read at least one paper per month and check in.

2019, arXiv. DOI: 10.48550/arXiv.1806.09055 arXiv ID: 1806.09055
DARTS: Differentiable Architecture Search
Hanxiao Liu, Karen Simonyan, Yiming Yang
Abstract:
This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.
2022-04-30 20:39:00
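#paper https://doi.org/10.48550/arXiv.1806.09055 DARTS: Differentiable Architecture Search, ICLR (2019). Neural Architecture Search (NAS) is notorious for its compute cost: a search can easily consume thousands of GPU hours, so in practice only top research institutions can afford this kind of work. Instead of searching a discrete, non-differentiable space with methods such as evolutionary algorithms or reinforcement learning, this paper relaxes the representation of the network architecture so that NAS becomes a differentiable problem, which allows gradient descent to search for architectures in a continuous space. The authors formulate the task as a bilevel optimization problem and propose an optimization procedure resembling the EM algorithm, which alternately updates the architecture parameters \alpha and the model weights w to find a good architecture \alpha. Because the optimization involves computing second-order derivatives, the authors further approximate that computation, turning it into an estimate expressed in terms of first-order gradients, which lowers the cost of the method even more. The results are also impressive: compared with earlier methods that often need thousands of GPU days, DARTS needs only a few GPU days and reaches comparable performance.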
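To make the relaxation and the alternating updates concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. The three scalar candidate operations, the target y = 2x², the data points, the step count and the learning rate are all assumptions chosen for illustration. The sketch uses only the simplest first-order variant, in which \alpha is updated on validation data with the current w held fixed, and the gradients are written out by hand instead of using an autodiff library.

```python
import math

# Hypothetical candidate operations on a scalar input (illustrative only,
# not the operation set used in the paper).
OPS = {
    "zero": lambda x: 0.0,
    "identity": lambda x: x,
    "square": lambda x: x * x,
}
NAMES = list(OPS)


def softmax(alpha):
    """Numerically stable softmax over the architecture parameters."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alpha):
    """Continuous relaxation: softmax-weighted sum of all candidate ops."""
    probs = softmax(alpha)
    return sum(p * OPS[name](x) for p, name in zip(probs, NAMES))


def grads(xs, alpha, w):
    """Gradients of the mean squared error of w * mixed_op(x) against the
    assumed target 2 * x**2, with respect to w and to each alpha_k."""
    probs = softmax(alpha)
    g_w = 0.0
    g_alpha = [0.0] * len(alpha)
    for x in xs:
        mix = mixed_op(x, alpha)
        residual = w * mix - 2.0 * x * x
        g_w += 2.0 * residual * mix / len(xs)
        for k, name in enumerate(NAMES):
            # d mixed_op / d alpha_k = p_k * (op_k(x) - mixed_op(x))
            d_mix = probs[k] * (OPS[name](x) - mix)
            g_alpha[k] += 2.0 * residual * w * d_mix / len(xs)
    return g_w, g_alpha


def search(steps=300, lr=0.05):
    """Alternate between an alpha step (validation data) and a w step
    (training data), as in a first-order bilevel scheme."""
    train_xs = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]
    val_xs = [-1.25, -0.75, 0.75, 1.25]
    alpha, w = [0.0, 0.0, 0.0], 1.0
    for _ in range(steps):
        # Architecture step: treat the current w as fixed (first-order).
        _, g_alpha = grads(val_xs, alpha, w)
        alpha = [a - lr * g for a, g in zip(alpha, g_alpha)]
        # Weight step: ordinary gradient descent on the training loss.
        g_w, _ = grads(train_xs, alpha, w)
        w -= lr * g_w
    return alpha, w


def derive(alpha):
    """Discretize: keep the candidate op with the largest alpha."""
    return NAMES[max(range(len(alpha)), key=lambda i: alpha[i])]


if __name__ == "__main__":
    alpha, w = search()
    print("softmax(alpha):", [round(p, 3) for p in softmax(alpha)])
    print("selected op:", derive(alpha))
```

The final `derive` step mirrors the idea of turning the relaxed architecture back into a discrete one by keeping the strongest candidate. The second-order variant discussed in the note would additionally account for how w changes when \alpha changes, which this sketch deliberately leaves out.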