Found 2 paper shares in total.
1.
张浩彬
(2022-11-10 00:03):
#paper Momentum Contrast for Unsupervised Visual Representation Learning
doi:10.1109/cvpr42600.2020.00975
The famous MoCo. I had only skimmed it before; today I finally gave it a proper close read. Following NLP, CV has now started its own new round of competition over self-supervised methods.
Self-supervised learning is essentially unsupervised learning; the name "self-supervised" was mainly coined to set it apart from earlier work. MoCo is the paper that largely leveled the gap between supervised and unsupervised learning: for the first time, an unsupervised method produced better results than supervised pre-training. Manual labeling is expensive, so if a model can be pre-trained without labels, the performance bottleneck imposed by limited annotated data is greatly relieved.
Back to the technique itself: contrastive learning is currently one of the mainstream self-supervised approaches (the other being generative methods). In contrastive learning the key design choices are (1) the pretext task and (2) the loss function. That said, the paper's standout contribution is the momentum update.
1. For the pretext task, MoCo picks the simple instance discrimination task: for an original sample x_i, two different data augmentations produce the anchor and the positive sample, while all other samples act as negatives relative to that anchor. The authors call the anchor the query q and the positive/negative samples the keys k.
2. The loss is InfoNCE, which at heart is still softmax-style cross-entropy. But since every negative sample effectively corresponds to its own class, a plain softmax over that many classes is impractical, so InfoNCE is used instead, with a "temperature" hyperparameter.
3. Next come the paper's two contributions, or, framed in contrastive-learning terms, the two constraints the authors identify: (1) dictionary size and (2) dictionary consistency. (The dictionary can roughly be understood as the negative-sample set; in contrastive learning, the bigger the better. MoCo uses a single positive per query, though other work has shown that multiple positives can help.) (1) Dictionary size: in end-to-end methods like SimCLR, each batch is the dictionary, which keeps the dictionary consistent but limits its size; back-propagating through such a large dictionary requires a huge amount of GPU memory, and optimization with very large batch sizes is also harder. (2) Dictionary consistency: the other common approach is a memory bank. We still keep a large dictionary and at each step sample k negatives from it for the gradient update. The problem is that only those sampled entries get updated, so the features stored in the dictionary become inconsistent (each step applies a fairly large gradient to the selected k samples, so over one epoch of a big dictionary, the entries updated first and the entries updated last end up looking very different).
4. To address these issues, the authors first turn the dictionary into a queue: after each step the newest features are enqueued and the oldest features are dequeued.
5. The second piece is the momentum update (see the sketch after this list). The query encoder is still updated by gradient descent, but the key encoder is no longer updated by gradients; instead it is updated with momentum: θ_k ← m·θ_k + (1−m)·θ_q (at initialization, the key encoder is a direct copy of the query encoder). With m set very large, e.g. 0.999, every update changes the key encoder only slightly, which keeps the dictionary consistent.
6. Next is the downstream evaluation. The authors freeze the backbone and use only the linear classification protocol (one fully connected layer plus softmax) as the final classifier, comparing against other ImageNet-pre-trained models; except for a few tasks, MoCo is basically SOTA across the board.
7. Another interesting detail: for MoCo's classifier, a grid search found the optimal learning rate to be 30 (see the linear-probe sketch below). The authors take this as evidence that self-supervised features really are distributed quite differently from supervised ones. In later comparisons, since running a grid search everywhere is inconvenient, the authors apply feature normalization instead.
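Points 1-5 together describe one training loop, so here is a minimal PyTorch-style sketch of that mechanism (queue as dictionary, momentum-updated key encoder, InfoNCE via cross-entropy), in the spirit of the pseudocode in the paper. The class name MoCoSketch, the toy flatten-plus-linear encoder, and the random data are placeholders of mine; dim=128, m=0.999, and T=0.07 follow the paper's defaults.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoCoSketch(nn.Module):
    """Queue dictionary + momentum key encoder + InfoNCE, as in points 1-5."""
    def __init__(self, base_encoder, dim=128, K=4096, m=0.999, T=0.07):
        super().__init__()
        self.K, self.m, self.T = K, m, T
        self.encoder_q = base_encoder()                      # updated by backprop
        self.encoder_k = base_encoder()                      # updated by momentum only
        self.encoder_k.load_state_dict(self.encoder_q.state_dict())  # init: copy of q
        for p in self.encoder_k.parameters():
            p.requires_grad = False
        # the dictionary: a queue of K normalized key features (K % batch_size == 0 assumed)
        self.register_buffer("queue", F.normalize(torch.randn(dim, K), dim=0))
        self.register_buffer("queue_ptr", torch.zeros(1, dtype=torch.long))

    @torch.no_grad()
    def _momentum_update(self):
        # theta_k <- m * theta_k + (1 - m) * theta_q
        for pq, pk in zip(self.encoder_q.parameters(), self.encoder_k.parameters()):
            pk.data.mul_(self.m).add_(pq.data, alpha=1.0 - self.m)

    @torch.no_grad()
    def _dequeue_and_enqueue(self, keys):
        bs, ptr = keys.shape[0], int(self.queue_ptr)
        self.queue[:, ptr:ptr + bs] = keys.T                 # newest keys push out the oldest
        self.queue_ptr[0] = (ptr + bs) % self.K

    def forward(self, im_q, im_k):
        q = F.normalize(self.encoder_q(im_q), dim=1)         # queries  N x dim
        with torch.no_grad():
            self._momentum_update()
            k = F.normalize(self.encoder_k(im_k), dim=1)     # keys     N x dim
        # InfoNCE: one positive (q . k+) and K negatives from the queue, temperature T
        l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)           # N x 1
        l_neg = torch.einsum("nc,ck->nk", q, self.queue.clone())       # N x K
        logits = torch.cat([l_pos, l_neg], dim=1) / self.T
        labels = torch.zeros(logits.shape[0], dtype=torch.long, device=logits.device)
        self._dequeue_and_enqueue(k)
        return F.cross_entropy(logits, labels)

# toy usage: a flatten+linear "encoder" stands in for the ResNet used in the paper
toy_encoder = lambda: nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
moco = MoCoSketch(toy_encoder)
im_q, im_k = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)    # two augmented views
moco(im_q, im_k).backward()
```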
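And a rough, self-contained sketch of the linear classification protocol from point 7: freeze the pre-trained backbone and train a single FC layer on top with the lr=30 found by grid search. The backbone and data below are placeholders; in the paper the backbone is a MoCo-pre-trained ResNet-50 and the data is ImageNet.

```python
import torch
import torch.nn as nn

# `backbone` stands in for the frozen, MoCo-pre-trained encoder; only `linear` is trained
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

linear = nn.Linear(128, 1000)             # the "linear classification protocol" head
optimizer = torch.optim.SGD(linear.parameters(), lr=30.0, momentum=0.9, weight_decay=0.0)
criterion = nn.CrossEntropyLoss()         # softmax is folded into the loss

images = torch.randn(8, 3, 32, 32)        # placeholder batch
targets = torch.randint(0, 1000, (8,))
with torch.no_grad():
    feats = backbone(images)              # frozen features
loss = criterion(linear(feats), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```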
Abstract:
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
2.
前进
(2022-06-30 17:14):
#paper doi:10.1109/CVPR42600.2020.00470 CVPR 2020 Fast Symmetric Diffeomorphic Image Registration with Convolutional Neural Networks. This registration paper takes a novel approach: instead of warping the moving image toward the fixed image as in earlier work, it registers the moving image and the fixed image simultaneously toward a middle image. During registration the deformation field must be diffeomorphic, i.e. it has to preserve the topology of the image and be invertible (no folding). Previous learning-based methods usually enforce this by putting a global regularizer on the deformation field, but that introduces a hyperparameter: either the field is over-smoothed and registration accuracy drops, or the deformations are too large and folding cannot be ruled out. Inspired by the classical symmetric image normalization method, this paper proposes a novel, efficient unsupervised symmetric registration method that maximizes the similarity between images within the space of diffeomorphic maps and estimates the forward and inverse transformations simultaneously, so the two input images are aligned toward the middle from both directions, which preserves registration accuracy while guaranteeing a diffeomorphic deformation field (a sketch of the usual velocity-field integration trick follows below).
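For context on the "invertible, no folding" requirement: this family of methods typically has the network predict a stationary velocity field v and integrates it with scaling and squaring, so the forward map comes from integrating v, the inverse from integrating −v, and each image can be warped by half the flow toward a middle point. Below is a minimal 2D sketch of that integration, assuming displacements stored in normalized grid coordinates; the function names and the toy velocity field are illustrative and not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def warp(field, disp):
    # sample `field` at locations shifted by the displacement `disp`
    # (both are B x 2 x H x W, displacements in normalized [-1, 1] coordinates)
    B, _, H, W = field.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, -1, -1, -1)  # B x H x W x 2
    return F.grid_sample(field, grid + disp.permute(0, 2, 3, 1), align_corners=True)

def integrate_velocity(v, steps=7):
    # scaling and squaring: start from v / 2^steps, then compose the map with itself
    # `steps` times: u_{k+1}(x) = u_k(x) + u_k(x + u_k(x))
    disp = v / (2 ** steps)
    for _ in range(steps):
        disp = disp + warp(disp, disp)
    return disp

v = torch.randn(1, 2, 64, 64) * 0.05        # toy stationary velocity field
phi_fwd = integrate_velocity(v)             # forward deformation field
phi_inv = integrate_velocity(-v)            # its inverse, from integrating -v (no folding)
# the "align from both sides toward the middle" idea: warp each image by half the flow
phi_half_fwd = integrate_velocity(0.5 * v)
phi_half_inv = integrate_velocity(-0.5 * v)
```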
Abstract:
Diffeomorphic deformable image registration is crucial in many medical image studies, as it offers unique, special features including topology preservation and invertibility of the transformation. Recent deep learning-based deformable image registration methods achieve fast image registration by leveraging a convolutional neural network (CNN) to learn the spatial transformation from the synthetic ground truth or the similarity metric. However, these approaches often ignore the topology preservation of the transformation and the smoothness of the transformation which is enforced by a global smoothing energy function alone. Moreover, deep learning-based approaches often estimate the displacement field directly, which cannot guarantee the existence of the inverse transformation. In this paper, we present a novel, efficient unsupervised symmetric image registration method which maximizes the similarity between images within the space of diffeomorphic maps and estimates both forward and inverse transformations simultaneously. We evaluate our method on 3D image registration with a large scale brain image dataset. Our method achieves state-of-the-art registration accuracy and running time while maintaining desirable diffeomorphic properties.