🐼太真实 (2024-01-30 21:45):
#paper: doi:2110.11316 The paper "CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP" introduces CLOOB (Contrastive Leave One Out Boost), a new self-supervised learning method. It combines modern Hopfield networks with the InfoLOOB objective (Leave One Out Bound) to improve contrastive learning, and it outperforms the earlier CLIP method at zero-shot transfer learning regardless of architecture or dataset. At CLOOB's core, modern Hopfield networks are used to enrich the co-occurrence and covariance structure of the data. Compared with traditional Hopfield networks, they offer higher storage capacity and faster retrieval; through them, CLOOB amplifies the co-occurrence and covariance structure of features in the input samples, effectively extracting and reinforcing the important features in the data. In addition, CLOOB adopts the InfoLOOB objective function to avoid the saturation problem that arises with the InfoNCE objective. InfoLOOB is a contrastive learning objective that handles the relationship between matched and unmatched pairs, reducing saturation of the objective and making learning more efficient.
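The Hopfield retrieval described above amounts to one softmax-attention update over a batch of stored embeddings. A minimal NumPy sketch of that single step (the function name and the inverse temperature `beta` are illustrative choices, not values from the paper):

```python
import numpy as np

def hopfield_retrieve(query, stored, beta=8.0):
    """One update step of a modern (continuous) Hopfield network:
    retrieve from the rows of `stored` (n, d) the pattern associated
    with `query` (d,). The result is a softmax-weighted average of the
    stored embeddings, which mixes in features that co-occur with the
    query's features."""
    scores = beta * (stored @ query)   # (n,) scaled similarities
    scores -= scores.max()             # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()           # softmax over stored patterns
    return stored.T @ weights          # convex combination of stored rows
```

With a large `beta` the retrieval snaps to the single closest stored pattern; a smaller `beta` blends several stored patterns together, which is the covariance-enriching behaviour the summary refers to.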
CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
Andreas Fürst, Elisabeth Rumetshofer, Johannes Lehner, Viet Tran, Fei Tang, Hubert Ramsauer, David Kreil, Michael Kopp, Günter Klambauer, Angela Bitto-Nemling, Sepp Hochreiter
Abstract:
CLIP yielded impressive results on zero-shot transfer learning tasks and is considered a foundation model like BERT or GPT3. CLIP vision models that have a rich representation are pre-trained using the InfoNCE objective and natural language supervision before they are fine-tuned on particular tasks. Though CLIP excels at zero-shot transfer learning, it suffers from an explaining away problem, that is, it focuses on one or a few features while neglecting other relevant features. This problem is caused by insufficiently extracting the covariance structure in the original multi-modal data. We suggest using modern Hopfield networks to tackle the problem of explaining away. Their retrieved embeddings have an enriched covariance structure derived from co-occurrences of features in the stored embeddings. However, modern Hopfield networks increase the saturation effect of the InfoNCE objective, which hampers learning. We propose to use the InfoLOOB objective to mitigate this saturation effect. We introduce the novel "Contrastive Leave One Out Boost" (CLOOB), which uses modern Hopfield networks for covariance enrichment together with the InfoLOOB objective. In experiments we compare CLOOB to CLIP after pre-training on the Conceptual Captions and the YFCC dataset with respect to their zero-shot transfer learning performance on other datasets. CLOOB consistently outperforms CLIP at zero-shot transfer learning across all considered architectures and datasets.