🐼太真实 (2024-01-30 21:45):
#paper: doi:2110.11316 The paper "CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP" introduces CLOOB (Contrastive Leave One Out Boost), a new self-supervised learning method. It combines modern Hopfield networks with the InfoLOOB objective (Leave One Out Bound) to improve contrastive learning, and it outperforms the earlier CLIP method at zero-shot transfer learning regardless of architecture or dataset. At CLOOB's core, modern Hopfield networks are used to enrich the co-occurrence and covariance structure of the data. Compared with traditional Hopfield networks, they offer higher storage capacity and faster retrieval; through them, CLOOB amplifies the co-occurrence and covariance structure of features in the input samples, effectively extracting and reinforcing the important features in the data. In addition, CLOOB adopts the InfoLOOB objective function to avoid the saturation problem that arises with the InfoNCE objective. InfoLOOB is a contrastive learning objective that handles the relationship between matched and unmatched pairs, reducing saturation of the objective and making learning more efficient.
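The Hopfield retrieval described above amounts to one softmax-attention update over a batch of stored embeddings. A minimal NumPy sketch of that single step (the function name and the inverse temperature `beta` are illustrative choices, not values from the paper):

```python
import numpy as np

def hopfield_retrieve(query, stored, beta=8.0):
    """One update step of a modern (continuous) Hopfield network:
    retrieve from the rows of `stored` (n, d) the pattern associated
    with `query` (d,). The result is a softmax-weighted average of the
    stored embeddings, which mixes in features that co-occur with the
    query's features."""
    scores = beta * (stored @ query)   # (n,) scaled similarities
    scores -= scores.max()             # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()           # softmax over stored patterns
    return stored.T @ weights          # convex combination of stored rows
```

With a large `beta` the retrieval snaps to the single closest stored pattern; a smaller `beta` blends several stored patterns together, which is the covariance-enriching behaviour the summary refers to.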
CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
Andreas Fürst, Elisabeth Rumetshofer, Johannes Lehner, Viet Tran, Fei Tang, Hubert Ramsauer, David Kreil, Michael Kopp, Günter Klambauer, Angela Bitto-Nemling, Sepp Hochreiter
Abstract:
CLIP yielded impressive results on zero-shot transfer learning tasks and is considered a foundation model like BERT or GPT3. CLIP vision models that have a rich representation are pre-trained using the InfoNCE objective and natural language supervision before they are fine-tuned on particular tasks. Though CLIP excels at zero-shot transfer learning, it suffers from an explaining away problem, that is, it focuses on one or a few features while neglecting other relevant features. This problem is caused by insufficiently extracting the covariance structure in the original multi-modal data. We suggest using modern Hopfield networks to tackle the problem of explaining away. Their retrieved embeddings have an enriched covariance structure derived from co-occurrences of features in the stored embeddings. However, modern Hopfield networks increase the saturation effect of the InfoNCE objective, which hampers learning. We propose to use the InfoLOOB objective to mitigate this saturation effect. We introduce the novel "Contrastive Leave One Out Boost" (CLOOB), which uses modern Hopfield networks for covariance enrichment together with the InfoLOOB objective. In experiments we compare CLOOB to CLIP after pre-training on the Conceptual Captions and the YFCC dataset with respect to their zero-shot transfer learning performance on other datasets. CLOOB consistently outperforms CLIP at zero-shot transfer learning across all considered architectures and datasets.