Papers shared by user 🐼太真实.
Currently 3 paper shares found.
1.
🐼太真实 (2024-02-29 10:04):
#paper ProPainter: Improving Propagation and Transformer for Video Inpainting. This paper presents ProPainter, a video inpainting technique that achieves efficient and accurate results through dual-domain propagation and a mask-guided sparse video Transformer. The write-up covers ProPainter's three key components, recurrent flow completion, dual-domain propagation, and the mask-guided sparse video Transformer, together with the corresponding technical details and experimental results. (A minimal sketch of the sparse-attention idea follows the abstract.)
Abstract:
Flow-based propagation and spatiotemporal Transformer are two mainstream mechanisms in video inpainting (VI). Despite the effectiveness of these components, they still suffer from some limitations that affect their performance. Previous propagation-based approaches are performed separately either in the image or feature domain. Global image propagation isolated from learning may cause spatial misalignment due to inaccurate optical flow. Moreover, memory or computational constraints limit the temporal range of feature propagation and video Transformer, preventing exploration of correspondence information from distant frames. To address these issues, we propose an improved framework, called ProPainter, which involves enhanced ProPagation and an efficient Transformer. Specifically, we introduce dual-domain propagation that combines the advantages of image and feature warping, exploiting global correspondences reliably. We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens. With these components, ProPainter outperforms prior arts by a large margin of 1.46 dB in PSNR while maintaining appealing efficiency.
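The mask-guided sparsity idea can be illustrated with a toy sketch. This is not ProPainter's implementation (which works on spatiotemporal windows of video features); it only shows, under simplified assumptions, how restricting attention queries to mask-covered tokens skips redundant computation. The function name and all shapes below are hypothetical.

```python
import numpy as np

def mask_guided_sparse_attention(tokens, token_mask):
    """Toy sketch (not ProPainter's code): only tokens whose patch overlaps
    the inpainting mask act as queries; all other tokens pass through
    unchanged, which is where the token sparsity and speed-up come from."""
    n, d = tokens.shape
    query_idx = np.flatnonzero(token_mask)      # tokens that need synthesis
    if query_idx.size == 0:
        return tokens                           # nothing to inpaint

    q = tokens[query_idx]                       # (m, d) sparse set of queries
    scores = q @ tokens.T / np.sqrt(d)          # (m, n) attend over all keys
    scores -= scores.max(axis=1, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    out = tokens.copy()
    out[query_idx] = weights @ tokens           # refresh only masked tokens
    return out

# Usage: 16 tokens of dimension 8, the first 4 lying inside the corrupted region.
tokens = np.random.randn(16, 8).astype(np.float32)
mask = np.zeros(16, dtype=bool)
mask[:4] = True
refined = mask_guided_sparse_attention(tokens, mask)
```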
2.
🐼太真实 (2024-01-30 21:45):
#paper arXiv:2110.11316. The paper "CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP" introduces CLOOB (Contrastive Leave One Out Boost), a new self-supervised learning method that combines modern Hopfield networks with the InfoLOOB objective (a leave-one-out bound) to improve contrastive learning. In zero-shot transfer learning, CLOOB outperforms the earlier CLIP method across all architectures and datasets considered. At its core, CLOOB uses modern Hopfield networks to enrich the co-occurrence and covariance structure of the data; compared with classical Hopfield networks, they offer far higher storage capacity and faster retrieval. Retrieving through these networks reinforces the co-occurrence and covariance structure of features in the input samples, effectively extracting and amplifying the important features in the data. In addition, CLOOB adopts the InfoLOOB objective to avoid the saturation problem that arises with the InfoNCE objective: InfoLOOB is a contrastive objective that handles matched and unmatched pairs in a way that reduces saturation and makes learning more efficient. (A minimal sketch of the leave-one-out objective follows the abstract.)
Abstract:
CLIP yielded impressive results on zero-shot transfer learning tasks and is considered as a foundation model like BERT or GPT3. CLIP vision models that have a rich representation are pre-trained using the InfoNCE objective and natural language supervision before they are fine-tuned on particular tasks. Though CLIP excels at zero-shot transfer learning, it suffers from an explaining away problem, that is, it focuses on one or few features, while neglecting other relevant features. This problem is caused by insufficiently extracting the covariance structure in the original multi-modal data. We suggest to use modern Hopfield networks to tackle the problem of explaining away. Their retrieved embeddings have an enriched covariance structure derived from co-occurrences of features in the stored embeddings. However, modern Hopfield networks increase the saturation effect of the InfoNCE objective which hampers learning. We propose to use the InfoLOOB objective to mitigate this saturation effect. We introduce the novel "Contrastive Leave One Out Boost" (CLOOB), which uses modern Hopfield networks for covariance enrichment together with the InfoLOOB objective. In experiments we compare CLOOB to CLIP after pre-training on the Conceptual Captions and the YFCC dataset with respect to their zero-shot transfer learning performance on other datasets. CLOOB consistently outperforms CLIP at zero-shot transfer learning across all considered architectures and datasets.
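To make the role of InfoLOOB concrete, here is a minimal, heavily simplified sketch of a leave-one-out contrastive loss in the spirit described above. It is not the CLOOB implementation (no Hopfield retrieval, no symmetrization over both modalities, and the temperature is a placeholder); it only illustrates how excluding the matched pair from the denominator differs from InfoNCE.

```python
import numpy as np

def info_loob_loss(img_emb, txt_emb, tau=0.3):
    """Toy, one-directional leave-one-out contrastive loss. Unlike InfoNCE,
    the positive pair is excluded from the denominator, which is what
    mitigates the saturation effect discussed in the abstract."""
    x = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    y = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sims = x @ y.T / tau                         # (batch, batch) scaled cosine similarities
    pos = np.diag(sims)                          # matched image-text pairs
    denom = np.exp(sims).sum(axis=1) - np.exp(pos)   # sum over negatives only
    return float(np.mean(np.log(denom) - pos))   # mean of -log(exp(pos) / denom)

# Usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
loss = info_loob_loss(rng.standard_normal((8, 64)), rng.standard_normal((8, 64)))
```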
3.
🐼太真实 (2023-12-28 20:39):
#paper https://doi.org/10.48550/arXiv.2312.03701, Self-conditioned Image Generation via Generating Representations. This paper introduces Representation-Conditioned image Generation (RCG), a new image generation framework. Instead of relying on human annotations, RCG conditions on a self-supervised representation distribution: a pre-trained encoder maps the image distribution to a representation distribution, a representation diffusion model (RDM) samples from it, and a pixel generator then produces images conditioned on the sampled representation. On ImageNet 256×256, RCG reaches an FID of 3.31 and an IS of 253.4. The method not only markedly advances class-unconditional image generation but is also competitive with today's leading class-conditional methods, narrowing the long-standing performance gap between the two tasks. (A minimal sketch of the sampling pipeline follows the abstract.)
Abstract:
This paper presents $\textbf{R}$epresentation-$\textbf{C}$onditioned image $\textbf{G}$eneration (RCG), a simple yet effective image generation framework which sets a new benchmark in class-unconditional image generation. RCG does not condition on any human annotations. Instead, it conditions on a self-supervised representation distribution which is mapped from the image distribution using a pre-trained encoder. During generation, RCG samples from such representation distribution using a representation diffusion model (RDM), and employs a pixel generator to craft image pixels conditioned on the sampled representation. Such a design provides substantial guidance during the generative process, resulting in high-quality image generation. Tested on ImageNet 256$\times$256, RCG achieves a Frechet Inception Distance (FID) of 3.31 and an Inception Score (IS) of 253.4. These results not only significantly improve the state-of-the-art of class-unconditional image generation but also rival the current leading methods in class-conditional image generation, bridging the long-standing performance gap between these two tasks. Code is available at https://github.com/LTH14/rcg.
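The two-stage sampling pipeline (RDM, then pixel generator, with no labels anywhere) can be summarized structurally with a small sketch. Every component below is a placeholder invented for illustration; the real models live in the released code at https://github.com/LTH14/rcg, and this sketch only mirrors the data flow, not RCG itself.

```python
import numpy as np

def sample_representation(rep_dim=256, steps=10):
    """Placeholder for the representation diffusion model (RDM): start from
    Gaussian noise and iteratively 'denoise' it into a representation vector."""
    z = np.random.randn(rep_dim)
    for t in range(steps, 0, -1):
        z = z - 0.1 * (t / steps) * z + 0.01 * np.random.randn(rep_dim)
    return z

def generate_pixels(rep, size=64):
    """Placeholder for the pixel generator: produce an image conditioned only
    on the sampled representation (here, a fixed random projection)."""
    proj = np.random.RandomState(0).randn(rep.size, size * size * 3)
    return np.tanh(rep @ proj).reshape(size, size, 3)

# Class-unconditional generation: the only conditioning signal is the
# sampled representation, never a class label.
representation = sample_representation()
image = generate_pixels(representation)
print(image.shape)  # (64, 64, 3)
```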