Literature Collection and Sharing Platform

王昊 (2023-01-31 23:53):

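#paper Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules http://arxiv.org/abs/2001.01568 Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto. 2020. Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules. Retrieved January 31, 2023. This is the baseline image-coding method for VCM (the cheng2020 network), used in the feature-extraction stage of coding for machine vision; it belongs to the family of image compression algorithms. The authors propose parameterizing the distribution of the latent representation with discretized Gaussian mixture likelihoods, which gives a more accurate and flexible probability model. They also use attention modules to help the network focus on complex regions of the image. Specifically, first, a discretized Gaussian mixture model is used for entropy estimation of the latent representation: it can offer several most-likely means for y, while the variance of each mixture component can be smaller, so the probability model is more accurate and fewer bits are needed to encode y. Second, the authors add simplified attention modules, which increase the network's attention to non-zero responses, i.e. complex regions, without adding much training complexity.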
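To make the entropy-model idea in the note above concrete, here is a minimal sketch of a discretized Gaussian mixture likelihood for a single integer-quantized latent value. It is an illustration under assumptions, not the authors' code: the function names (`normal_cdf`, `discretized_gmm_likelihood`, `bits`) and the `floor` parameter are hypothetical, and the mixture weights, means, and scales are passed in as plain lists, whereas in a real codec they would be predicted by the network.

```python
import math


def normal_cdf(x, mean, scale):
    # Cumulative distribution function of a Gaussian with the given mean and scale.
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def discretized_gmm_likelihood(y, weights, means, scales, floor=1e-9):
    # Probability mass of the quantized value y: each Gaussian component is
    # integrated over the unit-width bin [y - 0.5, y + 0.5] and the results
    # are combined with the mixture weights (assumed to sum to 1).
    total = 0.0
    for w, mu, sigma in zip(weights, means, scales):
        total += w * (normal_cdf(y + 0.5, mu, sigma) - normal_cdf(y - 0.5, mu, sigma))
    # Hypothetical lower bound so the bit estimate below never takes log2(0).
    return max(total, floor)


def bits(y, weights, means, scales):
    # Estimated number of bits needed to code y under the model.
    return -math.log2(discretized_gmm_likelihood(y, weights, means, scales))


# A latent whose values cluster around -3 and +3: two narrow components
# versus one wide Gaussian covering the same range.
mixture_bits = bits(3, [0.5, 0.5], [-3.0, 3.0], [0.5, 0.5])
single_bits = bits(3, [1.0], [0.0], [3.0])
print(mixture_bits < single_bits)
```

In this toy case the two-component mixture places narrow, low-variance modes where the values actually fall, so y = 3 costs fewer estimated bits than under the single wide Gaussian, which is the effect the note describes.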

arXiv, 2020.

Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules

Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

Abstract:

Image compression is a fundamental research field and many well-known compression standards have been developed for many decades. Recently, learned compression methods exhibit a fast development trend with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We have found accurate entropy models for rate estimation largely affect the optimization of network parameters and thus affect the rate-distortion performance. Therefore, in this paper, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which can achieve a more accurate and flexible entropy model. Besides, we take advantage of recent attention modules and incorporate them into network architecture to enhance the performance. Experimental results demonstrate our proposed method achieves a state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge our approach is the first work to achieve comparable performance with latest compression standard Versatile Video Coding (VVC) regarding PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM. This project page is at this https URL
