响马读paper

An academic exchange community whose members are required to read at least one paper per month and log it

2023, arXiv. DOI: 10.48550/arXiv.2304.02643. arXiv ID: 2304.02643
Segment Anything
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick
Abstract:
We introduce the Segment Anything (SA) project: a new task, model, and
dataset for image segmentation. Using our efficient model in a data collection
loop, we built the largest segmentation dataset to date (by far), with over 1
billion masks on 11M licensed and privacy respecting images. The model is
designed and trained to be promptable, so it can transfer zero-shot to new
image distributions and tasks. We evaluate its capabilities on numerous tasks
and find that its zero-shot performance is impressive -- often competitive with
or even superior to prior fully supervised results. We are releasing the
Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and
11M images at https://segment-anything.com to foster research into foundation
models for computer vision.
2024-01-31 10:39:00
#paper doi: https://doi.org/10.48550/arXiv.2304.02643 Segment Anything. A 2023 work from Meta that proposes a foundation model for computer vision. The goal is clearly stated: solve general-purpose segmentation through prompting. Although the initial internet hype has since cooled, the impact on the CV community has been substantial. Follow-up works such as Grounding-DINO and Grounded-SAM achieve strong results, and the paper offers a different paradigm for tackling subsequent CV tasks. The work is largely engineering-oriented: there are few conceptually original highlights, and the network architecture borrows heavily from prior Transformer-based innovations. What deserves mention is precisely the engineering approach. Meta poses a novel task, namely how to solve image segmentation through a single general-purpose, promptable task, and then designs the training pipeline and corresponding losses around it. Along the way, they built an effective data annotation engine that enables efficient production of labeled masks, which is highly instructive for industrial applications.

From a research perspective, how to fully exploit the pretrained SAM model, and how to extract the priors embedded in such a large model to support domain-specific downstream tasks, remains an important direction.
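To make the promptable-segmentation idea concrete, here is a minimal NumPy sketch of the interface pattern: point prompts go in, several candidate masks with confidence scores come out, and the caller picks the best one. `predict_masks` and its fixed scores are toy stand-ins invented for this sketch, not SAM's actual model; the multi-mask output mimics how SAM returns three masks to resolve prompt ambiguity.

```python
import numpy as np

def predict_masks(image, point_coords):
    """Toy stand-in for a promptable mask predictor (NOT the real SAM).

    Given foreground point prompts, return several candidate masks plus
    confidence scores, mirroring SAM's multi-mask output for ambiguous prompts.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    masks, scores = [], []
    # Three nested hypotheses (roughly: part / object / scene level),
    # echoing SAM's three-mask output. The scores are made up here.
    for radius, score in ((5, 0.7), (15, 0.9), (30, 0.6)):
        mask = np.zeros((h, w), dtype=bool)
        for x, y in point_coords:
            mask |= (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
        masks.append(mask)
        scores.append(score)
    return np.stack(masks), np.array(scores)

image = np.zeros((64, 64, 3), dtype=np.uint8)
masks, scores = predict_masks(image, [(32, 32)])
best = masks[scores.argmax()]     # pick the highest-confidence candidate
print(masks.shape, bool(best[32, 32]))  # the prompt point lies inside it
```

The real model exposes the same pattern through the `segment_anything` package, where `SamPredictor.predict` takes `point_coords` and `point_labels` and returns masks together with predicted IoU scores.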