响马读paper

An academic exchange community whose members are required to read at least one paper per month and log it

2023, arXiv. DOI: 10.48550/arXiv.2304.02643. arXiv ID: 2304.02643
Segment Anything
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick
Abstract:
We introduce the Segment Anything (SA) project: a new task, model, and
dataset for image segmentation. Using our efficient model in a data collection
loop, we built the largest segmentation dataset to date (by far), with over 1
billion masks on 11M licensed and privacy respecting images. The model is
designed and trained to be promptable, so it can transfer zero-shot to new
image distributions and tasks. We evaluate its capabilities on numerous tasks
and find that its zero-shot performance is impressive -- often competitive with
or even superior to prior fully supervised results. We are releasing the
Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and
11M images at https://segment-anything.com to foster research into foundation
models for computer vision.
2024-01-31 10:39:00
#paper doi: https://doi.org/10.48550/arXiv.2304.02643 Segment Anything. A 2023 work from Meta that proposes a foundation model for computer vision. The goal is clearly stated: solve general-purpose segmentation through prompting. Although the initial internet hype has since cooled, the impact on the CV community has been substantial. Follow-up works such as Grounding-DINO and Grounded-SAM achieve strong results, and the paper offers a different paradigm for tackling subsequent CV tasks. The work is largely engineering-oriented: there are few conceptually original highlights, and the network architecture borrows heavily from prior Transformer-based innovations. What deserves mention is precisely the engineering approach. Meta poses a novel task, namely how to solve image segmentation through a single general-purpose, promptable task, and then designs the training pipeline and corresponding losses around it. Along the way, they built an effective data annotation engine that enables efficient production of labeled masks, which is highly instructive for industrial applications.

From a research perspective, how to fully exploit the pretrained SAM model, and how to extract the priors embedded in such a large model to support domain-specific downstream tasks, remains an important direction.
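To make the promptable-segmentation idea concrete, here is a minimal NumPy sketch of the interface pattern: point prompts go in, several candidate masks with confidence scores come out, and the caller picks the best one. `predict_masks` and its fixed scores are toy stand-ins invented for this sketch, not SAM's actual model; the multi-mask output mimics how SAM returns three masks to resolve prompt ambiguity.

```python
import numpy as np

def predict_masks(image, point_coords):
    """Toy stand-in for a promptable mask predictor (NOT the real SAM).

    Given foreground point prompts, return several candidate masks plus
    confidence scores, mirroring SAM's multi-mask output for ambiguous prompts.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    masks, scores = [], []
    # Three nested hypotheses (roughly: part / object / scene level),
    # echoing SAM's three-mask output. The scores are made up here.
    for radius, score in ((5, 0.7), (15, 0.9), (30, 0.6)):
        mask = np.zeros((h, w), dtype=bool)
        for x, y in point_coords:
            mask |= (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
        masks.append(mask)
        scores.append(score)
    return np.stack(masks), np.array(scores)

image = np.zeros((64, 64, 3), dtype=np.uint8)
masks, scores = predict_masks(image, [(32, 32)])
best = masks[scores.argmax()]     # pick the highest-confidence candidate
print(masks.shape, bool(best[32, 32]))  # the prompt point lies inside it
```

The real model exposes the same pattern through the `segment_anything` package, where `SamPredictor.predict` takes `point_coords` and `point_labels` and returns masks together with predicted IoU scores.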