前进 (2024-01-31 22:50):
#paper arxiv.org/pdf/2311.026 2023 Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection. The Large Multimodal Model (LMM) GPT-4V(ision) endows GPT-4 with visual grounding capabilities, making it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm. This paper explores the potential of VQA-oriented GPT-4V for the recently popular visual Anomaly Detection (AD) task and is the first to conduct qualitative and quantitative evaluations on the popular MVTec AD and VisA datasets. Since the task requires both image- and pixel-level evaluation, the proposed GPT-4V-AD framework contains three components: 1) Granular Region Division, 2) Prompt Designing, and 3) Text2Segmentation for easy quantitative evaluation; several variants were also tried for comparative analysis. The results show that GPT-4V can achieve reasonable results on the zero-shot AD task through the VQA paradigm, e.g., image-level 77.1/88.0 and pixel-level 68.0/76.6 AU-ROC on the MVTec AD and VisA datasets, respectively. However, its performance still lags behind state-of-the-art zero-shot methods such as WinCLIP and CLIP-AD, and further research is needed. This study provides a baseline reference for research on VQA-oriented LMMs in the zero-shot AD task.
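The note does not spell out how Granular Region Division and Text2Segmentation fit together. As a rough illustration only, here is a minimal Python sketch of that loop; the use of SLIC superpixels, the integer-ID answer format, and the helper names divide_regions/text2segmentation are all my assumptions, not the authors' actual pipeline:

```python
# Minimal sketch of the Granular Region Division + Text2Segmentation idea.
# Assumptions (not from the paper): regions come from SLIC superpixels, and the
# VQA answer names anomalous regions by integer ID, e.g. "Anomalous regions: 3, 7".
import re

import numpy as np
from skimage.segmentation import slic


def divide_regions(image: np.ndarray, n_regions: int = 50) -> np.ndarray:
    """Split the image into granular regions; returns an HxW integer label map."""
    return slic(image, n_segments=n_regions, start_label=1)


def text2segmentation(answer: str, region_labels: np.ndarray) -> np.ndarray:
    """Turn region IDs parsed from the VQA answer text into a binary anomaly mask."""
    ids = {int(m) for m in re.findall(r"\d+", answer)}
    return np.isin(region_labels, list(ids)).astype(np.uint8)


# Usage: labels = divide_regions(img); mask = text2segmentation("regions 3, 7", labels)
```

The point of such a Text2Segmentation step is that it converts free-form VQA answers into masks, so standard segmentation metrics can be computed without any access to model internals.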
Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection
Abstract:
Large Multimodal Model (LMM) GPT-4V(ision) endows GPT-4 with visual grounding capabilities, making it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm. This paper explores the potential of VQA-oriented GPT-4V in the recently popular visual Anomaly Detection (AD) and is the first to conduct qualitative and quantitative evaluations on the popular MVTec AD and VisA datasets. Considering that this task requires both image-/pixel-level evaluations, the proposed GPT-4V-AD framework contains three components: 1) Granular Region Division, 2) Prompt Designing, 3) Text2Segmentation for easy quantitative evaluation, and have made some different attempts for comparative analysis. The results show that GPT-4V can achieve certain results in the zero-shot AD task through a VQA paradigm, such as achieving image-level 77.1/88.0 and pixel-level 68.0/76.6 AU-ROCs on MVTec AD and VisA datasets, respectively. However, its performance still has a certain gap compared to the state-of-the-art zero-shot methods, e.g., WinCLIP and CLIP-AD, and further research is needed. This study provides a baseline reference for the research of VQA-oriented LMM in the zero-shot AD task, and we also post several possible future works. Code is available at \url{https://github.com/zhangzjn/GPT-4V-AD}.
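For reference, the image-/pixel-level AU-ROC protocol reported above follows standard practice and can be sketched with sklearn; this is a generic illustration, not the authors' evaluation code, and the max-over-pixels convention for per-image scores is an assumption (though it is a common one in AD benchmarks):

```python
# Hedged sketch of image-/pixel-level AU-ROC evaluation for anomaly detection.
import numpy as np
from sklearn.metrics import roc_auc_score


def image_level_auroc(gt: np.ndarray, scores: np.ndarray) -> float:
    """gt: (N,) 0/1 labels per image; scores: (N,) anomaly scores per image."""
    return roc_auc_score(gt, scores)


def pixel_level_auroc(gt_masks: np.ndarray, score_maps: np.ndarray) -> float:
    """gt_masks: (N,H,W) 0/1 masks; score_maps: (N,H,W) per-pixel anomaly scores."""
    return roc_auc_score(gt_masks.ravel(), score_maps.ravel())


# A per-image score is often taken as the max of its pixel map (assumed here):
# img_scores = score_maps.reshape(len(score_maps), -1).max(axis=1)
```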