前进 (2024-01-31 22:50):
#paper arxiv.org//pdf/2311.026 2023 Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection. The Large Multimodal Model (LMM) GPT-4V(ision) endows GPT-4 with visual grounding capabilities, making it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm. This paper explores the potential of VQA-oriented GPT-4V for the recently popular visual Anomaly Detection (AD) task and is the first to conduct qualitative and quantitative evaluations on the popular MVTec AD and VisA datasets. Considering that the task requires both image- and pixel-level evaluation, the proposed GPT-4V-AD framework contains three components: 1) Granular Region Division, 2) Prompt Designing, and 3) Text2Segmentation for easy quantitative evaluation; several different attempts are also made for comparative analysis. The results show that GPT-4V can achieve certain results on the zero-shot AD task through the VQA paradigm, e.g., image-level 77.1/88.0 and pixel-level 68.0/76.6 AU-ROC on the MVTec AD and VisA datasets, respectively. However, its performance still has a gap compared to state-of-the-art zero-shot methods such as WinCLIP and CLIP-AD, and further research is needed. This study provides a baseline reference for research on VQA-oriented LMMs in the zero-shot AD task.
Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection
Jiangning Zhang, Xuhai Chen, Zhucun Xue, Yabiao Wang, Chengjie Wang, Yong Liu
Abstract:
Large Multimodal Model (LMM) GPT-4V(ision) endows GPT-4 with visual grounding capabilities, making it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm. This paper explores the potential of VQA-oriented GPT-4V in the recently popular visual Anomaly Detection (AD) and is the first to conduct qualitative and quantitative evaluations on the popular MVTec AD and VisA datasets. Considering that this task requires both image-/pixel-level evaluations, the proposed GPT-4V-AD framework contains three components: 1) Granular Region Division, 2) Prompt Designing, 3) Text2Segmentation for easy quantitative evaluation, and have made some different attempts for comparative analysis. The results show that GPT-4V can achieve certain results in the zero-shot AD task through a VQA paradigm, such as achieving image-level 77.1/88.0 and pixel-level 68.0/76.6 AU-ROCs on MVTec AD and VisA datasets, respectively. However, its performance still has a certain gap compared to the state-of-the-art zero-shot methods, e.g., WinCLIP and CLIP-AD, and further research is needed. This study provides a baseline reference for the research of VQA-oriented LMM in the zero-shot AD task, and we also post several possible future works. Code is available at \url{https://github.com/zhangzjn/GPT-4V-AD}.
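To make the evaluation pipeline in the abstract concrete: the Text2Segmentation step maps region IDs named in GPT-4V's textual answer back onto the region-divided image to produce a binary anomaly mask, which can then be scored with pixel-level AU-ROC. Below is a minimal sketch of that idea, not the paper's actual implementation (see the linked repo for that); the function names `text2segmentation` and `pixel_auroc` and the toy 4x4 region map are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def text2segmentation(region_map: np.ndarray, anomalous_ids) -> np.ndarray:
    """Turn the region IDs mentioned in a VQA answer into a binary anomaly mask.

    region_map: integer label map assigning each pixel to a region
    anomalous_ids: iterable of region IDs the model flagged as anomalous
    """
    return np.isin(region_map, list(anomalous_ids)).astype(np.uint8)

def pixel_auroc(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Pixel-level AU-ROC between a predicted mask and a ground-truth mask."""
    return roc_auc_score(gt_mask.ravel(), pred_mask.ravel())

# Toy example: a 4x4 image divided into four quadrant regions (IDs 0..3);
# suppose the VQA answer flags region 3 as anomalous.
region_map = np.array([[0, 0, 1, 1],
                       [0, 0, 1, 1],
                       [2, 2, 3, 3],
                       [2, 2, 3, 3]])
pred = text2segmentation(region_map, {3})

gt = np.zeros((4, 4), dtype=np.uint8)
gt[2:, 2:] = 1  # ground-truth defect in the bottom-right quadrant

print(pixel_auroc(pred, gt))  # 1.0 here, since the prediction matches exactly
```

Coarse region granularity bounds how well this can score: a flagged region covers whole regions, so pixel-level AU-ROC degrades when defects cross region boundaries, which is one plausible reason for the gap to mask-producing methods like WinCLIP.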