Paper-Hub

2023, arXiv. DOI: 10.48550/arXiv.2311.05332 arXiv ID: 2311.05332

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

Licheng Wen, Xuemeng Yang, Daocheng Fu, Xiaofeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao

Abstract:

The pursuit of autonomous driving technology hinges on the sophisticated
integration of perception, decision-making, and control systems. Traditional
approaches, both data-driven and rule-based, have been hindered by their
inability to grasp the nuance of complex driving environments and the
intentions of other road users. This has been a significant bottleneck,
particularly in the development of common sense reasoning and nuanced scene
understanding necessary for safe and reliable autonomous driving. The advent of
Visual Language Models (VLM) represents a novel frontier in realizing fully
autonomous vehicle driving. This report provides an exhaustive evaluation of
the latest state-of-the-art VLM, GPT-4V(ision), and its application in
autonomous driving scenarios. We explore the model's abilities to understand
and reason about driving scenes, make decisions, and ultimately act in the
capacity of a driver. Our comprehensive tests span from basic scene recognition
to complex causal reasoning and real-time decision-making under varying
conditions. Our findings reveal that GPT-4V demonstrates superior performance
in scene understanding and causal reasoning compared to existing autonomous
systems. It showcases the potential to handle out-of-distribution scenarios,
recognize intentions, and make informed decisions in real driving contexts.
However, challenges remain, particularly in direction discernment, traffic
light recognition, vision grounding, and spatial reasoning tasks. These
limitations underscore the need for further research and development. Project
is now available on GitHub for interested parties to access and utilize:
https://github.com/PJLab-ADG/GPT4V-AD-Exploration

Related Links:

http://arxiv.org/abs/2311.05332v2

2023-11-30 23:11:00

符毓 Yu:

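#paper doi.org/10.48550/arXiv.2311.05332, 2023, On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving. A recent paper from the WeRide team that applies GPT to autonomous driving. The test results show that GPT achieves fairly high accuracy on image recognition, point cloud recognition, weather recognition, V2X images, simulated image recognition, and multi-view image recognition, but is prone to errors in traffic light recognition and in distinguishing left from right.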