



17 April 2024 | Research

LINGO-2: Driving with Natural Language

This blog post introduces LINGO-2, a driving model that links vision, language, and action to explain and determine driving behavior, opening up a new dimension of control and customization for an autonomous driving experience. LINGO-2 is the first closed-loop vision-language-action driving model (VLAM) tested on public roads.

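In September 2023, we first introduced the concept of applying natural language to autonomous driving in our blog post on LINGO-1. LINGO-1 was an open-loop driving commentator and a first step toward trustworthy autonomous driving technology. In November 2023, we further improved the accuracy and trustworthiness of LINGO-1's responses by adding "show and tell" referential segmentation capabilities. Today, we are excited to present Wayve's latest progress in incorporating natural language into driving models: LINGO-2, a closed-loop vision-language-action driving model, or VLAM (Vision-Language-Action Model). It is the world's first driving model trained on language to be tested on public roads. In this blog post, we share the technical details of LINGO-2 and show, through examples, how it combines language and action to accelerate the safe development of Wayve's AI driving models.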

Introducing LINGO-2, a closed-loop Vision-Language-Action Model (VLAM)

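Our previous model, LINGO-1, was an open-loop driving commentator. It used vision and language inputs to perform visual question answering (VQA), describing driving scenes, reasoning about them, and analyzing where attention was focused, but it could only generate language output. This research model was an important step in using language to understand what a driving model comprehends about a scene. LINGO-2 goes a step further, giving us insight into the driving model's decision-making process.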

LINGO-2 takes vision and language as inputs and outputs both driving action and language, providing a continuous driving commentary on its motion planning decisions. LINGO-2 adapts its actions and explanations in accordance with various scene elements and is a strong first indication of the alignment between explanations and decision-making. By linking language and action directly, LINGO-2 sheds light on how AI systems make decisions and opens up a new level of control and customization for driving.


While LINGO-1 could retrospectively generate commentary on driving scenarios, its commentary was not integrated with the driving model. Therefore, its observations were not informed by actual driving decisions. However, LINGO-2 can both generate real-time driving commentary and control a car. The linking of these fundamental modalities underscores the model’s profound understanding of the contextual semantics of the situation, for example, explaining that it’s slowing down for pedestrians on the road or executing an overtaking maneuver. It’s a crucial step towards enhancing trust in our assisted and autonomous driving systems.


LINGO-2 also opens up new possibilities for accelerating learning with natural language by incorporating a description of driving actions and causal reasoning into the model’s training. In the future, natural language interfaces could even allow users to engage in conversations with the driving model, making it easier for people to understand these systems and build trust.

