Is LINGO-2 Redefining Autonomous Driving?
2024-04-26, 数字开物

When it comes to autonomous driving, Tesla's "vision-only" approach usually dominates the headlines. This vision-centric route uses deep learning to analyze footage from onboard cameras in order to perceive and understand the vehicle's surroundings. Wayve's research team, however, has taken a different path, bringing language interaction into the autonomous driving system. Their latest result, LINGO-2, lets a car not only follow a driver's spoken instructions but also explain its own decisions in natural language. This breakthrough opens up entirely new possibilities for driverless technology.

LINGO-2 is not meant to replace visual perception but to complement it. By using language instructions to refine decision-making, LINGO-2 can improve an autonomous system's handling of complex scenarios. For example, in bad weather or around road construction, the driver can prompt the vehicle by voice to take appropriate measures; LINGO-2 then responds based on both the instruction and its own perception, and explains what it is doing. This human-machine collaboration not only improves the safety of autonomous driving but also makes the decision process more transparent.

In that sense, LINGO-2 represents an "alternative" revolution in autonomous driving. It probes new frontiers for AI in cognition and interaction, and broadens the paths by which driverless vehicles can be realized. Going forward, visual perception and language interaction are bound to become deeply integrated.

17 April 2024 | Research

LINGO-2: Driving with Natural Language

This blog introduces LINGO-2, a driving model that links vision, language, and action to explain and determine driving behavior, opening up a new dimension of control and customization for an autonomous driving experience. LINGO-2 is the first closed-loop vision-language-action driving model (VLAM) tested on public roads.

In September 2023, our blog post introducing LINGO-1 first proposed applying natural language to autonomous driving. LINGO-1, an open-loop driving commentary system, was a first step toward trustworthy self-driving technology. In November 2023, we further improved the accuracy and trustworthiness of LINGO-1's responses by adding a "show and tell" referential segmentation capability. Today, we are excited to present Wayve's latest progress in integrating natural language into driving models: LINGO-2, a closed-loop vision-language-action model (VLAM). It is the first language-trained driving model in the world to be tested on public roads. In this post, we share the technical details of LINGO-2 and show, through examples, how it combines language and action to accelerate the safe development of Wayve's AI driving models.

Introducing LINGO-2, a closed-loop Vision-Language-Action Model (VLAM)

Our previous model, LINGO-1, was an open-loop driving commentary system. It took vision and language as inputs to perform visual question answering (VQA), describing, reasoning about, and highlighting points of attention in a driving scene, but it could only produce language output. That research model was an important step in using language to probe what a driving model understands about a scene. LINGO-2 goes further: it gives us insight into the model's decision-making process itself.

LINGO-2 combines vision and language as inputs and outputs, both driving action and language, to provide a continuous driving commentary of its motion planning decisions. LINGO-2 adapts its actions and explanations in accordance with various scene elements and is a strong first indication of the alignment between explanations and decision-making. By linking language and action directly, LINGO-2 sheds light on how AI systems make decisions and opens up a new level of control and customization for driving.

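To make the input/output structure concrete, here is a minimal toy sketch of what a vision-language-action interface could look like. All names (`VlamOutput`, `DummyVlam`) and the keyword-based logic are hypothetical illustrations, not Wayve's actual model or API; a real VLAM would run a learned vision-language backbone where the toy branches on the instruction text.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VlamOutput:
    steering: float   # normalized steering command, -1 (left) to 1 (right)
    speed: float      # target speed in m/s
    commentary: str   # natural-language explanation of the decision

class DummyVlam:
    """Toy stand-in: maps camera frames + a language instruction
    to a driving action plus commentary in a single step."""

    def step(self, frames: List[bytes], instruction: str) -> VlamOutput:
        # A real model would fuse the frames and the instruction in a
        # learned backbone; this toy only reacts to a keyword.
        if "pedestrian" in instruction.lower():
            return VlamOutput(steering=0.0, speed=0.0,
                              commentary="Stopping: pedestrian reported ahead.")
        return VlamOutput(steering=0.0, speed=8.0,
                          commentary="Proceeding: road ahead is clear.")

out = DummyVlam().step(frames=[b""], instruction="Watch for the pedestrian crossing")
print(out.commentary)
```

The key point the sketch illustrates is that action and commentary come out of the same call, so the explanation cannot drift away from the decision it describes.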

While LINGO-1 could retrospectively generate commentary on driving scenarios, its commentary was not integrated with the driving model. Therefore, its observations were not informed by actual driving decisions. However, LINGO-2 can both generate real-time driving commentary and control a car. The linking of these fundamental modalities underscores the model's profound understanding of the contextual semantics of the situation, for example, explaining that it's slowing down for pedestrians on the road or executing an overtaking maneuver. It's a crucial step towards enhancing trust in our assisted and autonomous driving systems.

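The closed-loop distinction can be sketched with a toy simulation. Everything here is an illustrative assumption, not Wayve's system: a single `policy` call yields both the acceleration that is applied to the simulated car and the commentary, so each explanation is grounded in the decision actually taken, and the action feeds back into the state the model sees next.

```python
def policy(distance_to_pedestrian: float):
    """One shared decision: returns (acceleration in m/s^2, commentary)."""
    if distance_to_pedestrian < 10.0:
        return -2.0, "Slowing down for a pedestrian on the road."
    return 0.5, "Maintaining speed; the path is clear."

def run_episode(steps: int = 10, dt: float = 0.5):
    """Simulate the car approaching a pedestrian 20 m ahead."""
    speed, position, log = 8.0, 0.0, []
    for _ in range(steps):
        accel, comment = policy(20.0 - position)
        speed = max(0.0, speed + accel * dt)  # action changes the state...
        position += speed * dt                # ...which shapes the next input
        log.append(comment)
    return speed, log

final_speed, commentary_log = run_episode()
print(commentary_log[-1], f"(final speed: {final_speed:.2f} m/s)")
```

In an open-loop commentator like LINGO-1, `policy` would only return the string, while a separate system drove the car, so nothing would guarantee that the commentary matched the maneuver being executed.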

It opens up new possibilities for accelerating learning with natural language by incorporating a description of driving actions and causal reasoning into the model's training. In the future, natural language interfaces could even allow users to engage in conversations with the driving model, making it easier for people to understand these systems and build trust.

