Li Auto has introduced MindVLA, a Vision-Language-Action (VLA) architecture for autonomous driving. Unveiled by Jia Peng, head of autonomous driving technology development at Li Auto, the architecture is intended to move the company closer to fully autonomous driving.
Li Xiang, the founder, chairman, and CEO of Li Auto, described MindVLA as a “robot large model” that integrates spatial, linguistic, and behavioral intelligence into a single model. This integration is meant to let an autonomous driving system perceive, reason about, and adapt to its surroundings, and it marks a milestone in Li Auto’s push towards Level 4 autonomous driving.
Li Xiang compared MindVLA’s expected impact on autonomous driving to that of the iPhone 4 on smartphones. MindVLA is meant to give vehicles human-like driving capabilities, turning them from conventional transportation into intelligent, self-driving agents.
During its fourth-quarter 2024 earnings call with analysts, Li Auto’s management said that research and development had begun on the next-generation VLA smart driving large model, which will debut alongside the Li i8 electric SUV. Scheduled for release in July, the Li i8 marks Li Auto’s entry into all-electric SUVs.
Li Auto envisions MindVLA turning vehicles from tools for transportation into proactive agents. By giving cars cognitive and adaptive abilities modeled on human reasoning, the company aims to build smart agents that can navigate complex traffic scenarios independently.
To develop MindVLA, Li Auto built and trained a large language model (LLM) base model using a Mixture-of-Experts (MoE) architecture and a sparse attention mechanism. This design is meant to let the model grow in size without sacrificing inference efficiency, because only part of the network is active for any given input.
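The idea behind MoE routing can be illustrated with a minimal sketch. This is not Li Auto’s implementation: the class name `ToyMoELayer`, the layer sizes, and the top-k gating scheme are all assumptions chosen for illustration. The sketch shows a router scoring every expert and then running only the `top_k` highest-scoring ones.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyMoELayer:
    """Hypothetical top-k Mixture-of-Experts layer (illustration only)."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one scoring row per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a small dim x dim linear map.
        self.experts = [[[rng.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, x):
        # Score all experts, but keep only the top_k best.
        logits = matvec(self.router, x)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[:self.top_k]
        # Renormalize the gate weights over the chosen experts.
        weights = softmax([logits[i] for i in chosen])
        out = [0.0] * len(x)
        for weight, idx in zip(weights, chosen):
            expert_out = matvec(self.experts[idx], x)
            out = [o + weight * y for o, y in zip(out, expert_out)]
        return out, chosen


layer = ToyMoELayer(dim=4, num_experts=8, top_k=2)
output, active = layer.forward([1.0, 0.5, -0.3, 2.0])
print(len(active), "of", len(layer.experts), "experts ran")
```

In this sketch, adding more experts increases the parameter count, but each input still passes through only `top_k` expert matrices, which is the general reason MoE models can scale without a matching rise in per-input compute.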
MindVLA uses a diffusion model to turn action tokens into optimized trajectories, and it jointly models the ego vehicle and other road users to predict how other vehicles will behave, which supports decision-making in dynamic traffic. The architecture also uses Li Auto’s proprietary world model to simulate real-world scenarios, with the aim of improving adaptability and response accuracy.
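As a rough illustration of diffusion-based trajectory generation, the sketch below runs a standard DDPM-style reverse sampling loop over a flattened list of waypoint coordinates. It is a toy under stated assumptions, not MindVLA’s method: the function names, the linear noise schedule, and especially `oracle_noise_predictor` are hypothetical. The oracle is a stand-in that already knows the target trajectory; in a real system a learned network conditioned on the action tokens would predict the noise instead.

```python
import math
import random


def make_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus the cumulative products of alpha."""
    betas = [beta_start + (beta_end - beta_start) * t / (steps - 1)
             for t in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def oracle_noise_predictor(x_t, alpha_bar, target):
    """Stand-in for a learned denoiser (assumption): returns the noise that
    would map `target` to `x_t` under the forward diffusion process."""
    scale = math.sqrt(1.0 - alpha_bar)
    return [(x - math.sqrt(alpha_bar) * g) / scale for x, g in zip(x_t, target)]


def sample_trajectory(target, steps=50, seed=0):
    """Start from Gaussian noise and denoise step by step."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(steps)
    x = [rng.gauss(0, 1) for _ in target]
    for t in reversed(range(steps)):
        eps = oracle_noise_predictor(x, alpha_bars[t], target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # DDPM posterior mean for the previous, less noisy step.
        x = [(xi - coef * e) / math.sqrt(alphas[t]) for xi, e in zip(x, eps)]
        if t > 0:
            # Fresh noise is added at every step except the last one.
            sigma = math.sqrt(betas[t])
            x = [xi + sigma * rng.gauss(0, 1) for xi in x]
    return x


# Three (x, y) waypoints, flattened: (0, 0), (1, 0.5), (2, 1.5).
waypoints = [0.0, 0.0, 1.0, 0.5, 2.0, 1.5]
trajectory = sample_trajectory(waypoints)
print([round(v, 3) for v in trajectory])
```

Because the oracle knows the target, the loop recovers the waypoints up to floating-point error; the point of the sketch is only the shape of the iterative denoising procedure, in which a noisy sample is refined into a smooth trajectory over many small steps.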
In conclusion, MindVLA combines an MoE-based language model, diffusion-based trajectory generation, and a proprietary world model in a single architecture. Li Auto presents it as a significant step towards fully autonomous vehicles and as evidence of the company’s commitment to shaping the future of transportation.