Autonomous Vehicle: A Compound AI System

Lin Yuan
Jun 4, 2024

AI took the spotlight again in 2023, largely due to the success of ChatGPT, which is built on pre-trained foundation models from the GPT family (GPT-3.5 at launch, later GPT-4). Since then, foundation models have gained attention in many areas, from chatbots and language translation to audio/video generation and self-driving cars. Because these models are so general, and because new capabilities emerge as model size increases, a belief took hold that a super-powerful large language model (LLM) is all you need. However, I would argue that in the autonomous driving domain a compound AI system is the better fit, a view inspired by the recent paper The Shift from Models to Compound AI Systems from the BAIR Lab at UC Berkeley.

Overview of Autonomous Driving System

The compute system, i.e., the “brain,” of an autonomous vehicle consists of the following modules:

1. Perception: This module takes the signals captured by the various sensors mounted on the vehicle and generates semantic outputs for the objects around it. For example, the module should not only detect pedestrians on the sidewalk but also identify whether they are standing still or about to walk across the street.

2. Predictor: This module predicts the likely future trajectories, over the next few seconds, of all the objects that may interact with the autonomous vehicle.

3. Planner: Given the outputs of Perception and Predictor, combined with other inputs such as HD maps and world knowledge (e.g., traffic rules), the planner generates the trajectory and speed profile for the autonomous vehicle over the next few seconds.

4. Control: This module refines the planned trajectory and speed based on criteria such as passenger comfort, vehicle dynamics, and energy efficiency.

The following figure shows the relationship between all the modules mentioned above:

Fig 1: Autonomous Driving System
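
To make the data flow between the four modules concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than how any production stack works: the function and type names are hypothetical, the predictor is a plain constant-velocity extrapolation, the planner is a single stop-or-go rule, and the horizon, lane width, and acceleration limits are made-up numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in metres, vehicle frame, x pointing forward


@dataclass
class TrackedObject:
    kind: str        # e.g. "pedestrian"
    position: Point
    velocity: Point  # metres per second


@dataclass
class Prediction:
    obj: TrackedObject
    future_position: Point  # expected position at the end of the horizon


@dataclass
class Plan:
    target_speed: float  # metres per second


def perceive(raw_detections: List[dict]) -> List[TrackedObject]:
    """Stand-in for Perception: turn raw detections into semantic objects."""
    return [TrackedObject(d["kind"], d["position"], d["velocity"])
            for d in raw_detections]


def predict(objects: List[TrackedObject], horizon_s: float) -> List[Prediction]:
    """Stand-in for Predictor: constant-velocity extrapolation (an assumption)."""
    return [
        Prediction(o, (o.position[0] + o.velocity[0] * horizon_s,
                       o.position[1] + o.velocity[1] * horizon_s))
        for o in objects
    ]


def plan(predictions: List[Prediction], speed_limit: float,
         lane_half_width: float = 1.5, lookahead_m: float = 30.0) -> Plan:
    """Stand-in for Planner: stop if anything is predicted to be in our lane ahead."""
    for p in predictions:
        x, y = p.future_position
        if 0.0 <= x <= lookahead_m and abs(y) <= lane_half_width:
            return Plan(target_speed=0.0)
    return Plan(target_speed=speed_limit)


def control(p: Plan, current_speed: float, max_accel: float = 2.0,
            max_decel: float = 3.0, dt: float = 0.1) -> float:
    """Stand-in for Control: limit the speed change per step for comfort."""
    delta = p.target_speed - current_speed
    delta = max(-max_decel * dt, min(max_accel * dt, delta))
    return current_speed + delta


def drive_step(raw_detections: List[dict], current_speed: float,
               speed_limit: float = 13.0) -> float:
    """One pass through Perception -> Predictor -> Planner -> Control."""
    objects = perceive(raw_detections)
    predictions = predict(objects, horizon_s=2.0)
    return control(plan(predictions, speed_limit), current_speed)


if __name__ == "__main__":
    crossing = [{"kind": "pedestrian", "position": (20.0, 4.0), "velocity": (0.0, -1.5)}]
    standing = [{"kind": "pedestrian", "position": (20.0, 4.0), "velocity": (0.0, 0.0)}]
    print(drive_step(crossing, current_speed=10.0))
    print(drive_step(standing, current_speed=10.0))
```

With these numbers, a pedestrian walking toward the lane makes the vehicle start to slow down, while one standing on the sidewalk lets it keep accelerating toward the speed limit. The point is the shape of the system: each stage has a narrow, typed interface, so any one of them can be swapped for a learned model or for hand-written logic.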

Why a Compound AI System is Better for Autonomous Vehicles

Thanks to the widespread success of LLMs, there has been much recent discussion about applying LLMs, especially multimodal foundation models, in autonomous vehicles. However, the approaches differ significantly. Some use foundation models for offline learning and auto-labeling, while others run foundation models directly on the vehicle. One camp even advocates running a single large foundation model end to end on the vehicle, i.e., the model takes sensor signals as input and outputs control signals directly.

In my opinion, a compound AI system is more suitable for autonomous vehicles than a monolithic model, for the following reasons:

1. Fast and Slow Thinking: Our brain has a “fast” part and a “slow” part (for curious minds, see Daniel Kahneman’s widely acclaimed book Thinking, Fast and Slow). When we perceive objects on the road, we use the “fast” part of our brain; when we make a driving decision based on a conditional road sign (e.g., no right turn on red on weekdays), we use the “slow” part, sometimes referred to as the logical part of the brain. The same analogy applies to the systems in autonomous vehicles. Some tasks suit one type of model, e.g., vision models for object detection and tracking, while others are better served by a sequence model. A compound AI system allows us to design the fast and slow parts of the brain separately, which we don’t yet know how to do in a monolithic model.

2. Leveraging Established Control Logic: Some of the control logic for cars has been very well studied. It’s the outcome of hundreds of years of human learning through physics and control theory. Rather than having the AI learn control from human behavior, which also requires labeling what good driving looks like, it’s much easier to guide the vehicle with an optimal controller. If you have ever taken a Waymo robotaxi, you may have noticed that the ride feels more comfortable than one with a human Uber driver. In the future, it’s also possible to use AI for control, or a mix of AI and classical logic, which is only possible in a compound AI system.

3. Improved Observability and Debuggability: For autonomous vehicles, unlike for a chatbot or a video generation tool, safety is paramount. Hence, the AI system on an autonomous vehicle requires extremely high accuracy and security. Even when some errors are unavoidable, the system should enable engineers to reproduce, debug, and fix them. This is very difficult, if not impossible, with a monolithic model.
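
Points 2 and 3 can be sketched together in a few lines of Python. The sketch below pairs a placeholder "learned" planner with a textbook PID speed controller, and wraps both so that every input and output is recorded. All names here (`learned_target_speed`, `Trace`, `build_stack`) are hypothetical, the gains are untuned illustrative values, and a PID loop is a simple stand-in for the optimal controllers mentioned above, not a claim about what any real vehicle runs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class PIDController:
    """Textbook PID controller; gains are illustrative, not tuned for a vehicle."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target: float, measured: float, dt: float) -> float:
        """Return an acceleration command from the speed error."""
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


@dataclass
class Trace:
    """Records each module's inputs and output so a run can be inspected later."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def wrap(self, name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
        def traced(*args: Any) -> Any:
            out = fn(*args)
            self.records.append({"module": name, "inputs": args, "output": out})
            return out
        return traced


def learned_target_speed(obstacle_distance_m: float) -> float:
    """Placeholder for an ML planner: slow down as the obstacle gets closer."""
    return max(0.0, min(13.0, 0.5 * (obstacle_distance_m - 5.0)))


def build_stack() -> Tuple[Callable[..., float], Trace]:
    """Compose the placeholder planner with the PID controller, both traced."""
    trace = Trace()
    pid = PIDController(kp=0.8, ki=0.1, kd=0.05)
    planner = trace.wrap("planner", learned_target_speed)
    controller = trace.wrap("control", pid.step)

    def tick(obstacle_distance_m: float, speed: float, dt: float = 0.1) -> float:
        target = planner(obstacle_distance_m)
        return controller(target, speed, dt)

    return tick, trace


if __name__ == "__main__":
    tick, trace = build_stack()
    tick(25.0, 8.0)
    for record in trace.records:
        print(record)
```

Because the boundary between the planner and the controller is explicit, an engineer looking at a bad acceleration command can check the trace to see whether the planner asked for the wrong target speed or the controller tracked a correct target poorly. A single end-to-end model offers no such intermediate signal to inspect.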

The Future

As envisioned in the paper “The Shift from Models to Compound AI Systems,” we can build more robust and performant systems by composing multiple components rather than relying on a single model. Some components are ML models that excel at particular tasks, while others can be implemented with optimized logic. I believe a compound AI system is the faster path to L5 (fully autonomous) driving.

Disclaimer: opinions expressed are solely my own and do not express the views or opinions of my employer.
