How Autonomous Vehicles See the World
AVs don't see like humans — they measure. Learn how cameras, lidar, radar, and sensor fusion work together to build a 3D understanding of the road.
December 3, 2024 · 4 min read
Humans drive using a pair of eyes, some mirrors, and a brain full of life experience.
Autonomous vehicles drive using multiple sensors and a lot of software. Let's break down how that actually works, without getting lost in equations.
The sensor toolbox
Most AVs use some combination of these sensors:
Cameras – Like eyes
- Capture color and texture
- Great for reading signs, traffic lights, lane markings
- Struggle when it's very dark, very bright, or when objects are obscured
Lidar – A 3D scanner
- Shoots out laser pulses and measures how long they take to bounce back (there's a quick sketch of the math just after this list)
- Builds a precise 3D "point cloud" of the environment
- Very good at measuring shape and distance
Radar – Good in bad weather
- Uses radio waves
- Great at measuring distance and relative speed, even in fog, rain, or dust
- Less detailed in terms of shape
Ultrasonic sensors – Close-range feelers
- The same technology as ordinary parking sensors
- Used for low-speed maneuvers like parking or tight turns
No single sensor is perfect. That's the point.
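If you're curious what "measuring" actually looks like, here's a minimal sketch of the two core calculations: lidar turns a pulse's round-trip time into distance, and radar turns a Doppler frequency shift into relative speed. The function names, the 77 GHz carrier frequency, and the example numbers are illustrative assumptions, not taken from any real sensor.

```python
# Minimal sketch (illustrative, not from any real sensor stack):
# how raw time and frequency measurements become distance and speed.

SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def lidar_distance(round_trip_seconds: float) -> float:
    """Lidar time-of-flight: the pulse travels out and back, so halve the trip."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

def radar_relative_speed(doppler_shift_hz: float, carrier_hz: float = 77e9) -> float:
    """Radar Doppler: the shift in the reflected frequency reveals closing speed.
    77 GHz is a common automotive radar band, assumed here for illustration."""
    return doppler_shift_hz * SPEED_OF_LIGHT / (2.0 * carrier_hz)

# A pulse that returns after ~0.67 microseconds hit something ~100 m away.
print(round(lidar_distance(0.67e-6), 1))      # ~100.4 m
# A Doppler shift of ~5.1 kHz at 77 GHz means the object is closing at ~10 m/s.
print(round(radar_relative_speed(5.1e3), 1))  # ~9.9 m/s
```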
Sensor fusion: Better together
Each sensor sees the world differently:
- Cameras see color and detail
- Lidar sees precise 3D structure
- Radar sees motion and distance, even when visibility is poor
AVs combine all of this in a process called sensor fusion.
You can think of it like this:
The car builds a constantly updated 3D model of the world, then paints that model with information about what each object is and how it's moving.
Instead of just "pixels" or "dots," the system knows:
- "This cluster of points is a parked car."
- "This moving blob is a cyclist going 12 mph."
- "This long shape is a lane line."
All of that gets updated many times per second.
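One way to picture what that fused model might hold, as a toy Python sketch rather than a real AV interface: each object carries a label (mostly from the camera), a 3D position (mostly from lidar), a velocity (mostly from radar), and an overall confidence. All of the names and numbers below are invented for illustration.

```python
# Toy sketch of a fused world model (illustrative, not a real AV interface).
from dataclasses import dataclass

@dataclass
class FusedObject:
    label: str                                # from the camera's classifier, e.g. "cyclist"
    position_m: tuple[float, float, float]    # 3D position, largely from lidar
    velocity_mps: tuple[float, float, float]  # motion estimate, largely from radar
    confidence: float                         # combined belief that the object is real

# One snapshot of the car's 3D model, refreshed many times per second.
world_model = [
    FusedObject("parked car", (12.0, -3.5, 0.0), (0.0, 0.0, 0.0), 0.97),
    FusedObject("cyclist",    (25.0,  1.2, 0.0), (5.4, 0.0, 0.0), 0.91),  # ~12 mph
    FusedObject("lane line",  ( 0.0,  1.8, 0.0), (0.0, 0.0, 0.0), 0.99),
]

for obj in world_model:
    print(f"{obj.label}: at {obj.position_m} m, moving {obj.velocity_mps} m/s")
```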
Perception: From raw data to understanding
The perception system takes fused sensor data and tries to answer:
- What objects are around me?
- Where exactly are they?
- How big are they?
- How fast are they moving, and in which direction?
Modern perception systems use a lot of machine learning for this. They're trained on huge amounts of labeled data: images and sensor recordings where everything is tagged — cars, trucks, pedestrians, traffic cones, dogs, you name it.
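To make the last of those questions concrete: once an object has been spotted in two consecutive frames, a rough speed-and-direction estimate can be as simple as differencing its positions over time. This is a simplified 2D sketch with made-up numbers, not how any particular perception system does it.

```python
# Rough sketch: estimating an object's speed and heading from two
# consecutive position estimates (made-up numbers, simplified 2D math).
import math

def speed_and_heading(prev_xy, curr_xy, dt_seconds):
    """Finite-difference velocity: how far did the object move, and which way?"""
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    speed = math.hypot(dx, dy) / dt_seconds        # meters per second
    heading = math.degrees(math.atan2(dy, dx))     # 0 degrees = straight ahead of the car
    return speed, heading

# Object seen 10 times per second (dt = 0.1 s), drifting forward and to the left.
speed, heading = speed_and_heading((20.0, 2.0), (20.5, 2.1), 0.1)
print(f"{speed:.1f} m/s at {heading:.0f} degrees")  # ~5.1 m/s at ~11 degrees
```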
A nighttime crosswalk example
Imagine an AV approaching a crosswalk at night:
- The camera sees the traffic light, the crosswalk lines, and a person in dark clothing.
- The lidar picks up a 3D shape at about human height standing near the curb.
- The radar detects a nearby object that's mostly stationary but might start moving.
Fused together, the system concludes:
"There is a pedestrian right here, near the crosswalk. I should prepare to slow or stop."
Even if one sensor is not perfect (say, glare on the camera), the others can provide backup.
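Here's a toy sketch of how that backup can work in spirit: each sensor contributes its own "pedestrian here" likelihood, and the car prepares to slow once the combined belief gets high enough. The probabilities, the independence assumption behind the combination, and the threshold are all invented for illustration.

```python
# Toy sketch of combining per-sensor evidence at the crosswalk
# (invented numbers and threshold, not a real decision policy).

def combined_belief(per_sensor_probs):
    """Combine 'pedestrian here' probabilities, treated as independent:
    the chance that every sensor is wrong at once gets very small."""
    miss_everything = 1.0
    for p in per_sensor_probs:
        miss_everything *= (1.0 - p)
    return 1.0 - miss_everything

evidence = {
    "camera": 0.55,  # dark clothing, some glare: not very sure on its own
    "lidar":  0.80,  # human-height 3D shape near the curb
    "radar":  0.40,  # something mostly stationary that might start moving
}

belief = combined_belief(evidence.values())
print(f"combined belief: {belief:.2f}")  # ~0.95

if belief > 0.7:  # illustrative threshold
    print("Pedestrian likely near the crosswalk: prepare to slow or stop.")
```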
Why redundancy matters
Humans drive with essentially one sensor type (our eyes) and one brain, and we get tired, distracted, or overloaded.
AVs are designed with multiple sensors and multiple layers of software so that:
- If one sensor is temporarily blinded or obstructed, others can cover.
- If one model misclassifies something, other checks can catch it.
That doesn't mean AVs are flawless. But it does mean the system is built to be robust, not dependent on one perfect input.
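As one last illustrative sketch of what "other checks can catch it" could mean: even if the camera comes up empty, a solid lidar cluster or a radar return in the lane is still treated as an obstacle. The function, the threshold, and the values are assumptions for the sake of the example, not a real safety system.

```python
# Illustrative only: a simple cross-check so that no single sensor's miss
# can make an obstacle disappear from the plan.

def obstacle_ahead(camera_sees_object: bool,
                   lidar_points_in_lane: int,
                   radar_detects_return: bool) -> bool:
    """Treat the lane as blocked if any independent source says so."""
    lidar_sees_solid_cluster = lidar_points_in_lane > 50  # made-up threshold
    return camera_sees_object or lidar_sees_solid_cluster or radar_detects_return

# Camera blinded by glare, but lidar and radar still see something ahead.
print(obstacle_ahead(camera_sees_object=False,
                     lidar_points_in_lane=220,
                     radar_detects_return=True))  # True: slow down anyway
```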
The bottom line
Autonomous vehicles don't "see" the world the way we do. They measure it.
They turn light, radio waves, and tiny time measurements into a precise, constantly updated 3D understanding of the environment — then feed that into the rest of the stack to make decisions.
In other words:
Humans drive on intuition. AVs drive on data.