How End-to-End Self-Driving Works—No Maps Required
A new generation of autonomous vehicles ditches HD maps and hand-coded rules in favor of a single neural network that learns to drive the way humans do—by watching the road.
The Old Way: Driving by Committee
For more than a decade, the dominant approach to self-driving cars has been the modular pipeline. Engineers break the driving task into a chain of specialized modules—perception, tracking, prediction, planning, and control—each with its own code, its own inputs, and its own outputs. A lidar sensor feeds a 3-D map; the map feeds a motion planner; the planner feeds a controller that turns the wheel.
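In code terms, the modular chain looks something like the sketch below. Every stage name, data field, and threshold here is illustrative, not any vendor's actual software; the point is the shape of the architecture, where each stage exports only its declared outputs to the next.

```python
# A toy modular pipeline: each stage is written and tuned independently,
# and only its declared output fields survive the handoff to the next stage.

def perceive(lidar_points):
    # Perception: reduce raw points to a list of labeled obstacles.
    return [{"kind": "car", "x": x, "y": y} for (x, y) in lidar_points if y > 0]

def predict(obstacles):
    # Prediction: assume every obstacle keeps drifting forward (crude model).
    return [{**obs, "future_x": obs["x"] + 1.0} for obs in obstacles]

def plan(predictions):
    # Planning: slow down if any predicted obstacle ends up close ahead.
    danger = any(p["future_x"] < 5.0 for p in predictions)
    return {"target_speed": 5.0 if danger else 15.0}

def control(plan_out, current_speed):
    # Control: simple proportional controller toward the target speed.
    return 0.5 * (plan_out["target_speed"] - current_speed)

# Each arrow in the chain is a lossy handoff: raw intensity, uncertainty,
# and anything a stage did not export is gone for good.
raw = [(3.0, 1.0), (20.0, 2.0)]
accel = control(plan(predict(perceive(raw))), current_speed=10.0)  # -2.5
```

The lossy handoffs are visible in the code: once `perceive` has compressed the point cloud into a short list of dictionaries, no downstream stage can recover what it threw away.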
This architecture is logical, transparent, and easy to debug. When something goes wrong, engineers can trace the failure to a specific module. But it has a fundamental flaw: information is lost at every handoff. Small errors in one module compound through the chain. And because each component is tuned independently, the system is only as good as its weakest link.
Worse, traditional systems depend on high-definition maps—centimeter-precise 3-D models of every road, lane marking, and curb. Building and maintaining those maps is expensive and slow, which is why most robotaxi services still operate in a handful of geofenced cities.
The New Way: One Network, Sensor to Steering
End-to-end autonomous driving replaces the entire modular chain with a single large neural network. Raw camera footage goes in; a driving plan comes out. The network learns perception, prediction, and planning simultaneously, optimizing every layer toward one goal: driving safely.
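The "footage in, plan out" idea can be shown with an absurdly shrunken stand-in: one weight matrix playing the role of the whole network. The pixel counts, output meanings, and random weights below are all invented for illustration; a real system uses a deep network trained on fleet video, not a single linear layer.

```python
import random

random.seed(0)

# A toy "end-to-end" driver: one function maps raw pixel values straight to
# steering and acceleration, with no hand-written intermediate stages.
N_PIXELS = 8 * 4   # pretend: 8 cameras, 4 pixels each (absurdly small)
N_OUTPUTS = 2      # steering angle, acceleration

# One weight matrix stands in for the entire learned driving policy.
weights = [[random.uniform(-0.1, 0.1) for _ in range(N_PIXELS)]
           for _ in range(N_OUTPUTS)]

def drive(frames):
    """Raw sensor values in, control commands out; nothing hand-coded between."""
    return [sum(w * px for w, px in zip(row, frames)) for row in weights]

steering, acceleration = drive([0.5] * N_PIXELS)
```

There is no obstacle list or lane map anywhere in `drive`; whatever intermediate representations the system needs are learned inside the weights rather than specified by engineers.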
The concept is not new—researchers demonstrated basic versions in the 1980s—but recent advances in deep learning, transformer architectures, and massive compute clusters have made it practical. According to a comprehensive survey published on arXiv, the autonomous driving community has seen rapid growth in end-to-end frameworks that use raw sensor input to generate vehicle motion plans directly.
The key insight is that joint optimization beats isolated tuning. When perception and planning share the same gradient signal, the network learns to pay attention to what actually matters for driving—not just what a human engineer decided to label.
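A two-parameter toy makes the shared-gradient point concrete. Treat one parameter as "perception" and one as "planning"; because a single driving loss is differentiated through both stages, the error signal reaches the perception parameter too. The numbers are invented, but the chain-rule mechanics are the real idea.

```python
# Toy joint optimization: "perception" parameter a and "planning" parameter b
# are both updated by the gradient of one driving loss, (output - target)^2.

def train_jointly(x, target, a, b, lr=0.05, steps=200):
    for _ in range(steps):
        feat = a * x        # stage 1: "perception" feature
        out = b * feat      # stage 2: "planning" output
        err = out - target  # one loss for the whole chain
        # Chain rule: the driving error flows back into *both* stages.
        grad_b = 2 * err * feat
        grad_a = 2 * err * b * x
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

a, b = train_jointly(x=1.0, target=3.0, a=0.5, b=0.5)
final_output = b * (a * 1.0)  # converges toward the target of 3.0
```

Tuning `a` in isolation against some hand-picked proxy label could never guarantee this: only the joint loss tells perception which features actually matter for the final plan.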
Who Is Building This?
Three companies lead the end-to-end push. Tesla replaced roughly 300,000 lines of hand-coded C++ with a single neural network in its Full Self-Driving software, starting with version 12 in 2024 and expanding dramatically with FSD v13, which integrates parking, driving, and reversing into one unified model. The system takes video from eight cameras and directly outputs steering, acceleration, and braking commands.
London-based Wayve has taken the mapless philosophy furthest. Backed by Microsoft and SoftBank, Wayve's platform has demonstrated autonomous driving in over 90 cities without any prior HD mapping—adapting to new geographies in weeks rather than years. Israeli startup Imagry pursues a similar vision, calling its approach "location-independent" driving.
Advantages and Risks
The benefits are compelling. End-to-end systems are dramatically more scalable because they do not require expensive HD maps for every new road. They handle edge cases more gracefully because the network has seen millions of real-world driving scenarios during training. And they can be computationally leaner: a single network replaces a chain of separately engineered modules.
But the risks are real. End-to-end networks are black boxes. When the car makes a mistake, engineers cannot easily trace the error to a specific decision point, making certification and regulation harder. These systems also require enormous training datasets—Tesla draws on billions of miles of fleet data, a resource few competitors can match. As researchers at UC Berkeley have noted, bridging the interpretability gap between modular and end-to-end systems remains an open challenge.
What Comes Next
Many teams are now exploring hybrid architectures that combine end-to-end learning with modular safety checks—using neural networks for planning but retaining rule-based guardrails for emergency braking and collision avoidance. According to research published in the journal Sensors, these hybrids aim to capture the adaptability of deep learning without sacrificing the transparency regulators demand.
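The hybrid idea is simple enough to sketch directly: a learned planner proposes a command, and a hand-written safety layer gets the last word. The function names, distances, and braking values below are illustrative assumptions, not any published system's parameters.

```python
# Hybrid sketch: a learned planner proposes acceleration, and a rule-based
# guardrail can override it. All names and thresholds are illustrative.

def learned_planner(scene):
    # Stand-in for a neural network's proposed acceleration (m/s^2).
    return scene.get("proposed_accel", 1.0)

def safety_guardrail(scene, proposed_accel):
    # Rule-based check: hard-brake if an obstacle is inside the safety gap.
    if scene["obstacle_distance_m"] < 10.0:
        return -5.0         # emergency braking overrides the network
    return proposed_accel   # otherwise trust the learned plan

def hybrid_drive(scene):
    return safety_guardrail(scene, learned_planner(scene))

clear_road = hybrid_drive({"obstacle_distance_m": 50.0, "proposed_accel": 2.0})  # 2.0
blocked = hybrid_drive({"obstacle_distance_m": 4.0, "proposed_accel": 2.0})      # -5.0
```

Because the guardrail is ordinary auditable code, a regulator can verify its behavior exhaustively even though the planner inside remains a black box.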
The shift from hand-coded pipelines to learned driving intelligence mirrors a broader trend in AI: replacing human-engineered features with models that discover their own representations. Whether end-to-end systems can prove safe enough to earn public trust—and regulatory approval—will determine whether the next generation of self-driving cars finally leaves the geofence behind.