NVIDIA's Vera Rubin: The Next Leap in AI Computing
NVIDIA unveiled the Vera Rubin platform at CES 2026, promising five times the inference performance of Blackwell, a tenfold reduction in token cost, and a sweeping redesign of AI infrastructure for the agentic AI era.
A New Era Announced in Las Vegas
At CES 2026, NVIDIA CEO Jensen Huang took the stage in Las Vegas to unveil the company's most ambitious hardware platform yet: Vera Rubin. Named after the pioneering astronomer whose galaxy-rotation measurements provided key evidence for dark matter, the platform succeeds NVIDIA's record-breaking Blackwell architecture and is the company's first fully co-designed, six-chip AI supercomputing system. The chips are already in full production, and rack-scale products are set to reach cloud partners in the second half of 2026.
Six Chips, One Supercomputer
Unlike previous generations, Vera Rubin is not a single GPU — it is an integrated platform of six co-designed chips. At its core sits the Vera CPU paired with two Rubin GPUs, forming a unified superchip. Rounding out the platform are four networking and storage components: the NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet Switch.
The flagship configuration, the Vera Rubin NVL72, packs 72 GPUs and 36 CPUs into a single rack, delivering a staggering 3.6 exaflops of NVFP4 inference performance. Its scale-up bandwidth reaches 260 TB/s — double that of the Blackwell GB200 NVL72.
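The headline figures check out against one another: dividing the rack's throughput by its GPU count recovers the per-chip number in the comparison below. A quick sanity check in Python, using only figures quoted in this article:

```python
# Sanity check on the Vera Rubin NVL72 figures quoted in this article.
RACK_INFERENCE_EFLOPS = 3.6        # NVFP4 inference per rack
GPUS_PER_RACK = 72
RUBIN_SCALE_UP_TBPS = 260          # NVLink scale-up bandwidth per rack

per_gpu_pflops = RACK_INFERENCE_EFLOPS * 1_000 / GPUS_PER_RACK
blackwell_scale_up_tbps = RUBIN_SCALE_UP_TBPS / 2  # "double that of GB200 NVL72"

print(f"Per-GPU inference: {per_gpu_pflops:.0f} PFLOPS")                    # -> 50 PFLOPS
print(f"Implied GB200 NVL72 scale-up: {blackwell_scale_up_tbps:.0f} TB/s")  # -> 130 TB/s
```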
How It Compares to Blackwell
The performance leap over Blackwell is substantial across every key metric:
- Inference performance: 50 PFLOPS of NVFP4 compute per chip, 5× that of Blackwell GB200
- Memory bandwidth: 22 TB/s from HBM4, 2.75× that of Blackwell's HBM3E
- Token cost: Up to 10× reduction in inference cost per token
- Training efficiency: 4× fewer GPUs needed to train mixture-of-experts (MoE) models
- Serviceability: Modular, cable-free tray design enables 18× faster rack servicing
According to Tom's Hardware, Vera Rubin will consume roughly twice the power of Blackwell but deliver ten times more performance per watt — a significant efficiency gain for hyperscale operators.
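Two of the quoted figures are really one quantity seen from different angles: performance per watt is the reciprocal of energy per token, so a 10× performance-per-watt gain implies roughly 10× fewer joules per generated token, consistent with the token-cost bullet above. A small sketch, with the Blackwell baseline value purely hypothetical:

```python
# Performance per watt and energy per token are exact reciprocals:
# (tokens/s) / W has units of tokens/J, the inverse of J/token.
BLACKWELL_JOULES_PER_TOKEN = 1.0   # hypothetical baseline, not a measurement
PERF_PER_WATT_GAIN = 10            # the Tom's Hardware figure quoted above

rubin_joules_per_token = BLACKWELL_JOULES_PER_TOKEN / PERF_PER_WATT_GAIN
print(f"Rubin energy per token: {rubin_joules_per_token:.1f} J "
      f"({PERF_PER_WATT_GAIN}x below the hypothetical baseline)")
```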
Built for the Agentic AI Era
NVIDIA is aiming Vera Rubin squarely at the next wave of AI applications: agentic systems, advanced reasoning models, and large mixture-of-experts architectures. These workloads demand far greater memory capacity and interconnect bandwidth than the chatbots and image generators that defined the first generative AI wave.
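One way to see why these workloads stress memory rather than raw compute: during token-by-token decoding, throughput is typically capped by how fast model weights stream out of HBM, not by arithmetic. A minimal sketch of that ceiling, using the 22 TB/s HBM4 figure from the comparison above; the model sizes and the one-read-per-token assumption are illustrative, not NVIDIA numbers:

```python
# Rough upper bound on decode throughput for a memory-bandwidth-bound model.
# First-order assumption: every active parameter is read from HBM once per
# generated token. Model sizes below are hypothetical.
HBM_BANDWIDTH_TBPS = 22      # Rubin HBM4, per the comparison above
BYTES_PER_PARAM = 0.5        # NVFP4 weights: 4 bits each

def max_tokens_per_second(active_params_billions: float) -> float:
    bytes_per_token = active_params_billions * 1e9 * BYTES_PER_PARAM
    return HBM_BANDWIDTH_TBPS * 1e12 / bytes_per_token

# A dense model reads all of its weights per token; an MoE model reads
# only the experts routed to that token.
for name, active_b in [("dense, 70B active", 70), ("MoE, 40B active", 40)]:
    print(f"{name}: ~{max_tokens_per_second(active_b):,.0f} tokens/s ceiling")
```

The mixture-of-experts case activates fewer parameters per token, which is why these architectures pair well with high-capacity, high-bandwidth memory: total capacity must hold all the experts, but bandwidth is only spent on the active ones.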
Among the first cloud providers set to deploy Vera Rubin-based instances are AWS, Google Cloud, Microsoft Azure, and Oracle Cloud, as well as NVIDIA's cloud partner CoreWeave. Microsoft has already published infrastructure planning guidance for large-scale Rubin deployments on Azure.
Geopolitical Stakes
The launch arrives against a charged geopolitical backdrop. NVIDIA has halted H200 chip exports to China and redirected its TSMC production capacity toward Vera Rubin, deepening the technological gulf between the United States and China. Analysis cited by the Center for Strategic and International Studies suggests that without access to advanced US chips, China's AI compute capacity in 2026 could be less than one-tenth that of the US.
Industry observers note that platforms like Vera Rubin are increasingly viewed not merely as commercial products, but as strategic infrastructure in the global contest over artificial general intelligence. As CNN Business reported, Vera Rubin effectively maps out NVIDIA's dominance well into the late 2020s — a roadmap that rivals in Beijing are watching closely.
What Comes Next
NVIDIA has already signaled that Vera Rubin will itself be succeeded by a next-generation architecture, maintaining the company's annual cadence of hardware advancement. For now, however, Vera Rubin sets a new benchmark: more compute, lower cost, and a rack design built for the industrialization of intelligence at a scale the world has not yet seen.