From GPUs to AI factories: NVIDIA’s Vera Rubin

Written by

Last updated on:

June 24, 2026

NVIDIA is recasting AI infrastructure as factories, with Vera Rubin and Vera at the core of a pod‑scale system built for agentic workloads.

NVIDIA is trying to turn the idea of “AI factories” into a standard unit of infrastructure. Vera Rubin and the new Vera CPU are the platform pieces it hopes will anchor that shift for cloud and enterprise buyers.

The company frames Vera Rubin as a complete stack for agentic AI, not just another GPU generation. It brings together seven coordinated chips and five rack types designed to act as one AI supercomputer across training and inference, with Vera at the center of the CPU work.

NVIDIA Vera CPU rack with densely packed server trays in a data center cabinet

What is Vera?

Vera is NVIDIA’s CPU for agentic AI and reinforcement learning. It is built for code execution, tool use, sandboxing, analytics, data pipelines, and orchestration: the work around the model that grows as teams move from simple prompts to multi-step agents.

In NVIDIA’s launch announcement, Vera was described as the first CPU built for AI agents and as a processor that enables 1.8x faster task completion than x86 CPUs for these workloads. Vera also acts as the host CPU inside Vera Rubin systems, connected to Rubin GPUs through second-generation NVLink-C2C for high-bandwidth CPU-GPU communication.
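
To put the 1.8x figure in context, here is a rough sketch rather than NVIDIA’s methodology. It assumes the speedup applies only to the CPU-side share of an agent task (tool calls, sandboxed code, orchestration) and uses Amdahl’s law to estimate the end-to-end effect. The function name and the example fractions are hypothetical.

```python
def end_to_end_speedup(cpu_fraction: float, cpu_speedup: float = 1.8) -> float:
    """Estimate whole-task speedup with Amdahl's law.

    cpu_fraction: share of task time spent in CPU-side work (0.0 to 1.0).
                  This is an assumed input, not a figure from NVIDIA.
    cpu_speedup:  speedup applied to that share (1.8 is the claim cited above).
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    # Time left over = untouched share + accelerated share.
    remaining_time = (1.0 - cpu_fraction) + cpu_fraction / cpu_speedup
    return 1.0 / remaining_time


if __name__ == "__main__":
    # Illustrative fractions only; measure your own agent workloads.
    for fraction in (0.25, 0.5, 0.75, 1.0):
        print(f"CPU share {fraction:.0%}: {end_to_end_speedup(fraction):.2f}x overall")
```

Under these assumptions, a task that spends half its time on CPU-side work would finish about 1.29x faster overall, so the benefit depends heavily on how CPU-bound a given agent loop actually is.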

What is Vera Rubin?

Vera Rubin is NVIDIA’s pod-scale platform for agentic AI and reasoning models. According to NVIDIA, the platform is designed to eliminate bottlenecks in communication, coordination, and memory so enterprises can run AI reasoning at scale more efficiently.

Rather than a single server, Vera Rubin spans multiple racks that NVIDIA says are designed to function as one integrated AI supercomputer.

In a press release, NVIDIA describes Vera Rubin as “opening the next frontier of agentic AI,” with seven new chips in full production to scale large AI factories. The platform combines the Vera CPU, Rubin GPU, NVLink 6 switches, ConnectX‑9 SuperNICs, BlueField‑4 DPUs, Spectrum‑6 Ethernet switches, and Groq 3 LPX racks into a coordinated rack-scale design.

In this model, the rack becomes the basic building block of the system architecture. By unifying 72 Rubin GPUs and 36 Vera CPUs in a single Vera Rubin NVL72 rack, NVIDIA can tune bandwidth, latency, power, and cooling across the whole system so the rack behaves like one large AI accelerator instead of a cluster of independent machines.
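
Because the rack is the unit of planning, capacity math moves in whole racks. The sketch below uses the per-rack counts cited above (72 Rubin GPUs and 36 Vera CPUs) to turn a GPU target into a rack count. The function name and the 500-GPU target are hypothetical, and the calculation ignores the other rack types in a pod.

```python
import math

# Per-rack counts for a Vera Rubin NVL72 rack, as stated in the article.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36


def pod_inventory(target_gpus: int) -> dict:
    """Return rack, GPU, and CPU counts for a hypothetical GPU target."""
    if target_gpus <= 0:
        raise ValueError("target_gpus must be positive")
    # Racks are indivisible, so round up to whole racks.
    racks = math.ceil(target_gpus / GPUS_PER_RACK)
    return {
        "racks": racks,
        "gpus": racks * GPUS_PER_RACK,
        "cpus": racks * CPUS_PER_RACK,
        "gpus_per_cpu": GPUS_PER_RACK / CPUS_PER_RACK,
    }


if __name__ == "__main__":
    # 500 GPUs is an arbitrary illustrative target, not a sizing recommendation.
    print(pod_inventory(500))
```

A 500-GPU target rounds up to 7 racks, which yields 504 GPUs and 252 CPUs at a fixed ratio of two GPUs per CPU.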

How NVIDIA defines an AI factory

NVIDIA defines an AI factory as “specialized computing infrastructure designed to create value from data by managing the entire AI life cycle.” In that definition, the primary product is intelligence measured by token throughput, which shifts the conversation from isolated servers to end-to-end systems built for continuous AI output.

That framing also appears in NVIDIA’s broader AI infrastructure language. The company describes AI infrastructure for an AI factory as a tightly integrated stack of compute, storage, networking, power, cooling, and orchestration software that supports the full life cycle of agentic AI workloads.

Why NVIDIA is pushing Vera Rubin

NVIDIA claims that AI is shifting from one-shot prompts to agents that plan, call tools, and interact with other systems. That shift puts more pressure on coordination, memory bandwidth, and CPU-side orchestration, not just raw GPU performance.

Vera Rubin is NVIDIA’s way of turning that shift into a concrete infrastructure pattern. Instead of selling individual accelerators, NVIDIA is packaging CPU, GPU, networking, and storage into pod-scale AI factories that are designed, tested, and operated as one system. Vera, positioned as the CPU for agents, anchors the CPU-heavy side of that factory—code execution, environment stepping, data prep, and control logic.

NVIDIA is also prioritizing efficiency and operations. Its Vera Rubin DSX AI factory reference design, along with Max‑Q and Flex tools, frames the platform around power budgets, “tokens per watt,” and grid-aware planning, so AI factories can scale within real-world energy and facility constraints rather than only chasing peak benchmark numbers.
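
To make “tokens per watt” concrete, here is a minimal sketch of the metric under a fixed site power budget. It is not a model of NVIDIA’s Max‑Q or Flex tools, and every number in it is a made-up placeholder rather than a measured or published figure. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackProfile:
    """Hypothetical operating point for one rack; all values are illustrative."""
    name: str
    tokens_per_second: float
    power_watts: float

    @property
    def tokens_per_watt(self) -> float:
        # Tokens per second per watt, which equals tokens per joule.
        return self.tokens_per_second / self.power_watts


def racks_within_budget(profile: RackProfile, site_power_watts: float) -> int:
    """Whole racks that fit inside a fixed site power envelope."""
    return int(site_power_watts // profile.power_watts)


def site_throughput(profile: RackProfile, site_power_watts: float) -> float:
    """Aggregate tokens per second for the racks that fit in the budget."""
    return racks_within_budget(profile, site_power_watts) * profile.tokens_per_second


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the trade-off.
    peak = RackProfile("peak", tokens_per_second=1_000_000, power_watts=200_000)
    capped = RackProfile("power-capped", tokens_per_second=850_000, power_watts=150_000)
    budget_watts = 1_200_000
    for p in (peak, capped):
        print(p.name, round(p.tokens_per_watt, 2),
              racks_within_budget(p, budget_watts),
              site_throughput(p, budget_watts))
```

With these placeholder values, the power-capped profile is slower per rack but fits eight racks into the budget instead of six, so the site produces more tokens per second overall. That is the kind of trade-off a tokens-per-watt framing is meant to surface.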

NVIDIA Rubin platform system rack for next-generation AI data center workloads.

What this means for enterprise teams

For enterprise buyers, Vera Rubin changes the unit of planning. Instead of evaluating a GPU server in isolation, teams are now being asked to think in terms of AI factories that combine accelerators, CPUs, networking, storage, and operations into one architecture.

For CIOs and VPs of engineering, this raises practical questions about power budgets, facility readiness, vendor concentration, and the long-term cost of aligning closely with a single stack. For platform teams, it also raises design questions about how a Vera Rubin-centered architecture fits with existing x86 fleets, cloud commitments, and internal tooling.

What comes after Vera Rubin

Vera Rubin and Vera together show how NVIDIA wants AI infrastructure to evolve: away from isolated GPU servers and toward coordinated AI factories that blend accelerators, CPUs for agents, networking, and power-aware operations.

As you plan the next phase of your AI platform, it is worth mapping this factory model against your goals for flexibility, cost, and governance.

Frequently Asked Questions

What is NVIDIA Vera, and why does it matter for agentic AI?

NVIDIA Vera is a data center CPU built specifically for agentic AI and reinforcement learning workloads. It is designed for code execution, tool use, sandboxing, analytics, and orchestration tasks that sit around the model but drive most of the logic in multi-step agents. By delivering faster task completion than traditional x86 servers and pairing tightly with NVIDIA GPUs, Vera helps reduce CPU bottlenecks in AI factories so agents can plan, act, and iterate at scale.

How is the NVIDIA Vera Rubin platform different from a traditional GPU cluster?

The NVIDIA Vera Rubin platform is a pod-scale system rather than a collection of standalone GPU servers. It combines Vera CPUs, Rubin GPUs, high-speed interconnects, DPUs, and Ethernet switches into multiple purpose-built racks that are designed, tested, and operated as one AI supercomputer. By treating the rack as the basic building block—rather than an individual server—Vera Rubin can deliver higher internal bandwidth, more predictable power and cooling, and more consistent performance for large-scale reasoning workloads.

What does NVIDIA mean by an “AI factory,” and how does Vera Rubin fit in?

NVIDIA uses the term AI factory to describe specialized infrastructure that manages the entire AI life cycle, from data ingestion and training to inference and continuous improvement. In this model, the “product” is intelligence measured in tokens, recommendations, or actions, not just FLOPs. Vera Rubin is NVIDIA’s reference implementation of an AI factory for the agentic era: a tightly integrated stack of compute, storage, networking, power, cooling, and orchestration software that is delivered as a pod-scale platform.

How does Vera Rubin address power efficiency and “tokens per watt”?

NVIDIA is explicit that Vera Rubin is designed around power budgets and “tokens per watt,” not only peak benchmark numbers. The Vera Rubin DSX AI factory reference design, along with Max‑Q and Flex tools, focuses on maximizing usable computing output and token performance within fixed power envelopes. That means the platform is meant to help teams plan AI capacity around real-world energy and facility constraints, making it easier to scale agentic workloads without overbuilding data center infrastructure.

What should enterprise teams consider before adopting the Vera Rubin and Vera stack?

Enterprise teams should look at Vera Rubin and Vera through both a technical and a strategic lens. On the technical side, key questions include how a Vera-based AI factory fits with existing x86 fleets, cloud commitments, networking standards, and internal MLOps tooling. On the strategic side, leaders need to weigh power budgets, facility readiness, and the implications of aligning closely with a single full-stack vendor. Mapping NVIDIA’s AI factory model against internal goals for flexibility, risk, and long-term cost can help determine whether Vera Rubin becomes a core platform or one of several pillars in the organization’s AI roadmap.
