Agentic AI Needs Rack-Scale CPU Performance – AMD EPYC Delivers It Today

Jun 09, 2026

Agentic AI is changing the shape of infrastructure. As enterprises move from isolated AI experiments to production agentic systems, the supporting CPU infrastructure becomes critical: orchestration services, databases, web front ends, caches, middleware, APIs and control-plane services all need to scale efficiently within real rack power and thermal limits. Customers do not deploy benchmark headlines; they deploy racks constrained by power, cooling, floor space, software compatibility and operational readiness.

Evaluated through that lens, AMD EPYC™ processors demonstrate clear rack-scale leadership. Under the modeled 100 kW rack scenario, AMD EPYC™ 9965 delivers an estimated 2.37x the rack-level throughput of the NVIDIA Vera baseline and roughly 1.6x that of Intel Xeon 6980P. Next-generation AMD EPYC “Venice” is projected to extend the Vera comparison to 3.30x.[1] Just as important, this is infrastructure customers can build today on standard x86 platforms, not a future architecture they have to wait for.

Agentic AI Needs CPU-Rich Infrastructure

It is easy to frame the AI buildout as a GPU story. But production agentic systems are not just model inference; they are sprawling, continuously running service environments. Every agent depends on orchestration logic, transactional databases, web and API endpoints, key-value stores, in-memory caches, and middleware that coordinate work, hold state and broker requests across the system. These services are overwhelmingly CPU-bound, and they scale with the number of concurrent agents rather than the size of any single model.

As agentic deployments move into production, the volume of this supporting infrastructure grows with them. The processor platform that hosts these services becomes a primary determinant of how many agents an enterprise can actually run, and at what cost. This is the layer where general-purpose CPU capacity, not accelerator peak performance, sets the ceiling.

Why Rack-Level Performance Is the Right Metric

Component benchmarks describe a chip. They do not describe what a customer can deploy. Data centers are provisioned in racks, and racks are bounded by a fixed power and thermal budget, finite floor space, software-compatibility requirements, and operational readiness. The question that determines real capacity is not “how fast is one socket” but “how much useful work fits inside a 100 kW rack.”

That’s the lens this analysis uses. All configurations are normalized to a modeled 100 kW rack built on 2P (two-processor) platforms, so the comparison reflects deployable service capacity rather than isolated peak processor behavior. Higher-density configurations translate directly into more service capacity per rack, and that added capacity is what drives capital efficiency, floor-space utilization and operational simplicity.
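The normalization described above can be sketched in a few lines. This is a minimal illustration, not the model behind the published results: the function names and both example platforms are hypothetical, and the sketch assumes node power is the only constraint, ignoring switches, cooling overhead and physical slot limits.

```python
import math

def nodes_per_rack(rack_budget_kw: float, node_power_kw: float) -> int:
    """Whole 2P nodes that fit under a fixed rack power budget."""
    if node_power_kw <= 0:
        raise ValueError("node_power_kw must be positive")
    return math.floor(rack_budget_kw / node_power_kw)

def rack_throughput(rack_budget_kw: float, node_power_kw: float,
                    node_throughput: float) -> float:
    """Rack-level throughput = nodes that fit * throughput per node."""
    return nodes_per_rack(rack_budget_kw, node_power_kw) * node_throughput

RACK_BUDGET_KW = 100.0  # the rack envelope used in the article

# Hypothetical platforms, for illustration only (not measured data).
baseline = rack_throughput(RACK_BUDGET_KW, node_power_kw=1.00, node_throughput=1.0)
denser = rack_throughput(RACK_BUDGET_KW, node_power_kw=1.25, node_throughput=1.5)

# The denser node draws more power, so fewer nodes fit in the rack,
# but each node does enough extra work to come out ahead.
print(f"normalized rack throughput: {denser / baseline:.2f}x")
```

The point of the sketch is the trade-off: a higher-power node reduces the node count, so its per-node throughput has to grow faster than its power draw for the rack-level number to improve.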

AMD EPYC Rack-Level Performance Leadership

Across the evaluated workloads – general-purpose compute, server-side Java, web serving, key-value, in-memory caching and relational databases – AMD EPYC leads the modeled rack-level results decisively. AMD EPYC 9965 (“Turin,” 192C) delivers a 2.37x normalized geometric mean advantage over NVIDIA Vera (88-core “Olympus”), with Intel Xeon 6980P (“Granite Rapids-AP,” 128C) turning in 1.46x over NVIDIA Vera. When AMD EPYC “Venice” (256C) arrives, it is projected to extend AMD’s advantage to 3.30x. The gains hold across the entire workload set rather than depending on a single favorable benchmark.
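For readers unfamiliar with the metric, a geometric mean of per-workload ratios can be sketched as follows. The per-workload values below are hypothetical placeholders chosen to make the arithmetic easy to follow; they are not the article's measured or modeled results.

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed in log space."""
    values = list(ratios)
    if not values:
        raise ValueError("need at least one ratio")
    if any(v <= 0 for v in values):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical rack-level ratios versus a baseline (illustration only).
hypothetical_ratios = {
    "general-purpose compute": 2.0,
    "server-side Java": 2.0,
    "web serving": 4.0,
    "key-value store": 1.0,
    "in-memory caching": 2.0,
    "relational database": 2.0,
}

geo = geometric_mean(hypothetical_ratios.values())
arith = sum(hypothetical_ratios.values()) / len(hypothetical_ratios)
print(f"geometric mean: {geo:.2f}x, arithmetic mean: {arith:.2f}x")
```

A geometric mean is the usual choice for averaging ratios because a single outlier workload moves it less than it would move an arithmetic mean, which is what the comparison in the last line illustrates.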

The pattern is consistent: As core density rises within the fixed power envelope, aggregate service throughput rises with it. For the transactional, web-serving and middleware tiers that surround agentic systems, that means materially more concurrency and responsiveness per rack, the qualities that ultimately govern how many agents an environment can sustain.

Horizontal bar chart: Normalized rack performance within a 100 kW budget. AMD EPYC “Venice” (projected) 3.30, AMD EPYC 9965 2.37, Intel Xeon 6980P 1.46, NVIDIA Vera 1.0.
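The chart values are all indexed to NVIDIA Vera, but they can be re-based to any other platform by simple division. The sketch below does that with the four published index values; the dictionary and function names are illustrative, and because the inputs are rounded to two decimals the results are approximate.

```python
# Rack-level index values from the chart above (NVIDIA Vera = 1.0).
rack_index_vs_vera = {
    "AMD EPYC Venice (projected)": 3.30,
    "AMD EPYC 9965": 2.37,
    "Intel Xeon 6980P": 1.46,
    "NVIDIA Vera": 1.00,
}

def rebase(index, new_baseline):
    """Re-express every index value relative to a different platform."""
    base = index[new_baseline]
    return {name: value / base for name, value in index.items()}

vs_xeon = rebase(rack_index_vs_vera, "Intel Xeon 6980P")
for name, value in vs_xeon.items():
    print(f"{name}: {value:.2f}x vs Intel Xeon 6980P")
```

Re-based this way, AMD EPYC 9965 comes out at about 1.62x the Intel Xeon 6980P result, consistent with the "roughly 1.6x" figure quoted in the introduction.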
Table 1: Server CPU Rack Configuration Summary. Compares NVIDIA Vera, Intel Xeon 6980P (Granite Rapids-AP), AMD EPYC 9965 (“Turin”), and AMD EPYC “Venice.” Data includes cores per socket (88 to 256), normalized cores per socket (1.0 to 2.90), normalized 2P system power (1.0 to 1.41), normalized nodes per rack (1.0 down to 0.71), and a consistent 100 kW rack power budget.
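The normalized columns in Table 1 are related by simple arithmetic, which the sketch below reproduces. It assumes, as the fixed 100 kW budget implies, that normalized nodes per rack is the reciprocal of normalized 2P system power; the function names are illustrative. Only the core counts and range endpoints come from the article, since the table summary does not give per-platform power values.

```python
BASELINE_CORES = 88  # NVIDIA Vera cores per socket, the normalization baseline

cores_per_socket = {
    "NVIDIA Vera": 88,
    "Intel Xeon 6980P": 128,
    "AMD EPYC 9965": 192,
    "AMD EPYC Venice": 256,
}

def normalized_cores(cores, baseline=BASELINE_CORES):
    """Cores per socket expressed relative to the baseline part."""
    return cores / baseline

def normalized_nodes_per_rack(normalized_node_power):
    """Under a fixed rack power budget, node count scales as 1 / node power."""
    if normalized_node_power <= 0:
        raise ValueError("normalized_node_power must be positive")
    return 1.0 / normalized_node_power

for name, cores in cores_per_socket.items():
    print(f"{name}: {normalized_cores(cores):.2f}x cores per socket")

# Upper end of the normalized power range quoted in the table summary.
print(f"nodes per rack at 1.41x power: {normalized_nodes_per_rack(1.41):.2f}")
```

The reciprocal of 1.41 rounds to 0.71, matching the low end of the nodes-per-rack range. The 256-core ratio evaluates to about 2.91, slightly above the 2.90 listed in the summary, which may simply reflect truncation in the source table.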

Shipping Density Today, Not Proprietary Promises

Rack density has become a headline metric, and rightly so; it’s a direct proxy for deployable value, and it’s where AMD’s currently available solutions stand out. An AMD EPYC “Turin” deployment in a Dell PowerEdge IR7000, or any comparable liquid-cooled rack, supports more than 27,000 CPU cores per rack today; next-generation AMD EPYC “Venice” is architected to scale beyond 36,000 cores in the same rack class. Sandboxes and CPU cores aren’t directly equivalent, so the 22,500 cores/sandboxes per rack listed for NVIDIA Vera is not a like-for-like count, but as a directional measure of rack-scale compute density the picture is clear: The density positioned as future-looking is already being exceeded with standard infrastructure available now.

Table 3: Rack Density Comparison. NVIDIA Vera (88C x 2P) has 22,500 cores/sandboxes per rack. AMD EPYC Turin (192C x 2P) has >27,000. AMD EPYC Venice (256C x 2P) has >36,000.
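The AMD density figures in Table 3 can be turned back into an implied node count with a little arithmetic. The sketch below is an inference from the published lower bounds, not a stated rack configuration; the function names are illustrative, and the NVIDIA Vera row is left out because its figure mixes cores and sandboxes.

```python
import math

def implied_nodes(cores_per_rack, cores_per_socket, sockets_per_node=2):
    """Nodes needed to reach a given core count on 2P platforms."""
    return cores_per_rack / (cores_per_socket * sockets_per_node)

def min_whole_nodes(cores_lower_bound, cores_per_socket, sockets_per_node=2):
    """Smallest whole node count that strictly exceeds the core lower bound."""
    exact = implied_nodes(cores_lower_bound, cores_per_socket, sockets_per_node)
    return math.floor(exact) + 1

# Lower bounds from Table 3 (">27,000" and ">36,000" cores per rack).
for name, bound, cores in [("AMD EPYC Turin", 27_000, 192),
                           ("AMD EPYC Venice", 36_000, 256)]:
    nodes = min_whole_nodes(bound, cores)
    print(f"{name}: at least {nodes} 2P nodes, "
          f"{nodes * cores * 2:,} cores at that node count")
```

Both lower bounds work out to the same implied node count (27,000 / 384 and 36,000 / 512 are both 70.3125), which is consistent with the article's statement that the two generations target the same rack class.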

These AMD deployments run on standard liquid-cooled data center equipment and the x86 software ecosystem enterprises already operate, with no new rack architecture required – preserving software continuity, reducing migration friction and shortening time-to-production.

Methodology and Workload Details

The workload suite spans the infrastructure dimensions most relevant to agentic AI service environments, using established benchmarks as proxies:

  • General-purpose computing: SPEC CPU 2017 Integer Rate
  • Server-side Java: a SPECjbb2015-derived workload measuring throughput and latency-sensitive business-logic execution
  • Web serving: NGINX with the WRK tool, under sustained concurrent request load
  • Key-value store: redis-benchmark, for high-speed in-memory operations
  • In-memory caching/analytics: Memcached with memtier_benchmark
  • Relational databases: TPROC-C, a TPC-C-derived OLTP proxy, on MySQL

The set doesn’t model full end-to-end agent pipelines; it isolates the infrastructure layers those pipelines depend on. Comparisons are performed at rack level using a reference 100 kW envelope with 2P platforms, with system power and nodes-per-rack normalized to NVIDIA Vera. Because the "Venice" and Vera figures reflect modeled and projected configurations, results are presented as estimates within the stated rack-power constraint.

Threaded Performance

In addition to rack-level performance and energy efficiency, per-core performance remains a critical consideration for some workloads. AMD has consistently led on this metric for demanding workloads such as databases, analytics, simulations and host processing in multi-GPU server environments. Our “Venice” 64-core CPU is estimated to deliver a 27% performance-per-core advantage compared to the Vera 88-core processor. Even at a higher core count, the 96-core “Venice” CPU is projected to still deliver 11% higher performance-per-core than the Vera 88-core processor.
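The two per-core percentages above can be put on a common footing by converting them to ratios. The sketch below does only that; the names are illustrative, and comparing the two “Venice” parts to each other assumes both percentages were estimated against the same Vera baseline and workloads, which the article does not state explicitly.

```python
def advantage_to_ratio(percent_advantage):
    """Convert an 'N% higher' claim into a multiplicative ratio."""
    return 1.0 + percent_advantage / 100.0

# Estimated per-core advantages over the 88-core Vera part, from the text.
venice_64c = advantage_to_ratio(27)  # 64-core "Venice"
venice_96c = advantage_to_ratio(11)  # 96-core "Venice"

# Assumption: both estimates share one baseline, so the ratios are comparable.
relative = venice_64c / venice_96c
print(f"64-core vs 96-core per-core performance: {relative:.2f}x")
```

Under that assumption, the 64-core part's per-core estimate is about 14% above the 96-core part's, the expected direction for a lower-core-count SKU.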

Conclusion: Deployable Performance Wins

Agentic AI infrastructure should be planned at the rack level, not around isolated component claims. On that basis the conclusion is straightforward: AMD EPYC delivers higher deployable CPU throughput, x86 software continuity and a standards-based path to dense, AI-supporting infrastructure. And it’s available today on shipping platforms. For enterprises scaling toward production agentic AI, that combination of density, compatibility and deployability is what turns performance into capacity.

Find additional details in our methodology description.

Footnotes

1. Results are based on modeled rack-level configurations using publicly available and internal benchmark data. Results may not reflect actual deployed system performance.

Article By


Corp VP, Datacenter Ecosystems and Application Engineering, Server BU