Powering Scale-out AI Infrastructure

As the industry’s first Ultra Ethernet Consortium (UEC)-ready AI network interface card (NIC), the AMD Pensando™ Pollara 400 AI NIC is engineered to accelerate applications running across AI nodes in mega-scale and giga-scale data centers, delivering Ethernet speeds of up to 400 gigabits per second (Gbps).

Built on the proven third-generation, fully hardware-programmable Pensando P4 engine, the AMD Pensando Pollara 400 AI NIC delivers leadership performance with the flexibility to be reprogrammed to meet future requirements, helping hyperscalers, cloud service providers, and enterprises maximize their infrastructure investments.

Accelerate AI Performance at Scale

Up to
8% Faster AI Job Completion Times 1

With up to 400 Gbps GPU-to-GPU communication speeds, the AMD Pensando™ Pollara 400 AI NIC delivers up to 8% faster AI job completion times compared to the competition, helping accelerate AI training and time-to-production for generative AI workloads.

Up to
50% Higher Cluster Uptime 2

Help improve effective cluster uptime by up to ~50% through enhanced reliability, availability, and serviceability (RAS) capabilities. The AMD Pensando™ Pollara 400 AI NIC accelerates convergence and loss recovery under congestion, helping large‑scale AI workloads continue running with fewer interruptions.

Up to
58% Reduced Capex Spending 3

Designed to meet the needs of AI workloads today and tomorrow, the AMD Pensando™ Pollara 400 AI NIC enables open, multi‑plane Ethernet architectures that can reduce network capex by up to 58%, while providing flexibility to scale as AI infrastructure evolves.

Improved Operational Excellence 

Designed with fully programmable hardware and software, the AMD Pensando™ Pollara 400 AI NIC minimizes downtime, validates cluster health, provides advanced telemetry, and enables faster production readiness for AI infrastructure.

Scaling Out Future-Ready AI Infrastructure 

As AI clusters scale, performance increasingly depends on the behavior of the network, as congestion, tail latency, and fault propagation come to define system efficiency, cost, and reliability.

Read this product guide to learn how the AMD Pensando™ Pollara 400 AI NIC can provide predictable scaling, improved utilization, and sustained performance as you scale out your AI infrastructure.  

AMD Pensando™ Pollara 400 AI NIC In the Spotlight

The Critical Role of NIC Programmability in Scaling Out Data Center Networks for AI

Infrastructure buildouts are underway for hosting AI workloads. For effective scale-out, networks play a critical role, and those networks are leaning toward Ethernet. But effective networking isn’t just about switches: building advanced functionality into network interface cards is an essential design strategy. Jim Frey, Principal Analyst of Enterprise Networking at Enterprise Strategy Group by TechTarget, shares his perspective on why he thinks AMD programmable NICs represent an optimized path to success.


Industry’s First AI NIC Supporting Ultra Ethernet Consortium (UEC) Features 

The AMD Pensando™ Pollara 400 AI NIC integrates UEC transport features into Ethernet, enabling UEC RDMA to deliver more consistent performance for AI workloads. With a fully programmable P4 engine, the AI NIC supports ongoing adoption and refinement of UEC capabilities through software, allowing networks to evolve with emerging standards without hardware replacement.

Enhanced Networking Performance for AI Workloads

Competitive Leadership in Ethernet AI Collective Communication Performance

Using RoCEv2 over standard Ethernet in both cases, the AMD Pensando™ Pollara 400 AI NIC running ROCm™ software delivers up to 10% stronger AI collective communication performance compared to the NVIDIA 400G RDMA NIC running RCCL.4

[Chart: AMD Pensando™ Pollara 400 AI NIC vs. NVIDIA 400G RDMA NIC: up to 10% better RoCEv2 performance (+10%)]

UEC‑Ready RDMA on the AMD AI NIC: Significant Gains in AI Collective Communication Performance 

The AMD Pensando™ Pollara 400 AI NIC achieves up to 25% higher collective communication operation performance with UEC‑ready RDMA versus RoCEv2.5

[Chart: AMD Pensando™ Pollara 400 AI NIC, UEC-ready RDMA vs. RoCEv2: up to 25% better performance with UEC RDMA (+25%)]

Features

Intelligent Network Monitoring and Load Balancing

Intelligent Packet Spray

Intelligent packet spray enables teams to seamlessly optimize network performance through enhanced load balancing, boosting overall efficiency and scalability. Improved network performance can significantly reduce GPU-to-GPU communication times, leading to faster job completion and greater operational efficiency.
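As an illustration only (not the NIC's actual implementation), the core idea of packet spraying can be sketched as distributing the packets of a single message round-robin across several equal-cost paths, so that no single link carries the whole flow:

```python
# Illustrative sketch of packet spraying: spread one message's packets across
# several equal-cost paths instead of pinning the whole flow to a single path.
from itertools import cycle

def spray_packets(packets, paths):
    """Assign each packet to the next path round-robin; returns (path, packet) pairs."""
    next_path = cycle(paths)
    return [(next(next_path), pkt) for pkt in packets]

# Six packets of one message sprayed over three paths:
assignments = spray_packets([f"pkt{i}" for i in range(6)],
                            paths=["path-A", "path-B", "path-C"])

# Each path ends up carrying an equal share of the load:
loads = {}
for path, _ in assignments:
    loads[path] = loads.get(path, 0) + 1
print(loads)  # {'path-A': 2, 'path-B': 2, 'path-C': 2}
```

A hash-based scheme would pin the entire flow to one path; spraying keeps every link evenly loaded, which is what shortens GPU-to-GPU transfer tails.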

Out-of-order Packet Handling and In-order Message Delivery

Help ensure messages are delivered in the correct order, even when employing multipathing and packet-spraying techniques. The advanced out-of-order packet handling feature efficiently processes data packets that may arrive out of sequence, placing them directly into GPU memory without the need for buffering.
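A minimal sketch of the idea, with a plain byte buffer standing in for GPU memory: each packet carries the offset of its payload, so arrivals can be written straight into their final location in any order, and the message counts as delivered only once every packet has landed:

```python
# Hypothetical sketch of direct data placement: out-of-order packets are
# written straight to their final offset; no reorder buffer is needed.
def deliver_message(packets, num_packets, msg_len):
    """packets: iterable of (seq, offset, payload) tuples, in ANY arrival order."""
    buf = bytearray(msg_len)
    seen = set()
    for seq, offset, payload in packets:
        buf[offset:offset + len(payload)] = payload  # direct placement
        seen.add(seq)
    # In-order *message* delivery: complete only when every packet has landed.
    assert seen == set(range(num_packets)), "message incomplete"
    return bytes(buf)

# Packets of the message "HELLOWORLD" arrive out of order:
msg = deliver_message([(1, 5, b"WORLD"), (0, 0, b"HELLO")],
                      num_packets=2, msg_len=10)
print(msg)  # b'HELLOWORLD'
```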

Selective Retransmission

Boost network performance with selective acknowledgment (SACK) retransmission, which helps ensure only dropped or corrupted packets are retransmitted. SACK efficiently detects and resends lost or damaged packets, optimizing bandwidth utilization, helping reduce latency during packet loss recovery, and minimizing redundant data transmission for exceptional efficiency.
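Conceptually, selective acknowledgment lets the sender compute the exact retransmission set from the receiver's report of what arrived; a simplified sketch (not the NIC's actual wire protocol):

```python
# Simplified SACK illustration: the receiver reports exactly which packets
# arrived, and the sender retransmits only the gaps, rather than everything
# after the first loss as a go-back-N scheme would.
def sack_retransmit_set(sent, acked):
    """Return only the sequence numbers that must be resent."""
    return sorted(set(sent) - set(acked))

sent = range(10)                  # packets 0..9 were transmitted
acked = {0, 1, 2, 4, 5, 7, 8, 9}  # receiver's SACK report: 3 and 6 never arrived
print(sack_retransmit_set(sent, acked))  # [3, 6]
```

Resending two packets instead of seven is where the bandwidth and latency savings during loss recovery come from.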

Path-Aware Congestion Control

Focus on workloads, not network monitoring, with real-time telemetry and network-aware algorithms. The path-aware congestion control feature simplifies network performance management, enabling teams to quickly detect and address critical issues while helping mitigate the impact of incast scenarios.
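As a simplified illustration (the metric names and weights below are invented for the example), path-aware decisions can be thought of as scoring each path from per-path telemetry and steering traffic toward the least congested one:

```python
# Invented-for-illustration telemetry scoring: pick the least congested path
# from per-path round-trip time and queue depth observations.
def pick_path(telemetry):
    """telemetry: {path: {'rtt_us': ..., 'queue_depth': ...}}"""
    def congestion_score(stats):
        # Weight queue depth heavily: deep queues signal incast building up.
        return stats["rtt_us"] + 10 * stats["queue_depth"]
    return min(telemetry, key=lambda p: congestion_score(telemetry[p]))

telemetry = {
    "path-A": {"rtt_us": 12, "queue_depth": 40},  # congested
    "path-B": {"rtt_us": 9,  "queue_depth": 2},
    "path-C": {"rtt_us": 11, "queue_depth": 5},
}
print(pick_path(telemetry))  # path-B
```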

Rapid Fault Detection 

With rapid fault detection, teams can pinpoint issues within milliseconds, enabling near-instantaneous failover recovery and helping significantly reduce GPU downtime. Tap into elevated network observability with near real-time latency metrics and congestion and drop statistics.
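The failover logic can be illustrated with a toy timeout check (all timings, thresholds, and path names here are hypothetical):

```python
# Toy illustration of rapid fault detection: notice that probe replies on the
# active path have stopped within a few milliseconds, then fail over.
def detect_and_failover(last_reply_ms, now_ms, timeout_ms, active, backup):
    """Return (path_to_use, failed_over)."""
    if now_ms - last_reply_ms > timeout_ms:
        return backup, True   # fault declared: fail over to the backup path
    return active, False      # active path is still healthy

path, failed_over = detect_and_failover(last_reply_ms=100, now_ms=112,
                                        timeout_ms=5,
                                        active="plane-0", backup="plane-1")
print(path, failed_over)  # plane-1 True
```

A millisecond-scale timeout matters because every second a fault goes undetected can stall an entire collective operation across the cluster's GPUs.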


AMD Pensando™ Pollara 400 AI NIC Specifications

Maximum Bandwidth: Up to 400 Gbps
Form Factor: Half-height, half-length
Ethernet Interface: PCIe® Gen 5.0 x16; OCP® 3.0
Ethernet Speeds: 25/50/100/200/400 Gbps
Ethernet Configurations: Supports up to 4 ports
- 1 x 400G
- 2 x 200G
- 4 x 100G
- 4 x 50G
- 4 x 25G
Management: MCTP over SMBus

Partner Ecosystem Solutions

AMD partners with leading Original Equipment Manufacturers (OEMs) and Original Design Manufacturers (ODMs) to deliver a comprehensive ecosystem of AMD Networking-powered solutions. Explore our diverse portfolio of partner offerings designed to accelerate innovation and performance.

AMD Pensando™ Pollara 400 AI NIC-Ready Server Platforms

ASRock Rack
Celestica
Cisco
Compal
Dell Technologies
Foxconn
Gigabyte
HPE
Ingrasys
Lenovo
MiTAC Computing
QCT
Supermicro
Wistron

Resources

Unlock the Future of AI Networking

Learn how the AMD Pensando Pollara 400 AI NIC can transform your scale-out AI Infrastructure.

Explore the full suite of AMD networking solutions designed for high-performance modern data centers.

Footnotes
  1. PEN-020: Testing conducted by AMD Performance Labs as of [15 September 2025] on the AMD Pensando Pollara AI NIC running Llama 3.1-405B @ 64 global batch size (GBS) with 8K sequence length, on a test system comprising an 8-node SMC-300X server for GPU-GPU communication using 2x AMD Pensando Pollara AI NIC or 2x NVIDIA CX-7, 2P AMD EPYC 9454 48-core processor, 8x AMD Instinct MI300X GPUs, Ubuntu 22.04.5 LTS, kernel 5.15.0-139-generic, ROCm 6.4.1.0-83-69b59e5.
    The following operations are part of the gateway function.
    Configuration: Num layers=4, Data Type=BF16, DCN - TP=1, PP=1, SP=1, DP=1, FSDP=-1, ICI - TP=1, PP=1, SP=1, DP=1, FSDP=8.
    AINIC container: jax-private:rocm6.4.0-jax0.5.0-py3.10.12-tedev2.1-20250801_training. Results may vary based on factors including but not limited to system configuration and software settings.
  2. PEN-019: Testing conducted by AMD Performance Labs as of [15 September 2025] on the AMD Pensando Pollara AI NIC, on a test system comprising an SMC-300X server for GPU-GPU communication: 2x AMD Pensando Pollara AI NIC, 2P AMD EPYC 9454 48-core processor, 8x AMD Instinct MI300X GPUs, Ubuntu 22.04.5 LTS, kernel 5.15.0-139-generic, ROCm 6.4.1.0-83-69b59e5. Testing running Llama-3.1-8B, model configuration: SEQ_LEN=2048, TP=1, PP=1, CP=1, FP8=1, MBS=10, GBS=5120, Iteration=2, No. of paths/QP: 128. Results may vary based on factors including but not limited to system configuration and software settings.
  3. PEN-018: AMD comparison and pricing as of July 6, 2025, for network fabric costs to support 128,000 GPUs. Comparison of a Pollara NIC with multiplane fabric and packet spray on an 800G Tomahawk 5–based multiplane design versus a generic fat-tree fabric built on fully scheduled, big-buffer (Jericho3/Ramon3) 800G switching platforms. The generic system is assumed to use a competitive NIC, with NIC costs considered comparable. The Pollara-based design is estimated to deliver up to 58% network switching cost savings by enabling the use of more cost-effective Tomahawk 5–based switching in a multiplane architecture. Deploying Pollara with multiplane fabric support and packet spray allows customers to build cost-effective multiplane network fabrics instead of a fat-tree design, using fewer network switches to deliver the same amount of network bandwidth across the fabric and dramatically reducing both switch platform cost and the cost associated with cables and optics.
  4. PEN-015: Testing conducted by AMD Performance Labs as of [13 May 2025] on the [Pollara AI NIC and NVIDIA CX7 NIC], on a test system comprising 8 nodes of 8x AMD Instinct MI300X GPUs (64 GPUs); Broadcom Tomahawk-5 based leaf switch (64x800G), model Dell z9864f-r0; rail topology; AMD Pensando Pollara AI NIC – 64 NICs, ROCm™ version 6.3.2.0-66-cbc70b5, or NVIDIA CX7 SmartNIC – 64 NICs, RCCL version 2.24.3-develop:7961624; CPU model in each of the 8 nodes – dual-socket AMD EPYC 9454 48-core processor; operating system Ubuntu® 22.04.5 LTS; kernel 5.15.0-139-generic.
    All application software libraries (RCCL and ROCm) and the test environment are exactly the same except the low-level drivers, which are hardware specific.
    For the NVIDIA CX7 card, the drivers were installed using the publicly available Linux driver installation instructions on the NVIDIA website:
    https://docs.nvidia.com/networking/display/connectx7vpi/linux+driver+installation

    For the AMD Pensando Pollara NIC, the drivers are from an internal build but are planned to be publicly available in the coming months.

    The following collective communication operations were measured: Allreduce, Alltoall, Alltoallv, Broadcast, Reduce, Scatter, Allgather.
  5. PEN-016: Testing conducted by AMD Performance Labs as of [28 April 2025] on the [AMD Pensando™ Pollara 400 AI NIC], on a production system comprising: 2 nodes of 8x AMD Instinct MI300X GPUs (16 GPUs); Broadcom Tomahawk-4 based leaf switch (64x400G) from MiCAS Networks; CLOS topology; AMD Pensando Pollara AI NIC – 16 NICs; CPU model in each of the 2 nodes – dual-socket 5th Gen Intel® Xeon® 8568 48-core CPU with PCIe® Gen 5, BIOS version 1.3.6; mitigation – off (default); system profile setting – performance (default); SMT – enabled (default); operating system Ubuntu 22.04.5 LTS, kernel 5.15.0-139-generic.
    The following operation was measured: Allreduce.
    Average 25% gain for Allreduce operations with 4 QPs using UEC-ready RDMA versus RoCEv2 across multiple message sizes (512MB, 1GB, 2GB, 4GB, 8GB, 16GB). Results are based on the average of at least 8 test runs.