Powering Scale-out AI Infrastructure
As the industry’s first Ultra Ethernet Consortium (UEC)-ready AI network interface card (NIC), the AMD Pensando™ Pollara 400 AI NIC is engineered to accelerate applications running across AI nodes in mega-scale and giga-scale data centers, achieving Ethernet speeds of up to 400 gigabits per second (Gbps).
Built on the proven third-generation, fully hardware-programmable Pensando P4 engine, the AMD Pensando Pollara 400 AI NIC delivers leadership performance with the flexibility to be reprogrammed to meet future requirements, helping maximize infrastructure investments for hyperscalers, cloud service providers, and enterprises.
Accelerate AI Performance at Scale
With up to 400 Gbps GPU-GPU communication speeds, the AMD Pensando™ Pollara 400 AI NIC delivers up to 8% faster AI job completion times compared to the competition, helping accelerate AI training and time-to-production for Gen AI workloads.
Help improve effective cluster uptime by up to ~50% through enhanced reliability, availability, and serviceability (RAS) capabilities. The AMD Pensando™ Pollara 400 AI NIC accelerates convergence and loss recovery under congestion, helping large‑scale AI workloads continue running with fewer interruptions.
Designed to meet the needs of AI workloads today and tomorrow, the AMD Pensando™ Pollara 400 AI NIC enables open, multi‑plane Ethernet architectures that can reduce network capex by up to 58%, while providing flexibility to scale as AI infrastructure evolves.
Improved Operational Excellence
Designed with fully programmable hardware and software, the AMD Pensando™ Pollara 400 AI NIC minimizes downtime, validates cluster health, provides advanced telemetry, and enables faster production readiness for AI infrastructure.
Scaling Out Future-Ready AI Infrastructure
As AI clusters scale, performance increasingly depends on the behavior of the network: congestion, tail latency, and fault propagation define system efficiency, cost, and reliability.
Read this product guide to learn how the AMD Pensando™ Pollara 400 AI NIC can provide predictable scaling, improved utilization, and sustained performance as you scale out your AI infrastructure.
AMD Pensando™ Pollara 400 AI NIC In the Spotlight
The Critical Role of NIC Programmability in Scaling Out Data Center Networks for AI
Infrastructure buildouts are underway for hosting AI workloads. For effective scale-out, networks play a critical role, and those networks are leaning toward Ethernet. But effective networking isn’t just about switches: building advanced functionality into network interface cards is an essential design strategy. Jim Frey, Principal Analyst of Enterprise Networking at Enterprise Strategy Group by TechTarget, shares his perspective on why he thinks AMD programmable NICs represent an optimized path to success.
Industry’s First AI NIC Supporting Ultra Ethernet Consortium (UEC) Features
The AMD Pensando™ Pollara 400 AI NIC integrates UEC transport features into Ethernet, enabling UEC RDMA to deliver more consistent performance for AI workloads. With a fully programmable P4 engine, the AI NIC supports ongoing adoption and refinement of UEC capabilities through software, allowing networks to evolve with emerging standards without hardware replacement.
Enhanced Networking Performance for AI Workloads
Competitive Leadership in Ethernet AI Collective Communication Performance
Using RoCEv2 over standard Ethernet in both cases, the AMD Pensando™ Pollara 400 AI NIC running ROCm™ software delivers up to 10% higher AI collective communication performance than the NVIDIA 400G RDMA NIC running RCCL.4
[Chart: AI collective communication performance, AMD Pensando™ Pollara 400 AI NIC vs. NVIDIA 400G RDMA NIC]
UEC‑Ready RDMA on the AMD AI NIC: Significant Gains in AI Collective Communication Performance
The AMD Pensando™ Pollara 400 AI NIC achieves up to 25% higher collective communication operation performance with UEC‑ready RDMA versus RoCEv2.5
[Chart: AMD Pensando™ Pollara 400 AI NIC collective communication performance, UEC-ready RDMA vs. RoCEv2]
Features
Intelligent Network Monitoring and Load Balancing
- Intelligent Packet Spray
- Out-of-order Packet Handling and In-order Message Delivery
- Selective Retransmission
- Path-Aware Congestion Control
- Rapid Fault Detection
Intelligent Packet Spray
Intelligent packet spray enables teams to seamlessly optimize network performance by enhancing load balancing and boosting overall efficiency and scalability. Improved network performance can significantly reduce GPU-to-GPU communication times, leading to faster job completion and greater operational efficiency.
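As a rough illustration of the concept only (not AMD's implementation; the path count and the least-loaded selection policy are assumptions), the Python sketch below contrasts spraying a message's packets across all available paths with classic per-flow ECMP hashing, which pins every packet of a flow to one path:

```python
# Illustrative sketch: packet spray vs. per-flow ECMP hashing.
from collections import Counter

NUM_PATHS = 4  # hypothetical number of equal-cost paths to the peer

def spray(num_packets: int) -> Counter:
    """Assign each packet to the currently least-loaded path."""
    load = Counter({p: 0 for p in range(NUM_PATHS)})
    for _ in range(num_packets):
        path = min(load, key=load.get)  # pick the least-loaded path
        load[path] += 1
    return load

def flow_hash(num_packets: int, flow_id: int = 7) -> Counter:
    """Classic ECMP: every packet of the flow follows one hashed path."""
    return Counter({flow_id % NUM_PATHS: num_packets})

print("sprayed:", dict(spray(1000)))      # ~250 packets on each of 4 paths
print("hashed: ", dict(flow_hash(1000)))  # all 1000 packets on one path
```

Spreading packets evenly keeps every fabric link busy, which is why spray-style load balancing can shorten GPU-to-GPU transfer times relative to flow hashing.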
Out-of-order Packet Handling and In-order Message Delivery
Help ensure messages are delivered in the correct order, even when employing multipathing and packet spraying techniques. The advanced out-of-order message delivery feature efficiently processes data packets that may arrive out of sequence, seamlessly placing them directly into GPU memory without the need for buffering.
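A minimal sketch of the idea, assuming each packet carries its byte offset within the message (the packet size, payload, and buffer below are invented for illustration): out-of-order arrivals are placed directly at their offset in the destination buffer, standing in for GPU memory, so no reorder queue is needed.

```python
# Illustrative sketch: direct placement of out-of-order packets.
import random

MSG = b"one RDMA message split across many packets and paths"
PKT_BYTES = 8  # hypothetical payload size per packet

packets = [(off, MSG[off:off + PKT_BYTES]) for off in range(0, len(MSG), PKT_BYTES)]
random.shuffle(packets)  # simulate out-of-order arrival over multiple paths

buf = bytearray(len(MSG))
pending = {off for off, _ in packets}
for off, payload in packets:               # packets arrive in any order...
    buf[off:off + len(payload)] = payload  # ...and land directly at their offset
    pending.discard(off)

# The message is delivered in order, as one unit, once no packets are pending.
assert not pending and bytes(buf) == MSG
print("message reassembled intact despite out-of-order arrival")
```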
Selective Retransmission
Boost network performance with selective acknowledgment (SACK) retransmission, which helps ensure only dropped or corrupted packets are retransmitted. By detecting and resending only lost or damaged packets, SACK optimizes bandwidth utilization, helps reduce latency during packet loss recovery, and minimizes redundant data transmission.
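The following Python sketch illustrates the general SACK mechanism with a hypothetical cumulative-ACK-plus-bitmap report; it is not the NIC's actual wire format:

```python
# Illustrative sketch: selective acknowledgment and retransmission.
def sack_report(received: set[int], window: int) -> tuple[int, list[bool]]:
    """Receiver side: cumulative ACK plus a bitmap for the next `window` seqs."""
    cum = 0
    while cum in received:  # first sequence number not yet received
        cum += 1
    bitmap = [(cum + 1 + i) in received for i in range(window)]
    return cum, bitmap

def packets_to_resend(cum: int, bitmap: list[bool]) -> list[int]:
    """Sender side: retransmit only the sequence numbers reported missing."""
    return [cum] + [cum + 1 + i for i, ok in enumerate(bitmap) if not ok]

received = {0, 1, 2, 4, 5, 7}  # packets 3 and 6 were dropped in flight
cum, bitmap = sack_report(received, window=4)
print("resend only:", packets_to_resend(cum, bitmap))  # -> [3, 6]
```

A go-back-N scheme would resend every packet from 3 onward; selective retransmission resends just the two holes, which is where the bandwidth and latency savings come from.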
Path-Aware Congestion Control
Focus on workloads, not network monitoring, with real-time telemetry and network-aware algorithms. The path-aware congestion control feature simplifies network performance management, enabling teams to quickly detect and address critical issues while helping mitigate the impact of incast scenarios.
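As a simplified illustration of path-aware traffic shifting (the inverse-RTT weighting and the telemetry values below are assumptions, not the NIC's actual algorithm), the sketch steers new packets away from a congested path rather than throttling the whole flow:

```python
# Illustrative sketch: per-path telemetry driving load distribution.
def path_weights(rtts_us: dict[str, float]) -> dict[str, float]:
    """Weight each path by inverse RTT, normalized to sum to 1."""
    inv = {path: 1.0 / rtt for path, rtt in rtts_us.items()}
    total = sum(inv.values())
    return {path: v / total for path, v in inv.items()}

# Telemetry sample: path B is congested (queueing pushed its RTT up 4x).
rtts = {"A": 10.0, "B": 40.0, "C": 10.0, "D": 12.0}
for path, w in path_weights(rtts).items():
    print(f"path {path}: {w:.0%} of new packets")
```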
Rapid Fault Detection
With rapid fault detection, teams can pinpoint issues within milliseconds, enabling near-instantaneous failover recovery and helping significantly reduce GPU downtime. Tap into elevated network observability with near real-time latency metrics and congestion and drop statistics.
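A toy sketch of millisecond-scale fault detection via per-path probes (the probe cadence and miss threshold are illustrative assumptions, not product parameters):

```python
# Illustrative sketch: declare a path failed after consecutive missed probes.
PROBE_INTERVAL_MS = 1  # hypothetical probe cadence
MISS_THRESHOLD = 3     # consecutive misses before declaring a fault

def detect_faults(probe_log: dict[str, list[bool]]) -> set[str]:
    """Return paths whose last MISS_THRESHOLD probes all went unanswered."""
    return {
        path for path, acks in probe_log.items()
        if len(acks) >= MISS_THRESHOLD and not any(acks[-MISS_THRESHOLD:])
    }

# Probe history (True = probe answered); path C stopped answering.
log = {"A": [True] * 6, "B": [True] * 6, "C": [True, True, True, False, False, False]}
failed = detect_faults(log)
print(f"failed within ~{MISS_THRESHOLD * PROBE_INTERVAL_MS} ms:", failed)
print("active paths:", set(log) - failed)
```

Once a path is dropped from the active set, subsequent packets are sprayed only over healthy paths, which is what makes failover near-instantaneous from the workload's point of view.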
AMD Pensando™ Pollara 400 AI NIC Specifications
| Maximum Bandwidth | Form Factor | Ethernet Interface | Ethernet Speeds | Ethernet Configurations | Management |
| --- | --- | --- | --- | --- | --- |
| Up to 400 Gbps | Half-height, half-length | PCIe® Gen 5.0 x16; OCP® 3.0 | 25/50/100/200/400 Gbps | Supports up to 4 ports | MCTP over SMBus |
Partner Ecosystem Solutions
AMD partners with leading Original Equipment Manufacturers (OEMs) and Original Design Manufacturers (ODMs) to deliver a comprehensive ecosystem of AMD Networking-powered solutions. Explore our diverse portfolio of partner offerings designed to accelerate innovation and performance.
AMD Pensando™ Pollara 400 AI NIC-Ready Server Platforms
Resources
Unlock the Future of AI Networking
Learn how the AMD Pensando Pollara 400 AI NIC can transform your scale-out AI infrastructure.
Explore the full suite of AMD networking solutions designed for high-performance modern data centers.
Footnotes
- PEN-020: Testing conducted by AMD Performance Labs as of 15 September 2025 on the AMD Pensando Pollara AI NIC running Llama 3.1-405B at 64 global batch size (GBS) with 8K sequence length, on a test system comprising an 8-node SMC-300X server for GPU-GPU communication using 2x AMD Pensando Pollara AI NICs or 2x NVIDIA CX-7 NICs, 2P AMD EPYC 9454 48-core processors, 8x AMD Instinct MI300X GPUs, Ubuntu 22.04.5 LTS, kernel 5.15.0-139-generic, ROCm 6.4.1.0-83-69b59e5.
The measured operations are part of the gateway function.
Configuration: Num layers=4, Data Type=BF16, DCN - TP=1, PP=1, SP=1, DP=1, FSDP=-1, ICI - TP=1, PP=1, SP=1, DP=1, FSDP=8.
AINIC container: jax-private:rocm6.4.0-jax0.5.0-py3.10.12-tedev2.1-20250801_training. Results may vary based on factors including but not limited to system configuration and software settings.
- PEN-019: Testing conducted by AMD Performance Labs as of 15 September 2025 on the AMD Pensando Pollara AI NIC, on a test system comprising an SMC-300X server for GPU-GPU communication: 2x AMD Pensando Pollara AI NICs, 2P AMD EPYC 9454 48-core processors, 8x AMD Instinct MI300X GPUs, Ubuntu 22.04.5 LTS, kernel 5.15.0-139-generic, ROCm 6.4.1.0-83-69b59e5. Testing ran Llama-3.1-8B; model configuration: SEQ_LEN=2048, TP=1, PP=1, CP=1, FP8=1, MBS=10, GBS=5120, Iteration=2, No. of paths/QP: 128. Results may vary based on factors including but not limited to system configuration and software settings.
- PEN-018: AMD comparison and pricing as of July 6, 2025, for network fabric costs to support 128,000 GPUs. Comparison of a Pollara NIC with multiplane fabric and packet spray on an 800G Tomahawk 5–based multiplane design versus a generic fat-tree fabric built on fully scheduled, big-buffer (Jericho3/Ramon3) 800G switching platforms. The generic system is assumed to use a competitive NIC, with NIC costs considered comparable. The Pollara-based design is estimated to deliver up to 58% network switching cost savings by enabling the use of more cost-effective Tomahawk 5–based switching in a multiplane architecture. AMD comparison and pricing as of 4/23/2025 of a Tomahawk 5 system with a Pensando Pollara NIC featuring exclusive multiplane fabric and packet spray versus a generic big-buffer 800G switching platform; the generic system would employ a competitive NIC, and NIC costs are assumed to be comparable. Deploying Pollara with multi-fabric support and packet spray allows customers to build cost-effective multiplane network fabrics instead of a fat-tree design, using fewer network switches to deliver the same amount of network bandwidth across the fabric and dramatically reducing both switch platform cost and the cost associated with cables and optics.
- PEN-015: Testing conducted by AMD Performance Labs as of 13 May 2025 on the Pollara AI NIC and NVIDIA CX7 NIC, on a test system comprising 8 nodes of 8x MI300X AMD GPUs (64 GPUs); Broadcom Tomahawk-5–based leaf switch (64x800G), model Dell z9864f-r0; RAIL topology; AMD AI NIC Pollara – 64 NICs, ROCm™ version 6.3.2.0-66-cbc70b5, OR NVIDIA CX7 SmartNIC – 64 NICs, RCCL version 2.24.3-develop:7961624; CPU model in each of the 8 nodes: dual-socket AMD EPYC 9454 48-core processor; operating system Ubuntu® 22.04.5 LTS; kernel 5.15.0-139-generic.
All application software libraries (RCCL and ROCm) and the test environment are exactly the same except the low-level drivers, which are hardware specific.
For the NVIDIA CX7 card, the drivers were installed following the publicly available Linux driver installation instructions on the NVIDIA website:
https://docs.nvidia.com/networking/display/connectx7vpi/linux+driver+installation
For the AMD Pensando Pollara NIC, the drivers come from an internal build that is planned to be publicly available in the coming months.
The following collective communication operations were measured:
Allreduce, Alltoall, Alltoallv, Broadcast, Reduce, Scatter, Allgather
- PEN-016: Testing conducted by AMD Performance Labs as of 28 April 2025 on the AMD Pensando™ Pollara 400 AI NIC, on a production system comprising: 2 nodes of 8x MI300X AMD GPUs (16 GPUs); Broadcom Tomahawk-4–based leaf switch (64x400G) from MICAS network; CLOS topology; AMD Pensando Pollara AI NIC – 16 NICs; CPU model in each of the 2 nodes: dual-socket 5th Gen Intel® Xeon® 8568 48-core CPU with PCIe® Gen 5; BIOS version 1.3.6; mitigations off (default).
System profile setting: Performance (default); SMT enabled (default); operating system Ubuntu 22.04.5 LTS, kernel 5.15.0-139-generic.
The following operation was measured: Allreduce.
Average 25% gain for All-Reduce operations with 4 QPs using UEC-ready RDMA vs. RoCEv2, across multiple message sizes (512MB, 1GB, 2GB, 4GB, 8GB, 16GB). Results are based on the average of at least 8 test runs.