AMD ZenDNN

Overview

ZenDNN is a deep neural network inference acceleration library optimized for the AMD “Zen” CPU architecture. The ZenDNN library comprises a set of fundamental building blocks and APIs designed to enhance performance for AI inference applications, primarily targeting AMD EPYC™ server CPUs. ZenDNN plugs into mainstream AI frameworks, offering developers a seamless experience in developing cutting-edge AI applications. The library continues to redefine deep learning performance on AMD EPYC™ CPUs, combining relentless optimization, innovative features, and leading-edge support for modern workloads.

ZenDNN at a Glance

  • Delivers high performance across diverse AI workloads such as LLMs, NLP, Vision, and Recommendation Systems without significant engineering effort, offering ease of integration into existing x86 DL environments
  • Provides freedom of vendor choice by building upon open-source projects such as oneDNN. ZenDNN requires zero to minimal code modifications for existing x86 applications, while also supporting additional APIs designed to deliver higher performance
  • ZenDNN is optimized to benefit from higher core counts and large L3 caches on AMD EPYC CPUs helping users derive TCO advantages.

ZenDNN Provides:

  • Efficient multi-threading across large numbers of CPU cores
  • Enhanced microkernels for efficient low-level math operations
  • Optimized memory pools (mempools)
  • Comprehensive graph optimizations and kernel fusions
  • Broad framework support: PyTorch, TensorFlow, and integrated ONNX Runtime
  • Open-source code
ZenDNN Advantage on AMD EPYC™ Processors

Getting Started

Below is a comprehensive ZenDNN User Guide covering the release highlights and installation instructions for PyTorch and TensorFlow. For performance tuning enthusiasts, there are extra tips and tricks in the Performance Tuning chapter. To read more about current and previous releases, check out the ZenDNN Release Blog tab.

Documentation

ZenDNN Library: https://github.com/amd/ZenDNN
ZenDNN Plugin for PyTorch: https://github.com/amd/ZenDNN-pytorch-plugin
ZenDNN Plugin for TensorFlow: https://github.com/amd/ZenDNN-tensorflow-plugin

Blogs and Media

AMD ZenDNN Explained: AI Inferencing Power You Didn't Know You Had
ZenDNN Blogs

Get started with ZenDNN to enhance AI performance on AMD EPYC™ server CPUs.

To read more about current and previous releases, see the AMD Technical Articles and Blogs.

What’s New 

5.2 Release Highlights

ZenDNN Extension for PyTorch (zentorch):

PyTorch Version Support

  • PyTorch 2.10.0: Primary support with optimal performance (available via PyPI)
  • Python 3.10 - 3.13: Full compatibility with the supported Python versions of PyTorch

Improvements

1. vLLM Integration

  • vLLM-ZenTorch Plugin: Plug-and-play automatic acceleration for the vLLM V1 inference engine with zero code changes
  • vLLM Version Support: vLLM 0.12.0 to 0.15.1

2. Quantized Inference Support

  • LLM Quantization (Weight-Only Quantization, Experimental): INT4 quantized inference functional support
  • RecSys Quantization (DLRM-v2):

  • Embedding tables: UINT4 asymmetric per-channel weight-only quantization
  • Linear layers: W8A8 quantization (INT8 symmetric per-channel for weights, UINT8 asymmetric per-tensor for activations)
  • PyTorch 2 Export (PT2E) quantization framework with performance optimizations
  • Custom EmbeddingBagUInt4Quantizer for embedding quantization
  • X86InductorQuantizer for linear layer quantization
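
To make the W8A8 scheme above concrete, here is a minimal plain-Python sketch (a toy reference, not the zentorch implementation) of INT8 symmetric per-channel weight quantization and UINT8 asymmetric per-tensor activation quantization:

```python
def quantize_weights_per_channel(w):
    """Symmetric per-channel INT8: scale = max|row| / 127, zero point fixed at 0."""
    q, scales = [], []
    for row in w:
        s = max(abs(v) for v in row) / 127 or 1.0  # guard against all-zero rows
        scales.append(s)
        q.append([round(v / s) for v in row])
    return q, scales

def quantize_activations_per_tensor(x):
    """Asymmetric per-tensor UINT8: map [min, max] onto [0, 255] with a zero point."""
    lo, hi = min(x), max(x)
    s = (hi - lo) / 255 or 1.0
    zp = round(-lo / s)  # zero point shifts the range so lo maps to 0
    return [round(v / s) + zp for v in x], s, zp

w = [[0.5, -1.0, 0.25], [2.0, 1.0, -2.0]]
qw, w_scales = quantize_weights_per_channel(w)
x = [0.0, 0.5, 1.0]
qx, x_scale, x_zp = quantize_activations_per_tensor(x)
print(qw, qx)
```

Dequantizing a weight back (q * scale) recovers the original value up to rounding error, which is the trade-off weight-only and W8A8 schemes make for smaller memory footprints.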

3. Performance Optimizations

  • Improved bfloat16 Performance: AMD EPYC™ specific enhancements for bfloat16 operations
  • Enhanced Operations with LOA: Low Overhead API optimizations for improved performance
  • Optimized Embedding Kernels: Enhanced embedding bag operations with group op support
  • Graph Optimizations: Advanced pattern identification and replacement, concat operation folding support

4. Infrastructure and Testing

  • Hypothesis Testing Framework: Expanded test coverage with property-based testing
  • NumPy 2.x Compatibility: Updated scripts for NumPy 2.x support
  • TORCH_COMPILE_DEBUG Support: Full compatibility with PyTorch debugging tools
  • Integrated with New ZenDNN Library: Updated to new ZenDNN library with self-managed dependency building

5. Documentation

  • Updated README: Comprehensive documentation updates including:
    • vLLM plugin usage instructions
    • Weight-only quantization guide
    • Profiler output interpretation
    • Updated examples and usage patterns
  • Example Scripts: Added DLRM-v2 quantization example scripts

ZenDNN Extension for TensorFlow (zentf):

TensorFlow Version Support

  • TensorFlow 2.20.0: Primary support with optimal performance (available via PyPI and CPP package)
  • TensorFlow-Java main(75402bef): Java user interface, fully supported (available via source build only)
  • Python 3.9 - 3.13: Full compatibility with the supported Python versions of TensorFlow

Improvements

1. TensorFlow 2.20.0 Integration

  • zentf 5.2.0 is built for and validated against TensorFlow v2.20.0.
  • Bazel 7.4.1: Upgraded from Bazel 5.3-6.5 range to a single supported version (7.4.1).
  • Python 3.9 - 3.13: Extended Python version support to include Python 3.13.
  • As TensorFlow-Java has not been released for TensorFlow 2.20.0, zentf supports the main (75402bef) branch of TensorFlow-Java through source build only.

2. Migration from the Legacy ZenDNN Library to ZenDNNL

  • CMake-based ZenDNNL integration using rules_foreign_cc.
  • All operator kernels (MatMul, Conv2D, BatchMatMul, Softmax, Pooling) have been rewritten to use the ZenDNNL Low Overhead API (LOA), replacing the legacy ZenDNN primitives.
  • Old third-party dependencies on zen_dnn and amd_blis (BLIS) have been removed, replaced by ZenDNNL with integrated AOCL-DLP.

3. Removed Legacy Components

  • Mempool optimization has been completely removed; equivalent performance is achieved by using jemalloc as the memory allocator instead.
  • INT8 support has been removed.
  • Removal of non-performant ops: ZenTranspose, ZenReshape, Binary ops.
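
Since zentf now relies on jemalloc for allocator performance, a typical way to enable it is to preload the library before launching the workload. This is a hedged sketch: the library path and script name below are assumptions that vary by distribution and install method.

```shell
# Illustrative only: run a TensorFlow inference script with jemalloc preloaded.
# The .so path is an assumption; locate yours with `ldconfig -p | grep jemalloc`.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
python infer.py   # hypothetical inference script
```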

4. Performance Optimizations

  • Enhanced Operations with LOA: Low Overhead API optimizations for improved performance

Note: For further details on this release, please consult the User Guide.

5.1 Release Highlights

Framework Compatibility

  • PyTorch & TensorFlow: We've added full compatibility with PyTorch 2.7 and TensorFlow 2.19, ensuring seamless integration with the latest versions of these leading AI frameworks.
  • vLLM + zentorch Plugin: The new zentorch plugin for vLLM delivers a significant performance uplift of up to 21% on a variety of models compared to vLLM-IPEX.
  • Java® Integration: We've enabled support for PluggableDevice in TensorFlow-Java, a feature essential for zentf functionality. This feature has been officially contributed and upstreamed to the TensorFlow-Java repository, strengthening its core capabilities. For more details, please see the TensorFlow-Java integration Blog.

Performance Optimizations

  • Recommender Systems: We've introduced several key optimizations to boost the performance of recommender models, such as DLRMv2.
    • EmbeddingBag Improvements: New "out" variants of EmbeddingBag and related operators now write directly to a shared output buffer, eliminating the need for a separate concatenation operation and improving efficiency.
    • Concat Optimization: We've introduced a new optimization that fuses the concatenation operation after Bottom MLP and EmbeddingBag, for the DLRMv2 model.
  • New Operator Fusions: We've added new operator fusions to accelerate common computational patterns, resulting in a 25% performance uplift for the DIEN BF16 model.
    • MatMul + BiasAdd + Tanh
    • MatMul + BiasAdd + Sigmoid
  • Kernel Optimizations:
    • BF16/FP32 MatMul: A new kernel for BF16/FP32 matrix multiplication has been introduced that eliminates overheads in less compute-intensive GEMM operations, leading to improved performance of the DIEN model.
    • Ahead of Time (AOT) Reorder: We now support AOT reordering for MatMul kernels across INT8, BF16, and FP32 data types.
  • ZenDNN Enhancements: Added support for the MatMul (+fused) Low Overhead API (LOA) to improve the performance of small matrix shapes.
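
The fused patterns above compute a single expression per kernel invocation. As a plain-Python reference (a toy sketch, not the ZenDNN kernel), MatMul + BiasAdd + Tanh evaluates tanh(x @ w + b) in one pass per output element instead of materializing three intermediate tensors:

```python
import math

def matmul_bias_tanh(x, w, b):
    """Reference for tanh(x @ w + b) on 2-D lists: one fused pass per output element."""
    inner, cols = len(w), len(w[0])
    return [[math.tanh(sum(row[k] * w[k][j] for k in range(inner)) + b[j])
             for j in range(cols)]
            for row in x]

y = matmul_bias_tanh([[1.0, 2.0]], [[0.5], [0.25]], [0.0])
print(y)
```

Fusing the bias add and activation into the GEMM epilogue avoids extra passes over memory, which is where the uplift on bandwidth-bound models such as DIEN comes from.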

Ecosystem Contribution

  • We are actively contributing our optimization work directly to the core PyTorch codebase, as well as the PluggableDevice feature to the TensorFlow-Java repository. These regular upstream contributions strengthen the native performance and capabilities of both frameworks, benefiting the entire community.

5.0.2 Release Highlights

  • Framework Compatibility: Fully compatible with PyTorch 2.6 and TensorFlow 2.18.
  • Java® Integration: Introduces a Java interface to the TensorFlow plugin (zentf) via TensorFlow Java.
  • Optimized Quantized Model Support: Enhanced performance for INT8/INT4-quantized DLRM models.

5.0.1 Release Highlights

  • Compatible with deep-learning frameworks: Aligned closely with PyTorch 2.5 and TensorFlow 2.18, helping ensure smooth upgrades and interoperability.
  • Efficient Model Execution: Added support for INT8/INT4-quantized DLRM models in zentorch, unlocking faster inference with lower memory usage compared to BF16 precision. This release supports the MLPerf® version of DLRMv2; support for generic models is planned for the next release.

5.0 Release Highlights

  • Support for 5th Gen AMD EPYC™ processors, formerly codenamed “Turin”
  • Framework Support: PyTorch 2.4.0, TensorFlow 2.17 and ONNXRT 1.19.2
  • New APIs in the ZenDNN Plugin for PyTorch (zentorch), such as zentorch.llm.optimize() and zentorch.load_woq_model(), for enhanced LLM performance
  • Enhanced matmul operators and fusions, and a new BF16 auto-tuning algorithm targeted at generative LLMs
  • An optimized Scalar Dot Product Attention operator, including KV cache performance optimizations tailored to AMD EPYC™ cache architectures
  • Support for INT4 Weight-Only-Quantization (WOQ)
  • Improved Model Support: Llama3.1 and 3.2, Phi3, ChatGLM3, Qwen2, GPT-J
  • And more!

Please consult each plugin’s Release Highlights section in the ZenDNN User Guide for a comprehensive list of updates.

Release Blog

Get Assistance for Current Projects

If you need technical support for ZenDNN, please file an issue ticket on the respective GitHub page (see the repository links under Documentation).

Binaries Download Links:

ZenDNN Plug-in for PyTorch (each zip file contains the zentorch wheel and the scripts needed to set up the environment variables):
  • ZENTORCH_v5.2.0_Python_v3.10.zip (Python 3.10), MD5SUM: 01824257b50c1dae2d43bc120d731634
  • ZENTORCH_v5.2.0_Python_v3.11.zip (Python 3.11), MD5SUM: 17d1524dc923a6bba5220fc576227379
  • ZENTORCH_v5.2.0_Python_v3.12.zip (Python 3.12), MD5SUM: 3cd58ddf52352e66f48b4d18ccfa4365
  • ZENTORCH_v5.2.0_Python_v3.13.zip (Python 3.13), MD5SUM: 6cca83b569f36f791bd5a421387ec8c2

ZenDNN Plug-in for TensorFlow (each zip file contains the zentf wheel and the scripts needed to set up the environment variables):
  • ZENTF_v5.2.0_Python_v3.9.zip (Python 3.9), MD5SUM: 5e827f66ed4afff6c84396d9596a4aff
  • ZENTF_v5.2.0_Python_v3.10.zip (Python 3.10), MD5SUM: 507512ca2cd5f24b32fb595cea4c0b24
  • ZENTF_v5.2.0_Python_v3.11.zip (Python 3.11), MD5SUM: 5a0f784585ebb3441ef892841befb333
  • ZENTF_v5.2.0_Python_v3.12.zip (Python 3.12), MD5SUM: 3c2b0e2e16e6634cfe1fe7bee6363991
  • ZENTF_v5.2.0_Python_v3.13.zip (Python 3.13), MD5SUM: ca1965bd065587c1397b199dce29ba45
  • ZENTF_v5.2.0_C++_API.zip (ZenDNN TensorFlow Plug-in with C++ APIs), MD5SUM: d4c0888f9037ede2d83f05449009222b

Binaries are also available on the PyPI repository at the links below:
ZenTF: https://pypi.org/project/zentf/
ZenTorch: https://pypi.org/project/zentorch/
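
Assuming the PyPI package names linked above, installation is a single pip command per plugin (a sketch; consult the user guide for supported Python and framework version combinations):

```shell
# Install the ZenDNN plugins from PyPI.
pip install zentorch   # ZenDNN plugin for PyTorch
pip install zentf      # ZenDNN plugin for TensorFlow
```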
Refer to the user guide for more details.

Archive Access: For those requiring versions up to ZenDNN 5.1, our archives provide easy access to previous releases, ensuring you have the tools and resources you need for any project.

Sign Up for ZenDNN News

Keep up-to-date on the latest product releases, news, and tips.