AMD ZenDNN

Overview

ZenDNN is a deep neural network inference acceleration library optimized for the AMD “Zen” CPU architecture. The library comprises a set of fundamental building blocks and APIs designed to enhance performance for AI inference applications, primarily targeting AMD EPYC™ server CPUs. ZenDNN plugs into mainstream AI frameworks, offering developers a seamless experience in developing cutting-edge AI applications. The library continues to redefine deep learning performance on AMD EPYC™ CPUs, combining relentless optimization, innovative features, and industry-leading support for modern workloads.

ZenDNN at a Glance

  • Delivers high performance across diverse AI workloads such as LLMs, NLP, Vision, and Recommendation Systems without significant engineering effort, offering ease of integration into existing x86 DL environments
  • Provides freedom of vendor choice by building upon open-source projects such as oneDNN. ZenDNN requires zero to minimal code modifications for existing x86 applications, while also supporting additional APIs designed to deliver higher performance
  • Optimized to benefit from the higher core counts and large L3 caches of AMD EPYC CPUs, helping users derive TCO advantages

ZenDNN Provides:

  • Efficient multi-threading on large numbers of CPU cores
  • Enhanced microkernels for efficient low-level math operations
  • Optimized memory pools
  • Comprehensive graph optimizations and kernel fusions
  • Broad framework support: PyTorch, TensorFlow, and integrated ONNX Runtime
  • Open-source code
[Figure: ZenDNN Diagram]

Getting Started

The comprehensive ZenDNN User Guide covers the release highlights and installation instructions for PyTorch and TensorFlow. For performance-tuning enthusiasts, the Performance Tuning chapter offers extra tips and tricks.
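As a quick-start sketch, both plug-ins can be installed from PyPI. The package names zentorch and zentf match the PyPI links listed under Binaries Download Links on this page; the virtual-environment name below is illustrative.

```shell
# Create an isolated environment with a Python version the release supports
# (zentorch 5.1 ships wheels for Python 3.9 through 3.13).
python -m venv zendnn-env && source zendnn-env/bin/activate

pip install zentorch   # ZenDNN plug-in for PyTorch
pip install zentf      # ZenDNN plug-in for TensorFlow
```

See the ZenDNN User Guide for environment-variable setup and framework-specific configuration.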

Documentation

ZenDNN Library: https://github.com/amd/ZenDNN
ZenDNN Plugin for PyTorch: https://github.com/amd/ZenDNN-pytorch-plugin
ZenDNN Plugin for TensorFlow: https://github.com/amd/ZenDNN-tensorflow-plugin

Blogs and Media

AMD ZenDNN Explained: AI Inferencing Power You Didn't Know You Had
ZenDNN Blogs

Get started with ZenDNN to enhance AI performance on AMD EPYC™ server CPUs.

To read more about current and previous releases, see the AMD Technical Articles and Blogs.

What’s New 

5.1 Release Highlights

Framework Compatibility

  • PyTorch & TensorFlow: We've added full compatibility with PyTorch 2.7 and TensorFlow 2.19, ensuring seamless integration with the latest versions of these leading AI frameworks.
  • vLLM + zentorch Plugin: The new zentorch plugin for vLLM delivers a significant performance uplift of up to 21% on a variety of models compared to vLLM-IPEX. 
  • Java® Integration: We've enabled support for PluggableDevice in TensorFlow-Java, a feature essential for zentf functionality. This feature has been officially contributed and upstreamed to the TensorFlow-Java repository, strengthening its core capabilities. For more details, please see the TensorFlow-Java integration Blog.

Performance Optimizations

  • Recommender Systems: We've introduced several key optimizations to boost the performance of recommender models, such as DLRMv2.
    • EmbeddingBag Improvements: New "out" variants of EmbeddingBag and related operators now write directly to a shared output buffer, eliminating the need for a separate concatenation operation and improving efficiency.
    • Concat Optimization: A new optimization fuses the concatenation operation that follows the bottom MLP and EmbeddingBag in the DLRMv2 model.
  • New Operator Fusions: We've added new operator fusions to accelerate common computational patterns, resulting in a 25% performance uplift for the DIEN BF16 model. 
    • MatMul + BiasAdd + Tanh 
    • MatMul + BiasAdd + Sigmoid
  • Kernel Optimizations:
    • BF16/FP32 MatMul: A new kernel for BF16/FP32 matrix multiplication has been introduced that eliminates overheads in less compute-intensive GEMM operations, leading to improved performance of the DIEN model.
    • Ahead of Time (AOT) Reorder: We now support AOT reordering for MatMul kernels across INT8, BF16, and FP32 data types.
  • ZenDNN Enhancements: Added support for a Low Overhead API (LOA) for MatMul and its fused variants, reducing dispatch overhead and improving performance for small matrix shapes.
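Two of the patterns above can be illustrated in plain NumPy. This is a conceptual sketch of the general techniques, not ZenDNN's actual kernels: (1) a MatMul + BiasAdd + Tanh fusion computed in one pass instead of three, and (2) an "out" variant that writes partial results directly into slices of a shared buffer, removing the final concatenation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
w = rng.standard_normal((16, 4))
b = rng.standard_normal(4)

# Unfused: three separate ops, materializing two intermediate buffers.
y_unfused = np.tanh(np.matmul(x, w) + b)

# "Fused" stand-in: the final op writes into a preallocated output buffer
# (a real fused kernel would also avoid the intermediates entirely).
y_fused = np.empty((8, 4))
np.tanh(np.matmul(x, w) + b, out=y_fused)
assert np.allclose(y_unfused, y_fused)

# "Out" variant: two EmbeddingBag-like sum reductions written directly into
# halves of one shared buffer, so no concatenation step is needed.
tab_a = rng.standard_normal((100, 4))
tab_b = rng.standard_normal((100, 4))
idx = rng.integers(0, 100, size=(8, 3))          # 3 lookups per bag
shared = np.empty((8, 8))                        # room for both bags
tab_a[idx].sum(axis=1, out=shared[:, :4])        # bag A -> left half
tab_b[idx].sum(axis=1, out=shared[:, 4:])        # bag B -> right half

# Equivalent to computing both bags separately and concatenating.
concat = np.concatenate([tab_a[idx].sum(1), tab_b[idx].sum(1)], axis=1)
assert np.allclose(shared, concat)
```

The payoff in a real kernel comes from skipping the extra memory traffic; here the `out=` parameter merely demonstrates the data-flow shape of the optimization.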

Ecosystem Contribution

  • We are actively contributing our optimization work directly to the core PyTorch codebase, as well as the PluggableDevice feature to the TensorFlow-Java repository. These regular upstream contributions strengthen the native performance and capabilities of both frameworks, benefiting the entire community.

5.0.2 Release Highlights

  • Framework Compatibility: Fully compatible with PyTorch 2.6 and TensorFlow 2.18.
  • Java® Integration: Introduces a Java interface to the TensorFlow plugin (zentf) via TensorFlow Java.
  • Optimized Quantized Model Support: Enhanced performance for INT8/INT4-quantized DLRM models.

5.0.1 Release Highlights

  • Compatible with deep-learning frameworks: Aligned closely with PyTorch 2.5 and TensorFlow 2.18, helping ensure smooth upgrades and interoperability.
  • Efficient Model Execution: Added support for INT8/INT4-quantized DLRM models in zentorch, unlocking faster inference with lower memory usage compared to BF16 precision. This release supports the MLPerf® version of DLRMv2; support for generic models is planned for the next release.
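To make the quantization idea concrete, here is a minimal NumPy sketch of weight-only quantization (WOQ) in general: weights are stored as INT8 with a per-column scale and dequantized at matmul time. This illustrates the technique, not zentorch's actual INT8/INT4 implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 16)).astype(np.float32)   # FP32 weights
x = rng.standard_normal((4, 64)).astype(np.float32)    # activations stay float

# Symmetric per-column quantization to INT8: one scale per output column.
scale = np.abs(w).max(axis=0) / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Inference: matmul against the dequantized weights (the scale can also be
# folded in after the matmul). INT8 storage is 4x smaller than FP32.
y_ref = x @ w
y_woq = (x @ w_int8.astype(np.float32)) * scale

# Accuracy cost of quantization is small relative to the output magnitude.
rel_err = np.abs(y_woq - y_ref).max() / np.abs(y_ref).max()
assert rel_err < 0.05
```

INT4 WOQ follows the same pattern with a 16-level grid (and typically per-group scales), trading a little more accuracy for another 2x reduction in weight storage.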

5.0 Release Highlights

  • Support for 5th Gen AMD EPYC™ processors, formerly codenamed “Turin”
  • Framework Support: PyTorch 2.4.0, TensorFlow 2.17 and ONNXRT 1.19.2
  • New APIs in the ZenDNN Plugin for PyTorch (zentorch), such as zentorch.llm.optimize() and zentorch.load_woq_model(), for enhanced LLM performance
  • Enhanced matmul operators and fusions and a new BF16 auto-tuning algorithm targeted for generative LLMs.
  • An optimized Scaled Dot Product Attention operator, including KV cache performance optimizations tailored to AMD EPYC™ cache architectures
  • Support for INT4 Weight-Only-Quantization (WOQ)
  • Improved Model Support: Llama3.1 and 3.2, Phi3, ChatGLM3, Qwen2, GPT-J
  • And more!
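The Scaled Dot Product Attention operator mentioned above computes softmax(QKᵀ/√d)V, and a KV cache keeps the K and V rows of earlier tokens so each decoded token only appends one new row. The following plain-NumPy sketch illustrates the math, not ZenDNN's kernel.

```python
import numpy as np

def sdpa(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (Tq, Tk) attention logits
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (Tq, d)

rng = np.random.default_rng(2)
d = 8
k_cache = np.empty((0, d))                         # empty KV cache
v_cache = np.empty((0, d))
for _ in range(5):                                 # decode 5 tokens
    q = rng.standard_normal((1, d))                # query for the new token
    k_cache = np.vstack([k_cache, rng.standard_normal((1, d))])
    v_cache = np.vstack([v_cache, rng.standard_normal((1, d))])
    out = sdpa(q, k_cache, v_cache)                # attend over all cached K/V

assert out.shape == (1, d)
assert k_cache.shape == (5, d)
```

Because the cache grows with sequence length, its memory access pattern dominates decode cost, which is why cache-architecture-aware layouts matter on EPYC CPUs.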

Please consult each plugin’s Release Highlight section in the ZenDNN User Guide for a comprehensive list of updates.  

Release Blog

Get Assistance for Current Projects

If you need technical support for ZenDNN, please file an issue on the respective GitHub repository listed under Documentation above.

Sign Up for ZenDNN News

Keep up-to-date on the latest product releases, news, and tips.

Binaries Download Links:

ZenDNN Plug-in for PyTorch (zentorch v5.1.0): each zip file contains the zentorch wheel file and the scripts needed to set up the environment variables.
  • ZENTORCH_v5.1.0_Python_v3.9.zip (Python 3.9), MD5SUM: 0f858c78272de09b95719c76ab68d1b4
  • ZENTORCH_v5.1.0_Python_v3.10.zip (Python 3.10), MD5SUM: da8bf1d3d4f5975ef17d8bffab790f55
  • ZENTORCH_v5.1.0_Python_v3.11.zip (Python 3.11), MD5SUM: 608699f120980469bd2fc8c5cfb6395f
  • ZENTORCH_v5.1.0_Python_v3.12.zip (Python 3.12), MD5SUM: d119f6e6551083663486367170ebca4d
  • ZENTORCH_v5.1.0_Python_v3.13.zip (Python 3.13), MD5SUM: f8abf43aff5af4790a2977972fbffca0

ZenDNN Plug-in for TensorFlow (zentf v5.1.0): each zip file contains the zentf wheel file and the scripts needed to set up the environment variables.
  • ZENTF_v5.1.0_Python_v3.9.zip (Python 3.9), MD5SUM: 745fbefd95f761a0fd0cac3bd2339289
  • ZENTF_v5.1.0_Python_v3.10.zip (Python 3.10), MD5SUM: 885c7df96ec9b3f2b3302ab3f7bfdef1
  • ZENTF_v5.1.0_Python_v3.11.zip (Python 3.11), MD5SUM: 2a77a62d110ae96244f1dc34ddcbda64
  • ZENTF_v5.1.0_Python_v3.12.zip (Python 3.12), MD5SUM: 81799cd80331155555105efdf10f10aa
  • ZENTF_v5.1.0_C++_API.zip (ZenDNN TensorFlow Plug-in with C++ APIs), MD5SUM: 1824f52455c76888fca433028f943414

Binaries are also available on PyPI:
zentf: https://pypi.org/project/zentf/
zentorch: https://pypi.org/project/zentorch/
Refer to the user guide for more details.

Archive Access: For those requiring versions up to ZenDNN 5.0.2, our archives provide easy access to previous releases, ensuring you have the tools you need for any project.