AMD ZenDNN

Overview

ZenDNN is a deep neural network inference acceleration library optimized for the AMD “Zen” CPU architecture. The ZenDNN library comprises a set of fundamental building blocks and APIs designed to enhance performance for AI inference applications, primarily targeting AMD EPYC™ server CPUs. ZenDNN plugs into mainstream AI frameworks, offering developers a seamless experience in developing cutting-edge AI applications. The library continues to redefine deep learning performance on AMD EPYC™ CPUs, combining relentless optimization, innovative features, and leading-edge support for modern workloads.

ZenDNN at a Glance

  • Delivers high performance across diverse AI workloads such as LLMs, NLP, Vision, and Recommendation Systems without significant engineering effort, offering ease of integration into existing x86 DL environments
  • Provides freedom of vendor choice by building upon open-source projects such as oneDNN. ZenDNN requires zero to minimal code modifications for existing x86 applications, while also supporting additional APIs designed to deliver higher performance
  • ZenDNN is optimized to benefit from higher core counts and large L3 caches on AMD EPYC CPUs helping users derive TCO advantages.

ZenDNN Provides:

  • Efficient multi-threading across large numbers of CPU cores
  • Enhanced microkernels for efficient low-level math operations
  • Optimized memory pools (mempools)
  • Comprehensive graph optimizations and kernel fusions
  • Broad framework support: PyTorch, TensorFlow, and integrated ONNX Runtime
  • Open-source code
ZenDNN Advantage on AMD EPYC™ Processors

Getting Started

Below is a comprehensive ZenDNN User Guide covering the release highlights and installation instructions for PyTorch and TensorFlow. For performance tuning enthusiasts, there are extra tips and tricks in the Performance Tuning chapter. To read more about current and previous releases, check out the ZenDNN Release Blog tab.

Documentation

ZenDNN Library: https://github.com/amd/ZenDNN
ZenDNN Plugin for PyTorch: https://github.com/amd/ZenDNN-pytorch-plugin
ZenDNN Plugin for TensorFlow: https://github.com/amd/ZenDNN-tensorflow-plugin

Blogs and Media

AMD ZenDNN Explained: AI Inferencing Power You Didn't Know You Had
ZenDNN Blogs

Get started with ZenDNN to enhance AI performance on AMD EPYC™ server CPUs.

To read more about current and previous releases, see the AMD Technical Articles and Blogs.

What’s New 

5.2 Release Highlights

ZenDNN Extension for PyTorch (zentorch):

PyTorch Version Support

  • PyTorch 2.10.0: Primary support with optimal performance (available via PyPI)
  • Python 3.10 - 3.13: Full compatibility with the supported Python versions of PyTorch

Improvements

1. vLLM Integration

  • vLLM-ZenTorch Plugin: Plug-and-play automatic acceleration for the vLLM V1 inference engine with zero code changes
  • vLLM Version Support: vLLM 0.12.0 to 0.15.1

2. Quantized Inference Support

  • LLM Quantization (Weight-Only Quantization, Experimental): INT4 quantized inference functional support
  • RecSys Quantization (DLRM-v2):

  • Embedding tables: UINT4 asymmetric per-channel weight-only quantization
  • Linear layers: W8A8 quantization (INT8 symmetric per-channel for weights, UINT8 asymmetric per-tensor for activations)
  • PyTorch 2 Export (PT2E) quantization framework with performance optimizations
  • Custom EmbeddingBagUInt4Quantizer for embedding quantization
  • X86InductorQuantizer for linear layer quantization
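
To make the W8A8 scheme above concrete, here is a minimal plain-Python sketch (a toy reference, not the zentorch implementation) of INT8 symmetric per-channel weight quantization and UINT8 asymmetric per-tensor activation quantization:

```python
def quantize_weights_per_channel(w):
    """Symmetric per-channel INT8: scale = max|row| / 127, zero point fixed at 0."""
    q, scales = [], []
    for row in w:
        s = max(abs(v) for v in row) / 127 or 1.0  # guard against all-zero rows
        scales.append(s)
        q.append([round(v / s) for v in row])
    return q, scales

def quantize_activations_per_tensor(x):
    """Asymmetric per-tensor UINT8: map [min, max] onto [0, 255] with a zero point."""
    lo, hi = min(x), max(x)
    s = (hi - lo) / 255 or 1.0
    zp = round(-lo / s)  # zero point shifts the range so lo maps to 0
    return [round(v / s) + zp for v in x], s, zp

w = [[0.5, -1.0, 0.25], [2.0, 1.0, -2.0]]
qw, w_scales = quantize_weights_per_channel(w)
x = [0.0, 0.5, 1.0]
qx, x_scale, x_zp = quantize_activations_per_tensor(x)
print(qw, qx)
```

Dequantizing a weight back (q * scale) recovers the original value up to rounding error, which is the trade-off weight-only and W8A8 schemes make for smaller memory footprints.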

3. Performance Optimizations

  • Improved bfloat16 Performance: AMD EPYC™ specific enhancements for bfloat16 operations
  • Enhanced Operations with LOA: Low Overhead API optimizations for improved performance
  • Optimized Embedding Kernels: Enhanced embedding bag operations with group op support
  • Graph Optimizations: Advanced pattern identification and replacement, concat operation folding support

4. Infrastructure and Testing

  • Hypothesis Testing Framework: Expanded test coverage with property-based testing
  • NumPy 2.x Compatibility: Updated scripts for NumPy 2.x support
  • TORCH_COMPILE_DEBUG Support: Full compatibility with PyTorch debugging tools
  • Integrated with New ZenDNN Library: Updated to new ZenDNN library with self-managed dependency building

5. Documentation

  • Updated README: Comprehensive documentation updates including:
    • vLLM plugin usage instructions
    • Weight-only quantization guide
    • Profiler output interpretation
    • Updated examples and usage patterns
  • Example Scripts: Added DLRM-v2 quantization example scripts

ZenDNN Extension for TensorFlow (zentf):

TensorFlow Version Support

  • TensorFlow 2.20.0: Primary support with optimal performance (available via PyPI and CPP package)
  • TensorFlow-Java main(75402bef): Java user interface, fully supported (available via source build only)
  • Python 3.9 - 3.13: Full compatibility with the supported Python versions of TensorFlow

Improvements

1. TensorFlow 2.20.0 Integration

  • zentf 5.2.0 is built for and validated against TensorFlow v2.20.0.
  • Bazel 7.4.1: Upgraded from Bazel 5.3-6.5 range to a single supported version (7.4.1).
  • Python 3.9 - 3.13: Extended Python version support to include Python 3.13.
  • As TensorFlow-Java has not been released for TensorFlow 2.20.0, zentf supports the main (75402bef) branch of TensorFlow-Java through source build only.

2. Migration from the Legacy ZenDNN Library to ZenDNNL

  • CMake-based ZenDNNL integration using rules_foreign_cc.
  • All operator kernels (MatMul, Conv2D, BatchMatMul, Softmax, Pooling) have been rewritten to use the ZenDNNL Low Overhead API (LOA), replacing the legacy ZenDNN primitives.
  • Old third-party dependencies on zen_dnn and amd_blis (BLIS) have been removed, replaced by ZenDNNL with integrated AOCL-DLP.

3. Removed Legacy Components

  • Mempool optimization has been completely removed; equivalent performance is achieved by using jemalloc as the memory allocator instead.
  • INT8 support has been removed.
  • Removal of non-performant ops: ZenTranspose, ZenReshape, Binary ops.
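
Since zentf now relies on jemalloc for allocator performance, a typical way to enable it is to preload the library before launching the workload. This is a hedged sketch: the library path and script name below are assumptions that vary by distribution and install method.

```shell
# Illustrative only: run a TensorFlow inference script with jemalloc preloaded.
# The .so path is an assumption; locate yours with `ldconfig -p | grep jemalloc`.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
python infer.py   # hypothetical inference script
```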

4. Performance Optimizations

  • Enhanced Operations with LOA: Low Overhead API optimizations for improved performance

Note: For further details on this release, please consult the User Guide.

5.1 Release Highlights

Framework Compatibility

  • PyTorch & TensorFlow: We've added full compatibility with PyTorch 2.7 and TensorFlow 2.19, ensuring seamless integration with the latest versions of these leading AI frameworks.
  • vLLM + zentorch Plugin: The new zentorch plugin for vLLM delivers a significant performance uplift of up to 21% on a variety of models compared to vLLM-IPEX.
  • Java® Integration: We've enabled support for PluggableDevice in TensorFlow-Java, a feature essential for zentf functionality. This feature has been officially contributed and upstreamed to the TensorFlow-Java repository, strengthening its core capabilities. For more details, please see the TensorFlow-Java integration Blog.

Performance Optimizations

  • Recommender Systems: We've introduced several key optimizations to boost the performance of recommender models, such as DLRMv2.
    • EmbeddingBag Improvements: New "out" variants of EmbeddingBag and related operators now write directly to a shared output buffer, eliminating the need for a separate concatenation operation and improving efficiency.
    • Concat Optimization: We've introduced a new optimization that fuses the concatenation operation after Bottom MLP and EmbeddingBag, for the DLRMv2 model.
  • New Operator Fusions: We've added new operator fusions to accelerate common computational patterns, resulting in a 25% performance uplift for the DIEN BF16 model.
    • MatMul + BiasAdd + Tanh
    • MatMul + BiasAdd + Sigmoid
  • Kernel Optimizations:
    • BF16/FP32 MatMul: A new kernel for BF16/FP32 matrix multiplication has been introduced that eliminates overheads in less compute-intensive GEMM operations, leading to improved performance of the DIEN model.
    • Ahead of Time (AOT) Reorder: We now support AOT reordering for MatMul kernels across INT8, BF16, and FP32 data types.
  • ZenDNN Enhancements: Added support for the MatMul (+fused) Low Overhead API (LOA) to improve the performance of small matrix shapes.
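
The fused patterns above compute a single expression per kernel invocation. As a plain-Python reference (a toy sketch, not the ZenDNN kernel), MatMul + BiasAdd + Tanh evaluates tanh(x @ w + b) in one pass per output element instead of materializing three intermediate tensors:

```python
import math

def matmul_bias_tanh(x, w, b):
    """Reference for tanh(x @ w + b) on 2-D lists: one fused pass per output element."""
    inner, cols = len(w), len(w[0])
    return [[math.tanh(sum(row[k] * w[k][j] for k in range(inner)) + b[j])
             for j in range(cols)]
            for row in x]

y = matmul_bias_tanh([[1.0, 2.0]], [[0.5], [0.25]], [0.0])
print(y)
```

Fusing the bias add and activation into the GEMM epilogue avoids extra passes over memory, which is where the uplift on bandwidth-bound models such as DIEN comes from.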

Ecosystem Contribution

  • We are actively contributing our optimization work directly to the core PyTorch codebase, as well as the PluggableDevice feature to the TensorFlow-Java repository. These regular upstream contributions strengthen the native performance and capabilities of both frameworks, benefiting the entire community.

5.0.2 Release Highlights

  • Framework Compatibility: Fully compatible with PyTorch 2.6 and TensorFlow 2.18.
  • Java® Integration: Introduces a Java interface to the TensorFlow plugin (zentf) via TensorFlow Java.
  • Optimized Quantized Model Support: Enhanced performance for INT8/INT4-quantized DLRM models.

5.0.1 Release Highlights

  • Compatible with deep-learning frameworks: Aligned closely with PyTorch 2.5 and TensorFlow 2.18, helping ensure smooth upgrades and interoperability.
  • Efficient Model Execution: Added support for INT8/INT4-quantized DLRM models in zentorch, unlocking faster inference with lower memory usage compared to BF16 precision. This release supports the MLPerf® version of DLRMv2; support for generic models is planned for the next release.

5.0 Release Highlights

  • Support for 5th Gen AMD EPYC™ processors, formerly codenamed “Turin”
  • Framework Support: PyTorch 2.4.0, TensorFlow 2.17 and ONNXRT 1.19.2
  • New APIs in the ZenDNN Plugin for PyTorch (zentorch), such as zentorch.llm.optimize() and zentorch.load_woq_model(), for enhanced LLM performance
  • Enhanced matmul operators and fusions, and a new BF16 auto-tuning algorithm targeted at generative LLMs
  • An optimized Scalar Dot Product Attention operator, including KV cache performance optimizations tailored to AMD EPYC™ cache architectures
  • Support for INT4 Weight-Only-Quantization (WOQ)
  • Improved Model Support: Llama3.1 and 3.2, Phi3, ChatGLM3, Qwen2, GPT-J
  • And more!

Please consult each plugin’s Release Highlights section in the ZenDNN User Guide for a comprehensive list of updates.

Release Blog

Get Assistance for Current Projects

If you need technical support for ZenDNN, please file an issue ticket on the respective GitHub page (see the repository links under Documentation).

Binaries Download Links:

ZenDNN Plug-in for PyTorch (each zip file contains the zentorch wheel and the scripts needed to set up the environment variables):
  • ZENTORCH_v5.2.0_Python_v3.10.zip (Python 3.10), MD5SUM: 01824257b50c1dae2d43bc120d731634
  • ZENTORCH_v5.2.0_Python_v3.11.zip (Python 3.11), MD5SUM: 17d1524dc923a6bba5220fc576227379
  • ZENTORCH_v5.2.0_Python_v3.12.zip (Python 3.12), MD5SUM: 3cd58ddf52352e66f48b4d18ccfa4365
  • ZENTORCH_v5.2.0_Python_v3.13.zip (Python 3.13), MD5SUM: 6cca83b569f36f791bd5a421387ec8c2

ZenDNN Plug-in for TensorFlow (each zip file contains the zentf wheel and the scripts needed to set up the environment variables):
  • ZENTF_v5.2.0_Python_v3.9.zip (Python 3.9), MD5SUM: 5e827f66ed4afff6c84396d9596a4aff
  • ZENTF_v5.2.0_Python_v3.10.zip (Python 3.10), MD5SUM: 507512ca2cd5f24b32fb595cea4c0b24
  • ZENTF_v5.2.0_Python_v3.11.zip (Python 3.11), MD5SUM: 5a0f784585ebb3441ef892841befb333
  • ZENTF_v5.2.0_Python_v3.12.zip (Python 3.12), MD5SUM: 3c2b0e2e16e6634cfe1fe7bee6363991
  • ZENTF_v5.2.0_Python_v3.13.zip (Python 3.13), MD5SUM: ca1965bd065587c1397b199dce29ba45
  • ZENTF_v5.2.0_C++_API.zip (ZenDNN TensorFlow Plug-in with C++ APIs), MD5SUM: d4c0888f9037ede2d83f05449009222b

Binaries are also available on the PyPI repository at the links below:
ZenTF: https://pypi.org/project/zentf/
ZenTorch: https://pypi.org/project/zentorch/
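
Assuming the PyPI package names linked above, installation is a single pip command per plugin (a sketch; consult the user guide for supported Python and framework version combinations):

```shell
# Install the ZenDNN plugins from PyPI.
pip install zentorch   # ZenDNN plugin for PyTorch
pip install zentf      # ZenDNN plugin for TensorFlow
```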
Refer to the user guide for more details.

Archive Access: For those requiring versions up to ZenDNN 5.1, our archives provide easy access to previous releases, ensuring you have the tools and resources you need for any project.

Sign Up for ZenDNN News

Keep up-to-date on the latest product releases, news, and tips.