AMD Zen Deep Neural Network (ZenDNN)
Overview
ZenDNN is a deep neural network inference acceleration library optimized for the AMD “Zen” CPU architecture. The library comprises a set of fundamental building blocks and APIs designed to enhance performance for AI inference applications, primarily targeting AMD EPYC™ server CPUs. ZenDNN plugs into mainstream AI frameworks, offering developers a seamless experience when developing cutting-edge AI applications. It continues to redefine deep learning performance on AMD EPYC™ CPUs, combining relentless optimization, innovative features, and leading-edge support for modern workloads.
ZenDNN at a Glance
- Delivers high performance across diverse AI workloads such as LLMs, NLP, vision, and recommendation systems with minimal engineering effort, offering easy integration into existing x86 deep-learning environments
- Provides freedom of vendor choice by building on open-source projects such as oneDNN. ZenDNN requires zero to minimal code modifications for existing x86 applications while also supporting additional APIs designed to deliver higher performance
- ZenDNN is optimized to benefit from the higher core counts and large L3 caches of AMD EPYC CPUs, helping users derive TCO advantages
ZenDNN Provides:
- Efficient multi-threading across large numbers of CPU cores
- Enhanced microkernels for efficient low-level math operations
- Optimized mempools
- Comprehensive graph optimizations and kernel fusions
- Broad framework support: PyTorch, TensorFlow, and integrated ONNX Runtime
- Open-source code
Getting Started
Below is a comprehensive ZenDNN User Guide covering the release highlights and installation instructions for PyTorch and TensorFlow. For performance-tuning enthusiasts, the Performance Tuning chapter offers extra tips and tricks. To read more about current and previous releases, check out the ZenDNN Release Blog tab.
Documentation
ZenDNN Library: https://github.com/amd/ZenDNN
ZenDNN Plugin for PyTorch: https://github.com/amd/ZenDNN-pytorch-plugin
ZenDNN Plugin for TensorFlow: https://github.com/amd/ZenDNN-tensorflow-plugin
Blogs and Media
AMD ZenDNN Explained: AI Inferencing Power You Didn't Know You Had
ZenDNN Blogs
Get started with ZenDNN to enhance AI performance on AMD EPYC™ server CPUs.
To read more about current and previous releases, see the AMD Technical Articles and Blogs.
What’s New
- 5.2
- 5.1
- 5.0.2
- 5.0.1
- 5.0
5.2 Release Highlights
ZenDNN Extension for PyTorch (zentorch):
PyTorch Version Support
- PyTorch 2.10.0: Primary support with optimal performance (available via PyPI)
- Python 3.10 - 3.13: Full compatibility with the supported Python versions of PyTorch
Improvements
1. vLLM Integration
- vLLM-ZenTorch Plugin: Plug-and-play automatic acceleration for the vLLM V1 inference engine with zero code changes
- vLLM Version Support: vLLM 0.12.0 to 0.15.1
2. Quantized Inference Support
LLM Quantization (Weight-Only Quantization) (Experimental): INT4 quantized inference functional support
RecSys Quantization (DLRM-v2):
- Embedding tables: UINT4 asymmetric per-channel weight-only quantization
- Linear layers: W8A8 quantization (INT8 symmetric per-channel for weights, UINT8 asymmetric per-tensor for activations)
- PyTorch 2 Export (PT2E) quantization framework with performance optimizations
- Custom EmbeddingBagUInt4Quantizer for embedding quantization
- X86InductorQuantizer for linear layer quantization
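The W8A8 scheme above can be illustrated in plain Python. This is a conceptual sketch using the standard affine-quantization formulas, not the actual zentorch/PT2E kernels; all function names here are illustrative.

```python
# Sketch of the W8A8 scheme described above: INT8 symmetric per-channel
# quantization for weights, UINT8 asymmetric per-tensor for activations.
# Formulas are the standard affine-quantization ones; the real kernels
# differ in detail.

def quant_weights_int8_per_channel(weights):
    """weights: list of rows (output channels). Symmetric INT8, one scale per row."""
    q_rows, scales = [], []
    for row in weights:
        scale = max(abs(v) for v in row) / 127.0
        scale = scale if scale > 0 else 1.0
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def quant_activations_uint8_per_tensor(acts):
    """Asymmetric UINT8 with a single scale and zero point for the whole tensor."""
    lo, hi = min(acts), max(acts)
    scale = (hi - lo) / 255.0
    scale = scale if scale > 0 else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in acts]
    return q, scale, zero_point

def dequant(q, scale, zero_point=0):
    """Map quantized integers back to approximate float values."""
    return [(v - zero_point) * scale for v in q]

w = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.4]]
qw, w_scales = quant_weights_int8_per_channel(w)
a = [0.0, 0.3, 1.2, -0.6]
qa, a_scale, a_zp = quant_activations_uint8_per_tensor(a)
```

Per-channel weight scales keep quantization error proportional to each output channel's own range, while a single per-tensor scale and zero point keep the activation path cheap at runtime.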
3. Performance Optimizations
- Improved bfloat16 Performance: AMD EPYC™ specific enhancements for bfloat16 operations
- Enhanced Operations with LOA: Low Overhead API optimizations for improved performance
- Optimized Embedding Kernels: Enhanced embedding bag operations with group op support
- Graph Optimizations: Advanced pattern identification and replacement, concat operation folding support
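The pattern identification and replacement mentioned above can be sketched in a few lines of plain Python. This is a conceptual illustration of graph rewriting over a linear op sequence, not zentorch's actual compiler pass; the op names and helper are hypothetical.

```python
# Conceptual sketch of graph pattern replacement: scan an op sequence and
# replace each occurrence of a known pattern with a single fused op node.

def fuse_patterns(ops, pattern, fused_name):
    """Replace every occurrence of `pattern` in `ops` with `fused_name`."""
    result = []
    i = 0
    n = len(pattern)
    while i < len(ops):
        if ops[i:i + n] == pattern:
            result.append(fused_name)  # one fused node stands in for n ops
            i += n
        else:
            result.append(ops[i])
            i += 1
    return result

graph = ["matmul", "bias_add", "tanh", "matmul", "bias_add", "sigmoid"]
fused = fuse_patterns(graph, ["matmul", "bias_add", "tanh"],
                      "fused_matmul_bias_tanh")
print(fused)  # ['fused_matmul_bias_tanh', 'matmul', 'bias_add', 'sigmoid']
```

Fusing a matched chain into one kernel avoids materializing intermediate tensors between the ops, which is where the real optimization pays off.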
4. Infrastructure and Testing
- Hypothesis Testing Framework: Expanded test coverage with property-based testing
- NumPy 2.x Compatibility: Updated scripts for NumPy 2.x support
- TORCH_COMPILE_DEBUG Support: Full compatibility with PyTorch debugging tools
- Integrated with New ZenDNN Library: Updated to new ZenDNN library with self-managed dependency building
5. Documentation
- Updated README: Comprehensive documentation updates, including:
  - vLLM plugin usage instructions
  - Weight-only quantization guide
  - Profiler output interpretation
  - Updated examples and usage patterns
- Example Scripts: Added DLRM-v2 quantization example scripts
ZenDNN Extension for TensorFlow (zentf):
TensorFlow Version Support
- TensorFlow 2.20.0: Primary support with optimal performance (available via PyPI and CPP package)
- TensorFlow-Java main (75402bef): Java user interface, fully supported (available via source build only)
- Python 3.9 - 3.13: Full compatibility with the supported Python versions of TensorFlow
Improvements
1. TensorFlow 2.20.0 Integration
- zentf 5.2.0 is built for and validated against TensorFlow v2.20.0.
- Bazel 7.4.1: Upgraded from Bazel 5.3-6.5 range to a single supported version (7.4.1).
- Python 3.9 - 3.13: Extended Python version support to include Python 3.13.
- Because TensorFlow-Java has not been released for TensorFlow 2.20.0, zentf supports TensorFlow-Java only from the main branch (75402bef) through a source build.
2. Migrate from legacy ZenDNN library to ZenDNNL
- CMake-based ZenDNNL integration using rules_foreign_cc.
- All operator kernels (MatMul, Conv2D, BatchMatMul, Softmax, Pooling) have been rewritten to use the ZenDNNL Low Overhead API (LOA), replacing the legacy ZenDNN primitives.
- Old third-party dependencies on zen_dnn and amd_blis (BLIS) have been removed, replaced by ZenDNNL with integrated AOCL-DLP.
3. Removed Legacy Components
- Mempool optimization has been completely removed; equivalent performance is achieved by using jemalloc as the memory allocator instead.
- INT8 support has been removed.
- Removed non-performant ops: ZenTranspose, ZenReshape, and binary ops.
4. Performance Optimizations
- Enhanced Operations with LOA: Low Overhead API optimizations for improved performance
Note: For further details on this release, please consult the User Guide.
5.1 Release Highlights
Framework Compatibility
- PyTorch & TensorFlow: We've added full compatibility with PyTorch 2.7 and TensorFlow 2.19, ensuring seamless integration with the latest versions of these leading AI frameworks.
- vLLM + zentorch Plugin: The new zentorch plugin for vLLM delivers a significant performance uplift of up to 21% on a variety of models compared to vLLM-IPEX.
- Java® Integration: We've enabled support for PluggableDevice in TensorFlow-Java, a feature essential for zentf functionality. This feature has been officially contributed and upstreamed to the TensorFlow-Java repository, strengthening its core capabilities. For more details, please see the TensorFlow-Java integration Blog.
Performance Optimizations
- Recommender Systems: We've introduced several key optimizations to boost the performance of recommender models, such as DLRMv2.
- EmbeddingBag Improvements: New "out" variants of EmbeddingBag and related operators now write directly to a shared output buffer, eliminating the need for a separate concatenation operation and improving efficiency.
- Concat Optimization: We've introduced a new optimization that fuses the concatenation operation after the bottom MLP and EmbeddingBag for the DLRMv2 model.
- New Operator Fusions: We've added new operator fusions to accelerate common computational patterns, resulting in a 25% performance uplift for the DIEN BF16 model.
  - MatMul + BiasAdd + Tanh
  - MatMul + BiasAdd + Sigmoid
- Kernel Optimizations:
  - BF16/FP32 MatMul: A new kernel for BF16/FP32 matrix multiplication has been introduced that eliminates overheads in less compute-intensive GEMM operations, leading to improved performance of the DIEN model.
  - Ahead-of-Time (AOT) Reorder: We now support AOT reordering for MatMul kernels across INT8, BF16, and FP32 data types.
- ZenDNN Enhancements: Added support for MatMul(+fused) Low Overhead API (LOA) to improve performance of small matrix shapes, further improving performance and efficiency.
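The "out" variants of EmbeddingBag described above can be illustrated with a small plain-Python sketch. This is a conceptual model of the idea, not the actual zentorch kernels, and the function names are hypothetical: each bag sum-pools its lookups and writes directly into its slice of one preallocated buffer, so no separate concatenation pass is needed.

```python
# Sketch of the shared-output-buffer idea: each EmbeddingBag writes its
# pooled result into its own slice of one contiguous buffer, eliminating
# the concat that would otherwise stitch per-table outputs together.

def embedding_bag_sum(table, indices):
    """Sum-pool the rows of `table` selected by `indices`."""
    dim = len(table[0])
    out = [0.0] * dim
    for i in indices:
        for d in range(dim):
            out[d] += table[i][d]
    return out

def embedding_bags_to_shared_buffer(tables, lookups):
    """Pool each table's lookup and write it into one shared buffer."""
    dim = len(tables[0][0])
    buffer = [0.0] * (len(tables) * dim)  # allocated once, up front
    for t, (table, indices) in enumerate(zip(tables, lookups)):
        buffer[t * dim:(t + 1) * dim] = embedding_bag_sum(table, indices)
    return buffer

tables = [
    [[1.0, 2.0], [3.0, 4.0]],   # table 0
    [[0.5, 0.5], [1.5, 2.5]],   # table 1
]
lookups = [[0, 1], [1]]
print(embedding_bags_to_shared_buffer(tables, lookups))  # [4.0, 6.0, 1.5, 2.5]
```

Writing into pre-sliced destinations trades a little bookkeeping for one fewer full pass over the pooled embeddings, which matters when recommender models concatenate dozens of tables per batch.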
Ecosystem Contribution
- We are actively contributing our optimization work directly to the core PyTorch codebase, as well as the PluggableDevice feature to the TensorFlow-Java repository. These regular upstream contributions strengthen the native performance and capabilities of both frameworks, benefiting the entire community.
5.0.2 Release Highlights
- Framework Compatibility: Fully compatible with PyTorch 2.6 and TensorFlow 2.18.
- Java® Integration: Introduces a Java interface to the TensorFlow plugin (zentf) via TensorFlow Java.
- Optimized Quantized Model Support: Enhanced performance for INT8/INT4-quantized DLRM models.
5.0.1 Release Highlights
- Compatible with deep-learning frameworks: Aligned closely with PyTorch 2.5 and TensorFlow 2.18, helping ensure smooth upgrades and interoperability.
- Efficient Model Execution: Added support for INT8/INT4-quantized DLRM models in zentorch, unlocking faster inference with lower memory usage compared to BF16 precision. This release supports the MLPerf® version of DLRMv2; support for generic models is planned for the next release.
5.0 Release Highlights
- Support for 5th Gen AMD EPYC™ processors, formerly codenamed “Turin”
- Framework Support: PyTorch 2.4.0, TensorFlow 2.17 and ONNXRT 1.19.2
- New APIs in the ZenDNN Plugin for PyTorch (zentorch), such as zentorch.llm.optimize() and zentorch.load_woq_model(), for enhanced LLM performance
- Enhanced matmul operators and fusions, and a new BF16 auto-tuning algorithm targeted at generative LLMs
- An optimized Scaled Dot Product Attention operator, including KV-cache performance optimizations tailored to AMD EPYC™ cache architectures
- Support for INT4 Weight-Only-Quantization (WOQ)
- Improved Model Support: Llama3.1 and 3.2, Phi3, ChatGLM3, Qwen2, GPT-J
- And more!
Please consult each plugin’s Release Highlight section in the ZenDNN User Guide for a comprehensive list of updates.
Release Blog
Get Assistance for Current Projects
If you need technical support for ZenDNN, please file an issue ticket on the respective GitHub page:
- ZenDNN Library: https://github.com/amd/ZenDNN
- ZenDNN Plugin for PyTorch: https://github.com/amd/ZenDNN-pytorch-plugin
- ZenDNN Plugin for TensorFlow: https://github.com/amd/ZenDNN-tensorflow-plugin
- [Up to version 5.0]: ONNX Runtime with ZenDNN integrated: https://github.com/amd/ZenDNN-onnxruntime
Binaries Download Links:
| ZenDNN Plug-in for PyTorch | Description | MD5SUM |
| --- | --- | --- |
| ZENTORCH_v5.2.0_Python_v3.10.zip | This zip file contains the zentorch wheel file and the necessary scripts to set up the environment variables. Compatible with Python 3.10 | 01824257b50c1dae2d43bc120d731634 |
| ZENTORCH_v5.2.0_Python_v3.11.zip | This zip file contains the zentorch wheel file and the necessary scripts to set up the environment variables. Compatible with Python 3.11 | 17d1524dc923a6bba5220fc576227379 |
| ZENTORCH_v5.2.0_Python_v3.12.zip | This zip file contains the zentorch wheel file and the necessary scripts to set up the environment variables. Compatible with Python 3.12 | 3cd58ddf52352e66f48b4d18ccfa4365 |
| ZENTORCH_v5.2.0_Python_v3.13.zip | This zip file contains the zentorch wheel file and the necessary scripts to set up the environment variables. Compatible with Python 3.13 | 6cca83b569f36f791bd5a421387ec8c2 |

| ZenDNN Plug-in for TensorFlow | Description | MD5SUM |
| --- | --- | --- |
| ZENTF_v5.2.0_Python_v3.9.zip | This zip file contains the zentf wheel file and the necessary scripts to set up the environment variables. Compatible with Python 3.9 | 5e827f66ed4afff6c84396d9596a4aff |
| ZENTF_v5.2.0_Python_v3.10.zip | This zip file contains the zentf wheel file and the necessary scripts to set up the environment variables. Compatible with Python 3.10 | 507512ca2cd5f24b32fb595cea4c0b24 |
| ZENTF_v5.2.0_Python_v3.11.zip | This zip file contains the zentf wheel file and the necessary scripts to set up the environment variables. Compatible with Python 3.11 | 5a0f784585ebb3441ef892841befb333 |
| ZENTF_v5.2.0_Python_v3.12.zip | This zip file contains the zentf wheel file and the necessary scripts to set up the environment variables. Compatible with Python 3.12 | 3c2b0e2e16e6634cfe1fe7bee6363991 |
| ZENTF_v5.2.0_Python_v3.13.zip | This zip file contains the zentf wheel file and the necessary scripts to set up the environment variables. Compatible with Python 3.13 | ca1965bd065587c1397b199dce29ba45 |
| ZENTF_v5.2.0_C++_API.zip | This zip file contains the ZenDNN TensorFlow Plug-in with C++ APIs | d4c0888f9037ede2d83f05449009222b |
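After downloading one of the archives above, its integrity can be checked against the listed MD5SUM. A minimal sketch using only the Python standard library; the file path and expected hash in the commented example are taken from the table but substitute whichever package you downloaded:

```python
# Verify a downloaded archive against the MD5SUM column in the tables above.
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large archives need not fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example (uncomment after downloading the package):
# expected = "01824257b50c1dae2d43bc120d731634"  # ZENTORCH_v5.2.0_Python_v3.10.zip
# assert md5_of_file("ZENTORCH_v5.2.0_Python_v3.10.zip") == expected
```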
Binaries are also available on the PyPI repository:
ZenTF: https://pypi.org/project/zentf/
ZenTorch: https://pypi.org/project/zentorch/
Refer to the user guide for more details.
Archive Access: For those requiring versions up to ZenDNN 5.1, our archives provide easy access to previous releases, ensuring you have the tools and resources you need for any project.
Sign Up for ZenDNN News
Keep up-to-date on the latest product releases, news, and tips.