[UPDATED HOW-TO] Running Optimized Automatic1111 Stable Diffusion WebUI on AMD GPUs
Sep 08, 2023

[UPDATE]: The Automatic1111-directML branch now supports Microsoft Olive under the Automatic1111 WebUI interface, which allows for generating optimized models and running them all under the Automatic1111 WebUI, without a separate branch needed to optimize for AMD platforms. The original blog with additional instructions on how to manually generate and run Stable Diffusion Automatic1111 with Olive optimizations is available here - ORIGINAL HOW-TO GUIDE
Prepared by Hisham Chowdhury (AMD), Lucas Neves (AMD), and Justin Stoecker (Microsoft)
Did you know you can enable Stable Diffusion with Microsoft Olive under Automatic1111 (Xformer) to get a significant speedup via Microsoft DirectML on Windows? Microsoft and AMD have been working together to optimize the Olive path on AMD hardware, accelerated via the Microsoft DirectML platform API and the AMD User Mode Driver’s ML (Machine Learning) software layer, allowing users to tap the power of the AMD GPU’s AI (Artificial Intelligence) capabilities.
Prerequisites:
- Git installed (Git for Windows)
- Anaconda/Miniconda installed (Miniconda for Windows)
- The Anaconda/Miniconda directory added to PATH
- A platform with an AMD Graphics Processing Unit (GPU)
- Driver: AMD Software: Adrenalin Edition™ 23.7.2 or newer (https://www.amd.com/en/support)
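Before proceeding, it can help to confirm that the required tools are actually on PATH (for example from Git Bash, which ships with Git for Windows). A minimal sketch; the `check_tool` helper is illustrative and not part of the guide:

```shell
#!/bin/sh
# Print "<name>: OK" if a command is available on PATH, "<name>: MISSING" otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: OK"
  else
    echo "$1: MISSING"
  fi
}

# Tools this guide relies on.
check_tool git
check_tool conda
```

If `conda` reports MISSING, revisit the PATH prerequisite above before continuing.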
Olive is a Python tool that can be used to convert, optimize, quantize, and auto-tune models for optimal inference performance with ONNX Runtime execution providers like DirectML. Olive greatly simplifies model processing by providing a single toolchain to compose optimization techniques, which is especially important with more complex models like Stable Diffusion that are sensitive to the ordering of optimization techniques. The DirectML sample for Stable Diffusion applies the following techniques:
- Model conversion: translates the base models from PyTorch to ONNX.
- Transformer graph optimization: fuses subgraphs into multi-head attention operators and eliminates inefficient patterns left over from the conversion.
- Quantization: converts most layers from FP32 to FP16 to reduce the model's GPU memory footprint and improve performance.
Combined, the above optimizations enable DirectML to leverage AMD GPUs for greatly improved performance when performing inference with transformer models like Stable Diffusion.
Here is how to generate a Microsoft Olive optimized Stable Diffusion model and run it using the Automatic1111 WebUI:
- Open Anaconda/Miniconda Terminal.
- Enter the following commands in the terminal, followed by the Enter key, to install the Automatic1111 WebUI:
- conda create --name Automatic1111_olive python=3.10.6
- conda activate Automatic1111_olive
- git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml
- cd stable-diffusion-webui-directml
- git submodule update --init --recursive
- webui.bat --onnx --backend directml
- This step installs all the dependencies needed for Olive, ONNX Runtime, and other packages, then starts the WebUI; it may take a few minutes.
- CTRL+CLICK on the URL following "Running on local URL:" to run the WebUI
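The install steps above can be collected into a single script. This is a sketch rather than part of the guide: it defaults to a dry run that only prints each command, so you can review the sequence before touching your environment (note that `conda activate` inside a plain script also requires conda to be initialized for that shell):

```shell
#!/bin/sh
# Install steps from the guide, collected into one script.
# Defaults to a dry run that only prints each command;
# set DRY_RUN=0 to actually execute them.
: "${DRY_RUN:=1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run conda create --name Automatic1111_olive python=3.10.6
run conda activate Automatic1111_olive
run git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml
run cd stable-diffusion-webui-directml
run git submodule update --init --recursive
run webui.bat --onnx --backend directml
```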
If the WebUI returns an error on the first run, follow the instructions below:
- cd stable-diffusion-webui-directml\venv\Scripts
- pip install httpx==0.24.1
- Go to the Olive optimization tab and start the optimization pass
Running on the default PyTorch path, the AMD Radeon RX 7900 XTX delivers 1.87 iterations/second.
Running on the optimized model with Microsoft Olive, the AMD Radeon RX 7900 XTX delivers 18.59 iterations/second.
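Taken together, the two throughput figures quoted above imply roughly a 10x speedup. A quick check of the arithmetic:

```shell
# Speedup implied by the quoted throughput numbers (iterations/second).
awk 'BEGIN { printf "%.1fx\n", 18.59 / 1.87 }'
# prints: 9.9x
```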
To run the Stable Diffusion XL version from Stability AI:
- Go to the Olive Optimization tab
- Start the optimization pass with the following settings:
- ONNX Model ID = stabilityai/stable-diffusion-xl-base-1.0
- Uncheck “Safety Checker”
- Select the optimized model that will appear in the checkpoint dropdown
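The ONNX Model ID field expects a Hugging Face identifier of the form `organization/model`, as in `stabilityai/stable-diffusion-xl-base-1.0` above. A trivial sanity check for a value before pasting it in; purely illustrative, as the WebUI does its own validation:

```shell
#!/bin/sh
# Report whether a model ID looks like "organization/model".
check_model_id() {
  case "$1" in
    */*) echo "$1: looks valid" ;;
    *)   echo "$1: expected organization/model" ;;
  esac
}

check_model_id stabilityai/stable-diffusion-xl-base-1.0
```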
