[UPDATED HOW-TO] Running Optimized Automatic1111 Stable Diffusion WebUI on AMD GPUs
Sep 08, 2023

[UPDATE]: The Automatic1111-directML branch now supports Microsoft Olive under the Automatic1111 WebUI interface, which allows for generating optimized models and running them all under the Automatic1111 WebUI, without a separate branch needed to optimize for AMD platforms. The original blog with additional instructions on how to manually generate and run Stable Diffusion Automatic1111 with Olive optimizations is available here - ORIGINAL HOW-TO GUIDE
Prepared by Hisham Chowdhury (AMD), Lucas Neves (AMD), and Justin Stoecker (Microsoft)
Did you know you can enable Stable Diffusion with Microsoft Olive under Automatic1111 (Xformer) to get a significant speedup via Microsoft DirectML on Windows? Microsoft and AMD have been working together to optimize the Olive path on AMD hardware, accelerated via the Microsoft DirectML platform API and the AMD User Mode Driver’s ML (Machine Learning) software layer, allowing users to tap the power of the AMD GPU’s AI (Artificial Intelligence) capabilities.
Prerequisites:
- Git installed (Git for Windows)
- Anaconda/Miniconda installed (Miniconda for Windows)
- The Anaconda/Miniconda directory added to PATH
- A platform with an AMD Graphics Processing Unit (GPU)
- Driver: AMD Software: Adrenalin Edition™ 23.7.2 or newer (https://www.amd.com/en/support)
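Before proceeding, it can help to confirm that the required tools are actually on PATH (for example from Git Bash, which ships with Git for Windows). A minimal sketch; the `check_tool` helper is illustrative and not part of the guide:

```shell
#!/bin/sh
# Print "<name>: OK" if a command is available on PATH, "<name>: MISSING" otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: OK"
  else
    echo "$1: MISSING"
  fi
}

# Tools this guide relies on.
check_tool git
check_tool conda
```

If `conda` reports MISSING, revisit the PATH prerequisite above before continuing.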
Olive is a Python tool that can be used to convert, optimize, quantize, and auto-tune models for optimal inference performance with ONNX Runtime execution providers like DirectML. Olive greatly simplifies model processing by providing a single toolchain to compose optimization techniques, which is especially important with more complex models like Stable Diffusion that are sensitive to the ordering of optimization techniques. The DirectML sample for Stable Diffusion applies the following techniques:
- Model conversion: translates the base models from PyTorch to ONNX.
- Transformer graph optimization: fuses subgraphs into multi-head attention operators and eliminates inefficient patterns left over from the conversion.
- Quantization: converts most layers from FP32 to FP16 to reduce the model's GPU memory footprint and improve performance.
Combined, the above optimizations enable DirectML to leverage AMD GPUs for greatly improved performance when performing inference with transformer models like Stable Diffusion.
Here is how to generate a Microsoft Olive optimized Stable Diffusion model and run it using the Automatic1111 WebUI:
- Open Anaconda/Miniconda Terminal.
- Enter the following commands in the terminal, followed by the Enter key, to install the Automatic1111 WebUI:
- conda create --name Automatic1111_olive python=3.10.6
- conda activate Automatic1111_olive
- git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml
- cd stable-diffusion-webui-directml
- git submodule update --init --recursive
- webui.bat --onnx --backend directml
- This step installs all the dependencies needed for Olive, ONNX Runtime, and other packages, then starts the WebUI; it may take a few minutes.
- CTRL+CLICK on the URL following "Running on local URL:" to run the WebUI
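The install steps above can be collected into a single script. This is a sketch rather than part of the guide: it defaults to a dry run that only prints each command, so you can review the sequence before touching your environment (note that `conda activate` inside a plain script also requires conda to be initialized for that shell):

```shell
#!/bin/sh
# Install steps from the guide, collected into one script.
# Defaults to a dry run that only prints each command;
# set DRY_RUN=0 to actually execute them.
: "${DRY_RUN:=1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run conda create --name Automatic1111_olive python=3.10.6
run conda activate Automatic1111_olive
run git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml
run cd stable-diffusion-webui-directml
run git submodule update --init --recursive
run webui.bat --onnx --backend directml
```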
If the WebUI returns an error on the first run, follow the instructions below:
- cd stable-diffusion-webui-directml\venv\Scripts
- pip install httpx==0.24.1
- Go to the Olive optimization tab and start the optimization pass
Running on the default PyTorch path, the AMD Radeon RX 7900 XTX delivers 1.87 iterations/second.
Running on the optimized model with Microsoft Olive, the AMD Radeon RX 7900 XTX delivers 18.59 iterations/second.
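Taken together, the two throughput figures quoted above imply roughly a 10x speedup. A quick check of the arithmetic:

```shell
# Speedup implied by the quoted throughput numbers (iterations/second).
awk 'BEGIN { printf "%.1fx\n", 18.59 / 1.87 }'
# prints: 9.9x
```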
To run the Stable Diffusion XL version from Stability AI:
- Go to the Olive Optimization tab
- Start the optimization pass with the following settings:
- ONNX Model ID = stabilityai/stable-diffusion-xl-base-1.0
- Uncheck “Safety Checker”
- Select the optimized model that will appear in the checkpoint dropdown
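The ONNX Model ID field expects a Hugging Face identifier of the form `organization/model`, as in `stabilityai/stable-diffusion-xl-base-1.0` above. A trivial sanity check for a value before pasting it in; purely illustrative, as the WebUI does its own validation:

```shell
#!/bin/sh
# Report whether a model ID looks like "organization/model".
check_model_id() {
  case "$1" in
    */*) echo "$1: looks valid" ;;
    *)   echo "$1: expected organization/model" ;;
  esac
}

check_model_id stabilityai/stable-diffusion-xl-base-1.0
```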
