Running Stable Diffusion Image Generation on AMD GPU & Windows

AI image generation has been making a lot of buzz lately. It’s an exciting space now that systems like DALL·E and Midjourney are available for people outside the AI world to play with. However, the one I find most compelling from a tech-nerd viewpoint is Stable Diffusion, because it is open source and can run on local hardware. So, of course, I had to try it out and see how it works.

Getting Stable Diffusion running on my hardware is more complicated than it first appears. Out of the box, the project is designed to run on the PyTorch machine learning framework. That’s a problem for me because, on Windows, PyTorch only supports hardware acceleration through NVIDIA’s CUDA API, and I have an AMD GPU. Linux has a better chance of success because PyTorch supports AMD’s ROCm API, but that’s a project for another day. Fortunately, there is another option.

UPDATE November 2022:

Wow, things are really changing fast. In the two months since I wrote this article, the improvements to Diffusers and ONNX have been incredible. Diffusers now has an ONNX pipeline for img2img generation, and the overall pipeline is 2-4x faster on my GPU. The instructions below have been updated to make the setup easier to follow.

Background Info

See if you can follow along with this: the Hugging Face, Inc. team has integrated Stable Diffusion into their Diffusers project as a pipeline. Plus, they’ve created a script that converts the pre-trained Stable Diffusion model into an ONNX-compatible model. ONNX models, in turn, can be run through ONNX Runtime on top of Microsoft’s DirectML API. And finally, DirectML can be accelerated by any DirectX 12 compatible GPU, including AMD and Intel hardware.
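If you want to check the last link of that chain on your own machine, ONNX Runtime can report which execution providers it has. This is a small sketch of my own, not part of the tools above. It assumes the onnxruntime-directml package (installed during the setup) exposes `onnxruntime.get_available_providers()` and that DirectML shows up as `DmlExecutionProvider`; the helper names are made up for illustration.

```python
"""Check whether ONNX Runtime can reach the GPU through DirectML."""

# Execution provider name that ONNX Runtime reports for DirectML.
DIRECTML_PROVIDER = "DmlExecutionProvider"


def directml_available(providers):
    """Return True if the DirectML execution provider is in the list."""
    return DIRECTML_PROVIDER in providers


def list_providers():
    """Return ONNX Runtime's available providers, or [] if it isn't installed."""
    try:
        import onnxruntime as ort  # module installed by onnxruntime-directml
    except ImportError:
        return []
    return list(ort.get_available_providers())


if __name__ == "__main__":
    providers = list_providers()
    if directml_available(providers):
        print("DirectML found: ONNX models can run on the GPU.")
    elif providers:
        print("DirectML not found. Available providers:", providers)
    else:
        print("onnxruntime is not installed in this environment.")
```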

With all that preface, here is how I got Stable Diffusion running on a Windows PC using an AMD GPU (RX6800). I can’t take all the credit. The GitHub user harishanand95 has contributed all the steps for setting up the Python environment for running Stable Diffusion on AMD hardware (both Windows and Linux). Their original instructions are available at To help simplify the generation part, I created a Python script to mimic the options offered by Stable Diffusion’s script file. But even that wasn’t an original piece of work. A blog post at helped me understand how to pass specific settings through the pipeline. Finally, the setup has been simplified thanks to a guide written by averad. Their guide is available at Stable Diffusion for AMD GPUs on Windows using DirectML and has been updated with a lot of great info including how to use customized weight checkpoint files with ONNX.

The Setup

  1. Install Git. Git is version control software, used for tracking code changes as well as pushing/pulling those changes from a central repository.
  2. Install a Python environment. This can be as simple as installing Python on your system and running everything in a single environment. However, using Python virtual environments is recommended to isolate projects from each other and reduce the risk of package conflicts.
    • One of the easiest methods for setting up virtual environments is to use Conda. That is what the rest of the instructions will use.
  3. Go to , create a free account, and verify your email address. This is where the Stable Diffusion model is hosted.
    1. Go to the and click the “Access repository” button. By doing this, you are sharing your contact info with the Stable Diffusion authors.
    2. (If you want to use SD 1.5) Go to the and click the “Access repository” button. By doing this, you are sharing your contact info with the Stable Diffusion authors.
  4. Launch a Conda command prompt. Windows will have a new icon in the start menu labeled Anaconda Prompt (miniconda3).
  5. Download my repo using Git and go to the directory.
    git clone
    cd onnx-stablediffusion-scripts
  6. Create a new Conda environment using the provided config file. This may take a few minutes with little feedback as pip installs packages.
    conda env create --file environment.yml
  7. Activate the new environment.
    conda activate onnx
  8. Force-install onnxruntime-directml. ¯\_(ツ)_/¯ I don’t know why a forced reinstall is required, but it worked for averad.
    • pip install onnxruntime-directml --force-reinstall
    • Install the latest 1.14 nightly build of the ONNX DirectML runtime. The nightlies appear to have significant performance improvements over the released versions available through pip (I’ve seen a 2-3x speed increase).
      1. Go to
      2. Click on the latest version available. Download the WHL file for your Python environment. If you used the environment file above to set up Conda, choose the `cp39` file (aka Python 3.9).
      3. Run this command to install the package.
        pip install "path to the downloaded WHL file" --force-reinstall
  9. Download the weights for Stable Diffusion. You should be prompted to enter your Hugging Face username and password the first time. The clone job will take several minutes to process and download. Be patient.
    git clone --branch onnx --single-branch stable_diffusion_onnx
    • If you want to use version 1.5 of Stable Diffusion, use this command instead.
      git clone --branch onnx --single-branch stable_diffusion_onnx
  10. The environment setup is now complete.
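Before moving on, a quick sanity check can save some head-scratching. The script below is a hypothetical helper, not part of my repo: it prints the wheel tag for the running Python (the `cp39` part of the WHL file name in step 8) and checks that the cloned model folder contains files I’d expect from a diffusers ONNX export. The folder name `stable_diffusion_onnx` comes from the clone command in step 9; the expected file list is an assumption, so adjust it if your checkout looks different. Run it inside the activated onnx environment.

```python
"""Quick post-setup sanity check (hypothetical helper, not part of the repo)."""
import sys
from pathlib import Path

# ASSUMPTION: files a diffusers ONNX export normally contains; adjust as needed.
EXPECTED_FILES = ["model_index.json", "unet/model.onnx"]


def wheel_tag(version_info=sys.version_info):
    """Return the CPython wheel tag, e.g. 'cp39' for Python 3.9."""
    return f"cp{version_info[0]}{version_info[1]}"


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """List the expected files that are not present under model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    print("Download the WHL file tagged:", wheel_tag())
    missing = missing_model_files("stable_diffusion_onnx")
    print("Model folder looks complete." if not missing else f"Missing: {missing}")
```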

How to generate an image

Now that the Python environment is set up, the last thing to do is generate an image. That’s where my script makes it easy.

  1. Launch a Conda command prompt. Windows will have a new icon in the start menu labeled Anaconda Prompt (miniconda3).
  2. Activate the Python environment created earlier.
    conda activate onnx
  3. Change to the correct directory.
    cd "path to onnx-stablediffusion-scripts folder"
  4. Run the following command to generate an image. The prompt can be changed to whatever text you want.
    python --prompt "astronaut riding a horse" --random_seed True
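For the curious, here is roughly what a script like mine does under the hood. This is a minimal sketch, not my actual script: it assumes diffusers’ `OnnxStableDiffusionPipeline` with `from_pretrained(..., provider=...)`, the `stable_diffusion_onnx` folder from the setup, and a guidance scale of 7.5 that I picked as a common default. Seed handling and img2img are left out.

```python
"""Minimal txt2img sketch with diffusers' ONNX pipeline (not my actual script)."""


def build_call_kwargs(prompt, steps=50, height=512, width=512, guidance_scale=7.5):
    """Collect the keyword arguments passed to the pipeline call."""
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "height": height,
        "width": width,
        "guidance_scale": guidance_scale,
    }


def generate(model_dir, prompt, out_path="output.png", provider="DmlExecutionProvider"):
    """Load the ONNX pipeline, generate one image, and save it as a PNG."""
    # Imported lazily so build_call_kwargs works even without diffusers installed.
    from diffusers import OnnxStableDiffusionPipeline

    # provider="CPUExecutionProvider" runs the same pipeline on the CPU instead.
    pipe = OnnxStableDiffusionPipeline.from_pretrained(model_dir, provider=provider)
    result = pipe(**build_call_kwargs(prompt))
    result.images[0].save(out_path)  # images come back as PIL images by default
    return out_path
```

From a Python prompt in the activated environment, `generate("stable_diffusion_onnx", "astronaut riding a horse")` should write `output.png` to the current directory.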

Example Commands

python --prompt "astronaut riding a horse" --random_seed True
seed: 3844704755
100%|███████████████████████████████| 51/51 [00:26<00:00,  1.90it/s]

python --prompt "cowboy riding a pig" --init_img ".\outputs\txt2img-samples\samples\00001.png" --random_seed true
loaded input image of size (512, 512) from .\outputs\txt2img-samples\samples\00292.png
seed: 2912103985
100%|███████████████████████████████| 50/50 [00:24<00:00,  2.01it/s]
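The progress bars also give you enough to plan a long run. Elapsed time is roughly steps ÷ it/s: 51 ÷ 1.90 ≈ 26.8 seconds, which lines up with the 00:26 shown above. The helper below just does that arithmetic; the loop estimate ignores the one-time model load and assumes the speed stays constant.

```python
"""Estimate generation time from the progress bar's it/s figure."""


def seconds_per_image(steps, its_per_second):
    """Time for one image: iterations divided by iterations per second."""
    return steps / its_per_second


def seconds_for_loop(steps, its_per_second, loops):
    """Rough total for several loops, ignoring the one-time model load."""
    return loops * seconds_per_image(steps, its_per_second)


print(f"{seconds_per_image(51, 1.90):.1f} s")  # prints "26.8 s" (txt2img example)
print(f"{seconds_per_image(50, 2.01):.1f} s")  # prints "24.9 s" (img2img example)
print(f"{seconds_for_loop(51, 1.90, 10) / 60:.1f} min for 10 images")
```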

Advanced script usage

As you play around with Stable Diffusion, you may want to adjust some of the parameters passed to the model. Check the README file in my GitHub repo for details.

  • ddim_steps sets the number of denoising steps, i.e. how many times the image is passed through the model. Fewer than 35 steps tends to return images with more artifacts. A setting higher than 50 may produce a better image, but it can also go too far and introduce artifacts again.
  • H & W adjust the height and width of the returned image. 512 is the ideal size because the model was trained on 512×512 images.
  • n_samples determines how many images to generate at the same time. The higher the number, the more time and RAM it takes to complete. See notes below.
  • seed allows you to pick a specific seed number. If you want to recreate the same image later, you must know this number.
  • random_seed picks a number between 1 and 2^32. Different seeds will generate different images from the same prompt.
  • hardware allows you to choose between GPU and CPU processing. Yes, if you want, you can run this pipeline on a CPU. The process is slower but still useful.
  • loop tells the script how many times to run. Because the ONNX pipeline uses so much memory, a high n_samples number isn’t practical. Instead, the script loads the pipeline once and runs it repeatedly, feeding it a new seed each time to generate multiple images.
  • log outputs all the settings for the image generation to a CSV file. This allows you to keep track of what settings you used for generating an image and recreate it if desired.
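To make those options concrete, here is a hypothetical argparse setup that mirrors them. It is a sketch, not the parser in my repo, so the defaults, types, and the `PROVIDERS` mapping are assumptions. One detail worth showing: the example commands pass both `--random_seed True` and `--random_seed true`, and argparse’s `type=bool` would not work for that, because `bool("False")` is `True` in Python. A small `str2bool` converter handles both spellings. The multiple-of-8 check reflects diffusers’ pipelines, which reject heights and widths that aren’t divisible by 8.

```python
"""Hypothetical argparse setup mirroring the options listed above."""
import argparse


def str2bool(value):
    """Accept 'True'/'true'/'1'/'yes' style flags (type=bool can't do this)."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def multiple_of_8(value):
    """Height and width must be positive multiples of 8."""
    size = int(value)
    if size <= 0 or size % 8:
        raise argparse.ArgumentTypeError(f"{size} is not a positive multiple of 8")
    return size


def build_parser():
    p = argparse.ArgumentParser(description="ONNX Stable Diffusion (sketch)")
    p.add_argument("--prompt", required=True)
    p.add_argument("--init_img", default=None, help="input image for img2img")
    p.add_argument("--ddim_steps", type=int, default=50)
    p.add_argument("--H", type=multiple_of_8, default=512)
    p.add_argument("--W", type=multiple_of_8, default=512)
    p.add_argument("--n_samples", type=int, default=1)
    p.add_argument("--seed", type=int, default=None)
    p.add_argument("--random_seed", type=str2bool, default=False)
    p.add_argument("--hardware", choices=["gpu", "cpu"], default="gpu")
    p.add_argument("--loop", type=int, default=1)
    p.add_argument("--log", type=str2bool, default=False)
    return p


# Maps the --hardware choice to an ONNX Runtime execution provider name.
PROVIDERS = {"gpu": "DmlExecutionProvider", "cpu": "CPUExecutionProvider"}
```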

As noted at, running through the ONNX pipeline uses more memory than the regular PyTorch pipeline. This makes it hard to generate more than one or two samples simultaneously, even on a GPU with 16GB of VRAM. If you have less than 16GB of VRAM, use --random_seed True --loop [int] to loop through the pipeline and create as many images as you want from the same prompt.
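Here is a sketch of the --loop idea with the --log option folded in. The `generate_fn` argument is a hypothetical stand-in for the already-loaded pipeline, so the model is loaded once and only the seed changes between runs. Seeds are drawn from 1 to 2**32 - 1 so they stay within 32 bits, and the CSV column names are my own choice.

```python
"""Sketch of --loop with --random_seed and --log: load once, vary the seed."""
import csv
import os
import random

FIELDS = ["prompt", "seed", "image"]


def run_loop(generate_fn, prompt, loops, log_path=None, rng=random):
    """Call generate_fn once per loop with a fresh seed; optionally append to a CSV.

    generate_fn(prompt, seed) is a hypothetical stand-in for the loaded
    pipeline call and should return the path of the saved image.
    """
    rows = []
    for _ in range(loops):
        seed = rng.randint(1, 2**32 - 1)  # keep the seed within 32 bits
        rows.append({"prompt": prompt, "seed": seed, "image": generate_fn(prompt, seed)})
    if log_path:
        # Write the header only when starting a new (or empty) log file.
        write_header = not os.path.exists(log_path) or os.path.getsize(log_path) == 0
        with open(log_path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            if write_header:
                writer.writeheader()
            writer.writerows(rows)
    return rows
```

With a dummy generator, `run_loop(lambda prompt, seed: f"{seed}.png", "astronaut riding a horse", 3, "log.csv")` returns three rows and appends them to `log.csv`, so any image can be recreated later from its logged seed.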


  • Why are my images black?
    • The NSFW safety checker thought the image had naughty bits. Try a different seed number or prompt. I’ve also seen that reducing the image size can cause the safety checker to flag the image.

Have fun.