RTX 5090 not working with PyTorch and Stable Diffusion (sm_120 unsupported)

Hello,

I recently purchased a laptop with an RTX 5090 GPU (Blackwell architecture), but unfortunately, it’s not usable with PyTorch-based frameworks like Stable Diffusion or ComfyUI. The current PyTorch builds do not support CUDA capability sm_120 yet, which results in errors or CPU-only fallback.
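For reference, the capability the wheels would need to support can be confirmed from Python (torch.cuda.get_device_capability is a standard PyTorch call):

import torch

# Blackwell (RTX 50-series) reports compute capability (12, 0), i.e. sm_120;
# the current stable wheels only ship kernels up to sm_90.
print(torch.cuda.get_device_capability(0))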

This is extremely disappointing for those of us who invested in high-end hardware expecting out-of-the-box support for AI tools.

Could NVIDIA please work closely with the PyTorch team to ensure official support as soon as possible?

Thanks for your attention.


On a 5060 Ti, I tried the PyTorch nightlies with the 577 driver (which comes with CUDA 12.9), then uninstalled that driver and tried CUDA 12.8 with driver 571.96. No luck; the error is:

CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
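
Side note: the CUDA_LAUNCH_BLOCKING hint in that message can be set before any CUDA work happens, which makes the failing call raise synchronously at the right spot. A minimal sketch:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the first CUDA call

import torch
x = torch.tensor([1.0], device="cuda")  # the error now surfaces here, not at some later API call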

Is someone asleep at the wheel?

I can’t use Flux in Forge.

I have Windows 11 and ran a script with the following results:

(base) C:\Windows\System32>conda activate forge

(forge) C:\Windows\System32>python
Python 3.10.18 | packaged by Anaconda, Inc. | (main, Jun 5 2025, 13:08:55) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

import torch

print("Torch version:", torch.__version__)
Torch version: 2.6.0.dev20241112+cu121
print("CUDA version:", torch.version.cuda)
CUDA version: 12.1
print("CUDA available:", torch.cuda.is_available())
CUDA available: True
print("Device count:", torch.cuda.device_count())
Device count: 1

if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))
    # Try to run a CUDA operation
    try:
        x = torch.tensor([1.0], device="cuda")
        print("Tensor on GPU:", x)
    except Exception as e:
        print("CUDA operation failed:", e)
else:
    print("CUDA is not available.")

C:\Users\jlitw\miniconda3\envs\forge\lib\site-packages\torch\cuda\__init__.py:235: UserWarning:
NVIDIA GeForce RTX 5060 Ti with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5060 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(
Device name: NVIDIA GeForce RTX 5060 Ti
CUDA operation failed: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
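
A quick way to see the mismatch directly is to compare the kernels compiled into the wheel against the device’s capability (both are standard PyTorch calls):

import torch

print(torch.cuda.get_arch_list())           # e.g. ['sm_50', ..., 'sm_90'] on this wheel
print(torch.cuda.get_device_capability(0))  # (12, 0) on a 5060 Ti, i.e. sm_120
# If no compatible arch appears in the first list, every kernel launch fails
# with "no kernel image is available for execution on the device".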

jlitz


The 50xx series has been supported since CUDA 12.8.

You appear to have a version compiled with CUDA 12.1.

Looking at the PyTorch install page, 12.8 is an option, so you might want to install that.
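
(For anyone following along: the "+cu121" suffix in torch.__version__ and the value of torch.version.cuda both report the CUDA toolkit the wheel was built against, which is independent of the driver installed on the machine.)

import torch
print(torch.__version__)   # e.g. 2.6.0.dev20241112+cu121 -> wheel built for CUDA 12.1
print(torch.version.cuda)  # "12.1"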

Yes, I installed the 12.8 nightly just now. Results:

I installed the 12.8 version and checked to confirm, but I still get the error on an image-generation attempt.
INSTALLED 12.8
(forgeflux) C:\FORGE\webui_forge_cu121_torch231>python -c "import torch; print(torch.__version__); print(torch.version.cuda)"
2.9.0.dev20250726+cu128
12.8

(forge) C:\Windows\System32>python
Python 3.10.18 | packaged by Anaconda, Inc. | (main, Jun 5 2025, 13:08:55) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

import torch

print("Torch version:", torch.__version__)
Torch version: 2.9.0.dev20250729+cu128
print("CUDA version:", torch.version.cuda)
CUDA version: 12.8
print("CUDA available:", torch.cuda.is_available())
CUDA available: True
print("Device count:", torch.cuda.device_count())
Device count: 1

if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))
    # Try to run a CUDA operation
    try:
        x = torch.tensor([1.0], device="cuda")
        print("Tensor on GPU:", x)
    except Exception as e:
        print("CUDA operation failed:", e)
else:
    print("CUDA is not available.")

Device name: NVIDIA GeForce RTX 5060 Ti
Tensor on GPU: tensor([1.], device='cuda:0')

THE COMPLETE RUN INFO

[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded … Current free memory is 9897.80 MB … Done.
Traceback (most recent call last):
  File "C:\FORGE\webui_forge_cu121_torch231\webui\modules_forge\main_thread.py", line 30, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "C:\FORGE\webui_forge_cu121_torch231\webui\modules\txt2img.py", line 131, in txt2img_function
    processed = processing.process_images(p)
  File "C:\FORGE\webui_forge_cu121_torch231\webui\modules\processing.py", line 842, in process_images
    res = process_images_inner(p)
  File "C:\FORGE\webui_forge_cu121_torch231\webui\modules\processing.py", line 962, in process_images_inner
    p.setup_conds()
  File "C:\FORGE\webui_forge_cu121_torch231\webui\modules\processing.py", line 1601, in setup_conds
    super().setup_conds()
  File "C:\FORGE\webui_forge_cu121_torch231\webui\modules\processing.py", line 503, in setup_conds
    self.uc = self.get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, total_steps, [self.cached_uc], self.extra_network_data)
  File "C:\FORGE\webui_forge_cu121_torch231\webui\modules\processing.py", line 474, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps, hires_steps, shared.opts.use_old_scheduling)
  File "C:\FORGE\webui_forge_cu121_torch231\webui\modules\prompt_parser.py", line 189, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "C:\FORGE\webui_forge_cu121_torch231\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\FORGE\webui_forge_cu121_torch231\webui\backend\diffusion_engine\flux.py", line 86, in get_learned_conditioning
    cond_l, pooled_l = self.text_processing_engine_l(prompt)
  File "C:\FORGE\webui_forge_cu121_torch231\webui\backend\text_processing\classic_engine.py", line 272, in __call__
    z = self.process_tokens(tokens, multipliers)
  File "C:\FORGE\webui_forge_cu121_torch231\webui\backend\text_processing\classic_engine.py", line 305, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "C:\FORGE\webui_forge_cu121_torch231\webui\backend\text_processing\classic_engine.py", line 128, in encode_with_transformers
    self.text_encoder.transformer.text_model.embeddings.position_embedding = self.text_encoder.transformer.text_model.embeddings.position_embedding.to(dtype=torch.float32)
  File "C:\FORGE\webui_forge_cu121_torch231\system\python\lib\site-packages\torch\nn\modules\module.py", line 1173, in to
    return self._apply(convert)
  File "C:\FORGE\webui_forge_cu121_torch231\system\python\lib\site-packages\torch\nn\modules\module.py", line 804, in _apply
    param_applied = fn(param)
  File "C:\FORGE\webui_forge_cu121_torch231\system\python\lib\site-packages\torch\nn\modules\module.py", line 1159, in convert
    return t.to(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I have no experience with PyTorch, but I wonder if you still have part or all of some 12.1 version, as all the lines after "THE COMPLETE RUN INFO" are prefixed with:

File "C:\FORGE\webui_forge_cu121_torch231

which indicates 12.1 somewhere.
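
One way to rule that out is to ask the interpreter Forge actually runs rather than the conda env; a sketch, assuming the bundled interpreter sits at system\python\python.exe as the traceback paths suggest:

C:\FORGE\webui_forge_cu121_torch231\system\python\python.exe -c "import sys, torch; print(sys.executable); print(torch.__version__, torch.version.cuda)"

If that prints a cu121 build, the one-click package’s own Python (not the conda env) is the install that needs upgrading.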

Fix: “Torch not compiled with CUDA enabled” in Automatic1111 on RTX 5090 (Windows)

This is a complete, reproducible fix for getting Automatic1111 Stable Diffusion WebUI to use the GPU on an RTX 5090.
It captures the exact errors I hit, why they happened, and the step‑by‑step commands that solved them.


TL;DR (Quick Fix)

  1. Activate your WebUI venv (mine is E:\Automatic111\sd-venv312):
E:\Automatic111\stable-diffusion-webui> call E:\Automatic111\sd-venv312\Scripts\activate
  2. Clean old Torch installs & cache:
pip uninstall -y torch torchvision torchaudio xformers
pip cache purge
  3. Install PyTorch nightly with CUDA 12.8 (sm_120 support for RTX 50‑series):
pip install --no-cache-dir --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
  4. Verify GPU is detected (keep this on one line; cmd does not treat a trailing backslash as a line continuation):
python -c "import torch,torchvision; print('torch',torch.__version__,'cuda',getattr(torch.version,'cuda',None)); print('avail',torch.cuda.is_available()); print('name', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'NO GPU'); print('cap', torch.cuda.get_device_capability(0) if torch.cuda.is_available() else None)"

You should see something like:

torch 2.9.0.dev20xx+cu128 cuda 12.8
avail True
name NVIDIA GeForce RTX 5090
cap (12, 0)
  5. Launch WebUI with a simple webui-user.bat (no extra Torch commands, no skip‑cuda‑test):
set COMMANDLINE_ARGS=--opt-sdp-attention
call webui.bat

If the UI shows steps running ~20–30 it/s and no “CUDA not enabled” errors, you’re good.


My Environment (when it failed & then worked)

  • Windows
  • GPU: NVIDIA GeForce RTX 5090
  • Python: 3.10.11 (64‑bit)
  • WebUI: v1.10.1 (82a973c0...)
  • Venv: E:\Automatic111\sd-venv312
  • Final working Torch/TV:
    • torch 2.9.0.dev...+cu128
    • torchvision 0.24.0.dev...+cu128
    • CUDA runtime reported by Torch: 12.8

Note: cu124 (CUDA 12.4) does not include sm_120 for RTX 50‑series, so those wheels either warn about unsupported capability or fall back to CPU. Nightly cu128 wheels do include sm_120 support.
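
If you script your setup, a fail-fast guard along these lines (a sketch built on standard PyTorch calls; exact arch strings may vary by wheel) catches the wrong-wheel case before WebUI even starts:

import torch

assert torch.cuda.is_available(), "CPU-only wheel or driver problem"
major, minor = torch.cuda.get_device_capability(0)
arch = f"sm_{major}{minor}"  # (12, 0) -> "sm_120"
assert arch in torch.cuda.get_arch_list(), (
    f"installed wheel lacks {arch} kernels; install the nightly cu128 build"
)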


The Errors I Saw

1) CPU build or CUDA disabled

From WebUI and terminal:

AssertionError: Torch not compiled with CUDA enabled
Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS ...

and

torch 2.8.0+cpu cuda None is_available False

2) Older CUDA (12.4) wheels on a 5090

UserWarning:
NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 ... sm_90.

This means those wheels don’t include sm_120, so the 5090 won’t be used.


Root Cause (Why it Broke)

  • I had CPU‑only or older CUDA (cu124) Torch/TV wheels installed.
  • RTX 5090 requires sm_120 support, which currently ships in nightly CUDA 12.8 wheels (cu128).
  • WebUI’s auto‑install / custom index settings can sometimes pull the wrong wheels (CPU or older CUDA); a quick way to audit those settings is shown below.
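
To audit what could redirect pip to the wrong wheels (a sketch; on cmd, an unset variable just echoes its own name back):

pip config list
echo %PIP_INDEX_URL%
echo %PIP_EXTRA_INDEX_URL%
echo %TORCH_INDEX_URL%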

The Full Fix (Step by Step)

Paths below are mine; adjust for your setup.

0) Open a fresh terminal and activate the correct venv

call E:\Automatic111\sd-venv312\Scripts\activate

Confirm you’re in the venv:

python -c "import sys; print(sys.executable)"

Expected:

E:\Automatic111\sd-venv312\Scripts\python.exe

1) Remove bad installs and cache

pip uninstall -y torch torchvision torchaudio xformers
pip cache purge

2) Install nightly cu128 wheels (have sm_120 for 50‑series)

pip install --no-cache-dir --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128

3) Sanity‑check GPU from Python

python -c "import torch,torchvision; print('torch',torch.__version__); print('torchvision',torchvision.__version__); \
print('cuda?',torch.cuda.is_available()); print('cuda runtime',getattr(torch.version,'cuda',None)); \
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'NO GPU'); \
print('cap', torch.cuda.get_device_capability(0) if torch.cuda.is_available() else None)"

I got:

torch 2.9.0.dev...+cu128
torchvision 0.24.0.dev...+cu128
cuda? True
cuda runtime 12.8
NVIDIA GeForce RTX 5090
cap (12, 0)

4) Keep WebUI from re‑installing the wrong Torch

Use a minimal webui-user.bat. Mine looks like this:

@echo off
rem --- Use the venv that already has the correct Torch installed ---
set PYTHON=E:\Automatic111\sd-venv312\Scripts\python.exe
set VENV_DIR=E:\Automatic111\sd-venv312

rem --- Do NOT force torch installs here ---
set TORCH_COMMAND=

rem --- Clean, safe args (no skip-cuda-test needed) ---
set COMMANDLINE_ARGS=--opt-sdp-attention

rem --- Nuke any custom pip index URLs that could fetch CPU/old wheels ---
set TORCH_INDEX_URL=
set PIP_INDEX_URL=
set PIP_EXTRA_INDEX_URL=

call webui.bat

If you must install through WebUI, set TORCH_COMMAND to the nightly cu128 line:

set TORCH_COMMAND=pip install --no-cache-dir --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128

But I prefer keeping it empty once I’ve installed the right wheels in the venv.

5) Launch and confirm it’s using the GPU

On launch I see:

Applying attention optimization: sdp... done.
Model loaded in 3.1s ...
20/20 [00:00<00:00, 21–27 it/s]

That iteration speed is GPU‑level. No more CUDA errors.
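
If you want a numeric confirmation independent of WebUI, a throwaway benchmark like this (a sketch; the size and loop count are arbitrary) should report a few milliseconds per matmul on a 50‑series card and be orders of magnitude slower on CPU:

import time
import torch

x = torch.randn(4096, 4096, device="cuda")
torch.cuda.synchronize()              # make sure setup is done before timing
t0 = time.perf_counter()
for _ in range(50):
    y = x @ x                         # one large matmul per iteration
torch.cuda.synchronize()              # wait for all queued kernels to finish
print(f"{(time.perf_counter() - t0) / 50 * 1e3:.2f} ms per 4096x4096 matmul")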


What Didn’t Work (and Why)

  • cu124 wheels (Torch 2.6.0+cu124, TV 0.21.0+cu124) → missing sm_120, so 5090 prints warnings and/or falls back to CPU.
  • CPU wheels (torch 2.8.0+cpu) → torch.cuda.is_available() is False and WebUI throws “not compiled with CUDA”.
  • Adding --skip-torch-cuda-test → just hides the problem; it doesn’t enable GPU.

Optional Notes

  • xFormers is optional. With modern GPUs, PyTorch SDPA (--opt-sdp-attention) is fast and stable.
  • The TF32 warning from PyTorch 2.9 is harmless; it’s just a heads‑up about a future API change.
  • If you ever slip back to CPU, rerun the uninstall + purge + cu128 install steps above.

Log Snippets (for searchability)

Failure (CPU / no CUDA):

AssertionError: Torch not compiled with CUDA enabled
Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
torch 2.8.0+cpu cuda None is_available False

Failure (old CUDA 12.4 on 5090):

UserWarning:
NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.

Success:

torch 2.9.0.dev...+cu128 cuda 12.8
avail True
name NVIDIA GeForce RTX 5090
cap (12, 0)

Applying attention optimization: sdp... done.
... 20/20 [00:00<00:00, 21–27 it/s]

Credit / Context

This write‑up is distilled from a live troubleshooting session.
If it helps you, consider replying with your exact GPU / driver / Torch versions so others can compare.