Facing difficulty in installing Webui related things in Jetson AGX Thor

Hi Nvidia team,

I recently got the Jetson AGX Thor and started installing the required packages (basically I want to install the Holoscan Sensor Bridge and interface it to a camera).

I am following the Quick Start Guide — Jetson AGX Thor Developer Kit - User Guide and was able to install CUDA, Docker, etc. When I tried to run

jetson-containers run $(autotag stable-diffusion-webui)

I am getting a series of issues and have not been able to resolve them (I am still trying to understand the root cause).

The gist of the issue: an architecture mismatch between Triton 3.6.0 and the Jetson AGX Thor (Blackwell Tegra) prevents GPU kernel compilation. The bundled ptxas-blackwell binary rejects the sm_110a target generated by the Triton/LLVM backend.

Environment Details

  • Hardware: Jetson AGX Thor DevKit (Blackwell-based SoC)

  • Architecture ID: sm_110a

  • Docker Image: stable-diffusion-webui:r38.3.arm64-sbsa-cu130-24.04-triton

  • Software Stack: Triton 3.6.0, CUDA 13.0 (internal to the container)

Here is the sequence that I am not clear about - it seems to be a catch-22 situation.

  1. Instruction Generation: The Triton compiler correctly identifies the hardware as sm_110a and generates a PTX file with .target sm_110a.

  2. Assembler Failure: Triton calls the bundled assembler: /opt/venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas-blackwell.

  3. The Error: ptxas fails with: fatal : Unsupported .target 'sm_110a'.

  4. The Catch-22: If the user manually overrides the arch to sm_100 to satisfy the assembler, the CUDA driver subsequently rejects the resulting binary with RuntimeError: Triton Error [CUDA]: no kernel image is available for execution on the device, because of the strict sm_110a requirement on Thor.
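To confirm step 3 is really the assembler's fault (and not something Triton does to the PTX first), the bundled ptxas can be tested directly on a minimal, empty PTX module. This is only a diagnostic sketch: the `.version` directive and the `ptxas-blackwell` path (taken from the error message above) are assumptions to adjust for your container.

```python
# Sketch: probe whether a given ptxas binary accepts a target architecture
# by assembling a minimal PTX module with no kernels in it.
import os
import subprocess
import tempfile

def make_ptx_stub(arch: str, ptx_version: str = "8.3") -> str:
    """Return a minimal PTX module targeting `arch`."""
    return f".version {ptx_version}\n.target {arch}\n.address_size 64\n"

def ptxas_accepts(ptxas_path: str, arch: str) -> bool:
    """Run ptxas on the stub; False means the binary rejects the target."""
    with tempfile.NamedTemporaryFile("w", suffix=".ptx", delete=False) as f:
        f.write(make_ptx_stub(arch))
        ptx_file = f.name
    try:
        result = subprocess.run(
            [ptxas_path, f"--gpu-name={arch}", ptx_file, "-o", os.devnull],
            capture_output=True, text=True,
        )
        return result.returncode == 0
    finally:
        os.unlink(ptx_file)

# Example, using the path from the error in this post:
# ptxas_accepts("/opt/venv/lib/python3.12/site-packages/triton/backends/"
#               "nvidia/bin/ptxas-blackwell", "sm_110a")
```

If this returns False for sm_110a but True for sm_100, that isolates the catch-22 to the shipped ptxas binary rather than the Triton compilation pipeline.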

Please provide an updated ptxas-blackwell binary, or a specific NVCC/Triton configuration path that allows native sm_110a compilation on Jetson Thor without requiring manual binary header patching.

Also let me know if I am missing anything in this sequence I am trying.

With best regards,

Phani.

Hi,

The stable-diffusion-webui doesn’t support JetPack 7.
Could you check our new VLM WebUI to see if this can meet your requirements?

You can find the container in the link below:

Thanks.

Hi Nvidia team,

Thanks for this update. It took some time for me to experiment with these suggestions due to the year-end holidays.

Now I understand that Nvidia suggests the VLM WebUI instead of the regular stable-diffusion-webui (which is what the installation guide covers).

I am attempting to run the VILA VLM WebUI on a Jetson AGX Thor (Blackwell) using the dustynv/vila:r36.4.0-cu128-24.04 container on JetPack 7.1. While NVIDIA previously recommended using the VLM WebUI over Stable Diffusion for this platform, the VLM stack currently fails to initialize the GPU.

I have also attempted to find the nano_llm or vila.serve.live_llm modules within the dustynv/vila container, but they appear to be missing from the environment path, forcing the use of the raw server.py in /opt/VILA.
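To double-check which of these modules actually exist inside the container (the names `nano_llm` and `vila` are the ones I was looking for; they may simply not be the correct import names for this image), a quick importability probe can be run without side effects:

```python
# Sketch: report which module names are importable in the current
# environment, without actually importing them.
import importlib.util

def probe(modules):
    """Map each module name to True/False depending on importability."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

print(probe(["nano_llm", "vila", "torch"]))
```

Running this inside the dustynv/vila container would show whether the modules are missing entirely or just not on the default path.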

Primary Error: Inside the container, any attempt to access the GPU (even a simple torch.cuda.init()) results in: RuntimeError: Unexpected error from cudaGetDeviceCount(). Error 801: operation not supported

Key Technical Details:

  • Hardware: Jetson AGX Thor (Blackwell / sm_110a)

  • Software Stack: JetPack 7.1, CUDA 13.0, Triton 3.6.0

  • Container: dustynv/vila:r36.4.0-cu128-24.04 (Ubuntu 24.04)

  • Observations:

    1. The server.py script fails because Triton 3.6.0/DeepSpeed cannot find an “active driver.”

    2. Manual torch.cuda calls fail with Error 801.

    3. Standard jetson-containers launch flags (including --privileged and --runtime nvidia) do not seem to resolve the Blackwell-specific UVM/handshake issue.
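For reference when reading the observations above, the numeric code in the traceback can be decoded against the CUDA runtime's cudaError_t enum. The mapping below is a small excerpt I believe matches driver_types.h in recent CUDA toolkits; verify against the headers in your installation:

```python
# Sketch: decode the numeric CUDA runtime error seen in the traceback.
# Excerpt of cudaError_t values from the CUDA runtime headers.
CUDA_ERRORS = {
    35: "cudaErrorInsufficientDriver",
    100: "cudaErrorNoDevice",
    209: "cudaErrorNoKernelImageForDevice",
    801: "cudaErrorNotSupported",          # "operation not supported"
    802: "cudaErrorSystemNotReady",
    804: "cudaErrorCompatNotSupportedOnDevice",
}

def decode(code: int) -> str:
    return CUDA_ERRORS.get(code, f"unknown cudaError_t ({code})")

print(decode(801))  # the error raised by torch.cuda.init() in this post
```

Error 801 (cudaErrorNotSupported) points at the runtime/driver handshake rather than a missing kernel image (which would be 209), which is why overriding the arch does not help here.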

Below are my questions.

  1. Is there a specific Blackwell-optimized docker launch command required to map the sm_110a device nodes correctly?
  2. Is the server.py in the dustynv/vila container the “Official Correct WebUI” for Thor, or should we be using a NanoLLM/MLC based service for Blackwell?
  3. How can we bypass the Error 801 when the hardware is natively sm_110a but the containerized Triton backend expects sm_100 or sm_120?

Here are some of the details I captured so that you can get clarity on the issue.

jetsonthor@jetsonthor:~$ nvidia-smi
Tue Jan 6 18:01:20 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.00                 Driver Version: 580.00         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA Thor                    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   N/A   N/A              N/A / N/A  |          Not Supported |      39%     Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2657      G   /usr/lib/xorg/Xorg                              0MiB |
|    0   N/A  N/A      3457      G   /usr/bin/gnome-shell                            0MiB |
|    0   N/A  N/A      5240      G   /usr/bin/gnome-text-editor                      0MiB |
+-----------------------------------------------------------------------------------------+
jetsonthor@jetsonthor:~$

jetsonthor@jetsonthor:~$ cat /proc/devices | grep nvidia
195 nvidia
195 nvidia-modeset
195 nvidiactl
488 nvidia-uvm
489 nvidia-nvswitch
490 nvidia-nvlink
491 nvidia-caps
492 nvidia-caps-imex-channels

jetsonthor@jetsonthor:~$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Dec 24 17:48 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Dec 24 17:48 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Dec 24 17:48 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Dec 24 17:48 /dev/nvidia-modeset
crw-rw-rw- 1 root root 488, 0 Dec 24 17:48 /dev/nvidia-uvm
crw-rw-rw- 1 root root 488, 1 Dec 24 17:48 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
cr-------- 1 root root 491, 1 Dec 24 17:48 nvidia-cap1
cr--r--r-- 1 root root 491, 2 Dec 24 17:48 nvidia-cap2
jetsonthor@jetsonthor:~$

jetsonthor@jetsonthor:~$ sudo dmesg | grep -i NVRM
[sudo] password for jetsonthor:
[ 13.585044] NVRM: devm_reset_control_get failed, err: -2
[ 13.585046] NVRM: devm_reset_control_get failed, err: -2
[ 13.585048] NVRM: mipi_cal devm_reset_control_get failed, err: -2
[ 13.589443] NVRM: loading NVIDIA UNIX Open Kernel Module for aarch64 TempVersion Release Build (bugfix_main) (buildbrain@5bf75f7d-240f-4779-b613-6ccb8a8ceac2-z7fr-wgfvp) Thu Aug 21 17:42:20 PDT 2025
[ 14.883717] NVRM: rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x731341 result 0xffff: Failure: Generic Error [NV_ERR_GENERIC]
[ 34.909502] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 51.145879] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 69.774508] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 129.773854] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 189.773573] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 249.773385] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 309.773101] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 335.254198] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 368.589065] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 407.693048] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706

Once I am in the container, any command that imports torch or initializes CUDA gives the 801 error:

jetsonthor@jetsonthor:~$ jetson-containers run --privileged --ipc=host $(autotag vila)
Namespace(packages=['vila'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=38.3.0 JETPACK_VERSION=7.1 CUDA_VERSION=13.0
-- Finding compatible container image for ['vila']
dustynv/vila:r36.4.0-cu128-24.04
V4L2_DEVICES:

DISPLAY environmental variable is already set: ":0"

localuser:root being added to access control list

ARM64 architecture detected

Jetson Detected

SYSTEM_ARCH=tegra-aarch64

  • docker run --runtime nvidia --env NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/jetsonthor/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 -v /run/jtop.sock:/run/jtop.sock --name jetson_container_20260106_180209 --privileged --ipc=host dustynv/vila:r36.4.0-cu128-24.04
    root@jetsonthor:/# python3 -c "import torch; torch.cuda.init()"
    Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 286, in init
    _lazy_init()
    File "/opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 319, in _lazy_init
    torch._C._cuda_init()
    RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 801: operation not supported
    root@jetsonthor:/#

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.