Hi Nvidia team,
Thanks for this update. It took me some time to experiment with these suggestions due to the year-end holidays.
I now understand that NVIDIA suggests the VLM WebUI instead of the normal stable-diffusion WebUI (which is in the installation guide).
I am attempting to run the VILA VLM WebUI on a Jetson AGX Thor (Blackwell) using the dustynv/vila:r36.4.0-cu128-24.04 container on JetPack 7.1. While NVIDIA previously recommended using the VLM WebUI over Stable Diffusion for this platform, the VLM stack currently fails to initialize the GPU.
I have also attempted to find the nano_llm or vila.serve.live_llm modules within the dustynv/vila container, but they appear to be missing from the environment path, forcing the use of the raw server.py in /opt/VILA.
Primary Error: Inside the container, any attempt to access the GPU (even a simple torch.cuda.init()) results in: RuntimeError: Unexpected error from cudaGetDeviceCount(). Error 801: operation not supported
Key Technical Details:
- Hardware: Jetson AGX Thor (Blackwell / sm_110a)
- Software Stack: JetPack 7.1, CUDA 13.0, Triton 3.6.0
- Container: dustynv/vila:r36.4.0-cu128-24.04 (Ubuntu 24.04)

Observations:
- The server.py script fails because Triton 3.6.0/DeepSpeed cannot find an "active driver."
- Manual torch.cuda calls fail with Error 801.
- Standard jetson-containers launch flags (including --privileged and --runtime nvidia) do not seem to resolve the Blackwell-specific UVM/handshake issue.
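For completeness, this is the minimal probe I run inside the container to reproduce and triage the failure. The `classify_cuda_failure` helper is just my own string matcher for sorting the error messages, not an NVIDIA or PyTorch API; only the `torch.cuda.init()` call is the actual reproducer:

```python
# Minimal CUDA bring-up probe used inside the container.
# classify_cuda_failure() is a hypothetical helper of mine for triage;
# only torch.cuda.init() is the real reproducer.

def classify_cuda_failure(message: str) -> str:
    """Map a CUDA init error message to a rough category for triage."""
    msg = message.lower()
    if "error 801" in msg or "operation not supported" in msg:
        return "unsupported-operation (likely UVM/driver handshake)"
    if "no cuda-capable device" in msg:
        return "device-not-visible (device nodes not mapped)"
    return "unknown"

def probe() -> None:
    try:
        import torch
        torch.cuda.init()
        print("CUDA OK:", torch.cuda.get_device_name(0))
    except ImportError:
        print("torch not installed in this environment")
    except Exception as exc:  # RuntimeError on Thor, AssertionError on CPU-only wheels
        print("CUDA init failed:", classify_cuda_failure(str(exc)))

if __name__ == "__main__":
    probe()
```

On my Thor, this prints the "unsupported-operation" category, matching the Error 801 traceback shown further below.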
Below are my questions:
- Is there a specific Blackwell-optimized docker launch command required to map the sm_110a device nodes correctly?
- Is the server.py in the dustynv/vila container the official, correct WebUI for Thor, or should we be using a NanoLLM/MLC-based service for Blackwell?
- How can we bypass Error 801 when the hardware is natively sm_110a but the containerized Triton backend expects sm_100 or sm_120?
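For what it's worth, this is the manual launch variant I have been experimenting with for the first question, passing the Thor device nodes explicitly instead of relying solely on the runtime to enumerate them. This is only a sketch of what I tried; I do not know whether it is the sanctioned Blackwell recipe, which is exactly what I am asking:

```shell
# Experimental manual launch: map the GPU device nodes explicitly
# in addition to --runtime nvidia. (Sketch only -- not a confirmed
# Blackwell recipe; the image tag is the one autotag resolves to.)
docker run -it --rm \
  --runtime nvidia \
  --network host --ipc=host --shm-size=8g \
  --device /dev/nvidia0 \
  --device /dev/nvidiactl \
  --device /dev/nvidia-modeset \
  --device /dev/nvidia-uvm \
  --device /dev/nvidia-uvm-tools \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  dustynv/vila:r36.4.0-cu128-24.04 \
  python3 -c "import torch; torch.cuda.init(); print(torch.cuda.get_device_name(0))"
```

This still fails with the same Error 801 on my unit.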
Below are the details I captured so that you can get a clear picture of the issue.
jetsonthor@jetsonthor:~$ nvidia-smi
Tue Jan 6 18:01:20 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.00 Driver Version: 580.00 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA Thor Off | 00000000:01:00.0 Off | N/A |
| N/A N/A N/A N/A / N/A | Not Supported | 39% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2657 G /usr/lib/xorg/Xorg 0MiB |
| 0 N/A N/A 3457 G /usr/bin/gnome-shell 0MiB |
| 0 N/A N/A 5240 G /usr/bin/gnome-text-editor 0MiB |
+-----------------------------------------------------------------------------------------+
jetsonthor@jetsonthor:~$
jetsonthor@jetsonthor:~$ cat /proc/devices | grep nvidia
195 nvidia
195 nvidia-modeset
195 nvidiactl
488 nvidia-uvm
489 nvidia-nvswitch
490 nvidia-nvlink
491 nvidia-caps
492 nvidia-caps-imex-channels
jetsonthor@jetsonthor:~$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Dec 24 17:48 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Dec 24 17:48 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Dec 24 17:48 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Dec 24 17:48 /dev/nvidia-modeset
crw-rw-rw- 1 root root 488, 0 Dec 24 17:48 /dev/nvidia-uvm
crw-rw-rw- 1 root root 488, 1 Dec 24 17:48 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
cr-------- 1 root root 491, 1 Dec 24 17:48 nvidia-cap1
cr--r--r-- 1 root root 491, 2 Dec 24 17:48 nvidia-cap2
jetsonthor@jetsonthor:~$
jetsonthor@jetsonthor:~$ sudo dmesg | grep -i NVRM
[sudo] password for jetsonthor:
[ 13.585044] NVRM: devm_reset_control_get failed, err: -2
[ 13.585046] NVRM: devm_reset_control_get failed, err: -2
[ 13.585048] NVRM: mipi_cal devm_reset_control_get failed, err: -2
[ 13.589443] NVRM: loading NVIDIA UNIX Open Kernel Module for aarch64 TempVersion Release Build (bugfix_main) (buildbrain@5bf75f7d-240f-4779-b613-6ccb8a8ceac2-z7fr-wgfvp) Thu Aug 21 17:42:20 PDT 2025
[ 14.883717] NVRM: rpcRmApiControl_dce: NVRM_RPC_DCE: Failed RM ctrl call cmd:0x731341 result 0xffff: Failure: Generic Error [NV_ERR_GENERIC]
[ 34.909502] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 51.145879] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 69.774508] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 129.773854] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 189.773573] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 249.773385] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 309.773101] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 335.254198] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 368.589065] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
[ 407.693048] NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
Once I am inside the container, any command that imports torch or touches CUDA returns the 801 error:
jetsonthor@jetsonthor:~$ jetson-containers run --privileged --ipc=host $(autotag vila)
Namespace(packages=['vila'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=38.3.0 JETPACK_VERSION=7.1 CUDA_VERSION=13.0
-- Finding compatible container image for ['vila']
dustynv/vila:r36.4.0-cu128-24.04
V4L2_DEVICES:
DISPLAY environmental variable is already set: ":0"
localuser:root being added to access control list
ARM64 architecture detected
Jetson Detected
SYSTEM_ARCH=tegra-aarch64
- docker run --runtime nvidia --env NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/jetsonthor/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 -v /run/jtop.sock:/run/jtop.sock --name jetson_container_20260106_180209 --privileged --ipc=host dustynv/vila:r36.4.0-cu128-24.04
root@jetsonthor:/# python3 -c "import torch; torch.cuda.init()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 286, in init
    _lazy_init()
  File "/opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 319, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 801: operation not supported
root@jetsonthor:/#
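Finally, to test the sm_100/sm_120 vs sm_110a mismatch hypothesis from my third question, I use this small sketch. `supports_compute_major` is my own helper, not a torch API; `torch.cuda.get_arch_list()` reports whatever architectures the container's PyTorch wheel was compiled for:

```python
# Check whether the container's PyTorch wheel ships kernels for a given
# compute-capability major (e.g. 110 for Thor's sm_110a).
# supports_compute_major() is my own helper, not a torch API.

def supports_compute_major(arch_list, major: int) -> bool:
    """True if any compiled arch matches sm_<major>, with or without an 'a' suffix."""
    wanted = f"sm_{major}"
    return any(a.startswith(wanted) for a in arch_list)

if __name__ == "__main__":
    try:
        import torch
        archs = torch.cuda.get_arch_list()  # e.g. ['sm_100', 'sm_120'] in this container
        print("compiled archs:", archs)
        print("covers sm_110:", supports_compute_major(archs, 110))
    except Exception as exc:  # torch missing, or CUDA init fails as above
        print("could not query arch list:", exc)
```

If the wheel really lacks sm_110 kernels, that would explain the Triton backend complaint, though I suspect the Error 801 happens earlier, at driver handshake, before kernel selection even matters.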