Unreliable cuInit in sandboxes and containers (CUDA cuInit: Unknown error)

Hello there, it took me some time to pinpoint this issue as it has been plaguing me for a while now.

I run all my applications either as flatpaks or via Docker / Podman containers: Blender, DaVinci Resolve, InvokeAI, Alpaca (Ollama UI), OBS Studio and more. On every fresh boot of my system, none of these applications are able to make use of CUDA or NVENC. OpenGL and Vulkan work fine.

Blender (flatpak) throws error: CUDA cuInit: Unknown error

DaVinci Resolve (podman container):
Claims that no supported GPU was found and does not display any errors.

  • Rocky Linux 8.9 distrobox container with the --nvidia flag to share the entire host driver with the container

OBS Studio (flatpak):

FFmpeg VAAPI HEVC encoding not supported
[NVENC] Test process failed: cuda_init_999
NVENC not supported
Failed to initialize module 'obs-nvenc.so'

cuda_init_999 corresponds to CUDA error 999 (CUDA_ERROR_UNKNOWN), also known as: CUDA cuInit: Unknown error

InvokeAI (podman container): torch.py CUDA cuInit: Unknown error

  • openSUSE Tumbleweed distrobox container with the --nvidia flag to share the entire host driver with the container

Running nvidia-smi works just fine and does not output any errors, even when run from a CUDA-“disabled” container, yet CUDA still remains unusable there. It shows the driver and CUDA version as expected, as well as the applications running on the GPU. GPU-intensive apps like games also work fine, whether started from a flatpak or from a container. Only CUDA seems to be affected.
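To make the difference between nvidia-smi and CUDA visible, a minimal probe like the following can be run inside the affected flatpak / container (just a sketch of mine, not part of any of the applications above; it needs the CUDA headers to build and links against the driver’s libcuda):

/* cuinit_probe.c - minimal CUDA driver API check (sketch).
 * Build with the CUDA headers installed:
 *   gcc cuinit_probe.c -o cuinit_probe -lcuda
 */
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    CUresult res = cuInit(0);          /* 0 is the only valid flag value */
    const char *name = "?";
    const char *desc = "?";

    cuGetErrorName(res, &name);        /* e.g. CUDA_ERROR_UNKNOWN */
    cuGetErrorString(res, &desc);      /* e.g. "unknown error"    */

    printf("cuInit(0) returned %d (%s: %s)\n", (int)res, name, desc);
    return res == CUDA_SUCCESS ? 0 : 1;
}

In the broken state this should print error 999 (CUDA_ERROR_UNKNOWN), which is exactly what Blender, OBS and torch report above, while nvidia-smi in the same environment stays perfectly happy.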

However, without rebooting or doing anything to the system itself, it will “self heal” at some random point, and all of a sudden every container and flatpak is able to use CUDA just fine, without restarting the containers, the flatpak sandbox or the system. Simply closing and reopening the application so that it runs cuInit once more is enough.

It was only today that I found out that this “self healing” can be sped up by simply running any CUDA-enabled application without containerisation or a sandbox, for example by downloading Blender right from blender.org and running the native Linux binary.

This makes me believe there is some obscure driver error at hand here, and that the driver somehow does not allow cuInit from containers or sandboxes until some non-sandboxed application has triggered it once.

Host System Specs:
OS: Aeon Desktop (based on openSUSE Tumbleweed)
Linux Kernel: 6.14.6-1-default
GPU: RTX 3080
Driver: 570.144
CUDA: 12.8
Podman: 5.4.2
flatpak: 1.16.0

NVIDIA-related systemd services:

● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; enabled; preset: enabled)
     Active: active (running) since Sat 2025-05-17 09:37:06 CEST; 35min ago
 Invocation: 9fa6c77c6dfc40ae857f10ef34cd2f74
    Process: 1369 ExecStart=/usr/bin/nvidia-persistenced --verbose (code=exited, status=0/SUCCESS)
   Main PID: 1424 (nvidia-persiste)
      Tasks: 1 (limit: 18476)
        CPU: 14ms
     CGroup: /system.slice/nvidia-persistenced.service
             └─1424 /usr/bin/nvidia-persistenced --verbose

Mai 17 09:37:06 makron systemd[1]: Starting NVIDIA Persistence Daemon...
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Verbose syslog connection opened
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Directory /var/run/nvidia-persistenced will not be removed on exit
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Started (1424)
Mai 17 09:37:06 makron nvidia-persistenced[1424]: device 0000:01:00.0 - registered
Mai 17 09:37:06 makron nvidia-persistenced[1424]: device 0000:01:00.0 - persistence mode enabled.
Mai 17 09:37:06 makron nvidia-persistenced[1424]: device 0000:01:00.0 - NUMA memory onlined.
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Local RPC services initialized
Mai 17 09:37:06 makron systemd[1]: Started NVIDIA Persistence Daemon.

○ nvidia-hibernate.service - NVIDIA system hibernate actions
     Loaded: loaded (/usr/lib/systemd/system/nvidia-hibernate.service; enabled; preset: enabled)
     Active: inactive (dead)
○ nvidia-powerd.service - nvidia-powerd service
     Loaded: loaded (/usr/lib/systemd/system/nvidia-powerd.service; enabled; preset: enabled)
     Active: inactive (dead) since Sat 2025-05-17 09:37:06 CEST; 35min ago
   Duration: 10ms
 Invocation: 29cc229c91ef487a8e04023133a50a3f
    Process: 1370 ExecStart=/usr/bin/nvidia-powerd (code=exited, status=1/FAILURE)
   Main PID: 1370 (code=exited, status=1/FAILURE)
        CPU: 6ms

Mai 17 09:37:06 makron systemd[1]: Started nvidia-powerd service.
Mai 17 09:37:06 makron /usr/bin/nvidia-powerd[1370]: nvidia-powerd version:1.0(build 1)
Mai 17 09:37:06 makron /usr/bin/nvidia-powerd[1370]: Found unsupported configuration. Exiting...
Mai 17 09:37:06 makron systemd[1]: nvidia-powerd.service: Deactivated successfully.
○ nvidia-suspend.service - NVIDIA system suspend actions
     Loaded: loaded (/usr/lib/systemd/system/nvidia-suspend.service; enabled; preset: enabled)
     Active: inactive (dead)
○ nvidia-resume.service - NVIDIA system resume actions
     Loaded: loaded (/usr/lib/systemd/system/nvidia-resume.service; enabled; preset: enabled)
     Active: inactive (dead)
○ nvidia-suspend-then-hibernate.service - NVIDIA actions for suspend-then-hibernate
     Loaded: loaded (/usr/lib/systemd/system/nvidia-suspend-then-hibernate.service; disabled; preset: disabled)
     Active: inactive (dead)

Tested container distros:
Rocky Linux 8.9
openSUSE Tumbleweed (latest snapshot)

Steps to (hopefully) reproduce:

  • Set up Linux
  • Fresh boot
  • Run any app from flatpak, such as Blender → Edit → Preferences → System: CUDA and OptiX require a GPU with compute capability 3.0 and 5.0 respectively, but cannot find any
  • or: run any app from a container with the shared NVIDIA host driver, such as InvokeAI: CUDA cuInit: Unknown error
  • Run Blender outside of flatpak and open Edit → Preferences → System: CUDA and OptiX will show a supported GPU (a minimal native cuInit call, sketched after this list, might serve the same purpose)
  • Run Blender from flatpak: Edit → Preferences → System: CUDA and OptiX will now also show a supported GPU
  • Run InvokeAI from a container: the GPU will also be picked up by torch just fine
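Instead of downloading Blender, a tiny native CUDA program should in principle work as the trigger as well, assuming any non-sandboxed cuInit call is enough (the file name below is just a placeholder). Run once directly on the host, it calls cuInit and lists the visible devices, roughly what Blender's device preferences do:

/* cuda_kickstart.c - sketch of a minimal native "kick start":
 * call cuInit() and list the visible devices.
 * Build: gcc cuda_kickstart.c -o cuda_kickstart -lcuda
 */
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    CUresult res = cuInit(0);
    if (res != CUDA_SUCCESS) {
        fprintf(stderr, "cuInit failed with error %d\n", (int)res);
        return 1;
    }

    int count = 0;
    cuDeviceGetCount(&count);
    printf("CUDA devices visible: %d\n", count);

    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        char name[256];
        if (cuDeviceGet(&dev, i) == CUDA_SUCCESS &&
            cuDeviceGetName(name, sizeof(name), dev) == CUDA_SUCCESS)
            printf("  device %d: %s\n", i, name);
    }
    return 0;
}

Afterwards, re-running the probe from above (or reopening Blender / InvokeAI) inside the flatpak or container is the actual test.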

I am very sorry for this issue report as it seems to be very obscure, but it is at least somewhat consistent. Also, I do not know what the driver does when the “self healing” occurs.

Bug report zip:
nvidia-bug-report.log.gz (553.7 KB)

Linux kernel 6.15.0 with the open kernel module and driver 570.153.02: the issue persists.

CUDA still needs to be “kick started” by a non-containerised / non-sandboxed application before it becomes usable in a container or flatpak sandbox.