Unreliable cuInit in sandboxes and containers (CUDA cuInit: Unknown error)

Hello there, it took me some time to pinpoint this issue, as it has been plaguing me for a while now.

I run all my applications either as flatpaks or via docker / podman containers: Blender, DaVinci Resolve, InvokeAI, Alpaca (Ollama UI), OBS Studio and more. On every fresh boot of my system, none of these applications is able to make use of CUDA or NVENC. OpenGL and Vulkan work fine.

Blender (flatpak) throws error: CUDA cuInit: Unknown error

DaVinci Resolve (podman container):
It claims that no supported GPU was found and does not display any further errors.

  • Rocky Linux 8.9 distrobox container with the --nvidia flag to share the entire host driver with the container

OBS Studio (flatpak):

FFmpeg VAAPI HEVC encoding not supported
[NVENC] Test process failed: cuda_init_999
NVENC not supported
Failed to initialize module 'obs-nvenc.so'

cuda_init_999 is also known as: CUDA cuInit: Unknown error

InvokeAI (podman container): torch.py CUDA cuInit: Unknown error

  • openSUSE Tumbleweed distrobox container with the --nvidia flag to share the entire host driver with the container

Running nvidia-smi works just fine and outputs no errors, even when run from a container in this CUDA-“disabled” state. It shows the driver and CUDA version as expected, as well as the applications running on the GPU, yet CUDA itself remains unusable. GPU-intensive apps such as games also run fine, whether from flatpak or a container. Only CUDA seems affected by this.

However, without rebooting or doing anything to the system itself, it will at some random time “self heal”, and all of a sudden every container and flatpak is able to use CUDA just fine, without restarting the containers, restarting the flatpak sandbox, or rebooting the system. Just closing and reopening the application so it runs cuInit once more is enough.

It was only today that I found out that one can speed up this “self healing” process by simply running any CUDA-enabled application without containerisation or a sandbox, e.g. downloading Blender right from blender.org and running the native Linux binary.

This makes me believe there is some obscure driver error at hand here: it somehow does not allow cuInit from containers or sandboxes unless some non-sandboxed application has triggered it once.
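To narrow down what the “self heal” actually changes, one could snapshot the driver state right after boot and again once CUDA starts working, then diff the two outputs. This is just a sketch using standard procfs/devfs paths; it assumes only that the NVIDIA kernel modules are named nvidia*:

```shell
#!/bin/sh
# Sketch: snapshot the NVIDIA driver state so the "before self-heal" and
# "after self-heal" outputs can be diffed.
snapshot_nvidia_state() {
    echo "--- loaded nvidia kernel modules ---"
    grep '^nvidia' /proc/modules 2>/dev/null || echo "(none)"
    echo "--- /dev/nvidia* device nodes ---"
    ls -l /dev/nvidia* 2>/dev/null || echo "(none)"
}
snapshot_nvidia_state
```

If an extra module or device node shows up only in the “healed” snapshot, that would point at what the non-sandboxed application triggers.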

Host System Specs:
OS: Aeon Desktop (based on openSUSE Tumbleweed)
Linux Kernel: 6.14.6-1-default
GPU: RTX 3080
Driver: 570.144
CUDA: 12.8
Podman: 5.4.2
flatpak: 1.16.0

NVIDIA-related systemd services:

● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; enabled; preset: enabled)
     Active: active (running) since Sat 2025-05-17 09:37:06 CEST; 35min ago
 Invocation: 9fa6c77c6dfc40ae857f10ef34cd2f74
    Process: 1369 ExecStart=/usr/bin/nvidia-persistenced --verbose (code=exited, status=0/SUCCESS)
   Main PID: 1424 (nvidia-persiste)
      Tasks: 1 (limit: 18476)
        CPU: 14ms
     CGroup: /system.slice/nvidia-persistenced.service
             └─1424 /usr/bin/nvidia-persistenced --verbose

Mai 17 09:37:06 makron systemd[1]: Starting NVIDIA Persistence Daemon...
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Verbose syslog connection opened
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Directory /var/run/nvidia-persistenced will not be removed on exit
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Started (1424)
Mai 17 09:37:06 makron nvidia-persistenced[1424]: device 0000:01:00.0 - registered
Mai 17 09:37:06 makron nvidia-persistenced[1424]: device 0000:01:00.0 - persistence mode enabled.
Mai 17 09:37:06 makron nvidia-persistenced[1424]: device 0000:01:00.0 - NUMA memory onlined.
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Local RPC services initialized
Mai 17 09:37:06 makron systemd[1]: Started NVIDIA Persistence Daemon.

○ nvidia-hibernate.service - NVIDIA system hibernate actions
     Loaded: loaded (/usr/lib/systemd/system/nvidia-hibernate.service; enabled; preset: enabled)
     Active: inactive (dead)
○ nvidia-powerd.service - nvidia-powerd service
     Loaded: loaded (/usr/lib/systemd/system/nvidia-powerd.service; enabled; preset: enabled)
     Active: inactive (dead) since Sat 2025-05-17 09:37:06 CEST; 35min ago
   Duration: 10ms
 Invocation: 29cc229c91ef487a8e04023133a50a3f
    Process: 1370 ExecStart=/usr/bin/nvidia-powerd (code=exited, status=1/FAILURE)
   Main PID: 1370 (code=exited, status=1/FAILURE)
        CPU: 6ms

Mai 17 09:37:06 makron systemd[1]: Started nvidia-powerd service.
Mai 17 09:37:06 makron /usr/bin/nvidia-powerd[1370]: nvidia-powerd version:1.0(build 1)
Mai 17 09:37:06 makron /usr/bin/nvidia-powerd[1370]: Found unsupported configuration. Exiting...
Mai 17 09:37:06 makron systemd[1]: nvidia-powerd.service: Deactivated successfully.
○ nvidia-suspend.service - NVIDIA system suspend actions
     Loaded: loaded (/usr/lib/systemd/system/nvidia-suspend.service; enabled; preset: enabled)
     Active: inactive (dead)
○ nvidia-resume.service - NVIDIA system resume actions
     Loaded: loaded (/usr/lib/systemd/system/nvidia-resume.service; enabled; preset: enabled)
     Active: inactive (dead)
○ nvidia-suspend-then-hibernate.service - NVIDIA actions for suspend-then-hibernate
     Loaded: loaded (/usr/lib/systemd/system/nvidia-suspend-then-hibernate.service; disabled; preset: disabled)
     Active: inactive (dead)

Tested container distros:
Rocky Linux 8.9
openSUSE Tumbleweed (latest snapshot)

Steps to (hopefully) reproduce:

  • Setup Linux
  • Fresh boot
  • Run any app from flatpak, such as Blender → Edit → Preferences → System: CUDA and OptiX will require a GPU with compute capability 3.0 and 5.0 respectively but cannot find any
  • or: run any app from a container with the shared NVIDIA host driver, such as InvokeAI: CUDA cuInit: Unknown error
  • Run Blender outside of flatpak and open Edit → Preferences → System: CUDA and OptiX will show a supported GPU
  • Run Blender from flatpak: Edit → Preferences → System: CUDA and OptiX will now also show a supported GPU
  • Run InvokeAI from a container: the GPU will also be picked up by torch just fine
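The steps above can be sketched on the command line. Blender’s Flathub app ID is assumed to be org.blender.Blender and ./blender is a placeholder for the native binary; the `|| true` guards only keep the sketch going where a step is expected to fail on a fresh boot:

```shell
#!/bin/sh
# Sketch of the reproduction sequence (fresh boot assumed).
repro_sketch() {
    # 1. On a fresh boot, cuInit fails inside the sandbox even though the
    #    GPU device nodes may be visible:
    flatpak run --command=ls org.blender.Blender /dev 2>/dev/null | grep nvidia || true
    # 2. "Kick start" CUDA once outside any sandbox, e.g. render the
    #    default cube with the native Blender binary from blender.org
    #    (./blender is an assumed path):
    ./blender -b -E CYCLES -f 1 -- --cycles-device CUDA >/dev/null 2>&1 || true
    # 3. Retry from the sandbox: CUDA/OptiX devices should now show up in
    #    Edit -> Preferences -> System:
    flatpak run org.blender.Blender >/dev/null 2>&1 || true
    echo "repro sketch finished"
}
repro_sketch
```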

I am very sorry that this issue report seems so obscure, but it is at least somewhat consistent. I also do not know what the driver does when the “self healing” occurs.

Bug report zip:
nvidia-bug-report.log.gz (553.7 KB)

Linux kernel 6.15.0 with the open kernel module and driver 570.153.02: the issue persists.

CUDA still needs to be “kick started” by a non-containerised / non-sandboxed application before it is usable in a container or flatpak sandbox.

The issue also exists with KDE Plasma 6:

KDE: 6.4.3
Kernel: 6.15.7
nVidia: 570.172.08 (open kernel module)
CUDA: 12.8

I still have to run Blender once outside of flatpak and enable CUDA there; afterwards I can use CUDA in flatpak-based applications as well, such as Blender, OBS Studio and Steam (for games making use of CUDA, e.g. for PhysX).

nvidia-bug-report.log.gz (444.2 KB)

I can confirm that this workaround did the trick. Please give us a fix.

Driver: 580.76.05 (Open Kernel Module)
Kernel: 6.16.1

Still affected by this bug.

This issue meanwhile also affects DirectX 12 games running via VKD3D, namely Senua’s Saga: Hellblade II and Satisfactory, which both complain about “missing DX12 support” while in fact they simply seem unable to use some compute features. After “kick starting” CUDA via the method mentioned above, these games magically start working again.

Please … it is beyond ridiculous right now. Someone at NVIDIA, please look into this, or at least speed up proper open source driver support. Your choice.

Given the A.I. hype, it is at least in your own interest to have CUDA working with podman, docker and flatpaks…

For the record:

Driver 580.82.07 still suffers from this issue as well.

To reiterate the affected workloads:

  • Ollama with Alpaca for local chat AI (flatpak)
  • InvokeAI (e.g. Stable Diffusion and other image-generation models) via podman
  • Upscayl to AI-enhance and upscale images (flatpak)
  • Blender using Cycles (flatpak)
  • Games using DirectX 12 that make use of additional compute features, running via flatpak
    • Senua’s Saga: Hellblade II
    • Satisfactory
    • The Talos Principle II
    • I suppose any Unreal Engine 5 game is then affected by this, which would skyrocket this list
  • DaVinci Resolve running on Rocky Linux 8 via podman
  • OBS Studio using NVENC (flatpak)

All of these can be fixed by “kick starting” CUDA outside of the sandbox / container and WITHOUT a system reboot. They just start working.

Specs:

  • Kernel: 6.16.7
  • Flatpak: 1.16.1
  • podman: 5.6.0
  • Distros: Aeon, Kalpa, EndeavourOS

It seems to be a regression affecting distributions that ship a more up-to-date software stack, e.g. rolling releases. Point releases will therefore only catch up with it in time.

Since Rocky Linux 8 is on the list, and in that case the container shares the kernel with the host, this is more likely an issue with recent kernels than with the other software involved.

Hunt Showdown 1896 has joined the ranks of games that do not work without the CUDA kick-start when run from flatpak.

I assume it has something to do with compute shaders.

  • Driver Version: 580.95.05
  • CUDA Version: 13.0

Still affected.

nvidia-bug-report.log.gz (468.6 KB)

  • 580.105.08
  • CUDA 13.0
  • Operating System: Kalpa Desktop 20251109
  • KDE Plasma Version: 6.5.2
  • KDE Frameworks Version: 6.19.0
  • Qt Version: 6.10.0
  • Kernel Version: 6.17.7-1-default (64-bit)
  • Graphics Platform: Wayland
  • Processors: 16 × AMD Ryzen 7 7800X3D 8-Core Processor
  • Memory: 16 GiB of RAM (15.2 GiB usable)
  • Graphics Processor: NVIDIA GeForce RTX 3080
  • flatpak: 1.16.1

Issue partially fixed.

The underlying issue is still there. I have only tested flatpak with this driver so far, but here are the symptoms:

After rebooting or cold-booting the system and then launching a CUDA-enabled application (tested with Blender from Flathub), said application will fail cuInit with an unknown error, as described multiple times throughout this thread.

Before 580.105.08 I had to run an unsandboxed version of Blender, the one from their website, go to Edit → Preferences → System, and enable CUDA. Or, if CUDA had been enabled once before, it was enough to just open this dialogue.

After that, the flatpak Blender was able to use CUDA as well.

After the update to 580.105.08 it is now enough to just run nvidia-smi once. This somehow fixes the flatpak CUDA issue as well. I tested this 3 times with consistent results.

Another note: to circumvent this error, I had previously written a little startup script which instructs the unsandboxed Blender to render the default cube using CUDA, so that CUDA gets initialized at login time for my user. This reliably “fixed” the error as well; since then I never had to manually kick-start CUDA as described above.

I’ll now change the kickstart script to just run nvidia-smi and see if that works as well.
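For reference, such a login kick-start could look roughly like this (a sketch: $HOME/blender/blender is an assumed install path, and the pre-580.105.08 fallback follows the default-cube trick described above):

```shell
#!/bin/sh
# Hypothetical login kick-start: touch the driver once outside any
# sandbox so flatpaks/containers can use CUDA afterwards.
cuda_kickstart() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Reported sufficient from driver 580.105.08 on:
        nvidia-smi >/dev/null 2>&1
        echo "kickstart: nvidia-smi"
    elif [ -x "$HOME/blender/blender" ]; then
        # Older fallback: render the default cube once on the CUDA device.
        "$HOME/blender/blender" -b -E CYCLES -f 1 -- --cycles-device CUDA >/dev/null 2>&1
        echo "kickstart: blender"
    else
        echo "kickstart: no CUDA-capable tool found"
    fi
}
cuda_kickstart
```

Hooked into the desktop’s autostart, this runs once per login, which is exactly when the kick is needed.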

I attached the most recent bug report:

nvidia-bug-report.log.gz (457.8 KB)

Hi, I also still have this issue with flatpak applications accessing CUDA before any non-flatpak native binary has properly initialized it.
I am trying to run Parsec (a game-streaming app) with NVIDIA hardware decoding on the latest Fedora; it is only available as a flatpak or a deb (broken on Fedora).
After a few frustrating days of diagnosing the problem, it turns out to be the NVIDIA driver runtime’s fault (cuInit, probably). If I run the app on a fresh boot:
parsec
[D 2025-12-21 22:28:13] log: Parsec release13 (150-101, Service: -1, Loader: 12)
[D 2025-12-21 22:28:13] MTY_DeleteFile: 'remove' failed with errno 39
[D 2025-12-21 22:28:13] log: Parsec getting initial user data.
[D 2025-12-21 22:28:13] log: Parsec got initial user data.
[2 2025-12-21 22:28:13] Force Relay Mode: Off
[2 2025-12-21 22:28:13] Force Relay Mode: Off
[2 2025-12-21 22:28:13] UPNP: upnp_create
[AVHWDeviceContext @ 0x7f22a071f280] cu->cuInit(0) failed → CUDA_ERROR_UNKNOWN: unknown error
[AVHWDeviceContext @ 0x7f22a0728940] cu->cuInit(0) failed → CUDA_ERROR_UNKNOWN: unknown error
[D 2025-12-21 22:28:13] Client status changed to: -3
[AVHWDeviceContext @ 0x7f22a07397c0] libva: /usr/lib/x86_64-linux-gnu/dri/nvidia-vaapi-driver/nvidia_drv_video.so init failed
[AVHWDeviceContext @ 0x7f22a07397c0] Failed to initialise VAAPI connection: 1 (operation failed).
[AVHWDeviceContext @ 0x7f22a2972cc0] libva: /usr/lib/x86_64-linux-gnu/dri/nvidia-vaapi-driver/nvidia_drv_video.so init failed
[AVHWDeviceContext @ 0x7f22a2972cc0] Failed to initialise VAAPI connection: 1 (operation failed).
[D 2025-12-21 22:28:21] UPNP: No devlist

It fails: cuInit returns CUDA_ERROR_UNKNOWN, and the nvidia_drv_video.so VAAPI layer fails to initialise.

However, running nvidia-smi just before solves the issue:

nvidia-smi
Sun Dec 21 22:29:04 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.119.02             Driver Version: 580.119.02     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 …    Off |   00000000:01:00.0  On |                  N/A |
|  0%   36C    P8              8W /  250W |     285MiB /   8192MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1363      G   /usr/libexec/Xorg                        55MiB |
|    0   N/A  N/A            2792      G   /usr/lib64/firefox/firefox              209MiB |
+-----------------------------------------------------------------------------------------+
worker@fedora:~$ parsec
[D 2025-12-21 22:29:06] log: Parsec release13 (150-101, Service: -1, Loader: 12)
[D 2025-12-21 22:29:06] MTY_DeleteFile: 'remove' failed with errno 39
[D 2025-12-21 22:29:06] log: Parsec getting initial user data.
[D 2025-12-21 22:29:06] log: Parsec got initial user data.
[2 2025-12-21 22:29:06] Force Relay Mode: Off
[2 2025-12-21 22:29:06] Force Relay Mode: Off
[2 2025-12-21 22:29:06] UPNP: upnp_create
[D 2025-12-21 22:29:06] Client status changed to: -3
[D 2025-12-21 22:29:14] Client status changed to: 20
[D 2025-12-21 22:29:14] UPNP: No devlist
[3 2025-12-21 22:29:15] Sent candidate.
[3 2025-12-21 22:29:15] CANDEX: LAN 192.168.88.3:21682
[2 2025-12-21 22:29:16] Adding LAN Candidate from peer (1); 192.168.88.3:21682
[3 2025-12-21 22:29:16] sent: {"action":"candex","version":1,"payload":{"attempt_id":"d2b8dd47-00a7f90b-e98fa494-905ee722-bfcbff6e-51d9c3ed","data":{"lan":true,"port":30806,"ver_data":1,"versions":{"bud":1,"control":1,"p2p":1,"audio":1,"init":1,"video":1},"from_stun":false,"sync":false,"ip":"192.168.88.2"},"to":"376eH12iaH8NwX717lfrUl2irHx"}}
[3 2025-12-21 22:29:16] CANDEX: LAN 192.168.88.3:21682
[2 2025-12-21 22:29:16] Adding LAN Candidate from peer (1); 192.168.88.3:21682
[2 2025-12-21 22:29:16] Adding LAN Candidate from peer (2); ::ffff:192.168.88.3:21682
[D 2025-12-21 22:29:16] net           = BUD|::ffff:192.168.88.3|21682
[D 2025-12-21 22:29:16] BUD AES_GCM   = 256
[3 2025-12-21 22:29:16] CANDEX: LAN 2a04:241e:106:3d00:7046:d6e2:488c:8bea:21682
[2 2025-12-21 22:29:18] Rejecting LAN Candidate from peer; 2a04:241e:106:3d00:7046:d6e2:488c:8bea:21682
[3 2025-12-21 22:29:18] CANDEX: WAN ::ffff:78.96.85.248:57876
[2 2025-12-21 22:29:18] Rejecting WAN Candidate from peer; ::ffff:78.96.85.248:57876
[D 2025-12-21 22:29:18] FFMPEG 7 NVIDIA
[4 2025-12-21 22:29:18] FFMPEG 7.0.0 testing hw type 2
[2 2025-12-21 22:29:19] FFMPEG 7.0.0 hw type 2
[2 2025-12-21 22:29:19] FFMPEG format 23
[I 2025-12-21 22:29:25] Host’s virtual microphone is disabled
[D 2025-12-21 22:29:42] Client status changed to: -3

Driver Version: 580.119.02 CUDA Version: 13.0

This happens with earlier versions as well, at least as far back as 570; I could not try lower.

Yes, I found this too with some of the recent driver updates. I added nvidia-smi to my autostart so it gets run once I log into my computer. Since then I have never run into this issue.

However, I noticed that nvidia_uvm seems to be the problem: before nvidia-smi runs, it is not loaded and not visible inside the flatpak. Afterwards it magically appears.
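If nvidia_uvm really is the missing piece, that can be checked directly, and for testing the hypothesis the module can be loaded explicitly. A sketch; the modules-load.d drop-in is the standard systemd mechanism for loading a module at boot, though whether early loading fully replaces the kick start is untested here:

```shell
#!/bin/sh
# Sketch: check whether nvidia_uvm and its device node are present.
# cuInit needs /dev/nvidia-uvm, while plain nvidia-smi queries (NVML)
# do not -- which would explain nvidia-smi working while cuInit fails.
uvm_state() {
    if grep -q '^nvidia_uvm' /proc/modules 2>/dev/null && [ -e /dev/nvidia-uvm ]; then
        echo "nvidia_uvm: loaded"
    else
        echo "nvidia_uvm: not loaded"
    fi
}
uvm_state
# To load it once by hand (as root):     modprobe nvidia_uvm
# To load it on every boot via systemd:  echo nvidia_uvm > /etc/modules-load.d/nvidia-uvm.conf
```

Running the check before and after the nvidia-smi kick would confirm whether the module (and its device node) appearing is really what changes.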