Hello there, it took me some time to pinpoint this issue, as it has been plaguing me for a while now.
I run all my applications either as Flatpaks or in Docker/Podman containers: Blender, DaVinci Resolve, InvokeAI, Alpaca (Ollama UI), OBS Studio and more. On every fresh boot of my system, none of these applications are able to make use of CUDA or NVENC. OpenGL and Vulkan work fine.
Blender (flatpak) throws error: CUDA cuInit: Unknown error
DaVinci Resolve (podman container):
Claims no supported GPU was found and does not display any errors.
- Rocky Linux 8.9 distrobox container with the --nvidia flag to share the entire host driver with the container
OBS Studio (flatpak):
FFmpeg VAAPI HEVC encoding not supported
[NVENC] Test process failed: cuda_init_999
NVENC not supported
Failed to initialize module 'obs-nvenc.so'
cuda_init_999 is the same error: CUDA error code 999 is CUDA_ERROR_UNKNOWN, i.e. CUDA cuInit: Unknown error
InvokeAI (podman container): torch.py CUDA cuInit: Unknown error
- openSUSE Tumbleweed distrobox container with the --nvidia flag to share the entire host driver with the container
Running nvidia-smi works just fine and does not output any errors, even when run from a container whose CUDA is currently “broken”. It shows the driver and CUDA version as expected, as well as the applications running on the GPU, yet CUDA still cannot be used. GPU-intensive apps such as games also run fine, whether started as a Flatpak or from a container. Only CUDA seems affected by this.
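To take the individual applications out of the equation, a minimal cuInit probe should show the same behaviour. This is only a sketch (the include/library paths depend on where your CUDA toolkit lives), but on an affected boot I would expect it to print error 999 inside a sandbox while succeeding when run natively:

/* cuinit_probe.c -- minimal CUDA Driver API probe (sketch).
 * Build, assuming the CUDA toolkit headers are installed:
 *   gcc cuinit_probe.c -o cuinit_probe -I/usr/local/cuda/include -lcuda
 */
#include <stdio.h>
#include <cuda.h>

int main(void) {
    CUresult res = cuInit(0);              /* the call the sandboxed apps fail on */
    if (res != CUDA_SUCCESS) {
        const char *name = NULL, *desc = NULL;
        cuGetErrorName(res, &name);        /* e.g. CUDA_ERROR_UNKNOWN */
        cuGetErrorString(res, &desc);
        fprintf(stderr, "cuInit failed: %d (%s: %s)\n",
                (int)res, name ? name : "?", desc ? desc : "?");
        return 1;
    }
    int count = 0;
    cuDeviceGetCount(&count);              /* how many devices the driver exposes */
    printf("cuInit OK, %d CUDA device(s) visible\n", count);
    return 0;
}

Running the same binary once natively and once inside a distrobox --nvidia container makes the two cases easy to compare without involving Blender or torch.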
However, without rebooting or touching the system itself, it will at some random point “self heal”, and all of a sudden every container and Flatpak can use CUDA just fine, without restarting the containers, the Flatpak sandbox, or the system. Simply closing and reopening the application, so that it runs cuInit once more, is enough.
It was only today that I found out that this “self healing” can be sped up by running any CUDA-enabled application without containerisation or a sandbox, for example by downloading Blender straight from blender.org and running the native Linux binary.
This makes me believe there is some obscure driver error at hand that somehow does not allow cuInit from containers or sandboxes until some non-sandboxed application has triggered it once.
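This is only a guess on my part, but as far as I know cuInit relies on the setuid nvidia-modprobe helper to create the /dev/nvidia-uvm device nodes on demand, and a sandboxed process may not be able to trigger that. So it could be worth comparing which device nodes exist on the host before and after the “self healing”. A tiny check (sketch only):

/* devnode_check.c -- list which NVIDIA device nodes currently exist (sketch). */
#include <stdio.h>
#include <sys/stat.h>

int main(void) {
    const char *nodes[] = {
        "/dev/nvidiactl",
        "/dev/nvidia0",
        "/dev/nvidia-uvm",
        "/dev/nvidia-uvm-tools",
        "/dev/nvidia-modeset",
    };
    for (size_t i = 0; i < sizeof nodes / sizeof nodes[0]; i++) {
        struct stat st;
        printf("%-24s %s\n", nodes[i],
               stat(nodes[i], &st) == 0 ? "present" : "missing");
    }
    return 0;
}

If /dev/nvidia-uvm only shows up after a native CUDA application has run once, that would narrow the problem down considerably.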
Host System Specs:
OS: Aeon Desktop (based on openSUSE Tumbleweed)
Linux Kernel: 6.14.6-1-default
GPU: RTX 3080
Driver: 570.144
CUDA: 12.8
Podman: 5.4.2
flatpak: 1.16.0
NVIDIA-related systemd services:
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; enabled; preset: enabled)
Active: active (running) since Sat 2025-05-17 09:37:06 CEST; 35min ago
Invocation: 9fa6c77c6dfc40ae857f10ef34cd2f74
Process: 1369 ExecStart=/usr/bin/nvidia-persistenced --verbose (code=exited, status=0/SUCCESS)
Main PID: 1424 (nvidia-persiste)
Tasks: 1 (limit: 18476)
CPU: 14ms
CGroup: /system.slice/nvidia-persistenced.service
└─1424 /usr/bin/nvidia-persistenced --verbose
Mai 17 09:37:06 makron systemd[1]: Starting NVIDIA Persistence Daemon...
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Verbose syslog connection opened
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Directory /var/run/nvidia-persistenced will not be removed on exit
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Started (1424)
Mai 17 09:37:06 makron nvidia-persistenced[1424]: device 0000:01:00.0 - registered
Mai 17 09:37:06 makron nvidia-persistenced[1424]: device 0000:01:00.0 - persistence mode enabled.
Mai 17 09:37:06 makron nvidia-persistenced[1424]: device 0000:01:00.0 - NUMA memory onlined.
Mai 17 09:37:06 makron nvidia-persistenced[1424]: Local RPC services initialized
Mai 17 09:37:06 makron systemd[1]: Started NVIDIA Persistence Daemon.
○ nvidia-hibernate.service - NVIDIA system hibernate actions
Loaded: loaded (/usr/lib/systemd/system/nvidia-hibernate.service; enabled; preset: enabled)
Active: inactive (dead)
○ nvidia-powerd.service - nvidia-powerd service
Loaded: loaded (/usr/lib/systemd/system/nvidia-powerd.service; enabled; preset: enabled)
Active: inactive (dead) since Sat 2025-05-17 09:37:06 CEST; 35min ago
Duration: 10ms
Invocation: 29cc229c91ef487a8e04023133a50a3f
Process: 1370 ExecStart=/usr/bin/nvidia-powerd (code=exited, status=1/FAILURE)
Main PID: 1370 (code=exited, status=1/FAILURE)
CPU: 6ms
Mai 17 09:37:06 makron systemd[1]: Started nvidia-powerd service.
Mai 17 09:37:06 makron /usr/bin/nvidia-powerd[1370]: nvidia-powerd version:1.0(build 1)
Mai 17 09:37:06 makron /usr/bin/nvidia-powerd[1370]: Found unsupported configuration. Exiting...
Mai 17 09:37:06 makron systemd[1]: nvidia-powerd.service: Deactivated successfully.
○ nvidia-suspend.service - NVIDIA system suspend actions
Loaded: loaded (/usr/lib/systemd/system/nvidia-suspend.service; enabled; preset: enabled)
Active: inactive (dead)
○ nvidia-resume.service - NVIDIA system resume actions
Loaded: loaded (/usr/lib/systemd/system/nvidia-resume.service; enabled; preset: enabled)
Active: inactive (dead)
○ nvidia-suspend-then-hibernate.service - NVIDIA actions for suspend-then-hibernate
Loaded: loaded (/usr/lib/systemd/system/nvidia-suspend-then-hibernate.service; disabled; preset: disabled)
Active: inactive (dead)
Tested container distros:
Rocky Linux 8.9
openSUSE Tumbleweed (latest snapshot)
Steps to (hopefully) reproduce:
- Set up Linux
- Fresh boot
- Run any CUDA-capable app from Flatpak, such as Blender → Edit → Preferences → System: CUDA and OptiX will state that they require a GPU with compute capability 3.0 and 5.0 respectively but cannot find any
- or: run any app from a container with the shared NVIDIA host driver, such as InvokeAI: CUDA cuInit: Unknown error
- Run Blender outside of Flatpak and open Edit → Preferences → System: CUDA and OptiX will show a supported GPU
- Run Blender from Flatpak: Edit → Preferences → System: CUDA and OptiX will now also show a supported GPU
- Run InvokeAI from a container: the GPU will now also be picked up by torch just fine
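The GUI steps above can probably be replaced by the small cuInit probe from earlier in this post: run it inside the Flatpak/container sandbox (on a fresh boot it should fail with error 999), run it once natively on the host, then run it inside the sandbox again (it should now succeed).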
I am sorry that this issue report is so obscure, but the behaviour is at least somewhat consistent. I also do not know what the driver does when the “self healing” occurs.
Bug report zip:
nvidia-bug-report.log.gz (553.7 KB)