PRIME render offloading not working through amdgpu

I have a dual-GPU system with an RTX 4070 and a Polaris Radeon (a 440, I think?). This is an openSUSE Tumbleweed system where the RTX is set up to be detached from Linux and passed to a Windows 10 VM. All of that works just fine. Each card is connected to a different input on the monitor: the Radeon via HDMI->HDMI and the RTX via DP->HDMI through a converter. I have stubbed out nvidia-drm so it does not load, which means Xorg never grabs the NVIDIA card for a display.
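
For reference, the stub is just a modprobe override along these lines (the filename is my own choice):

# /etc/modprobe.d/99-nvidia-drm-stub.conf
# blacklist only prevents automatic loading; the install line also turns
# an explicit "modprobe nvidia-drm" into a no-op
blacklist nvidia_drm
install nvidia_drm /bin/true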

Any attempt to use render offloading (either Vulkan or GL) throws some sort of error. My understanding is that nvidia-drm is only needed for driving a display, which I am not interested in having the card do on the host (Linux) system. If that is incorrect, I would appreciate the correction. Otherwise, I am very confused as to why the RTX can be attached to the system, can be used for non-offloading tasks, and can be selected for offloading, yet offloading errors out.

Modules loaded:

nvidia_modeset       1605632  0
nvidia_uvm           6610944  0
nvidia              60370944  2 nvidia_uvm,nvidia_modeset
video                  77824  4 asus_wmi,amdgpu,asus_nb_wmi,nvidia_modeset

Confirmation that the RTX is accessible (and that the system can use its CUDA cores):

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070        Off |   00000000:08:00.0 Off |                  N/A |
| 30%   44C    P2             27W /  200W |    2137MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     15742      C   python3                                      2132MiB |
+-----------------------------------------------------------------------------------------+

Attempts to offload GLX rendering:

> __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears
X Error of failed request:  BadAlloc (insufficient resources for operation)
  Major opcode of failed request:  151 (GLX)
  Minor opcode of failed request:  5 (X_GLXMakeCurrent)
  Serial number of failed request:  0
  Current serial number in output stream:  36

> __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep renderer
X Error of failed request:  BadAlloc (insufficient resources for operation)
  Major opcode of failed request:  151 (GLX)
  Minor opcode of failed request:  5 (X_GLXMakeCurrent)
  Serial number of failed request:  0
  Current serial number in output stream:  31

Attempts to offload Vulkan rendering:

> __NV_PRIME_RENDER_OFFLOAD=1 vkcube
Selected GPU 0: NVIDIA GeForce RTX 4070, type: DiscreteGpu
Segmentation fault (core dumped)

journalctl output (backtrace from the vkcube crash, truncated):

#3  0x00007faadce2b8a1 n/a (libnvidia-glcore.so.550.54.14 + 0xe2b8a1)
#4  0x00007faadc9f5924 n/a (libnvidia-glcore.so.550.54.14 + 0x9f5924)
#5  0x00007faade292bb2 start_thread (libc.so.6 + 0x92bb2)
#6  0x00007faade31400c __clone3 (libc.so.6 + 0x11400c)

Stack trace of thread 13849:
#0  0x00007faade28effe __futex_abstimed_wait_common (libc.so.6 + 0x8effe)
#1  0x00007faade291d40 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x91d40)
#2  0x00007faad8a38b3b n/a (libvulkan_radeon.so + 0x238b3b)
#3  0x00007faad8a485f7 n/a (libvulkan_radeon.so + 0x2485f7)
#4  0x00007faade292bb2 start_thread (libc.so.6 + 0x92bb2)
#5  0x00007faade31400c __clone3 (libc.so.6 + 0x11400c)

Stack trace of thread 13852:
#0  0x00007faade28effe __futex_abstimed_wait_common (libc.so.6 + 0x8effe)
#1  0x00007faade291d40 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x91d40)
#2  0x00007faadc9f340c n/a (libnvidia-glcore.so.550.54.14 + 0x9f340c)
#3  0x00007faadce26255 n/a (libnvidia-glcore.so.550.54.14 + 0xe26255)
#4  0x00007faadc9f5924 n/a (libnvidia-glcore.so.550.54.14 + 0x9f5924)
#5  0x00007faade292bb2 start_thread (libc.so.6 + 0x92bb2)
#6  0x00007faade31400c __clone3 (libc.so.6 + 0x11400c)

Stack trace of thread 13853:
#0  0x00007faade28effe __futex_abstimed_wait_common (libc.so.6 + 0x8effe)
#1  0x00007faade292065 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x92065)
#2  0x00007faadc9f346c n/a (libnvidia-glcore.so.550.54.14 + 0x9f346c)
#3  0x00007faadce1009d n/a (libnvidia-glcore.so.550.54.14 + 0xe1009d)
#4  0x00007faadc9f5924 n/a (libnvidia-glcore.so.550.54.14 + 0x9f5924)
#5  0x00007faade292bb2 start_thread (libc.so.6 + 0x92bb2)
#6  0x00007faade31400c __clone3 (libc.so.6 + 0x11400c)

Stack trace of thread 13851:
#0  0x00007faade28effe __futex_abstimed_wait_common (libc.so.6 + 0x8effe)
#1  0x00007faade292065 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x92065)
#2  0x00007faadc9f346c n/a (libnvidia-glcore.so.550.54.14 + 0x9f346c)
#3  0x00007faadce3c681 n/a (libnvidia-glcore.so.550.54.14 + 0xe3c681)
#4  0x00007faadc9f5924 n/a (libnvidia-glcore.so.550.54.14 + 0x9f5924)
#5  0x00007faade292bb2 start_thread (libc.so.6 + 0x92bb2)
#6  0x00007faade31400c __clone3 (libc.so.6 + 0x11400c)

Stack trace of thread 13854:
#0  0x00007faade28effe __futex_abstimed_wait_common (libc.so.6 + 0x8effe)
#1  0x00007faade292065 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x92065)
#2  0x00007faadc9f346c n/a (libnvidia-glcore.so.550.54.14 + 0x9f346c)
#3  0x00007faadcf076d4 n/a (libnvidia-glcore.so.550.54.14 + 0xf076d4)
#4  0x00007faadcef47b6 n/a (libnvidia-glcore.so.550.54.14 + 0xef47b6)
#5  0x00007faadc9f5924 n/a (libnvidia-glcore.so.550.54.14 + 0x9f5924)
#6  0x00007faade292bb2 start_thread (libc.so.6 + 0x92bb2)
#7  0x00007faade31400c __clone3 (libc.so.6 + 0x11400c)
ELF object binary architecture: AMD x86-64

DRM stands for Direct Rendering Manager; without nvidia-drm, rendering won't work.

Doh, thanks for setting me straight.

I only stubbed out the module because it was impossible to unload even when the RTX was excluded from driving displays, which in turn meant the RTX couldn't be detached for the VM. Any thoughts on how to achieve that without restarting the Linux host DE?
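
Concretely, what I'd like to be able to do while the session stays up is the manual equivalent of virt-manager's detach/reattach, something like the following (the GPU address is the one nvidia-smi reports; detaching the card's .1 audio function as well is an assumption on my part):

> virsh nodedev-detach pci_0000_08_00_0
> virsh nodedev-detach pci_0000_08_00_1

and later, to hand the card back to the host:

> virsh nodedev-reattach pci_0000_08_00_1
> virsh nodedev-reattach pci_0000_08_00_0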

That won't work, for several reasons.

If the module isn't driving a display and has no processes attached to it, why wouldn't you be able to unload it like the other NVIDIA kernel modules? I get that it gets upset when you try; I'm just struggling to understand why.
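
For what it's worth, this is roughly how I've been checking whether anything still has the card open before attempting the unload:

> sudo fuser -v /dev/nvidia*
> lsmod | grep nvidia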

What you want is a hot-unplug situation, which is shitty on Linux: something always keeps a hold on the GPU and won't let go. When it comes to hot unplug, Xorg is bad at it and so is Wayland; the Linux DRM infrastructure is bad at it and so is the NVIDIA driver.

I did a bit more research and I have an update: as you said, nvidia-drm is necessary, so I'm no longer stubbing it. Instead I am turning NVIDIA KMS off and removing the EGL spec file (15_nvidia_gbm.json) from /usr/share/egl/egl_external_platform.d/. This prevents Wayland and GNOME from grabbing the NVIDIA card. It can now be swapped without any issues within the same DM session using virt-manager's automatic detach/attach process.
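
In practical terms the two pieces are a modprobe option and moving the JSON out of the loader's search path (the modprobe filename and the parking directory are my own choices):

# /etc/modprobe.d/09-nvidia-kms.conf
options nvidia-drm modeset=0

> mkdir -p /usr/local/share/egl-disabled
> mv /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json /usr/local/share/egl-disabled/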

I still can't get PRIME render offloading working when the RTX is attached to the host. Same errors as before. However, this time all of the driver modules (nvidia, *-drm, *-uvm, *-modeset) are loaded. Is KMS on the nvidia-drm module a requirement for PRIME offloading?
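
For completeness, the nvidia-drm KMS state can be read back from sysfs (N means it is off):

> cat /sys/module/nvidia_drm/parameters/modeset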

I have seen at least one report of a similar setup working by explicitly specifying the EGL/GLX vendor file. I tried redirecting to the path where I moved the EGL file and no dice. Doing the same with Vulkan is also suggested, but for some reason Tumbleweed doesn't ship an NVIDIA ICD (even on my Optimus laptop with fully functional PRIME offloading, I can't find one anywhere).
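
For reference, the overrides I know of are the standard glvnd and Vulkan-loader environment variables; the sort of invocation I have been trying looks like this (the JSON paths are the stock install locations, and the Vulkan one is where an ICD would live if Tumbleweed shipped it):

> __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json glxinfo | grep renderer
> __NV_PRIME_RENDER_OFFLOAD=1 VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json vkcube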

That's one of the issues. While disabling nvidia-drm KMS is a prerequisite for unloading the NVIDIA GPU, it is at the same time needed for many offloading scenarios. There are also subtle differences depending on whether Intel (i915) or AMD (amdgpu) is the target.
The Vulkan ICD should be in /usr/share/vulkan/icd.d

That's where I'm confused, because there are reports of that setup working. I wonder if it's only Vulkan offloading that works. I haven't been able to test that since I can't find an ICD file to aim it at; I'll keep looking into that.
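
If it comes to it, the ICD file the NVIDIA driver normally installs is tiny, so hand-writing one should be possible; from memory it looks roughly like this (the api_version is a guess for the 550 series):

{
    "file_format_version": "1.0.0",
    "ICD": {
        "library_path": "libGLX_nvidia.so.0",
        "api_version": "1.3.277"
    }
}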

I do wonder if there's any way to completely exclude a GPU from being picked up by Wayland. I know there isn't (really) in X11, but the NVIDIA support for Wayland is so new that I honestly have no idea. I would think keeping nvidia-drm KMS loaded while finding some way to keep EGL and Wayland from attaching to the card would work; like I said, I have no idea what that would look like.

I suppose another option would be to use Bumblebee and eat the performance cost, since Bumblebee runs its own display that can be destroyed without impacting the user session.