I realise this isn’t a support forum for nouveau, but my question specifically relates to the behaviour of nvidia-modprobe.
I’ve not provided logs or a bug report here, as I’m more generally interested in how to tweak the observed behaviour of the NVIDIA userspace driver components (shared libraries).
I’m running Arch Linux and I frequently switch between driver combinations to test various development scenarios.
I have drivers installed for nvidia (545.29.06) + nouveau (mesa) + amdgpu.
I switch between these using kernel boot options to blacklist as necessary. This has worked well for me on Fedora 38/39 and I’m now broadening my development/test-bed surface.
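To make the blacklisting step concrete, here is a minimal Python sketch that reports which modules a kernel command line asks to blacklist. It assumes the blacklist is passed as module_blacklist=... or modprobe.blacklist=... (the exact boot option is not stated above), and blacklisted_modules is a name made up for illustration.

```python
# Sketch only: assumes blacklisting is done with module_blacklist= or
# modprobe.blacklist= on the kernel command line (comma-separated names).

def blacklisted_modules(cmdline: str) -> set[str]:
    """Return the module names blacklisted in a kernel command line string."""
    prefixes = ("module_blacklist=", "modprobe.blacklist=")
    found: set[str] = set()
    for token in cmdline.split():
        for prefix in prefixes:
            if token.startswith(prefix):
                for name in token[len(prefix):].split(","):
                    if name:
                        # The kernel treats '-' and '_' in module names alike.
                        found.add(name.replace("-", "_"))
    return found
```

Feeding it the contents of /proc/cmdline shows what the currently booted kernel was told, which is a quick sanity check before running vulkaninfo.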
I’ve found that when blacklisting the nvidia drivers and running a simple diagnostic (vulkaninfo, eglinfo, gbminfo) the relevant ICD is attempting to poll the nvidia driver to see what’s available.
That’s all normal and fine: it does the same for every other ICD I have installed, and if a driver isn’t running then it is by definition not available.
Except for nvidia :)
When I run one of these (vulkaninfo in particular), the ICD loader wants to call out to libGLX_nvidia.so.0, which then automagically tries to load the NVIDIA drivers on demand by way of nvidia-modprobe.
That behaviour is documented as follows:
“If the user-space NVIDIA driver component cannot load the kernel module or create the device files itself, it will attempt to invoke the setuid root nvidia-modprobe utility, which will perform these operations on behalf of the non-privileged driver.”
While that sounds like a great idea, in this scenario I’m not using nvidia drivers. I blacklisted them at boot time and am actively using nouveau drivers. Obviously, it fails (correctly) to load the nvidia drivers.
vulkaninfo (and friends) report:
ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get 'vkCreateInstance' via 'vk_icdGetInstanceProcAddr' for ICD libGLX_nvidia.so.0
Now, that’s probably not what I’d expect (though I can live with it). I have radeon ICDs installed, and no such error is triggered when I’m not using the AMD drivers.
However, vulkaninfo appears to query the driver many times in succession. This causes nvidia-modprobe to be invoked 32 times, and it takes a solid 42 seconds (and some amount of load) for nvidia-modprobe to repeatedly fail to load drivers that are never going to load.
My question is… (finally)
How can I best change this behaviour in this arrangement?
Option 1)
I can temporarily move the nvidia ICDs out of the way. This works and has the desired effect, but isn’t a robust solution since I’m switching between drivers frequently.
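A per-process variant of Option 1 might avoid moving files at all. The sketch below is untested against this setup and rests on several assumptions: that the Vulkan loader honours VK_DRIVER_FILES (or the older, deprecated VK_ICD_FILENAMES), that manifests live in the usual icd.d directories, and that the NVIDIA manifest has "nvidia" in its file name. The function names are hypothetical.

```python
import glob
import os

# Assumed search locations; a given distribution may use others.
ICD_DIRS = ("/usr/share/vulkan/icd.d", "/etc/vulkan/icd.d")


def discover_manifests(dirs=ICD_DIRS):
    """Collect every *.json ICD manifest found in the given directories."""
    found = []
    for d in dirs:
        found.extend(sorted(glob.glob(os.path.join(d, "*.json"))))
    return found


def non_nvidia_manifests(paths):
    """Drop manifests whose file name mentions nvidia (naming is assumed)."""
    return [p for p in paths if "nvidia" not in os.path.basename(p).lower()]


def vulkan_env(manifests, base_env):
    """Return a copy of base_env that restricts the loader to `manifests`."""
    env = dict(base_env)
    joined = os.pathsep.join(manifests)
    env["VK_DRIVER_FILES"] = joined   # newer loaders
    env["VK_ICD_FILENAMES"] = joined  # older loaders (deprecated name)
    return env
```

Running vulkaninfo through subprocess.run with env=vulkan_env(non_nvidia_manifests(discover_manifests()), os.environ) would then hide the NVIDIA ICD for that one process only. This covers the Vulkan loader alone; eglinfo goes through libglvnd, which reportedly reads __EGL_VENDOR_LIBRARY_FILENAMES for the same purpose, but that is not verified here.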
Option 2)
I can move nvidia-modprobe out of the way. The ICD loader still throws an error, but it does so quickly and without anything trying to spin up the nvidia drivers. Unfortunately this also prevents the ICD loader from working correctly even when the nvidia drivers are loaded (when I want them to be).
So it seems there is a hard dependency on nvidia-modprobe.
Is there some other way I can:
a) Tell the nvidia libs (libGLX_nvidia.so.0 and friends) not to call out to nvidia-modprobe, or
b) Tell nvidia-modprobe to do nothing if the nvidia modules aren’t already loaded?
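The condition in (b) is at least easy to test from userspace. As a sketch (not something nvidia-modprobe itself is known to offer), the hypothetical helper below checks /proc/modules content for a loaded module; a wrapper script could use it to decide whether the NVIDIA ICD should be exposed at all.

```python
def module_loaded(name: str, proc_modules: str) -> bool:
    """True if `name` is listed as a loaded module in /proc/modules text."""
    # Module names appear with underscores in /proc/modules.
    wanted = name.replace("-", "_")
    for line in proc_modules.splitlines():
        fields = line.split()
        # The first whitespace-separated field on each line is the module name.
        if fields and fields[0] == wanted:
            return True
    return False
```

Passing it the text read from /proc/modules and the name "nvidia" answers whether the driver is already present without triggering any module load.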
I did take a look at setting NVreg_ModifyDeviceFiles=0, but I’m not sure it applies to this situation (and it did not appear to help as a kernel boot option).
It seems there must already be a mechanism to do something along these lines, since I do not encounter this behaviour on Fedora Linux (on the same machine). More interestingly, they do not appear to distribute nvidia-modprobe as part of their packages. It isn’t clear to me exactly how this is achieved, but I suspect it has something to do with their nvidia distribution utilising akmods. Certainly I don’t encounter this issue on Fedora when the nvidia drivers are blacklisted.
Any guidance or correction in my understanding would be much appreciated, thank you.