Can’t rebind GPU with ‘driverctl’ if system booted with GPU attached to nvidia driver.
Please consider this an issue to look into. I didn’t find an issue tracker to post this to, but if there’s a better place, just let me know.
I’ve found a workaround (shown here) and am not expecting a quick/easy answer. However, if I’m missing a step that would avoid the issue altogether, I’d welcome a reply.
I came across this issue while moving from Windows to Debian Linux. I’m setting up my system to use VFIO for a Windows VM.
Notes:
- Issue related to binding the driver, not related to any problems with graphics on desktop or games.
- I don’t have X installed in any form at this time
- I don’t currently have any functional VMs, this is a step I wanted to get through prior to setting those up
Given the process I’m about to outline works fine when using nouveau and vfio-pci, I’m guessing there’s something with the nvidia-driver package causing this behavior.
Relevant virtualization package install line (system is Debian 11 Bullseye):
apt -y install qemu-kvm qemu-utils libvirt-daemon-system libvirt-clients virt-manager ovmf driverctl
At this point, with no ‘nvidia-driver’ package installed, I can successfully rebind the GPU between ‘nouveau’ and ‘vfio-pci’ to my heart’s content via:
# 3080 Ti: first = GPU, second = HD Audio
driverctl set-override 0000:0b:00.0 vfio-pci
driverctl set-override 0000:0b:00.1 vfio-pci
driverctl set-override 0000:0b:00.0 nouveau
driverctl set-override 0000:0b:00.1 snd_hda_intel
Likewise I can dynamically switch back to the defaults (nouveau & hda audio) like this:
driverctl unset-override 0000:0b:00.0
driverctl unset-override 0000:0b:00.1
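To confirm which driver each function actually ended up on after a set/unset, the sysfs `driver` symlink can be read directly. This is a minimal sketch, not part of driverctl: `bound_driver` and the `SYSFS_PCI` variable are names made up here (the variable only exists so the function can be pointed at a fake tree for testing).

```shell
#!/bin/sh
# Print the driver currently bound to a PCI device, or "none" if unbound.
# SYSFS_PCI is a hypothetical override; by default the real sysfs path is used.
bound_driver() {
    dev="${SYSFS_PCI:-/sys/bus/pci/devices}/$1"
    if [ -e "$dev/driver" ]; then
        # The driver entry is a symlink into /sys/bus/pci/drivers/<name>
        basename "$(readlink -f "$dev/driver")"
    else
        echo none
    fi
}

# Example for the two functions of the card in this post:
# bound_driver 0000:0b:00.0
# bound_driver 0000:0b:00.1
```

`driverctl list-overrides` shows the saved overrides, which is a separate question from what is bound right now.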
Where the issue happens:
If no override is set (either because I never ran the commands above, or because I ran ‘unset-override’ to go back to a stock config) and I do this:
apt -y install nvidia-driver firmware-misc-nonfree
and reboot, the system will come up with the GPU bound to the nvidia driver, like I’m sure most bare metal GPU users would expect. And this is where it gets ugly.
At this point if I try to:
driverctl set-override 0000:0b:00.0 vfio-pci
or
driverctl set-override 0000:0b:00.0 nouveau
the ‘driverctl’ command hangs hard. It can’t be killed; I have to open a new shell and issue a reboot to clear it.
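One thing that might be worth checking before attempting the rebind is whether anything is holding the nvidia kernel modules in use. This is only a guess at a contributing factor, not a confirmed cause. The sketch below reads the module use counts from sysfs; `module_refcnt` and `SYSFS_MODULE` are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Print the use count of a loaded kernel module.
# SYSFS_MODULE is a hypothetical override; by default the real sysfs path is used.
module_refcnt() {
    f="${SYSFS_MODULE:-/sys/module}/$1/refcnt"
    if [ -r "$f" ]; then
        cat "$f"
    else
        # No refcnt file: module is not loaded (or is built into the kernel)
        echo "not loaded"
    fi
}

# Example: show use counts for the usual nvidia module stack
# for m in nvidia_drm nvidia_modeset nvidia; do
#     printf '%s: %s\n' "$m" "$(module_refcnt "$m")"
# done
```

A non-zero count on `nvidia` at boot, compared with the count after the workaround below, would at least show whether the two situations differ.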
BUT
Workaround:
If I boot with an override (vfio-pci or nouveau), I -can- successfully:
driverctl --nosave set-override 0000:0b:00.0 nvidia
IMPORTANT: add the ‘--nosave’ so that, on a reboot, the system doesn’t bind the GPU to the nvidia driver.
And at this point I can rebind between nvidia / vfio-pci / nouveau / etc multiple times without issue.
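The switching described above can be wrapped in a small helper so that ‘--nosave’ is never forgotten. This is a sketch under the assumption that the system was booted with a vfio-pci or nouveau override already in place; `rebind_nosave` and the `DRIVERCTL` variable are hypothetical names, with the variable only there to allow a dry run.

```shell
#!/bin/sh
# Rebind one PCI function to a driver without persisting the override.
# DRIVERCTL is a hypothetical variable: set it to "echo" for a dry run.
rebind_nosave() {
    # $1 = PCI address, $2 = driver name
    "${DRIVERCTL:-driverctl}" --nosave set-override "$1" "$2"
}

# Run as root after booting with a vfio-pci or nouveau override in place:
# rebind_nosave 0000:0b:00.0 nvidia
# rebind_nosave 0000:0b:00.0 vfio-pci
```

With `DRIVERCTL=echo` the function only prints the arguments it would pass, which is a safe way to check the address and driver name first.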
Conclusion:
I can bind/unbind the nvidia driver dynamically so long as the system doesn’t attach to it at boot.
Why? I don’t know.
It’s possible this is an issue specific to driverctl but it doesn’t currently feel like it.
FWIW, I have a thread on Level1Techs where I’ll probably be more active, and where I’ll post anything relevant for other users who find this through a search.