New drivers broke Tesla P40 WDDM mode with KVM GPU passthrough

I'm running a Tesla P40 on a Linux KVM host, with PCI passthrough to both Windows 10 and Windows 11 VMs.

When I install any driver newer than 514 and switch the card to WDDM mode, the VM BSODs on reboot.

Windows 10: VIDEO_TDR_FAILURE nvlddmkm.sys
Windows 11: Error 225 “A required device isn’t connected or can’t be accessed”

NVIDIA GRID driver 514 or earlier seems to be fine. Any GRID or standard driver later than that causes the errors above.

Please advise.
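A minimal sketch of the version boundary reported above, assuming the driver version is a `major.minor` string like the one `nvidia-smi` prints, and that "514 or less" refers to the major branch number. The function name `reported_as_affected` is hypothetical; it only encodes what this post reports, not an official NVIDIA compatibility rule.

```python
def reported_as_affected(driver_version: str) -> bool:
    """Return True if the driver branch is newer than 514.

    Encodes only the boundary reported in this thread (514 and earlier work,
    anything later BSODs in WDDM mode); it is not an official rule.
    """
    # Take the major branch number, e.g. "514" from "514.08".
    major = int(driver_version.strip().split(".")[0])
    return major > 514


if __name__ == "__main__":
    # Illustrative version strings only.
    for v in ("472.39", "514.08", "515.01"):
        print(v, "affected" if reported_as_affected(v) else "ok")
```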

Hi, please open a support ticket with NVES. Your setup requires vGPU licensing, so you are eligible for support; please use that route to get the issue investigated.

I tried loading the 16.2 (535) vGPU Linux KVM drivers on the underlying host and got this after installing the module. No luck there either, and no other modules are interfering.

[ 406.795521] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 535.129.03 Thu Oct 19 18:56:32 UTC 2023
[ 406.918612] failing symbol_get of non-GPLONLY symbol nvidia_vgpu_vfio_get_ops.
[ 406.918613] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
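To check whether a host is hitting this same failure, a small helper can scan kernel log text (for example, saved `dmesg` output or the tail of `/var/log/nvidia-installer.log`) for the exact message quoted above. This is a sketch, and the function name is hypothetical. Judging only from the wording of the message, the kernel refused `symbol_get()` because the requested symbol is not exported GPL-only; that reading is an assumption, not something confirmed in this thread.

```python
import re

# Matches the kernel message seen above, e.g.
# "failing symbol_get of non-GPLONLY symbol nvidia_vgpu_vfio_get_ops."
SYMBOL_GET_FAILURE = re.compile(r"failing symbol_get of non-GPLONLY symbol (\w+)")


def find_symbol_get_failures(log_text: str) -> list[str]:
    """Return the symbol names the kernel refused to hand out via symbol_get."""
    symbols = []
    for line in log_text.splitlines():
        match = SYMBOL_GET_FAILURE.search(line)
        if match:
            symbols.append(match.group(1))
    return symbols


if __name__ == "__main__":
    sample = (
        "[ 406.918612] failing symbol_get of non-GPLONLY symbol nvidia_vgpu_vfio_get_ops.\n"
        "[ 406.918613] [nvidia-vgpu-vfio] Unable to get symbol for "
        "nvidia_vgpu_vfio_get_ops from nvidia.ko\n"
    )
    print(find_symbol_get_failures(sample))  # ['nvidia_vgpu_vfio_get_ops']
```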

Having the same issue…

Debian 12 (Bookworm)
Kernel 6.1.0-13


ERROR: Unable to load the kernel module 'nvidia-vgpu-vfio.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sou

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: Invalid argument
-> Kernel messages:
[   13.124182] audit: type=1400 audit(1701031203.576:26): apparmor="STATUS" operation="profile_load" profile="unconfined" name="containers-default-0.33.4" pid=5162 comm="apparmor_p
[   14.256687] audit: type=1400 audit(1701031204.708:27): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-aafbd28a-d5ea-402e-a25d-b6450afc8b6c" pid=61
[   14.338979] audit: type=1400 audit(1701031204.792:28): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-aafbd28a-d5ea-402e-a25d-b6450afc8b6c" pid
[   14.422360] audit: type=1400 audit(1701031204.872:29): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-aafbd28a-d5ea-402e-a25d-b6450afc8b6c" pid
[   14.560626] audit: type=1400 audit(1701031205.012:30): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-4800897d-9ee1-44b6-a028-1eeb86cc43d6" pid=63
[   14.639981] audit: type=1400 audit(1701031205.092:31): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-4800897d-9ee1-44b6-a028-1eeb86cc43d6" pid
[   14.719984] audit: type=1400 audit(1701031205.172:32): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-4800897d-9ee1-44b6-a028-1eeb86cc43d6" pid
[   14.803550] audit: type=1400 audit(1701031205.256:33): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-26565782-dbc0-4646-a57d-3770fc721080" pid=63
[   14.881579] audit: type=1400 audit(1701031205.332:34): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-26565782-dbc0-4646-a57d-3770fc721080" pid
[   14.962175] audit: type=1400 audit(1701031205.412:35): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-26565782-dbc0-4646-a57d-3770fc721080" pid
[   15.056273] tun: Universal TUN/TAP device driver, 1.6
[   15.056741] br0: port 3(vnet0) entered blocking state
[   15.056748] br0: port 3(vnet0) entered disabled state
[   15.056783] device vnet0 entered promiscuous mode
[   15.056837] br0: port 3(vnet0) entered blocking state
[   15.056843] br0: port 3(vnet0) entered forwarding state
[   15.577181] x86/split lock detection: #AC: CPU 1/KVM/6433 took a split_lock trap at address: 0x7efce050
[   27.100979] NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000000)
[  124.117408] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[  124.118287] nvidia 0000:01:00.0: enabling device (0000 -> 0002)
[  124.234040] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.129.03  Thu Oct 19 18:56:32 UTC 2023
[  124.239964] failing symbol_get of non-GPLONLY symbol nvidia_vgpu_vfio_get_ops.
[  124.239965] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
[  133.689014] nvidia-nvlink: Unregistered Nvlink Core, major device number 235

Found a solution: downgrading to kernel 6.1.0-11 fixes it.
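Based only on what this thread reports (6.1.0-11 works, 6.1.0-13 fails, 6.1.0-12 untested here), a quick sketch can tell whether the running Debian 12 kernel is newer than the last version reported to work. It assumes the Debian `6.1.0-N-flavour` release naming returned by `platform.release()`; the function names are hypothetical, and the comparison is meaningless for kernels from other distributions.

```python
import platform
import re

# Last Debian 12 kernel ABI reported working in this thread (6.1.0-11).
LAST_REPORTED_GOOD = (6, 1, 0, 11)


def debian_kernel_abi(release: str) -> tuple[int, int, int, int]:
    """Parse a Debian kernel release such as '6.1.0-13-amd64' into (6, 1, 0, 13)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"not a Debian-style kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def newer_than_reported_good(release: str) -> bool:
    """True if the kernel is newer than the last version reported to work.

    Only meaningful for Debian's 6.1.0-N kernel series.
    """
    return debian_kernel_abi(release) > LAST_REPORTED_GOOD


if __name__ == "__main__":
    running = platform.release()
    try:
        if newer_than_reported_good(running):
            print(f"{running}: newer than 6.1.0-11, may hit the symbol_get failure")
        else:
            print(f"{running}: not newer than 6.1.0-11")
    except ValueError as err:
        print(err)
```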