570 release feedback & discussion

I recently found a reproducer:

Playing Star Wars Outlaws fairly reliably “leaks” VRAM in the driver in a way that nvidia-smi doesn’t show. It is usually followed by errors like this:

Mär 21 02:20:32 jupiter kernel: NVRM: Xid (PCI:0000:01:00): 69, pid=7566, name=chrome, Class Error: ChId 00b0, Class 0000902d, Offset 0000023c, Data 00000000, ErrorCode 00000004
Mär 21 02:20:33 jupiter kernel: NVRM: Xid (PCI:0000:01:00): 69, pid=82105, name=chrome, Class Error: ChId 00b0, Class 0000902d, Offset 0000023c, Data 00000000, ErrorCode 00000004
Mär 21 02:20:34 jupiter kernel: NVRM: Xid (PCI:0000:01:00): 69, pid=82137, name=chrome, Class Error: ChId 00b0, Class 0000902d, Offset 0000023c, Data 00000000, ErrorCode 00000004
Mär 21 02:20:34 jupiter kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
Mär 21 02:20:34 jupiter kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
Mär 21 02:20:34 jupiter kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
Mär 21 02:20:34 jupiter kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
Mär 21 02:20:35 jupiter kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
Mär 21 02:20:35 jupiter kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
Mär 21 02:20:35 jupiter kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
Mär 21 02:20:35 jupiter kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
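
For anyone comparing logs, here is a minimal sketch for tallying these two kinds of messages from a kernel log. The function name `summarize` and the regular expression are my own assumptions, written to match the lines quoted above; other driver versions may format Xid messages differently.

```python
import re
from collections import Counter

# Pattern written against the Xid lines quoted above (an assumption, not an
# official format description). pid is matched loosely in case it is not numeric.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>[^,]+), name=(?P<name>[^,]+)"
)
NVKMS_FAIL = "Failed to allocate NVKMS memory for GEM object"


def summarize(lines):
    """Count Xid events per (xid code, process name) and NVKMS allocation failures."""
    xids = Counter()
    nvkms_failures = 0
    for line in lines:
        match = XID_RE.search(line)
        if match:
            xids[(int(match["xid"]), match["name"])] += 1
        elif NVKMS_FAIL in line:
            nvkms_failures += 1
    return xids, nvkms_failures


# Possible usage, feeding in saved kernel messages (e.g. from `journalctl -k`):
#   with open("kernel.log", encoding="utf-8") as f:
#       xids, failures = summarize(f)
#   print(xids.most_common(), failures)
```

Grouping by process name makes it easier to see whether the Xid errors hit only one application or everything that touches the GPU.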

After quitting the game and restarting every process that nvidia-smi listed as using VRAM, around 500 MB were still missing. DXVK games don’t seem to trigger the missing-memory behavior, maybe because they make far fewer Vulkan memory allocations thanks to the chunk allocator?
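
A rough sketch of how such a gap could be measured: read the GPU-wide used memory and subtract the per-process figures. This assumes the GPU-wide `memory.used` value still includes the leaked memory, which may not hold on every driver version. The helper names are hypothetical, and the per-process numbers have to be read off the nvidia-smi process table by hand.

```python
import subprocess


def gpu_used_mib(smi_output):
    """Parse output of `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    (one integer in MiB per GPU, one GPU per line)."""
    return [int(line) for line in smi_output.splitlines() if line.strip()]


def unaccounted_mib(total_used_mib, per_process_mib):
    """VRAM reported as used on the GPU but not attributed to any listed process."""
    return total_used_mib - sum(per_process_mib)


def query_used_mib():
    """Run nvidia-smi and return used VRAM in MiB for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return gpu_used_mib(result.stdout)


# Possible usage (the per-process values are placeholders, not measurements):
#   total = query_used_mib()[0]
#   print(unaccounted_mib(total, [300, 400]))
```

Taking the same reading before starting the game and again after quitting it would show whether the unattributed amount grows across sessions.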

Anyway, after the NVKMS errors in dmesg, I usually also see kernel page fault errors and have to reboot. So the driver probably not only leaks VRAM but also somehow corrupts other kernel memory structures:

Mär 22 03:22:56 jupiter kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Mär 22 03:22:56 jupiter kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Mär 22 03:22:56 jupiter kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF

I am not able to reproduce the latter without the NVIDIA driver, or on a system without NVIDIA hardware but with an otherwise very similar configuration. It happens with both DXVK and vkd3d games.

This has become both better and worse with the latest 570 driver series: while the desktop no longer randomly renders black windows or crashes completely, I can usually trigger this problem within 24 hours.

The problem is worse with the open kernel driver, so I’m currently using the closed driver.
