[Regression 460 series] Black screen on boot: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

After upgrading from 455.45.01 to 460.32.03, I get a black screen once nvidia-modeset loads: first the screen goes black, then the backlight toggles several times (about 10), and finally the screen stays blank and dark.
Kernel log contains:

Jan 10 19:05:05 localhost kernel: [    5.773942] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jan 10 19:05:05 localhost kernel: [   10.275739] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
Jan 10 19:05:20 localhost kernel: [   35.249151] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
Jan 10 19:05:20 localhost kernel: [   35.249945] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
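
For anyone checking whether they hit the same failure, here is a minimal sketch for filtering a kernel log down to these two messages. The function name pushbuf_errors is made up for illustration; the pattern is copied from the log lines above and may need adjusting if other driver versions word the messages differently.

```shell
# Hypothetical helper: read a kernel log on stdin and print only the
# nvidia-modeset push buffer errors quoted in this thread.
pushbuf_errors() {
    grep -E 'nvidia-modeset: ERROR: GPU:[0-9]+: (Display engine push buffer channel allocation failed|Failed to allocate display engine core DMA push buffer)'
}

# Typical use on a running system (reading the kernel log may need root):
#   dmesg | pushbuf_errors
#   journalctl -k -b | pushbuf_errors
```

An empty result only means these exact strings were not logged; it does not rule out other modeset failures.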

Downgrading back to 455.45.01 fixes this. I observe the same with 460.27.04.
Since 460.32.03 is now released as stable and contains critical security fixes, I’m raising this as a new issue, though it is likely the same as already reported in:

Here’s the bug report file:
nvidia-bug-report.log.gz (1.1 MB)
As usual, I had to kill vulkaninfo, which otherwise hangs in terminal mode.

You could try to work around the bug by

  • enabling CSM in bios (but still boot using efi)
  • setting nvidia-drm.modeset=1 kernel parameter

Thanks for the tips!

However:

enabling CSM in bios (but still boot using efi)

That would also mean disabling Secure Boot, which in turn causes other issues (security, loss of functionality in Windows). I can certainly try that as a test, but could you explain, or link to an explanation of, why you expect this to change the behaviour?

setting nvidia-drm.modeset=1 kernel parameter

That’s already the case, as you can see from the report:

$ zgrep nvidia-drm nvidia-bug-report.log.gz | tail -n1    
root=UUID=32278c21-b19c-47e0-8466-420bbb5a1642 ro rd.dm=0 nvidia-drm.modeset=1 net.ifnames=0 pcie_aspm=force initrd=boot\initramfs-5.9.11-gentoo.img
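
The same check can be made against the running system instead of the bug report. The following is a minimal sketch; the function name cmdline_has_modeset is hypothetical, and the sysfs path in the usage comment assumes the nvidia_drm module is loaded and exposes its modeset parameter there.

```shell
# Hypothetical helper: succeed if the given kernel command line file
# (default: /proc/cmdline) contains nvidia-drm.modeset=1.
# The kernel treats '-' and '_' in parameter names as equivalent,
# so both spellings of the module name are accepted.
cmdline_has_modeset() {
    grep -Eq '(^|[[:space:]])nvidia[-_]drm\.modeset=1([[:space:]]|$)' "${1:-/proc/cmdline}"
}

# Usage on a running system:
#   cmdline_has_modeset && echo "modeset=1 requested on the command line"
#   # Value as seen by the loaded module (Y or N); reading it may need root:
#   cat /sys/module/nvidia_drm/parameters/modeset
```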

Any further hint appreciated. If it’s better to also send the report to the nvidia bug report mail, just let me know.

The error you’re getting is a recurring bug on older hardware that seems to be related to early VBIOS init. Turning on CSM has often worked around it. Since this bug is very hardware-specific, it rarely gets fixed.

Thanks, I will give CSM a try later then (disabling SecureBoot means unregistering my signing keys, so I am reluctant there).

For me, the issue is new with the 460 drivers and to my memory never showed up before even though I have been following new releases (also beta) for years, so in my case, it is not recurring.

I am seeing a similar error: the machine boots OK, but after suspend/resume it shows a black screen, followed by an error message after ~120 seconds. This is with nvidia-460 on a Razer Blade 15" (2018) with external monitors. Enabling CSM did not help, and neither does nvidia-drm.modeset=1.

Are you sure this isn’t just a regression? I’ve not seen this error previously in 18 months or so of using this laptop, and the laptop is newer hardware.

[  309.142164] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
[  309.142319] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
[  313.142165] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
[  313.142348] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
[  313.151885] acpi LNXPOWER:08: Turning OFF
[  313.151898] acpi LNXPOWER:04: Turning OFF
[  313.152351] acpi LNXPOWER:03: Turning OFF
[  313.153064] acpi LNXPOWER:02: Turning OFF

Actually, turns out I can’t: Activating CSM on that laptop happens implicitly only after the following steps:

  • Disable SecureBoot (which leads to loss of functionality and safety, but fine for a test). Doing this alone does not change anything, I still see the issue.
  • Enable “Load legacy option ROMs”. This also means the legacy graphics option ROM gets loaded, and while the UEFI still sees my Linux EFI bootloader, rEFInd, it no longer loads it.

So it seems enabling CSM and booting via UEFI is not possible with this UEFI.
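
When testing firmware settings like these, it can help to confirm from the booted system which path was actually taken. A minimal sketch, assuming a Linux system where the kernel creates /sys/firmware/efi only when booted through (U)EFI; the function name boot_mode and its optional directory argument are made up for illustration.

```shell
# Hypothetical helper: report whether the running kernel was booted
# through UEFI or through the legacy BIOS/CSM path.
# The optional argument overrides the firmware sysfs directory.
boot_mode() {
    if [ -d "${1:-/sys/firmware}/efi" ]; then
        echo "UEFI"
    else
        echo "legacy"
    fi
}

# Usage on a running system:
#   boot_mode
#   # Secure Boot state, if the mokutil tool is installed:
#   mokutil --sb-state
```

This only shows how the kernel was started. It does not reveal whether the firmware loaded legacy option ROMs before handing over to the bootloader.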

Other ideas welcome. Also, please let me know if this issue should be reported to the nvidia bug report mail or whether reporting in these forums is sufficient to raise awareness. While my hardware is old(ish), I’m still reluctant to accept this is not a regression, given that this is new behaviour with the R460 series on my hardware, and seeing the reports by others.