[Regression 460 series] Black screen on boot: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

After upgrading from 455.45.01 to 460.32.03, I get a black screen once nvidia-modeset is loaded: first the screen goes black, then the backlight toggles about 10 times, and finally the screen stays blank and dark.
The kernel log contains:

Jan 10 19:05:05 localhost kernel: [    5.773942] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jan 10 19:05:05 localhost kernel: [   10.275739] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
Jan 10 19:05:20 localhost kernel: [   35.249151] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
Jan 10 19:05:20 localhost kernel: [   35.249945] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

Downgrading back to 455.45.01 fixes this. I observe the same with 460.27.04.
Since 460.32.03 is now released as stable and contains critical security fixes, I’m raising this as a new issue, though it is likely the same as already reported in:

Here’s the bug report file:
nvidia-bug-report.log.gz (1.1 MB)
As usual, I had to kill vulkaninfo, which otherwise hangs in terminal mode.
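
For anyone who wants to check whether their own log shows the same signature, here is a minimal Python sketch. It only assumes the message text quoted in the kernel log above; the function name `find_push_buffer_errors` and the sample list are made up for illustration and are not part of any NVIDIA tool.

```python
import re

# Error text copied from the kernel log quoted above; the GPU number is captured.
PUSH_BUFFER_ERROR = re.compile(
    r"nvidia-modeset: ERROR: GPU:(\d+): "
    r"Failed to allocate display engine core DMA push buffer"
)


def find_push_buffer_errors(lines):
    """Return (line_index, gpu_number) for every line that contains the error."""
    hits = []
    for index, line in enumerate(lines):
        match = PUSH_BUFFER_ERROR.search(line)
        if match:
            hits.append((index, int(match.group(1))))
    return hits


# Two lines taken from the log above, used here as sample input.
sample = [
    "Jan 10 19:05:05 localhost kernel: [   10.275739] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0",
    "Jan 10 19:05:20 localhost kernel: [   35.249945] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer",
]
print(find_push_buffer_errors(sample))  # [(1, 0)]
```

On a real system the lines would come from the kernel log (for example the output of dmesg saved to a file) rather than from a hard-coded list.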

You could try to work around the bug by

  • enabling CSM in bios (but still boot using efi)
  • setting nvidia-drm.modeset=1 kernel parameter

Thanks for the tips!

However:

enabling CSM in bios (but still boot using efi)

That would also mean disabling Secure Boot, which in turn causes other issues (security, loss of functionality in Windows). I can certainly try that as a test, but could you explain, or link to an explanation of, why you expect this to change the behaviour?

setting nvidia-drm.modeset=1 kernel parameter

That’s already the case, as you can see from the report:

$ zgrep nvidia-drm nvidia-bug-report.log.gz | tail -n1    
root=UUID=32278c21-b19c-47e0-8466-420bbb5a1642 ro rd.dm=0 nvidia-drm.modeset=1 net.ifnames=0 pcie_aspm=force initrd=boot\initramfs-5.9.11-gentoo.img
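
The same check can be sketched in Python for anyone who prefers to test the running system. This is only an illustration: `modeset_enabled` is a hypothetical helper, it accepts only the value `1` (an assumption, since that is the form used in this thread), and it treats `nvidia-drm` and `nvidia_drm` as the same name because the kernel considers dashes and underscores in parameter names equivalent.

```python
def modeset_enabled(cmdline):
    """Return True if the kernel command line sets nvidia-drm.modeset=1."""
    for token in cmdline.split():
        name, _, value = token.partition("=")
        # Dashes and underscores are interchangeable in kernel parameter names.
        if name.replace("-", "_") == "nvidia_drm.modeset" and value == "1":
            return True
    return False


# Command line as reported above (raw string because of the backslash).
reported_cmdline = (
    r"root=UUID=32278c21-b19c-47e0-8466-420bbb5a1642 ro rd.dm=0 "
    r"nvidia-drm.modeset=1 net.ifnames=0 pcie_aspm=force "
    r"initrd=boot\initramfs-5.9.11-gentoo.img"
)
print(modeset_enabled(reported_cmdline))  # True
```

On a live system the string can be read from `/proc/cmdline` instead of being pasted in.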

Any further hints are appreciated. If it’s better to also send the report to the NVIDIA bug report mail address, just let me know.

The error you’re getting is a recurring bug on older hardware that seems to be related to early VBIOS init. Turning on CSM has often worked around it. Since this bug is very hardware-specific, it rarely gets fixed.

Thanks, I will give CSM a try later then (disabling SecureBoot means unregistering my signing keys, so I am reluctant there).

For me, the issue is new with the 460 drivers and to my memory never showed up before even though I have been following new releases (also beta) for years, so in my case, it is not recurring.

I am seeing a similar error: the machine boots OK, but after suspend/resume it shows a black screen and then, after ~120 seconds, an error message. This is with nvidia-460 on a Razer Blade 15" (2018) with external monitors. Enabling CSM did not help, and neither does nvidia-drm.modeset=1.

Are you sure this isn’t just a regression? I’ve not seen this error previously in 18 months or so of using this laptop, and the laptop is newer hardware.

[  309.142164] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
[  309.142319] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
[  313.142165] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
[  313.142348] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
[  313.151885] acpi LNXPOWER:08: Turning OFF
[  313.151898] acpi LNXPOWER:04: Turning OFF
[  313.152351] acpi LNXPOWER:03: Turning OFF
[  313.153064] acpi LNXPOWER:02: Turning OFF

Actually, it turns out I can’t: activating CSM on that laptop happens implicitly, and only after the following steps:

  • Disable SecureBoot (which leads to loss of functionality and safety, but fine for a test). Doing this alone does not change anything, I still see the issue.
  • Enable “Load legacy option ROMs”. This also means the legacy graphics will be loaded, and while my Linux EFI bootloader refind is still seen by the UEFI, it does not load it anymore.

So it seems enabling CSM and booting via UEFI is not possible with this UEFI.

Other ideas welcome. Also, please let me know if this issue should be reported to the nvidia bug report mail or whether reporting in these forums is sufficient to raise awareness. While my hardware is old(ish), I’m still reluctant to accept this is not a regression, given that this is new behaviour with the R460 series on my hardware, and seeing the reports by others.

I am getting the same error every time the computer resumes after suspending.

Same problem here when I close and then open the laptop lid. Does anybody know if there is a fix now?

I have the same issue on my machine. I also notice the same error in my log files:

 kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
 kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
 kernel: ahci 0000:00:17.0: port does not support device sleep
 kernel: ata5.00: Enabling discard_zeroes_data
 kernel: nvme nvme0: 12/0/0 default/read/poll queues
 kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
 kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
 kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
 kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
 kernel: ata5.00: Enabling discard_zeroes_data
 kernel: nvme nvme0: 12/0/0 default/read/poll queues
 kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
 kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
 kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
 kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
 kernel: fbcon: Taking over console
 kernel: acpi LNXPOWER:08: Turning OFF
 kernel: acpi LNXPOWER:04: Turning OFF
 zim kernel: mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
 kernel: acpi LNXPOWER:03: Turning OFF
 kernel: acpi LNXPOWER:02: Turning OFF
 kernel: OOM killer enabled.
 kernel: Restarting tasks ... 
 kernel: pci_bus 0000:07: Allocating resources
 kernel: pci_bus 0000:3d: Allocating resources

I also tried various BIOS switches (enabling CSM, disabling Secure Boot), but nothing made a difference.

I’ve run into the same issue; see my duplicate help request (I’m sorry about making a duplicate). Today I’ve tried various BIOS versions for my HP ZBook Studio G3 (Nvidia Quadro M1000M), but it didn’t help. I played around with Secure Boot and legacy BIOS mode. Nothing worked.

In addition to the crash on waking from sleep, I experience problems with audio over HDMI. Where my TV was able to provide audio with the 450 drivers, I’m unable to get it working with the 460 drivers, regardless of whether I use On-Demand or Nvidia (Performance Mode). I was also unable to make it work with Intel (Power Saving Mode), although that wasn’t possible with the 450 drivers either.

As a side note, my HDMI connector is wired to the Nvidia card, and I had to wait for reverse PRIME support to be able to use Nvidia On-Demand with my HDMI connection instead of my internal screen. I’m not sure if this detail matters for our current issue.

Note that this still happens for me with 460.56 (but there was also no related change in the release notes). If a new bug report might be helpful for further investigation, just let me know.

The number of models this update broke is quite high, and even recent ones like the Lenovo Legion 7 are affected. The BIOS update Lenovo provided for that only fixed the situation under Windows. I really don’t know where this is leading.

I’m pretty sure this is a regression, as everything was working fine with Ubuntu 18.04 and a 410.x driver. I’m now running 460.39 under Ubuntu 20.10. The hardware is an HP ZBook Studio G3 with a Quadro M1000M GPU.

I have the same issue with the 460.39 driver on Ubuntu 20.04/Kernel 5.8.0 on Dell Precision 7510 with Quadro M2000M.

Exactly the same happens on my Razer 15" (2018) without any external monitor. NVIDIA GeForce GTX 1060 Max-Q. It happened on Ubuntu 20.04 and still happens on Ubuntu 20.10. Branches 450 and 460 both do not work properly. Only the legacy 390 branch works correctly.

This started happening for me (black screen when resuming after suspend) after an automatic update to 460.32.03 from 455.38 (see history.log file). I’m on an Acer Aspire 7 with NVIDIA GeForce GTX 1050, with Ubuntu 20.04.2 LTS.

I’ve been circumventing the issue by keeping my PRIME Profile on ‘Intel (Power Saving Mode)’ unless I’m using more graphics-intensive software.

12-01-12_history.log (12.8 KB) nvidia-bug-report.log.gz (403.8 KB)

Same issue on a Lenovo IdeaPad L340 with NVIDIA GeForce GTX 1050, Ubuntu 20.04.2 LTS. Only happens when coming back from sleep mode.

Happens to me as well on a Lenovo Legion Y720 with Ubuntu 20.04 and driver 460. If I have a second monitor attached via the HDMI port, the screen doesn’t stay blank. In fact, if I remove the second monitor, suspend still works until the laptop is rebooted without the second monitor. So if I want suspend to work, I just need to connect a second monitor and then remove it. Switching to my Intel card using prime-select also works.

I’ve rolled back to the 450 drivers, but came back here to see whether a fix had been found or developed by NVIDIA.
Now the release notes for the new 465.24.02 driver discuss various regressions that are fixed. It is unclear, however, whether our problem is in that list. I don’t think it is, but there is a mention of a regression related to suspend behaviour. Is it worth testing these drivers? Or is it better to wait longer for a new production branch driver?