[Regression 460 series] Black screen on boot: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

I tried switching to other driver versions in “Software and Updates” → “Additional drivers”, but that didn’t help (I tried all versions available there). I also tried installing the latest NVIDIA driver from nvidia.com, which didn’t help either.

The only solution that worked for me was to install the nvidia-driver-450-server driver, not from the “Software and Updates” GUI but from the terminal, like this:

sudo apt purge 'nvidia-*'
sudo apt install nvidia-driver-450-server

Any updates on this?

This happens to me too, with drivers 460 and 470, Ubuntu 21.04, kernel 5.11.0-34-generic #36-Ubuntu SMP.

Changing the driver to 460 and 450 with “Software and Updates” didn’t solve it.

I had to change it from the command line as @alex21975 did, but I also removed unused NVIDIA packages before reinstalling:

sudo apt purge 'nvidia-*'
sudo apt autoremove
sudo apt install nvidia-driver-450-server

reboot
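
The steps in the two posts above can be collected into one small shell function. This is only a sketch: run_cmd, reinstall_nvidia_driver and the DRY_RUN variable are names made up for illustration, the default package is simply the one the posters reported as working, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the purge / autoremove / reinstall sequence described above.
# run_cmd, reinstall_nvidia_driver and DRY_RUN are illustrative names only.

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run_cmd() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: driver package to install (defaults to the one reported to work above).
reinstall_nvidia_driver() {
    pkg="${1:-nvidia-driver-450-server}"
    # The pattern is quoted so the shell passes it to apt unexpanded.
    run_cmd sudo apt purge 'nvidia-*' &&
    run_cmd sudo apt autoremove &&
    run_cmd sudo apt install "$pkg"
}
```

Calling reinstall_nvidia_driver nvidia-driver-460-server prints the three commands for review. Setting DRY_RUN=0 first would actually run them, after which a reboot is still needed.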

Is there any update on this?

I have this issue with my Razer Blade 15 Advanced (NVIDIA GTX 1060 Max Q) running Ubuntu 20.04 LTS (470.57.02)

Can we get an update?

I’m also on an Acer Aspire 7 with an NVIDIA GeForce GTX 1050, but with Ubuntu 21.10.
I was able to solve it the way @alex21975 and @hdaniel described, but with nvidia-driver-460-server:

sudo apt purge 'nvidia-*'
sudo apt autoremove
sudo apt install nvidia-driver-460-server

reboot

Now the computer no longer gets stuck on boot after suspend,
but if I run
sudo service nvidia-suspend status
it shows
“Unit nvidia-suspend.service could not be found.”

After applying the
NVIDIA Suspend fix
it still shows me
"
nvidia-suspend.service - NVIDIA system suspend actions
Loaded: loaded (/etc/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
Active: inactive (dead)
"
and the logs show

kernel: snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x7f0800. -5
kernel: snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)

I suspect this thread contains two different issues.
All users who, when running

sudo lspci -xxx -d 10de:*

see the audio device’s PCI config space reported as all 0xFF, please check whether this helps:

if not already mentioned in this thread.
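
To check for the condition described above (a device whose config space reads back as all 0xFF), the lspci dump can be filtered automatically. The following is a minimal sketch that assumes the usual lspci -xxx layout of a header line followed by hex rows; find_dead_config_space is a hypothetical helper name, not part of pciutils.

```shell
#!/bin/sh
# Report PCI devices whose config-space dump (lspci -xxx format) is all 0xff.
# Reads the lspci output on stdin, so it can be tried on saved dumps too.
find_dead_config_space() {
    awk '
        # Print the current device if every byte seen so far was "ff".
        function flush() {
            if (dev != "" && rows > 0 && !alive) print dev
            dev = ""; rows = 0; alive = 0
        }
        # Hex dump rows look like "00: de 10 b1 13 ..."
        /^[0-9a-f][0-9a-f]: / {
            rows++
            for (i = 2; i <= NF; i++) if ($i != "ff") alive = 1
            next
        }
        # A blank line ends a device record.
        /^[ \t]*$/ { flush(); next }
        # Anything else is treated as a device header line.
        { flush(); dev = $0 }
        END { flush() }
    '
}

# Example use on a live system (needs root to read the full config space):
#   sudo lspci -xxx -d '10de:*' | find_dead_config_space
```

If the function prints the audio device line, that device matches the all-0xFF case. No output means every listed device returned real data.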

The method above for removing the NVIDIA audio device works for me.

I tried the 495 driver. Unfortunately, the issue still persists.

I found a kind of workaround. When the screen wakes from sleep (but goes black), use CTRL + ALT + F2 to switch to a terminal (terminal shows on the screen in a few seconds) and CTRL + ALT + F1 or F7 (depending on the system) to switch back to the graphical session. The screen will then work normally again (until the next time it goes to sleep).

The 495 driver seems to work, although this might be because I tinkered a lot while trying to fix previous driver versions. Still, I guess the update is worth trying. I’ve included some of my settings below, as they might be useful if the 495 driver is not working for you.

Confirming the installed driver:

user@device:~$ nvidia-smi
Tue Nov  9 09:58:24 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M1000M       Off  | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P8    N/A /  N/A |    259MiB /  4043MiB |     22%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1320      G   /usr/lib/xorg/Xorg                158MiB |
|    0   N/A  N/A      2710      G   /usr/lib/xorg/Xorg                 97MiB |
+-----------------------------------------------------------------------------+

Here are some of my settings (I had changed some of these while trying to change how memory is handled at hibernation, see above):

user@device:~$ cat /proc/driver/nvidia/params
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
RegisterForACPIEvents: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 1
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableGpuFirmware: 18
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: "/tmp-nvidia"
ExcludedGpus: ""
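
Since the discussion centres on a couple of entries in this list, it can help to pull out a single value instead of reading the whole file. A minimal sketch, assuming the "Name: value" layout shown above; nvidia_param is a hypothetical helper name and the optional second argument exists only so the function can be tried on a saved copy of the file.

```shell
#!/bin/sh
# Print the value of one NVIDIA kernel module parameter.
# $1: parameter name, e.g. PreserveVideoMemoryAllocations
# $2: params file (defaults to the live driver's /proc/driver/nvidia/params)
# Exits non-zero if the parameter is not listed.
nvidia_param() {
    awk -F': ' -v name="$1" '
        $1 == name { print $2; found = 1 }
        END { exit !found }
    ' "${2:-/proc/driver/nvidia/params}"
}

# Example use on a live system:
#   nvidia_param PreserveVideoMemoryAllocations
#   nvidia_param TemporaryFilePath
```

Comparing the output for these two parameters on a working and a failing machine is a quick way to repeat the comparison made later in this thread.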

I’ve also looked at the hibernate, suspend and resume services, which seem to be inactive but loaded.

user@device:~$ sudo service nvidia-suspend status
● nvidia-suspend.service - NVIDIA system suspend actions
     Loaded: loaded (/etc/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

nov 09 09:52:10 device systemd[1]: Starting NVIDIA system suspend actions...
nov 09 09:52:10 device suspend[6680]: nvidia-suspend.service
nov 09 09:52:10 device logger[6680]: <13>Nov  9 09:52:10 suspend: nvidia-suspend.service
nov 09 09:52:11 device systemd[1]: nvidia-suspend.service: Succeeded.
nov 09 09:52:11 device systemd[1]: Finished NVIDIA system suspend actions.
nov 09 09:53:06 device systemd[1]: Starting NVIDIA system suspend actions...
nov 09 09:53:06 device suspend[7975]: nvidia-suspend.service
nov 09 09:53:06 device logger[7975]: <13>Nov  9 09:53:06 suspend: nvidia-suspend.service
nov 09 09:53:07 device systemd[1]: nvidia-suspend.service: Succeeded.
nov 09 09:53:07 device systemd[1]: Finished NVIDIA system suspend actions.

user@device:~$ sudo service nvidia-hibernate status
● nvidia-hibernate.service - NVIDIA system hibernate actions
     Loaded: loaded (/etc/systemd/system/nvidia-hibernate.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

user@device:~$ sudo service nvidia-resume status
● nvidia-resume.service - NVIDIA system resume actions
     Loaded: loaded (/etc/systemd/system/nvidia-resume.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

nov 09 09:52:39 device systemd[1]: Starting NVIDIA system resume actions...
nov 09 09:52:39 device suspend[7377]: nvidia-resume.service
nov 09 09:52:39 device logger[7377]: <13>Nov  9 09:52:39 suspend: nvidia-resume.service
nov 09 09:52:39 device systemd[1]: nvidia-resume.service: Succeeded.
nov 09 09:52:39 device systemd[1]: Finished NVIDIA system resume actions.
nov 09 09:54:07 device systemd[1]: Starting NVIDIA system resume actions...
nov 09 09:54:07 device suspend[8614]: nvidia-resume.service
nov 09 09:54:07 device logger[8614]: <13>Nov  9 09:54:07 suspend: nvidia-resume.service
nov 09 09:54:07 device systemd[1]: nvidia-resume.service: Succeeded.
nov 09 09:54:07 device systemd[1]: Finished NVIDIA system resume actions.
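
The logs above show each unit starting and finishing around a sleep cycle, so "inactive (dead)" between transitions looks like the normal resting state for these units rather than a fault. What matters more is whether they are enabled. A small sketch for checking all three at once; check_nvidia_sleep_units and the SYSTEMCTL override variable are illustrative names, and the override exists only so the function can be exercised without systemd.

```shell
#!/bin/sh
# Print the enablement state of the three NVIDIA sleep-related units.
# SYSTEMCTL may point at a stand-in command; it defaults to systemctl.
check_nvidia_sleep_units() {
    for unit in nvidia-suspend nvidia-hibernate nvidia-resume; do
        # "systemctl is-enabled" prints states such as enabled or disabled.
        state=$("${SYSTEMCTL:-systemctl}" is-enabled "$unit.service" 2>/dev/null)
        printf '%s.service: %s\n' "$unit" "${state:-unknown}"
    done
}

# Example use on a live system:
#   check_nvidia_sleep_units
```

A unit reported as unknown here would correspond to the "could not be found" case mentioned earlier in the thread.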

@generix Weirdly enough, it not only fixed the reboot problem, but also the audio device PCI problem. The second device had all ffs listed before. (I only have a single graphics card, but from the start the M1000M has also been recognized as a 940MX.)

user@device:~$  sudo lspci -xxx -d 10de:*
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
00: de 10 b1 13 07 04 10 00 a2 00 00 03 00 00 80 00
10: 00 00 00 e4 0c 00 00 a0 00 00 00 00 0c 00 00 b0
20: 00 00 00 00 01 30 00 00 00 00 00 00 3c 10 d4 80
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 01 00 00
40: 3c 10 d4 80 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 81 00 38 0a e0 fe
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 30 21 00 00 03 3d 46 00 43 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 00 00 00 00 0e 00 00 00 03 00 1f 00 00 00 00 00
b0: 00 00 00 00 09 00 14 01 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

01:00.1 Audio device: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] (rev a1)
00: de 10 bc 0f 06 00 10 00 a1 00 03 04 00 00 80 00
10: 00 00 00 e5 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 02 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 30 29 00 00 03 3d 45 00 03 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 00 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Here is my nvidia-bug-report.log.gz (456.0 KB).

The only differences from my configuration (apart from hardware) are the PreserveVideoMemoryAllocations and TemporaryFilePath params, so the issue probably only occurs when PreserveVideoMemoryAllocations is disabled (0).

Hmm, that might very well be related. I tried setting these because they were recommended above by @generix. But when I tried it in the past, with the 465 drivers, it didn’t help. Could you try changing these settings?

Please try following the Arch manual for the TemporaryFilePath memory allocation, and see whether you get the same results I did. I’m interested to see whether this makes a difference.

(Sorry for the late reply. The forum system didn’t allow me to send a third message in this topic until now.)
I expected it would solve the issue, but unfortunately it didn’t (I also had to update the initramfs for the modprobe settings to be applied at boot). Either way, I don’t see any errors in the kernel logs anymore, so maybe I’m now running into a different issue. (I’ve had issues with DisplayPort before, but HDMI worked fine until 460.) Maybe you could try disabling PreserveVideoMemoryAllocations to see if the issue pops up again?

Okay, this was surprising to me: disabling PreserveVideoMemoryAllocations did not cause any issues. Hibernation works fine in the ‘Nvidia-only’, ‘On-demand’ and ‘Intel’ modes. I agree that this suggests you are running into a different kind of issue.

Anyway, on my side there are still enough other, probably related, issues. I’ll have a look at them in the future, but at least they are less frustrating.

  • On-Demand mode gets really slow when I look at an external screen only. The mouse moves fine, but any interaction with the application seems to have a significant delay (1 to 3 seconds). (Not related to hibernation at all, but it makes the On-demand mode kinda useless for me).
  • Intel mode has issues detecting external screens after resuming from hibernation.

As a result, I’ll have to use the Nvidia mode for now.

I also tested the 495 driver. It is not useful in my case, since it no longer supports my card.

So all currently maintained NVIDIA drivers either fail on my card with a black screen on boot or don’t support my card anymore, and I am locked in to older Xorg and kernel versions.
There’s nothing to configure in terms of power management: as my first post outlined, the issue happens on my machine on boot, not after suspend or anything (even though it looks similar to the issues reported for those cases).

I have tested the following versions, which yield a black screen after turning the backlight on and off several times directly on boot:

  • 460.27.04
  • 460.32.03
  • 460.39
  • 460.56
  • 460.67
  • 460.91.03
  • 465.27

The following versions gave me an immediate black screen on boot:

  • 470.42.01
  • 470.94

All with the same error message (Failed to allocate display engine core DMA push buffer).

The last working version in my case is 455.45.01, which has several security issues by now and lacks support for recent kernels and supported Xorg versions. Any update from NVIDIA on this bug would be greatly appreciated.

Can you clarify whether this bug only affects users after suspend, or whether it also covers the “black screen on cold boot” case? Or are these the same bug?
For those who get a black screen on boot in this thread from this regression, none of the currently supported drivers are usable anymore, and they are locked-in to nvidia drivers with security issues and lack of Xorg / kernel support.

It looks like I am also hitting this issue on a desktop PC running Ubuntu 22.04 with NVIDIA driver 510.60.02 installed. Looking in the journal, I see a smattering of the following messages that look related to the above:

Possibly Related Log Entries

Apr 08 08:49:23 pkkid-desktop kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
Apr 08 08:49:23 pkkid-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

Apr 08 08:50:06 pkkid-desktop kernel: x86/cpu: SGX disabled by BIOS.

Apr 08 08:50:06 pkkid-desktop kernel: tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
Apr 08 08:50:06 pkkid-desktop kernel: tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80

Apr 08 08:50:09 pkkid-desktop gnome-session-binary[1858]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed
Apr 08 08:50:09 pkkid-desktop gnome-session-binary[1858]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed

Apr 08 08:50:15 pkkid-desktop gdm-password][3125]: gkr-pam: unable to locate daemon control file

Apr 08 08:50:16 pkkid-desktop systemd[3142]: app-gnome-gnome\x2dkeyring\x2dpkcs11-3345.scope: Failed to add PIDs to scope's control group: No such process
Apr 08 08:50:16 pkkid-desktop systemd[3142]: app-gnome-gnome\x2dkeyring\x2dpkcs11-3345.scope: Failed with result 'resources'.
Apr 08 08:50:16 pkkid-desktop systemd[3142]: Failed to start Application launched by gnome-session-binary.

Apr 08 08:50:18 pkkid-desktop gnome-session-binary[1858]: WARNING: Lost name on bus: org.gnome.SessionManager
Apr 08 08:50:18 pkkid-desktop gnome-session[1858]: gnome-session-binary[1858]: WARNING: Lost name on bus: org.gnome.SessionManager
Apr 08 08:50:18 pkkid-desktop gdm-launch-environment][1651]: pam_unix(gdm-launch-environment:session): session closed for user gdm
Apr 08 08:50:18 pkkid-desktop gdm-launch-environment][1651]: GLib-GObject: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

Apr 08 08:50:18 pkkid-desktop kernel: [drm:nv_drm_master_set [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
Apr 08 08:50:31 pkkid-desktop kernel: [drm:nv_drm_master_set [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
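
Most of the entries above are unrelated desktop noise, so narrowing the journal to the NVIDIA kernel messages makes the comparison with the thread title easier. A minimal sketch: nvidia_kernel_errors is a hypothetical helper name, and the pattern is only a guess based on the two message shapes quoted in this post.

```shell
#!/bin/sh
# Keep only nvidia-modeset errors and nvidia-drm failures from a log stream.
nvidia_kernel_errors() {
    grep -E 'nvidia-modeset: ERROR|\[nvidia-drm\].*Failed'
}

# Example use on a live system (kernel messages of the current boot):
#   journalctl -k -b | nvidia_kernel_errors
# For the previous boot, e.g. after a black screen forced a reset:
#   journalctl -k -b -1 | nvidia_kernel_errors
```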

It looks like I should try setting PreserveVideoMemoryAllocations=1 in /proc/driver/nvidia/params. I imagine editing this file directly is not how that is done. Can someone explain where to put this setting? I can report back on whether this fixes the blank screen when resuming from suspend.

@michael.shepanski You set it on the kernel command line. For example, I have the following in /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="nvidia.NVreg_PreserveVideoMemoryAllocations=1"

Edit: the parameter should be on nvidia, not nvidia_modeset.
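
A change to /etc/default/grub only takes effect once the GRUB configuration has been regenerated (sudo update-grub on Ubuntu) and the machine has been rebooted. The sketch below is one way to confirm afterwards that the option actually reached the running kernel; cmdline_has and its optional file argument are illustrative names, not an existing tool.

```shell
#!/bin/sh
# Succeed if the given token appears on the kernel command line.
# $1: exact token, e.g. nvidia.NVreg_PreserveVideoMemoryAllocations=1
# $2: cmdline file (defaults to /proc/cmdline of the running kernel)
cmdline_has() {
    # Split on spaces and require a whole-line, fixed-string match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example use after "sudo update-grub" and a reboot:
#   cmdline_has nvidia.NVreg_PreserveVideoMemoryAllocations=1 \
#       && echo "option is active" || echo "option is missing"
```

Because the match is exact, the same parameter given with the nvidia_modeset prefix would be reported as missing, which fits the correction in the edit note above.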