[Regression] 510.60.02 -> 510.73.05: black screen on cold boot

Hello,

After upgrading the NVIDIA driver from 510.60.02 to 510.73.05, my laptop shows a black screen on cold boot.
If I plug in an external monitor before booting, I get a picture on it, but the laptop's own screen stays black.

The laptop has only a discrete GTX 980M (no hybrid graphics) and a G-Sync monitor, and runs Ubuntu 22.04. The driver is installed from Ubuntu’s deb packages.
Kernel: 5.15.0-30-generic (and 5.15.0-27-generic)

The black screen happens even in recovery mode: the screen goes black as soon as the user is handed input.

After downgrading back to 510.60.02, the laptop monitor lights up again.

Extract from kern.log:

May 18 07:28:46 thebat kernel: [  123.998588] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
May 18 07:28:46 thebat kernel: [  123.999167] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
May 18 07:29:03 thebat kernel: [  140.218673] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
May 18 07:29:03 thebat kernel: [  140.219095] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
May 18 07:29:19 thebat kernel: [  156.438696] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
May 18 07:29:19 thebat kernel: [  156.438846] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
May 18 07:29:35 thebat kernel: [  172.658781] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
May 18 07:29:35 thebat kernel: [  172.659731] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
May 18 07:29:51 thebat kernel: [  188.878788] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
May 18 07:29:51 thebat kernel: [  188.879231] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
May 18 07:30:07 thebat kernel: [  205.098852] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
May 18 07:30:07 thebat kernel: [  205.099085] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
May 18 07:30:18 thebat kernel: [  215.429448] ahci 0000:00:17.0: port does not support device sleep
May 18 07:30:24 thebat kernel: [  221.378791] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
May 18 07:30:24 thebat kernel: [  221.379353] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
May 18 07:30:40 thebat kernel: [  237.598812] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
May 18 07:30:40 thebat kernel: [  237.599485] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
May 18 07:30:56 thebat kernel: [  253.818836] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
May 18 07:30:56 thebat kernel: [  253.818995] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
May 18 07:31:06 thebat kernel: [  262.040478] BUG: kernel NULL pointer dereference, address: 0000000000000070
May 18 07:31:06 thebat kernel: [  262.040482] #PF: supervisor read access in kernel mode
May 18 07:31:06 thebat kernel: [  262.040483] #PF: error_code(0x0000) - not-present page
May 18 07:31:06 thebat kernel: [  262.040484] PGD 0 P4D 0 
May 18 07:31:06 thebat kernel: [  262.040487] Oops: 0000 [#1] SMP PTI
May 18 07:31:06 thebat kernel: [  262.040488] CPU: 5 PID: 1889 Comm: Xorg Tainted: P           OE     5.15.0-27-generic #28-Ubuntu
May 18 07:31:06 thebat kernel: [  262.040490] Hardware name: Notebook                         P7xxDM(-G)                      /P775DM(-G)                      , BIOS 1.05.09 12/28/2015
May 18 07:31:06 thebat kernel: [  262.040491] RIP: 0010:_nv002531kms+0x18/0x70 [nvidia_modeset]
May 18 07:31:06 thebat kernel: [  262.040516] Code: ff c6 44 24 2f 01 eb af 66 2e 0f 1f 84 00 00 00 00 00 41 54 55 49 89 fc 53 89 d5 41 b8 04 00 00 00 ba 02 01 02 00 48 83 ec 10 <8b> 46 70 8b 3d 6f 9b 0b 00 48 8d 4c 24 0c 89 ee 89 44 24 0c e8 6f
May 18 07:31:06 thebat kernel: [  262.040517] RSP: 0018:ffffac0d80d3fc98 EFLAGS: 00010282
May 18 07:31:06 thebat kernel: [  262.040519] RAX: 0000000000000000 RBX: 0000000020020000 RCX: ffff9d1fcc4b6d00
May 18 07:31:06 thebat kernel: [  262.040520] RDX: 0000000000020102 RSI: 0000000000000000 RDI: ffff9d1fd92c4008
May 18 07:31:06 thebat kernel: [  262.040521] RBP: 0000000000010009 R08: 0000000000000004 R09: 00000000fffffffe
May 18 07:31:06 thebat kernel: [  262.040523] R10: ffff9d1fc6f6b000 R11: 0000000000000001 R12: ffff9d1fd92c4008
May 18 07:31:06 thebat kernel: [  262.040524] R13: ffff9d1fd92c40a0 R14: 0000000000000fff R15: 0000000000010008
May 18 07:31:06 thebat kernel: [  262.040525] FS:  00007f129f8d4a80(0000) GS:ffff9d2766540000(0000) knlGS:0000000000000000
May 18 07:31:06 thebat kernel: [  262.040526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 18 07:31:06 thebat kernel: [  262.040527] CR2: 0000000000000070 CR3: 0000000103d38002 CR4: 00000000003706e0
May 18 07:31:06 thebat kernel: [  262.040529] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 18 07:31:06 thebat kernel: [  262.040530] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 18 07:31:06 thebat kernel: [  262.040530] Call Trace:
May 18 07:31:06 thebat kernel: [  262.040532]  <TASK>
May 18 07:31:06 thebat kernel: [  262.040533]  ? _nv002530kms+0xb3/0x150 [nvidia_modeset]
May 18 07:31:06 thebat kernel: [  262.040553]  ? _nv002308kms+0x499/0x680 [nvidia_modeset]
May 18 07:31:06 thebat kernel: [  262.040572]  ? _nv000453kms+0xa0/0xa0 [nvidia_modeset]
May 18 07:31:06 thebat kernel: [  262.040585]  ? _copy_from_user+0x2e/0x60
May 18 07:31:06 thebat kernel: [  262.040589]  ? _nv000453kms+0xa0/0xa0 [nvidia_modeset]
May 18 07:31:06 thebat kernel: [  262.040601]  ? _nv000637kms+0x34/0x50 [nvidia_modeset]
May 18 07:31:06 thebat kernel: [  262.040614]  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
May 18 07:31:06 thebat kernel: [  262.040627]  ? nvkms_ioctl+0x104/0x170 [nvidia_modeset]
May 18 07:31:06 thebat kernel: [  262.040639]  ? nvidia_frontend_unlocked_ioctl+0x58/0x90 [nvidia]
May 18 07:31:06 thebat kernel: [  262.040832]  ? __x64_sys_ioctl+0x91/0xc0
May 18 07:31:06 thebat kernel: [  262.040835]  ? do_syscall_64+0x5c/0xc0
May 18 07:31:06 thebat kernel: [  262.040838]  ? syscall_exit_to_user_mode+0x27/0x50
May 18 07:31:06 thebat kernel: [  262.040840]  ? do_syscall_64+0x69/0xc0
May 18 07:31:06 thebat kernel: [  262.040842]  ? asm_exc_page_fault+0x8/0x30
May 18 07:31:06 thebat kernel: [  262.040844]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
May 18 07:31:06 thebat kernel: [  262.040846]  </TASK>
May 18 07:31:06 thebat kernel: [  262.040847] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cmac algif_hash algif_skcipher af_alg nvme_fabrics ccm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nvidia_uvm(POE) nfnetlink snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi bnep snd_hda_codec nvidia_drm(POE) intel_rapl_msr snd_hda_core intel_rapl_common snd_hwdep intel_tcc_cooling x86_pkg_temp_thermal nvidia_modeset(POE) snd_pcm intel_powerclamp coretemp uvcvideo nvidia(POE) kvm_intel videobuf2_vmalloc videobuf2_memops snd_seq_midi videobuf2_v4l2 clevo_xsm_wmi(OE) binfmt_misc kvm snd_seq_midi_event videobuf2_common snd_rawmidi rapl btusb btrtl btbcm
May 18 07:31:06 thebat kernel: [  262.040874]  btintel snd_seq bluetooth iwlmvm videodev ecdh_generic mc ecc intel_cstate mac80211 snd_seq_device drm_kms_helper nls_iso8859_1 cec joydev snd_timer libarc4 rc_core iwlwifi input_leds fb_sys_fops efi_pstore snd syscopyarea mei_me serio_raw sysfillrect cfg80211 intel_wmi_thunderbolt mxm_wmi wmi_bmof ee1004 sysimgblt soundcore mei intel_pch_thermal mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler msr parport_pc ppdev lp drm parport ip_tables x_tables autofs4 dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci alx nvme sdhci_pci xhci_pci i2c_i801 cqhci psmouse nvme_core i2c_smbus sdhci libahci mdio xhci_pci_renesas wmi video
May 18 07:31:06 thebat kernel: [  262.040911] CR2: 0000000000000070
May 18 07:31:06 thebat kernel: [  262.040913] ---[ end trace eb9c0106367e10a0 ]---
May 18 07:31:06 thebat kernel: [  263.367391] RIP: 0010:_nv002531kms+0x18/0x70 [nvidia_modeset]
May 18 07:31:06 thebat kernel: [  263.367417] Code: ff c6 44 24 2f 01 eb af 66 2e 0f 1f 84 00 00 00 00 00 41 54 55 49 89 fc 53 89 d5 41 b8 04 00 00 00 ba 02 01 02 00 48 83 ec 10 <8b> 46 70 8b 3d 6f 9b 0b 00 48 8d 4c 24 0c 89 ee 89 44 24 0c e8 6f
May 18 07:31:06 thebat kernel: [  263.367419] RSP: 0018:ffffac0d80d3fc98 EFLAGS: 00010282
May 18 07:31:06 thebat kernel: [  263.367421] RAX: 0000000000000000 RBX: 0000000020020000 RCX: ffff9d1fcc4b6d00
May 18 07:31:06 thebat kernel: [  263.367422] RDX: 0000000000020102 RSI: 0000000000000000 RDI: ffff9d1fd92c4008
May 18 07:31:06 thebat kernel: [  263.367423] RBP: 0000000000010009 R08: 0000000000000004 R09: 00000000fffffffe
May 18 07:31:06 thebat kernel: [  263.367424] R10: ffff9d1fc6f6b000 R11: 0000000000000001 R12: ffff9d1fd92c4008
May 18 07:31:06 thebat kernel: [  263.367425] R13: ffff9d1fd92c40a0 R14: 0000000000000fff R15: 0000000000010008
May 18 07:31:06 thebat kernel: [  263.367426] FS:  00007f129f8d4a80(0000) GS:ffff9d2766540000(0000) knlGS:0000000000000000
May 18 07:31:06 thebat kernel: [  263.367428] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 18 07:31:06 thebat kernel: [  263.367429] CR2: 0000000000000070 CR3: 0000000103d38002 CR4: 00000000003706e0
May 18 07:31:06 thebat kernel: [  263.367430] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 18 07:31:06 thebat kernel: [  263.367431] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
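
Reading the oops above: the byte marked `<8b>` in the `Code:` line starts the faulting instruction, and `8b 46 70` is the x86-64 encoding of `mov eax, [rsi+0x70]`. RSI is 0 in the register dump, so the read lands on address 0x70, which matches CR2. That suggests nvidia-modeset follows a NULL pointer and reads a field at offset 0x70, possibly as a consequence of the earlier push buffer allocation failures, though the log alone does not prove that link. Here is a minimal Python sketch of the check. The helper name `faulting_access` is hypothetical, and it only handles this one instruction form (opcode `8b` with an 8-bit displacement and no SIB byte).

```python
import re

# Excerpt of the oops, copied from the kern.log extract (syslog prefix dropped,
# leading bytes of the Code: line omitted).
OOPS_EXCERPT = """\
Code: 48 83 ec 10 <8b> 46 70 8b 3d 6f 9b 0b 00 48 8d 4c 24 0c
RAX: 0000000000000000 RBX: 0000000020020000 RCX: ffff9d1fcc4b6d00
RDX: 0000000000020102 RSI: 0000000000000000 RDI: ffff9d1fd92c4008
CR2: 0000000000000070 CR3: 0000000103d38002 CR4: 00000000003706e0
"""

# Register order selected by the ModRM r/m field when no REX prefix is present.
REGS = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def parse_registers(text):
    """Collect 'NAME: 16-hex-digit value' pairs from an oops dump."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_access(text, registers):
    """Return (base register, displacement, address) of the faulting read.

    Assumption: the instruction is `8b /r` with mod=01 (disp8) and no SIB byte.
    """
    code_line = next(l for l in text.splitlines() if l.startswith("Code:"))
    hex_bytes = code_line.split()[1:]
    # The kernel wraps the first byte of the faulting instruction in <>.
    start = next(i for i, b in enumerate(hex_bytes) if b.startswith("<"))
    opcode = int(hex_bytes[start].strip("<>"), 16)
    modrm = int(hex_bytes[start + 1], 16)
    disp = int(hex_bytes[start + 2], 16)
    mod, rm = modrm >> 6, modrm & 0b111
    if opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("instruction form not handled by this sketch")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    base = REGS[rm]
    return base, disp, registers[base] + disp


registers = parse_registers(OOPS_EXCERPT)
base, disp, address = faulting_access(OOPS_EXCERPT, registers)
print(f"read through {base} + {disp:#x} = {address:#x}; CR2 = {registers['CR2']:#x}")
```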

nvidia-bug-report-510.73.05.log.gz (329.8 KB)

I am not sure whether the kern.log errors captured in the bug report are the same with an external monitor as without one, since it is hard to generate a bug report without any working screen. In any case, the errors from running without an external screen are the ones recorded on May 18 (like the extract above).
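
To compare the runs with and without an external monitor, one option is to group the `nvidia-modeset` errors in kern.log by date and message. The Python sketch below does that and also reports the gap between repeats of each message (in the extract above, the timeout error recurs roughly every 16.2 seconds). The function name `summarize` and the regular expression are my own assumptions about the syslog line format shown in this post, not part of any NVIDIA tool.

```python
import re
from collections import Counter, defaultdict

# Matches lines such as:
# May 18 07:28:46 thebat kernel: [  123.998588] nvidia-modeset: ERROR: GPU:0: ...
LINE_RE = re.compile(
    r"^(?P<day>\w{3}\s+\d+) \d\d:\d\d:\d\d \S+ kernel: "
    r"\[\s*(?P<uptime>\d+\.\d+)\] nvidia-modeset: ERROR: (?P<msg>.*)$"
)


def summarize(lines):
    """Count nvidia-modeset errors per (day, message) and list repeat gaps.

    Gaps are differences in kernel uptime (seconds) between consecutive
    occurrences of the same message; they are not reset across reboots.
    """
    counts = Counter()
    last_seen = {}
    gaps = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue
        msg = match["msg"]
        uptime = float(match["uptime"])
        counts[(match["day"], msg)] += 1
        if msg in last_seen:
            gaps[msg].append(round(uptime - last_seen[msg], 2))
        last_seen[msg] = uptime
    return counts, dict(gaps)


# First four error lines of the extract above, copied verbatim.
SAMPLE = [
    "May 18 07:28:46 thebat kernel: [  123.998588] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])",
    "May 18 07:28:46 thebat kernel: [  123.999167] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer",
    "May 18 07:29:03 thebat kernel: [  140.218673] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])",
    "May 18 07:29:03 thebat kernel: [  140.219095] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer",
]

counts, gaps = summarize(SAMPLE)
for (day, msg), n in counts.items():
    print(f"{day}: {n} x {msg}")
```

To run it on a real log, pass an open file instead of `SAMPLE`, for example `summarize(open("/var/log/kern.log", errors="replace"))`, and compare the per-day counts for the boots with and without the external monitor.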

Regards,
David.

The issue seems to be gone after installing 515.48.07.