3070 Lenovo Legion S7 GPU driver issue

Hi,

I am experiencing an issue with my NVIDIA graphics card. I am seeing the following error message in my system logs (sudo dmesg):

[   55.345403] NVRM: GPU at PCI:0000:01:00: GPU-2e2f2161-51b2-3146-bc15-1c999a3437ce
[   55.345407] NVRM: Xid (PCI:0000:01:00): 79, pid=2088, GPU has fallen off the bus.
[   55.345409] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[   55.345743] nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
[   55.345823] nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
[   55.345830] snd_hda_intel 0000:01:00.1: Unable to change power state from D3cold to D0, device inaccessible
[   55.785304] snd_hda_codec_hdmi hdaudioC0D0: Unable to sync register 0x4f0800. -5
[   55.785313] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[   55.785316] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[   55.785319] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[   55.785321] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[   55.785323] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[   55.785325] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[   55.785327] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[   55.785329] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[   56.138036] rfkill: input handler disabled
[   56.141203] audit: type=1400 audit(1671736455.500:56): apparmor="DENIED" operation="capable" profile="/snap/snapd/17883/usr/lib/snapd/snap-confine" pid=3051 comm="snap-confine" capability=12  capname="net_admin"
[   56.141211] audit: type=1400 audit(1671736455.500:57): apparmor="DENIED" operation="capable" profile="/snap/snapd/17883/usr/lib/snapd/snap-confine" pid=3051 comm="snap-confine" capability=38  capname="perfmon"
[   56.673239] nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
[   56.673259] nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
[   56.966847] audit: type=1326 audit(1671736456.328:58): auid=1000 uid=1000 gid=1000 ses=3 subj=snap.snap-store.ubuntu-software pid=3051 comm="snap-store" exe="/snap/snap-store/582/usr/bin/snap-store" sig=0 arch=c000003e syscall=314 compat=0 ip=0x7f86ce2ec73d code=0x50000
[   59.119662] audit: type=1400 audit(1671736458.479:59): apparmor="DENIED" operation="open" profile="snap.snap-store.ubuntu-software" name="/etc/PackageKit/Vendor.conf" pid=3051 comm="snap-store" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
[   59.292562] audit: type=1400 audit(1671736458.651:60): apparmor="DENIED" operation="open" profile="snap.snap-store.ubuntu-software" name="/etc/appstream.conf" pid=3051 comm="snap-store" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
[   62.377515] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[   62.377574] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[   62.377608] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[   62.388305] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[   62.388354] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[   62.388392] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[   62.399590] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[   62.399637] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[   62.399672] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[   80.780734] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67d:0:0:0x0000000f
[   80.780740] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:1:0:0x0000000f
[   80.780743] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:3:0:0x0000000f
[   80.780745] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:5:0:0x0000000f
[   80.780748] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:7:0:0x0000000f
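
In case it helps anyone reading a similarly noisy log, here is a minimal Python sketch that pulls only the NVRM Xid lines out of dmesg text. The function name `find_xid_events` and the regular expression are my own illustration, not part of any NVIDIA tool, and the pattern assumes the line layout shown in the log above.

```python
import re

# Matches lines shaped like the Xid line in the log above, e.g.
# [   55.345407] NVRM: Xid (PCI:0000:01:00): 79, pid=2088, GPU has fallen off the bus.
XID_LINE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+NVRM: Xid \((?P<pci>[^)]+)\): "
    r"(?P<xid>\d+), (?P<detail>.*)"
)


def find_xid_events(log_text):
    """Return (seconds_since_boot, pci_address, xid_code, detail) per Xid line."""
    events = []
    for line in log_text.splitlines():
        match = XID_LINE.search(line)
        if match:
            events.append(
                (
                    float(match["time"]),
                    match["pci"],
                    int(match["xid"]),
                    match["detail"].strip(),
                )
            )
    return events


# Three lines copied from the log above; only the middle one is an Xid report.
sample = """\
[   55.345403] NVRM: GPU at PCI:0000:01:00: GPU-2e2f2161-51b2-3146-bc15-1c999a3437ce
[   55.345407] NVRM: Xid (PCI:0000:01:00): 79, pid=2088, GPU has fallen off the bus.
[   55.345409] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
"""

for event in find_xid_events(sample):
    print(event)
```

To run it against a real log, save the output of sudo dmesg to a file and pass the file contents to `find_xid_events`.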

I am running Ubuntu 22.04.1 LTS.
My system specifications are:
CPU: 12th Gen Intel i7-12700H (20) @ 4.600GHz
GPU: Intel Alder Lake-P
GPU: NVIDIA GeForce RTX 3070 Mobile / Max-Q

I read that Xid 79 could mean a thermal or PSU issue, but the laptop feels cold to the touch and I got it 2 days ago, so I would be surprised if it was broken.

I would greatly appreciate any help or suggestions on how to resolve this issue. Thank you in advance for your assistance.

The link between Xid 79 and PSU or thermal issues only applies to desktop GPUs, not mobile ones.
On notebooks, Xid 79 previously mostly meant a defective GPU (apart from some distinctive notebook models).
Unfortunately, a new issue has arisen on systems like yours with a 12th gen Intel CPU. Something is broken in PCIe power management on those (on some notebooks, not all). When the GPU stays at its lowest performance level for some time, the CPU seems to cut power to the bus, thus killing the GPU.
On some notebook models, this is fixable by upgrading to kernel 5.19+, e.g. by using the liquorix PPA or upgrading to Ubuntu 22.10. On others, this doesn’t help, and the only workaround is setting “max performance” in nvidia-settings, which costs extra power.
Please check for a BIOS update and upgrade the kernel.
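
As a quick way to check the kernel side of this advice, here is a small Python sketch that compares the running kernel release against the 5.19 threshold mentioned above. The helper names `parse_kernel_release` and `kernel_at_least` are hypothetical, and the sketch assumes the release string starts with `major.minor`, as Ubuntu kernel releases do.

```python
import platform
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '6.0.9-060009-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=(5, 19)):
    """Return True if the release is at or above the minimum (major, minor)."""
    version = parse_kernel_release(release)
    return version is not None and version >= minimum


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r` on Linux.
    release = platform.release()
    if kernel_at_least(release):
        print(release, "is 5.19 or newer")
    else:
        print(release, "is older than 5.19 (or could not be parsed)")
```

Passing this check only means the kernel is new enough to contain the fix on the models where a kernel upgrade helps; it says nothing about notebooks that still need the “max performance” workaround.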

Wow, thanks for the reply! So could it be that the GPU is now broken and I need to send it back for repair (under warranty)? Or could changing some settings fix it?

I’ll try making the changes you suggested now!

I’ve got the latest kernel (6.0.9-060009-generic) and the latest BIOS.

The GPU did work for the first day I had it, then worked but started throwing errors the next time I turned it on, and now doesn’t seem to connect at all and isn’t showing up in lshw.

If it’s now gone from lspci, it’s broken, warranty case.
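
To make that lspci check repeatable, here is a rough Python sketch that filters lspci output for NVIDIA devices. The function names are hypothetical, and the sample text is illustrative of typical lspci formatting rather than output captured from the laptop in this thread.

```python
import subprocess


def nvidia_devices(lspci_output):
    """Return the lspci lines that mention NVIDIA (case-insensitive)."""
    return [line for line in lspci_output.splitlines() if "nvidia" in line.lower()]


def read_lspci():
    """Run lspci and return its text output (needs the pciutils package)."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout


# Illustrative sample only, not taken from the affected machine.
sample_lspci = """\
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller
01:00.0 VGA compatible controller: NVIDIA Corporation GA104M [GeForce RTX 3070 Mobile / Max-Q]
01:00.1 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller
"""

found = nvidia_devices(sample_lspci)
print(len(found), "NVIDIA device line(s) found")
```

On a real system, `nvidia_devices(read_lspci())` returning an empty list would correspond to the GPU being gone from lspci.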

OK, thank you for your help! Is there a guide I can follow for when it gets repaired so the same issue doesn’t happen again?

After doing a Windows update, the GPU is working again on the Windows side. I completely removed Ubuntu and Windows and reinstalled, and the GPU came back to life! I would much rather use Ubuntu (I struggle using Windows now). How can I set it up so it doesn’t fail again?

Is there a guaranteed way to reinstall Ubuntu without getting the same errors?
