NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus - HP Studio G5

Fresh install of Ubuntu 19 keeps crashing with the message

NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.

appearing in the logs.
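
To see whether other Xid codes precede the "fallen off the bus" message, the kernel log can be summarised per Xid number. The following is only a sketch: the function name `xid_summary` is made up for illustration, and the usage line assumes a systemd journal that keeps the previous boot (otherwise `dmesg` or /var/log/kern.log could be piped in instead).

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each
# distinct Xid code with the number of times it occurred.
xid_summary() {
    # Matches lines such as:
    #   NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' |
        sort -n | uniq -c |
        awk '{ print "Xid " $2 ": " $1 " occurrence(s)" }'
}

# Usage (assumption: persistent journal, so the crashed boot is -1):
#   journalctl -k -b -1 | xid_summary
```

A lone Xid 79 points at the bus or power side, while other codes shortly before it may hint at a different trigger.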

$ lspci | grep -E "VGA|3D" 
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GLM [Quadro P1000 Mobile] (rev a1)

Currently running nvidia-driver-418. When I select the Intel GPU the system does not crash, but I would also like to use an external monitor, so I am trying to get the NVIDIA GPU working.

When I switch to the NVIDIA GPU in PRIME, the system eventually crashes after a random amount of time.

I have tried the instructions from here, because I could not use the external monitor after the first install.

Additionally I have tried setting the GPU in persistent mode by

sudo nvidia-smi -pm 1

and also setting the grub option

pcie_aspm=off

neither of which made any difference in the end.
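
Before ruling the option out, it may be worth confirming that it actually reached the running kernel. On Ubuntu the parameter is normally added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `sudo update-grub` and a reboot. The check below is a minimal sketch; the function name `has_kernel_param` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given parameter appears as a whole
# word in a kernel command line. The second argument is optional and
# defaults to the command line of the running kernel.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   if has_kernel_param pcie_aspm=off; then
#       echo "pcie_aspm=off is active"
#   else
#       echo "pcie_aspm=off is NOT active"
#   fi
```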

I am unsure how to proceed, and any information would be helpful.

EDIT: I have dual boot with Windows and in Windows everything works fine without any problem!
nvidia-bug-report.log.gz (95.4 KB)


nvidiatemp.log (36.5 KB)
NEWnvidia-bug-report.log.gz (1.1 MB)

Common causes of an Xid 79 on a notebook are BIOS problems, overheating, or a defective GPU. Please check for a BIOS update, then check whether the GPU works under Windows; if it does not, RMA the notebook.

Thank you for your answer. I should mention that I dual boot with Windows, and on Windows everything works without the slightest problem.

On Windows, the NVIDIA GPU is normally off. Did you test by running e.g. a Unigine demo explicitly on the NVIDIA GPU?
Otherwise, please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

I have attached to the first post the results from running Unigine using Windows. Everything works fine there and the external monitor works properly as well.

I have also attached the result from running the nvidia-bug-report.sh tool.

Thank you for your time in helping me.

There’s a BIOS update available; you should consider trying that.
The GPU falling off the bus seems to happen rather spontaneously. Maybe this model is a dual-fan design and there’s a problem with the second fan in Linux. Please create a temperature log using

nvidia-smi -q -l 2 -d TEMPERATURE -f nvidiatemp.log

while switched to the nvidia profile, and let it run until the system crashes. After the reboot, please attach the file.
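
To read the resulting log quickly, the highest recorded temperature can be pulled out with a short filter. This is a sketch under the assumption that the log contains lines of the form `GPU Current Temp : 45 C`, as `nvidia-smi -q -d TEMPERATURE` usually prints; the function name `max_gpu_temp` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read an nvidia-smi temperature log on stdin and
# print the highest "GPU Current Temp" value in degrees Celsius.
# Exits non-zero if no numeric reading was found.
max_gpu_temp() {
    awk -F: '
        # Skip readings without a number (e.g. "N/A").
        /GPU Current Temp/ && $2 ~ /[0-9]/ {
            t = $2 + 0                      # " 45 C" -> 45
            if (!seen || t > max) { max = t; seen = 1 }
        }
        END { if (seen) print max; else exit 1 }
    '
}

# Usage:
#   max_gpu_temp < nvidiatemp.log
```

The value can then be compared against the "GPU Slowdown Temp" and "GPU Shutdown Temp" lines in the same log, if the driver reports them.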

I have updated the BIOS to the latest version. After the system crashed, I generated the nvidiatemp.log file as you specified above and attached it to the initial post.

Temperatures are fine, so it isn’t overheating, either.
Please check if the latest driver changes anything by adding the Ubuntu graphics ppa and installing the 430 driver from that.
Also, please create a new nvidia-bug-report.log while switched to the nvidia profile and the gpu still running so I can see the gpu’s settings.

I have just switched to driver 430 and attached the new logs (NEWnvidia-bug-report.log.gz), taken while on the nvidia profile, to the first post. The system has been running for 5 minutes with an external monitor attached and is still stable; I will post an update later.

Nothing changed with the driver 430. System keeps crashing.

Maybe this is some kind of power problem. Please try limiting the clocks using

sudo nvidia-smi -i 0 --lock-gpu-clocks=139,1200

and check if the system becomes stable.

While running on the nvidia profile, I ran the command above. The result in the terminal was

Setting locked Gpu clocks is not supported for GPU 00000000:01:00.0.
Treating as warning and moving on.
All done.

I also noticed today and yesterday that while the laptop was charging, I had no problems until it reached full charge. Once it is fully charged, the problem is the same as when not charging.
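
One way to check this correlation would be to log the battery state with timestamps and compare it with the time of the Xid 79 message after a crash. The sketch below assumes the battery shows up under /sys/class/power_supply with a name starting with BAT (BAT0 or BAT1 is common, but not universal); the function name `battery_status` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<battery name>: <status>" for every battery
# found under the given directory (defaults to the usual sysfs location).
# Typical status values are Charging, Discharging, Full and Not charging.
battery_status() {
    base=${1:-/sys/class/power_supply}
    for dir in "$base"/BAT*; do
        # Skip entries without a readable status file (or an unmatched glob).
        [ -r "$dir/status" ] || continue
        printf '%s: %s\n' "${dir##*/}" "$(cat "$dir/status")"
    done
}

# Usage: append a timestamped sample every 10 seconds until interrupted.
#   while :; do
#       printf '%s ' "$(date '+%F %T')"; battery_status; sleep 10
#   done >> battery.log
```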

Unfortunate that your gpu does not support setting clocks.
Is it possible to remove the battery of the notebook?

No it is not removable :(

I am still unable to solve this :(

This seems to be a design flaw of that notebook model. I don’t know if HP tweaked the Windows drivers to account for it, but the generic Linux drivers just seem to draw too much power.
I don’t know if HP supports Linux on that model; you should contact them and see if you can get some support or a replacement.

Hello.
I have a Dell G3 notebook with hybrid Intel + NVIDIA graphics running OpenSUSE Tumbleweed, and I had the same trouble with the NVIDIA card: a few minutes after I started using it, it fell off the bus. In my case I found a solution. I disabled “c-states control” in the BIOS, and the NVIDIA card stopped falling off the bus.

Thank you for your answer, unfortunately I do not have such an option as “c-states control” in the bios

I have the same problem on my Clevo P775DM3, with the BIOS already upgraded.
The screen goes blank, the CPU overheats (but not the GPU), and I have to reboot using SysRq.

lspci | grep -E "VGA|3D" 
01:00.0 VGA compatible controller: NVIDIA Corporation GP104M [GeForce GTX 1080] (rev a1)

I’m afraid it is a hardware problem.
Here’s the log.

Sep 29 19:55:01 mat63 kernel: [  383.345267] NVRM: GPU at PCI:0000:01:00: GPU-c3d61cfc-7fcd-01b4-8cc2-7e5a58666a05
Sep 29 19:55:01 mat63 kernel: [  383.345271] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x50ca24=0xbadf1401 0x50ca28=0xbadf1401 0x50ca2c=0xbadf1401 0x50ca34=0xbadf1401
Sep 29 19:55:01 mat63 kernel: [  383.345295] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x50ca24=0xbadf1401 0x50ca28=0xbadf1401 0x50ca2c=0xbadf1401 0x50ca34=0xbadf1401
Sep 29 19:55:01 mat63 kernel: [  383.345301] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception on (GPC 1, TPC 1): PIF_ERR
Sep 29 19:55:01 mat63 kernel: [  383.345303] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x50c884=0xbadf1401
Sep 29 19:55:01 mat63 kernel: [  383.345739] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0018, Class 0000c197, Offset 00001614, Data 00000000
Sep 29 19:55:01 mat63 kernel: [  383.437936] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
Sep 29 19:55:01 mat63 kernel: [  383.437939] NVRM: GPU at 00000000:01:00.0 has fallen off the bus.
Sep 29 19:55:01 mat63 kernel: [  383.438656] NVRM: A GPU crash dump has been created. If possible, please run
Sep 29 19:55:01 mat63 kernel: [  383.438656] NVRM: nvidia-bug-report.sh as root to collect this data before
Sep 29 19:55:01 mat63 kernel: [  383.438656] NVRM: the NVIDIA kernel module is unloaded.
Sep 29 19:55:10 mat63 kernel: [  392.404813] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
Sep 29 19:55:10 mat63 kernel: [  392.404885] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
Sep 29 19:55:16 mat63 NetworkManager[651]: <info>  [1569779716.6477] manager: sleep: sleep requested (sleeping: no  enabled: yes)
Sep 29 19:55:16 mat63 NetworkManager[651]: <info>  [1569779716.6478] device (enp110s0): state change: unavailable -> unmanaged (reason 'sleeping', sys-iface-state: 'managed')
Sep 29 19:55:16 mat63 NetworkManager[651]: <info>  [1569779716.8840] manager: NetworkManager state is now ASLEEP
Sep 29 19:55:16 mat63 systemd[1]: Reached target Sleep.
Sep 29 19:55:16 mat63 systemd[1]: Starting Suspend...
Sep 29 19:55:16 mat63 systemd-sleep[7379]: Suspending system...
Sep 29 19:55:16 mat63 kernel: [  398.351586] PM: suspend entry (deep)
Sep 29 19:55:16 mat63 kernel: [  398.351587] PM: Syncing filesystems ... done.
Sep 29 19:55:18 mat63 kernel: [  400.406659] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000987d:0:0:0x0000000f
Sep 29 19:55:18 mat63 kernel: [  400.406662] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917e:0:0:0x0000000f
Sep 29 19:55:18 mat63 kernel: [  400.406664] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
Sep 29 19:55:18 mat63 kernel: [  400.406666] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917e:1:0:0x0000000f
Sep 29 19:55:18 mat63 kernel: [  400.406668] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:1:0:0x0000000f
Sep 29 19:55:18 mat63 kernel: [  400.406669] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917e:2:0:0x0000000f
Sep 29 19:55:18 mat63 kernel: [  400.406671] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:2:0:0x0000000f
Sep 29 19:55:18 mat63 kernel: [  400.406673] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917e:3:0:0x0000000f
Sep 29 19:55:18 mat63 kernel: [  400.406675] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:3:0:0x0000000f
Sep 29 19:55:18 mat63 kernel: [  400.406757] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000987d:0:0:0x0000000f
Sep 29 19:55:18 mat63 kernel: [  400.406769] nvidia-modeset: ERROR: GPU:0: Deactivating G-SYNC failed
Sep 29 19:55:18 mat63 kernel: [  400.406771] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000987d:0:0:0x0000000f

Any help is appreciated
nvidia-bug-report.log.gz (849 KB)