GTX 1070 "GPU has fallen off the bus" running 3D games in Arch Linux

soseihin · January 1, 2017, 12:04am

I’ve been trying to troubleshoot 3d games resulting in the GPU falling off the bus. I’ve run out of avenues to explore and am looking for any other suggestions of what I should look into before deciding to call this a hardware problem and pursue an RMA.

Dmesg output:
[ 189.427267] NVRM: GPU at PCI:0000:01:00: GPU-73236338-bf17-442f-b881-d785485aa3bf
[ 189.427287] NVRM: GPU Board Serial Number:
[ 189.427290] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.

[ 189.427296] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[ 189.427312] NVRM: GPU is on Board .
[ 189.427325] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[ 204.377661] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
[ 204.378782] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
[ 204.379516] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
[ 204.380177] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
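
For anyone comparing their own logs against the output above, the Xid lines can be pulled out of the kernel log with a small filter. This is only a sketch: `xid_events` is a hypothetical helper name, and it assumes the timestamped `NVRM: Xid (PCI:...): <code>,` layout shown in this post.

```shell
#!/bin/sh
# Print "<timestamp> <xid code>" for every NVIDIA Xid line read from stdin.
# Assumes dmesg-style lines such as:
#   [ 189.427290] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
xid_events() {
    sed -n 's/^\[ *\([0-9.]*\)] NVRM: Xid (PCI:[^)]*): \([0-9]*\),.*/\1 \2/p'
}

# Typical use on a live system (dmesg may need root on some setups):
#   dmesg | xid_events
```

Lines that do not match the assumed layout (for example, logs without timestamps) are silently skipped.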

Background details:
Eurocom Toronado F5 (MSI 16L13), i7-6700 cpu, GTX1070 gpu

hotbox% uname -a
Linux hotbox 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux

hotbox% pacman -Ss nvidia | grep installed
extra/libvdpau 1.1.1-2 [installed]
extra/libxnvctrl 375.26-1 [installed]
extra/nvidia 375.26-1 [installed]
extra/nvidia-libgl 375.26-2 [installed]
extra/nvidia-settings 375.26-1 [installed]
extra/nvidia-utils 375.26-2 [installed]
multilib/lib32-nvidia-libgl 375.26-2 [installed]
multilib/lib32-nvidia-utils 375.26-2 [installed]

hotbox% lsmod | grep nvidia
nvidia_drm 49152 1
nvidia_modeset 782336 4 nvidia_drm
nvidia 11870208 65 nvidia_modeset
drm_kms_helper 126976 1 nvidia_drm
drm 294912 4 nvidia_drm,drm_kms_helper

Symptoms:
Running 3D games inevitably causes the GPU to fall off the bus, resulting in a black screen and the inability to use directly connected input devices (keyboard, mouse). Any background music continues to play. GPU temps remain between 40 and 60.

Running “The Long Dark” through the native Linux Steam client allows playability while remaining in interior locations. A crash will typically occur within a few minutes of entering an outside location, though on one occasion I was able to start a new game and play for roughly an hour.

Running “Insurgency” through Steam crashes shortly after the map has finished loading, though again there was an occasion where I was able to play longer.

When I run “Drunken Robot Pornography” or “Ziggurat” through Steam and “Mass Effect” through WINE, I get substantially longer game play - up to several hours at a stretch in “Mass Effect.”

I have yet to experience a crash in a 2d game, but haven’t put a lot of time into testing them. Day to day work with office tools, web browsing and media playback are all fine.

Troubleshooting steps:
I am able to start an SSH session, which I’ve used to collect the nvidia bug report and output of dmesg, journalctl -xe and Xorg.0.log immediately after a crash. (All should be attached)
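
The collection step described above could be scripted roughly as follows, to be run from the SSH session right after a crash. This is a sketch under assumptions: `collect_crash_logs` and the output file names are illustrative, the Xorg log is assumed to live at /var/log/Xorg.0.log, and every step is best-effort so a missing tool does not abort the rest.

```shell
#!/bin/sh
# Collect post-crash logs into a directory and print its path.
# Usage: collect_crash_logs [output-dir]
collect_crash_logs() {
    out=${1:-"crash-$(date +%Y%m%d-%H%M%S)"}
    mkdir -p "$out" || return 1
    dmesg > "$out/dmesg.txt" 2>&1 || true
    journalctl -xe --no-pager > "$out/journalctl.txt" 2>&1 || true
    cp /var/log/Xorg.0.log "$out/xorgLog.txt" 2>/dev/null || true
    # nvidia-bug-report.sh ships with the driver and should be run as root;
    # it writes nvidia-bug-report.log.gz into the current directory.
    if command -v nvidia-bug-report.sh >/dev/null 2>&1; then
        (cd "$out" && nvidia-bug-report.sh >/dev/null 2>&1) || true
    fi
    echo "$out"
}
```

Running it as root before the NVIDIA kernel module is unloaded matches the advice printed in the dmesg output.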

After a crash nvidia-smi -r reports that the gpu is unable to be restarted and the system must be rebooted.

Using the Nvidia Settings utility to set performance to maximum and nvidia-smi to toggle persistence mode on/off has not made a difference. It appears I am unable to turn off ECC mode for testing purposes.

Previous logs mentioned ‘irq 16: nobody cared (try booting with the “irqpoll” option)’ immediately before the crash. With the irqpoll option added as suggested, the crash still occurs, and the logs fill with messages about hpet losing large amounts of rtc interrupts leading up to and after the crash. Adding the hpet=disable option gets rid of those messages, but still doesn’t solve the problem.
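
When testing boot options like these, it helps to confirm that the running kernel actually received them. A minimal sketch, assuming a Linux system with /proc/cmdline; `has_kparam` is a hypothetical helper name:

```shell
#!/bin/sh
# has_kparam PARAM [CMDLINE]
# Succeeds if PARAM appears as a whole word in the kernel command line.
# CMDLINE defaults to the running kernel's /proc/cmdline.
has_kparam() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Example:
#   for p in irqpoll hpet=disable; do
#       if has_kparam "$p"; then echo "$p: active"; else echo "$p: not set"; fi
#   done
```

Matching whole words avoids false positives from options that merely contain the same substring.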

Nouveau seems to work, but yields one frame per second in (admittedly not comprehensive) testing so it’s not a feasible solution.

I found the following thread reporting very similar hardware and symptoms:
https://devtalk.nvidia.com/default/topic/984339/linux/gtx-1070m-on-clevo-p650rs-falling-off-the-bus/

It made the most sense for me to start a new thread, but perhaps the similarities warrant a merge.

Thank you for any help you can offer.
nvidia-bug-report.log.gz (269 KB)
dmesg.txt (89.4 KB)
journalctl.txt (97.8 KB)
xorgLog.txt (31.8 KB)

sandipt · January 2, 2017, 8:13am

Please share the output of the dmidecode command. Are you using the Steam client to play games? Make sure there is no thermal or power issue with the GPU or the system. Which desktop environment are you running: KDE, GNOME, or something else?

soseihin · January 2, 2017, 7:37pm

Hi Sandip, thanks for your direction.

Output of dmidecode should be attached.

I’ve tried running games both with and without the Steam client. The problem is reproducible both ways. If you need, I can supply the crash logs from running a game without the Steam client.

I’m certain there isn’t a thermal issue as I’ve monitored temps leading up to the GPU falling off the bus. Power should also be okay. Both PSU and battery are new. ACPI reports battery was last charged to 98% of capacity.
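
One way to capture this kind of evidence is to log GPU telemetry from an SSH session while the game runs, so the last samples before the crash are on disk. This is a sketch, not the method used in this thread: the function names and the log file name are made up for illustration, and some fields (power.draw in particular) may be reported as not supported on notebook GPUs.

```shell
#!/bin/sh
# Append one CSV sample per second to a log file until interrupted (Ctrl-C).
# Usage: log_gpu [logfile]
log_gpu() {
    nvidia-smi \
        --query-gpu=timestamp,temperature.gpu,power.draw,clocks.sm,pstate \
        --format=csv,noheader,nounits -l 1 >> "${1:-gpu-log.csv}"
}

# Print the highest temperature (second CSV column) found in a log on stdin.
max_temp() {
    awk -F', *' '$2 + 0 > max { max = $2 + 0 } END { print max + 0 }'
}

# Example:
#   log_gpu gpu-log.csv      # in one SSH session, while the game runs
#   max_temp < gpu-log.csv   # afterwards
```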

I believe I’ve ruled out the DE as a source of the problem. I typically use Budgie (Gnome). I’ve tested with Gnome Shell, LXQT and TWM. The problem remains reproducible in all three desktop environments, as well as with a window manager only.
dmidecode.txt (24.7 KB)

Wild_Penguin · January 2, 2017, 7:57pm

Hi soseihin,

I had a very similar problem on my setup, which I ultimately determined to be caused by malfunctioning hardware. More specifically, I froze all system updates (kernel, NVidia drivers and software - i.e. I did not run pacman) - and the issue went away by re-seating my graphics card and RAM. For details, see this thread.

There was some other user with a similar problem, though I don’t know if he ever got the problem fixed.

I don’t think this proves that it is a HW issue in your case - but just my 2 cents…

soseihin · January 2, 2017, 8:56pm

Thanks Wild_Penguin. I’d love to know if Rattlewrench ever figured out his issue in that second post you linked me to. I think I recall having discovered your post fairly early in my troubleshooting. Your post, along with a few I found in the Arch forums, seemed to point to the possibility of it being a hardware problem, which I increasingly believe it is. But it’s still entirely likely that I’m overlooking something obvious.

generix · January 3, 2017, 6:04pm

Looks like some MB issue, have a look at the official owner’s forum:
TechnologyGuide - TechTarget
Maybe hook up with those people as one has RMA’d twice and he only got a new gpu which didn’t solve the problem.

soseihin · January 3, 2017, 11:41pm

Thanks, generix. Yeah, it looks like several people are having issues with the same hardware I am. I’ve started talking to Eurocom support about how to proceed. I’ll update this thread with a solution should they provide one. In the meantime I would still very much appreciate any ideas or information this community has to offer, and thank you all again for your help so far.

generix · January 7, 2017, 7:30pm

Might be interesting to have an output from nvidia-smi while the GPU is still working to see if autoboost is available and enabled. Then maybe disable it and see if the GPU is still falling off the bus.
AFAIK there’s no way to set the GPU to minimum performance thus limiting maximum power draw. There’s a feature request somewhere though.

Edit: seems frequency manipulation can be achieved using CoolBits:
Linux Hardware Reviews & Performance Benchmarks, Open-Source News - Phoronix

soseihin · January 8, 2017, 6:22pm

Thanks for the new ideas, nvidia-smi gives me this report:

hotbox% nvidia-smi --auto-boost-default=1
Enabling/disabling default auto boosted clocks is not supported for GPU: 0000:01:00.0.
Treating as warning and moving on.
All done.

I tried to follow the directions outlined by Phoronix, but I don’t use an xorg.conf file because it always seems to break X for me. So unfortunately, until I get a working xorg.conf, I’m unable to add the Coolbits option to my X config.

I went ahead and dumped the output of nvidia-smi -qi into the attached file, in case you care to take a look at it.

A small update:
Eurocom got back to me, suggesting that the latest nvidia drivers are buggy and to use the drivers they have available for download. Unfortunately they only provide the windows driver, but according to a post I found in the notebook review forum you had previously directed me to, they’re using 368.79 drivers. I found 367.XX and 370.XX drivers are still available for download from nvidia, so I’ll try those out and report back.
nvidia-smi-qi.txt (6.09 KB)

soseihin · January 8, 2017, 9:35pm

Thanks again for the advice. I managed to set up a functioning xorg.conf and enabled coolbits. Unfortunately underclocking doesn’t appear to make a difference, symptoms and error logs still remain the same.
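
For readers who want to repeat the underclocking test, Coolbits can usually be enabled with a small drop-in file rather than a full xorg.conf. This is a sketch, not the exact configuration used here: the file name, the Identifier string and the helper name `coolbits_snippet` are assumptions, and the performance-level index `[3]` and the offset value in the comment vary by GPU and driver.

```shell
#!/bin/sh
# Emit a minimal Device section; Coolbits value 8 exposes the per-level
# clock offset controls in nvidia-settings.
coolbits_snippet() {
    cat <<'EOF'
Section "Device"
    Identifier "Nvidia Card"
    Driver     "nvidia"
    Option     "Coolbits" "8"
EndSection
EOF
}

# As root (assumed path):
#   coolbits_snippet > /etc/X11/xorg.conf.d/20-nvidia.conf
# After restarting X, a negative offset underclocks the GPU, for example:
#   nvidia-settings -a '[gpu:0]/GPUGraphicsClockOffset[3]=-200'
```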

I was unable to get the 367.44 driver to build, it fails with:

...

/tmp/selfgz647/NVIDIA-Linux-x86_64-367.44/kernel/nvidia-drm/nvidia-drm-modeset.c: In function ‘nvidia_drm_atomic_commit’:
/tmp/selfgz647/NVIDIA-Linux-x86_64-367.44/kernel/nvidia-drm/nvidia-drm-modeset.c:678:34: error: passing argument 1 of ‘drm_atomic_helper_swap_state’ from incompatible pointer type [-Werror=incompatible-pointer-types]
     drm_atomic_helper_swap_state(dev, state);
                                  ^~~
In file included from /tmp/selfgz647/NVIDIA-Linux-x86_64-367.44/kernel/nvidia-drm/nvidia-drm-modeset.c:37:0:
./include/drm/drm_atomic_helper.h:75:6: note: expected ‘struct drm_atomic_state *’ but argument is of type ‘struct drm_device *’
 void drm_atomic_helper_swap_state(struct drm_atomic_state *state,
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  LD [M]  /tmp/selfgz647/NVIDIA-Linux-x86_64-367.44/kernel/nvidia-modeset.o
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:289: /tmp/selfgz647/NVIDIA-Linux-x86_64-367.44/kernel/nvidia-drm/nvidia-drm-modeset.o] Error 1
  LD [M]  /tmp/selfgz647/NVIDIA-Linux-x86_64-367.44/kernel/nvidia-uvm.o
make[2]: Target '__build' not remade because of errors.
make[1]: *** [Makefile:1473: _module_/tmp/selfgz647/NVIDIA-Linux-x86_64-367.44/kernel] Error 2
make[1]: Target 'modules' not remade because of errors.
make[1]: Leaving directory '/usr/lib/modules/4.8.13-1-ARCH/build'
make: *** [Makefile:81: modules] Error 2
ERROR: The nvidia kernel module was not created.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

I did, however, get the 370.28 driver to install, though the GPU still falls off the bus in exactly the same way, with exactly the same errors. The error log is attached. I’ll report all this new information to Eurocom and see what they have to say. Thanks again for your continued assistance.
nvidia-bug-report.txt (562 KB)

soseihin · February 7, 2017, 8:28am

To avoid needless shipping fees and to rule out a Linux/driver problem entirely, I installed Windows 10. After several days of testing I seem to be able to run 3D games without any problems. Not sure where to go from here as it doesn’t appear to be a hardware issue after all.

sandipt · February 7, 2017, 11:15am

Hi soseihin, we would like to reproduce this issue internally to debug it further.
Could you please provide step-by-step reproduction steps?
You mentioned that multiple games crash. Do you get the same error in the log/dmesg for every game?
Can you provide repro steps for one or two games?
How long do you need to play?
Does the issue reproduce on a specific map in the game?
What action triggers the issue? Are all game patches/updates applied?
Please provide a crash dump or backtrace from when the game crashes.
Did you see this issue on any other OS, such as Ubuntu or Fedora?
Did you test with the 378.09 driver?
Does any older driver resolve the issue?
Are there any custom settings in Steam or in the game?
What are the resolutions of the game and the display?
I think your OS is in UEFI/EFI mode. Please share the output of the dmidecode command.

Please provide as much information as you can about your hardware/software setup and the repro details, so that we can replicate the exact same environment here and try to reproduce this issue.

Nerdknight · March 18, 2017, 5:15am

I have the same problem with an EVGA GTX 1080 on Ubuntu 16.10. I have ruled out a hardware problem too; so far this problem only happens to me when playing XCOM2 and Victor Vran. I have played other games like Deus Ex: Mankind Divided and Total War: Warhammer for hours without problems. I tested this with the 375.39 and 378.13 drivers.

volker_holthaus · September 12, 2017, 6:35am

I have the same problem with the GTX1080 on Arch Linux. I use the 384.69 driver version and the problem appears after a few minutes in XCOM2/Alien Isolation. XCOM works fine.

disnel · February 18, 2018, 9:39am

I’m experiencing the same problem with a Eurocom Sky X7E2 and GTX 1080. I have tried drivers 384.111, 387.34 and 390.25 (all available from the Ubuntu repositories); it made no difference. I have problems with Unigine Valley, Total War Warhammer, War for the Overlord, and virtually all of the more demanding games I’ve tried. Less demanding ones are fine (Minecraft). Surprisingly, the Unigine Superposition benchmark finished without a problem, but that may only be a coincidence.

Output of dmidecode and nvidia-bug-report attached. Both created after system restart. I probably can create them before restart through ssh, if necessary.
dmidecode.txt (15.8 KB)
nvidia-bug-report.log.gz (154 KB)

lukasz.tolwinski · March 19, 2020, 7:45am

I have the same problem, after 5-8 minutes of gaming, NVIDIA falls from the bus :/
it happened in 4 different games so far.

Dell XPS 15 7590 (GTX 1650), newest drivers: nvidia-driver-435, ubuntu 18.04 LTS
