When I try to run more demanding games or demos, the computer hangs with a “GPU has fallen off the bus” error in the kernel log. Total War: Warhammer from Steam in particular hangs every time I start gameplay. Unigine Valley hangs after some time. Other games hang at random. Less demanding games, like Minecraft, are fine.
My machine is a Eurocom Sky X7E2 laptop with a GTX 1080. I watched the temperatures in nvidia-settings and xsensors while running games and demos. Cooling does not seem to be the problem, because it hangs with the GPU temperature at or below 70 degrees Celsius.
I created an nvidia-bug-report over ssh after a hang (this was possible because the system itself keeps running after the graphics crash). nvidia-bug-report.log.gz (295 KB)
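Since the machine stays reachable over ssh, the temperature check described above could also be logged from a terminal together with power draw and performance state. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; the function names `parse_sample`, `read_sample` and `log_samples` are made up for illustration, and some laptop GPUs report power draw as not available.

```python
import subprocess
import time

# Standard --query-gpu properties of nvidia-smi.
FIELDS = ["temperature.gpu", "power.draw", "pstate"]


def _number(text):
    """Return a float, or None when nvidia-smi prints something like [N/A]."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sample(line):
    """Parse one CSV line such as '70, 150.32, P0' into a dict."""
    temp, power, pstate = [part.strip() for part in line.split(",")]
    return {
        "temperature_c": _number(temp),
        "power_w": _number(power),
        "pstate": pstate,
    }


def read_sample():
    """Query the first GPU once (hypothetical helper, needs nvidia-smi)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])


def log_samples(count, interval_s=1.0):
    """Print `count` samples, one every `interval_s` seconds."""
    for _ in range(count):
        print(read_sample())
        time.sleep(interval_s)
```

Calling `log_samples(600)` from an ssh session while the game runs would leave the last values before the hang on screen, which may help to tell a thermal problem from a power or state-switching problem.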
Which game/demo did you run before creating the bug report?
You already ruled out CPU/GPU temperature problems. XID 79 might also result from insufficient power. According to a review, Eurocom sized the PSU (330W) for the maximum power draw of the system, without reserves. So if the PSU heats up, its efficiency goes down and it might not be able to supply the GPU during power peaks. It’s just a guess, but maybe clean it and make sure it gets enough airflow.
There are also XID 16 events visible in your logs from previous boots. Did those come from TW:WH, or does that game also result in XID 79?
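To see which XID codes show up and how often, the kernel log can be scanned for the driver's `NVRM: Xid` lines. This is a rough sketch, assuming the usual `NVRM: Xid (PCI:...): <code>, ...` line layout; `count_xids` is a hypothetical helper and the sample lines are made up for illustration, not copied from the attached bug report.

```python
import re
from collections import Counter

# Matches lines like "NVRM: Xid (PCI:0000:01:00): 79, ..."
XID_RE = re.compile(r"NVRM: Xid \(([^)]+)\): (\d+)")


def count_xids(lines):
    """Count Xid events per numeric code in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# Illustrative input only.
SAMPLE = [
    "[ 1234.5] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.",
    "[   10.1] NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00000001",
    "[   99.0] usb 1-1: new high-speed USB device",
]

print(dict(count_xids(SAMPLE)))
```

On a real system the lines could come from the output of `dmesg`, or from `journalctl -k -b -1` for the previous boot, saved to a file and passed in line by line.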
The laptop is almost new (bought less than three months ago). The PSU is clean and only moderately warm. The laptop ventilation is also clean. However, I see a temperature of about 64 degrees Celsius on the pch_skylake sensor. I don’t know whether that is fine.
The XID 16 is not from any game; the problem with TW:WH is related to XID 79. I don’t know what caused those XID 16s.
I forgot to mention that I’m using it mostly for Blender Cycles rendering, and that works fine. There was maybe one strange hang in these three months, but I don’t remember it exactly. Rendering puts a heavy load on the GPU as well. One difference may be that there are no big data transfers during the heavy load: the whole scene is loaded into GPU memory before the render starts.
Yes, I think it’s unrelated.
One thing you could try for the XID 79 would be to switch PowerMizer from adaptive to maximum performance, to see if switching performance states is part of the problem.
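The PowerMizer mode can also be set from the command line instead of the GUI. This is a sketch, assuming the `GPUPowerMizerMode` attribute of `nvidia-settings` with the values 0 = Adaptive, 1 = Prefer Maximum Performance, 2 = Auto, and assuming GPU index 0; the helper names are hypothetical, and the commands need a running X session.

```python
import subprocess

# Assumed attribute values; check them against your driver's documentation.
POWERMIZER_MODES = {"adaptive": 0, "maximum": 1, "auto": 2}


def powermizer_command(mode, gpu_index=0):
    """Build the nvidia-settings call that assigns a PowerMizer mode."""
    value = POWERMIZER_MODES[mode]
    return ["nvidia-settings", "-a",
            f"[gpu:{gpu_index}]/GPUPowerMizerMode={value}"]


def query_command(gpu_index=0):
    """Build the nvidia-settings call that reads the mode back (terse output)."""
    return ["nvidia-settings", "-q",
            f"[gpu:{gpu_index}]/GPUPowerMizerMode", "-t"]


def set_powermizer(mode, gpu_index=0):
    """Assign the mode, then return what the driver reports afterwards."""
    subprocess.run(powermizer_command(mode, gpu_index), check=True)
    result = subprocess.run(query_command(gpu_index),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Reading the value back right after setting it shows whether the driver accepted the change. The setting is not persistent, so it has to be applied again after each X session start.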
I have tried it, but it behaves strangely. nvidia-settings shows the maximum performance level (level 4) in all cases. But the fans are quiet without heavy load and become noisy when I run something on the GPU, as expected. If I change the mode to maximum performance, everything behaves the same way. If I close the settings and reopen them, “Auto” mode is selected again, even if I use “Save Current Configuration”.
It seems that those settings aren’t “wired” to the real GPU. I also have another problem with similar symptoms: backlight control does not work for me. I can change the backlight level from the GUI, but the real backlight does not change. Is it possible that both problems are caused by BIOS bugs? My local Eurocom seller wrote to me that on Windows the backlight problem can be solved by a BIOS upgrade.
UPDATE: I installed Win10 and checked 3D workloads there. Everything (TW:WH, Furmark, 3DMark) works there without any problem. So the hardware is probably healthy and the problem is somewhere in the Linux driver.
I think we are having the same issue. My post is at:
We are playing different games, but we are both experiencing the same problem. This reminds me a lot of a problem with Nvidia and Linux back in 2013/14. Although it was not exactly the same problem, it also affected laptops, and the error message was exactly the same (GPU has fallen off the bus).
I hope someone from Nvidia can have a look at both your bug report and mine.
As I mentioned in my post, I didn’t have this problem a few months back. So I am suspecting it might be a kernel update that has exposed this issue. In the coming days, I will see if I have the time to test older kernels to see if the problem appears there as well.
So it was a faulty GPU in my case as well. One year ago it ended up as “problem with Linux, unable to reproduce in Windows”. Over the Christmas holiday I tried to play a game under Windows and experienced the same problem (at last; I had been unable to reproduce it under Windows before). Last week I received a replacement GPU, and now it works without any problems.