Xid issues on Ubuntu 16.04 - Kernel 4.17 and 4.17.1, Nvidia 390 and 396 BETA - GTX 1070

Hi All,

I've been having very strange issues since upgrading the kernel to 4.17.

The issue started after installing the latest 390 drivers and the new 4.17 kernel. Everything works normally until I add a memory or core clock offset.

After that, the system restarts every time I change the clocks.

[   68.932599] NVRM: Xid (PCI:0000:05:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 0): Misaligned Address
[   68.932604] NVRM: Xid (PCI:0000:05:00): 13, Graphics Exception: ESR 0x50c648=0x14000f 0x50c650=0x0 0x50c644=0xd3eff2 0x50c64c=0x17f
[   68.952118] NVRM: Xid (PCI:0000:05:00): 43, Ch 00000010, engmask 00000101

I tried the beta drivers, but the issue is the same.
The only difference, and the weird part, is that I'm now getting Xid errors 13 and 43, whereas before installing the beta drivers (396.24.02) the only error with the stable 390 drivers was Xid 31.
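To compare which Xid codes appear under each driver, the kernel log can be tallied by code. This is a minimal sketch that assumes the log lines look like the ones quoted in this thread; the function name `count_xids` is hypothetical. Because the machine reboots on failure, `journalctl -k -b -1` (the previous boot's kernel log) is often more useful than `dmesg`, but it only works if journald storage is persistent.

```shell
#!/usr/bin/env bash
# Sketch: count NVRM Xid codes in kernel log text read from stdin.
# Example sources: `dmesg | count_xids` or `journalctl -k -b -1 | count_xids`.
# Assumes lines shaped like:
#   NVRM: Xid (PCI:0000:05:00): 13, Graphics SM Warp Exception ...

count_xids() {
  # Extract the numeric code after "Xid (PCI:...): ", then count each code.
  sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' | sort -n | uniq -c
}
```

Fed the three log lines from the first post, this reports two occurrences of Xid 13 and one of Xid 43.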

nvidia-bug-report.log.gz (119 KB)

Use Coolbits 28 at most.
Bit 1 (value 1) is undefined and bit 2 (value 2) tries to force SLI, so don't set them.
What command line are you using to set the clocks?
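Coolbits is a bitmask, so 31 is 1+2+4+8+16 and 28 is the same value with the two low bits cleared. A small sketch to decode a value, assuming the bit meanings stated above plus the ones in the NVIDIA driver README (4 = fan control, 8 = clock offsets, 16 = overvoltage); check the README for your driver version. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decode an X config "Coolbits" value bit by bit.
# Bit meanings are an assumption based on this thread and the driver README.

describe_coolbits() {
  local value=$1
  (( value & 1 ))  && echo "1: undefined (don't set)"
  (( value & 2 ))  && echo "2: tries to force SLI (don't set)"
  (( value & 4 ))  && echo "4: manual fan control"
  (( value & 8 ))  && echo "8: clock offsets"
  (( value & 16 )) && echo "16: overvoltage"
  return 0
}

# Clear the two low bits (values 1 and 2), keeping everything else.
safe_coolbits() {
  echo $(( $1 & ~3 ))
}
```

For example, `safe_coolbits 31` prints 28, and `describe_coolbits 28` lists only the fan, clock offset and overvoltage bits.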

Hi,

I’m using Coolbits 31; is that too much?

I set the clocks from the terminal:

nvidia-settings -a '[gpu:0]/GPUMemoryTransferRateOffset[3]=1000'
nvidia-settings -a '[gpu:0]/GPUGraphicsClockOffset[3]=120'

Yes, 31 is wrong; use 28.
You’re also setting the wrong attributes; they were changed for Pascal GPUs:
https://devtalk.nvidia.com/default/topic/1031142/linux/-390-x-unable-to-modify-gpumemorytransferrateoffset-and-gpugraphicsclockoffset-via-nvidia-settings-/post/5248352/#5248352
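A hedged sketch of the same offsets using the attribute names that, as far as I know, replace the per-level ones for Pascal on 390.xx: `GPUGraphicsClockOffsetAllPerformanceLevels` and `GPUMemoryTransferRateOffsetAllPerformanceLevels`. Treat those names as an assumption and confirm them with `nvidia-settings -q all` on your own driver before applying. The `apply_offsets` function and the `DRY_RUN` switch are hypothetical additions; with `DRY_RUN=1` the command is only printed, not run.

```shell
#!/usr/bin/env bash
# Sketch only: the *AllPerformanceLevels attribute names are an assumption
# for Pascal GPUs on 390.xx; confirm them with `nvidia-settings -q all`.
# Default offsets are the ones from this thread (memory +1000, core +120).

GPU=${GPU:-0}
MEM_OFFSET=${MEM_OFFSET:-1000}
CORE_OFFSET=${CORE_OFFSET:-120}

apply_offsets() {
  # Quote each target so the shell never treats [gpu:N] as a glob.
  local -a args=(
    -a "[gpu:${GPU}]/GPUMemoryTransferRateOffsetAllPerformanceLevels=${MEM_OFFSET}"
    -a "[gpu:${GPU}]/GPUGraphicsClockOffsetAllPerformanceLevels=${CORE_OFFSET}"
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of touching the GPU.
    echo "nvidia-settings ${args[*]}"
  else
    nvidia-settings "${args[@]}"
  fi
}
```

Run `DRY_RUN=1 apply_offsets` first to check the command, then raise the offsets in small steps rather than jumping straight to the final values.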

So something changed in 390.xx and the old attributes don’t work anymore?

I’m trying Coolbits 28 now and will update the thread in the next 30 minutes.

Yes, correct.

Same problem:

[ 1402.243411] NVRM: Xid (PCI:0000:05:00): 79, GPU has fallen off the bus.
[ 1402.343749] NVRM: Xid (PCI:0000:05:00): 62, ffffffff(ffffffff) ffffffff ffffffff

[  184.916644] NVRM: Xid (PCI:0000:05:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 0): Misaligned Address
[  184.916649] NVRM: Xid (PCI:0000:05:00): 13, Graphics Exception: ESR 0x50c648=0x5000f 0x50c650=0x20 0x50c644=0xd3eff2 0x50c64c=0x17f
[  184.928566] NVRM: Xid (PCI:0000:05:00): 43, Ch 00000010, engmask 00000101

But the errors are a little different.

Xid 79 ("GPU has fallen off the bus") means the GPU was overheating, underpowered, or otherwise overstretched, so it said bye-bye.
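For reference, here is a small lookup for the Xid codes that came up in this thread. The short descriptions are paraphrased from NVIDIA's Xid error table (an assumption on wording; check the current table for the full list of possible causes), and `xid_meaning` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: short descriptions for the Xid codes seen in this thread,
# paraphrased from NVIDIA's Xid error table.
xid_meaning() {
  case $1 in
    13) echo "Graphics engine exception" ;;
    31) echo "GPU memory page fault" ;;
    43) echo "GPU stopped processing" ;;
    62) echo "Internal micro-controller halt" ;;
    79) echo "GPU has fallen off the bus" ;;
    *)  echo "unknown (see NVIDIA's Xid table)" ;;
  esac
}
```

For example, `xid_meaning 79` prints `GPU has fallen off the bus`, matching the driver's own message in the log above.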