GTX Titan: GPU lockup (after Xid 62) since 396.x

Starting with Linux driver 396.x my system gets a Xid 62 error and then the GPU locks up until reboot.
It’s a GTX Titan (Asus GTXTITAN-6GD5, 6GB VRAM) on an Asus X99-E WS, Intel 5960X CPU.

Before 396.x I had Xid 61 errors every n minutes/hours (depending on GPU load), but these didn’t cause any lockups.
The only problem I’ve always had was intermittent HDMI audio dropouts when the GPU was not operating at performance level P8. I switched to S/PDIF because of that :/

For now I’m stuck with the 390.x branch of the NVIDIA driver.

I really hope there is a way to fix this…

dmesg (396.x and newer):
NVRM: Xid (PCI:0000:05:00): 62, 0c56(16f4) 00000000 00000000
kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

dmesg (before 396.x):
NVRM: Xid (PCI:0000:05:00): 61, 1968(1818) 00000000 00000000
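
For anyone who wants to track how often these errors show up across driver versions, here is a minimal Python sketch that pulls Xid events out of kernel log text. The regular expression is only an assumption based on the two log lines quoted above, and the names `XID_RE`, `parse_xid_lines`, `count_by_xid` and `SAMPLE` are made up for this example; other driver versions may format the line differently.

```python
import re

# Pattern inferred from the two Xid lines quoted in this thread, e.g.
#   NVRM: Xid (PCI:0000:05:00): 62, 0c56(16f4) 00000000 00000000
# It is an assumption, not an official format description.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+),\s*(?P<detail>.*)"
)

# Sample input copied from the dmesg excerpts in this thread.
SAMPLE = """\
NVRM: Xid (PCI:0000:05:00): 62, 0c56(16f4) 00000000 00000000
kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
NVRM: Xid (PCI:0000:05:00): 61, 1968(1818) 00000000 00000000
"""


def parse_xid_lines(text):
    """Return a list of (pci_address, xid_number, detail) tuples found in text."""
    events = []
    for line in text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("pci"), int(match.group("xid")), match.group("detail").strip())
            )
    return events


def count_by_xid(events):
    """Count how many times each Xid number occurs in the parsed events."""
    counts = {}
    for _pci, xid, _detail in events:
        counts[xid] = counts.get(xid, 0) + 1
    return counts


events = parse_xid_lines(SAMPLE)
for pci, xid, detail in events:
    print(f"{pci}: Xid {xid} ({detail})")
print(count_by_xid(events))
```

To run it against a live system, replace `SAMPLE` with the real kernel log, for example the text returned by `subprocess.run(["dmesg"], capture_output=True, text=True).stdout` (reading `dmesg` may require root on some distributions).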

I forgot to mention that everything is working great in Windows 10, including HDMI audio. So it has to be a driver and/or kernel issue.

Any Linux developer from NVIDIA reading this forum?
If I can’t resolve this I’ll have to buy a new graphics card :( I’m not sure the legacy branch will work with kernel 4.20+.

Any plans on making a real bug tracker? I mean, selling good hardware is nice… but what’s good hardware without working drivers?

I’ve been trying to get rid of this problem for a very long time now… At least with 390.x or earlier it was “just” Xid 61 errors and audio skips.

Did you already try putting it in a different slot?

Yes, that was one of the first things I tried.
I also tested endless combinations of UEFI/BIOS settings and kernel configs, and inspected and cleaned the card. The temperatures are OK and it works perfectly fine with Windows 10.

Maybe see if a newer vbios is available and flash that.

I retested with the latest driver (418.43) and it’s still not working. Same kernel errors and the system freezes.

The 390.x branch still causes no freeze. I’m now running version 390.116. I wonder what they changed between 390.x and 396.x beyond what is listed in the official changelog.

No NVIDIA Developer reading this forum?
It would be totally awesome to have some kind of official bug tracker. What harm could that cause… Something like Bugzilla. I guess as long as it works on their test systems :/

Hello GPUCat,

You can submit bugs to Nvidia through this page: https://developer.nvidia.com/nvidia_bug/add
Please keep us updated here in the forum.

Best,
Tom