[Solved] XServer Freezes during gaming - Attempted to yield the CPU while in atomic or interrupt con

On my Ubuntu 15.10 machine with a GTX 970, Xorg freezes in a busy loop, and sometimes the screen becomes corrupted as well. This happens with both the 361.18 and 355.11 drivers, on Linux 4.2 and 4.4, mostly while playing Divinity: Original Sin Enhanced Edition. The kernel log after the freeze reads:

[  474.778370] NVRM: GPU at PCI:0000:01:00: GPU-f83c7344-5fe6-0dd0-6020-53e351bb25d2
[  474.778379] NVRM: Xid (PCI:0000:01:00): 13, Graphics MME Exception Type:  MAX_INSTR_LIMIT
[  474.778383] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x404490=0x80000010
[  474.778392] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0020, Class 0000b197, Offset 00001f24, Data 00000fff
[  474.850717] NVRM: Xid (PCI:0000:01:00): 62, 1221(7534) 00000000 00000000
[  487.684223] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x405848=0x80000000
[  487.684228] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: Shader Program Header 9 Error
[  487.684229] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x405840=0xa0000200
[  487.684236] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0010, Class 0000b197, Offset 00002390, Data 44fffe00
[  489.682617] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[  493.679435] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[  495.678003] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

or

[  197.817969] NVRM: GPU at PCI:0000:01:00: GPU-f83c7344-5fe6-0dd0-6020-53e351bb25d2
[  197.817972] NVRM: Xid (PCI:0000:01:00): 13, Graphics MME Exception Type:  MAX_INSTR_LIMIT
[  197.817975] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x404490=0x80000010
[  197.817983] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0010, Class 0000b197, Offset 00001120, Data ffffff00
[  199.309204] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  199.418979] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  199.534524] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  199.634223] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  199.733947] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  199.834381] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  199.934001] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  200.050528] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  200.150546] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  200.250648] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  200.350520] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  200.469579] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  200.567281] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  200.667278] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  200.750630] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  200.850888] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  200.950595] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  201.034284] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  201.115862] NVRM: Xid (PCI:0000:01:00): 62, 3c59(9f48) 00000000 00000000
[  201.134002] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  203.134199] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[  205.134289] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Some people seem to have fixed similar problems by replacing their GPU. If this is indeed a hardware issue, could I please get a confirmation from Nvidia? I would also have liked to include the nvidia-bug-report.log.gz file, but I can’t find any attach function on the forum. Should I just email the file to linux-bugs@nvidia.com? Should I add a reference to this thread?
nvidia-bug-report.log.gz (208 KB)

I did a quick check and it also freezes in Shadow of Mordor. I attached gdb to the frozen X server; this is the backtrace:

#0  0x00007f31234c62d6 in ?? () from /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so
#1  0x00007f3123525862 in ?? () from /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so
#2  0x00007f31239dc3b0 in ?? () from /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so
#3  0x0000000000000a00 in ?? ()
#4  0x00007fff4a86e780 in ?? ()
#5  0x0000563a7e8420b0 in ?? ()
#6  0x0000000000000001 in ?? ()
#7  0x0000563a7de90010 in ?? ()
#8  0x0000563a7de91010 in ?? ()
#9  0x0000563a7ddea900 in ?? ()
#10 0x00007f31239d9b15 in ?? () from /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so
#11 0x0000000000000000 in ?? ()

Register values:

rax            0xa058	41048
rbx            0xa058	41048
rcx            0x15500	87296
rdx            0x1	1
rsi            0x0	0
rdi            0x563a7ddce7e0	94809219721184
rbp            0x563a7ddce7e0	0x563a7ddce7e0
rsp            0x7fff4a86e5d0	0x7fff4a86e5d0
r8             0x1	1
r9             0x563a7fe57148	94809253835080
r10            0x563a7dde2c68	94809219804264
r11            0x7f31234c5d50	139849022332240
r12            0x7ff8	32760
r13            0x1ffe	8190
r14            0x0	0
r15            0xdf03	57091
rip            0x7f31234c62d6	0x7f31234c62d6
eflags         0x3287	[ CF PF SF IF #12 #13 ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0

The kernel log was:

[ 3440.744841] NVRM: GPU at PCI:0000:01:00: GPU-f83c7344-5fe6-0dd0-6020-53e351bb25d2
[ 3440.744845] NVRM: Xid (PCI:0000:01:00): 62, 1221(7e90) 8400c6dd a5a5a500

I’m not entirely sure how helpful this information is. If there is anything else I can do, please tell me.

Please attach an automated bugreport as described here:
https://devtalk.nvidia.com/default/topic/522835/linux/if-you-have-a-problem-please-read-this-first/

Done. I have to say, finding the upload function was not easy. It might be a good idea to add a sentence to the FAQ, as I discovered while googling that I’m not the only one who finds the attach function hard to find ;-).

I googled a bit more and found this document: https://docs.nvidia.com/deploy/pdf/XID_Errors.pdf. So I think the “Attempted to yield the CPU while in atomic or interrupt context” message is actually a red herring and a result of general internal errors, since it does not always occur and is only ever triggered after a flood of Xid messages.
The errors are mostly 13, 31 and 62, which are indicative of hardware or driver errors (the expanded description of XID 13 says: “In rare cases, it’s possible for a hardware failure or system software bugs to materialize as XID 13”). A driver error seems unlikely, since the hardware worked fine with 358 back in November.
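For anyone who wants to check the same pattern in their own logs, here is a rough Python sketch that counts Xid codes and notes when the first Xid and the first os_schedule message appear. The function name `summarize` and the regular expressions are my own assumptions based only on the line format pasted in this thread; other driver versions may print these lines differently. The sample text is a few lines copied from the second log above.

```python
import re
from collections import Counter

# Assumed line formats, taken from the kernel log excerpts in this thread:
# [  197.817972] NVRM: Xid (PCI:0000:01:00): 13, Graphics MME Exception ...
# [  203.134199] NVRM: os_schedule: Attempted to yield the CPU while ...
XID_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+NVRM: Xid \([^)]*\): (\d+),")
YIELD_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+NVRM: os_schedule: Attempted to yield")


def summarize(log_text):
    """Count Xid codes and record the first Xid / first os_schedule timestamps."""
    counts = Counter()
    first_xid = None
    first_yield = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = XID_RE.match(line)
        if m:
            counts[int(m.group(2))] += 1
            if first_xid is None:
                first_xid = float(m.group(1))
            continue
        m = YIELD_RE.match(line)
        if m and first_yield is None:
            first_yield = float(m.group(1))
    return {"counts": dict(counts), "first_xid": first_xid, "first_yield": first_yield}


# A few lines copied from the second log in this thread.
SAMPLE = """\
[  197.817969] NVRM: GPU at PCI:0000:01:00: GPU-f83c7344-5fe6-0dd0-6020-53e351bb25d2
[  197.817972] NVRM: Xid (PCI:0000:01:00): 13, Graphics MME Exception Type:  MAX_INSTR_LIMIT
[  197.817975] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x404490=0x80000010
[  197.817983] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0010, Class 0000b197, Offset 00001120, Data ffffff00
[  199.309204] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000010, engmask 00000111, intr 10000000
[  201.115862] NVRM: Xid (PCI:0000:01:00): 62, 3c59(9f48) 00000000 00000000
[  203.134199] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
"""

result = summarize(SAMPLE)
print(result)
```

To run it on a real log, something like `summarize(open("kern.log").read())` should work, where `kern.log` is a placeholder for wherever you saved the kernel log. If `first_yield` is always later than `first_xid`, that would be consistent with the os_schedule message being a follow-up symptom rather than the cause.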

I see the first error generated on your system is: [ 474.778379] NVRM: Xid (PCI:0000:01:00): 13, Graphics MME Exception Type: MAX_INSTR_LIMIT

Is there any other driver version that doesn’t have this issue?

JustMaximumpower, how did you install the 4.2 and 4.4 kernels?

I didn’t expect the 355 series to work with 4.4. I recall the shim did not even compile with the 4.3 series, which is somewhat expected with a deprecated version.

I used the Ubuntu mainline PPA http://kernel.ubuntu.com/~kernel-ppa/mainline/ for 4.4. The other one is the standard 15.10 kernel.

I actually RMAed the card, so let’s see what happens when I get something back.

Ah, it should work then. The 4.4 kernel packages are generated on 15.10. Hopefully it was just a defective card.

Finally got a response. The card was defective and could not be repaired, and they had no replacement card, so I got a full refund.

Btw, is it normal to have to wait 5 weeks for an RMA?