GTX970 + 1070 errors with Ryzen 2600

Hi All,
I want to report an issue with errors I get when running GPU tasks, such as watching a video (VLC or (s)mplayer) or playing a game.

Specifics:
Kernel v4.20 with Nvidia driver 415.25, and also kernel v4.18 with Nvidia driver 410, on both a GTX 970 and a GTX 1070. Ryzen 5 2600 on an X370 motherboard. The 970 was working fine on my old LGA1156 motherboard + i7 setup.

Any ideas? Any further tests I could run? Any further info required?

nvidia-bug-report.log.gz (503 KB)

Since you have a 2nd gen Ryzen, the Ryzen bug is ruled out. Thus, the Xids point towards the system memory. IIRC, depending on mainboard and dual/single channel, the memory is only stable at lower clocks. Another user investigated that on his Ryzen 2 system. What are the current memory settings in the BIOS? Is it stable if you lower them? Then there’s of course the chance of a flawed memory module. Don’t use memtest86 or the like to test, that’s unreliable. Remove all but one module, test, then swap modules.

Sorry for the slow response. I got round to trying this today. I ran with one stick of RAM at a time and still got the same issues. I did test with memtest86 before this thread and both sticks were fine (just adding extra info). What’s the issue with memtest? What else can I try?

I think you can assume the memory is fine. Did you try lowering the clocks?

Like under-clocking it? No, not yet. I can try that just to see if it works, though.

memory clocks, not cpu clock.

Also make sure you’re running the latest bios.

Yes, memory clocks. I should have been clearer. Anyway, I’ve spent a few hours underclocking and even slightly overclocking the memory. Each time I tested with memtest and got no errors (I’ve got no other real way of checking), and each time it was stable until I was in the OS trying to play a video.
The BIOS is the latest, released 15 days ago.

Then it’s probably not memory related, maybe something pcie. Two things to try:

  • disable iommu in bios
  • use kernel parameter pcie_aspm=off
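
For anyone trying the second suggestion, here is a minimal Python sketch for confirming that the parameter actually reached the running kernel after a reboot. It assumes a Linux system that exposes /proc/cmdline; the helper name has_kernel_param is made up for illustration and is not part of any tool mentioned in this thread.

```python
def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token in a kernel command line."""
    # Kernel parameters are whitespace-separated, so compare whole tokens
    # rather than substrings.
    return param in cmdline.split()


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line the running kernel was booted with.
        with open("/proc/cmdline") as f:
            current = f.read()
    except OSError:
        current = ""  # not on Linux, or /proc is unavailable
    print("pcie_aspm=off active:", has_kernel_param(current, "pcie_aspm=off"))
```

If this prints False after editing the bootloader config, the parameter was probably not applied (for example, the bootloader config was not regenerated).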

I finally got round to testing those two things. Neither made a difference. Happy to try any other ideas.

I’m quite out of ideas, all that’s left is the mainboard and the cpu. Though it seems absurd, maybe try the ryzen-test: https://github.com/Oxalin/ryzen-test

Haven’t done the ryzen test yet. Might soon as I’d still like to play games. Just replying here for the next poor sap this issue impacts.

Starting VLC without hardware acceleration seems to stop the crash happening.
vlc --avcodec-hw none

(s)mplayer could probably also be told not to use hardware acceleration.

Furthermore this thread seems related.
https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/1574354

This snippet from the above thread might also help people:
“It turns out it default to “automatic”, which is, in this configuration, VDPAU output. And this causes said problems. Switching to “OpenGL GLX video output (XCB)” works around the issue.”

Still bugged in driver v418

Mar 12 14:44:53 homer kernel: [ 968.864843] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 02000000
Mar 12 14:44:53 homer kernel: [ 969.316834] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Mar 12 14:44:53 homer kernel: [ 969.316838] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Mar 12 14:44:53 homer kernel: [ 969.316841] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Mar 12 14:44:53 homer kernel: [ 969.316953] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 00001b0c, Data 1000f010
Mar 12 14:44:53 homer kernel: [ 969.317121] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 02000000
Mar 12 14:44:56 homer kernel: [ 971.545807] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Mar 12 14:44:56 homer kernel: [ 971.545815] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Mar 12 14:44:56 homer kernel: [ 971.545820] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Mar 12 14:44:56 homer kernel: [ 971.545954] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 00001b0c, Data 1000f010
Mar 12 14:44:56 homer kernel: [ 971.546171] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 02000000
Mar 12 14:44:57 homer kernel: [ 972.493887] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 00800000
Mar 12 14:44:57 homer kernel: [ 972.494159] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 00800000
Mar 12 14:44:58 homer kernel: [ 974.261117] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000021c, Data 00001211, ErrorCode 0000000c
Mar 12 14:45:03 homer kernel: [ 979.210714] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 08000020, ErrorCode 0000000c
Mar 12 14:45:04 homer kernel: [ 980.186369] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00002384, Data 08000001, ErrorCode 0000000c
Mar 12 14:45:05 homer kernel: [ 980.880974] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Mar 12 14:45:05 homer kernel: [ 980.880982] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Mar 12 14:45:05 homer kernel: [ 980.880987] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Mar 12 14:45:05 homer kernel: [ 980.881125] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 00001b0c, Data 1000f010
Mar 12 14:45:05 homer kernel: [ 980.881345] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 02000000
Mar 12 14:45:05 homer kernel: [ 981.010711] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 20000eb0, ErrorCode 0000000c
Mar 12 14:45:05 homer kernel: [ 981.202858] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00002384, Data 20000001, ErrorCode 0000000c
Mar 12 14:45:07 homer kernel: [ 982.757795] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Mar 12 14:45:07 homer kernel: [ 982.757803] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Mar 12 14:45:07 homer kernel: [ 982.757807] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Mar 12 14:45:07 homer kernel: [ 982.757950] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 00001614, Data 00000000
Mar 12 14:45:07 homer kernel: [ 982.758173] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 02000000
Mar 12 14:45:07 homer kernel: [ 982.818715] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 00800000
Mar 12 14:45:07 homer kernel: [ 982.818937] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 00800000
Mar 12 14:45:08 homer kernel: [ 984.021133] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: MISSING_MACRO_DATA
Mar 12 14:45:08 homer kernel: [ 984.021145] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404490=0x80000001
Mar 12 14:45:08 homer kernel: [ 984.021288] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 00002390, Data 3e8b5f72
Mar 12 14:45:08 homer kernel: [ 984.106757] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Mar 12 14:45:08 homer kernel: [ 984.106767] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Mar 12 14:45:08 homer kernel: [ 984.106774] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Mar 12 14:45:08 homer kernel: [ 984.106942] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 00001b0c, Data 1000f010
Mar 12 14:45:08 homer kernel: [ 984.107207] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 02000000
Mar 12 14:45:08 homer kernel: [ 984.245139] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00002384, Data 08000001, ErrorCode 0000000c
Mar 12 14:45:09 homer kernel: [ 984.521703] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00000d44, Data 200104ea, ErrorCode 00000004
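
In case it helps others compare their own logs, here is a rough Python sketch that counts how often each Xid number shows up. The regular expression is only an assumption based on the line format pasted above, and count_xids is a hypothetical helper name; other driver versions may format these lines differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ...
XID_RE = re.compile(r"NVRM: Xid \((?P<dev>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), (?P<msg>.*)")


def count_xids(lines):
    """Count occurrences of each Xid number in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = XID_RE.search(line)
        if m:
            counts[int(m.group("xid"))] += 1
    return counts


# Three lines copied from the log above.
sample = [
    "Mar 12 14:44:53 homer kernel: [ 968.864843] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 02000000",
    "Mar 12 14:44:53 homer kernel: [ 969.316834] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch",
    "Mar 12 14:44:58 homer kernel: [ 974.261117] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000021c, Data 00001211, ErrorCode 0000000c",
]

for xid, n in sorted(count_xids(sample).items()):
    print(f"Xid {xid}: {n}")
```

To run it over a real log, pass an open file instead of the sample list, e.g. the kernel.log mentioned in this thread.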

Have you verified this with driver 418.43 installed? If yes, please share an nvidia bug report and detailed repro steps.

Yes, with the 418.43 driver.
It seems to give more information than 415, though.

Mar 12 17:20:54 homer kernel: [ 151.326232] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000a140, Offset 000001b4, Data 03000000
Mar 12 17:20:54 homer kernel: [ 151.326461] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 02000000
Mar 12 17:20:54 homer kernel: [ 151.785248] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000161c, Data 200505f2, ErrorCode 0000000c
Mar 12 17:20:54 homer kernel: [ 151.809392] show_signal_msg: 12 callbacks suppressed
Mar 12 17:20:54 homer kernel: [ 151.809395] RenderingThread[2912]: segfault at 6d424020 ip 00007fdeeec823f7 sp 00007fdee326ba30 error 4 in libc-2.28.so[7fdeeec3e000+1df000]
Mar 12 17:20:54 homer kernel: [ 151.809402] Code: 0e 75 07 eb 1b 41 ff 0e 74 16 49 8d 3e 48 81 ec 80 00 00 00 e8 ea 4c 0e 00 48 81 c4 80 00 00 00 48 89 d0 48 c1 e0 05 4c 01 f8 <48> 8b 50 10 48 83 fa 03 74 37 48 83 fa 04 0f 85 55 ff ff ff 48 8b

Bug report attached.
nvidia-bug-report.log.gz (1.02 MB)

Thanks for sharing the bug report. Can you please explain the expected and observed behavior and provide detailed repro steps so that I can attempt to reproduce it internally?

Oops, sorry.
Current steps to reproduce:
Download Rocket League. Open it. Play a game for a minute (max) until it crashes.
Heck, sometimes even the Steam client UI causes these errors in kernel.log.

Old steps:
Find a 1080p x264 video and play it in VLC or mplayer. This now works because, as mentioned a few comments ago, I’ve changed the VLC video output module to “OpenGL GLX video output (XCB)”. I believe it defaults to VDPAU, and that caused the problems.

If you need any more info feel free to ask. I presume you’ll be able to get all the hardware info from the bug report log.

Also happy to try anything on this system you think will help you diagnose the issue better.
Thanks for your help.

Newest BIOS (includes AGESA 0070 for the upcoming processors and improves some CPU compatibility).
Newest stable kernel v5.0.1
Nvidia driver 418.43 installed directly (not via PPA)
Still crashes.

nvidia-bug-report.log.tar.gz (1.02 MB)

It seemed to lag a bit more before it finally crashed, if that helps. The Xid 69 errors coincide with the lag, at the point where it usually would have crashed.

Mar 13 17:22:32 homer kernel: [ 672.091282] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 00002390, Data 00000000
Mar 13 17:22:34 homer kernel: [ 673.632970] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 20000e90, ErrorCode 0000000c
Mar 13 17:22:34 homer kernel: [ 673.743257] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 000017d0, Data 01000001, ErrorCode 0000000c
Mar 13 17:22:36 homer kernel: [ 675.890269] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00001a14, Data 20000000, ErrorCode 0000000c
Mar 13 17:22:36 homer kernel: [ 676.194977] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00002384, Data 01000001, ErrorCode 0000000c
Mar 13 17:22:37 homer kernel: [ 676.789996] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 20000020, ErrorCode 0000000c
Mar 13 17:22:38 homer kernel: [ 677.764137] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000204c, Data 20010811, ErrorCode 0000000c
Mar 13 17:22:42 homer kernel: [ 681.548393] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000204c, Data 20010811, ErrorCode 0000000c
Mar 13 17:22:43 homer kernel: [ 682.398117] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 20000020, ErrorCode 0000000c
Mar 13 17:22:46 homer kernel: [ 685.547959] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 200010c0, ErrorCode 0000000c
Mar 13 17:22:46 homer kernel: [ 685.818215] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00002384, Data 20000001, ErrorCode 0000000c
Mar 13 17:22:47 homer kernel: [ 686.251990] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 20000020, ErrorCode 0000000c
Mar 13 17:22:47 homer kernel: [ 686.931023] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00000d7c, Data 20000000, ErrorCode 0000000c
Mar 13 17:22:47 homer kernel: [ 687.018914] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000204c, Data 00000130, ErrorCode 0000000c
Mar 13 17:22:47 homer kernel: [ 687.223497] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00000d7c, Data 01000000, ErrorCode 0000000c
Mar 13 17:22:49 homer kernel: [ 688.627610] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00002384, Data 20000001, ErrorCode 0000000c
Mar 13 17:22:50 homer kernel: [ 690.179862] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00002380, Data 01004100, ErrorCode 0000000c
Mar 13 17:22:51 homer kernel: [ 690.447789] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00002490, Data 01000010, ErrorCode 0000000c
Mar 13 17:23:03 homer kernel: [ 702.764487] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00002380, Data 01004100, ErrorCode 0000000c
Mar 13 17:23:04 homer kernel: [ 703.851828] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Mar 13 17:23:04 homer kernel: [ 703.851836] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Mar 13 17:23:04 homer kernel: [ 703.851840] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Mar 13 17:23:04 homer kernel: [ 703.851979] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 000023a8, Data 00000000
Mar 13 17:23:08 homer kernel: [ 707.864515] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000037c, Data 20000001, ErrorCode 0000000c
Mar 13 17:23:09 homer kernel: [ 708.414417] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000196c, Data 01000000, ErrorCode 0000000c
Mar 13 17:23:09 homer kernel: [ 708.785720] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: MISSING_MACRO_DATA
Mar 13 17:23:09 homer kernel: [ 708.785731] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404490=0x80000001
Mar 13 17:23:09 homer kernel: [ 708.785873] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 00002390, Data 00000000
Mar 13 17:23:10 homer kernel: [ 709.300719] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 000017e4, Data 80000585, ErrorCode 0000000d
Mar 13 17:23:10 homer kernel: [ 709.342937] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00000fc4, Data 2000ffff, ErrorCode 0000000c
Mar 13 17:23:11 homer kernel: [ 710.443406] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 01000e30, ErrorCode 0000000c
Mar 13 17:23:11 homer kernel: [ 710.876120] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00002380, Data 01004100, ErrorCode 0000000c
Mar 13 17:23:12 homer kernel: [ 711.715389] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000196c, Data 20000000, ErrorCode 0000000c
Mar 13 17:23:14 homer kernel: [ 714.058767] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000161c, Data 200505f2, ErrorCode 0000000c
Mar 13 17:23:16 homer kernel: [ 715.238097] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000204c, Data 20010811, ErrorCode 0000000c
Mar 13 17:23:17 homer kernel: [ 716.948795] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00000108, Data 20010d35, ErrorCode 0000000b
Mar 13 17:23:17 homer kernel: [ 717.071082] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00000f64, Data 21000100, ErrorCode 00000004
Mar 13 17:23:18 homer kernel: [ 717.532508] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Mar 13 17:23:18 homer kernel: [ 717.532515] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Mar 13 17:23:18 homer kernel: [ 717.532519] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Mar 13 17:23:18 homer kernel: [ 717.532662] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 00001b0c, Data 1000f010
Mar 13 17:23:18 homer kernel: [ 717.532886] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000023 intr 02000000
Mar 13 17:23:19 homer kernel: [ 718.895239] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 20000020, ErrorCode 0000000c
Mar 13 17:23:20 homer kernel: [ 720.138353] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 20000020, ErrorCode 0000000c
Mar 13 17:23:21 homer kernel: [ 720.258999] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00001ce0, Data 20000000, ErrorCode 0000000c
Mar 13 17:23:22 homer kernel: [ 722.166392] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x405848=0x80000000
Mar 13 17:23:22 homer kernel: [ 722.166412] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Shader Program Header 18 Error
Mar 13 17:23:22 homer kernel: [ 722.166418] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x405840=0x82040000
Mar 13 17:23:22 homer kernel: [ 722.166557] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0023, Class 0000b197, Offset 00002390, Data 00000000
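
Since the Xid 69 lines dominate this run, a small Python sketch for tallying their ErrorCode values may make patterns easier to spot. The field layout is inferred purely from the lines pasted above, treating ErrorCode as hexadecimal is an assumption based on the digits seen, and tally_error_codes is a hypothetical helper name.

```python
import re
from collections import Counter

# Field layout inferred from the Xid 69 "Class Error" lines in this thread.
CLASS_ERR_RE = re.compile(
    r"Xid \([^)]*\): 69, Class Error: ChId (?P<chid>[0-9a-fA-F]+), "
    r"Class (?P<cls>[0-9a-fA-F]+), Offset (?P<offset>[0-9a-fA-F]+), "
    r"Data (?P<data>[0-9a-fA-F]+), ErrorCode (?P<code>[0-9a-fA-F]+)"
)


def tally_error_codes(lines):
    """Count Xid 69 ErrorCode values (parsed as hex) in an iterable of log lines."""
    codes = Counter()
    for line in lines:
        m = CLASS_ERR_RE.search(line)
        if m:
            codes[int(m.group("code"), 16)] += 1
    return codes


# Three lines copied from the log above.
sample = [
    "Mar 13 17:23:10 homer kernel: [ 709.300719] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 000017e4, Data 80000585, ErrorCode 0000000d",
    "Mar 13 17:23:10 homer kernel: [ 709.342937] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00000fc4, Data 2000ffff, ErrorCode 0000000c",
    "Mar 13 17:23:11 homer kernel: [ 710.443406] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 0000238c, Data 01000e30, ErrorCode 0000000c",
]

for code, n in sorted(tally_error_codes(sample).items()):
    print(f"ErrorCode 0x{code:x}: {n}")
```

This only summarizes what the driver logged; it does not say what the individual error codes mean.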