GTX970 + 1070 errors with Ryzen 2600

I played the game multiple times on an Intel-based system but was unable to reproduce the issue.
I will now try on a system with an AMD CPU and will update you accordingly.

Yeah when the card was in my old i7 (first gen) rig it worked fine too.
What games are you using for testing? I’ll see how they go on my setup too.

I am testing Rocket League on an Alienware Area-51 R3 with an AMD Ryzen Threadripper 1950X 16-Core Processor, but I am still not able to reproduce the issue.
I tried kernel versions 5.0.1-050001-generic and 4.20.0-042000-generic, and matched the display settings and VBIOS of the GTX 970 card as well, but no luck.
Do you have any other system and a spare NVIDIA card you can test with? I think this might be related to your specific system.

ElitistPhoenix, did you ever try putting the graphics card into a different slot?

amrits did you have a 2nd gen Ryzen to test on?
I did have a 1070 that I swapped in and it had the same issues.
I don’t have a second system running Linux to swap the 970 into (the 1070 computer runs Win10).
It is a weird bug. The computer is 100% stable on anything other than graphics related tasks.
Also the 970 was stable on the i7 it came out of. It was running the same OS (slightly older kernel). When the 970 was in a w7 computer it ran fine too.

generix yeah I did try both the 970 and 1070 in different slots. Same deal. Thanks for the suggestion though.

I found a system with an AMD Ryzen™ 7 2700X processor; I will test on it and share the results with you.
Also, can you please confirm whether your whole system crashes, or only the game application crashes with the Xid errors?

I attempted a repro on an ASUS PRIME X470-PRO with an AMD Ryzen 7 2700X Eight-Core Processor (2nd gen), NVIDIA driver 418.43, and kernel version 5.0.1-050001-generic, but had no luck reproducing the issue.

Can you please confirm whether you are running any other application in parallel with the game, and what the load on the system is during the crash?
Are there any specific settings applied in the game?
Also let us know the GPU utilization during the crash and whether the card overheats.
Please share a video clip showing the issue.
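
One way to record GPU utilization and temperature in the run-up to a crash is to poll nvidia-smi from a second terminal. The following is only a minimal sketch: it assumes nvidia-smi is on the PATH and that the card reports both fields as plain numbers, and the function names (parse_sample, read_gpu, log_gpu) are made up for illustration.

```python
import subprocess
import time

# Fields requested from nvidia-smi: GPU utilization (%) and core temperature (C).
QUERY = "utilization.gpu,temperature.gpu"


def parse_sample(csv_line):
    """Parse one 'util, temp' line of csv,noheader,nounits output.

    Assumes both fields are numeric; a card that reports a field as
    unsupported would raise ValueError here.
    """
    util, temp = (field.strip() for field in csv_line.split(","))
    return int(util), int(temp)


def read_gpu(index=0):
    """Query one GPU once and return (utilization_percent, temperature_c)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={index}",
            f"--query-gpu={QUERY}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])


def log_gpu(samples=10, interval_s=1.0, index=0):
    """Print a timestamped reading every interval_s seconds."""
    for _ in range(samples):
        util, temp = read_gpu(index)
        print(f"{time.strftime('%H:%M:%S')} util={util}% temp={temp}C", flush=True)
        time.sleep(interval_s)
```

Calling log_gpu(samples=600) while the game runs would give roughly ten minutes of readings; the last few lines before a crash show whether the card was under heavy load or running hot at the time.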

any other application parallel with game
No. Only the Steam UI, which can cause issues if left in the foreground, but for the tests it was minimised. CPU load was basically nothing, 10-20%. GPU load and temp are in the vid below.

specific settings done on game
High performance or high quality both have issues. Current settings which still kill it:

YouTube vid of the issue. I just filmed it with my phone so I didn’t cause any extra load by screen recording.

https://youtu.be/j897VEqOh4I

Last lines of the kernel log you see in the video.

Mar 22 17:23:06 homer kernel: [774525.879713] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000033 intr 02000000
Mar 22 17:23:06 homer kernel: [774526.681429] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Mar 22 17:23:06 homer kernel: [774526.681437] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Mar 22 17:23:06 homer kernel: [774526.681441] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Mar 22 17:23:06 homer kernel: [774526.681581] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0033, Class 0000b197, Offset 00000ff8, Data 04600000
Mar 22 17:23:07 homer kernel: [774527.340759] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Mar 22 17:23:07 homer kernel: [774527.340767] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Mar 22 17:23:07 homer kernel: [774527.340771] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Mar 22 17:23:07 homer kernel: [774527.340911] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0033, Class 0000b197, Offset 00000804, Data 32640000
Mar 22 17:23:07 homer kernel: [774527.341145] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000033 intr 02000000
Mar 22 17:23:08 homer kernel: [774528.337072] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
Mar 22 17:23:08 homer kernel: [774528.337076] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x4041b0=0x0
Mar 22 17:23:08 homer kernel: [774528.337078] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Mar 22 17:23:08 homer kernel: [774528.337185] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0033, Class 0000b197, Offset 00001b0c, Data 1000f010
Mar 22 17:23:08 homer kernel: [774528.337354] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000033 intr 02000000
Mar 22 17:23:11 homer kernel: [774531.062036] NVRM: Xid (PCI:0000:09:00): 12, Ch 00000033 Cl 0000b197 Off 000023d0 Data 3a87f781
Mar 22 17:23:11 homer kernel: [774531.090928] show_signal_msg: 12 callbacks suppressed
Mar 22 17:23:11 homer kernel: [774531.090933] RocketLeague[695]: segfault at bcd8fba0 ip 00007fe0cb5ee3f7 sp 00007fffda8d5c70 error 4 in libc-2.28.so[7fe0cb5aa000+1df000]
Mar 22 17:23:11 homer kernel: [774531.090945] Code: 0e 75 07 eb 1b 41 ff 0e 74 16 49 8d 3e 48 81 ec 80 00 00 00 e8 ea 4c 0e 00 48 81 c4 80 00 00 00 48 89 d0 48 c1 e0 05 4c 01 f8 <48> 8b 50 10 48 83 fa 03 74 37 48 83 fa 04 0f 85 55 ff ff ff 48 8b

Another crash with an error I haven’t seen before

Mar 22 17:47:34 homer kernel: [775994.800150] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00001688, Data 01000000, ErrorCode 0000000c
Mar 22 17:47:34 homer kernel: [775994.844754] RenderingThread[3164]: segfault at ffffffffffffffff ip ffffffffffffffff sp 00007f10dd913918 error 15
Mar 22 17:47:34 homer kernel: [775994.844758] Code: Bad RIP value.
Mar 22 17:47:55 homer kernel: [776015.672434] NVRM: Xid (PCI:0000:09:00): 12, Ch 00000023 Cl 0000b197 Off 00000758 Data 200104b8
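
To see which Xid codes turn up and how often across a long kernel log, the NVRM lines can be tallied with a short script. This is a rough sketch: the regular expression is written only against the line format shown in the logs above, and the names (XID_RE, parse_xid, count_xids) are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel log lines of the form seen above, e.g.
# "... NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ..."
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),\s*(.*)")


def parse_xid(line):
    """Return (pci_device, xid_code, detail) or None if the line has no Xid."""
    match = XID_RE.search(line)
    if match is None:
        return None
    device, code, detail = match.groups()
    return device, int(code), detail


def count_xids(lines):
    """Count how many times each Xid code occurs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_xid(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts


# A few lines copied from the logs in this thread.
SAMPLE = [
    "Mar 22 17:23:06 homer kernel: [774525.879713] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000033 intr 02000000",
    "Mar 22 17:23:06 homer kernel: [774526.681429] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch",
    "Mar 22 17:23:11 homer kernel: [774531.090928] show_signal_msg: 12 callbacks suppressed",
    "Mar 22 17:47:34 homer kernel: [775994.800150] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0023, Class 0000b197, Offset 00001688, Data 01000000, ErrorCode 0000000c",
]

counts = count_xids(SAMPLE)
for code, n in sorted(counts.items()):
    print(f"Xid {code}: {n}")
```

Feeding it the whole kernel log instead of SAMPLE (for example, the lines of a saved journalctl -k dump) gives a per-code tally that is easier to compare between driver versions or hardware changes than the raw log.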

I updated to the new driver, 418.56, though I went back to the PPA to get it. Installing from the NVIDIA bin file was just too painful in this day and age.

Reading through the arch wiki:
https://wiki.archlinux.org/index.php/NVIDIA/Troubleshooting#1._GL_threads

Enabling Force Composition Pipeline, setting one of my monitors as the primary display for the X screen, and adding the following two options to xorg.conf seemed to make it more stable. I managed to get 3 games in on lowest to medium settings, though maybe that was just because it was a fresh boot.

Option         "TripleBuffer" "on"
Option         "AllowIndirectGLXProtocol" "off"

I’ll check in again in a couple of days when the computer has been idling for a while.

Hi,
We tested on a system with an X370 chipset running Ubuntu 18.04 + Kernel 5.0.06 + NV Driver 418.56 + 1070 Card + Dual Monitors and observed no issues.
I switched over to a 970 card to see if I could get a repro, but no luck.

Let us know if you are still facing issues after making the configuration changes in your xorg.conf file.

Hi ElitistPhoenix,

Please confirm whether you are still facing the issue or whether the changes to the xorg.conf file fixed it.

Hi amrits,

Yes, still facing the issues. Sorry for the late reply, I’ve been on holidays.

Could you tell me the exact RAM you’re using in the test systems? I’m thinking about buying 4GB of different RAM and seeing if the issue still happens.

Hi ElitistPhoenix,

We have been using a Corsair LPX 32GB DRAM 3000MHz C15 memory kit in our test system.

Thank you.

I’ll try the Ubuntu 19.04 upgrade in a bit and go with their default kernel and drivers and see if things improve.

Cheapest I can find that same type of ram is a bit over $100usd so I might keep trying some other options.

Hi ElitistPhoenix,

Did you get a chance to test with the Ubuntu upgrade or get new RAM, as per comment #34?
Please give us an update.

Hi amrits,
Thanks for reminding me. Sorry, got busy and forgot. The upgrade didn’t help.
Going to try and borrow some ram from work and see if that helps.

It was f’ing ram. Grabbed two different sticks from work. Tried one and it didn’t boot. Tried the second and it booted and the card was stable and has been for a day or so.
Thanks again everyone for your help.