New Ryzen 3950X XID errors / segfault

A couple of weeks ago I upgraded from a Ryzen 2700X to a Ryzen 3950X and have been seeing regular freezes on my Ubuntu setup since. The system has two RTX 2080 Tis and 64 GB of RAM running at 2100 MHz on an X570 board.

Checking the logs reveals the following:

kernel: [82370.884942] NVRM: GPU at PCI:0000:09:00: GPU-5611c32e-db74-0e2b-6dfe-d46e8112e337
kernel: [82370.884947] NVRM: GPU Board Serial Number:
kernel: [82370.884955] NVRM: Xid (PCI:0000:09:00): 61, pid=1562, 0cec(3098) 00000000 00000000
kernel: [82395.164640] GpuWatchdog[6657]: segfault at 0 ip 000055e6d15f5ecd sp 00007fdd7cccf6d0 error 6 in chrome[55e6ccf49000+785a000]
kernel: [82395.164648] Code: 00 79 09 48 8b 7d b0 e8 b1 94 6c fe c7 45 b0 aa aa aa aa 0f ae f0 41 8b 84 24 e0 00 00 00 89 45 b0 48 8d 7d b0 e8 f3 59 ba fb 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e
kernel: [82395.192542] GpuWatchdog[6672]: segfault at 0 ip 0000560b81c25479 sp 00007fe9110c8680 error 6 in slack[560b7e2e0000+5caf000]

Could this be a Ryzen bug? I ran kill-ryzen for a while but did not see any segfaults. Otherwise, could it be the RAM? It has been a bit flaky for me, as I am running two different 32 GB kits at a lower clock speed.

The whole thing only started after I got the Ryzen 3950X, though. It happens mostly in low-compute situations, but I have also observed segfaults while running CUDA Python code.

One more segfault in python/chrome:

[Mo Aug 24 13:49:21 2020] python3[31794]: segfault at 7b660d3048f8 ip 00005637c6fdbe30 sp 00007f6531796510 error 4 in python3.8[5637c6ef5000+206000]
[Mo Aug 24 13:49:21 2020] Code: 00 4c 8b 74 24 08 48 8d 05 8d 45 24 00 4f 8d 3c 76 4e 8b a4 f8 70 01 00 00 49 39 ec 74 78 66 66 2e 0f 1f 84 00 00 00 00 00 90 <49> 8b 5c 24 08 49 8b 54 24 10 83 e3 01 48 8d 3c 95 00 00 00 00 48
[Mo Aug 24 13:49:43 2020] chrome[16973]: segfault at 521c315016c0 ip 0000561c2b974a4c sp 00007ffc38593770 error 4 in chrome[561c29461000+7a53000]
[Mo Aug 24 13:49:43 2020] Code: d0 0f 85 81 00 00 00 48 81 c4 48 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 83 c3 10 49 39 df 74 9f 48 8b 3b <83> 3f 01 75 ef 41 c6 46 20 00 49 8b 46 18 48 8b b0 d0 00 00 00 48
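
For anyone triaging similar logs, here is a minimal Python sketch that pulls the Xid events and segfaults out of kernel log lines shaped like the ones above. The helper names `decode_fault` and `summarize` are made up for illustration, and the regular expressions are assumptions based only on these samples; other kernel or driver versions may format the lines differently. The number after `error` is the x86 page-fault error code, where bit 0 means protection violation (otherwise page not present), bit 1 means write (otherwise read), and bit 2 means user mode (otherwise kernel mode).

```python
import re

# Patterns follow the line shapes in the logs above (an assumption, not a spec).
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+)")
SEGV_RE = re.compile(
    r"(\S+)\[(\d+)\]: segfault at ([0-9a-f]+) ip [0-9a-f]+ sp [0-9a-f]+ "
    r"error (\d+) in (\S+?)\["
)


def decode_fault(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def summarize(lines):
    """Return a list of ("xid", ...) and ("segfault", ...) tuples in log order."""
    events = []
    for line in lines:
        m = XID_RE.search(line)
        if m:
            # (kind, PCI address, Xid number)
            events.append(("xid", m.group(1), int(m.group(2))))
            continue
        m = SEGV_RE.search(line)
        if m:
            proc, _pid, addr, err, binary = m.groups()
            # (kind, thread/process name, faulting address, decoded error, binary)
            events.append(
                ("segfault", proc, int(addr, 16), decode_fault(int(err)), binary)
            )
    return events
```

Feeding it the lines above (for example `summarize(open("/var/log/kern.log"))`, if that is where your distribution writes kernel messages), `error 6` decodes to a user-mode write to an unmapped page and `error 4` to a user-mode read of an unmapped page.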

Have you updated the BIOS to the latest version? If I remember correctly, kill-ryzen was related to the older generation.

You can run memtest86+ and maybe some Prime95 in blend mode to check that the memory and CPU are fine.

This seems to be similar to the issue tracked in the thread below:

We are actively working on it; please refer to the thread above for further updates.