This is a copy and paste from my original post here https://forums.geforce.com/default/topic/1078396/geforce-drivers/xid-errors-on-gtx-1070-linux/
The problem
What : Screen freezes, Xid codes appear in system logs (kernel, xorg).
Which codes : 31, 13, 69, 32, 12, 56.
When : Always, sooner or later. Occurs faster in Steam games.
Why : Driver, since on Windows I can play the exact same games. Sometimes I get “device lost” or “device hung”.
Who : Galax Geforce GTX 1070 OC Mini 8GB GDDR5 256-Bit, S/N 70NSH6DVO5MN .
Where : GNU/Linux .
How : Left 4 Dead 2 in-game, Metro 2033 Redux in any part of the game, CS:GO in-game (fewer errors). Rarely with Unigine Valley.
How many times : For Steam games : always. For the rest : sporadic.
System info
Platform : Ryzen, chipset X370 .
Driver version : 410.73 x86_64 (.run, ubuntu 16) .
Kernel : 4.18.0-2-amd64 (debian X86_64) .
libs : checked.
cuda : 10.0 .
Personal observations
CPU : I sent my CPU (1800X) in for a warranty replacement, since I thought the issue could be caused by the PCI-Ex controller. My old CPU (batch 21) also had segfault issues. The current one is batch 35.
Chipset : I replaced my motherboard because at first I thought it was an issue with the PCI-Ex slot or something related. I went from an X370 Taichi to an X370GT7.
GPU : seems to work well; tested with CUDA applications, and Unigine Valley runs fine most of the time.
PSU : not an issue, tested on 2 different PSUs, Strike-X 800W Silver and Corsair AX860i Platinum.
Raw output (from dmesg, for example)
[ 4514.185733] NVRM: GPU at PCI:0000:09:00: GPU-3cafb039-8cf0-4c61-20ff-cc44042e1c48
[ 4514.185737] NVRM: GPU Board Serial Number:
[ 4514.185741] NVRM: Xid (PCI:0000:09:00): 31, Ch 00000030, engmask 00000101, intr 10000000
[ 4515.966830] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: MISSING_MACRO_DATA
[ 4515.966838] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404490=0x80000001
[ 4515.966872] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0030, Class 0000c197, Offset 00002390, Data 00000000
[ 5188.266947] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: EXTRA_MACRO_DATA
[ 5188.266958] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404490=0x80000002
[ 5188.267001] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0030, Class 0000c197, Offset 00001618, Data 00000007
[ 5932.412935] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: EXTRA_MACRO_DATA
[ 5932.412944] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404490=0x80000002
[ 5932.412979] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0030, Class 0000c197, Offset 00002390, Data 00000310
[ 5948.377628] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0030, Class 0000c197, Offset 00002388, Data 00fcb101, ErrorCode 00000004
[ 6010.341325] warning: process `metro' used the deprecated sysctl system call with 10.1.
[ 6013.618188] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000043 intr 00040000
[ 6016.037934] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: MISSING_MACRO_DATA
[ 6016.037943] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404490=0x80000001
[ 6016.037978] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 0043, Class 0000c197, Offset 0000342c, Data 00000001
[ 6024.162134] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000043 intr 00040000
[ 6028.145277] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000043 intr 00040000
[ 6033.169969] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000043 intr 00040000
[ 6040.681824] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000040 intr 00040000
[ 6045.169454] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0040, Class 0000c197, Offset 00000754, Data 00000000, ErrorCode 00000004
[ 6046.481502] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0040, Class 0000c197, Offset 00002388, Data 03ddbc01, ErrorCode 00000004
[ 6046.674237] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0040, Class 0000c197, Offset 00002380, Data 00000202, ErrorCode 00000004
[ 6046.985544] NVRM: Xid (PCI:0000:09:00): 12, Ch 00000040 Cl 0000c197 Off 00001928 Data 00000001
[ 6084.260382] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000040 intr 00040000
[ 6113.019498] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0040, Class 0000c197, Offset 0000238c, Data 00000002, ErrorCode 00000004
[ 6130.407651] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000040 intr 00040000
[ 6130.407786] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000040 intr 00040000
[ 6147.743023] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000040 intr 00040000
[ 6147.743158] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000040 intr 00040000
[ 6148.279135] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 0040, Class 0000c197, Offset 00002040, Data 00016530, ErrorCode 0000000c
[ 6172.322880] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000040 intr 00040000
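To tally which Xid codes show up in a log like the one above, a small Python sketch could look like the following. It is not an NVIDIA tool: the function names and the regular expression are my own, written only against the line format seen in this dmesg output, and the short descriptions are paraphrased from NVIDIA's public Xid error listing, so treat them as approximate. Note that it counts log lines, and a single Xid 13 is logged as three lines.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 4514.185741] NVRM: Xid (PCI:0000:09:00): 31, Ch 00000030, ...
XID_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid "
    r"\((?P<dev>PCI:[0-9a-fA-F:.]+)\): (?P<code>\d+),"
)

# Approximate descriptions (assumption: paraphrased from NVIDIA's Xid listing).
XID_HINTS = {
    12: "driver error handling GPU exception",
    13: "graphics engine exception",
    31: "GPU memory page fault",
    32: "invalid or corrupted push buffer stream",
    56: "display engine error",
    69: "graphics engine class error",
}


def parse_xid_events(text):
    """Return a list of (timestamp_seconds, pci_device, xid_code) tuples."""
    events = []
    for line in text.splitlines():
        match = XID_LINE.search(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("dev"), int(match.group("code")))
            )
    return events


def count_xid_codes(events):
    """Count how many log lines were seen for each Xid code."""
    return Counter(code for _, _, code in events)


# A few lines copied from the log above, including one non-Xid line.
SAMPLE = """\
[ 4514.185741] NVRM: Xid (PCI:0000:09:00): 31, Ch 00000030, engmask 00000101, intr 10000000
[ 4515.966830] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: MISSING_MACRO_DATA
[ 4515.966838] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404490=0x80000001
[ 6010.341325] warning: process `metro' used the deprecated sysctl system call with 10.1.
[ 6013.618188] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000043 intr 00040000
"""

if __name__ == "__main__":
    for code, lines in sorted(count_xid_codes(parse_xid_events(SAMPLE)).items()):
        print(f"Xid {code}: {lines} line(s), {XID_HINTS.get(code, 'unknown')}")
```

To run it on a real log, replace SAMPLE with the saved dmesg text.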
Adding more info:
IOMMU
Enabling IOMMU is not a solution.
“Left 4 Dead 2” : it delays some of the Xid codes.
“Metro 2033 Redux” : it makes the Xid codes appear much sooner.
RAM timings
Running at 1600MHz, 2133MHz or 2667MHz, with either XMP or JEDEC timings, results in the same Xid codes.
The memory chips were already tested and sent in under warranty. They found no problems and returned the chips.
Everything leads me to believe that it is a driver issue.
More raw outputs
IOMMU disabled, 2 different attempts
[ 6693.145473] NVRM: GPU at PCI:0000:09:00: GPU-3cafb039-8cf0-4c61-20ff-cc44042e1c48
[ 6693.145481] NVRM: GPU Board Serial Number:
[ 6693.145487] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 003b, Class 0000c197, Offset 00001614, Data 00000000, ErrorCode 0000000d
[ 6696.212796] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 6719.437984] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 6742.542683] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 7922.685307] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7922.892342] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7923.099539] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7923.475077] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7923.794759] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7928.012294] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7929.984515] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7934.455201] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7938.531087] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
IOMMU enabled. Note the wider intervals between failures within the same attempt (the first attempt was on L4D2, up to timestamp 1705)
[ 663.479114] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000028 intr 00040000
[ 677.610775] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 716.549281] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000028 intr 00040000
[ 718.634529] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 741.763267] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 764.875447] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 784.997477] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 1503.091368] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 002b, Class 0000c197, Offset 00000214, Data 00001011, ErrorCode 00000004
[ 1505.156677] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 1525.240677] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 1548.308032] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 1636.692415] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 002b, Class 0000a140, Offset 000001b0, Data 00001001, ErrorCode 00000053
[ 1638.749038] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 1679.857352] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: EXTRA_MACRO_DATA
[ 1679.857362] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ESR 0x404490=0x80000002
[ 1679.857418] NVRM: Xid (PCI:0000:09:00): 13, Graphics Exception: ChID 002b, Class 0000c197, Offset 00002380, Data 00007000
[ 1681.927364] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 1705.014863] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 1972.628439] NVRM: Xid (PCI:0000:09:00): 69, Class Error: ChId 002b, Class 0000c197, Offset 00001538, Data 00000002, ErrorCode 0000000c
[ 1974.730907] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 1997.827641] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 2020.910600] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
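To put numbers on the intervals between failures, the gaps between consecutive Xid timestamps can be computed from the dmesg text. The Python sketch below is only an illustration: xid_gaps and its min_gap threshold are hypothetical names of mine, and the 1-second threshold for merging lines into one burst is an arbitrary choice, not something taken from the driver.

```python
import re
from statistics import median

# Captures the dmesg timestamp of every NVRM Xid line.
XID_STAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s+NVRM: Xid \([^)]*\): \d+,")


def xid_gaps(text, min_gap=1.0):
    """Return the gaps in seconds between consecutive Xid lines.

    Gaps shorter than min_gap are dropped, because lines that close
    together usually belong to the same burst (for example the three
    lines logged for one Xid 13).
    """
    stamps = [float(m.group(1)) for m in XID_STAMP.finditer(text)]
    return [cur - prev for prev, cur in zip(stamps, stamps[1:]) if cur - prev >= min_gap]


# Four lines copied from the IOMMU-disabled log.
IOMMU_OFF = """\
[ 7922.685307] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7922.892342] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7923.099539] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
[ 7928.012294] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000038 intr 00040000
"""

# Four lines copied from the IOMMU-enabled log.
IOMMU_ON = """\
[ 663.479114] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000028 intr 00040000
[ 677.610775] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
[ 716.549281] NVRM: Xid (PCI:0000:09:00): 32, Channel ID 00000028 intr 00040000
[ 718.634529] NVRM: Xid (PCI:0000:09:00): 56, CMDre 00000001 00000080 00000000 00000005 00000034
"""

if __name__ == "__main__":
    for label, log in (("IOMMU off", IOMMU_OFF), ("IOMMU on", IOMMU_ON)):
        gaps = xid_gaps(log)
        print(f"{label}: {len(gaps)} gap(s), median {median(gaps):.3f} s")
```

These excerpts are far too short to prove anything; the full logs from each attempt would need to be fed in to make a fair comparison.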
Also, cuda-memcheck reports nothing.