GPU faults on brand new RTX A6000 (NV172)

Hi there,
I am experiencing recurrent Xorg hangs with associated GPU faults on a brand new machine (Dell Precision 7875) that came with an RTX A6000 (Ampere). They occur when I run the display at its native resolution of 5120x2160.

Drivers: nvidia 550.135 (the problem is also present on 560.35.03)

The error logs report Xid 32 and 13 errors, and sometimes 11. They look like this:

[46503.401744] NVRM: Xid (PCI:0000:01:00): 11, pid=‘’, name=, Ch 0000000a Cl 0000a140 Off 00001a00 Data 00000000

[46505.255173] NVRM: Xid (PCI:0000:01:00): 13, pid=‘’, name=, Graphics Exception: ILLEGAL_OPCODE
[46505.255243] NVRM: Xid (PCI:0000:01:00): 13, pid=‘’, name=, Graphics Exception: ESR 0x404490=0x80000004
[46505.256150] NVRM: Xid (PCI:0000:01:00): 13, pid=‘’, name=, Graphics Exception: ChID 000a, Class 0000c7c0, Offset 00000000, Data 00000000
[46505.376316] NVRM: Xid (PCI:0000:01:00): 32, pid=‘’, name=, Channel ID 0000000a intr0 00040000
[46505.417705] NVRM: Xid (PCI:0000:01:00): 32, pid=‘’, name=, Channel ID 0000000a intr0 00040000
[46505.539618] NVRM: Xid (PCI:0000:01:00): 32, pid=‘’, name=, Channel ID 0000000a intr1 00000008 HCE_DBG0 00002d00 HCE_DBG1 00000000
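
In case it helps with triage, here is a rough sketch of how the Xid codes in lines like the ones above can be tallied from the kernel log. The function name count_xids is made up for this post, and the pattern only assumes the "NVRM: Xid (PCI:...): <code>," layout visible in the excerpt; other driver versions may format the line differently.

```shell
#!/bin/sh
# Count NVRM Xid events per Xid code from kernel log text on stdin.
# Output: one "<count> <xid>" pair per line, sorted by Xid code.
# count_xids is a hypothetical helper name, not part of any NVIDIA tool.
count_xids() {
    # Keep only the numeric Xid code that follows "NVRM: Xid (<bus id>): "
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' \
        | sort -n \
        | uniq -c \
        | awk '{ print $1, $2 }'
}

# Typical use (reading the kernel log may require root):
#   dmesg | count_xids
```

Running it right after a hang should show which of the three codes (11, 13, 32) dominates for a given trigger.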

The trigger is very easy: concentrated 2D pixel rendering when the screensaver displays fractals :-O

For instance:

xlock -mode strange
xlock -mode ifs

Either command leads to GPU faults after 1 or 2 seconds of displaying fractals.

The bug is most likely not in xlock: on every other computer I’ve tested, it works fine when forced to the same resolution as my machine (xlock -geometry 5120x2160 -mode strange).

If I use a lower resolution for my screen, for instance using xrandr to lower it to 4096x2160, there is no GPU fault. If I force xlock to use a lower resolution than the native one (xlock -geometry 4096x2160 -mode strange), there is no GPU fault either.
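
To narrow down where between 4096 and 5120 pixels of width the fault starts, the same xlock invocation could be stepped through intermediate widths. This is only a sketch: the helper name xlock_width_sweep and the 256-pixel step are arbitrary choices for illustration, and the function just prints the commands rather than running them, so each one can be launched by hand and the kernel log checked in between.

```shell
#!/bin/sh
# Print one xlock command per candidate width, from the known-good
# width (4096) up to the native one (5120). Height stays at 2160.
# xlock_width_sweep and the 256-pixel step are assumptions made for
# this sketch; nothing here is executed against the GPU.
xlock_width_sweep() {
    mode=${1:-strange}    # xlock mode to test, defaults to "strange"
    width=4096
    while [ "$width" -le 5120 ]; do
        printf 'xlock -geometry %sx2160 -mode %s\n' "$width" "$mode"
        width=$((width + 256))
    done
}

# Print the commands for review, then run them one at a time,
# checking dmesg for new Xid lines after each:
#   xlock_width_sweep strange
```

The first width at which Xid lines appear would give a tighter bound on the failing geometry than the two data points above.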

Finally, I’ve updated the system BIOS to the latest version, with no change. I’ve run a full set of Dell diagnostics on the machine, and nothing is reported faulty. I have downloaded the DCGM suite and run the level 4 tests (dcgmi diag -r 4); they all PASS.

What remains is a driver bug or a hardware bug. Either way, I would be very, very happy if my super expensive NVIDIA card could just withstand the graphical rendering of my xlock screensaver…

I am happy to do any test,
Thanks,
cheers,
Chris.

nvidia-bug-report.log.gz (615.2 KB)