Nvidia-driver 535 bug report

Hi there,

I’m reporting a potential bug in the 535-dkms driver (535.54.03), installed from the cuda-rhel8-x86_64 repo.

[  596.254276] NVRM: GPU at PCI:0000:01:00: GPU-b5c29e20-916a-c151-c378-41b661bbd2b6
[  596.254282] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
[  596.254285] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[  596.254307] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[  601.374178] NVRM: Error in service of callback

I’m on a Dell Precision 5560 with a T1200 Laptop GPU (TU117GL-A).
I tried three different kernels (4.18, 6.1, and 6.4), and the behavior did not change.

This never happened before, and reverting to 530.30.02 solves the problem. Nothing like this occurred with earlier driver versions either.
The CPU clock occasionally gets throttled, but with all previous driver versions this was not followed by the GPU falling off the bus and killing my X session (correlation is not causation; I’m just mentioning it for completeness).

I’ve attached nvidia-bug-report.log.gz.

Let me know if you need any further details.

Thanks,
Fabio

nvidia-bug-report.log.gz (231.2 KB)