$ dmesg -T
[Tue Apr 1 16:39:18 2025] NVRM: GPU at PCI:0000:b1:00: GPU-40f90754-6e68-502b-4a05-f1f7df8d092b
[Tue Apr 1 16:39:18 2025] NVRM: GPU Board Serial Number: 1324323012725
[Tue Apr 1 16:39:18 2025] NVRM: Xid (PCI:0000:b1:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
[Tue Apr 1 16:39:18 2025] NVRM: GPU 0000:b1:00.0: GPU has fallen off the bus.
[Tue Apr 1 16:39:18 2025] NVRM: GPU 0000:b1:00.0: GPU serial number is 1324323012725.
[Tue Apr 1 16:39:18 2025] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
$ nvidia-debugdump --list
Found 2 NVIDIA devices
Device ID: 0
Device name: NVIDIA RTX A6000
GPU internal ID: 1324223019966
Error: nvmlDeviceGetHandleByIndex(): Unknown Error
FAILED to get details on GPU (0x1): Unknown Error
$ lsmod | grep nvidia
nvidia_uvm 1323008 14
nvidia_drm 65536 0
nvidia_modeset 1298432 1 nvidia_drm
nvidia 56778752 908 nvidia_uvm,nvidia_modeset
drm_kms_helper 184320 4 ast,nvidia_drm
drm 495616 7 drm_kms_helper,drm_vram_helper,ast,nvidia,nvidia_drm,ttm
I have reinstalled the driver (the version is now 535.230.02) and tried moving the graphics card to a different slot, but the problem still occurs.
nvidia-bug-report.log.gz (742.4 KB)
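To check whether the same Xid comes back after each change (driver reinstall, slot swap), the Xid lines can be pulled out of the kernel log. This is only a minimal sketch, assuming the `NVRM: Xid (PCI:...): <code>` message format shown in the dmesg output above; `xid_events` is a hypothetical helper name, not part of the NVIDIA tools.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<PCI address> <Xid code>" for every NVRM Xid message found.
xid_events() {
  awk 'match($0, /NVRM: Xid \(PCI:[^)]*\): [0-9]+/) {
    s = substr($0, RSTART, RLENGTH)     # e.g. "NVRM: Xid (PCI:0000:b1:00): 79"
    sub(/^NVRM: Xid \(PCI:/, "", s)     # drop the prefix before the PCI address
    sub(/\): /, " ", s)                 # separate the address from the Xid code
    print s
  }'
}
```

Usage would be `dmesg -T | xid_events` (reading the kernel log may require root on some systems).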