H100 GPU not being shown with nvidia-smi on Dell server

nvidia-bug-report_09-27-2024.log (8.9 MB)
I’ve uploaded the bug report from the server, which is running Linux. We’ve tried reinstalling the drivers, but that did not help. The GPU shows up in the lspci output and in the bug report, but not in nvidia-smi.

NVRM: Xid (PCI:0000:0d:00): 140, pid='<unknown>', name=<unknown>, An uncorrectable ECC error detected (possible firmware handling failure) DRAM:0, LTC:0, MMU:0, PCIE:0
Does this mean the GPU is damaged?
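
In case it helps anyone going through a similar log, here is a minimal Python sketch for pulling the PCI address and Xid code out of a line like the one above. The regular expression is only an assumption based on that single line, and `parse_xid` is a hypothetical helper name, not part of any NVIDIA tool.

```python
import re

# Assumed layout, taken from the one Xid line quoted in this post:
#   NVRM: Xid (PCI:<address>): <code>, <free-form detail>
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<code>\d+),\s*(?P<detail>.*)"
)


def parse_xid(line):
    """Return the PCI address, Xid code and detail text, or None if no match."""
    match = XID_RE.search(line)
    if match is None:
        return None
    return {
        "pci": match.group("pci"),
        "xid": int(match.group("code")),
        "detail": match.group("detail"),
    }


# The line from the kernel log, copied verbatim.
sample = (
    "NVRM: Xid (PCI:0000:0d:00): 140, pid='<unknown>', name=<unknown>, "
    "An uncorrectable ECC error detected (possible firmware handling failure) "
    "DRAM:0, LTC:0, MMU:0, PCIE:0"
)

event = parse_xid(sample)
print(event["pci"], event["xid"])  # prints: 0000:0d:00 140
```

Running the same function over every line of the bug report would show whether Xid 140 is the only error logged for this card or whether other codes appear alongside it.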