The system reboots when the 'nvidia-smi' command is entered

Specs:

  • H/W : Dell PowerEdge R740 (server) + NVIDIA Quadro RTX 5000 (GPU)
  • OS : CentOS 7.5
  • Driver version : 430.34 (Linux 64-bit)

Hello,

Recently, whenever I enter the ‘nvidia-smi’ command, the server hangs for about 20 seconds and then reboots.
The service had been running fine for 3 months, but the problem started after I rebooted the server for maintenance.

Here is what I have checked so far:

  1. NVIDIA driver
    1] The driver is loaded and visible via ‘lsmod | grep nvidia’, ‘lshw -class display’, and ‘cat /proc/driver/nvidia/version’
    2] Reinstalled the same driver version (ran the installer with --uninstall, then installed again)
    3] Upgraded the driver (430.34 -> 440.100)
    –> Same issue after each of these

  2. OS logs
    1] /var/log/messages
    2] /var/log/dmesg
    3] Collected a report with nvidia-bug-report.sh
    –> No related log entries before or after entering the ‘nvidia-smi’ command, and no boot-related errors

  3. Hardware diagnostic LEDs are normal (no LED alarm)
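The driver and log checks in items 1 and 2 could be repeated in one pass with a small script, for example right after the next reboot. This is only a rough sketch, assuming Python 3.6+ is available (CentOS 7 ships Python 2.7 by default, so python3 may need to be installed). The helper names are hypothetical, and the search pattern ("NVRM", "Xid", "nvidia") is an assumption about which lines are worth looking at. It reads /proc/modules (the same data that ‘lsmod’ shows), /proc/driver/nvidia/version, /var/log/messages and /var/log/dmesg:

```python
import re

# Assumed pattern for NVIDIA-related log lines: "NVRM" is the prefix used by
# the NVIDIA kernel module, "Xid" lines report GPU errors. Adjust as needed.
NVIDIA_PATTERN = re.compile(r"NVRM|Xid|nvidia", re.IGNORECASE)


def read_text(path):
    """Read a file as text, returning '' if it is missing or unreadable."""
    try:
        with open(path, errors="replace") as f:
            return f.read()
    except OSError:
        return ""


def loaded_nvidia_modules(proc_modules_text):
    """Return loaded kernel module names starting with 'nvidia'
    (the same information as `lsmod | grep nvidia`)."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


def nvidia_log_lines(log_lines):
    """Return the log lines that match NVIDIA_PATTERN."""
    return [line.rstrip("\n") for line in log_lines if NVIDIA_PATTERN.search(line)]


if __name__ == "__main__":
    version = read_text("/proc/driver/nvidia/version").strip()
    print("Driver version:", version or "not found")
    print("Loaded modules:", loaded_nvidia_modules(read_text("/proc/modules")) or "none")
    for log_path in ("/var/log/messages", "/var/log/dmesg"):
        matches = nvidia_log_lines(read_text(log_path).splitlines())
        print(f"{log_path}: {len(matches)} NVIDIA-related line(s)")
        for line in matches[-20:]:  # show only the 20 most recent matches
            print("   ", line)
```

Because the reboot is abrupt, the last kernel messages before it may never reach the log files. An empty result here therefore does not rule out a driver or hardware error.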

Nothing unusual found TT_TT…

I’m planning to replace the GPU with a spare card. Is there anything else I should check before doing that?
I’ll wait for your answers!! Thank you