Dear Team,
HW Specification:
Model: PowerEdge R760xa
OS: Debian 11.7
Kernel Version: 5.10.191
BIOS Version: 2.2.7
iDRAC Firmware Version: 7.10.50.00
GPU Controller in Slot 36:
Model: NVIDIA L40S
Firmware Version: 95.02.66.00.02
Driver Version: NVIDIA 535.129.03 / CUDA 12.2
When I run the nvidia-smi command, the server hangs for about 20 seconds and then reboots.
Steps Taken:
- Checked OS logs:
/var/log/messages
/var/log/dmesg
- Collected output from nvidia-bug-report.sh.
No log entries related to the nvidia-smi command, and no server boot errors, were found.
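One way to script this kind of log check is sketched below. It assumes the NVIDIA kernel driver reports GPU faults as "NVRM: Xid" lines in the kernel log, and it scans only the two files listed above. The names find_xid_events and scan_logs are illustrative and not part of any NVIDIA or Dell tool.

```python
import re
from pathlib import Path

# Assumption: GPU faults appear in the kernel log as lines of the form
# "NVRM: Xid (<PCI device>): <code>, ...".
XID_PATTERN = re.compile(r"NVRM: Xid \((?P<device>[^)]*)\): (?P<code>\d+)")


def find_xid_events(lines):
    """Return (device, xid_code, line) tuples for every NVRM Xid line."""
    events = []
    for line in lines:
        match = XID_PATTERN.search(line)
        if match:
            events.append(
                (match.group("device"), int(match.group("code")), line.rstrip())
            )
    return events


def scan_logs(paths):
    """Scan each existing log file and map its path to the Xid events found."""
    results = {}
    for path in map(Path, paths):
        if not path.is_file():
            continue  # skip logs that do not exist on this system
        with path.open(errors="replace") as handle:
            results[str(path)] = find_xid_events(handle)
    return results


if __name__ == "__main__":
    # The two log files checked in the steps above.
    found = scan_logs(["/var/log/messages", "/var/log/dmesg"])
    for path, events in found.items():
        print(f"{path}: {len(events)} Xid event(s)")
        for device, code, line in events:
            print(f"  Xid {code} on {device}: {line}")
```

Reading these files usually requires root. An empty result means only that no Xid line was written to them, which can also happen if the reboot occurs before the kernel log is flushed to disk.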
I also tested this card in another server, and the same problem occurred. However, a spare L40S installed in the original server worked fine, which suggests the fault lies with this particular card. Please advise on how to resolve this issue as soon as possible.