Hi and hello,
we have several XL190 Gen8 servers with Tesla M60 adapters running vSphere 6.5.
The cards have not been in use until now - we are preparing for a PoC.
The adapters were listed in the vSphere client and by nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.106                Driver Version: 367.106                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 0000:89:00.0     Off |                  Off |
| N/A   36C    P8    24W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 0000:8A:00.0     Off |                  Off |
| N/A   31C    P8    24W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
There was no sign that the cards were in compute mode; we also had a VM running and using a card.
Then we updated the driver to this version:
NVIDIA-kepler-VMware_ESXi_6.5_Host_Driver 367.128-1OEM.650.0.0.4598673
What we did was a procedure that worked well in our other datacenter:
- Host -> maintenance
- esxcli software vib remove -n NVIDIA-VMware_ESXi_6.5_Host_Driver
  Removal Result
  Message: Operation finished successfully.
  Reboot Required: false
  VIBs Installed:
  VIBs Removed: NVIDIA_bootbank_NVIDIA-VMware_ESXi_6.5_Host_Driver_367.106-1OEM.650.0.0.4598673
  VIBs Skipped:
- reboot
- installed new driver NVIDIA-kepler-VMware_ESXi_6.5_Host_Driver 367.128-1OEM.650.0.0.4598673
- reboot
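For reference, the steps above can be written as a rough dry-run shell sketch. It only prints the commands unless DRY_RUN=0 is set. Assumptions not taken from the procedure itself: maintenance mode is entered with esxcli (it could equally be done in the client), the new VIB is installed with "esxcli software vib install -v", and VIB_PATH is a hypothetical placeholder for wherever the driver file was uploaded. The helper names (run, driver_swap_phase1, driver_swap_phase2) are made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch of the driver swap described above.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# VIB name removed in the procedure above.
OLD_VIB="NVIDIA-VMware_ESXi_6.5_Host_Driver"
# Hypothetical placeholder: absolute path of the uploaded driver VIB.
VIB_PATH="${VIB_PATH:-/vmfs/volumes/datastore1/new-driver.vib}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Phase 1: maintenance mode, remove the old driver, reboot.
driver_swap_phase1() {
    run esxcli system maintenanceMode set --enable true
    run esxcli software vib remove -n "$OLD_VIB"
    run reboot
}

# Phase 2 (after the host is back up): install the new driver, reboot.
driver_swap_phase2() {
    run esxcli software vib install -v "$VIB_PATH"
    run reboot
}
```

The two phases are separate functions because the reboot in between ends the shell session; each would be called by hand, e.g. driver_swap_phase1 first and driver_swap_phase2 once the host is reachable again.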
But after that:
[root@VI:~] nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
[root@VI:~] esxcli software vib list | grep -i nvidia
NVIDIA-kepler-VMware_ESXi_6.5_Host_Driver 367.128-1OEM.650.0.0.4598673 NVIDIA VMwareAccepted 2018-12-13
In the vSphere client, the "Graphics Adapter" has changed from "NVIDIA Tesla M60" to "GM204GL [Tesla M60]".
[root@VI:~] lspci -n | grep 10de
0000:89:00.0 Class 0300: 10de:13f2
0000:8a:00.0 Class 0300: 10de:13f2
This seems to show that the cards are still in graphics mode, IIRC.
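To make that check explicit, here is a small sketch that reads "lspci -n" output and labels each NVIDIA device by its PCI class code. It assumes, as I understand the gpumodeswitch documentation, that class 0300 (VGA controller) means graphics mode and class 0302 (3D controller) means compute mode. The function name classify_gpu_mode is made up for illustration.

```shell
#!/bin/sh
# Label NVIDIA devices (vendor ID 10de) from `lspci -n` output by class code.
# Assumption: 0300 = graphics mode, 0302 = compute mode; anything else is
# reported as "unknown".
classify_gpu_mode() {
    awk '$2 == "Class" && $4 ~ /^10de:/ {
        cls = $3
        sub(/:$/, "", cls)            # strip the trailing colon from "0300:"
        if (cls == "0300")      mode = "graphics"
        else if (cls == "0302") mode = "compute"
        else                    mode = "unknown"
        print $1, cls, mode
    }'
}
```

On the host this would be used as "lspci -n | classify_gpu_mode"; with the two lines shown above it should report both devices as class 0300, graphics.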
Please help!
Kind regards
ZPPO