Supported Servers with Tesla M60 & ESXi 6.0 (Dell PowerEdge R720xd)

I want to confirm or find a list of supported servers for the Tesla M60 card with ESXi 6.0. I’ve installed the 352.54 VIB on a Dell PowerEdge R720 and am getting the "nvidia-smi has failed because it couldn’t communicate with the nvidia driver. make sure that the latest nvidia driver is installed and running." message. Using vmkload_mod -l, I’ve confirmed that the driver does not appear to be loading either. I’ve checked dmesg but am not seeing much, though I’m unsure what to look for to indicate an error. I’ve also checked whether the card shows up using "lspci | grep -i vga" and other variations, but do not see the card, which is why I suspect that an R720 will not work.
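
For reference, the checks I ran were along these lines (the grep patterns are only examples; as far as I can tell the M60 enumerates as a 3D/display controller rather than a VGA device, so a plain "vga" grep can miss it even when the host sees the card):

lspci | grep -i nvidia                       # look for the card on the PCI bus by vendor name
esxcli hardware pci list | grep -i nvidia    # same check through esxcli
vmkload_mod -l | grep -i nvidia              # confirm whether the nvidia module is loaded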

The driver installs fine, and I’ve gone through the necessary steps of entering maintenance mode, installing, rebooting, and exiting maintenance mode multiple times now, to no avail. I have an R730 that I can try next, but I want to validate that it is worth the effort first.
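
The sequence I have been repeating is roughly the following (the VIB path and filename are illustrative; substitute wherever you copied the 352.54 package):

vim-cmd hostsvc/maintenance_mode_enter
esxcli software vib install -v /tmp/NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver-352.54.vib    # illustrative path/filename
reboot
# after the host comes back up:
vim-cmd hostsvc/maintenance_mode_exit
esxcli software vib list | grep -i nvidia    # the VIB shows as installed, but the module never loads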

I’m hoping it’s a misconfiguration, or that I’ve missed a cable connection in the chassis, as I really would like to get this R720 to work. Please let me know.

You can filter the list of certified servers by card type.

The R720 is not certified by Dell; the R730 is, as long as you have the relevant PSUs, power cables, etc. Best to check with Dell on the specific requirements to retrofit the card.

Jason,

Thank you for that. That is extremely helpful.

Now for the next part: is there a difference in supported R730s, i.e. an R730xd versus an R730? You have listed an R730, and I can get my hands on one of those in the future, but I have an R730xd now that appears to be exhibiting the same behavior. Would some form of logs be useful?

You should check with Dell. It’s possible that it’s simply a BIOS issue or may well not be supported in the xd chassis.

Do you have the enablement kit for the R730, including power cables and the relevant PSUs?

Could you direct me to information on the "enablement kit" ?

You need to speak to Dell.

Most servers don’t ship with the required PSUs, PCIe risers, cables, etc., and some may require modified heatsinks or airflow baffles. Each OEM has a different set of additional components that may be required for a retrofit. In some cases it’s just a power cable; in others it’s a complete set of PSUs, risers, baffles, heatsinks and cables, so what you need to acquire depends on what you already have.

The OEM (in your case Dell) are the best people to ask for the details of what you require.

JRR,

Did you have any success in getting the M60 to work on the R720xd? I have a R720 and I am experiencing the same problem.

Thanks

David

I know that for the R720xd Dell chose not to certify, whereas they did certify the R720 - the R720xd has some extra room for storage, which makes everything else a bit more squashed and affected the thermal cooling, IIRC…

For anyone with a new M60 - I would strongly advise checking that it is in graphics and not compute mode, as per: Having problems with new M6/M60 like VMs fail to power on, NVRM BAR1 error, ECC is enabled, or nvidia-smi fails | NVIDIA
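
If you have shell access to the host (or boot the gpumodeswitch ISO), checking and changing the mode is roughly this, per the gpumodeswitch user guide (run it with no VMs using the GPU, and reboot afterwards):

gpumodeswitch --listgpumodes         # shows whether each GPU is in compute or graphics mode
gpumodeswitch --gpumode graphics     # switches all GPUs to graphics mode (confirm when prompted)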

You might want to search the KB database for other reasons nvidia-smi fails: Incorrect BIOS settings on a server when used with a hypervisor can cause MMIO address issues that result in GRID GPUs failing to be recognized. | NVIDIA
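
It is also worth grepping the host logs for the driver’s own messages - the kernel module logs with an "NVRM" prefix, so something like the following usually surfaces BAR1/MMIO complaints if that is what is going on (log path is the ESXi default):

dmesg | grep -i nvrm
grep -i nvrm /var/log/vmkernel.log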

BUT as you are dealing with a possibly unsupported server, I think, as Jason suggests, you really need to talk to Dell and your hypervisor vendor, as you could be left unsupported even if it works.

Hi Rachel,

I was able to install the gpumodeswitch VIB on my ESXi 6.0 U2 host and successfully change the mode over to graphics. Oddly, gpumodeswitch is able to see the cards. After the mode was changed I installed the software and rebooted the system, but I get nothing when I run "vmkload_mod -l | grep nvidia".
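
In case more detail helps, these are the sorts of checks I can run and send output from (package names are whatever esxcli reports on my host):

esxcli software vib list | grep -i nvidia    # the vGPU Manager VIB is reported as installed
vmkload_mod nvidia                           # attempt to load the module by hand
tail -n 50 /var/log/vmkernel.log             # look for NVRM errors from the failed load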

David

As it is uncertified I think you need to go back to the server OEM and talk to them. I’m afraid with uncertified configurations this can happen.

Hi All,

I have two questions. I was able to install the GRID 3.0 VIB (NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585.vib) into ESXi 6.0 U2 with no issue, and everything came up properly. However, after the installation, the guide said that I should use gpumodeswitch to switch modes.

Interestingly, the instruction in the gpumodeswitch doc said to remove any NVIDIA drivers - which was a bit weird. But I did that.

I tried to install the modeswitch VIB (NVIDIA-GpuModeSwitch-1OEM.600.0.0.2494585.x86_64.vib) and it gave me an InstallationError, saying the VIB does not contain a signature. I lowered the acceptance level to community supported, but still no luck installing it.
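
For reference, this is roughly the sequence I used (the /tmp path is just where I copied the file onto the host):

esxcli software acceptance get               # check the current acceptance level
esxcli software acceptance set --level=CommunitySupported
esxcli software vib install -v /tmp/NVIDIA-GpuModeSwitch-1OEM.600.0.0.2494585.x86_64.vib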

Any thoughts?

Thanks
Segreen

Did you follow the process in the documentation exactly?

  1. Put the ESXi host into maintenance mode.

vim-cmd hostsvc/maintenance_mode_enter

  2. If an NVIDIA driver is already installed on the ESXi host, remove the driver.
    a) Get the name of the VIB package that contains the NVIDIA driver.

esxcli software vib list | grep -i nvidia

    b) Remove the VIB package that contains the NVIDIA driver.

esxcli software vib remove -n NVIDIA-driver-package

NVIDIA-driver-package is the VIB package name that you got in the previous step.

  3. Run the esxcli command to install the VIB.

esxcli software vib install -v /NVIDIA-GpuModeSwitch-1OEM.600.0.0.2494585.x86_64.vib

  4. Take the host out of maintenance mode.

vim-cmd hostsvc/maintenance_mode_exit

  5. Reboot the ESXi host.

There are several versions of the modeswitch utility and the .vib version does require the removal of the vGPU Manager.

I personally would not recommend using the .vib version unless you are unable to use the bootable ISO tool via the host’s remote management software. The reason is that the ISO is a much simpler tool to work with and only requires 2 reboots: one to boot into the ISO and one to switch back to the hypervisor.

Using the .vib you need to (see the command sketch further below):

remove vGPU manager
restart
install mode switch .vib
restart
switch mode
restart
remove modeswitch .vib
restart
install vGPU .vib
restart

That’s 4 extra restarts to allow you to stay within ESXi. I find it so much faster to simply boot to the ISO.
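
For anyone who does go the .vib route anyway, the cycle listed above looks roughly like this (bracket each install/remove with maintenance mode as per the docs, reboot where noted, and confirm the exact package names with "esxcli software vib list" first - NVIDIA-driver-package and NVIDIA-GpuModeSwitch-package below are placeholders, as in the earlier steps):

esxcli software vib list | grep -i nvidia                  # note the installed package names
esxcli software vib remove -n NVIDIA-driver-package        # remove the vGPU Manager, then reboot
esxcli software vib install -v /NVIDIA-GpuModeSwitch-1OEM.600.0.0.2494585.x86_64.vib    # then reboot
gpumodeswitch --gpumode graphics                           # switch the mode, then reboot
esxcli software vib remove -n NVIDIA-GpuModeSwitch-package    # remove the mode switch VIB, then reboot
esxcli software vib install -v /NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585.vib    # reinstall the vGPU Manager, then reboot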