Hoping somebody can advise or point me in the right direction at least.
I am trying to use 2 M60 cards in compute mode with vSphere 6.5, and I'm having issues with the VMs.
I have installed the gpumodeswitch VIB and successfully changed the mode to compute, and all the tests come back OK.
If I then install the NVIDIA driver VIB, I can't boot the VMs; I get an error saying the device cannot power on.
If I don't install the VIBs or change the GPU mode, I can successfully boot the VM and can then change the mode in Windows, but I can't get any of the CUDA tools to recognise the card.
This is my first experience with the GRID cards, so I would appreciate any advice. My Google searches find plenty of information about using the cards in graphics mode, but I'm struggling a little with compute.
Compute mode is passthrough only with the M60. You don't need a .vib in the hypervisor for it. Just change the mode and attach the GPU to the VM as a normal PCIe passthrough device.
Also, there's a gpumodeswitch.iso that comes with the driver download. Boot your server directly from that and use it to change the GPU mode; it's much easier than installing the gpumodeswitch.vib ;-)
Thanks for the reply. I have tried it without the VIB too and am getting the same problem at the minute.
It seems that as soon as I switch the cards into compute mode, I experience the issues with the machines booting.
So as it stands, I have the 2 cards in the ESXi host, both in compute mode, with no VIBs installed. I make the PCI devices available to the machine and I get the error "Module ‘DevicePowerOn’ power on failure".
So you specifically set both M60s (4 individual GPUs) to Passthrough mode within ESXi (you know how to configure PCIe devices into Passthrough mode in ESXi, yes?) and rebooted the host? Then checked to make sure they’re available as Passthrough devices, then attached all 4 of them to the VM?
It should just work. There's no additional configuration involved other than passing the GPUs through and attaching them to the VM.
I would assume this depends on the server hardware. In compute mode each GPU exposes an 8GB BAR1, and some hardware doesn't work with such a large BAR or needs a BIOS update before it does. You didn't mention the hardware vendor. I have seen this, for example, with Supermicro servers.
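To put rough numbers on the BAR size point, here is a minimal Python sketch of the arithmetic. It assumes the 8GB BAR1 per GPU quoted above and the 4 GPUs from the two M60 cards; the function names are made up for illustration. A BAR that large cannot be mapped inside the 32-bit (below 4GB) address range, so the platform has to place it in 64-bit MMIO space. Whether it does depends on the server firmware (often an option along the lines of "Above 4G Decoding", but the name varies by vendor) and on the VM configuration. On the vSphere side, the settings usually mentioned for this are pciPassthru.use64bitMMIO and pciPassthru.64bitMMIOSizeGB on an EFI-booted VM; treat that as something to verify against the VMware documentation for your version rather than a confirmed fix for this case.

```python
# Back-of-the-envelope check: how much MMIO address space do the GPU BARs need?
# Assumption: 8 GiB BAR1 per GPU in compute mode, as quoted in the reply above.

GIB = 1024 ** 3
BELOW_4G_LIMIT = 4 * GIB  # upper bound of what 32-bit addressing can reach


def total_bar1_bytes(gpu_count, bar1_gib_per_gpu=8):
    """Total BAR1 address space needed for gpu_count GPUs, in bytes."""
    return gpu_count * bar1_gib_per_gpu * GIB


def fits_below_4g(bar_bytes):
    """True only if the region could, in principle, sit in 32-bit space.

    In practice the usable window below 4 GiB is far smaller, because RAM
    and other devices share it, so this is an optimistic upper bound.
    """
    return bar_bytes < BELOW_4G_LIMIT


if __name__ == "__main__":
    for gpus in (1, 4):
        needed = total_bar1_bytes(gpus)
        print(f"{gpus} GPU(s): {needed // GIB} GiB of BAR1, "
              f"fits below 4 GiB: {fits_below_4g(needed)}")
```

Even a single GPU in compute mode is already past the 32-bit limit, which is consistent with the VM failing at power-on as soon as the cards are switched from graphics to compute.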