Dell R730 with M60 - nvidia-smi throwing power error

Hello,

Three brand-new Dell R730s with an M60 in each (factory shipped), each of them throwing the same error with nvidia-smi:

Unable to determine the device handle for GPU 0000:05:00.0: Unable to communicate with GPU because it is insufficiently powered.
This may be because not all required external power cables are attached, or the attached cables are not seated properly.
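
For anyone wanting to double-check, nvidia-smi's power query should print per-GPU draw and limit figures on a correctly cabled board instead of this error:

nvidia-smi -q -d POWER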

Running the latest ESXi build (3825889); applied the Dell-recommended BIOS settings as per http://www.nvidia.com/content/grid/pdf/grid-vgpu-deployment-guide.pdf

Installed the latest GRID driver (361.45.09-1OEM.600.0.0.2494585) from the licensing portal.
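
For anyone repeating this, the usual VIB routine from the ESXi shell is as follows (host in maintenance mode; the datastore path and VIB file name below are placeholders, use whatever the portal delivered):

esxcli software vib install -v /vmfs/volumes/datastore1/NVIDIA-vGPU-host-driver.vib
reboot
esxcli software vib list | grep -i nvidia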

Ran gpumodeswitch on all hosts to switch all GPUs to graphics mode, and confirmed with lspci -n | grep 10de:
0000:05:00.0 Class 0300: 10de:13f2 [vmgfx0]
0000:06:00.0 Class 0300: 10de:13f2 [vmgfx1]
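
For reference, the mode switch is the documented pair of gpumodeswitch commands, run with the host in maintenance mode; the second one just reports the resulting mode:

gpumodeswitch --gpumode graphics
gpumodeswitch --listgpumodes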

A VM with vGPU starts on one host but not on another:

Failed to start the virtual machine.
Module DevicePowerOn power on failed.
Could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vGPU 'grid_m60-0b'.
No graphics device is available for vGPU 'grid_m60-0b'.
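
A quick way to confirm which hosts are affected before going on site is to run nvidia-smi on each host and grep the VM's log for the plugin message (the datastore and VM directory below are placeholders):

nvidia-smi
grep -i libnvidia-vgx /vmfs/volumes/datastore1/my-vgpu-vm/vmware.log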

Maybe it's just a wiring issue; I'll have to check that tomorrow when I go on site.

Thanks

Check they’re cabled correctly.

The M60s should have a 300W power connector, not a standard 8-pin PCIe cable. I'm not sure whether Dell has a specific cable, or whether they use the adapter cable that takes the feed from 2x PCIe cables to deliver 300W.

That, or underpowered PSUs, would be the likely cause.

Jason is right: the Dell R720 and R730 require a GPU enablement kit, including power cables: https://qrl.dell.com/Files/en-us/Html/Manuals/R730/GPU%20Card%20Installation%20Guidelines=GUID-C3605F65-C4AE-4BEB-9A32-907A90753B81=1=en-us=.html

I seem to recall it was something called an "8pin to 8pin+6pin", but this is one you need to go back to Dell on: check that you have the right power supply and cables as per the GPU enablement kit.

Hi Jason and Rachel,

Thanks for your replies,

It was a wiring issue. After connecting them as you instructed, we no longer see the nvidia-smi errors and are now able to use vGPU on all hosts.

They came from Dell direct, so it's strange that one host was wired correctly and the others were not.

Thanks,

J. Wirth

Hi @Technicalmt, have you got any feedback on the specifics, or a reference SR with Dell? I think I have one with a similar issue at the moment and am going back and forth with Dell support trying to resolve it. Can you please provide an explanation, or a photo of how the GPUs are cabled in a working config?

Hi Everybody,

I have the same issue with brand-new Dell R730 servers and vSphere 6.0 U2.
Each server has 2x NVIDIA M60 factory installed, but only one of the servers is able to see and use both GPU boards; the others show the message "Unable to communicate with GPU because it is insufficiently powered."
I checked the BIOS and software components and couldn't find any difference. Please help; it's currently not possible for me to check the cables inside the servers.
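
The only comparison I can run remotely is nvidia-smi on each host over SSH (the host names below are placeholders):

ssh root@esxi-01 nvidia-smi
ssh root@esxi-02 nvidia-smi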

Regards,
VM_master

Both power supplies need to be working together (not in redundant mode); I wish I had known this before buying my R730xd. I'm running a K80 on ESXi 6.7 with a 2012 VM. I would get a purple screen of death about 30 minutes after starting the VM, and my vendor explained this power caveat.
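
If you want to check PSU status without opening the chassis, iDRAC's racadm can list the power-supply sensors (assuming you have racadm access; output format varies by iDRAC version):

racadm getsensorinfo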

Thanks for sharing, scp 096.