[SOLVED] M10 with ESXi 6.5 - vGPU: Device not supported

Hello All,

At the moment we are evaluating the vGPU feature for a customer to get a 3D CAD VDI environment up and running.
But for now I’m stuck on the basic installation/configuration of the NVIDIA driver.

I installed the newest version (NVIDIA-VMware_ESXi_6.5_Host_Driver_384.73-1OEM.650.0.0.4598673.vib) successfully.
Also the output of ‘nvidia-smi’ looks quite good:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.73                 Driver Version: 384.73                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M10           On   | 00000000:0A:00.0 Off |                  N/A |
| N/A   38C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M10           On   | 00000000:0B:00.0 Off |                  N/A |
| N/A   40C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M10           On   | 00000000:0C:00.0 Off |                  N/A |
| N/A   33C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M10           On   | 00000000:0D:00.0 Off |                  N/A |
| N/A   35C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     68391      G   Xorg                                           4MiB |
|    1     68412      G   Xorg                                           4MiB |
|    2     68428      G   Xorg                                           4MiB |
|    3     68446      G   Xorg                                           4MiB |
+-----------------------------------------------------------------------------+

But when I try to start a VM with a vGPU assigned to it, it won’t start.
After some research it seems that the M10 is reported as not supported:

[root@HV04:~] nvidia-smi vgpu
#0, Device not supported
#1, Device not supported
#2, Device not supported
#3, Device not supported
Not supported on the device(s)

So what can I do now? Can somebody please help or give a hint?

Cheers
Benjamin
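
For anyone who wants to check for this state from a script, here is a minimal Python sketch. It assumes the `nvidia-smi vgpu` output format shown in the post above; the function name `unsupported_gpus` is made up for illustration and is not part of any NVIDIA tool. The sample text is copied from the post.

```python
import re
from typing import List

# Matches lines such as "#0, Device not supported" and captures the GPU index.
UNSUPPORTED_LINE = re.compile(r"^#(\d+), Device not supported[ \t]*$", re.MULTILINE)


def unsupported_gpus(vgpu_output: str) -> List[int]:
    """Return the GPU indices that `nvidia-smi vgpu` reported as unsupported."""
    return [int(m.group(1)) for m in UNSUPPORTED_LINE.finditer(vgpu_output)]


# Output of `nvidia-smi vgpu` as pasted in the post above.
sample = """\
#0, Device not supported
#1, Device not supported
#2, Device not supported
#3, Device not supported
Not supported on the device(s)
"""

print(unsupported_gpus(sample))
```

On a host you would feed the function the captured output of `nvidia-smi vgpu` instead of the pasted sample; an empty list would mean no GPU reported this message.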

Hi

Is it a brand new M10?

If yes, have you changed the GPU from "Compute Mode" to "Graphics Mode"?

Regards

Hi,

thanks for your reply.
Yes it is a brand new M10.

I hadn’t checked the GPU mode, because all the documentation mentions this only for the M60 and M6, not for the M10.
But now I’ve tried it, without success:

-----------Begin cli output-----------
[root@HV04:~] gpumodeswitch --listgpumodes

NVIDIA GPU Mode Switch Utility Version 1.23.0
Copyright (C) 2015, NVIDIA Corporation. All Rights Reserved.

ERROR: Read card info failed by using character device based.

[root@HV04:~] gpumodeswitch --gpumode graphics --auto

NVIDIA GPU Mode Switch Utility Version 1.23.0
Copyright (C) 2015, NVIDIA Corporation. All Rights Reserved.

ERROR: Read card info failed by using character device based.
-----------End cli output-----------

Because I had to remove the Host_Driver package to use the gpumodeswitch tool, I reinstalled it afterwards.
Then I tested it BEFORE a reboot:

-----------Begin cli output-----------

[root@HV04:~] nvidia-smi
Thu Oct 12 09:43:54 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.73                 Driver Version: 384.73                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M10           On   | 00000000:0A:00.0 Off |                  N/A |
| N/A   37C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M10           On   | 00000000:0B:00.0 Off |                  N/A |
| N/A   38C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M10           On   | 00000000:0C:00.0 Off |                  N/A |
| N/A   33C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M10           Off  | 00000000:0D:00.0 Off |                  N/A |
| N/A   35C    P8    10W /  53W |     18MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     68925      G   Xorg                                           4MiB |
|    1     68945      G   Xorg                                           4MiB |
|    2     68965      G   Xorg                                           4MiB |
+-----------------------------------------------------------------------------+
[root@HV04:~] nvidia-smi vgpu
#0, Device not supported
#1, Device not supported
#2, Device not supported
Thu Oct 12 09:44:00 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.73                 Driver Version: 384.73                    |
|-------------------------------+--------------------------------+------------+
| GPU  Name                     | Bus-Id                         | GPU-Util   |
|      vGPU ID    Name          | VM ID    VM Name               | vGPU-Util  |
|===============================+================================+============|
|   3  Tesla M10                | 00000000:0D:00.0               |     0%     |
+-------------------------------+--------------------------------+------------+

[root@HV04:~] nvidia-smi vgpu -s
#0, Device not supported
#1, Device not supported
#2, Device not supported
GPU 00000000:0D:00.0
GRID M10-0B
GRID M10-0Q
GRID M10-1A
GRID M10-1B
GRID M10-1Q
GRID M10-2A
GRID M10-2Q
GRID M10-4A
GRID M10-4Q
GRID M10-8A
GRID M10-8Q
-----------End cli output-----------

It seems that one GPU core is now running fine.
But after a reboot everything is back as it was before: all of the GPUs are in the "not supported" state.
I have the impression that it has something to do with the Xorg process.

BR
Benjamin
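
The Xorg suspicion can be checked against the pasted output. The following Python sketch assumes the row formats shown above, compares the GPUs that have an Xorg process with the GPUs reported as unsupported, and uses the hypothetical helper names `gpus_running` and `gpus_unsupported`. It only shows that the two sets coincide in this one capture; it does not prove a cause.

```python
import re
from typing import Set

# Row of the "Processes" table: GPU index, PID, type, process name, memory usage.
PROCESS_ROW = re.compile(r"^\|\s*(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+\d+MiB\s*\|\s*$")
# Line printed by `nvidia-smi vgpu` for a GPU it cannot use.
UNSUPPORTED_LINE = re.compile(r"^#(\d+), Device not supported[ \t]*$", re.MULTILINE)


def gpus_running(process_name: str, smi_output: str) -> Set[int]:
    """Return the GPU indices whose process table rows name `process_name`."""
    gpus = set()
    for line in smi_output.splitlines():
        match = PROCESS_ROW.match(line)
        if match and match.group(4) == process_name:
            gpus.add(int(match.group(1)))
    return gpus


def gpus_unsupported(vgpu_output: str) -> Set[int]:
    """Return the GPU indices reported as 'Device not supported'."""
    return {int(m.group(1)) for m in UNSUPPORTED_LINE.finditer(vgpu_output)}


# Rows copied from the capture taken before the reboot.
smi_sample = """\
| 0 68925 G Xorg 4MiB |
| 1 68945 G Xorg 4MiB |
| 2 68965 G Xorg 4MiB |
"""
vgpu_sample = """\
#0, Device not supported
#1, Device not supported
#2, Device not supported
"""

with_xorg = gpus_running("Xorg", smi_sample)
unsupported = gpus_unsupported(vgpu_sample)
print("GPUs with Xorg:", sorted(with_xorg))
print("GPUs unsupported:", sorted(unsupported))
print("Same set:", with_xorg == unsupported)
```

In this capture GPU 3 is the only one without an Xorg process and the only one that lists vGPU types, which is consistent with the impression described in the post.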

Just use the .iso and boot the server from that? No need to remove any .vibs then.

So 1 of the GPUs is now working and the other 3 aren’t? … Try running the changemode utility again and verify all 4 GPUs using the utility afterwards.

Regards

Hi,

It does not seem to be related to the GPU mode settings made with the CLI tool "gpumodeswitch".

Based on the results of the last steps I changed my Google search and found quite a helpful post (by user "jmain"):
https://gridforums.nvidia.com/default/topic/1030/nvidia-virtual-gpu-technology/nvidia-vmware-vsphere-6-5/post/3713/#3713

After changing the setting in vCenter (Flash version!), everything is up and running now.

Thanks for your Help!

BR
Benjamin

PS @Nvidia
Please update your documentation (as requested nearly a year ago).

@Ben:

There is no GPUModeSwitch for the M10. It is clearly documented that this is not necessary for the M10, as it is a pure graphics board. BTW, even the Tesla M60 has been delivered in graphics mode for more than a year now, so it shouldn’t be necessary to use GPUModeSwitch at all.

@Benjamin:
Please let me know which documentation you think is not accurate and I will ask the right people to update it.
When I check our quick start guide, for example, it correctly documents that GPUModeSwitch is only valid for the M6 and M60…

Regards

Simon

Yes, I was referring to the quick start guide, and yes, you are right, the information about GPUModeSwitch in there is correct.
But that was neither the problem nor the solution, and it is not what I meant.

Read this post by ‘jmain’ (I already mentioned it before):
https://gridforums.nvidia.com/default/topic/1030/nvidia-virtual-gpu-technology/nvidia-vmware-vsphere-6-5/post/3713/#3713
There he says that, since an update (specifically for ESXi 6.5), you have to change some GPU settings:

So this part should be added to the quick-start-guide.
Hope I expressed myself a little bit better this time (sorry for that, I’m not a native English speaker).

BR
Benjamin

Hi Benjamin,

thanks for your comments and clarification.
I will ask whether this is something that should be in the quick start guide. We have this documented in the user guide, so I don’t think there is a need to add it to the quick start guide as well.

BTW: In case you are not aware of it yet, we have this new documentation page:

Best regards

Simon