vGPU of Tesla T4 not seen on ESX 6.7

Hi

Ok, great. So you’re running the correct vGPU driver and you’re running the correct vSphere licensing. There’s no need to change anything there again. Forget about them and look at other areas.

The BIOS is a really important step. I’m not saying it’s the issue, but it could certainly cause an issue and does need to be set correctly if it isn’t already. Do you not have any out-of-band management for the server hardware? Is there no way to get to the BIOS remotely on server boot? How do you power it on remotely? Is there no remote management console to watch its progress?

Regarding vCenter, first I want you to double check something …

1: Log into vCenter as administrator.
2: Locate the vSphere Host that has the T4 installed and select it.
3: In the centre, click Configure, under Hardware select PCI Devices.
4: In the tab that says Passthrough-enabled devices, make sure the T4 is not listed here.
5: If it is listed here, it must be removed. Click All PCI devices and Configure Passthrough. Scroll down and unselect the T4 (there are multiple T4 options in here, none of them should be selected).
6: Reboot the Host afterwards.
7: Once rebooted, using the above steps ensure that the T4 is no longer showing in the Passthrough-enabled devices.

Now follow these steps:

1: Log into vCenter as administrator (if you aren’t already).
2: Locate the vSphere Host that has the T4 installed and select it.
3: In the centre, click Configure, under Hardware select Graphics.
4: You now have 2 Tabs (Graphics Devices and Host Graphics), select Host Graphics.
5: Select the Edit tab to the right.
6: In the window that opens there are 2 sets of 2 options. At the moment you’re only interested in the top ones (Shared and Shared Direct). You must select Shared Direct.
7: Reboot the vSphere Host. Once it comes back up, using the steps above make sure that Shared Direct has been accepted and is still set.
8: Connect to the Host using SSH and run nvidia-smi vgpu. If vGPU is available, it will list the T4.
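If you would rather confirm the Host Graphics setting from the SSH session as well, here is a minimal sketch of how the check could be scripted. It assumes (this is not stated in the thread) that `esxcli graphics host get` prints a line of the form "Default Graphics Type: ..." and that the CLI name for the UI's "Shared Direct" option is `SharedPassthru`. The function names and the sample text are made up for illustration.

```python
from typing import Optional

# Assumption: SharedPassthru is how the CLI reports the UI's "Shared Direct".
SHARED_DIRECT_CLI_NAME = "SharedPassthru"


def default_graphics_type(esxcli_output: str) -> Optional[str]:
    """Return the value of the 'Default Graphics Type' line, or None if absent."""
    for line in esxcli_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "default graphics type":
            return value.strip()
    return None


def is_shared_direct(esxcli_output: str) -> bool:
    """True when the host reports the Shared Direct (SharedPassthru) type."""
    return default_graphics_type(esxcli_output) == SHARED_DIRECT_CLI_NAME


# Hypothetical sample of what a host still set to "Shared" might print.
SAMPLE_OUTPUT = """\
   Default Graphics Type: Shared
   Shared Passthru Assignment Policy: Performance
"""

if __name__ == "__main__":
    if is_shared_direct(SAMPLE_OUTPUT):
        print("Host Graphics is set to Shared Direct.")
    else:
        print("Host Graphics is NOT set to Shared Direct; repeat steps 3 to 7.")
```

Paste the real command output in place of `SAMPLE_OUTPUT`. If the key name differs on your build, adjust the string being matched.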

Try that and see how you get on

Regards

MG

Hi MG,

This project is at the POC stage, so unfortunately I don’t have IPMI (Supermicro) or iDRAC/iLO (Dell/HP) to reach the BIOS remotely.

As for the other point, I don’t have vCenter (I won’t be able to install it on my Windows 10 machine), so I can only use the hypervisor web UI, and in that interface I couldn’t find a way to configure the hardware PCI devices. (That is what I meant when I mentioned Shared Direct; I’m not sure I actually activated it afterwards…)

Is there a way to do this using the web UI? And if not, how can I install vCenter on my Windows 10 machine?

Thanks
Regards

John

Apparently, the vSphere Client is no longer available for ESX versions later than 6.0, and I’m using version 6.7…

Only the web UI version…

Hi

You need vCenter for vGPU.

I don’t mean any offence, but it sounds like you’ve not used VMware before, or at the very least are not familiar with it or its components. With that in mind, as you’ve mentioned this is urgent, I would strongly advise that you forget about using vGPU and just use the T4 in Passthrough to a single RDSH VM and give all users access to that. That’s the quickest way to build a usable VM and give multiple users a platform to work from (if that is your objective). You can get vCenter installed and look at vGPU after that’s done when your users are working.

Passthrough doesn’t need vCenter and you can configure that directly on the vSphere Host. If this is an acceptable alternative, then reverse the steps I mentioned above about checking for Passthrough-enabled devices (that is, enable Passthrough for the T4 rather than removing it). Please note that the T4 will still require a license in Passthrough, so make sure you have the NVIDIA License Server up and running.

If you still want to go down the vGPU route, then you will need vCenter. In which case, deploy it on another physical server in your environment. Download the vCenter .iso; this will then allow you to deploy the vCenter Server Appliance (VCSA) to your other server, but you’ll need to know how to configure it.

As you’re unfamiliar with VMware and you’ve stated this is a priority, I would advise you just run the T4 in Passthrough and go down that route for the time being, unless there is a specific reason for wanting vGPU in the current situation.

Regards

MG

Hi MG,

To be perfectly clear, the POC was already done, but using a KVM hypervisor. Before deploying it to production, the company IT team asked us to do the same with VMware. We have to create a Windows 10 VM using vGPU and hand the result to the IT team, who will configure it for security group compliance ahead of the next deployment (we are planning to deploy 15 VMs on these 3 ESX servers). As you say, I’m not familiar with VMware (and actually our ESX experts have only a little knowledge of GRID and vGPU, unfortunately…).
But I don’t want to give up on this project. The goal is to give our colleagues the opportunity to start specific applications (gas industry) and to use a kind of VNC (DCV, actually) to see the screen on their computers.

I will try the vCenter way… Many thanks for your advice.

John

Hi, I already advised you to check your vCenter settings in my first post. If you are not familiar with VMware, it is very hard to give you proper advice.

Hi

The reason I suggested using Passthrough for now is that, as you’re unfamiliar with VMware (which is no problem at all, there are plenty of technologies I’m unfamiliar with), it will be the quickest way to give your users access to a GPU-accelerated desktop and allow them to work, albeit from the same RDSH VM. While they’re working on that RDSH VM, you can look at how to install and set up vCenter and, in the background, build the Windows 10 VMs that you’ll migrate them to, rather than give them nothing in the short term until you’ve resolved this issue.

This approach is the easiest and quickest way to bring up service :-)

When you say "gas industry", which applications are your users running? Petrel? Kingdom? …

Regards

MG

Petrel, Techlog, eclipse, gocad … (Petrel is very graphics-intensive!)

Hi

Very interesting …

Depending on how your testing goes, you may want to seriously consider looking at other GPUs. Although it may work, the T4 isn’t really suited to seismic interpretation or eclipse runs, and certainly not with multiple users running on it.

Personally, I’ve found Petrel to be CPU limited, so it’s worth making sure you have plenty of high speed CPU Cores for each VM as well, but I guess this depends on your workflow and which modules you’re using.

The minimum GPU I’d be looking at for this type of workload would be a P40, then either an RTX 6000 or 8000 if you need more performance than that. If you’re doing more computational processing than 3D, then a V100 / V100S may be a better option than an RTX (although a V100 / V100S will do a great job with 3D as well if needed, and an RTX will do a great job with computational work, but they do have their more specific use cases). Obviously, once vCenter is sorted and you’re able to use vGPU, you can put multiple users on all of those so they’re not a 1:1 relationship.

How are you testing the performance? Seagull?

Regards

MG

Hi all,

Thanks for the advice, MG!

News:
I deployed vCenter on my ESX-1 in a Windows VM, and I’m now able to configure Shared Direct and access the vGPU from my VMs (!)

Many thanks to both of you!
All the Best
John

Hi.

I’m running ESX 6.7 on a QuantaGrid D52G-4U, which is compatible with the Tesla T4 installed in it.
I tried all the steps mentioned in the thread, but still can’t make it work properly.

Errors that I have:

Passthrough device 'pciPassthru0' vGPU 'grid_t4-1a' disallowed by vmkernel: Failure.
PCIPassthru: 3242: IOMMU support is not enabled

The PCI Devices menu item is also missing from the ESX host’s Configure tab in vCenter.

Do you have any idea what could be wrong?

Hi

Not sure what you plan on using the 1A profile for, but have you enabled SRIOV and Above 4G decoding in the BIOS?

Regards

MG

Hi.

Yes, SRIOV and 4G decoding are enabled.

You need to enable IOMMU (AMD) or VT-d (Intel) in the BIOS; SRIOV is for something else.
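For anyone hitting the same two log lines quoted above, here is a minimal sketch that scans log text for them and prints a hint. The IOMMU hint restates the advice in this thread. The hint for the "disallowed by vmkernel" line is only my reading of it, and `HINTS` and `diagnose` are names invented for the example.

```python
from typing import List, Tuple

# Message fragments copied from the errors quoted in this thread.
HINTS = {
    "IOMMU support is not enabled":
        "Enable VT-d (Intel) or IOMMU (AMD) in the BIOS, then reboot the host.",
    # Assumption: this line is a symptom, and the real cause is logged nearby.
    "disallowed by vmkernel":
        "vmkernel refused the vGPU device; check the surrounding log lines for the cause.",
}


def diagnose(log_text: str) -> List[Tuple[str, str]]:
    """Return (log line, hint) pairs for every line containing a known fragment."""
    findings = []
    for line in log_text.splitlines():
        for fragment, hint in HINTS.items():
            if fragment in line:
                findings.append((line.strip(), hint))
    return findings


THREAD_LOG = """\
Passthrough device 'pciPassthru0' vGPU 'grid_t4-1a' disallowed by vmkernel: Failure.
PCIPassthru: 3242: IOMMU support is not enabled
"""

if __name__ == "__main__":
    for line, hint in diagnose(THREAD_LOG):
        print(line)
        print("  hint:", hint)
```

The table is deliberately tiny; add further fragments as you confirm what they mean on your own hosts.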