GRID 3.0 installs successfully on ESXi 6.0.2 with M60 GPUs but fails verification via nvidia-smi

We have a clean SuperMicro server and installed VMware ESXi 6.0.2 build 360759. We entered Maintenance Mode, installed the NVIDIA host driver following the steps in the latest guide, rebooted, and turned off Maintenance Mode. Once the host came back up, we SSH'd in and verified the install with various commands. Everything checked out except nvidia-smi, which returns: Failed to initialize NVML: Unknown Error

NOTE: The same hardware worked properly in GRID 2.0

Hardware/Software list:

Supermicro Chassis 1028GQ-TRT
Dual Xeon E5-2600 v3 2.60 GHz
256 GiB memory
(4) NVIDIA M60 cards installed and in graphics mode
(2) 480 GB SSD drives
ESXi 6.0.2 build 360759
NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585.vib

Steps taken to install:

• Installed ESXi 6 on a clean system
• Enabled SSH
• No VMs or datastores set up yet
• Set up the server clock via vSphere using NTP (pool.ntp.org)
• Under Configuration/Software/Advanced Settings/VMkernel/Boot, checked "VMkernel.Boot.disableACSCheck" and clicked "OK"
• Entered Maintenance Mode
• Downloaded the GRID software from the NVIDIA License Center, under Recent Product Releases, from this link:
https://nvidia.flexnetoperations.com/control/nvda/viewRecentProductReleases
• Grabbed the April 4th release of GRID 3.0 for vSphere 6.0
• Copied NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585.vib to the /tmp folder
• SSH'd into the server
• Ran: esxcli software vib install -v /tmp/NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585.vib

Result:

Installation Result
Message: Operation finished successfully.
Reboot Required: false
VIBs Installed: NVIDIA_bootbank_NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver_361.40-1OEM.600.0.0.2494585
VIBs Removed:
VIBs Skipped:

• Rebooted the server and turned off Maintenance Mode
• SSH'd into the server and verified the install

Verify Results:

[root@localhost:~] esxcli software vib list | grep -i nvidia

NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver 361.40-1OEM.600.0.0.2494585 NVIDIA VMwareAccepted 2016-05-04

[root@localhost:~] vmkload_mod -l | grep nvidia

nvidia 0 10012

[root@localhost:~] esxcfg-module -l | grep nvidia

nvidia 0 10012

[root@localhost:~] nvidia-smi

Failed to initialize NVML: Unknown Error

No passthrough devices are set up on ESXi; it's just a clean server with no VMs or datastores.

If we go ahead and try to create VMs and add the M60 card via vCenter, none of the vGPU profiles are listed.

We've gone through this several times on 2 other servers with different CPUs but the same software and M60 cards, with exactly the same results.

Please help.

Thanks! Alex

Hi Alex,

I’m afraid I’m not a VMware expert myself. But I’m checking for known issues with the support and product teams. You are entitled to full support with M60 and GRID 3.0 - have you raised a support case yet?

Best wishes,
Rachel

No I haven’t raised a support case. Should I go ahead and create one?

Yes, that would be good. Raise a support case and PM (personal message) me the number, and I'll keep an eye on it. One of our engineers is already looking into this. Please add my name to the ticket so frontline doesn't have to chase info, and I'll fill them in.

Rachel

At the ESXi host CLI please run

lspci -n | grep 10de

then post the result here.

Hi Jason, here are the results

[root@localhost:~] lspci -n | grep 10de
0000:04:00.0 Class 0300: 10de:13f2 [vmgfx6]
0000:05:00.0 Class 0300: 10de:13f2 [vmgfx7]
0000:08:00.0 Class 0300: 10de:13f2 [vmgfx4]
0000:09:00.0 Class 0300: 10de:13f2 [vmgfx5]
0000:83:00.0 Class 0300: 10de:13f2 [vmgfx2]
0000:84:00.0 Class 0300: 10de:13f2 [vmgfx3]
0000:87:00.0 Class 0300: 10de:13f2 [vmgfx0]
0000:88:00.0 Class 0300: 10de:13f2 [vmgfx1]
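As a quick sanity check on a listing like the one above, the 10de:13f2 functions can be counted and converted to boards. This is a minimal sketch assuming two GPUs per Tesla M60 board (so eight functions correspond to four cards) and the `lspci -n` line format shown above; the function name `count_m60` is hypothetical.

```python
# Hypothetical helper: count Tesla M60 GPUs in ESXi `lspci -n` output.
M60_ID = "10de:13f2"   # vendor:device ID reported for each M60 GPU above
GPUS_PER_BOARD = 2     # a Tesla M60 board carries two GPUs


def count_m60(lspci_n_output: str) -> tuple:
    """Return (gpu_functions, boards) for M60 entries in `lspci -n` text."""
    gpus = [
        line.split()[0]                      # PCI address, e.g. "0000:04:00.0"
        for line in lspci_n_output.splitlines()
        if M60_ID in line.split()            # match the ID as a whole token
    ]
    return len(gpus), len(gpus) // GPUS_PER_BOARD


sample = """\
0000:04:00.0 Class 0300: 10de:13f2 [vmgfx6]
0000:05:00.0 Class 0300: 10de:13f2 [vmgfx7]
0000:07:00.0 Class 0200: 8086:1528 [vmnic0]"""
print(count_m60(sample))  # (2, 1)
```

Fed the eight-line listing above it reports (8, 4); fed a six-line listing it reports (6, 3), i.e. one board missing.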

Were the GPUs factory fitted?

These GPUs were provided directly by NVIDIA, as we are an NVIDIA partner, and we installed them ourselves. The same GPUs just successfully completed NVQUAL on this server.

This shows the M60 / M6 GPU is correctly set to graphics mode. Anyone else experiencing M60 / M6 issues can double-check this easily by following this advice: http://nvidia.custhelp.com/app/answers/detail/a_id/4106/. I'm afraid I haven't got any suggestions for this case, though.

More things I’ve tried

I took out 3 of the 4 GPUs and disabled "Above 4G Decoding" in the SuperMicro BIOS.

With (1) GPU installed and “Above 4G Decoding” disabled, nvidia-smi returned correctly:

[root@localhost:~] nvidia-smi
Thu May 5 19:18:01 2016
+------------------------------------------------------+
| NVIDIA-SMI 361.40     Driver Version: 361.40         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 0000:83:00.0     Off |                  Off |
| N/A   38C    P8    24W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 0000:84:00.0     Off |                  Off |
| N/A   33C    P8    24W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I then enabled “Above 4G Decoding” with the (1) card still installed, and got “Failed to initialize NVML: Unknown Error” when running nvidia-smi.

I installed a 2nd GPU, disabled “Above 4G Decoding” and booted, and got the BIOS error “Insufficient PCI Resources Detected”. So the server won't boot with more than (1) GPU installed unless “Above 4G Decoding” is enabled.

So I rebooted with “Above 4G Decoding” enabled, tried nvidia-smi, and got the same “Failed to initialize NVML: Unknown Error”.

Basically, it only works with (1) card installed and “Above 4G Decoding” disabled.

https://gridforums.nvidia.com/default/topic/526/necessary-to-disable-quot-above-4g-decoding-quot-for-view-with-vgpu-/ ?
https://gridforums.nvidia.com/default/topic/546/nvidia-grid-vgpu/mmio-above-4-gb-esxi-6-0u1-vgpu/ ?
( http://www.supermicro.com/support/faqs/faq.cfm?faq=20016 ? )

NVQUAL: For NVIDIA vGPU application, the GPUs should be mapped below the 4GB address space (BAR1<32b).
JS: In ESXi you have to have MMIO set to below 4G. The VMware article is correct. Although ESXi is a 64bit hypervisor it still has this restriction.

Are these statements still valid today?

You can probably use the GPU in passthrough mode (vDGA) with “Above 4G Decoding” enabled ( https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2139299 ).

Yes ESXi still requires below 4G.

Now we’re getting somewhere.

Are there any other PCI devices installed?

Are you at the latest BIOS revision for the chassis?

Have you checked the VMware HCL for this configuration?

Do you have the means to try a hypervisor that is not limited to the below 4G decoding limit? (XenServer is the easiest).

This requirement means only a few cards can be enabled (e.g. one card needs more than 2 x (128MB + 16MB + 32MB)). An MMIO hole of 512MB-1.5GB is standard, but I don't see any BIOS option to configure the MMIO hole size. My Supermicro X9DR3-F has a 2GB MMIO hole and I cannot add more than 1-2 cards (under Xen dom0 with "Above 4G Decoding" disabled). I use XenServer (6.5 or later) and it works fine with "Above 4G Decoding" enabled.

Try analyzing the output of these commands for PCI memory BAR mappings, if they are available in ESXi:
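The "only a few cards" arithmetic can be sketched as a naive budget. This takes the per-GPU figures from the post (128MB + 16MB + 32MB, two GPUs per board) as given; they are the poster's estimates, not datasheet values, and the helper names are hypothetical. Summing BAR sizes ignores alignment padding, bridge windows and the other devices sharing the hole, so it is only a lower bound on what each board consumes.

```python
# Per-GPU MMIO needs as estimated in the post (MiB); illustrative, not datasheet values.
PER_GPU_BARS_MIB = (128, 16, 32)
GPUS_PER_BOARD = 2  # Tesla M60: two GPUs per board


def mmio_per_board_mib(bars=PER_GPU_BARS_MIB, gpus=GPUS_PER_BOARD):
    """Lower bound on below-4G MMIO one board consumes (ignores alignment padding)."""
    return gpus * sum(bars)


def boards_that_fit(hole_mib, reserved_mib=0):
    """Naive count of boards that fit in the hole after `reserved_mib` for other devices."""
    usable = max(hole_mib - reserved_mib, 0)
    return usable // mmio_per_board_mib()


print(mmio_per_board_mib())   # 352
print(boards_that_fit(2048))  # 5
```

For a 2GB hole the naive count is five boards, while the X9DR3-F above fits only one or two in practice, which hints at how much of the hole alignment and other devices take up.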

# lspci -vv | egrep "^[a-f0-9]|Memory at"
# cat /proc/iomem
# dmesg | grep "available for PCI devices"
# dmesg | grep "pci_bus"
# dmesg | grep "\[mem "
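Where the `lspci -vv` output is in the Linux pciutils format (as in the XenServer listing in this thread; ESXi's lspci formats differently), each BAR can be classified as below or above the 4GB boundary. A minimal sketch, assuming device header lines start at column 0 and BARs appear as "Region N: Memory at <hex>" lines; the function name is hypothetical.

```python
import re

FOUR_GIB = 1 << 32  # first address a 32-bit BAR cannot reach
REGION_RE = re.compile(r"Region (\d+): Memory at ([0-9a-fA-F]+)")


def bars_above_4g(lspci_vv: str) -> dict:
    """Map each device in `lspci -vv` text to [(region, address, above_4g), ...]."""
    result, current = {}, None
    for line in lspci_vv.splitlines():
        if line and not line[0].isspace():
            current = line.split()[0]           # device header, e.g. "06:00.0"
            result.setdefault(current, [])
            continue
        match = REGION_RE.search(line)          # skips "<unassigned>" and non-memory lines
        if match and current is not None:
            addr = int(match.group(2), 16)
            result[current].append((int(match.group(1)), addr, addr >= FOUR_GIB))
    return result


sample = """\
06:00.0 0300: 10de:0ff2 (rev a1) (prog-if 00 [VGA controller])
\tRegion 0: Memory at dd000000 (32-bit, non-prefetchable)
\tRegion 1: Memory at 380ff0000000 (64-bit, prefetchable)"""
for device, regions in bars_above_4g(sample).items():
    for region, addr, above in regions:
        print(device, region, hex(addr), "above 4G" if above else "below 4G")
```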

After hearing back from SuperMicro and doing some tweaking here, we now have 2 GPUs working. They suggested we change the MMIOHBase setting to “2T”. The setting is located under Advanced PCIe/PCI/PnP Configuration. We also changed the following in the BIOS:
https://dl.dropboxusercontent.com/u/4009063/PCIe_PCI_PnP_Part_2%20Changes.jpg

The above BIOS settings work with 3 cards, but when we add the 4th card, only 3 are recognized. I tested that card on its own, with all the others removed, and it works. I just can't get all 4 to show up in nvidia-smi and lspci -n | grep 10de:

[root@localhost:~] lspci -n | grep 10de
0000:04:00.0 Class 0300: 10de:13f2 [vmgfx4]
0000:05:00.0 Class 0300: 10de:13f2 [vmgfx5]
0000:83:00.0 Class 0300: 10de:13f2 [vmgfx2]
0000:84:00.0 Class 0300: 10de:13f2 [vmgfx3]
0000:87:00.0 Class 0300: 10de:13f2 [vmgfx0]
0000:88:00.0 Class 0300: 10de:13f2 [vmgfx1]
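To tell whether a missing board is lost at PCI enumeration (BIOS/firmware) or only at driver initialization, the bus addresses from `lspci -n` can be compared with the ones NVML reports. This sketch assumes `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader` works on the host, that all GPUs sit in PCI domain 0, and that a Python 3.7+ interpreter is available (otherwise paste the two outputs in as strings); all helper names are hypothetical.

```python
import re
import subprocess

M60_ID = "10de:13f2"
# bus:device.function at the end of a PCI address; covers both "0000:04:00.0"
# (lspci) and "00000000:04:00.0" (newer nvidia-smi); assumes PCI domain 0.
BDF_RE = re.compile(r"([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])$")


def _bdf(address):
    match = BDF_RE.search(address.strip().lower())
    return match.group(1) if match else None


def pci_gpus(lspci_n_output, dev_id=M60_ID):
    """Bus addresses of GPUs the PCI layer enumerated (from `lspci -n`)."""
    addrs = (_bdf(line.split()[0]) for line in lspci_n_output.splitlines()
             if dev_id in line.split())
    return {a for a in addrs if a}


def nvml_gpus(smi_csv_output):
    """Bus addresses NVML reports; error text such as 'Failed to initialize NVML' yields none."""
    return {bdf for bdf in map(_bdf, smi_csv_output.splitlines()) if bdf}


def missing_from_nvml(lspci_n_output, smi_csv_output):
    """GPUs visible on the PCI bus but not initialized by the driver."""
    return sorted(pci_gpus(lspci_n_output) - nvml_gpus(smi_csv_output))


def run_on_host():
    """Collect both outputs on the host itself (not called on import)."""
    lspci = subprocess.run(["lspci", "-n"], capture_output=True, text=True).stdout
    smi = subprocess.run(
        ["nvidia-smi", "--query-gpu=pci.bus_id", "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout
    return missing_from_nvml(lspci, smi)
```

If the board is already absent from `lspci`, as in the listing above, the comparison comes back empty and the fault points to BIOS resource allocation rather than to NVML.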

More fun stuff to figure out.

I think these settings activate 64-bit PCI MMIO BAR addressing starting from 2TB (e.g. below the 4TB 32-bit address limit). The first few cards get lucky: their MMIO BARs fall under the 4TB 32-bit limit and are visible to the kernel/driver/nvidia-smi, but the 4th card is beyond this limit and invisible. Can you try to study the MMIO BAR memory assignments of your cards (10de:13f2) with the following command?
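For reference, the address boundaries in this discussion as plain arithmetic: a 32-bit BAR can only reach addresses below 2^32 = 4 GiB, whereas a 2 TiB base is 2^41. Reading the BIOS value “2T” as a 2 TiB MMIOH base is an assumption here (check the SuperMicro BIOS documentation), and the helper name is hypothetical.

```python
GIB = 1 << 30
TIB = 1 << 40

LIMIT_32BIT = 1 << 32   # 4 GiB: first address a 32-bit BAR cannot reach
MMIOH_BASE = 2 * TIB    # assumed meaning of MMIOHBase = "2T"


def reachable_with_32bit_bar(address: int) -> bool:
    """True if `address` lies below the 32-bit (4 GiB) boundary."""
    return address < LIMIT_32BIT


print(hex(LIMIT_32BIT), LIMIT_32BIT // GIB, "GiB")
print(hex(MMIOH_BASE), MMIOH_BASE // LIMIT_32BIT)
print(reachable_with_32bit_bar(0xdd000000))  # a BAR from the XenServer listing in this thread
print(reachable_with_32bit_bar(MMIOH_BASE))
```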

# lspci -nvv | egrep "^[a-f0-9]|Memory at"

Here’s what I get with the 4 cards installed:

[root@localhost:~] lspci -nvv | egrep "^[a-f0-9]|Memory at"
0000:00:00.0 Class 0600: 8086:2f00 [PCIe RP[0000:00:00.0]]
0000:00:01.0 Class 0604: 8086:2f02 [PCIe RP[0000:00:01.0]]
0000:00:02.0 Class 0604: 8086:2f04 [PCIe RP[0000:00:02.0]]
0000:00:03.0 Class 0604: 8086:2f08 [PCIe RP[0000:00:03.0]]
0000:00:04.0 Class 0880: 8086:2f20 
0000:00:04.1 Class 0880: 8086:2f21 
0000:00:04.2 Class 0880: 8086:2f22 
0000:00:04.3 Class 0880: 8086:2f23 
0000:00:04.4 Class 0880: 8086:2f24 
0000:00:04.5 Class 0880: 8086:2f25 
0000:00:04.6 Class 0880: 8086:2f26 
0000:00:04.7 Class 0880: 8086:2f27 
0000:00:05.0 Class 0880: 8086:2f28 
0000:00:05.1 Class 0880: 8086:2f29 
0000:00:05.2 Class 0880: 8086:2f2a 
0000:00:05.4 Class 0800: 8086:2f2c 
0000:00:11.0 Class ff00: 8086:8d7c 
0000:00:11.4 Class 0106: 8086:8d62 [vmhba0]
0000:00:14.0 Class 0c03: 8086:8d31 
0000:00:16.0 Class 0780: 8086:8d3a 
0000:00:16.1 Class 0780: 8086:8d3b 
0000:00:1a.0 Class 0c03: 8086:8d2d 
0000:00:1c.0 Class 0604: 8086:8d10 [PCIe RP[0000:00:1c.0]]
0000:00:1c.4 Class 0604: 8086:8d18 [PCIe RP[0000:00:1c.4]]
0000:00:1d.0 Class 0c03: 8086:8d26 
0000:00:1f.0 Class 0601: 8086:8d44 
0000:00:1f.2 Class 0106: 8086:8d02 [vmhba1]
0000:00:1f.3 Class 0c05: 8086:8d22 
0000:02:00.0 Class 0604: 10b5:8747 
0000:03:08.0 Class 0604: 10b5:8747 
0000:03:10.0 Class 0604: 10b5:8747 
0000:04:00.0 Class 0300: 10de:13f2 [vmgfx4]
0000:05:00.0 Class 0300: 10de:13f2 [vmgfx5]
0000:07:00.0 Class 0200: 8086:1528 [vmnic0]
0000:07:00.1 Class 0200: 8086:1528 [vmnic1]
0000:08:00.0 Class 0604: 1a03:1150 
0000:09:00.0 Class 0300: 1a03:2000 
0000:7f:08.0 Class 0880: 8086:2f80 
0000:7f:08.2 Class 1101: 8086:2f32 
0000:7f:08.3 Class 0880: 8086:2f83 
0000:7f:09.0 Class 0880: 8086:2f90 
0000:7f:09.2 Class 1101: 8086:2f33 
0000:7f:09.3 Class 0880: 8086:2f93 
0000:7f:0b.0 Class 0880: 8086:2f81 
0000:7f:0b.1 Class 1101: 8086:2f36 
0000:7f:0b.2 Class 1101: 8086:2f37 
0000:7f:0c.0 Class 0880: 8086:2fe0 
0000:7f:0c.1 Class 0880: 8086:2fe1 
0000:7f:0c.2 Class 0880: 8086:2fe2 
0000:7f:0c.3 Class 0880: 8086:2fe3 
0000:7f:0c.4 Class 0880: 8086:2fe4 
0000:7f:0c.5 Class 0880: 8086:2fe5 
0000:7f:0c.6 Class 0880: 8086:2fe6 
0000:7f:0c.7 Class 0880: 8086:2fe7 
0000:7f:0d.0 Class 0880: 8086:2fe8 
0000:7f:0d.1 Class 0880: 8086:2fe9 
0000:7f:0d.2 Class 0880: 8086:2fea 
0000:7f:0d.3 Class 0880: 8086:2feb 
0000:7f:0d.4 Class 0880: 8086:2fec 
0000:7f:0d.5 Class 0880: 8086:2fed 
0000:7f:0f.0 Class 0880: 8086:2ff8 
0000:7f:0f.1 Class 0880: 8086:2ff9 
0000:7f:0f.2 Class 0880: 8086:2ffa 
0000:7f:0f.3 Class 0880: 8086:2ffb 
0000:7f:0f.4 Class 0880: 8086:2ffc 
0000:7f:0f.5 Class 0880: 8086:2ffd 
0000:7f:0f.6 Class 0880: 8086:2ffe 
0000:7f:10.0 Class 0880: 8086:2f1d 
0000:7f:10.1 Class 1101: 8086:2f34 
0000:7f:10.5 Class 0880: 8086:2f1e 
0000:7f:10.6 Class 1101: 8086:2f7d 
0000:7f:10.7 Class 0880: 8086:2f1f 
0000:7f:12.0 Class 0880: 8086:2fa0 
0000:7f:12.1 Class 1101: 8086:2f30 
0000:7f:12.4 Class 0880: 8086:2f60 
0000:7f:12.5 Class 1101: 8086:2f38 
0000:7f:13.0 Class 0880: 8086:2fa8 
0000:7f:13.1 Class 0880: 8086:2f71 
0000:7f:13.2 Class 0880: 8086:2faa 
0000:7f:13.3 Class 0880: 8086:2fab 
0000:7f:13.6 Class 0880: 8086:2fae 
0000:7f:13.7 Class 0880: 8086:2faf 
0000:7f:14.0 Class 0880: 8086:2fb0 
0000:7f:14.1 Class 0880: 8086:2fb1 
0000:7f:14.2 Class 0880: 8086:2fb2 
0000:7f:14.3 Class 0880: 8086:2fb3 
0000:7f:14.4 Class 0880: 8086:2fbc 
0000:7f:14.5 Class 0880: 8086:2fbd 
0000:7f:14.6 Class 0880: 8086:2fbe 
0000:7f:14.7 Class 0880: 8086:2fbf 
0000:7f:16.0 Class 0880: 8086:2f68 
0000:7f:16.1 Class 0880: 8086:2f79 
0000:7f:16.2 Class 0880: 8086:2f6a 
0000:7f:16.3 Class 0880: 8086:2f6b 
0000:7f:16.6 Class 0880: 8086:2f6e 
0000:7f:16.7 Class 0880: 8086:2f6f 
0000:7f:17.0 Class 0880: 8086:2fd0 
0000:7f:17.1 Class 0880: 8086:2fd1 
0000:7f:17.2 Class 0880: 8086:2fd2 
0000:7f:17.3 Class 0880: 8086:2fd3 
0000:7f:17.4 Class 0880: 8086:2fb8 
0000:7f:17.5 Class 0880: 8086:2fb9 
0000:7f:17.6 Class 0880: 8086:2fba 
0000:7f:17.7 Class 0880: 8086:2fbb 
0000:7f:1e.0 Class 0880: 8086:2f98 
0000:7f:1e.1 Class 0880: 8086:2f99 
0000:7f:1e.2 Class 0880: 8086:2f9a 
0000:7f:1e.3 Class 0880: 8086:2fc0 
0000:7f:1e.4 Class 0880: 8086:2f9c 
0000:7f:1f.0 Class 0880: 8086:2f88 
0000:7f:1f.2 Class 0880: 8086:2f8a 
0000:80:02.0 Class 0604: 8086:2f04 [PCIe RP[0000:80:02.0]]
0000:80:03.0 Class 0604: 8086:2f08 [PCIe RP[0000:80:03.0]]
0000:80:04.0 Class 0880: 8086:2f20 
0000:80:04.1 Class 0880: 8086:2f21 
0000:80:04.2 Class 0880: 8086:2f22 
0000:80:04.3 Class 0880: 8086:2f23 
0000:80:04.4 Class 0880: 8086:2f24 
0000:80:04.5 Class 0880: 8086:2f25 
0000:80:04.6 Class 0880: 8086:2f26 
0000:80:04.7 Class 0880: 8086:2f27 
0000:80:05.0 Class 0880: 8086:2f28 
0000:80:05.1 Class 0880: 8086:2f29 
0000:80:05.2 Class 0880: 8086:2f2a 
0000:80:05.4 Class 0800: 8086:2f2c 
0000:81:00.0 Class 0604: 10b5:8747 
0000:82:08.0 Class 0604: 10b5:8747 
0000:82:10.0 Class 0604: 10b5:8747 
0000:83:00.0 Class 0300: 10de:13f2 [vmgfx2]
0000:84:00.0 Class 0300: 10de:13f2 [vmgfx3]
0000:85:00.0 Class 0604: 10b5:8747 
0000:86:08.0 Class 0604: 10b5:8747 
0000:86:10.0 Class 0604: 10b5:8747 
0000:87:00.0 Class 0300: 10de:13f2 [vmgfx0]
0000:88:00.0 Class 0300: 10de:13f2 [vmgfx1]
0000:ff:08.0 Class 0880: 8086:2f80 
0000:ff:08.2 Class 1101: 8086:2f32 
0000:ff:08.3 Class 0880: 8086:2f83 
0000:ff:09.0 Class 0880: 8086:2f90 
0000:ff:09.2 Class 1101: 8086:2f33 
0000:ff:09.3 Class 0880: 8086:2f93 
0000:ff:0b.0 Class 0880: 8086:2f81 
0000:ff:0b.1 Class 1101: 8086:2f36 
0000:ff:0b.2 Class 1101: 8086:2f37 
0000:ff:0c.0 Class 0880: 8086:2fe0 
0000:ff:0c.1 Class 0880: 8086:2fe1 
0000:ff:0c.2 Class 0880: 8086:2fe2 
0000:ff:0c.3 Class 0880: 8086:2fe3 
0000:ff:0c.4 Class 0880: 8086:2fe4 
0000:ff:0c.5 Class 0880: 8086:2fe5 
0000:ff:0c.6 Class 0880: 8086:2fe6 
0000:ff:0c.7 Class 0880: 8086:2fe7 
0000:ff:0d.0 Class 0880: 8086:2fe8 
0000:ff:0d.1 Class 0880: 8086:2fe9 
0000:ff:0d.2 Class 0880: 8086:2fea 
0000:ff:0d.3 Class 0880: 8086:2feb 
0000:ff:0d.4 Class 0880: 8086:2fec 
0000:ff:0d.5 Class 0880: 8086:2fed 
0000:ff:0f.0 Class 0880: 8086:2ff8 
0000:ff:0f.1 Class 0880: 8086:2ff9 
0000:ff:0f.2 Class 0880: 8086:2ffa 
0000:ff:0f.3 Class 0880: 8086:2ffb 
0000:ff:0f.4 Class 0880: 8086:2ffc 
0000:ff:0f.5 Class 0880: 8086:2ffd 
0000:ff:0f.6 Class 0880: 8086:2ffe 
0000:ff:10.0 Class 0880: 8086:2f1d 
0000:ff:10.1 Class 1101: 8086:2f34 
0000:ff:10.5 Class 0880: 8086:2f1e 
0000:ff:10.6 Class 1101: 8086:2f7d 
0000:ff:10.7 Class 0880: 8086:2f1f 
0000:ff:12.0 Class 0880: 8086:2fa0 
0000:ff:12.1 Class 1101: 8086:2f30 
0000:ff:12.4 Class 0880: 8086:2f60 
0000:ff:12.5 Class 1101: 8086:2f38 
0000:ff:13.0 Class 0880: 8086:2fa8 
0000:ff:13.1 Class 0880: 8086:2f71 
0000:ff:13.2 Class 0880: 8086:2faa 
0000:ff:13.3 Class 0880: 8086:2fab 
0000:ff:13.6 Class 0880: 8086:2fae 
0000:ff:13.7 Class 0880: 8086:2faf 
0000:ff:14.0 Class 0880: 8086:2fb0 
0000:ff:14.1 Class 0880: 8086:2fb1 
0000:ff:14.2 Class 0880: 8086:2fb2 
0000:ff:14.3 Class 0880: 8086:2fb3 
0000:ff:14.4 Class 0880: 8086:2fbc 
0000:ff:14.5 Class 0880: 8086:2fbd 
0000:ff:14.6 Class 0880: 8086:2fbe 
0000:ff:14.7 Class 0880: 8086:2fbf 
0000:ff:16.0 Class 0880: 8086:2f68 
0000:ff:16.1 Class 0880: 8086:2f79 
0000:ff:16.2 Class 0880: 8086:2f6a 
0000:ff:16.3 Class 0880: 8086:2f6b 
0000:ff:16.6 Class 0880: 8086:2f6e 
0000:ff:16.7 Class 0880: 8086:2f6f 
0000:ff:17.0 Class 0880: 8086:2fd0 
0000:ff:17.1 Class 0880: 8086:2fd1 
0000:ff:17.2 Class 0880: 8086:2fd2 
0000:ff:17.3 Class 0880: 8086:2fd3 
0000:ff:17.4 Class 0880: 8086:2fb8 
0000:ff:17.5 Class 0880: 8086:2fb9 
0000:ff:17.6 Class 0880: 8086:2fba 
0000:ff:17.7 Class 0880: 8086:2fbb 
0000:ff:1e.0 Class 0880: 8086:2f98 
0000:ff:1e.1 Class 0880: 8086:2f99 
0000:ff:1e.2 Class 0880: 8086:2f9a 
0000:ff:1e.3 Class 0880: 8086:2fc0 
0000:ff:1e.4 Class 0880: 8086:2f9c 
0000:ff:1f.0 Class 0880: 8086:2f88 
0000:ff:1f.2 Class 0880: 8086:2f8a

[root@localhost:~] nvidia-smi
Fri May 6 20:03:28 2016
+------------------------------------------------------+
| NVIDIA-SMI 361.40     Driver Version: 361.40         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 0000:04:00.0     Off |                  Off |
| N/A   38C    P8    24W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 0000:05:00.0     Off |                  Off |
| N/A   34C    P8    23W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           On   | 0000:83:00.0     Off |                  Off |
| N/A   33C    P8    24W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           On   | 0000:84:00.0     Off |                  Off |
| N/A   30C    P8    23W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla M60           On   | 0000:87:00.0     Off |                  Off |
| N/A   33C    P8    25W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla M60           On   | 0000:88:00.0     Off |                  Off |
| N/A   30C    P8    23W / 150W |     19MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |

lspci on ESXi probably formats its output differently, so try modifying the regexp. Here is my output (XenServer 6.5, partial, but including a K1, K2 and K2200, with "Above 4G Decoding" enabled and 64-bit MMIO BARs not forced to start at 2GB (MMIOHBase)):

06:00.0 0300: 10de:0ff2 (rev a1) (prog-if 00 [VGA controller])
	Region 0: Memory at dd000000 (32-bit, non-prefetchable) 
	Region 1: Memory at 380ff0000000 (64-bit, prefetchable) 
	Region 3: Memory at 380ff8000000 (64-bit, prefetchable) 
07:00.0 0300: 10de:0ff2 (rev a1) (prog-if 00 [VGA controller])
	Region 0: Memory at db000000 (32-bit, non-prefetchable) 
	Region 1: Memory at 380fe0000000 (64-bit, prefetchable) 
	Region 3: Memory at 380fe8000000 (64-bit, prefetchable) 
08:00.0 0300: 10de:0ff2 (rev a1) (prog-if 00 [VGA controller])
	Region 0: Memory at d9000000 (32-bit, non-prefetchable) 
	Region 1: Memory at 380fd0000000 (64-bit, prefetchable) 
	Region 3: Memory at 380fd8000000 (64-bit, prefetchable) 
09:00.0 0300: 10de:0ff2 (rev a1) (prog-if 00 [VGA controller])
	Region 0: Memory at d7000000 (32-bit, non-prefetchable) 
	Region 1: Memory at 380fc0000000 (64-bit, prefetchable) 
	Region 3: Memory at 380fc8000000 (64-bit, prefetchable) 
82:00.0 0300: 10de:13ba (rev a2) (prog-if 00 [VGA controller])
	Region 0: Memory at fa000000 (32-bit, non-prefetchable) 
	Region 1: Memory at 381fc0000000 (64-bit, prefetchable) 
	Region 3: Memory at 381fd0000000 (64-bit, prefetchable) 
85:00.0 0302: 10de:11bf (rev a1)
	Region 0: Memory at f8000000 (32-bit, non-prefetchable) 
	Region 1: Memory at 381fe8000000 (64-bit, prefetchable) 
	Region 3: Memory at 381ff0000000 (64-bit, prefetchable) 
86:00.0 0302: 10de:11bf (rev a1)
	Region 0: Memory at f6000000 (32-bit, non-prefetchable) 
	Region 1: Memory at 381fd8000000 (64-bit, prefetchable) 
	Region 3: Memory at 381fe0000000 (64-bit, prefetchable) 

I would expect 3 of your cards to have their MMIO BARs mapped below 4GB (the 32-bit boundary) and the last one above 4GB.

Thank you for taking the time to update everyone on SuperMicro's recommendations. Our support org will look at improving the documentation on BIOS MMIO requirements for hypervisors, so that your time and experience help improve things for others.

I’ll update the thread when support write this up.

Thank you,
Rachel