Is it possible to present multiple vGPUs to a single VM from a Tesla T4 card on ESXi 6.7?

Hi guys, does anyone know why I can’t present more than one vGPU to my Server 2019 VM (VM hardware version 15)?

With one vGPU attached the VM starts fine. If I add a second vGPU, the VM fails to start with the error: Could not initialize plugin ‘/usr/lib64/vmware/plugin/libnvidia-vgx.so’ for vGPU ‘grid_t4-4c’.

Spec background: 1 x Tesla T4 16GB card in a Dell VxRail V570, vCenter 6.7 with an Enterprise Plus license, running a single VM (Server 2019) with 64GB of memory. The GRID driver installed successfully. The host graphics device is set to Shared Direct (vendor shared passthrough graphics).

Host ECC has been disabled. The VM has these two settings; turning them on or off makes no difference:
pciPassthru.use64bitMMIO=TRUE
pciPassthru.64bitMMIOSizeGB=64
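
As a side note on the 64bitMMIOSizeGB value: a rough sketch of how that number is commonly sized. It assumes the rule of thumb usually quoted from VMware's passthrough guidance (add up the memory of all GPUs assigned to the VM, then round up to the next power of two above that total). The function name mmio_size_gb is hypothetical, and the rule itself should be checked against the current VMware documentation before relying on it.

```python
def mmio_size_gb(gpu_memory_gb):
    """Suggest a value for pciPassthru.64bitMMIOSizeGB.

    Assumption: total GPU memory in GB, rounded up to the next power
    of two strictly greater than the total (so 32 becomes 64).
    """
    total = sum(gpu_memory_gb)
    size = 1
    while size <= total:
        size *= 2
    return size


if __name__ == "__main__":
    # One 16 GB T4, as in this host.
    print(mmio_size_gb([16]))
    # Two 16 GB cards presented to the same VM.
    print(mmio_size_gb([16, 16]))
```

Under that assumption, the value of 64 used here is already at least as large as a single 16GB T4 would need, which is consistent with the setting making no difference to the error.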

Below is the output of a few of our favourite commands.

[root@vxesxi5:~] nvidia-smi -i 00000000:3B:00.0 -e 0
ECC support is already Disabled for GPU 00000000:3B:00.0.
All done.
[root@vxesxi5:~] nvidia-smi
Wed Jul 8 09:56:45 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.87       Driver Version: 440.87       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:3B:00.0 Off |                  Off |
| N/A   38C    P8    17W /  70W |     79MiB / 16383MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[root@vxesxi5:~] esxcli software vib list | grep -i nvidia
NVIDIA-VMware_ESXi_6.7_Host_Driver 440.87-1OEM.670.0.0.8169922 NVIDIA VMwareAccepted 2020-07-08
[root@vxesxi5:~] lspci -n | grep 10de
0000:3b:00.0 Class 0302: 10de:1eb8 [vmgfx0]

Any help appreciated, this is doing my head in! I understood this card could present up to 4 vGPUs to a single VM.

[root@vxesxi5:~] nvidia-smi -q

==============NVSMI LOG==============

Timestamp : Wed Jul 8 11:36:19 2020
Driver Version : 440.87
CUDA Version : Not Found

Attached GPUs : 1
GPU 00000000:3B:00.0
Product Name : Tesla T4
Product Brand : Tesla
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1561120009254
GPU UUID : GPU-166df7a5-7a83-f1ac-bc58-313305b331d5
Minor Number : 0
VBIOS Version : 90.04.38.00.03
MultiGPU Board : No
Board ID : 0x3b00
GPU Part Number : 900-2G183-0100-001
Inforom Version
Image Version : G183.0200.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : Host VGPU
Host VGPU Mode : Non SR-IOV
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x3B
Device : 0x00
Domain : 0x0000
Device Id : 0x1EB810DE
Bus Id : 00000000:3B:00.0
Sub System Id : 0x12A210DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 16383 MiB
Used : 86 MiB
Free : 16297 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Disabled
Pending : Disabled
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Temperature
GPU Current Temp : 38 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : 85 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 17.26 W
Power Limit : 70.00 W
Default Power Limit : 70.00 W
Enforced Power Limit : 70.00 W
Min Power Limit : 60.00 W
Max Power Limit : 70.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Default Applications Clocks
Graphics : 585 MHz
Memory : 5001 MHz
Max Clocks
Graphics : 1590 MHz
SM : 1590 MHz
Memory : 5001 MHz
Video : 1470 MHz
Max Customer Boost Clocks
Graphics : 1590 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None

Hi SSD

You need multiple physical GPUs to be able to use Multi-vGPU. So in your case, you’d need more than one T4 in your server. You could then allocate 2, 3, 4, etc. T4-16* profiles to a single VM.

Even if it were possible, adding more than 1 vGPU from the same physical GPU wouldn’t do anything, as it’s the same physical GPU.

Make sure you’re not confusing multiple GPUs on a single VM with multiple VMs on a single GPU.
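
To make the rule concrete, here is a minimal sketch that models it, assuming only what is described above: each vGPU given to one VM must come from a different physical T4, and with more than one vGPU each must use a full-framebuffer T4-16* profile. The function names and the profile-name parsing are hypothetical illustrations, not NVIDIA's or VMware's actual placement logic.

```python
import re

# Framebuffer of one Tesla T4 in GB (nvidia-smi reports 16383 MiB).
T4_FRAMEBUFFER_GB = 16


def profile_framebuffer_gb(profile):
    """Extract the framebuffer size from a name such as 'grid_t4-4c'."""
    match = re.fullmatch(r"grid_t4-(\d+)[a-z]\d*", profile)
    if match is None:
        raise ValueError(f"unrecognised T4 vGPU profile name: {profile!r}")
    return int(match.group(1))


def can_attach(physical_gpus, profiles):
    """Return True if one VM could be given these vGPU profiles.

    Simplified model: a single vGPU needs one physical GPU; several
    vGPUs need one physical GPU each and full-framebuffer profiles.
    """
    if not profiles:
        return True
    if len(profiles) == 1:
        return physical_gpus >= 1
    if len(profiles) > physical_gpus:
        return False
    return all(profile_framebuffer_gb(p) == T4_FRAMEBUFFER_GB for p in profiles)


if __name__ == "__main__":
    # The failing case from this thread: one T4, two grid_t4-4c vGPUs.
    print(can_attach(1, ["grid_t4-4c", "grid_t4-4c"]))
    # Two T4 cards with one full-size profile from each.
    print(can_attach(2, ["grid_t4-16c", "grid_t4-16c"]))
```

In this model the configuration from the original post is rejected because two vGPUs are requested but only one physical GPU is present.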

Regards

MG

Ah ok, I’m used to the GRID K1 cards having multiple GPUs.

So the solution we have bought can’t present multiple vGPUs to a single VM. That means we either buy more T4 cards to split up the processing, or present other VMs to share the GPU. But as you say, they will contend for resources if both VMs are hitting it at the same time, correct?

So it sounds like it’s not the preferred GPU card for this solution. Can you recommend a card that would do what I’m asking of it?

Hi MG, so the solution for us is to replace the single Tesla T4 card with 2 x RTX-5000 cards in the host and simply do a direct passthrough assignment, presenting them both to the one VM.

Thanks for your assistance to date.

Regards,
SSD