vGPU on 2x RTX 8000 with ESXi 7.0 - low utilization

Hi,

I'm getting low GPU utilization on my test VMs.

I'm using these vGPU profiles on the VMs:

  1. NVIDIA GRID vGPU grid_rtx8000p-4q
  2. NVIDIA GRID vGPU grid_rtx8000p-2q

Cinebench R15 OpenGL scores about 59 fps and never goes higher.

Has anyone else seen the same problem?

Output of nvidia-smi -q on the ESXi host:

nvidia-smi -q

==============NVSMI LOG==============

Timestamp : Wed Dec 23 09:04:45 2020
Driver Version : 450.89
CUDA Version : Not Found

Attached GPUs : 2
GPU 00000000:25:00.0
Product Name : Quadro RTX 8000
Product Brand : Tesla
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1325219085576
GPU UUID : GPU-9c213c49-3b61-10dc-4e1b-d32b2e9c1ae7
Minor Number : 0
VBIOS Version : 90.02.4E.00.03
MultiGPU Board : No
Board ID : 0x2500
GPU Part Number : 900-2G150-0150-030
Inforom Version
Image Version : G150.0231.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : Host VGPU
Host VGPU Mode : Non SR-IOV
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x25
Device : 0x00
Domain : 0x0000
Device Id : 0x1E7810DE
Bus Id : 00000000:25:00.0
Sub System Id : 0x13D810DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 46079 MiB
Used : 5995 MiB
Free : 40084 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 45 MiB
Free : 32723 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 31 C
GPU Shutdown Temp : 87 C
GPU Slowdown Temp : 84 C
GPU Max Operating Temp : 82 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 26.68 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 150.00 W
Max Power Limit : 250.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : 1230 MHz
Memory : 6501 MHz
Default Applications Clocks
Graphics : 1230 MHz
Memory : 6501 MHz
Max Clocks
Graphics : 1620 MHz
SM : 1620 MHz
Memory : 6501 MHz
Video : 1500 MHz
Max Customer Boost Clocks
Graphics : 1620 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2104744
Type : C+G
Name : GPU-TEST-1
Used GPU Memory : 1900 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2105231
Type : C+G
Name : GPU-TEST-2
Used GPU Memory : 1900 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2105630
Type : C+G
Name : GPU-TEST-5
Used GPU Memory : 1900 MiB

GPU 00000000:81:00.0
Product Name : Quadro RTX 8000
Product Brand : Tesla
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1325219085762
GPU UUID : GPU-20880b45-4e0e-bf33-ee89-9277cfe9e04c
Minor Number : 1
VBIOS Version : 90.02.4E.00.03
MultiGPU Board : No
Board ID : 0x8100
GPU Part Number : 900-2G150-0150-030
Inforom Version
Image Version : G150.0231.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : Host VGPU
Host VGPU Mode : Non SR-IOV
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x81
Device : 0x00
Domain : 0x0000
Device Id : 0x1E7810DE
Bus Id : 00000000:81:00.0
Sub System Id : 0x13D810DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 46079 MiB
Used : 7895 MiB
Free : 38184 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 31 MiB
Free : 32737 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 1
Average FPS : 301
Average Latency : 3023
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 31 C
GPU Shutdown Temp : 87 C
GPU Slowdown Temp : 84 C
GPU Max Operating Temp : 82 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 26.26 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 150.00 W
Max Power Limit : 250.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : 1230 MHz
Memory : 6501 MHz
Default Applications Clocks
Graphics : 1230 MHz
Memory : 6501 MHz
Max Clocks
Graphics : 1620 MHz
SM : 1620 MHz
Memory : 6501 MHz
Video : 1500 MHz
Max Customer Boost Clocks
Graphics : 1620 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2105606
Type : C+G
Name : GPU-TEST-3
Used GPU Memory : 3800 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2105645
Type : C+G
Name : GPU-TEST-4
Used GPU Memory : 3800 MiB

Hi

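A result that tops out at about 59 fps points to the Frame Rate Limiter (FRL), which caps each vGPU at 60 fps under the default best-effort scheduler. Either disable the FRL in the hypervisor for each VM, or change the GPU scheduler to equal share or fixed share, which disables the FRL automatically.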

If you disable the FRL individually for each VM, remember to enable it again after running your benchmarks.
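
As a rough sketch of both options on ESXi: the setting names and values below are the ones I know from the NVIDIA vGPU software documentation, so please check them against the documentation for your vGPU release before applying anything. `pciPassthru0` assumes the vGPU is the first passthrough device of the VM, and the scheduler change is host-wide and only takes effect after a host reboot.

```shell
# Option 1: per VM. With the VM powered off, add this advanced
# configuration parameter (shown as a .vmx line, not a shell command):
#
#   pciPassthru0.cfg.frame_rate_limiter = "0"
#
# Remove the parameter again after benchmarking to restore the FRL.

# Option 2: host-wide. Run on the ESXi host. RmPVMRL selects the vGPU
# scheduler: 0x00 = best effort (default), 0x01 = equal share,
# 0x11 = fixed share (both with the default time slice).
esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x01"

# Confirm the parameter was stored, then reboot the host to apply it.
esxcli system module parameters list -m nvidia | grep NVreg_RegistryDwords
```

Option 1 is the smaller change if you only want a benchmark number from one or two test VMs. Option 2 affects every vGPU on the host, so it is worth deciding whether equal share or fixed share is what you want in production before switching.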

Regards

MG
