Overview of my system and environment:
Ubuntu 18.10 VM running on the VMware vSphere 6.7 hypervisor. The host provides one GPU, an Nvidia Tesla V100-PCIE-32GB (we do not use PCI passthrough; we use Nvidia vGPU technology). The Ubuntu VM is configured with one "NVIDIA GRID vGPU" device with the grid_v100d-16q profile.
After a fresh boot of the Ubuntu VM, about 1 GB (1040 MiB of 16384 MiB) of GPU frame buffer (FB) memory is always in use. I cannot find a reason for this, and nvidia-smi lists no running processes. I have read that some processes may not be listed, but I am very new to Nvidia vGPU technology and I fear this may be caused by a subtle misconfiguration.
I want to run GPU-accelerated CUDA processes on this Ubuntu VM. I assume that memory reported as 'used' is not available for computation, so I would like to reduce the amount of FB memory in use.
Question:
Why is 1 GB in use for no apparent reason? Where should I look for evidence of what is using it?
Sub-questions:
Could this memory usage be caused by a wrong Xorg configuration?
Could it be a problem with the selected vGPU profile (16q)?
Regards
Other details about my current host/VM configuration:
The vSphere and Ubuntu Linux Nvidia drivers both come from the NVIDIA-GRID_vSphere-6.7-418.66-418.70-425.31.zip package.
Installed Linux driver (filename): NVIDIA-GRID_vSphere-6.7-418.66-418.70-425.31/NVIDIA-Linux-x86_64-418.70-grid.run
Rebooting does not solve the issue; 1 GB is always in use.
nvidia-smi output for the vSphere host:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.66       Driver Version: 418.66       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                  Off |
| N/A   34C    P0    26W / 250W |  16402MiB / 32767MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   2338177   C+G   XXX-ubuntu-XXX                              16352MiB |
+-----------------------------------------------------------------------------+
nvidia-smi output for the guest Ubuntu VM:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.70       Driver Version: 418.70       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID V100D-16Q      On   | 00000000:02:02.0 Off |                  N/A |
| N/A   N/A    P0    N/A /  N/A |   1040MiB / 16384MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
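To track the used/free figures across reboots without reading the table by eye, the numbers can be pulled out in CSV form. This is only a minimal sketch: it assumes `nvidia-smi` is on the PATH inside the guest and uses its `--query-gpu=memory.total,memory.used` option; the function names `parse_memory_csv` and `query_memory` are my own, not part of any Nvidia tool.

```python
import shutil
import subprocess

# nvidia-smi query that prints "total, used" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse 'total, used' lines (MiB) into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        gpus.append({
            "total_mib": total,
            "used_mib": used,
            "free_mib": total - used,
            "used_pct": round(100.0 * used / total, 1),
        })
    return gpus


def query_memory():
    """Run nvidia-smi if it is installed; return [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_memory_csv(result.stdout)


if __name__ == "__main__":
    # Values copied by hand from the guest nvidia-smi table in this post.
    print(parse_memory_csv("16384, 1040"))
    # Live values, if nvidia-smi is available on this machine.
    print(query_memory())
```

With the values from the guest table, the free figure works out to 16384 - 1040 = 15344 MiB, which matches what `nvidia-smi -q` reports as free.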
The chosen profile for the "NVIDIA GRID vGPU" device is grid_v100d-16q.
This is the configuration in the gridd.conf file of the Ubuntu VM (relevant parts):
# Description: Set Feature to be enabled
# Data type: integer
# Possible values:
# 0 => for unlicensed state
# 1 => for GRID vGPU
# 2 => for Quadro Virtual Datacenter Workstation
# 4 => for NVIDIA Virtual Compute Server
# All other values reserved
FeatureType=4
# Description: Parameter to enable or disable Grid Licensing tab in nvidia-settings
# Data type: boolean
# Possible values: TRUE or FALSE, default is FALSE
#EnableUI=TRUE
# Description: Set license borrow period in minutes
# Data type: integer
# Possible values: 10 to 10080 mins(7 days), default is 1440 mins(1 day)
#LicenseInterval=1440
# Description: Set license linger period in minutes
# Data type: integer
# Possible values: 0 to 10080 mins(7 days), default is 0 mins
#LingerInterval=10
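To confirm which settings are actually active (everything prefixed with `#` is ignored), the file can be read with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, the `FEATURE_TYPES` table is copied from the comments in the file above, and the path `/etc/nvidia/gridd.conf` mentioned in the usage note is where I assume the file lives.

```python
# Labels taken from the "Possible values" comment block in gridd.conf.
FEATURE_TYPES = {
    0: "unlicensed state",
    1: "GRID vGPU",
    2: "Quadro Virtual Datacenter Workstation",
    4: "NVIDIA Virtual Compute Server",
}


def parse_gridd_conf(text):
    """Return the active KEY=VALUE settings, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def describe_feature(settings):
    """Translate FeatureType into the label used in the file's comments."""
    raw = settings.get("FeatureType")
    if raw is None:
        return "FeatureType not set"
    try:
        code = int(raw)
    except ValueError:
        return "FeatureType is not an integer"
    return FEATURE_TYPES.get(code, "reserved value")


if __name__ == "__main__":
    sample = "FeatureType=4\n#EnableUI=TRUE\n#LicenseInterval=1440\n"
    active = parse_gridd_conf(sample)
    print(active)
    print(describe_feature(active))
```

On the VM itself, the sample string would be replaced by the contents of the real file, for example `open("/etc/nvidia/gridd.conf").read()`. The resulting label can then be compared with the Product Name that `nvidia-smi -q` prints under "GRID Licensed Product".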
I have rebooted the VM many times and cannot find any process that consumes GPU memory.
For your convenience, here is the full nvidia-smi -q output:
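One way to look for processes that nvidia-smi does not list is to check which processes hold the Nvidia device nodes open, similar to what `lsof /dev/nvidia*` would show. The sketch below walks `/proc` directly; it assumes the device nodes are named `/dev/nvidia*` (as with the standard Linux driver), and the function names are hypothetical.

```python
import os


def find_nvidia_users(proc_root="/proc"):
    """Map PID -> sorted list of /dev/nvidia* paths that process has open."""
    users = {}
    if not os.path.isdir(proc_root):
        return users
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        devices = set()
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target.startswith("/dev/nvidia"):
                devices.add(target)
        if devices:
            users[int(entry)] = sorted(devices)
    return users


def process_name(pid, proc_root="/proc"):
    """Return the short command name of a PID, or '?' if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as handle:
            return handle.read().strip()
    except OSError:
        return "?"


if __name__ == "__main__":
    for pid, devices in sorted(find_nvidia_users().items()):
        print(pid, process_name(pid), ", ".join(devices))
```

The script needs to run as root to see file descriptors owned by other users (an Xorg server, for example). An empty result would suggest that no userspace process in the guest has the device open.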
==============NVSMI LOG==============
Timestamp : Wed Jul 3 12:26:42 2019
Driver Version : 418.70
CUDA Version : 10.1
Attached GPUs : 1
GPU 00000000:02:02.0
Product Name : GRID V100D-16Q
Product Brand : Grid
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-0edbdf6b-28c2-11b2-812a-b44a2987914b
Minor Number : 0
VBIOS Version : 00.00.00.00.00
MultiGPU Board : No
Board ID : 0x202
GPU Part Number : N/A
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : VGPU
GRID Licensed Product
Product Name : Quadro Virtual Data Center Workstation
License Status : Licensed
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x02
Device : 0x02
Domain : 0x0000
Device Id : 0x1DB610DE
Bus Id : 00000000:02:02.0
Sub System Id : 0x12C310DE
GPU Link Info
PCIe Generation
Max : N/A
Current : N/A
Link Width
Max : N/A
Current : N/A
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : N/A
Replay Number Rollovers : N/A
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons : N/A
FB Memory Usage
Total : 16384 MiB
Used : 1040 MiB
Free : 15344 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 0 MiB
Free : 256 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : N/A
GPU Shutdown Temp : N/A
GPU Slowdown Temp : N/A
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : N/A
Power Draw : N/A
Power Limit : N/A
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 877 MHz
Video : 555 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : N/A
SM : N/A
Memory : N/A
Video : N/A
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
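The same report is easier to compare between boots in machine-readable form. The sketch below parses the XML variant (`nvidia-smi -q -x`) and extracts only the frame buffer figures. The tag names `gpu`, `product_name` and `fb_memory_usage` are what I assume that XML uses; the sample document is hand-written from the values in the log above, not captured output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample mirroring the FB figures from the log in this post.
SAMPLE_XML = """<nvidia_smi_log>
  <gpu id="00000000:02:02.0">
    <product_name>GRID V100D-16Q</product_name>
    <fb_memory_usage>
      <total>16384 MiB</total>
      <used>1040 MiB</used>
      <free>15344 MiB</free>
    </fb_memory_usage>
  </gpu>
</nvidia_smi_log>"""


def mib(text):
    """Convert a value such as '1040 MiB' to an integer number of MiB."""
    return int(text.split()[0])


def fb_usage(xml_text):
    """Return one dict per GPU with its frame buffer totals in MiB."""
    root = ET.fromstring(xml_text)
    report = []
    for gpu in root.iter("gpu"):
        fb = gpu.find("fb_memory_usage")
        if fb is None:
            continue  # this GPU entry carries no FB section
        report.append({
            "bus_id": gpu.get("id"),
            "name": gpu.findtext("product_name", default="unknown"),
            "total_mib": mib(fb.findtext("total")),
            "used_mib": mib(fb.findtext("used")),
            "free_mib": mib(fb.findtext("free")),
        })
    return report


if __name__ == "__main__":
    for row in fb_usage(SAMPLE_XML):
        print(row)
```

On the VM, `SAMPLE_XML` would be replaced by the real output, for example captured with `subprocess.run(["nvidia-smi", "-q", "-x"], capture_output=True, text=True).stdout`.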