vSphere 6.7, Linux Guest, V100 vGPU and memory used problem

Overview of my system and environment:

Ubuntu 18.10 VM running on the VMware vSphere 6.7 hypervisor. The host provides one GPU, an Nvidia Tesla V100-PCIE-32GB (we do not use PCI passthrough; we use Nvidia vGPU technology). The Ubuntu VM is configured with one "NVIDIA GRID vGPU" device with the grid_v100d-16q profile.

On a fresh boot of the Ubuntu VM, 1GB (of 16GB) of GPU memory is always used (in FB, the framebuffer). I cannot find a reason for this, and I cannot see any running processes. I have read that some processes may not be listed, but I’m very new to “Nvidia vGPU” technology and I fear this may be caused by a subtle misconfiguration.

I would like to run GPU-accelerated CUDA processes on this Ubuntu VM. I suppose that ‘used’ memory is not available for computation, so I would like to reduce the FB usage.

Question:

Why is 1GB used for no apparent reason? Where should I look for evidence?

There are two sub-questions:

Could this memory usage be related to a wrong Xorg configuration?
Could it be a problem with the selected vGPU profile (16q)?

Regards

Other details about my current host/VM configuration:

The vSphere and Ubuntu Linux Nvidia drivers come from the provided NVIDIA-GRID_vSphere-6.7-418.66-418.70-425.31.zip package.

Installed Linux driver (filename): NVIDIA-GRID_vSphere-6.7-418.66-418.70-425.31/NVIDIA-Linux-x86_64-418.70-grid.run

Rebooting does not solve the issue; 1GB is always used.

nvidia-smi output for the vSphere host:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.66       Driver Version: 418.66       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                  Off |
| N/A   34C    P0    26W / 250W |  16402MiB / 32767MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   2338177    C+G   XXX-ubuntu-XXX                       16352MiB |
+-----------------------------------------------------------------------------+

nvidia-smi output for Guest Ubuntu VM:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.70       Driver Version: 418.70       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID V100D-16Q      On   | 00000000:02:02.0 Off |                  N/A |
| N/A   N/A    P0    N/A /  N/A |   1040MiB / 16384MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
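To put the host and guest figures side by side, the Memory-Usage column can be pulled out of the two tables with a few lines of Python. This is only a sketch: the helper name `parse_memory_usage` is made up for illustration, and the two input rows are copied from the output in this post rather than read from a live system.

```python
import re

def parse_memory_usage(line):
    """Return (used_mib, total_mib) from an nvidia-smi summary row, or None."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Rows copied from the host and guest tables in this post.
host_row = "| N/A   34C    P0    26W / 250W |  16402MiB / 32767MiB |      0%      Default |"
guest_row = "| N/A   N/A    P0    N/A /  N/A |   1040MiB / 16384MiB |      0%      Default |"

host_used, host_total = parse_memory_usage(host_row)
guest_used, guest_total = parse_memory_usage(guest_row)

# Memory the guest reports as still free, and the share already in use.
guest_free = guest_total - guest_used
guest_used_pct = 100.0 * guest_used / guest_total

print(f"host : {host_used} MiB used of {host_total} MiB")
print(f"guest: {guest_used} MiB used, {guest_free} MiB free ({guest_used_pct:.1f}% used)")
```

The free value computed for the guest, 15344 MiB, agrees with the "Free" field in the nvidia-smi -q output further down.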

The chosen profile for the "NVIDIA GRID vGPU" device is grid_v100d-16q.

This is the configuration in the gridd.conf file of the Ubuntu VM (relevant parts):

# Description: Set Feature to be enabled
# Data type: integer
# Possible values:
#    0 => for unlicensed state
#    1 => for GRID vGPU
#    2 => for Quadro Virtual Datacenter Workstation
#    4 => for NVIDIA vComputeServer
# All other values reserved
FeatureType=4

# Description: Parameter to enable or disable Grid Licensing tab in nvidia-settings
# Data type: boolean
# Possible values: TRUE or FALSE, default is FALSE
#EnableUI=TRUE

# Description: Set license borrow period in minutes
# Data type: integer
# Possible values: 10 to 10080 mins(7 days), default is 1440 mins(1 day)
#LicenseInterval=1440

# Description: Set license linger period in minutes
# Data type: integer
# Possible values: 0 to 10080 mins(7 days), default is 0 mins
#LingerInterval=10
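Since most entries in this file are commented out, it can help to list only the settings that are actually active. The following Python sketch does that for the excerpt above. The function name `read_active_settings` is hypothetical, the sample text contains only the key lines from this post, and the value-to-name mapping is copied from the comments in the file itself.

```python
# Mapping copied from the comments in the gridd.conf excerpt above.
FEATURE_TYPES = {
    0: "unlicensed state",
    1: "GRID vGPU",
    2: "Quadro Virtual Datacenter Workstation",
    4: "NVIDIA vComputeServer",
}

def read_active_settings(text):
    """Return key/value pairs from lines that are neither blank nor comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

# Key lines from the excerpt in this post (comment text omitted).
sample = """\
FeatureType=4
#EnableUI=TRUE
#LicenseInterval=1440
#LingerInterval=10
"""

settings = read_active_settings(sample)
feature = int(settings["FeatureType"])
print(settings)
print(FEATURE_TYPES.get(feature, "reserved"))
```

On a real guest the text would be read from the file itself (commonly /etc/nvidia/gridd.conf, which is an assumption here since the post does not give the path). The resulting name can then be compared with the "GRID Licensed Product" field in the nvidia-smi -q output below.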

I have rebooted the VM many times and cannot find any process that consumes GPU memory.

For your convenience, the full nvidia-smi -q output:

==============NVSMI LOG==============

Timestamp                           : Wed Jul  3 12:26:42 2019
Driver Version                      : 418.70
CUDA Version                        : 10.1

Attached GPUs                       : 1
GPU 00000000:02:02.0
    Product Name                    : GRID V100D-16Q
    Product Brand                   : Grid
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-0edbdf6b-28c2-11b2-812a-b44a2987914b
    Minor Number                    : 0
    VBIOS Version                   : 00.00.00.00.00
    MultiGPU Board                  : No
    Board ID                        : 0x202
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : N/A
        OEM Object                  : N/A
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : VGPU
    GRID Licensed Product
        Product Name                : Quadro Virtual Data Center Workstation
        License Status              : Licensed
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x02
        Device                      : 0x02
        Domain                      : 0x0000
        Device Id                   : 0x1DB610DE
        Bus Id                      : 00000000:02:02.0
        Sub System Id               : 0x12C310DE
        GPU Link Info
            PCIe Generation
                Max                 : N/A
                Current             : N/A
            Link Width
                Max                 : N/A
                Current             : N/A
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : N/A
        Replay Number Rollovers     : N/A
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons         : N/A
    FB Memory Usage
        Total                       : 16384 MiB
        Used                        : 1040 MiB
        Free                        : 15344 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 0 MiB
        Free                        : 256 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        GPU Current Temp            : N/A
        GPU Shutdown Temp           : N/A
        GPU Slowdown Temp           : N/A
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : N/A
        Power Draw                  : N/A
        Power Limit                 : N/A
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A
    Clocks
        Graphics                    : 135 MHz
        SM                          : 135 MHz
        Memory                      : 877 MHz
        Video                       : 555 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : N/A
        SM                          : N/A
        Memory                      : N/A
        Video                       : N/A
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None

Why do you think this is an issue?
The operating system itself needs FB to run properly…
I don’t see any issue in your description, so I would like to understand it better. I have not tested 18.10 yet, as it is not supported at all, but I don’t think this is related to 18.10.

[Edit] A quick test with 18.04 shows the same result: 1050MB reserved…

regards
Simon

Thanks,
You are right, the use case is important. On this Ubuntu VM I would like to run CUDA-accelerated processes.

I thought that 1GB used in FB means 1GB less for my GPU-accelerated processes. Can CUDA-accelerated processes also make use of this ‘used’ 1GB?

Also, does a 1GB framebuffer seem quite high?
Where in the Ubuntu Linux guest configuration can I look in order to reduce the FB usage?
Regards,

[Edit] Interesting indeed that you see the same reserved memory with Ubuntu 18.04.
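One way to gather evidence on this question is to record the memory figures while the VM is idle and again while a CUDA job is running, then compare the two. The Python sketch below uses the `--query-gpu` and `--format=csv,noheader,nounits` options of nvidia-smi. The helper names are made up for illustration, and the offline example uses the numbers reported in this thread rather than a live query.

```python
import subprocess

# nvidia-smi query for total, used and free framebuffer memory, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]

def parse_memory_csv(line):
    """Turn one CSV row such as '16384, 1040, 15344' into a dict of MiB values."""
    total, used, free = (int(field.strip()) for field in line.split(","))
    return {"total_mib": total, "used_mib": used, "free_mib": free}

def query_gpu_memory():
    """Run nvidia-smi and return one dict per GPU (needs the Nvidia driver)."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_memory_csv(line) for line in output.splitlines() if line.strip()]

# Offline example using the idle values reported in this thread.
idle = parse_memory_csv("16384, 1040, 15344")
print(idle)
```

Calling `query_gpu_memory()` on the guest before and during a CUDA run would show how much of the free figure a job can actually claim. Whether the 1GB reported as used at idle can be reclaimed is not something this sketch can answer.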