vGPU management and QoS scheduler API (Pascal preemption API)?

Hello.

The "NVIDIA Virtual GPU Software Management SDK" was updated to V2.0 (https://developer.nvidia.com/nvidia-grid-software-management-sdk - grid_nvml_sdk_384.73.tgz).
The documentation is very bad, as usual - http://docs.nvidia.com/grid/5.0/grid-management-sdk-user-guide/index.html (for example, "nvml_grid.h" is now integrated into "nvml.h", the new license "Quadro-Virtual-DWS,5.0" is missing, the EncoderCapacity() description is missing … and it still refers to NVML r352 from 2015 (http://docs.nvidia.com/deploy/nvml-api/index.html)!).

Questions:

  1. for the new "encoder capacity" API - nvmlDeviceGetEncoderCapacity(), nvmlVgpuInstanceGetEncoderCapacity(), nvmlVgpuInstanceSetEncoderCapacity() (see the sketch after this list):
    1. Is there any table of "macroblocks per second" capacity for the different GM* and GP* chips?
    2. Does it depend on frequency (including power-saving modes and frequency boosts)?
  2. missing in "nvml.h" but exported from the library (see the probe sketch after this list):
    1. nvmlDeviceGetMPSComputeRunningProcesses() ?
    2. nvmlDeviceGetVgpuMetadata() ?
    3. nvmlVgpuInstanceGetMetadata() ?
    4. nvmlGetVgpuCompatibility() ?
  3. for the vGPU schedulers "Best Effort Scheduler", "Equal Share Scheduler" (Pascal only) and "Fixed Share Scheduler" (Pascal only) (in the docs http://docs.nvidia.com/grid/5.0/grid-management-sdk-user-guide/index.html#how-gpu-engine-use-is-reported and https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#changing-vgpu-scheduling-policy) and the newly presented "QoS scheduler" for Pascal chips:

    1. What does NVIDIA mean by the name "QoS scheduler" (the stupid "Equal Share Scheduler" and/or "Fixed Share Scheduler"?)?
    2. Is it possible to set a "share" value per vGPU process for a true "QoS scheduler" (if one exists) to guarantee a minimum, cap the maximum, and share the remainder with a defined ratio?
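As a starting point for question 1, here is a minimal sketch that probes the per-device NVENC capacity through NVML. It assumes the nvml.h shipped with grid_nvml_sdk_384.73 and linking with -lnvidia-ml; as far as I can tell the reported value is a relative capacity figure, not the absolute "macroblocks per second" number the question asks about.

```c
/* Minimal sketch: probe per-device NVENC capacity via NVML.
 * Assumes the nvml.h from the v384.73 SDK; build with -lnvidia-ml.
 * The returned value appears to be a relative capacity figure,
 * not an absolute "macroblocks per second" number. */
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    unsigned int devCount, i, capacity;

    if (nvmlInit() != NVML_SUCCESS)
        return 1;
    if (nvmlDeviceGetCount(&devCount) != NVML_SUCCESS)
        goto done;

    for (i = 0; i < devCount; i++) {
        nvmlDevice_t dev;
        if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS)
            continue;
        if (nvmlDeviceGetEncoderCapacity(dev, NVML_ENCODER_QUERY_H264,
                                         &capacity) == NVML_SUCCESS)
            printf("GPU %u: H.264 encoder capacity = %u\n", i, capacity);
        if (nvmlDeviceGetEncoderCapacity(dev, NVML_ENCODER_QUERY_HEVC,
                                         &capacity) == NVML_SUCCESS)
            printf("GPU %u: HEVC  encoder capacity = %u\n", i, capacity);
    }
done:
    nvmlShutdown();
    return 0;
}
```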
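For question 2, a quick dlopen/dlsym probe confirms whether the four functions are really exported even though they are missing from the header. This sketch only checks the exports of libnvidia-ml.so.1; it does not attempt to call them, because the prototypes (and the vGPU metadata structures they presumably use) are undocumented.

```c
/* Minimal sketch: check whether the symbols missing from nvml.h are
 * nevertheless exported by libnvidia-ml.so.1. Build with -ldl.
 * No call is attempted, since the prototypes are undocumented. */
#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    static const char *syms[] = {
        "nvmlDeviceGetMPSComputeRunningProcesses",
        "nvmlDeviceGetVgpuMetadata",
        "nvmlVgpuInstanceGetMetadata",
        "nvmlGetVgpuCompatibility",
    };
    void *lib = dlopen("libnvidia-ml.so.1", RTLD_LAZY);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    for (unsigned i = 0; i < sizeof(syms) / sizeof(syms[0]); i++)
        printf("%-45s %s\n", syms[i],
               dlsym(lib, syms[i]) ? "exported" : "NOT exported");
    dlclose(lib);
    return 0;
}
```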

Due to NVIDIA's inability to answer any question, I am replying to myself to update the "scheduler" topic (also updated https://gridforums.nvidia.com/default/topic/743/talks-with-the-developers/gpu-scheduler-for-vgpu/).

“… Pascal has a new hardware feature called Preemption that allows Compute on vGPU profiles. Preemption is a feature that allows task Context switching. It gives the GPU the ability to essentially pause and resume a task …”

Now it is clear that NVIDIA has reinvented the wheel: "preemption" in the Pascal chip. Welcome to the year 1964! (see https://en.wikipedia.org/wiki/Computer_multitasking#Preemptive_multitasking). This disclosure explains all the pitfalls with vGPU and CUDA in previous chip generations: the vGPU paravirtualized driver was unable to force a switch of the SMX/SMM context and depends heavily on the guest driver's cooperative multitasking (limited by the FRL) and on the guest operating system. Unbelievable; shame, shame, shame on NVIDIA!

NVIDIA updated the scheduler slides. As expected, the "QoS" title was removed (the new preemptive schedulers are far away from true QoS). You can use the old "Shared/Best Effort/Time Sliced Scheduler" with cooperative multitasking, OR you can use the "Fixed/Equal Share Scheduler" with preemptive multitasking and with card performance lost to "empty/unused" slots. It is not possible to redistribute "unused" slots! The "slots" per VM should be programmable (e.g. set a ratio/share (minimum guaranteed, with unused capacity redistributed) and set a maximum (capping)!). (The scheduler is chosen by a driver parameter (https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#changing-vgpu-scheduling-policy); a rough worked example of the "unused slot" cost is sketched below.)
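To make the "empty/unused slot" complaint concrete, here is a rough sketch of the arithmetic, assuming the behaviour described in the vGPU user guide: the Fixed Share scheduler always divides GPU time by the profile's maximum vGPU count per physical GPU, and idle slots are not redistributed. The figure of 8 vGPUs per GPU is only an illustrative assumption.

```c
/* Rough sketch of the "empty/unused slot" arithmetic, assuming the
 * Fixed Share scheduler gives every vGPU 1/max_vgpus of the GPU cycles
 * whether or not the other vGPUs are powered on, and never redistributes
 * the idle slots. (Equal Share divides by the number of *running* vGPUs
 * instead, but a running-but-idle VM's slot is likewise not handed to a
 * busy one.) The value 8 is only an illustrative profile limit. */
#include <stdio.h>

int main(void)
{
    const int max_vgpus = 8;    /* assumed profile limit per physical GPU */

    for (int running = 1; running <= max_vgpus; running++) {
        double share  = 1.0 / max_vgpus;        /* fixed share per VM   */
        double unused = 1.0 - running * share;  /* idle GPU cycles      */
        printf("%d of %d vGPUs running: %5.1f%% per VM, %5.1f%% of the "
               "GPU unused\n", running, max_vgpus,
               100.0 * share, 100.0 * unused);
    }
    return 0;
}
```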

Updated summary (removed “QoS”):

Shared/Best Effort/Time Sliced Scheduler based on cooperative multitasking:

Fixed/Equal Share Schedulers based on preemptive multitasking with performance loss ("empty/unused slots"!):

Update from GTC-EU-2017:

I'd still like an answer to question number 2: "missing in "nvml.h" but exported from the library".