vDWS Output Bandwidth to End User

In a document published by NVIDIA titled "NVIDIA QUADRO VIRTUAL DATA CENTER WORKSTATION" (searchable file name 185532_Nvidia_Quadro_vDWS_SolutionOverview_NV_US_WEB.pdf), it is stated on Page 3, under the heading "NVIDIA Quadro vDWS Features", that the maximum hardware-rendered display is four 4K displays at 4096x2160 resolution.

When that information is processed by the Quadro vDWS and pushed out to the user, whether local or remote, what is the bandwidth of the four-4K data stream after it leaves the server? At what refresh rate? And how is that data measured, i.e. in pps (packets per second)?

You are mixing several different things, but I am not surprised, given the marketing nature of this document.

  • virtual framebuffer - the various vGPU profiles allow different numbers and resolutions of virtual framebuffers (see https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#virtual-gpu-types-grid). This is RGBA memory that can be read with the NVIDIA Capture SDK (NVFBC) (see https://developer.nvidia.com/capture-sdk) or the Microsoft Desktop Duplication API (DDA). Rendering FPS (60 FPS) is limited by the "frame rate limiter" or the "vGPU scheduler", because the GPU is shared among multiple vGPUs. A vGPU has no LOCAL output, so you need a remote framebuffer protocol (also check the vGPU forum at https://gridforums.nvidia.com/).
  • remote framebuffer transfer protocol - like Microsoft RDP (including RDP10), VMware Horizon (including Blast Extreme), Citrix Virtual Apps and Desktops (including HDX), Teradici PCoIP, ... Bandwidth (and pps) ranges roughly from 1 Mbit/s to 100 Mbit/s and depends on resolution, codec, quality, and transfer/encoding FPS (1-60 FPS; this is not the rendering FPS). It is usually detected and adjusted dynamically by the protocol to achieve the best User eXperience (UX); you usually need 15-30 FPS.
  • hardware-accelerated encoding of H.264 or HEVC (H.265) - the NVIDIA Video Codec SDK (NVENC) (see https://developer.nvidia.com/nvidia-video-codec-sdk) can be used inside a remote framebuffer protocol to get LOW bandwidth, LOW latency and HIGH encoding FPS (search for RDP10, Blast Extreme, HDX, ... and "click to photon").

PS1: But be aware: the new Turing generation of GPU NVENC is not designed for full VDI acceleration. For an example of the available average encoding FPS per VDI with a hardware-accelerated-encoding remote framebuffer protocol (H.264, low-latency High Performance single pass; reference NVENC speeds taken from NVIDIA Video Codec SDK 9.0 (NVENC_Application_Note.pdf), GPU clocks from Wikipedia), see https://gridforums.nvidia.com/default/topic/8934/.

PS2: If you are of the "YouTube generation", you should check the videos from the last VDI-related TeamRGE conference: https://www.youtube.com/channel/UCzsbidFeFG1K2tG7x72vtvQ/ (http://www.teamrge.com/teamrge/events/).

NVIDIA finally tells the truth about the "Turing" generation encoder: "Pascal" is about 50%-100% faster in low-latency scenarios, because it has two encoders on the chip (see https://developer.nvidia.com/nvidia-video-codec-sdk, hidden under "Additional Performance Results").



FYI: Be careful with the new "NVIDIA vGPU Software 10" (with the Linux 440.43 driver). It is the 8K-resolution release, with "NVIDIA engineered" limits (see https://gridforums.nvidia.com/default/topic/258/nvidia-virtual-gpu-technology/documentation-for-vgpu-configs/post/16127/#16127). There are "changes" in the low-latency encoder behavior. The problem shows up with low-bandwidth, low-framerate transfers: for example, if you press "any key", the output can be delayed by up to 12 frames in the decoder, which is 2 seconds at 6 FPS (tested on the Raspberry Pi hardware OMX decoder). This is very bad UX!

Older drivers mark NVENC-encoded H.264 (with NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID) with the following SPS/VUI values (from a video stream analyzer):

...
 <b>num_ref_frames : 1 </b>
 vui_parameters_present_flag : 1 
...
 <b>bitstream_restriction_flag : 1 </b>
   motion_vectors_over_pic_boundaries_flag : 1 
   max_bytes_per_pic_denom : 0 
   max_bits_per_mb_denom : 0 
   log2_max_mv_length_horizontal : 1 
   log2_max_mv_length_vertical : 1 
   num_reorder_frames : 0 
   <b>max_dec_frame_buffering : 1 </b>
...

New drivers, running the same binary using NVENC, produce:

...
 <b>num_ref_frames : 3 </b>
 vui_parameters_present_flag : 1 
...
 <b>bitstream_restriction_flag : 0</b> 
   motion_vectors_over_pic_boundaries_flag : 0 
   max_bytes_per_pic_denom : 0 
   max_bits_per_mb_denom : 0 
   log2_max_mv_length_horizontal : 0 
   log2_max_mv_length_vertical : 0 
   num_reorder_frames : 0 
   max_dec_frame_buffering : 0 
...

So the decoder is now allowed to buffer the output of decoded frames. The Raspberry Pi uses 9-12 additional buffer frames, and this is not modifiable with OMX_IndexParamImagePoolSize!

Now you must explicitly enable "bitstreamRestrictionFlag" and set "numRefL0" on the encoder side to roll back to the old low-latency encoder-decoder behavior (and use the headers from the new Codec SDK):

...encodeCodecConfig.h264Config.h264VUIParameters.bitstreamRestrictionFlag = 1;
...encodeCodecConfig.h264Config.numRefL0 = NV_ENC_NUM_REF_FRAMES_1;
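For context, here is a hedged sketch of where those two lines live in a typical NVENC setup. Field names are from nvEncodeAPI.h (Video Codec SDK 9.x; NV_ENC_NUM_REF_FRAMES_1 needs the newer headers); the session and preset plumbing around them is elided and will differ per application, so treat this as a fragment, not a complete program:

```c
/* Sketch only; assumes an already-created NVENC session and an encode
 * config obtained from the preset (NvEncGetEncodePresetConfig) as usual. */
NV_ENC_CONFIG encodeConfig = { NV_ENC_CONFIG_VER };
/* ... copy the preset config into encodeConfig here ... */

/* Re-enable the VUI bitstream restriction so max_dec_frame_buffering is
 * signalled again and the decoder may output frames without extra buffering: */
encodeConfig.encodeCodecConfig.h264Config.h264VUIParameters.bitstreamRestrictionFlag = 1;

/* Limit the L0 reference list to one frame, matching the old driver behavior: */
encodeConfig.encodeCodecConfig.h264Config.numRefL0 = NV_ENC_NUM_REF_FRAMES_1;

NV_ENC_INITIALIZE_PARAMS initParams = { NV_ENC_INITIALIZE_PARAMS_VER };
initParams.encodeConfig = &encodeConfig;
/* ... set width/height, frame rate, encodeGUID/presetGUID, then call
 * nvEncInitializeEncoder() through the NVENC API function list ... */
```

After re-encoding with these settings, the stream analyzer should again show bitstream_restriction_flag : 1 and a small max_dec_frame_buffering, as in the old-driver dump above.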

PF 2020 !